Migrating to Gateway API
I finally started migrating the cluster from the now-deprecated Nginx Ingress Controller to Gateway API. There are several robust options to pick from, but I settled on Envoy. I did not want any downtime, so I needed to do this migration carefully, in small increments. After some reading and exploration, I came up with a plan of action:
- Deploy the Envoy Gateway and configure an external Envoy Proxy
- Add an HTTPRoute for a single hosted sub-domain, alongside the Ingress
- Switch cloudflared routing from the ingress controller to the proxy for the sub-domain
- Verify
- Repeat steps 2-4 for each deployed app
- Clean up
1. Deploy the Envoy Gateway and Envoy Proxy
This is divided into multiple sub-steps, and I’ll explain them as I go. Also note that I override the image locations to use Google’s mirror.gcr.io registry mirror, since it is superior to pulling straight from Docker Hub in virtually all ways.
Deploy the Envoy Gateway helm chart as per the instructions. I used a pretty simple configuration and installed it in the networking namespace.
helm install eg oci://mirror.gcr.io/envoyproxy/gateway-helm --version v1.6.3 -n networking -f values.yaml
With the content of values.yaml being:
# values.yaml
config:
  envoyGateway:
    provider:
      type: Kubernetes
      kubernetes:
        deploy:
          type: GatewayNamespace
global:
  imageRegistry: mirror.gcr.io
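Before moving on, it is worth confirming the control plane actually came up; something like this should report the rollout as complete (the deployment name here assumes the chart’s default naming):

# assumes the chart's default deployment name of envoy-gateway
kubectl -n networking rollout status deployment/envoy-gateway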
Deploy a basic EnvoyProxy. I configured it with some reasonable limits and two replicas to give some basic redundancy.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: envoy
  namespace: networking
spec:
  logging:
    level:
      default: info
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
        container:
          imageRepository: mirror.gcr.io/envoyproxy/envoy
          resources:
            requests:
              cpu: 100m
            limits:
              memory: 1Gi
  shutdown:
    drainTimeout: 180s
  telemetry:
    metrics:
      prometheus:
        compression:
          type: Gzip
Deploy a GatewayClass. Not much of interest to cover here - again, pretty simple. Note that GatewayClass is cluster-scoped, so the resource itself has no namespace; only the parametersRef points at the EnvoyProxy in the networking namespace.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: envoy
    namespace: networking
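Before attaching anything to it, check that the controller has accepted the class:

# the ACCEPTED column should read True once the controller has reconciled the class
kubectl get gatewayclass envoy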
Deploy a BackendTrafficPolicy configured to allow various compression types and disable request timeouts.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: envoy
  namespace: networking
spec:
  targetSelectors:
    - group: gateway.networking.k8s.io
      kind: Gateway
  compression:
    - type: Brotli
    - type: Gzip
    - type: Zstd
  tcpKeepalive: {}
  timeout:
    http:
      requestTimeout: 0s
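Once a route is attached to the gateway (step 2 below), the compression settings can be spot-checked by sending an Accept-Encoding header against the gateway’s static IP and looking for a content-encoding response header. A rough sketch, assuming the app1 route from step 2 and a response large enough to be worth compressing:

# expect a content-encoding: gzip (or br / zstd) header on compressible responses
curl -sk -D - -o /dev/null -H 'Host: app1.example.com' -H 'Accept-Encoding: gzip' https://192.168.1.100 | grep -i content-encoding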
Deploy a ClientTrafficPolicy configured to trust the X-Forwarded-For header from your internal trusted CIDRs for client IP detection. I use FluxCD’s variable substitution to fill in the CIDR list. This also configures some basic TLS settings.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: envoy
  namespace: networking
spec:
  clientIPDetection:
    xForwardedFor:
      trustedCIDRs:
        - ${TRUSTED_INTERNAL_CIDRS}
  http2:
    onInvalidMessage: TerminateStream
  http3: {}
  targetSelectors:
    - group: gateway.networking.k8s.io
      kind: Gateway
  tcpKeepalive: {}
  tls:
    minVersion: "1.2"
    alpnProtocols:
      - h2
      - http/1.1
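For context, the ${TRUSTED_INTERNAL_CIDRS} placeholder is filled in by Flux’s post-build variable substitution on the Kustomization that deploys these manifests. A minimal sketch of what that can look like (the Kustomization name, path, and CIDR value here are made up):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: networking
  namespace: flux-system
spec:
  interval: 10m
  path: ./kubernetes/networking
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  postBuild:
    substitute:
      # hypothetical value - substitute your own trusted internal ranges
      TRUSTED_INTERNAL_CIDRS: "10.0.0.0/8"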
Deploy a Gateway for external traffic. I use Cilium and ask it for a static IP. You can make curl requests directly against this address to verify that routing is happening correctly. I’ll show an example later.
# yaml-language-server: $schema=https://kube-schemas.pages.dev/gateway.networking.k8s.io/gateway_v1.json
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: envoy-external
  namespace: networking
spec:
  gatewayClassName: envoy
  infrastructure:
    annotations:
      lbipam.cilium.io/ips: 192.168.1.100
  listeners:
    # The http listener will only allow routes in the networking namespace, to allow our https redirect route
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: Same
    # The https listener will allow routes from all namespaces
    - name: https
      protocol: HTTPS
      port: 443
      allowedRoutes:
        namespaces:
          from: All
      tls:
        certificateRefs:
          # include a list of certs for all domains/sub-domains going through this gateway - the gateway will figure out the mapping
          - kind: Secret
            name: default-cert
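With the Gateway applied, Cilium should assign the requested address and the gateway should report itself as programmed. A quick check (the label selector on the second command assumes the generated proxy Service carries the same owning-gateway-name label that the proxy pods do):

# ADDRESS should show 192.168.1.100 and PROGRAMMED should be True
kubectl -n networking get gateway envoy-external
# the generated LoadBalancer Service for the proxy should carry the same IP
kubectl -n networking get svc -l gateway.envoyproxy.io/owning-gateway-name=envoy-external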
Deploy a global HTTPRoute that redirects http to https. This is optional, but we should always be serving https.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: https-redirect
  namespace: networking
  annotations:
    external-dns.alpha.kubernetes.io/controller: none
spec:
  parentRefs:
    - name: envoy-external
      namespace: networking
      sectionName: http
    - name: envoy-internal
      namespace: networking
      sectionName: http
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
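The redirect can be verified directly against the gateway’s plain-HTTP listener once it is up; assuming the static IP from the Gateway above:

# expect a 301 status and an https:// redirect URL
curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' -H 'Host: app1.example.com' http://192.168.1.100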
2. Add HTTPRoute for Single Sub-Domain
Create an HTTPRoute for the sub-domain of one of the apps.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app1-route
  namespace: default
spec:
  parentRefs:
    - name: envoy-external
      namespace: networking
  hostnames:
    - app1.example.com
  rules:
    - backendRefs:
        - name: garage-cluster
          port: 3902
As I mentioned, the route itself can be tested directly against the gateway by using curl like so (recall the static IP defined earlier):
curl -v -k -H 'Host: app1.example.com' https://192.168.1.100
If the gateway is working correctly, this will hit the underlying service and show up in the gateway’s log.
3. Switch Cloudflared Routing
When I started, my cloudflared config was pretty simple. I defined the origin server as a global default, then forwarded the root domain and all subdomains to my ingress controller:
originRequest:
  originServerName: "external.example.com"
ingress:
  - hostname: "example.com"
    service: https://external-ingress-nginx-controller.networking.svc.cluster.local:443
  - hostname: "*.example.com"
    service: https://external-ingress-nginx-controller.networking.svc.cluster.local:443
To add an override, add a new rule at the top that routes a single sub-domain to the gateway.
originRequest:
  originServerName: "external.example.com"
ingress:
  - hostname: "app1.example.com"
    service: https://envoy-external.networking.svc.cluster.local
    originRequest:
      http2Origin: true
      keepAliveTimeout: 3600s
      tcpKeepAlive: 7200s
  - hostname: "example.com"
    service: https://external-ingress-nginx-controller.networking.svc.cluster.local:443
  - hostname: "*.example.com"
    service: https://external-ingress-nginx-controller.networking.svc.cluster.local:443
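One caveat: if cloudflared runs in-cluster with its configuration in a ConfigMap, the new rule only takes effect after the pods pick it up, so a restart along these lines may be needed (the deployment name and namespace here are assumptions about your setup):

# assumes cloudflared runs as a Deployment named cloudflared in the networking namespace
kubectl -n networking rollout restart deployment/cloudflared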
4. Verify
Verifying is as easy as curling the new domain and tailing the logs of the gateway.
Watch the logs for the proxy:
kubectl logs -n networking -l gateway.envoyproxy.io/owning-gateway-name=envoy-external --all-containers=true -f
And in a separate shell:
curl -v https://app1.example.com
Clean-Up
After it was all said and done, I simplified the cloudflared routing rules again to a single wildcard for all sub-domains, and then went back and removed all of the lingering Ingress definitions. It all went quite smoothly, with only a few hiccups when I occasionally mis-specified a service name in an HTTPRoute.
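For that last clean-up pass, something like this makes it easy to spot any Ingress objects still hanging around before deleting them:

# list any remaining Ingress resources across all namespaces
kubectl get ingress -A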