I finally started migrating the cluster from the now deprecated Nginx Ingress Controller to Gateway API. There are several robust options to pick from, but I settled on Envoy Gateway. I did not want any downtime, so I needed to be careful about this migration and do it in small increments. After some reading and exploration, I came up with a plan of action:

  1. Deploy the Envoy Gateway and configure an external Envoy Proxy
  2. Add an HTTPRoute for a single hosted sub-domain, alongside the Ingress
  3. Switch cloudflared routing from the ingress controller to the proxy for the sub-domain
  4. Verify
  5. Repeat steps 2-4 for each deployed app
  6. Clean up

1. Deploy the Envoy Gateway and Envoy Proxy

This is divided into multiple sub-steps, and I’ll explain them as I go. Also note that I override the image locations to use the mirror.gcr.io registry mirror instead of pulling straight from Docker Hub, mainly to avoid its rate limits.

Deploy the Envoy Gateway Helm chart as per the instructions. I used a pretty simple configuration and installed it in the networking namespace.

helm install eg oci://mirror.gcr.io/envoyproxy/gateway-helm --version v1.6.3 -n networking -f values.yaml

With the content of values.yaml being:

# values.yaml
config:
  envoyGateway:
    provider:
      type: Kubernetes
      kubernetes:
        deploy:
          type: GatewayNamespace
global:
  imageRegistry: mirror.gcr.io
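
Once the chart is installed, it is worth confirming the control plane is up before moving on. A quick check, assuming the chart's default deployment name of envoy-gateway:

kubectl wait --timeout=5m -n networking deployment/envoy-gateway --for=condition=Available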

Deploy a basic EnvoyProxy. I configured it with some reasonable resource limits and two replicas for basic redundancy.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: envoy
  namespace: networking
spec:
  logging:
    level:
      default: info
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
        container:
          imageRepository: mirror.gcr.io/envoyproxy/envoy
          resources:
            requests:
              cpu: 100m
            limits:
              memory: 1Gi
  shutdown:
    drainTimeout: 180s
  telemetry:
    metrics:
      prometheus:
        compression:
          type: Gzip

Deploy a GatewayClass. Not much of interest to cover here - note that GatewayClass is cluster-scoped, so it carries no namespace of its own, while the parametersRef still points at the namespaced EnvoyProxy.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: envoy
    namespace: networking
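
Before going further, check that Envoy Gateway has accepted the class - the ACCEPTED column should read True:

kubectl get gatewayclass envoy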

Deploy a BackendTrafficPolicy configured to allow various compression types and disable request timeouts. The target selector attaches it to every Gateway in the namespace.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: envoy
  namespace: networking
spec:
  targetSelectors:
    - group: gateway.networking.k8s.io
      kind: Gateway
  compression:
    - type: Brotli
    - type: Gzip
    - type: Zstd
  tcpKeepalive: {}
  timeout:
    http:
      requestTimeout: 0s
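
Setting requestTimeout to 0s disables the per-request timeout, which matters for long uploads and streaming responses. Once the Gateway and a route exist (both are created below), a quick way to confirm that compression is actually being negotiated is to look for the content-encoding response header; the IP and hostname here are the placeholders used later in this post:

curl -sk -o /dev/null -D - -H 'Accept-Encoding: br, gzip, zstd' -H 'Host: app1.example.com' https://192.168.1.100 | grep -i content-encoding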

Deploy a ClientTrafficPolicy that uses the X-Forwarded-For header for client IP detection, trusting only your internal CIDRs. I use FluxCD’s variable replacement for the CIDR list (a sketch of the Flux side follows the manifest). This also configures some basic TLS settings.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: envoy
  namespace: networking
spec:
  clientIPDetection:
    xForwardedFor:
      trustedCIDRs:
        - ${TRUSTED_INTERNAL_CIDRS}
  http2:
    onInvalidMessage: TerminateStream
  http3: {}
  targetSelectors:
    - group: gateway.networking.k8s.io
      kind: Gateway
  tcpKeepalive: {}
  tls:
    minVersion: "1.2"
    alpnProtocols:
      - h2
      - http/1.1
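
For completeness, the ${TRUSTED_INTERNAL_CIDRS} placeholder is filled in by Flux's post-build variable substitution. A minimal sketch of the Flux side, assuming the value lives in a cluster-settings ConfigMap (the Kustomization, repository, and ConfigMap names here are illustrative):

# Flux Kustomization (sketch) - substitutes ${TRUSTED_INTERNAL_CIDRS} into the manifests at apply time
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: networking
  namespace: flux-system
spec:
  interval: 10m
  path: ./kubernetes/networking
  prune: true
  sourceRef:
    kind: GitRepository
    name: home-ops
  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: cluster-settings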

Deploy a Gateway for external traffic. I use Cilium’s LB IPAM and ask it for a static IP via the annotation. You can make curl requests directly against this address to verify that routing is happening correctly. I’ll show an example later.

# yaml-language-server: $schema=https://kube-schemas.pages.dev/gateway.networking.k8s.io/gateway_v1.json
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: envoy-external
  namespace: networking
spec:
  gatewayClassName: envoy
  infrastructure:
    annotations:
      lbipam.cilium.io/ips: 192.168.1.100
  listeners:
    # The http listener will only allow routes in the networking namespace, to allow our https redirect route
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: Same
    # The https listener will allow routes from all namespaces
    - name: https
      protocol: HTTPS
      port: 443
      allowedRoutes:
        namespaces:
          from: All
      tls:
        certificateRefs:
          # include a list of certs for all domains/sub-domains going through this gateway - the gateway will figure out the mapping
          - kind: Secret
            name: default-cert
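
After applying the Gateway, Cilium should hand out the requested address and Envoy Gateway should spin up the proxy pods and mark the Gateway programmed. Both are easy to confirm:

kubectl -n networking get gateway envoy-external
kubectl -n networking get pods -l gateway.envoyproxy.io/owning-gateway-name=envoy-external

The ADDRESS column should show 192.168.1.100 and PROGRAMMED should read True.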

Deploy a global HTTP-to-HTTPS redirect HTTPRoute. This is optional, but we should always be using HTTPS. It attaches to the http listener of the external gateway, as well as to an internal gateway (not shown here) that follows the same pattern.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: https-redirect
  namespace: networking
  annotations:
    external-dns.alpha.kubernetes.io/controller: none
spec:
  parentRefs:
    - name: envoy-external
      namespace: networking
      sectionName: http
    - name: envoy-internal
      namespace: networking
      sectionName: http
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
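
A quick way to confirm the redirect is to hit the http listener directly against the static IP; since the route does not filter on hostname, any Host header works, and you should get a 301 back with an https Location:

curl -sI -H 'Host: app1.example.com' http://192.168.1.100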

2. Add HTTPRoute for Single Sub-Domain

Create an HTTPRoute for the sub-domain of one of the apps.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app1-route
  namespace: default
spec:
  parentRefs:
  - name: envoy-external
    namespace: networking
  hostnames:
  - app1.example.com
  rules:
  - backendRefs:
    - name: app1
      port: 80

As I mentioned, the route itself can be tested directly against the gateway by using curl like so (recall the static IP defined earlier):

curl -v -k -H 'Host: app1.example.com' https://192.168.1.100

If the gateway is working correctly, this will hit the underlying service and the request will show up in the gateway’s access log.
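
Note that the Host-header trick above does not set SNI, so the gateway cannot pick a certificate by hostname (hence the -k). If you also want to exercise certificate selection, curl’s --resolve flag pins the hostname to the gateway IP instead:

curl -v --resolve app1.example.com:443:192.168.1.100 https://app1.example.com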

3. Switch Cloudflared Routing

When I started, my cloudflared config was pretty simple. I defined the origin server name as a global default, then forwarded the root domain and all sub-domains to my ingress controller:

originRequest:
  originServerName: "external.example.com"

ingress:
  - hostname: "example.com"
    service: https://external-ingress-nginx-controller.networking.svc.cluster.local:443

  - hostname: "*.example.com"
    service: https://external-ingress-nginx-controller.networking.svc.cluster.local:443

To add an override, add a new rule at the top that routes the sub-domain to the gateway. Rules are evaluated top to bottom, so the specific hostname wins over the wildcard below it.

originRequest:
  originServerName: "external.example.com"

ingress:
  - hostname: "app1.example.com"
    service: https://envoy-external.networking.svc.cluster.local
    originRequest:
      http2Origin: true
      keepAliveTimeout: 3600s
      tcpKeepAlive: 7200s

  - hostname: "example.com"
    service: https://external-ingress-nginx-controller.networking.svc.cluster.local:443

  - hostname: "*.example.com"
    service: https://external-ingress-nginx-controller.networking.svc.cluster.local:443
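
cloudflared can sanity-check the rules before you roll anything out. Keep in mind that it requires the final ingress rule to be a hostname-less catch-all (not shown in the snippets above); you can validate the file and ask which rule a given URL would match, pointing at your config with --config if it is not in the default location:

cloudflared tunnel ingress validate
cloudflared tunnel ingress rule https://app1.example.com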

4. Verify

Verifying is as easy as curling the new domain and tailing the logs of the gateway.

Watch the logs for the proxy:

kubectl logs -n networking -l gateway.envoyproxy.io/owning-gateway-name=envoy-external --all-containers=true -f

And in a separate shell:

curl -v https://app1.example.com
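
Beyond the logs, the route’s own status is worth checking; the Accepted and ResolvedRefs conditions will flag a mistyped backend service name immediately:

kubectl -n default describe httproute app1-route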

6. Clean Up

When it was all said and done, I simplified the cloudflared routing rules again to a single wildcard for all sub-domains, then went back and removed all of the lingering Ingress definitions. It all went quite smoothly, with only a few hiccups when I occasionally mis-specified a service name in an HTTPRoute.
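
For reference, the end state of the cloudflared rules is roughly the original config with the nginx service swapped out for the gateway - a sketch, with the catch-all rule again omitted:

originRequest:
  originServerName: "external.example.com"
  http2Origin: true
  keepAliveTimeout: 3600s
  tcpKeepAlive: 7200s

ingress:
  - hostname: "example.com"
    service: https://envoy-external.networking.svc.cluster.local

  - hostname: "*.example.com"
    service: https://envoy-external.networking.svc.cluster.local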