FQDN Egress Control in Kubernetes

Author: Joshua Fox

Bio: Joshua Fox has been a software architect in innovative technology companies for 20 years. Now at DoiT International, he advises tech startups and growth companies about Google Cloud and AWS. He is a Google Developer Expert, a Certified Kubernetes Administrator, and has eight certifications on Google Cloud and AWS.

Future standards and current solutions

When you build a secure application, you often deny it permission to connect out of its Virtual Private Cloud (VPC). But in some cases you need to open it up, for example, if your application must invoke a third-party API. Even then, you may want to allow only some parts of the application to make that call: say, egress only to a list of domains, and only from pods that are in a certain Kubernetes namespace or that have certain labels.

The usual way to control egress is to allow connections only to the relevant IP addresses: You might accomplish that with the Firewall on Google Cloud or with Network ACLs on AWS.

Yet in practice, you don’t want to define access by IP address: you want to allow access to stable, publicly known endpoints identified by Fully Qualified Domain Names (FQDNs), say api.example.com. Since the IP addresses associated with an FQDN can change over time, a static IP-based rule set will eventually go stale.

In this article, I will review several ways to permit egress only to specific FQDNs, with the advantages and disadvantages of each approach. We will look for options that are inexpensive, easy to maintain, robust, and simple.

Different Approaches to FQDN Egress Control

Some approaches work at the VPC layer, while others are Kubernetes-aware, allowing control to follow the lines of your application. For example, you may have a microservice that runs on certain pods and is responsible for invoking external APIs; you want to allow egress only from those pods. At the same time, following the principle of least privilege, you want to deny such access to all other pods.

The options I review here include:

VPC-level solutions

Self-managed proxies like Squid.
AWS Network Firewall, with deep packet inspection, which allows control to the level of the domain, even with virtual hosting.
A new feature in Google Firewall, which gives FQDN control with the convenience of a serverless service.
The new Google Secure Web Proxy, which adds granular control down to the level of the domain and even the URL.

Kubernetes-aware solutions

Cilium, an open-source network layer that can be plugged into Kubernetes.
Istio, an open-source service mesh.
An upcoming standard that will make FQDN Egress Control a supported feature of every Kubernetes cluster.

(A note on OSI layers: Layer 7, the “application layer” of the OSI network model, is the layer that includes domain names and DNS, so FQDN Egress Control is sometimes called “Layer 7 Egress Control.” In contrast, Layer 3 is where IP addresses are defined, and it is the layer that typical firewalls and Network ACLs control.)

Implementing FQDN Egress Control

Let’s consider how you might implement this egress control yourself. I don’t recommend doing that, given the fine solutions discussed below, but it can clarify how these solutions work under the hood.

Even though the configuration specifies the FQDN of the third-party API, the domain names are only relevant before the connection is attempted, when the client looks up the IP address based on the domain name. From that point on, all network traffic uses IP addresses, so our solution still needs to block packets on the level of the IP address.

First, you configure the Firewall or Network ACL to block all egress traffic; only the relevant IP addresses will be opened up.

You write and deploy an application that periodically checks DNS for the current IP address or addresses of the API’s FQDN, api.example.com. A good low-cost way to run it is GCP Cloud Functions or AWS Lambda on a periodic trigger. On each run, the application compares the resolved addresses with the ones currently allowed and, in the rare case that they have changed, updates the Firewall or Network ACL to allow egress to the new addresses.
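
The DIY approach described above can be sketched in Python. This is a minimal, hypothetical illustration rather than production code: `resolve_ipv4` uses the system resolver through the standard `socket.getaddrinfo`, and `plan_allowlist_update` (a name invented for this sketch) computes which /32 ranges to add to or remove from the egress allowlist. The step that actually calls the Google Cloud Firewall or AWS Network ACL API is deliberately left out; you would wire the result into whichever SDK your function uses.

```python
from __future__ import annotations

import socket
from typing import Callable, Iterable


def resolve_ipv4(fqdn: str) -> set[str]:
    """Return the IPv4 addresses the system resolver currently gives for fqdn."""
    infos = socket.getaddrinfo(fqdn, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def plan_allowlist_update(
    current_cidrs: set[str],
    fqdns: Iterable[str],
    resolve: Callable[[str], set[str]] = resolve_ipv4,
) -> tuple[set[str], set[str]]:
    """Compare the allowlisted CIDRs with fresh DNS answers.

    Returns (to_add, to_remove) as sets of /32 CIDR strings. A single FQDN may
    resolve to several addresses; all of them end up allowed.
    """
    desired: set[str] = set()
    for name in fqdns:
        desired |= {f"{ip}/32" for ip in resolve(name)}
    return desired - current_cidrs, current_cidrs - desired


# Hypothetical stand-in for a real DNS answer, so the demo runs offline.
fake_dns = {"api.example.com": {"203.0.113.10", "203.0.113.11"}}
to_add, to_remove = plan_allowlist_update(
    {"203.0.113.10/32", "198.51.100.7/32"},
    ["api.example.com"],
    resolve=lambda name: fake_dns[name],
)
print(sorted(to_add), sorted(to_remove))  # ['203.0.113.11/32'] ['198.51.100.7/32']
```

One design choice to consider: removing addresses the moment they disappear from DNS might cut off clients that still hold a cached answer, so a cautious version could delay removals until the record’s TTL has passed.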

Many to one, one to many

A single FQDN can correspond to multiple IP addresses. This is not a problem for the implementation, as a normal DNS lookup will return multiple addresses as needed, and so configuring egress to that domain will include all these IP addresses.

Conversely, a single IP address can front multiple domains, as with name-based virtual hosting, so just converting the domain name to IP addresses and allowing access to those might not be granular enough to distinguish calls to various domains. Below, I will mention a few solutions that address that case as well.
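
To see why IP-level rules are too coarse for virtual hosting, consider this small Python sketch. The domain names and addresses are hypothetical (taken from documentation ranges), and `exposed_domains` is a helper invented for illustration: it reports which non-allowlisted domains become reachable as a side effect of allowing the IPs behind the allowlisted ones.

```python
from __future__ import annotations


def exposed_domains(allowed: set[str], dns: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each non-allowed domain to the allowed IPs it shares.

    `dns` is a snapshot of domain -> resolved IP addresses.
    """
    allowed_ips: set[str] = set()
    for name in allowed:
        allowed_ips |= dns.get(name, set())
    exposed: dict[str, set[str]] = {}
    for name, ips in dns.items():
        shared = ips & allowed_ips
        if name not in allowed and shared:
            exposed[name] = shared
    return exposed


# Hypothetical snapshot: two domains are name-based virtual hosts on one address.
snapshot = {
    "api.example.com": {"203.0.113.10"},
    "files.example.net": {"203.0.113.10"},
    "other.example.org": {"198.51.100.7"},
}
print(exposed_domains({"api.example.com"}, snapshot))  # {'files.example.net': {'203.0.113.10'}}
```

An IP allowlist built for api.example.com would also let traffic through to files.example.net; telling the two apart requires looking at the domain name inside the connection itself.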

Self-managed forward web proxies

One classic solution to FQDN Egress Control is a forward web proxy. There are many options; Squid is the best-known open-source one. You use your cloud’s routing service to direct all outbound traffic through a VM that runs the Squid proxy; Squid checks the requested domain name (for example, from the HTTP Host header or, for HTTPS, the TLS SNI) against an allowlist ACL and forwards the traffic only if there is a match.
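
The allowlist check such a proxy performs can be sketched in a few lines of Python. This is not Squid’s code; it only mimics the convention of Squid’s `dstdomain` ACL type, where an entry with a leading dot such as `.example.com` matches the domain itself and all its subdomains, while an entry without the dot matches only that exact name. The `squid.conf` lines in the comment are an assumed, typical setup, and `domain_allowed` is a hypothetical helper.

```python
from __future__ import annotations

# Roughly equivalent squid.conf (assumed typical setup):
#   acl allowed_domains dstdomain .example.com api.partner.test
#   http_access allow allowed_domains
#   http_access deny all


def domain_allowed(host: str, allowlist: list[str]) -> bool:
    """Check a requested host against dstdomain-style allowlist entries."""
    host = host.lower().rstrip(".")  # domain names are case-insensitive
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("."):
            # ".example.com" covers example.com and any subdomain of it.
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


ALLOWLIST = [".example.com", "api.partner.test"]
print(domain_allowed("api.example.com", ALLOWLIST))   # True
print(domain_allowed("example.com", ALLOWLIST))       # True
print(domain_allowed("evil-example.com", ALLOWLIST))  # False
print(domain_allowed("www.partner.test", ALLOWLIST))  # False
```

Matching on the dot boundary (`.example.com` rather than a bare suffix) is what keeps look-alike names such as evil-example.com out.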

Squid is available in the AWS Marketplace; see this architecture discussion. It is also available in the GCP Marketplace; see this networking setup. See also this discussion of FQDN egress control in Squid and other proxies, like DiscrimiNAT and Aviatrix.

Proxies like Squid running on a VM carry a maintenance overhead, for example, in upgrading VM operating systems and dealing with crashes. Because the entire flow of outbound traffic passes through one VM (unless you go to the additional effort of setting up a load-balanced deployment), that VM is both a potential bottleneck and a single point of failure, which endangers robustness. Handling the load may require the expense of a larger VM running 24×7, even when it is not fully needed.

AWS Network Firewall

AWS Network Firewall does deep packet inspection and so gains more filtering power. This means that it is not strictly comparable to Google Firewall, which more closely resembles AWS Network ACLs.

AWS Network Firewall supports FQDN Egress Control using stateful domain list rule groups. For HTTPS traffic, it reads the Server Name Indication (SNI) that the client sends in the TLS handshake (for plain HTTP, it uses the Host header), so it can distinguish between domains even in virtual hosting scenarios.
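
To illustrate what reading the SNI involves, here is a minimal Python sketch that pulls the server name out of a TLS ClientHello, the first message an HTTPS client sends, which carries the name in clear text. It is a simplified illustration of the idea, not how AWS Network Firewall is implemented: it assumes the whole ClientHello arrives in a single TLS record, and `build_client_hello` is a hypothetical helper that builds a bare-bones record for the demo.

```python
from __future__ import annotations

import struct


def extract_sni(record: bytes) -> str | None:
    """Return the host_name from a TLS ClientHello record, or None if absent or malformed."""
    try:
        # Record header: content type 0x16 (handshake), 2-byte version, 2-byte length.
        if record[0] != 0x16 or record[5] != 0x01:  # 0x01 = ClientHello
            return None
        pos = 5 + 4                                          # record header + handshake header
        pos += 2 + 32                                        # legacy_version and random
        pos += 1 + record[pos]                               # session_id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher_suites
        pos += 1 + record[pos]                               # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:  # server_name extension
                entry = pos + 2     # skip the server_name_list length
                while entry + 3 <= pos + ext_len:
                    name_type = record[entry]
                    (name_len,) = struct.unpack_from("!H", record, entry + 1)
                    start = entry + 3
                    if name_type == 0:  # host_name
                        if start + name_len > len(record):
                            return None
                        return record[start:start + name_len].decode("ascii")
                    entry = start + name_len
                return None
            pos += ext_len
    except (IndexError, struct.error):
        return None  # truncated or malformed input
    return None


def build_client_hello(server_name: str) -> bytes:
    """Hypothetical helper: a minimal ClientHello carrying only an SNI extension."""
    name = server_name.encode("ascii")
    sni_entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni_data = struct.pack("!H", len(sni_entry)) + sni_entry
    extensions = struct.pack("!HH", 0x0000, len(sni_data)) + sni_data
    body = (
        b"\x03\x03" + bytes(32)                # legacy_version, all-zero random
        + b"\x00"                              # empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"   # one cipher suite (TLS_AES_128_GCM_SHA256)
        + b"\x01\x00"                          # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("api.example.com")))  # api.example.com
```

A firewall that reads this field can allow api.example.com while blocking another name served from the same IP address.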

Network Firewall can be combined with Route 53 DNS Firewall, which blocks DNS resolution attempts, so that, for example, a DNS query for api.example.com from an application inside the VPC does not resolve to an IP address. But the DNS Firewall does not stop a client that already knows the IP address from connecting to it; that enforcement is provided by Network Firewall.

Network Firewall is a good choice, but can get expensive: it is intended for complex multi-network enterprise environments. (See my article on the DoiT blog comparing the use cases for the many firewall-like services on AWS.)

Google Firewall

It’s easier to use a solution that is fully managed by the cloud provider, rather than running it on a VM that you manage yourself. Google is just now coming out with some solutions to make this happen.

Google Firewall recently got a limited preview release of an FQDN Objects feature. It queries Cloud DNS every 30 seconds to refresh the IP addresses behind the outside service’s domain name.

Secure Web Proxy

Yet another limited preview service allows control at the domain level: Secure Web Proxy (known through Feb. 2023 as Secure Web Gateway). You give it access to your SSL certificates in the GCP Certificate Manager, so it can decrypt and re-encrypt your HTTPS traffic. This deep access lets it control egress at the granularity of the domain, for example, in virtual hosting scenarios where a single IP address serves multiple domains. Because it sees the full HTTP request, it goes further and lets you control egress down to the level of the URL.
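
URL-level control amounts to evaluating each decrypted request against rules like the ones in this Python sketch. The rule format (`Rule` with a host and a path prefix, first match wins, deny by default) is a hypothetical simplification for illustration only; Secure Web Proxy has its own policy and rule syntax, which this does not reproduce.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    host: str         # exact host name to match
    path_prefix: str  # "/" matches every path on that host
    allow: bool


def evaluate(url: str, rules: list[Rule]) -> bool:
    """Return the decision of the first matching rule; deny when nothing matches."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    for rule in rules:
        if host == rule.host and path.startswith(rule.path_prefix):
            return rule.allow
    return False


RULES = [
    Rule("api.example.com", "/v1/admin", allow=False),  # more specific rule first
    Rule("api.example.com", "/v1/", allow=True),
]
print(evaluate("https://api.example.com/v1/orders", RULES))       # True
print(evaluate("https://api.example.com/v1/admin/users", RULES))  # False
print(evaluate("https://api.example.com/v2/orders", RULES))       # False
print(evaluate("https://files.example.com/v1/x", RULES))          # False
```

A domain-only firewall could express the first and last decisions, but not the difference between /v1/orders and /v1/admin on the same host.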

Cilium

The above solutions work on the level of the VPC. But Kubernetes applications benefit from the fine-grained control allowed by the Kubernetes object model, and you can achieve this for FQDN egress control as well.

For this, you can use the eBPF-based Cilium network layer, which we wrote about on the DoiT blog (part 1; part 2). A Cilium network policy, defined through a Custom Resource Definition (CRD), can name domains directly; Cilium learns the IP addresses behind those names by observing the pods’ DNS lookups, and then blocks or allows traffic at the Cilium network layer. Because the policy is Kubernetes-aware, you can distinguish pods by namespace or label: some are allowed access to the external API, and some are not.

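apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "fqdn"
spec:
  endpointSelector:
    matchLabels:
      group: api-invocation
  egress:
  - toEndpoints:  # allow DNS via kube-dns so Cilium can learn the IPs for allowed names
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  - toFQDNs:  # then allow egress only to names matching this pattern
    - matchPattern: "*.doit.com"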

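The `endpointSelector` in the policy above follows Kubernetes `matchLabels` semantics: a pod is selected only if it carries every listed label with the same value, and an empty selector selects every pod. This small Python sketch of that rule (covering only `matchLabels`, with hypothetical pod names and labels) shows how one policy can apply to some pods and not others.

```python
from __future__ import annotations


def matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """matchLabels semantics: every key/value pair must be present on the pod (logical AND)."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


selector = {"group": "api-invocation"}
pods = {
    "invoker-7d9f": {"app": "invoker", "group": "api-invocation"},
    "frontend-5c2a": {"app": "frontend"},
}
selected = [name for name, labels in pods.items() if matches(selector, labels)]
print(selected)  # ['invoker-7d9f']
```

Only the invoker pod gets the FQDN egress rules; the frontend pod is unaffected by this policy.
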
Another CRD, CiliumClusterwideNetworkPolicy, does the same, but its configuration applies across namespaces, to the entire cluster; you still have the granularity provided by labels.

This approach adds some complexity to your cluster because of the need for the additional Cilium network layer. Eventually, as the Cilium docs put it, “all of the functionality will be merged into the standard resource format [see below] and this CRD will no longer be required.”

Istio

The most featureful solution is provided by the Istio service mesh. Because Istio controls the traffic that passes through its proxies, you can block egress to every destination outside its service registry (set meshConfig.outboundTrafficPolicy.mode to REGISTRY_ONLY). You then allow egress only to specified domains by adding them to the registry with a ServiceEntry.

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-svc-https
spec:
  hosts:
  - api.doit.com          # the external FQDN to allow
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS         # resolve api.doit.com via DNS instead of trusting the client's target IP

When you specify resolution: DNS, you ask Istio not to rely on the IP address that the client (in a pod) is connecting to, but rather to periodically resolve the domain name using DNS. You can also use the ServiceEntry’s exportTo field to expose it only to the namespaces you choose.

Istio gives you the greatest control, but because it brings in the full power and functionality of a service mesh layer, it also adds more complexity than Cilium.

Future FQDN Standards for Kubernetes

Network policies are part of the Kubernetes standard, but they allow control only at the IP level, not by FQDN. This means you have to reach for a plug-in solution like Cilium or Istio. Kubernetes standards are researched, discussed, and finalized by a collection of Special Interest Groups, and in particular, the Kubernetes Networking Special Interest Group is now working on this feature.

Once FQDN functionality is added to the network policy standard, and then implemented by Kubernetes providers, any compliant Kubernetes cluster will have the feature out-of-the-box.

Which egress control option should you choose?

You have a range of VPC-level options: the battle-tested Squid or another web proxy, and managed services such as AWS Network Firewall and the new Google Firewall FQDN Objects and Google Secure Web Proxy.

But if you want the solution to distinguish pods by Kubernetes namespace or label, the simplest choice is a Cilium CRD; adopt Istio if other parts of the service mesh functionality are valuable to you. Either provides the needed functionality today, but once the Kubernetes-native standard comes out, go with that.