
Extending Kubernetes With Service Mesh


In this article, we talk with Andrew Jenkins, CTO at Aspen Mesh, about how a service mesh can extend Kubernetes to even better manage microservice architectures. For more on service mesh, consider attending KubeCon + CloudNativeCon NA, November 18-21, 2019 in San Diego.

Q: Kubernetes has taken the enterprise by storm. Companies continue to evaluate and adopt Kubernetes to solve containerized application challenges. In your mind, what has been the catalyst behind this growth?

A: It’s a combination of a well-crafted API and a good controller architecture. Systems like Kubernetes face a tradeoff between extensibility and being opinionated. Enterprises want their platforms to be extensible, but not so extreme that they’re really just an extension API with no “core” functionality you can rely on. Enterprises want their platforms to have some opinions, especially ones that match their own, but not so many that the enterprise can’t sprinkle in its own choices, maybe to incorporate legacy systems or to address compliance.

So it feels like Kubernetes has opinions in all the right places. For example, Pods are core, and you can’t really say you have Kubernetes if you don’t adhere to the Pod concept. But if you want to layer on top of Pods – how they’re scheduled or upgraded, who’s allowed to define them, how they’re replicated for scale – you have a lot of flexibility there.
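That controller architecture is easy to see in miniature. Here is a minimal Go sketch of the declarative pattern described above: you state desired state, and a control loop computes the corrective action to converge observed state toward it. The types and fields below are our own illustration, not the real Kubernetes API.

```go
package main

import "fmt"

// Illustrative sketch only: these types and fields are hypothetical, not the
// real Kubernetes API. The point is the declarative pattern: a controller
// compares desired state to observed state and computes a corrective action.

type DesiredState struct {
	Replicas int // how many Pod replicas the user declared
}

type ObservedState struct {
	Running int // how many replicas are actually running
}

// reconcile returns the corrective action: positive means "create this many
// replicas", negative means "delete this many", zero means converged.
func reconcile(d DesiredState, o ObservedState) int {
	return d.Replicas - o.Running
}

func main() {
	diff := reconcile(DesiredState{Replicas: 3}, ObservedState{Running: 1})
	fmt.Printf("corrective action: %+d replica(s)\n", diff)
}
```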

Q: How do you think Kubernetes and other cloud native technologies are driving enterprise architecture modernization and the move to massively distributed systems?

A: Kubernetes is the lingua franca for how you run containerized apps. And containerization is the path for architecture modernization because it shrinks gaps between dev and prod. Deployment gaps, design gaps, quality gaps, time-to-delivery gaps. Enterprises need to shrink those gaps to compete effectively. They need to focus engineer time on business value. To me, Kubernetes and related cloud-native tech are a great enabler for that refocusing.

Q: Kubernetes solves many of the build and deploy challenges of containers. What does it leave unsolved?

A: Kubernetes itself does a great job of implementing a description of which containers you want running, how many, and how you want to group them into services. Containers are supposed to change more than just deploy-time semantics, though. Hopefully, they reach further and further back along the development chain in your organization, eventually all the way to changing the way you design new features – favoring smaller microservices and rapid development.

So we see organizations (including ourselves) exploring different techniques to test and deliver software (progressive delivery, mirroring, experimentation in production or semi-prod).  We see evolutions around quantifying system health as an input to budgeted disruption/SRE.  The mindset around security is changing to account for the common pattern that Kubernetes-based apps depend on data from other services often residing outside of Kubernetes.

Q: How does service mesh address those gaps?

A: I think the dataplane proxy part of a service mesh is a good place to implement the functionality to support these new approaches. For instance, on the security aspect, the service mesh dataplane should implement authentication and encryption using mutual TLS.
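As a rough illustration of what that looks like at the connection level, here is a minimal Go sketch of a server requiring mutual TLS, the handshake a mesh sidecar performs on a workload’s behalf. The file names (ca.pem, svc.pem, svc-key.pem) are placeholders; in a real mesh, certificates are issued and rotated automatically rather than read from static files.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

// Sketch of the mutual-TLS posture a mesh dataplane enforces: this server
// both presents its own certificate and requires callers to present one.
// File names are placeholders, not what any particular mesh uses.
func main() {
	caPEM, err := os.ReadFile("ca.pem") // trust bundle for verifying callers
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	srv := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			ClientCAs:  pool,
			ClientAuth: tls.RequireAndVerifyClientCert, // callers must authenticate
		},
	}
	// svc.pem / svc-key.pem identify this workload to its peers.
	log.Fatal(srv.ListenAndServeTLS("svc.pem", "svc-key.pem"))
}
```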

Above the dataplane, there’s space for a Kubernetes-style declarative control plane formed of controllers with different responsibilities. Extending the security example, a simple deployment may just want to get up and going quickly with a self-signed SPIFFE trust chain. They want to know their traffic is encrypted and authenticated to the workload. A more extensive deployment may already have certificate bundles for other internal or semi-internal services and want fine-grained control over egress traffic. But both deployments use the same service mesh dataplane.

It seems like there’s emerging consensus on the functionality expected from the dataplane component: a common TLS implementation, HTTP/gRPC layer 7 protocol support for advanced routing, and circuit breaking. The use cases are still evolving…
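Circuit breaking, one of the dataplane features just mentioned, reduces to a small amount of state tracked per upstream service. Here is a toy Go sketch; the threshold, cooldown, and the call being wrapped are invented for illustration, and real dataplanes are considerably more nuanced.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Toy circuit breaker of the kind a dataplane applies per upstream service.
// After `threshold` consecutive failures, calls fail fast for `cooldown`.
type Breaker struct {
	failures  int
	threshold int
	openUntil time.Time
	cooldown  time.Duration
}

var ErrOpen = errors.New("circuit open: failing fast")

func (b *Breaker) Call(fn func() error) error {
	if time.Now().Before(b.openUntil) {
		return ErrOpen // circuit is open: don't even attempt the request
	}
	if err := fn(); err != nil {
		b.failures++
		if b.failures >= b.threshold {
			b.openUntil = time.Now().Add(b.cooldown) // trip the circuit
			b.failures = 0
		}
		return err
	}
	b.failures = 0 // a success resets the failure count
	return nil
}

func main() {
	b := &Breaker{threshold: 3, cooldown: 5 * time.Second}
	flaky := func() error { return errors.New("upstream 503") }
	for i := 0; i < 5; i++ {
		fmt.Println(b.Call(flaky)) // three real failures, then fail-fast
	}
}
```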

Q: There are many different tools and methods that can address things like load balancing, service discovery, canary testing and cluster security. Why pick service mesh instead of an API gateway or APM tool?

A: There can be some overlap with similar tech like API gateways and APM tools, but my perspective is that service mesh is especially great as a transparently-injected measurement and enforcement point.

A service mesh is well-suited for service-to-service (“east-west”) traffic – applications don’t have to opt in to using it, and they also don’t get to opt out (subject to platform controls), which opens up security use cases around service-to-service authentication and authorization. Also, a service mesh can close the loop beyond just measurement – once you’ve identified a performance or health problem, you can take action to mitigate or correct it with the service mesh. This means you can see a problem, understand it, and make a modification without having to touch many different systems.
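To make the “transparently-injected measurement and enforcement point” idea concrete, here is a minimal Go sketch in which a single wrapper both records latency and applies an authorization check, with no change to the application handler. The header-based deny rule is a hypothetical stand-in for real mesh policy.

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// Because every request passes through one wrapper, measurement and policy
// enforcement need no changes to the application itself. The identity header
// check below is a made-up stand-in for real mesh authorization.
func meshLike(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Caller-Identity") == "" { // enforcement
			http.Error(w, "unauthenticated caller", http.StatusForbidden)
			return
		}
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("%s %s took %v", r.Method, r.URL.Path, time.Since(start)) // measurement
	})
}

func main() {
	app := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello from the app\n")) // the app never changes
	})
	log.Fatal(http.ListenAndServe(":8080", meshLike(app)))
}
```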

Q: What are some of the top use cases you are seeing service mesh used for?

A: First, a consistent approach to encrypting service-to-service communication that spans clusters and organization domains: mutual TLS and workload-based security.  I think of it as a network security “easy button” – you get a single TLS stack with all the features you need; the operations and lifecycle get much simpler (one upgrade if there’s a CVE; one config option for cert rotation, etc.).
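On the cert rotation point, here is a hedged Go sketch of the underlying mechanism: resolving the serving certificate on each handshake, so an agent can swap rotated certs on disk without restarting the workload. The file paths are placeholders, and a production mesh would cache and watch rather than reload per handshake.

```go
package main

import (
	"crypto/tls"
	"log"
	"net/http"
)

// Sketch of certificate rotation without restarts: Go's TLS stack can resolve
// the serving cert per handshake, so newly rotated files are picked up
// automatically. Paths are placeholders.
func main() {
	cfg := &tls.Config{
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			cert, err := tls.LoadX509KeyPair("svc.pem", "svc-key.pem")
			if err != nil {
				return nil, err
			}
			return &cert, nil
		},
	}
	srv := &http.Server{Addr: ":8443", TLSConfig: cfg}
	log.Fatal(srv.ListenAndServeTLS("", "")) // empty paths: cfg supplies the cert
}
```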

Next, not to be overlooked, is an at-a-glance view of which services are communicating with which services, over what URLs, and how healthy that traffic is. We find users who have had fits and starts at building or buying this kind of visibility. When we show them a service mesh that can gather this information everywhere, without any app modification, it’s like the fulfillment of a vision they’ve always had and made incremental progress toward; now they get it in one fell swoop.

Finally, all the advanced L7 routing and resiliency features used to build canaries and progressive delivery. This is very powerful; we see a lot of users moving in this direction, but they aren’t totally clear yet on exactly what they want out of this layer, so they’re experimenting with different approaches, sometimes on a per-app basis. (Incidentally, that’s a good fit for the service mesh dataplane as the enforcement point, with different controller layers on top.)
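The routing primitive behind canaries is simple at its core: weighted selection of a backend per request. A Go sketch follows; the 10% weight and the reviews-v1/reviews-v2 backend names are invented for illustration, and a mesh would drive this from declarative config in its dataplane rather than hard-coded values.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Weighted backend selection, the core of a canary rollout: send a small
// percentage of requests to the new version, the rest to the stable one.
// Names and weight are illustrative only.
func pickBackend(canaryPercent int) string {
	if rand.Intn(100) < canaryPercent {
		return "reviews-v2" // canary
	}
	return "reviews-v1" // stable
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[pickBackend(10)]++
	}
	fmt.Println(counts) // roughly 10% of traffic lands on reviews-v2
}
```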

Q: What do you see for the future of service mesh and Kubernetes?

A: First I’ll borrow from Janet Kuo and say I hope the future of Kubernetes itself is boring. It does what it does and does it very well. Extensibility and the surrounding ecosystem are the future for Kubernetes: wrangling large clusters and large numbers of clusters, multi-tenancy, and balancing hardware acceleration or isolation against universality.

For service mesh, there’s a lot to come. I think starting soon we’re going to see multiple personas interacting with one service mesh. Up until now, the focus has been on platform teams. But I think developers are going to use service mesh to help with debugging, quality engineers are going to use it for fault injection and failure reproduction, and API architects and release managers will rely on service mesh for novel delivery and lifecycle approaches.

To learn more about containerized infrastructure and cloud native technologies, consider coming to KubeCon + CloudNativeCon NA, November 18-21, 2019 in San Diego.