If you’ve administered or used shared Kubernetes clusters, you probably have some stories and possibly some scars from the process. The initial Kubernetes design was fairly minimal when it came to multi-tenancy (Role-Based Access Control wasn’t even GA until Kubernetes 1.8), and while things have improved over time, many multi-tenancy challenges remain.
Let’s start off by looking at some of the challenges with operating shared clusters, and later we’ll share some open source tools that can help.
The multi-tenancy models are both lacking
Typically teams use one of two models when deciding how to share Kubernetes clusters. The first is namespace-based isolation, where teams operate in shared clusters and are restricted to one or more namespaces. The second option is what we’ll refer to as cluster-based isolation, where teams or even individuals have their own clusters that are not shared with other tenants. Both of these approaches have advantages and limitations.
With namespace-based isolation, tenants share clusters, which cuts down on the number of clusters admins have to manage. Tenants are isolated to their namespaces using tools like Role-Based Access Control (RBAC) and network policies. For some applications, this approach works fine, but it falls down when teams need to manage cluster-scoped (global) resources, which exist outside of any namespace. If your team ships global objects like Custom Resource Definitions (CRDs) alongside its applications, you have to rely on someone with broader access in the cluster to manage them.
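As a sketch of how that isolation is typically wired up, a namespaced Role and RoleBinding like the following grant a team broad rights inside its own namespace only (the `team-a` namespace and group name here are hypothetical):

```yaml
# Role granting broad access to common workload resources,
# but only inside the team-a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-a-developer
  namespace: team-a
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "services", "configmaps", "deployments", "jobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Bind the Role to the (hypothetical) team-a group
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-developer-binding
  namespace: team-a
subjects:
  - kind: Group
    name: team-a
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-a-developer
  apiGroup: rbac.authorization.k8s.io
```

Because a Role is itself a namespaced object, it can never grant access to cluster-scoped resources like CRDs, which is exactly the limitation described above.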
Using cluster-based isolation has its own pitfalls. While it avoids some of the headaches of locking down a shared cluster, it creates others. Cluster sprawl is a big problem for many teams: it makes environments harder to manage and adds cost. As the number of clusters grows, it becomes harder to keep track of what’s actually being used, so resources sit idle and are wasted, which can even impact our planet.
Choosing between these two models can feel like picking your poison. Neither is ideal for many use cases.
Isolation is difficult
If your organization uses namespace-based isolation, you may run into difficulties isolating workloads from one another. Two of the primary tools for isolation are RBAC and network policies.
RBAC in Kubernetes is powerful but can also be complicated to manage. Teams in shared clusters may have many roles and role bindings to manage. It can also be difficult for the cluster administrators to know which permissions specific applications need. Many organizations would like to operate based on the principle of least privilege, but in some cases even the application developers may not know which APIs their apps need access to. This can result in lots of trial and error to create the appropriate permissions.
Network policies share many of the same management issues as RBAC, and the difficulty only increases for more complex Kubernetes networking setups.
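For illustration, a common starting point is a policy that denies all ingress to a tenant’s pods except traffic from within the same namespace (a minimal sketch; the `team-a` namespace is hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: team-a
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}   # only allow traffic from pods in this same namespace
```

Even a simple baseline like this has to be repeated in every tenant namespace, and each exception (shared ingress controllers, monitoring agents, cross-team services) adds another rule to reason about.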
If hard multi-tenancy is required, meaning users from different companies are operating software in a shared Kubernetes cluster, things get even trickier. For more detail on hard vs. soft multi-tenancy and how to tackle isolation and access control in either environment, Daniel Thiry’s “Kubernetes Multi-Tenancy Best Practices Guide” is definitely worth checking out.
Controlling costs and resources is also hard
Managing costs and resources is likely to be a significant challenge, whichever multi-tenancy model you adopt. There’s no built-in tooling for managing Kubernetes costs, and it’s not something you’ll want to roll yourself. Also, many clusters run in one of the major cloud providers, and there are entire companies built on the fact that cloud provider bills can be inscrutable.
Managing resources is made more complex by the fact that quotas for resources like CPU and memory can only be assigned for individual namespaces. There’s no way to set an overall quota for the resources that a user or team can consume in a cluster.
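A per-namespace quota looks roughly like this (the namespace and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"       # total CPU requested by all pods in this namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

If a team owns five namespaces, you need five such objects, and nothing in core Kubernetes aggregates them into a single team-level budget.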
And if your organization is using cluster-based isolation, you have the additional complexity of many more clusters to manage both costs and resources for. More clusters, more problems.
Open source tools that can help
There are a number of tools that can help with various aspects of multi-tenancy pain, but here are a few we recommend.
vcluster is a tool we open sourced at Loft Labs to allow anyone to create virtual Kubernetes clusters with ease. A virtual cluster runs inside of a namespace on a shared host cluster but appears to the users as if it’s a full-blown, dedicated cluster. This is achieved by running a Kubernetes API server and some other tools inside the namespace on the host cluster. Users connect to the API server of the virtual cluster to deploy workloads and run kubectl commands, but the pods they create run on the underlying host cluster.
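In practice this is driven by the vcluster CLI; the commands below are a sketch, and the exact flags may vary by version:

```shell
# Create a virtual cluster inside the "team-a" namespace of the host cluster
vcluster create my-vcluster --namespace team-a

# Connect to the virtual cluster's API server, then use kubectl as usual
vcluster connect my-vcluster --namespace team-a
kubectl get namespaces   # shows the virtual cluster's namespaces, not the host's
```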
By default, users are admins in the virtual cluster, allowing them to manage any global objects like CRDs that their applications may depend on. This lets users do their work without asking platform teams for additional access, while avoiding cluster sprawl at the same time. Creating virtual clusters with vcluster is also fast, which allows your users to experiment rapidly and discard unused clusters. This makes vcluster a great tool for spinning up ephemeral environments in Kubernetes.
While vcluster is currently the most popular solution for virtual Kubernetes clusters, the Kubernetes multi-tenancy working group has created an alternative called Cluster API Provider Nested. We expect to see much more innovation around virtual clusters in the next few years.
Kubecost is a tool for controlling your Kubernetes spend. The Kubecost cost models and a plugin for kubectl are open source, allowing teams to track their Kubernetes spend by service, namespace, labels, and more. The open source Kubecost also integrates with billing APIs for AWS, GCP, and Azure. There’s even an open source tool called Cluster Turndown for scaling clusters up and down based on schedules or other criteria.
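As an example, the open source kubectl plugin can be installed via krew and used to break down spend (a sketch; see the Kubecost docs for the current set of subcommands):

```shell
kubectl krew install cost
kubectl cost namespace   # approximate spend broken down per namespace
```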
kiosk is a multi-tenancy extension for Kubernetes that we also created at Loft Labs, and it fills in some of the multi-tenancy gaps we’ve discussed. kiosk is focused on making it easier to provide self-service access to clusters for developers, and we know that self-service provisioning can reduce developers’ cycle times and increase their happiness. kiosk lets platform teams define templates that are applied to newly created namespaces and even set resource quotas for users that span multiple namespaces.
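As an illustration, kiosk models this with CRDs such as Account and AccountQuota; the sketch below follows the kiosk docs at the time of writing, and the exact API group and fields may differ across versions:

```yaml
apiVersion: config.kiosk.sh/v1alpha1
kind: AccountQuota
metadata:
  name: team-a-quota
spec:
  account: team-a       # quota applies across all namespaces owned by this account
  quota:
    hard:
      limits.cpu: "10"
      limits.memory: 20Gi
```

Unlike a plain ResourceQuota, this caps the account’s total usage across all of its namespaces at once.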
Another popular open source tool focused on providing self-service access to clusters is Capsule. Capsule creates a new primitive called a Tenant that allows teams to manage things like RBAC, network policies, and resource quotas at that higher level of abstraction.
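A Tenant is declared as a cluster-scoped custom resource; the sketch below is based on the Capsule docs, and field names may vary between API versions:

```yaml
apiVersion: capsule.clastix.io/v1beta1
kind: Tenant
metadata:
  name: team-a
spec:
  owners:
    - name: alice     # hypothetical tenant owner
      kind: User
```

Tenant owners can then create their own namespaces within the tenant, with Capsule enforcing the tenant-level policies across all of them.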
While managing multi-tenant Kubernetes clusters could leave some scars in the past, the pain involved is driving more and more innovation. The ecosystem of tools around multi-tenancy has grown a lot recently, and we expect to see more and more tools and commercial products focused on reducing this pain.