In this episode of TFiR Let’s Talk, Swapnil Bhartiya sits down with Manik Sidana, Principal Architect at Coredge, to discuss the key Kubernetes operations from Day 0 to Day 2 and the challenges people face with Kubernetes. Sidana also shares his insights into what organizations should consider when moving to Kubernetes and what to look for in a platform for managing Kubernetes operations.
As Kubernetes adoption in production continues to grow, many organizations are still jumping on the bandwagon without fully assessing their situation, and they find themselves facing a number of challenges as a result. Kubernetes is known for its complexity, and while a number of tools are designed to help manage it, it is important to know whether Kubernetes is right for your workloads, and for your organization as a whole, before jumping in.
What are Day 0 to Day 2 Kubernetes operations?
Sidana takes us through the key operations from Day 0 to Day 2. Day 0 refers to the phase when a Kubernetes cluster, delivered by a build or deployment team, is handed over to the Ops team. By then, the cluster will typically have been deployed through automation pipelines, with several different sets of automation run against it. Sidana explains that the nodes could carry certain OS patches, or, in the case of telcos, a range of network acceleration features.
Before the cluster is rolled out further, the first step is to validate it, ensuring that it provides everything the target workloads need and is ready for applications to consume. For 5G applications, this could mean advanced features like HugePages; for enterprise applications, it could mean integrating the cluster with object storage or block storage.
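As a rough illustration of this kind of Day 0 check, the sketch below flags nodes that do not advertise HugePages. It assumes node data shaped like the `status.allocatable` map of a Kubernetes Node, where HugePages appear under keys such as `hugepages-2Mi` or `hugepages-1Gi`; the node names, the helper functions, and the tiny quantity parser (which only handles plain integers and the `Ki`/`Mi`/`Gi` suffixes) are hypothetical and not part of any Coredge tooling.

```python
"""Day 0 readiness sketch: flag nodes whose allocatable resources lack HugePages.

Node data is hypothetical and mimics a Kubernetes Node's status.allocatable map.
"""

# Binary suffixes used by Kubernetes resource quantities (subset only).
_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_quantity(q: str) -> int:
    """Parse a small subset of Kubernetes quantities ("0", "512Mi", "2Gi") into bytes."""
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # plain integer number of bytes


def nodes_missing_hugepages(nodes: dict[str, dict[str, str]], page_size: str = "1Gi",
                            min_bytes: int = 1) -> list[str]:
    """Return the names of nodes with fewer than min_bytes of the given HugePages size."""
    key = f"hugepages-{page_size}"
    missing = []
    for name, allocatable in nodes.items():
        # A missing key means the node exposes no HugePages of this size.
        available = parse_quantity(allocatable.get(key, "0"))
        if available < min_bytes:
            missing.append(name)
    return sorted(missing)


if __name__ == "__main__":
    cluster = {
        "worker-1": {"cpu": "32", "memory": "128Gi", "hugepages-1Gi": "16Gi"},
        "worker-2": {"cpu": "32", "memory": "128Gi", "hugepages-1Gi": "0"},
        "worker-3": {"cpu": "16", "memory": "64Gi"},  # no HugePages configured
    }
    print(nodes_missing_hugepages(cluster))  # ['worker-2', 'worker-3']
```

In a real pipeline the node data would come from the cluster API rather than a literal dictionary; the point is that validation is a concrete, automatable gate before hand-off.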
Day 1 typically involves ensuring adequate tooling in the cluster, such as setting up monitoring stacks, configuring log collection, and tracking various metrics. Sidana tells us that this is also where policy management comes in. This could include network policies that control communication between pods and services and keep applications isolated from each other, as well as backup policies. Day 1 is where you ensure the clusters have everything in place for compliance and governance.
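To make the isolation idea concrete, here is a deliberately simplified model of namespace-level network policies. It borrows two real Kubernetes behaviors: pods not selected by any NetworkPolicy accept all ingress, and policies are additive, so traffic is allowed if any selecting policy allows it. Everything else is an assumption for illustration: each hypothetical `NamespacePolicy` selects every pod in its namespace and allows ingress only by source namespace, whereas real NetworkPolicies also match pod labels, ports, and IP blocks, and are enforced by the CNI plugin.

```python
"""Simplified, hypothetical model of namespace isolation with NetworkPolicy-style rules."""
from dataclasses import dataclass, field


@dataclass
class NamespacePolicy:
    namespace: str  # the policy selects every pod in this namespace
    allow_from: set[str] = field(default_factory=set)  # empty set = deny all ingress


def ingress_allowed(policies: list[NamespacePolicy], src_ns: str, dst_ns: str) -> bool:
    """Decide whether traffic from src_ns may reach pods in dst_ns."""
    selecting = [p for p in policies if p.namespace == dst_ns]
    if not selecting:
        # Pods not selected by any policy are non-isolated and accept all ingress.
        return True
    # Policies are additive: one allowing rule is enough.
    return any(src_ns in p.allow_from for p in selecting)


if __name__ == "__main__":
    policies = [
        NamespacePolicy("payments", {"payments", "frontend"}),
        NamespacePolicy("analytics"),  # default deny for the whole namespace
    ]
    print(ingress_allowed(policies, "frontend", "payments"))   # True
    print(ingress_allowed(policies, "analytics", "payments"))  # False
    print(ingress_allowed(policies, "frontend", "analytics"))  # False
    print(ingress_allowed(policies, "payments", "frontend"))   # True
```

The last case shows why a default-deny policy per namespace matters: without one, a namespace is open to all traffic.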
Day 2 is where actual operations begin: application rollouts and upgrades, as well as cluster upgrades. Upgrades are one of the key operational challenges of Day 2, since people install many components on top of Kubernetes that have strict dependencies on specific Kubernetes versions.
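One way to reason about this dependency problem is a pre-upgrade gate that compares the target Kubernetes version against each installed add-on's supported range. The sketch below assumes each add-on declares a hypothetical inclusive `[min, max]` range of Kubernetes minor versions; the add-on names and ranges are placeholders, since real values come from each project's own release notes or support matrix.

```python
"""Pre-upgrade sketch: which add-ons do not support the target Kubernetes version?

Add-on names and supported ranges are hypothetical placeholders.
"""


def parse_minor(version: str) -> tuple[int, int]:
    """Turn "v1.29" or "1.29.3" into (1, 29); patch versions are ignored."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def blocking_addons(addons: dict[str, tuple[str, str]], target: str) -> list[str]:
    """Return add-ons whose supported [min, max] range excludes the target version."""
    t = parse_minor(target)
    blocked = []
    for name, (lo, hi) in addons.items():
        # Tuples compare element-wise, so (1, 29) <= (1, 30) works as expected.
        if not (parse_minor(lo) <= t <= parse_minor(hi)):
            blocked.append(name)
    return sorted(blocked)


if __name__ == "__main__":
    installed = {
        "ingress-controller": ("1.26", "1.30"),
        "service-mesh": ("1.25", "1.28"),
        "backup-operator": ("1.27", "1.31"),
    }
    print(blocking_addons(installed, "v1.29.2"))  # ['service-mesh']
```

Running such a check before a cluster upgrade turns a surprise breakage into a known list of components to upgrade first.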
Key challenges of Kubernetes
Sidana believes that the lack of proper security is a major pain point with Kubernetes. He explains that when there are multiple clusters, maintaining security by giving each user the right roles and access can be complicated. The cluster admin will have full rights, since they manage the cluster and need access to every operation. Application owners, however, should not have broad privileges, just enough to run their applications without interfering with other namespaces or causing disruptions in the clusters. SREs will also access the clusters to fix production issues. Having the right security model is therefore critical when managing a cluster; otherwise, it can lead to serious problems.
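The three personas above map naturally onto role-based access control. The sketch below is modeled loosely on Kubernetes RBAC: it is deny-by-default, a binding is either cluster-wide (similar to a ClusterRoleBinding) or limited to one namespace (similar to a RoleBinding), and `"*"` acts as a wildcard. The users, roles, and namespaces are hypothetical, and real RBAC is configured through API objects and evaluated by the API server; this only illustrates the least-privilege scoping Sidana describes.

```python
"""Least-privilege sketch modeled loosely on Kubernetes RBAC (hypothetical users/roles)."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Role:
    verbs: frozenset[str]      # "*" means every verb
    resources: frozenset[str]  # "*" means every resource type


@dataclass(frozen=True)
class Binding:
    user: str
    role: Role
    namespace: Optional[str]  # None = cluster-wide, similar to a ClusterRoleBinding


def _matches(allowed: frozenset[str], value: str) -> bool:
    return "*" in allowed or value in allowed


def is_allowed(bindings: list[Binding], user: str, verb: str, resource: str,
               namespace: Optional[str]) -> bool:
    """Deny by default; allow only if some binding for this user grants the request."""
    for b in bindings:
        if b.user != user:
            continue
        # A namespaced binding only applies inside its own namespace.
        if b.namespace is not None and b.namespace != namespace:
            continue
        if _matches(b.role.verbs, verb) and _matches(b.role.resources, resource):
            return True
    return False


CLUSTER_ADMIN = Role(frozenset({"*"}), frozenset({"*"}))
APP_OWNER = Role(frozenset({"get", "list", "create", "update", "delete"}),
                 frozenset({"deployments", "services", "pods", "configmaps"}))
SRE_DEBUG = Role(frozenset({"get", "list", "watch"}), frozenset({"*"}))

BINDINGS = [
    Binding("ops-admin", CLUSTER_ADMIN, None),
    Binding("shop-team", APP_OWNER, "shop"),
    Binding("sre-oncall", SRE_DEBUG, None),
]

if __name__ == "__main__":
    print(is_allowed(BINDINGS, "shop-team", "create", "deployments", "shop"))  # True
    print(is_allowed(BINDINGS, "shop-team", "delete", "pods", "billing"))      # False
    print(is_allowed(BINDINGS, "sre-oncall", "get", "pods", "billing"))        # True
    print(is_allowed(BINDINGS, "sre-oncall", "delete", "pods", "billing"))     # False
```

The application team can manage its own namespace but cannot touch another team's, and the SRE role can inspect everything without being able to change it.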
According to a CNCF survey, 55% of respondents said they face problems because they lack in-house skills or cannot hire the right talent. This stems from the fast-changing nature of Kubernetes and the huge CNCF landscape: with so many open source projects and so many people embracing open source tools, finding people skilled in this wide plethora of tools can be difficult.
Sidana discusses the complexity of Kubernetes, saying that one challenge is that it provides nothing for tenancy, so layers of management are needed on top of it. You cannot manage multiple tenants natively in Kubernetes; instead, you have to do it via namespaces and write your own management layer. While there are interest groups in the Kubernetes community working on multi-tenancy, each has its own way of solving the problem, which leaves the approach fragmented. He believes that open source has the power to bring everyone together and provide a more consolidated approach.
How should organizations approach Kubernetes?
Moving to Kubernetes needs to be a well-thought-through decision based on an evaluation of your application workloads. Sidana recommends assessing the nature of each application, its needs, and what it requires from the underlying platform. He believes it is important to consider every aspect of the workload, see whether something is missing in your current approach, and judge whether Kubernetes can fill that gap. He emphasizes the importance of doing POCs first and running them in the lab to assess application performance.
Many organizations are using a hybrid approach, putting their production workloads on public Kubernetes providers like EKS and GKE, which offer a lot of elasticity. Organizations may choose this to accommodate peak loads on their infrastructure, such as an eCommerce site over Christmas. Public providers can scale on demand, whereas it rarely makes sense to reserve that much of your own hardware just for a month of peak load. Sidana explains that this helps organizations weigh the cost tradeoffs and offloads some of the operational complexity.
There are also many compliance requirements. For instance, if you want to serve European customers, you will need GDPR compliance, and if your clusters are only in the US, you may not be able to meet those requirements for European customers. As a result, many people are moving to multi-cluster or multi-cloud setups to address these compliance issues. Sidana feels that where clusters are located can also make or break a product, simply because users do not like it when they experience too much latency.
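Multi-cluster placement lets both concerns feed into one decision: first restrict the candidates to clusters that satisfy the residency rule, then pick the one with the lowest latency. The sketch below assumes a hypothetical rule that EU customers must be served from EU regions, along with made-up cluster names, region labels, and latency figures; actual GDPR obligations depend on far more than region and should not be inferred from this example.

```python
"""Placement sketch: compliance filter first, then lowest latency (all data hypothetical)."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    name: str
    region: str         # hypothetical region labels, e.g. "eu-west"
    latency_ms: float   # measured latency from the customer's location


# Hypothetical residency rules: customer zone -> regions allowed to serve it.
RESIDENCY_RULES: dict[str, set[str]] = {"EU": {"eu-west", "eu-central"}}


def pick_cluster(clusters: list[Cluster], customer_zone: str) -> Optional[str]:
    """Filter by residency first (compliance), then choose the lowest-latency cluster."""
    allowed = RESIDENCY_RULES.get(customer_zone)
    candidates = [c for c in clusters if allowed is None or c.region in allowed]
    if not candidates:
        return None  # no compliant cluster exists for this zone
    return min(candidates, key=lambda c: c.latency_ms).name


if __name__ == "__main__":
    fleet = [
        Cluster("us-1", "us-east", 20.0),
        Cluster("eu-1", "eu-west", 35.0),
        Cluster("eu-2", "eu-central", 28.0),
    ]
    print(pick_cluster(fleet, "EU"))      # eu-2
    print(pick_cluster(fleet, "US"))      # us-1
    print(pick_cluster(fleet[:1], "EU"))  # None
```

The last call mirrors the situation described above: a US-only footprint leaves no compliant option for European customers.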
What to look for in a platform for managing Kubernetes operations?
There are three core things a platform should have for managing Kubernetes operations. First, the platform should have zero trust security, meaning "never trust, always verify." While it should make it easy to onboard users, it should also allow extensive customization of roles and RBAC access. Sidana believes that if security is not configured correctly, it can become a big problem.
Additionally, he says that any platform should offer good centralized management, with complete visibility of all clouds and clusters in a single pane of glass. You should be able to see your inventory: how many VMs, clusters, and pods you have across all your clusters and clouds. This makes it easier to plan whenever a new application or new demand arises.
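At its core, a single pane of glass for inventory is an aggregation over per-cluster data. The sketch below rolls hypothetical per-cluster records up into one summary per cloud; the `ClusterInventory` record, the cloud labels, and the counts are all made up for illustration, and a real platform would collect these figures from each cluster's API and the cloud providers.

```python
"""Inventory roll-up sketch: per-cluster counts summarized per cloud (hypothetical data)."""
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterInventory:
    cloud: str    # e.g. "aws", "gcp", "on-prem" (hypothetical labels)
    cluster: str
    vms: int
    pods: int


def summarize_by_cloud(items: list[ClusterInventory]) -> dict[str, dict[str, int]]:
    """Roll per-cluster counts up into one view per cloud, sorted by cloud name."""
    summary: dict[str, Counter] = {}
    for item in items:
        totals = summary.setdefault(item.cloud, Counter())
        totals["clusters"] += 1
        totals["vms"] += item.vms
        totals["pods"] += item.pods
    return {cloud: dict(totals) for cloud, totals in sorted(summary.items())}


if __name__ == "__main__":
    inventory = [
        ClusterInventory("aws", "eks-prod", 12, 340),
        ClusterInventory("aws", "eks-dev", 4, 80),
        ClusterInventory("on-prem", "edge-1", 6, 95),
    ]
    print(summarize_by_cloud(inventory))
    # {'aws': {'clusters': 2, 'vms': 16, 'pods': 420}, 'on-prem': {'clusters': 1, 'vms': 6, 'pods': 95}}
```

Keeping the per-cluster records as the source of truth means the same data can later be sliced by region, team, or environment for capacity planning.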
Finally, the platform should have a seamless app delivery mechanism for rolling out applications across different clusters or clouds, while also handling application lifecycle management.
Connect with Manik Sidana (LinkedIn)
Learn more about Coredge (Twitter)
The summary of the show is written by Emily Nicholls.