
Navigating The Challenges Of Cross-Cluster Migration Of Kubernetes Workloads With CloudCasa


Cross-cluster migration of Kubernetes workloads continues to be challenging since workloads are isolated from each other by design. There are a number of reasons why you may want to separate your workloads, whether to reduce complexity or to keep a cluster closer to its user base. Moving workloads between clusters, however, is difficult because Kubernetes has so many components.

In this video, Kamlesh Lad, VP of Engineering & Chief Architect at CloudCasa by Catalogic, sits down with Swapnil Bhartiya to discuss the different scenarios where developers may want cross-cluster migration of Kubernetes workloads and the associated challenges. He also takes a deep dive into the available tools and how CloudCasa is helping simplify cross-cluster migration of Kubernetes workloads for developers.

Key highlights from this video interview are:

  • There are several reasons why enterprise Kubernetes deployments have so many clusters, and different reasons why developers may want to separate them. Lad goes into detail about the complexities of dealing with lots of clusters and the different scenarios in which you may want to separate them.
  • Even in on-prem environments, developers may want to have some clusters on-prem and some in the cloud. Alternatively, they may want clusters with different cloud vendors, or to segregate the clusters by different accounts within a single cloud. Lad describes the different infrastructures where developers may want to separate clusters.
  • Lad goes into some of the common use cases he sees in production for data migration across clusters, from wanting to move workloads on a cluster to new storage, to cloning production workloads to the development and QA environments.
  • Kubernetes has many different components, and some of them, such as the metadata and persistent volumes, cannot be left behind when moving a workload. Lad explains which components need to travel with the workload in order to migrate it.
  • One of the challenges of moving or duplicating workloads across different clusters is that there is no communication between them, and the storage is not shared. Extracting the persistent volumes or the etcd data so they can be moved to a different cluster is challenging. Lad discusses how you might navigate these difficulties.
  • There are several open source and proprietary tools to help developers migrate workloads. Lad goes into detail about the available tools.
  • Lad goes into depth about how CloudCasa is helping to make this easier for developers. He explains their fully managed SaaS service, and its features and benefits.
  • Lad gives Swapnil a demonstration of how CloudCasa helps with migrations, taking him through the workflow of migrating a cluster from one AWS account to another AWS account.

Connect with Kamlesh Lad (LinkedIn, Twitter)

Learn more about CloudCasa by Catalogic (LinkedIn, Twitter)

The summary of the show is written by Emily Nicholls.


Here is the automated and unedited transcript of the recording. Please note that the transcript has not been edited or reviewed. 

Swapnil Bhartiya: Hi, this is your host, Swapnil Bhartiya, and welcome to another episode of TFiR Let’s Talk. And today we have with us Kamlesh Lad, VP of Engineering & Chief Architect at CloudCasa by Catalogic. Kamlesh, great to have you on the show.

Kamlesh Lad: Great. Thanks for having me.

Swapnil Bhartiya: Today, in general, we are going to talk about use cases and challenges for cross-cluster migration of Kubernetes workloads. But before we get started, I would like to hear your insights, your thoughts into why enterprise Kubernetes deployments end up with so many clusters.

Kamlesh Lad: Yeah. So several reasons why. Just from an operations standpoint, your current cluster could have hit scaling limits. So you want to separate your workloads out to a different cluster. You may want to reduce your complexity, so you’re running your applications in siloed fashion. Or even geo separations. So you want to keep your clusters in different locations, maybe closer to where the user base is. And also maybe for redundancy, have different clusters. In addition to operations, you also have your CI/CD pipeline and workflow where you’ll have a separate production, separate staging, separate development, QA environment. You want to keep those separate. Obviously you don’t want your devs to go play in your production environment. And then you have hybrid environments as well, where you may have clusters on-prem and in the cloud. A lot of reasons there.

Swapnil Bhartiya: Perfect. Now, when we do talk about a lot of clusters, we also live in a multi-cloud, hybrid-cloud world. So are these clusters also spread across different clouds as well?

Kamlesh Lad: Yeah, definitely. So even on-prem, you can have some clusters on-prem and some clusters in the cloud. And even within the cloud, there are several cloud vendors that provide Kubernetes services. And then even within a single cloud, you may want to segregate your clusters by different accounts. You don’t want all your eggs in one basket, and you keep things separate by application.

Swapnil Bhartiya: Excellent. And as these clusters are created and deployed, can you also talk about what are some of the common use cases that you see in production for data migration across clusters?

Kamlesh Lad: Yeah, lots of use cases there. So just one example is … Let’s say you want to upgrade your cluster infrastructure, that is, compute and storage. You may want to move the workloads you’re running on that cluster to the new storage. You may also want to move your workloads to a new data center or a region. For example, you are on-prem and you want to migrate to the cloud, or even move between cloud vendors. And then another big use case is dev test. As part of your continuous integration and development, you may want to clone production workloads to your development and QA environments. This way developers and QA folks can actually test on real-world data, and this way it’s partitioned off. You don’t want to give them access to the production data directly, so this is a good use case for migrating and cloning workloads. Another case: let’s say your current cluster is overloaded, you may want to granularly move certain workloads to a different cluster. So those are among the primary use cases there for migration.

Swapnil Bhartiya: Excellent. Thanks for explaining that. Now, if you look at Kubernetes, it is already quite complicated. There are so many moving parts. There are so many sub-products that are within there that help [inaudible 00:03:45] started. When we look at these kinds of migrations, considering the fact that Kubernetes clusters have so many different components, what are the components of Kubernetes that need to be moved, the ones you cannot migrate the workload without?

Kamlesh Lad: Oh, right. So there are several … An application usually is made up of metadata. That’s stored in your etcd: your config maps, your secrets, et cetera. That’s your metadata. Then you have the persistent volumes. That’s where your persistent data is kept, and you want to keep all those as a unit when you’re trying to migrate it over. And there are some challenges there as well.

Swapnil Bhartiya: Right. So let’s talk about what the challenges are. What kind of patterns have you seen where customers run into those specific problems when they try to move or duplicate workloads across different clusters?

Kamlesh Lad: Yeah. So some of the challenges are … Kubernetes clusters don’t really talk to each other. They’re almost like islands unto themselves. There’s no communication between them, no networking. The storage is usually not shared, and that’s usually by design, right? You want to keep applications isolated. So you definitely have challenges in terms of, “Hey, how do I take that persistent volume or that etcd data and expose it in a way I can move it to a different cluster?” In most environments, you don’t expose your persistent volumes or your etcd database to the public network. It’s usually an internal node network. So you need a way to actually move these. You probably need intermediate storage in between to move this data. And, like I said, there’s no networking between the clusters, so you need an easy way to do that as well. So it could be tedious. And then on top of that, you may not want to move the entire cluster. You may just want to move granular workloads. So you have to pay attention to that as well.

Swapnil Bhartiya: Excellent. Now of course, I’m pretty sure that there are a lot of tools out there to help assist with migrating these workloads. Can you talk about what tools are there and how folks are currently trying to manage these migrations?

Kamlesh Lad: Okay, sure. Yeah, there are several open source tools and commercial tools. Just as an example of open source tools, there are Velero and Restic. They do require a lot of manual setup. It requires your own infrastructure. You have to have your own storage that multiple clusters can communicate with. And just a lot of manual tooling there. So in terms of commercial tools, obviously we’re the producer of CloudCasa.io. It’s a fully managed SaaS service dedicated to Kubernetes backup and migration. All you do is deploy a lightweight agent to each of your clusters. And we host the entire infrastructure, including the storage, and we make things efficient by doing dedup, compression, and network throttling. So it doesn’t affect your production cluster while we’re doing any of these backups and migrations.

Also, it has a self-service UI [inaudible 00:07:11]. This is pretty important if you’re doing dev test workflows, where you can give your developers granular permissions to, for example, clone your production workload to a dev test cluster, and not the other way around. That way, they’re not affecting production clusters. And we also have really tight integration with EKS, for example, and AKS. So in addition to backing up and migrating the etcd and persistent volumes, we also take care of the actual EKS envelope and metadata. So on restore, the migration will automatically create EKS or AKS clusters.
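For contrast with the hosted approach Lad describes, here is a minimal sketch of what the open-source Velero route involves. You must provision your own object storage as the intermediate location between clusters, then define a backup against it. Bucket, namespace, and backup names below are hypothetical, and the exact fields may vary by Velero version:

```yaml
# Intermediate storage you provision yourself (e.g., an S3 bucket).
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-migration-bucket   # hypothetical; must exist and be reachable by both clusters
  config:
    region: us-east-1
---
# A granular backup of just one workload's namespace on the source cluster.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-website-backup
  namespace: velero
spec:
  includedNamespaces:
    - test-website               # hypothetical namespace for the workload
  defaultVolumesToFsBackup: true # file-system backup of persistent volumes via the node agent
```

On the target cluster, a Velero `Restore` pointing at the same storage location pulls the workload back in. The manual pieces here, provisioning the bucket, installing Velero on each cluster, and managing credentials, are the setup burden a managed service takes on for you.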

Swapnil Bhartiya: Perfect. Now, of course, we have talked about the problem area and you also touched upon how CloudCasa is helping, but the whole point is to help developers with this problem. Talk about how CloudCasa is helping with some of these migrations, and to make it easier for folks.

Kamlesh Lad: Yeah. So one of the things about CloudCasa is that it’s a fully managed SaaS service. So there’s no need for cluster admins or developers to set up their environment. It’s all hosted by us; you get an easy-to-use web UI. And we do have granular RBAC permissions, so you can go ahead and give certain permissions to your developers or members of your team to do specific tasks. So there’s a lot of self-service involved there as well. And again, we make things as efficient as possible, in the sense that when you’re migrating or backing up large amounts of data, you want to make sure your production is not affected. So we have network and storage IO throttling. That way, when these large data transfers happen, we’re not affecting production. We also do smart dedup and compression, and also keep your data safe while we’re migrating with full encryption. So again, our main premise is we want to make it as easy as possible, a low-friction solution, very easy to use.

Swapnil Bhartiya: Excellent. Now, is it possible for you to just show us how CloudCasa helps with this migration, especially for AWS clusters?

Kamlesh Lad: Yeah, sure. So I’ll go through a workflow where we’ll migrate a cluster from one AWS account to another AWS account. And [inaudible 00:09:41] things CloudCasa does there. So if you notice on the screen, it’s our web UI. Once you log in, you get a nice dashboard. So let’s go to the next slide here. So the first thing you want to do is just link your AWS accounts to CloudCasa, and we make it very simple. All you do is launch a simple CloudFormation script that gives us permission to access the account. So a very simple onboarding. So now once you have your clusters, for example, here we have a source and target cluster. We’re going to go ahead and register those into CloudCasa here. You know, it’s an EKS cluster. And all you do is go to the register button, copy and paste a kubectl command we provide, and that’ll go ahead and push a lightweight agent into your cluster.

So it’s very simple onboarding. So here, we’re going to actually back up the test website workload from the source cluster. So if you notice, here is our backup definition. You can back up the entire cluster, or in this case, we’re just looking at a granular migration. So we just want to move this test website workload.

So pretty simple, and you run it. Once our backup is done, now we can actually restore or migrate this test website to our target cluster. So you just hit restore, and at this point, you’ll get a nice wizard. And if you don’t already have an EKS cluster, it will actually automatically create one for you, or you can use an existing one. Something I want to point out, something here CloudCasa does for you automatically: our source cluster had only an EBS storage class, and our target only has gp2. So we’ll do the translation between the storage classes while we’re doing the migration. So we have intelligent things like that to make migration as automatic as possible. And then just as simple as that, we’ve moved our workload from the source to the target cluster with a couple of clicks.
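CloudCasa performs this storage-class translation automatically. For comparison, open-source Velero handles the same problem with a plugin ConfigMap that maps source classes to target classes at restore time. The class names below are illustrative, matching the ones mentioned in the demo:

```yaml
# Velero's change-storage-class restore item action reads this ConfigMap
# (it must live in the velero namespace and carry these labels).
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  # PVCs using the source cluster's class are rewritten to the target's class on restore.
  ebs-sc: gp2
```

Either way, the point is the same: persistent volume claims reference storage classes by name, and those names rarely match across clusters, so some mapping step has to happen during migration.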

Swapnil Bhartiya: Kamlesh, thank you so much for taking the time out today to talk about not only the challenges that come with so many Kubernetes clusters, but also, of course, migrating the data and workloads that live on those clusters. Thanks for sharing those insights, and I would love to have you back on the show. Thank you.

Kamlesh Lad: Great. Thanks for having me.
