Challenges Of Stateful Workloads In Kubernetes: Meet The Data on Kubernetes Community (DoKC)

Guests: Melissa Logan (LinkedIn, Twitter)
Gabriele Bartolini (LinkedIn, Twitter)
Umair Mufti (LinkedIn, Twitter)
Organizations: Data on Kubernetes Community (DoKC) (LinkedIn, Twitter)
EDB (LinkedIn, Twitter)
Portworx by Pure Storage (LinkedIn, Twitter)
Show: Let’s Talk

Kubernetes is well known for its ability to run stateless workloads. Today it is increasingly being used to run databases and other stateful workloads. The Data on Kubernetes Community (DoKC) was founded in June 2020 to bring people together to tackle the challenges of working with data on Kubernetes. To learn more about the community, we invited three guests from DoKC to our Let’s Talk show.

“DoKC was started to bring people together to talk about the challenges of running stateful workloads on Kubernetes, how do we do this right, and what do we want the future to look like,” says Melissa Logan, Director of DoKC.

Bringing the perspective of database communities, Gabriele Bartolini, VP of Cloud Native at EDB, talks about what it’s like to bring their experience from the Postgres community into a much larger community like the Kubernetes one. “Conceptually people think that Kubernetes is just for stateless workloads,” says Bartolini, “but we want to bring Postgres into the Kubernetes space. We found ourselves with the same challenges that the Data on Kubernetes Community is actually trying to solve.”

The Data On Kubernetes 2021 Report

The Data on Kubernetes 2021 report revealed that as organizations seek to expand their data on Kubernetes footprint, they run into a lack of integration and interoperability with existing tools and stacks, a shortage of skilled staff, uneven quality among Kubernetes operators, and a lack of trusted vendors. “And I wouldn’t even say today that Kubernetes is really ready for stateful workloads. And I think that’s really what the goal of the DoK Community is, is to try to make that a reality…the truth of the matter is that everybody’s going about it a different way, and there’s many different ways to solve this problem,” says Umair Mufti, Director of product management at Portworx by Pure Storage.

The report highlights how widespread the use of stateful workloads on Kubernetes is while underlining key areas for improvement. “What we found is more people are using stateful workloads than we had potentially thought, given the challenges that we had heard anecdotally from different end users we had spoken to. And the survey found that 90% think Kubernetes is ready for stateful workloads, and 70% are currently running stateful workloads, but that does not come without challenges,” adds Logan.

Logan continues, “The number one challenge they have is interoperability with their stack. And then when we talk about operators, additional challenges come up, and that includes interoperability with other operators.”

But what do people really want in the future? A majority believe standards will improve data management and that data should become declarative. “Some of what the group here talked about is the idea of standards. So we asked about data standards, would that help? Majority said yes. A majority also agreed that making data more declarative, things like this would help make standardization on Kubernetes easier. Some of the key benefits that people see from standardizing on Kubernetes are massive gains in productivity,” avers Logan.

Check out the interview above to gain further insight into the Data on Kubernetes 2021 report and more.

Topics we covered in this interview:

What is the Data on Kubernetes Community (DoKC) all about?
What’s the importance of this community?
What importance does Umair Mufti from Portworx by Pure Storage see for stateful workloads there?
Why did Portworx by Pure Storage decide to join the community?
Melissa Logan shares some of the major findings of the Data on Kubernetes 2021 report.
As Portworx focuses on data protection and backup, what value does it add to the community?
Melissa Logan talks about exciting use cases that are more inclined towards stateful workloads.
If you just look at the Kubernetes ecosystem, there are so many distributions, but we have been trying to focus on standardizing things. As there are vendors, how big is this challenge for DoKC?
What is Gabriele Bartolini’s take on standardization, especially in the data space?
What governance structure is DoKC building? What other things are in the pipeline? How does the community plan to engage more vendors or users in this space?

The summary of the show is written by Monika Chauhan

[expander_maker]

Swapnil Bhartiya: Hi, this is your host, Swapnil Bhartiya, and welcome to Let’s Talk. Today, we have three guests from the Data on Kubernetes Community. We have Melissa Logan, Director of Data on Kubernetes Community, Gabriele Bartolini, VP of Cloud Native at EDB and Umair Mufti, Director of product management at Portworx by Pure Storage. Melissa, Gabriele and Umair, it’s great to have you all on the show. Welcome.

Gabriele Bartolini: Thank you very much.

Melissa Logan: Thanks.

Umair Mufti: Glad to be here. Thank you.

Swapnil Bhartiya: Melissa, let’s start with you. First of all, of course, we have covered Data on Kubernetes Community, the announcement. But I want to hear from you, what is this community all about and what did you folks see in that space that, hey, because Kubernetes Community, they’re trying to solve so many problems, that you look at a special problem and you’re like, “Let’s build a community around it”?

Melissa Logan: Yeah. The Data on Kubernetes Community has been around for a little over a year at this point. It was originally stewarded by the folks at MayaData who wanted to bring people together to talk about the challenges of running stateful workloads on Kubernetes. So I think we all know that Kubernetes is well known for its ability to run stateless workloads. And then the community pulled together some features that allowed stateful workloads to be run on that. In 2016, the operator pattern came out as well, which created that translation layer between the technology and Kubernetes. So we started to see some early patterns of success of people doing this, but there were still some challenges to make it work really well. So the community was started to bring people together to talk about all of this, and how do we do this right, and what do we want the future to look like. And we’ve run over 100 meetups in the past year, a couple a week at least, and just recently held our second Data on Kubernetes Day at KubeCon just last week in Los Angeles.

Swapnil Bhartiya: Right. And you folks also did a survey, I would talk about the survey and all those things later. I want to hear from Gabriele that, first of all, what value do you see in this community? And when did you folks decide to get involved with the project?

Gabriele Bartolini: For us, I mean, the value is what Melissa described before, we had the same kind of idea and the same feeling that we got this huge technology, this fantastic technology, but it’s underused in terms of data. So, conceptually people think that Kubernetes is just for stateless workloads, but having worked in databases for more than 20 years, and even our company, we have always been developing Postgres even the open source one, we wanted to transition like we did in the past from bare metal to VMs, now we want to bring Postgres in the Kubernetes space.

And we found ourselves with the same challenges that the Data on Kubernetes Community is actually trying to solve. So, it’s actually an innovation kind of frontier, so we are actually in an unexplored territory. So it’s something that hasn’t been done before. So it’s very exciting to be in that space. And we want to bring our experience from the Postgres community into, I believe, a much larger community like the Kubernetes one. So that’s, I think, the right place for us to be.

Swapnil Bhartiya: Excellent. Umair, as Gabriele was saying that the Kubernetes’ early days were more like stateless, but now it’s all stateful. And we hear Kubernetes is being used in so many use cases that we do talk about stateful. So you are coming with Portworx, or Pure Storage, so let’s hear from your perspective, how do you look at this community? What importance do you see for stateful workloads there?

Umair Mufti: Yeah, it’s a great question. And you’re right, I mean in the past, obviously Kubernetes was built to solve a certain use case. And I wouldn’t even say today that Kubernetes is really ready for stateful workloads. And I think that’s really what the goal of the DoK Community is, is to try to make that a reality. There are still many things that people struggle with, but the desire is clearly stated. We’ve seen that time and time again, Melissa released the results of the study last week which show overwhelming demand for people to want to run stateful applications in Kubernetes. But the truth of the matter is that everybody’s going about it a different way, and there’s many different ways to solve this problem. We’ve got things like the operator pattern, like Melissa mentioned that was released in 2016, which is a great framework. But at the end of the day, it’s really just a framework. It still requires people to come in and create the solutions themselves.

And what we’re seeing is every single database vendor, while they recognize the need for an operator, they’re all solving it in different ways. And I’m hopeful that with the DoK Community, we can come together and create a higher level of standards so that everybody can kind of code to the same, let’s say, meta framework and just have things that automatically work in Kubernetes. A lot of those challenges that we see with databases specifically aren’t so much that they’re stateful. I mean, there’s the MayaDatas and the Portworxs of the world that have come along and kind of solved that problem, having a persistent volume attached to your container. The problem is that these are distributed applications in large part, and that still today is a difficult problem to solve. So we need to create some standards around that.
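Editor’s note: the operator pattern Mufti refers to centers on a control loop that compares the state a user declares with the state observed in the cluster and works out what to change. The following is a minimal sketch of that idea only, not any vendor’s operator. The `ClusterSpec` and `ClusterStatus` types and their fields are hypothetical, and a real operator would call the Kubernetes API rather than return strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Desired state declared by the user (hypothetical fields)."""
    replicas: int
    version: str


@dataclass(frozen=True)
class ClusterStatus:
    """State observed in the cluster (hypothetical fields)."""
    replicas: int
    version: str


def reconcile(spec: ClusterSpec, status: ClusterStatus) -> list[str]:
    """Return the actions needed to move the observed state toward the spec."""
    actions = []
    if status.version != spec.version:
        actions.append(f"upgrade {status.version} -> {spec.version}")
    if status.replicas < spec.replicas:
        actions.append(f"add {spec.replicas - status.replicas} replica(s)")
    elif status.replicas > spec.replicas:
        actions.append(f"remove {status.replicas - spec.replicas} replica(s)")
    return actions


if __name__ == "__main__":
    # One replica on version 13 observed, three replicas on version 14 desired.
    print(reconcile(ClusterSpec(3, "14"), ClusterStatus(1, "13")))
```

The loop is the same for every database. What differs between vendors is the shape of the spec and the actions behind it, which is the part Mufti hopes a shared standard could cover.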

Swapnil Bhartiya: When and why… Of course, you alluded to a lot of those points, but if I ask you specifically when and why Portworx or Pure Storage decided to join the community, what was the reason?

Umair Mufti: Absolutely. I mean, it’s in our best interest. I mean, so Portworx, as you probably know, was acquired by Pure Storage just over a year ago. Portworx itself has long been… GigaOm has called us the gold standard in storage on Kubernetes. So, this is a problem we’ve long been trying to solve outside of the scope of data services or applications. Just the notion of running data on Kubernetes, that’s our bread and butter. And Pure as a brand, as a company, they are all about modernizing data workloads and making the lives easier for those of us who are dealing with data. So, it’s right in line with our corporate values, our company values and the problems we’re trying to solve as a company.

Swapnil Bhartiya: You folks recently published a survey, tell me a bit about what… I mean, once again, you touched upon it a bit, but what was the goal behind the survey? And if you can also share some of the major findings that, when you looked at the survey, something that kind of even you looked at them and you’re like, “Hey, whoa, we were not even expecting that.” Because these surveys, there are all things that we do know these trends. So we want to just get more insights into them and then suddenly something comes up which surprises us.

Melissa Logan: Yeah. So, I think we’ve all seen the CNCF survey. And in 2020, they did have one stat about stateful workloads in there. I think it was, with going to the Kubernetes user base, they asked who was doing stateful workloads, and it was 55%, or some number like that. And that told us something, but it didn’t tell us all of what we wanted to know. So we wanted to really dig deeper into that and understand who, what, why, how, and that’s why the survey came about. We wanted to understand the data on Kubernetes landscape and say, “Now we understand a little bit better about what people are doing and what challenges they have.” So what we found is more people are using stateful workloads than we thought, than we had potentially thought, given the challenges that we had heard anecdotally from different end users we had spoken to. And the survey found that 90% think Kubernetes is ready for stateful workloads, and 70% are currently running stateful workloads, but that does not come without challenges.

So, the number one challenge they have is interoperability with their stack. And then when we talk about operators, additional challenges come up, and that includes interoperability with other operators. So as Umair was saying, you have one operator for everything that you’re using, and that can create challenges across distributed applications. And the second challenge they identified, with operators in particular, was varying degrees of quality. So again, what kind of standards exist? They don’t. And so that creates challenges for people who are trying to manage these workloads on Kubernetes.

But the future is really what we wanted to understand as well, what do they want in the future? And some of what the group here talked about is the idea of standards. So we asked about data standards, would that help? Majority said yes. A majority also agreed that making data more declarative, things like this would help make standardization on Kubernetes easier. Some of the key benefits that people see from standardizing on Kubernetes are massive gains in productivity. So we looked at a cohort of users that had 75% or more of their workloads on Kubernetes and asked them what the key benefit was. They said standardization, of course, and bringing your stateful workloads together is really important, and as well as productivity. So they saw two times or greater productivity gains. That was a pretty big leap in terms of productivity that they’re getting from Kubernetes.

And one other thing I’ll mention, we had DreamWorks Animation come talk on a panel at DoK Day last week in Los Angeles. And they were talking about how they’re running about 370 databases over 1,200 pods, and that’s just a range of different databases. And prior to Kubernetes, they were doing it with Linux containers, and then Docker. But what Kubernetes did was solve some of the challenges around provisioning automation, et cetera, and help them scale to support this large number of database clusters without needing to have an explosion in headcount. So that’s another benefit that we didn’t talk about too much in the survey, but that we’re also hearing people find when they’re using this to standardize, using Kubernetes to standardize.

And he also mentioned this overlooked benefit being this consolidation of technologies within DreamWorks has been helpful to bring the team together so they could collaborate on the same kind of common set of technologies, and advances made from one team help the other team, and that wasn’t possible prior to standardizing on Kubernetes. So, we’re really seeing a good range of different benefits, from the survey and then from different things that we heard from the end users last week.

Swapnil Bhartiya: I would talk about the social and people problem, but I want to hear from Umair because at Portworx, you folks do have a lot of focus on data protection, backup, and all those things. So if I ask what value do you bring to the Data on Kubernetes Community to bring that perspective there?

Umair Mufti: Yeah. Just kind of piggybacking off Gabriele’s point of sort of the automation, one of the things, one of the benefits that you get from Kubernetes is sort of a reduction in shadow IT, and I think this is what the DreamWorks folks were alluding to last week also. So, you’ve got your base images, you’ve got sort of a standard deployment that you’re sending out for all of your Postgres deployments. You don’t need people to be spinning up their own VMs and running their own Postgres deployments anymore which kind of compromises security, right? You can scan your images and you have that whole notion of automation to prevent some of these issues with security, or these vulnerabilities I should say.

Portworx as a company, as you mentioned, we have a lot of products that specifically deal with data protection. And so when we’re talking about data now, we’re not just even talking about security, we’re also talking about data protection. So, we think of data availability. One of the core… the product that I manage, Portworx Data Services, builds into it this notion of distributing your data across your worker nodes such that if any particular worker node fails, your data’s always available. Let’s say, in your Postgres cluster, you may have a primary running one place and you have secondaries running other places, so we can leverage both the underlying capabilities of Kubernetes itself in terms of scheduling. And then some of the things that Portworx, as you mentioned, that we bring to the table are things like the STORK orchestrator itself, which manages the orchestration of pods so that they converge with where the data lives. And not only where it lives or with the convergence, but also being aware of where that data lives, so that you can do things intelligently, like doing anti-affinity rules and spreading your data across your cluster.
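Editor’s note: the anti-affinity rules mentioned here have a counterpart in plain Kubernetes scheduling. As a rough sketch, the snippet below builds the `affinity` fragment of a pod spec that asks the scheduler never to place two pods carrying the same label on the same node. The field names are standard Kubernetes ones; the `anti_affinity` helper and the `app` label value are assumptions made for illustration, and this is not how STORK itself is configured.

```python
import json


def anti_affinity(app_label: str) -> dict:
    """Build a pod-spec fragment that keeps pods with the same app label on different nodes."""
    return {
        "affinity": {
            "podAntiAffinity": {
                # Hard rule: the scheduler must not co-locate matching pods.
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        # One matching pod per node, keyed on the node's hostname label.
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical label for a Postgres cluster's pods.
    print(json.dumps(anti_affinity("postgres"), indent=2))
```

Merged into a pod template, a fragment like this spreads a primary and its secondaries across worker nodes, so losing one node does not take every replica with it.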

And then again, doing things like 3-2-1, adhering to the 3-2-1 sort of paradigm of making sure you’ve got three copies of your data in two places, one of them being off location. So we have a lot of DR capabilities built into our platforms as well, and that’s the entire stack of Portworx’s offerings.
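Editor’s note: the 3-2-1 rule as Mufti phrases it (three copies, two places, one of them off location) can be expressed as a small check. This is a minimal sketch under that phrasing; the rule is often stated with two different storage media rather than two places. The `BackupCopy` type and the location names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a dataset (hypothetical record)."""
    location: str   # e.g. a cluster, data center, or bucket name
    offsite: bool   # True if the copy lives away from the primary site


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check for at least three copies, in at least two places, with at least one off site."""
    enough_copies = len(copies) >= 3
    enough_places = len({c.location for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_places and has_offsite


if __name__ == "__main__":
    copies = [
        BackupCopy("cluster-a", offsite=False),
        BackupCopy("cluster-a", offsite=False),
        BackupCopy("remote-bucket", offsite=True),
    ]
    print(satisfies_3_2_1(copies))
```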

Swapnil Bhartiya: You talked about stateful workloads, can you share if you saw… Of course, you shared DreamWorks, but any exciting use cases that you saw there which are more inclined towards stateful workloads. We also talk about edge these days and edge is about a data center, not necessarily IoT devices. So was there any, not just necessarily survey, but being at the conference itself where you met a lot of folks who are using Kubernetes with the stateful workloads, and those are the use cases that you were like this is the places that we are going now?

Melissa Logan: We asked people what types of stateful workloads they were using on Kubernetes. Databases was really the number one workload. As you can imagine, that comprises a lot of what organizations have. But what we also saw there when we asked the Kubernetes leaders cohort, that 75% and higher production workloads on Kubernetes, we also saw AI/ML workloads jump up that list for them. So from the number six spot, I think up to the number four or three spot in there. So, we see those workloads coming into more prominence, the more you standardize on Kubernetes.

Swapnil Bhartiya: You also mentioned challenges in terms of interoperability and standardization. If you just look at the Kubernetes ecosystem, there are so many distributions, but we have been trying to focus on standardizing things. But of course there are vendors, so how big is this challenge for Data on Kubernetes? And what do you think of the trend that you’re seeing to standardize things or things that you have to do as a community or ecosystem to make sure that we should not go back to the same world of interoperability challenges?

Melissa Logan: I think part of what we heard in the survey, what people want, as Umair mentioned at the very beginning, are more standards around this. So this is something that we heard loud and clear in the data, and it’s something that we hear pretty loud and clear when we talk to different end users in our community as well. They want some level of standardization, maybe it’s with operators, maybe it’s with data standards, maybe it’s additional things, some things within Kubernetes and some things outside of Kubernetes. And that’s what we’ll be exploring with our end user working group over the next few months to really understand what are those key challenges, let’s drill down into that and come up with some solutions that we can do together, to create together.

Swapnil Bhartiya: Gabriele, if I ask what is your take on standardization, especially in the data space?

Gabriele Bartolini: Yeah. I think it’s an interesting challenge. It’s an unknown kind of answer for now. I mean, writing an operator and trying to map all the manual tasks we were doing, of course, I think we can certainly generalize these kinds of features. And it’s definitely interesting, so it would be actually a good thing to sit down with all the other database vendors and projects to write this common set of CRDs, and for example even, basically, the API of these operators. Because at the moment, I think that’s the most common pattern. There are other things, for example, in the database operator space, I see that most of the operators rely on StatefulSets. Whereas, for example, we decided to take a different approach to have more control on the persistent volumes, like what Umair was saying before, to actually make sure that the pod knows where the data is and lives very close to that.

There’s not yet, I’d say, an abstraction of this pattern, because I think the closest one is the StatefulSet, but at the moment we actually have this kind of custom way to manage persistent volumes and persistent volume claims with pods. So, it would be interesting to actually learn from what some of us have done, because I think not many projects have decided to go down this path, but it would be interesting to learn and maybe abstract this kind of pattern somehow.

Swapnil Bhartiya: Umair, what do you have to say about standardization in this space?

Umair Mufti: It’s a very near and dear topic to my heart. I’ll go back to what you originally were asking about the many different distributions of Kubernetes. I actually don’t think that’s a bad thing. And the reason that it actually exists is because of the standards, we have a standard of what a Kubernetes distribution must be to be certified as a Kubernetes distribution. And it’s great that there’s competition and there’s the Ranchers and there’s the OpenShifts of the world that are adhering to that standard, but then adding their own special sauce or offering support and all those sorts of things and differentiating at a different level. But I know, as a vendor, if I want to deploy my StatefulSet or I want to deploy my operator, at the end of the day, I don’t care if it’s Rancher or if it’s OpenShift because we’ve got that standard and I’m guaranteed that it’s going to run, even if it’s K3s. At the end of the day, it’s still a valid Kubernetes distribution.

And I’d like to see the same thing, frankly speaking, with the way that databases or other data services, AI/ML workloads and whatnot, are deployed on Kubernetes. Gabriele mentioned that we’ve got this notion of a StatefulSet, even that has shortcomings even for databases. And it makes complete sense to me why they would go a different route and not leverage StatefulSets. So maybe there needs to be a higher level API that everybody agrees on. And I think the only thing that we can do is come together as a community and try to create these standards.

Swapnil Bhartiya: Thanks Umair for sharing that insight. Now, Melissa, I’ll come back to you. And if you look at this community, of course, as you said, the community has been around for a while, but I do want to understand what governance structure you are building? What other things are in the pipeline? How do you also plan to engage more vendors or users in this space? So, share what’s in the pipeline.

Melissa Logan: Yeah. So I mentioned early on that the community was originally stewarded by MayaData and then later DataStax, but the intention was always to really make it a bigger [inaudible 00:20:07]. And so we recently announced a new governance structure and had over 20 sponsors sign on to join us, including the folks on the phone here. And we are just getting started with getting everybody together to talk and collaborate about this issue. We’re forming an end user working group to, as I mentioned earlier, have the discussions about what are the challenges they’re seeing on the ground, and then bringing those conversations back to our sponsor working group and saying what can we do about this as a group? So I think by the next KubeCon, we’ll have another DoK Day there, and you’ll see a lot more coming out from the community at that point, what are we going to do about this? Is it talking about standards? Is it other things within Kubernetes? We’ll have a lot more to share at that point. It’s a really good time to join us, if you’re able.

Swapnil Bhartiya: Melissa, Gabriele, Umair, thank you so much for joining me today to talk about the Data on Kubernetes Community. And of course, as Melissa shared, there are so many things in the pipeline, so I would love to have you guys back on the show. Thank you.

Umair Mufti: Would love it, thank you so much.

Swapnil Bhartiya: Okay, thank you.

Gabriele Bartolini: Thank you very much. Bye-bye.

[/expander_maker]
