
Importance of monitoring in the cloud-native world: Payal Chakravarty @Sysdig


Transcript of the interview.

Swapnil Bhartiya: Hi, this is Swapnil Bhartiya and welcome to TFIR newsroom. And today we have with us Payal Chakravarty, VP, Product Management at Sysdig. Payal, first of all, welcome to the show.

Payal Chakravarty: Thank you Swapnil.

Swapnil Bhartiya: I’m curious to know, what is the importance of monitoring in the cloud-native landscape?

Payal Chakravarty: So, one thing that’s happening in the cloud-native landscape is that applications are shipping really fast, and as applications ship fast, monitoring needs to be embedded into the DevOps workflow. The other thing that comes with this is that these applications run on containerized infrastructure. Containers are ephemeral and dynamic, and that leads to an explosion of operational data. So how do you gather and ingest that data at scale? How do you analyze it? And how do you derive insights from it? Those questions become very relevant. So monitoring tools in the cloud-native world need to adapt to handle that high volume of high-cardinality metrics, and be able to ingest, scale and analyze that data.

Swapnil Bhartiya: There are already a lot of solutions available, and they are open source; Prometheus is one. What kinds of problems do users and customers run into when they use Prometheus and suddenly have to scale?

Payal Chakravarty: Right. Prometheus is a standardized way to collect metrics. Developers love it. If they want to run a typical cloud-native application stack, where the infrastructure is based on Kubernetes and they are running applications in Golang or using a stack that includes Jenkins, MongoDB, Kafka and things like that, all of those have standard Prometheus exporters. So, for the first time in the monitoring world, there is a standardized way of getting metrics. And developers get started quite easily by installing Prometheus, adding Grafana and getting dashboards and metrics out of the box. But what happens eventually is that once the company grows past a handful of apps, it’s no longer feasible to manage the scale and growth of the Prometheus environments.

Running Prometheus across an enterprise, especially, can mean managing and federating several Prometheus servers, figuring out enterprise workflows like single sign-on and access control, and adhering to SLAs or compliance requirements, and it becomes a reliability issue. For example, one customer mentioned how one application team’s sudden introduction of auto-scaling generated millions of time series and brought down the entire monitoring system, and they couldn’t even get long-term data retention to visualize trends over time.
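To illustrate how auto-scaling can produce millions of time series, here is a minimal back-of-the-envelope sketch. It assumes the usual Prometheus data model, in which every distinct combination of label values on a metric is its own time series. The function name, label names and all of the numbers are hypothetical and are not taken from the customer story.

```python
from math import prod


def series_count(metrics_per_pod: int, label_cardinalities: dict[str, int]) -> int:
    """Rough upper bound on active time series.

    Assumes every metric is emitted for every combination of label values,
    so the count is the number of metrics times the product of the number
    of distinct values per label.
    """
    return metrics_per_pod * prod(label_cardinalities.values())


# Hypothetical steady state: 50 pods, 20 endpoints, 200 metrics each.
before = series_count(200, {"pod": 50, "endpoint": 20})

# Hypothetical auto-scaling burst: pods churn, so 2,000 distinct pod names appear.
after = series_count(200, {"pod": 2000, "endpoint": 20})

print(before)  # 200000
print(after)   # 8000000
```

Under these assumptions nothing changes except the number of distinct pod names, yet the series count grows fortyfold, which is the kind of jump a single Prometheus server may not be sized for.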

So, this is where Sysdig comes in. Sysdig provides the scale and the enterprise controls without forcing developers to move away from Prometheus. With Sysdig, you get PromQL, which is the popular query language for Prometheus used by developers. When developers write their Grafana dashboards and their alerts, they are based on the Prometheus query language, or PromQL, and Sysdig provides native compatibility with that, which means that your developers’ dashboards, alerts and configs will continue to work as you scale and as you grow in an enterprise environment.
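As a rough illustration of what query-language compatibility buys, the sketch below builds an instant-query URL in the shape of the Prometheus HTTP API (`/api/v1/query` with a `query` parameter). The base URLs, the metric name `http_requests_total` and the idea that a compatible backend exposes the same path are assumptions made for the example, not details from the interview.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus-style instant query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# The PromQL text a developer might already have in a dashboard panel.
# The metric name is hypothetical.
PANEL_QUERY = "sum by (namespace) (rate(http_requests_total[5m]))"

# Hypothetical endpoints: a local Prometheus and some PromQL-compatible backend.
for base in ("http://localhost:9090", "https://metrics.example.com/prometheus"):
    print(instant_query_url(base, PANEL_QUERY))
```

The point of the sketch is that only the base URL changes; the query text that dashboards and alerts are built on stays exactly the same.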

Swapnil Bhartiya: So basically, you allow customers and users to bring their whole toolchain, everything; they don’t have to drop it all and move to something else?

Payal Chakravarty: That’s right. Exactly. DevOps teams and Platform Ops teams love it because we’re not asking people to move away from the investments they’ve made or the tools they’ve chosen. They continue to use the tools of their choice, but they don’t have to manage the complexity of the backend and the open source, DIY nature of the solution.

Swapnil Bhartiya: Can you talk about some of the new features or capabilities that you are adding in this iteration of the service and product?

Payal Chakravarty: Yes. There are three key things. Number one is scale. We have a re-architected time-series database that can ingest tens of millions of time series per customer. So, the scale that we can handle is significant, and we’ve proven it with our IBM Cloud partnership. We’ve built a monitoring solution for all of IBM Cloud, for all their external and internal customers. And that’s based on Prometheus.

So, scale is number one. Number two, as I mentioned, is compatibility, meaning the Prometheus query language. We are the only commercial enterprise solution to provide Prometheus query language support end-to-end in our product. That is how you can retain the developers’ investment and not have to ask them to redo their work. And the third thing, which we are launching and which is very interesting even for developers, is the Prometheus catalog. The Prometheus catalog is a one-stop shop, a repository of curated exporters, dashboards, alerts and configurations for Prometheus.

So, for example, if you want to monitor something specific, let’s say MongoDB or Cassandra, you can go in, search for that integration and, within a few clicks, get up and running with the dashboards and alerts. As you know, there’s a lot of community support behind all of these. But it takes the developer’s time to figure out, “Okay, which dashboard should I use? Which exporters should I use? Which version is it compatible with?” It can sometimes take almost a week to figure out which integration actually works, and then you have to keep up with it as the open source landscape changes. Sysdig provides the Prometheus catalog, a completely curated, supported set of integrations provided by Sysdig, and the community can also contribute to it.

Swapnil Bhartiya: How can a community member, because there are different communities, get involved with the catalog?

Payal Chakravarty: Yeah. So, when we launch, you’ll see that the Prometheus catalog has a contribute button. Let’s say somebody came up with a new exporter or a new set of dashboards that are really insightful for a specific integration; they can contribute and submit that. Then Sysdig’s internal team will go through those contributions, vet them, test them, curate them and then say, “Okay, this makes sense to be included officially in the catalog and will be supported by us for all upcoming releases.”

Swapnil Bhartiya: Are there specific industries, verticals or companies that rely heavily on it? Can you talk about that, and about who the typical Sysdig customers are?

Payal Chakravarty: Yeah, we have customers across the landscape. Financial services is a huge sector. We have almost all the key investment banks and key credit card companies, as well as several other banks all over the world, using us. Media companies often need to auto-scale up and down and hence rely on containerized environments.

We are seeing almost every enterprise in healthcare and insurance, really any enterprise, on this journey towards the cloud to modernize their applications. They are all moving to containers and Kubernetes, and Sysdig becomes a first choice for them for monitoring and securing their environments.

Swapnil Bhartiya: There may be a lot of new workloads that need to scale. If you look at edge computing, a lot of people are moving there. So, what is the role of monitoring in that landscape? And is that on Sysdig’s radar?

Payal Chakravarty: Edge is actually very interesting in some ways, because on the edge you want lightweight metric collectors; you can’t have anything heavyweight there. So we’ve seen users for whom Prometheus exporters are great, because they just emit a few metrics and then Sysdig can collect that data remotely without installing a whole lot. We have customers who are looking into pretty large edge environments where they need very, very lightweight collectors, and we are working with them specifically on solving their use cases, where reliability is very important and reachability is very important. You have to be able to access those endpoints, but you cannot have anything heavyweight running there.
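For a sense of how lightweight an exporter that "just emits a few metrics" can be, here is a minimal sketch using only the Python standard library. It renders one gauge in the Prometheus text exposition format and serves it on `/metrics`. The metric name, label, sample value and port are hypothetical, and a real deployment would more likely use an official Prometheus client library than hand-rolled formatting.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Label values are
    assumed to contain no quotes, backslashes or newlines, so no escaping
    is done in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_part = "{" + pairs + "}" if pairs else ""
        lines.append(f"{name}{label_part} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves a single hard-coded sample; a real exporter would read a sensor."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauge(
            "edge_device_temperature_celsius",  # hypothetical metric name
            "Device temperature in Celsius.",
            [({"site": "store-17"}, 41.5)],     # hypothetical label and value
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    """Block forever serving /metrics; the port number is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` on the device would let a remote collector scrape `http://<device>:9200/metrics` on its own schedule, so nothing heavier than this small process needs to run on the edge node.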

Swapnil Bhartiya: Payal, thank you for taking the time to talk to me and explain the whole monitoring landscape.

Payal Chakravarty: Sure Swapnil, stay safe. Thank you, guys. Bye.


