
Cortex Helps SREs Gain Visibility Into Microservices For Better Decision Making | Anish Dhar


Cortex has set out to tackle the challenges that microservices create for engineering teams and SREs. Companies are struggling to keep track of their services and of which teams own them. This leads to duplicated work and makes it harder for engineers to gain the visibility they need to deliver high-quality software.

“Cortex helps enforce best practices for reliability and security to give engineers good visibility into what does actually a good quality service look like, which I think often lives in people’s heads,” says Anish Dhar, Co-Founder and CEO of Cortex, on this episode of TFiR Let’s Talk with Swapnil Bhartiya at KubeCon and CloudNativeCon in Valencia, Spain.

Key highlights of this video interview are:

  • Companies of all sizes are struggling to gain visibility into their services and into which teams own which services. Dhar describes the challenges companies face with microservices and the effect these challenges have on engineering teams.
  • Dhar discusses the reasons that inspired him to start Cortex and what stage the relatively new company is at right now.
  • Reliability and security remain top priorities for companies across many industries, and Cortex helps them enforce best practices to meet those priorities. Dhar explains how the company's Scorecards product lets customers grade the quality of their services across all of their tooling and infrastructure.
  • Dhar discusses the kind of adoption they are seeing with Cortex. He details some of the challenges their customers are experiencing and how their product is helping to solve these difficulties.
  • Cortex’s platform aims to help engineering teams understand and improve their services. Dhar discusses the platform’s features in detail, and the different versions of the product. He explains how the different components ensure that standards are met from day one.
  • Dhar discusses the cultural shift from the old organizational silos to microservices, which he argues create silos of their own. He explains the challenges SREs face in keeping track of services and understanding ownership, the impact this has on the culture of an engineering organization, and how Cortex bridges these gaps.
  • Balancing how quickly you ship new features against the technical debt you take on can be difficult, and Dhar feels that technical debt is often overlooked. He explains how Cortex tracks technical debt so that leaders have the visibility they need to make informed decisions.
  • Dhar shares best practices engineering teams can put in place to keep better track of their services.
  • Dhar details Cortex’s roadmap, what features they are working on and how it will help engineers gain insights about the services they are consuming, the infrastructure and associated costs.

Solutions:
Check out more about Cortex Service Catalog
Check out Cortex Scorecard

Connect with Anish Dhar (LinkedIn, Twitter)

The summary of the show is written by Emily Nicholls.

[expander_maker]

Here is the automated and unedited transcript of the recording. Please note that the transcript has not been edited or reviewed. 

Swapnil Bhartiya: Hi. This is your host, Swapnil Bhartiya. Welcome to another episode of TFiR Let’s Talk here at KubeCon and CloudNativeCon in Valencia, Spain. Today, we have with us Anish Dhar, co-founder and CEO of Cortex. Anish, it’s great to have you on the show.

Anish Dhar: Thanks for having me.

Swapnil Bhartiya: First of all, we are doing it in person. We are at the event. I’m pretty sure that you have been to keynotes, you have seen the booths, you have sponsored [inaudible 00:00:25]. What kind of energy have you seen?

Anish Dhar: Oh, it’s been amazing. I think being in person, it’s just a different energy level altogether. I think my favorite part from yesterday was seeing customers that we had never met across the world come up to our booth and just say hi. I think you can really feel the energy from everyone just meeting different companies and kind of getting back together in person. This is my first KubeCon, so I think I’m pretty lucky that it’s in person and that I get to see people face-to-face, which I think is really exciting.

Swapnil Bhartiya: And because if I’m not wrong, the company was created during the pandemic, right?

Anish Dhar: Exactly.

Swapnil Bhartiya: So there was no in-person interaction either way. Let’s talk about the company, you are a co-founder. Tell us why you created the company? What problem did you see that you wanted to solve? And also, you started the company in kind of rough times, so you decided, “No, this is the right time to do it.”

Anish Dhar: Yeah, absolutely. I was previously at Uber, and Uber is like the classic case of microservices gone wrong. I think on my team alone there were 500 services, and over 5,000 at the company. It became really difficult to understand which teams owned which services and whether those services were following best practices for things like production readiness and security standards. I remember we would track a lot of this data in spreadsheets, which would always be out of date when you needed them. And engineers love naming services after TV shows and all sorts of crazy things, so when there were incidents, it was really difficult to triage and find the appropriate owner. Oftentimes, the engineers who wrote the most critical services would leave the company, and they would take that tribal knowledge with them. What we saw is that it hurts developer productivity in a lot of different ways: from new engineer onboarding, because it becomes difficult to understand, “Hey, what’s being built? And what are the downstream dependencies of the services that I’m working with?”, to on-call response, to even just building new services.

There was so much duplication of work across the company because people didn’t really have a good understanding of what was going on. I was talking with two close friends of mine. One was an engineer at Twilio and the other at a smaller startup called LendUp. We came from three very different sizes of companies, but everyone had the same issues around understanding services and tracking their quality. So we started Cortex and went through the Winter 2020 Y Combinator batch. Coming out of that, we raised our seed from Sequoia, and most recently we raised our Series A, co-led by them and Tiger Global. So it’s been super exciting. We were remote first, just like you said, because of the pandemic. But it’s been an amazing journey so far, and we’ve grown the team a ton too.

Swapnil Bhartiya: What kind of adoption have you seen of… Also, as you said, this is the first time you met a lot of customers or clients. But when I talk about adoption, I’m not talking about necessarily numbers, but use cases.

Anish Dhar: Absolutely. If you look across our customer base, we have companies in many different industries, but the common theme is that these customers care a lot about reliability and security. Most recently, we started working with Roblox, who bought Cortex to help, one, give them an understanding of all the services being built in the company. And two, for them, even 1% of downtime results in millions and millions of dollars in losses. So Cortex was really brought in to help enforce best practices around reliability and security, and to give engineers good visibility into what a good quality service actually looks like, which I think often lives in people’s heads.

We also have customers like StockX. We have a product called Scorecards where you can grade the quality of your services across all your tooling and infrastructure. StockX has a workflow where they make an API call to our Scorecards product, and if a service is not meeting 80% on its production readiness scorecard, you can’t even deploy it. So there are some really interesting use cases that we’ve started to see with Cortex, but the real common theme is that we need to create that culture of reliability and ownership.

Swapnil Bhartiya: You used a couple of words: you talked about culture, and you also talked about Scorecards. I want to break it down. First of all, if you look at Cortex, are you a service company or a product company? Where does it run? What does your service or product look like?

Anish Dhar: Yeah, absolutely. Cortex is basically a platform for engineering teams to understand and improve their services. We have both cloud and self-hosted versions of the product. Security-focused or financial companies often run the on-prem version, which is a Helm chart you can deploy in any Kubernetes cluster. We built the product on GCP, and we are lucky that we now have a pretty sizeable engineering team that continues to improve it. At its core is a service catalog. It catalogs your services and builds that single pane of glass to understand everything from who’s on call for a service, to where its documentation is, to where its Datadog SLOs are.

And then you can do interesting things with that data once it’s inside Cortex. You can grade its quality using Scorecards. We also have a scaffolder product that lets you create services directly from Cortex. So you can templatize, for example, Spring Boot services, and engineers just fill out a form and Cortex creates the service end-to-end, from creating the Git repo, to running the templating engine, to scaffolding the code. It’s a great way to make sure standards are being met from day one.

Swapnil Bhartiya: Excellent. The second part I want to understand is the cultural part. I mean, of course, this is all about culture, right?

Anish Dhar: Yeah.

Swapnil Bhartiya: You talk about reliability. And then in the early days, we used to have silos, right?

Anish Dhar: Yeah.

Swapnil Bhartiya: And the whole idea of the cloud native or DevOps movement was to break down those old silos. But when we look today, we are kind of creating new silos. Some people will call it federated, but they are still silos, based on the expertise you have.

Anish Dhar: Absolutely.

Swapnil Bhartiya: But in the end, you are trying to solve one basic problem for a business, which is to keep the business running and keep adding new services. Everything else is secondary. So, purely from Cortex’s point of view, what kind of cultural shift have you seen? And second, how are you becoming a catalyst to enable that cultural shift?

Anish Dhar: Yeah, that’s a great question. I think microservices force teams into silos. Teams have context on what they’re building, but not on what other people are building across the company. What we’ve seen is that at around a 100-person engineering team, you’ll hire your first SRE or spin off a team to focus on reliability. The goal of that team is to make sure services are reliable and that people are following best practices. But what ends up happening is that these teams create crazy spreadsheets tracking all of the services. I’ve seen 500 rows and 15 columns, where each column is a different tool. Ownership is very difficult to understand, because these SREs aren’t the ones building the services, yet they are tasked with the almost impossible job of making sure everyone’s services are reliable.

So you ping engineers over Slack telling them that their service sucks, and it creates emotional tension between SRE, product engineering, and security, which feels the same pain. Then technical leadership sits on top of all of that, asking questions like, “Where are the areas of risk in my business?” All of that gets exponentially worse as the company grows, and it really impacts the culture of your engineering organization. So that’s really the value prop of Cortex: let’s bridge the gap between all these different functional groups and have one centralized place where you can understand exactly what’s being built, who’s building it, and the quality of that software. That’s why we call Cortex the system of record for engineering.

IT has ServiceNow, sales has Salesforce, but engineering has never had a system of record to do exactly that: bridge the gap between all these functional groups, which is now just how an engineering organization grows, as people start specializing. So it’s been really interesting to see how companies adopt that model. With companies that have been working with us for close to a year, you start seeing a really interesting shift where engineers actually care about the quality of their services. I mean, they always cared, but they didn’t always have that visibility. Cortex creates an almost gamified approach to service quality, bridging that gap and really lifting the burden off SREs and security engineers especially.

Swapnil Bhartiya: One more thing you mentioned earlier was tribal knowledge within teams, which also leads to technical debt and to knowledge getting locked away. Tribal knowledge is something that is hard to deal with, but there is also the technical debt you incur because you’re doing so much within one team. So, first of all, how critical is that for the success of a company? And how do you help mitigate it?

Anish Dhar: I think technical debt is one of the most important things, and it often gets overlooked. We think about it all the time when we’re shipping new features. There’s that balance between how fast you ship and how much technical debt you take on. Technical debt is one of those things you don’t think about until it’s too late, and we’ve seen that story play out for myself and my co-founders at our previous companies. It’s something Cortex really helps with. One of the Scorecards people often create tracks technical debt. That could be something as simple as code coverage: for example, you want to make sure all the code you ship has over 90% test coverage.

That’s something you can track with Cortex, with reports that break it down by service, by team, by product, and by business line. So suddenly leadership gets good visibility: “Hey, these X teams are doing well. These other teams need additional resources.” You can make really informed decisions, whereas before you had to go and ask every single engineer, or look at GitHub or SonarQube yourself. It’s not easy to get those aggregated reports, which is something Cortex helps with.

Swapnil Bhartiya: Excellent. One more thing before we wrap up: of course, you have solutions, but as we discussed, most of these things are also cultural and process based. I’m not asking you to share a playbook, but what would you advise teams to do to ensure they can take care of at least some of the problems we just talked about?

Anish Dhar: Yeah. I think it comes down to communication. It sounds simple, but a lot of teams under-communicate. Especially in a world where everyone is remote, you start seeing documentation scattered across six or seven different places, right? Google Docs, Confluence, your notes app, your head. Information becomes siloed in all these different places. Then, when engineers go out to build new services, they don’t have full context on what the services they’re using even are, or what the dependencies of those services are. That impacts the quality of the software, and it also makes things hard when a service goes down: who do I talk to? Where’s the documentation for it? So I would say build a healthy culture of over-communicating, have a really good process around where to document things, and do a lot of retros asking, “Hey, are the things that we ship actually meeting the quality standards for our team?” I think all of those things would really help.

Swapnil Bhartiya: So, we are almost in the middle of 2021… No, it’s 2022, right?

Anish Dhar: Yeah.

Swapnil Bhartiya: It’s hard to say which year we are in, but I think we’ll get used to it. The company is relatively new, but can you share, not exactly your roadmap, but some of the things in the pipeline? What problems are you looking to solve in the future?

Anish Dhar: Yeah, absolutely. One of the interesting things we found is that we started Cortex to help engineering teams with microservices, but people put everything in Cortex: resources, databases, S3 buckets, libraries, pipelines. At the end of the day, all of these are pieces of your software that have their own owners and come with their own tribal knowledge. So one of the things we’re going to support in Cortex soon is the concept of a resource catalog, where you’ll be able to connect your AWS account, for example, and we’ll automatically ingest your resources and link services with S3 buckets or other resources in AWS. You’ll then be able to visualize that in your dependency graph, and maybe even layer cost insights on top of it.

So you’ll be able to see, “Hey, this pool of services and all of the infrastructure around it is costing me X dollars per month,” and report that by team, product, and business. I think it’s going to give engineers incredible insight into the services they’re consuming and what the infrastructure behind them looks like, and give businesses really good insight into what things are actually costing.

Swapnil Bhartiya: Anish, thank you so much for taking time out today and, of course, for talking to me about the company. It was good to hear the story, and also about the larger problem you’re solving. As usual, I would love to have you back on the show, in person or remotely. I’m looking forward to our next conversation.

Anish Dhar: Absolutely. Thank you so much.

[/expander_maker]
