Former Slack and Netflix Engineer Builds Jeli.io for Incident Analysis

Guest: Nora Jones
Company: Jeli.io
Filed in: SRE, Chaos Engineering, Cloud Native Computing

Jeli is an end-to-end incident management platform that covers the incident process from initial identification to post-mortem review. While tools like Slack and Zoom are frequently used by companies for incident response, they were not built for that purpose. With such general-purpose tools, it can be difficult to bring in the right people and coordinate the response.

In this episode of TFiR Let’s Talk, Swapnil Bhartiya sits down with Nora Jones, Founder and CEO of Jeli.io, to discuss why she created the incident response analysis platform and what makes it unique. She goes in depth about some of the challenges companies are facing with regard to incident response.

Key highlights from this video interview are:

  • Jones has spent much of her career in SRE roles at Slack, Netflix, and startups like Jet.com. She says many of these companies were scaling quickly with lots of users, but communication and coordination during incidents were not being fully analyzed because the tooling was lacking. Jeli was created to help gather all the data from incidents and to help create a psychologically safe workspace.
  • Jeli’s incident response bot allows you to select stages of your incident: establishing that there has been an incident, diagnosing, repairing, and mitigating. While Zoom and Slack are often used for incident response, they were not built for it.
  • Jones announced that they have a free incident response bot, which can be used within Slack. It helps you bring the right people into the room. Furthermore, their incident analysis platform helps companies find the stories behind the incidents to make things easier in the future.
  • Other tools on the market often sell a process without fully understanding what was going on in these incidents for the SREs involved or for the organization. Without taking the time to understand that, you can end up with the same issues again.
  • Having an incident response strategy in place is essential; otherwise, companies risk setting themselves up for failure. Jones says many companies wait until they have an incident or get in the headlines for the wrong reasons to do something, which is too late.
  • When you are in the midst of an emergency, having a bot that is simple to use is critical. Jones explains that many bots on the market require too much setup. Jeli’s bot is free to use; it helps broadcast to people that an incident is occurring and assists with communication throughout the incident. Jones explains how the bot and the Jeli analysis platform work together.
  • Customers are using Jeli in a variety of ways. Jones tells us that one customer ingested every incident from a particular quarter into a single investigation to see how much individual people were using a particular technology. Jeli also takes all the places where people were talking about an incident and puts them in one display so that customers can understand the coordination costs of the incident. She also discusses how the narrative builder helps people better understand incidents.
  • Jones feels the market is ready for incident tooling and that there is a need for what they are building.

This summary of the show was written by Emily Nicholls.