
Rookout Live Logger Optimizes Logging Cost And Performance


Guest: Oded Keret
Company: Rookout
Show: Let’s Talk

Swapnil Bhartiya chats with Oded Keret, Director of Product Management at Rookout.

Rookout is a developer-first platform for collecting data and troubleshooting problems in production and cloud-native applications. The company provides a Live Debugger that lets developers fetch data with the click of a button.

One subject that is crucial to cloud-native developers is debugging. Traditional debugging has been around since the dawn of software engineering. To that point, Oded says, “When we used to think about traditional debugging, we either used to think about printing log lines or about setting a breakpoint that stops the application.” Log lines and breakpoints work fine, so long as the application in question is running locally. But, “in cloud-native applications, in remote applications, in very dynamic and elastic applications, setting a breakpoint and stopping the application is just not something that you can do,” adds Oded.

He then shifts the debugging conversation to modern applications to say, “When a problem happens in modern applications, the expectation is to be able to solve it within minutes. And you don’t have the hours we used to have for trying to reproduce the issue locally and trying to add a log line and waiting for another release.” Unlike the past, where there was a clear separation between developers and QA, the transition to DevOps has blurred those lines. Software engineers are now expected to ensure an application is running 24×7.

To simplify the process, Rookout developed Live Logger. Rookout’s bytecode manipulation gives them the ability to easily switch on debug log lines. So instead of an application running only in warning level (such that only lines of warning or level 0 are printed to save costs), it’s now possible to switch on log lines dynamically all around an application. With this, it’s possible to expose logging and debugging data that was otherwise hidden. And to avoid rising costs, Live Logger makes it possible to dynamically (and with high granularity) filter which log lines are enabled. According to Oded, “We provide the ability to only turn on, for example, log lines that contain a specific string. Or to only turn on log lines that pertain to a specific user or account or service by integrating with existing tracing tools.”

With Live Logger, it’s now possible to pinpoint a specific line from a log, so developers are empowered to more quickly and reliably locate problems. Another benefit is that users no longer have to worry so much about the costs associated with traditional logging (space consumed as well as CPU, RAM, and I/O resources used). And Live Logger also integrates with tools like Log4j, Winston, and log4net.

Summary for this interview/discussion was written by Jack Wallen


Here is the edited transcript of the interview.

Swapnil Bhartiya: I’m your host Swapnil Bhartiya and welcome to TFiR Newsroom. Rookout has announced Live Logger to complement their existing Live Debugger for dynamic observability into modern applications. To talk about this new product, we have with us, Oded Keret, Director of Product Management at Rookout. Oded, it is great to have you on the show.

Oded Keret: Hello! Thank you for having me.

Swapnil Bhartiya: Before we talk about this new announcement, please quickly remind our viewers what is Rookout all about? And how are you leveraging Rook?

Oded Keret: Rookout is a developer-first platform for collecting data and troubleshooting problems in production and cloud-native applications. We provide a Live Debugger application that lets developers fetch data with the click of a button, even from applications that are considered far and even dangerous, at times.

Swapnil Bhartiya: I also want to go a bit deeper into debugging. As much as we like to create new applications, new services, and new products, what is more important is business continuity. So talk about the importance of debugging for modern workloads. Also, debugging is something that has been around forever. So there is traditional debugging and because of cloud-native and all those things, we also have modern debugging. So talk about its importance, also talk about the difference between traditional and modern debugging, Rookout’s approach, and how you help even the legacy workloads.

Oded Keret: Sure thing. So, as you said, traditional debugging has been around since the dawn of software engineering. And when we used to think about traditional debugging, we either used to think about printing log lines or about setting a breakpoint that stops the application. And those are things that would work as long as the application was running on my own desktop. In cloud-native applications, in remote applications, in very dynamic and elastic applications, setting a breakpoint and stopping the application is just not something that you can do. And printing a log line and trying to fetch it from hundreds or thousands of Kubernetes clusters is not something that is expected of an engineer to do, definitely not in real time.

And when a problem happens in modern applications, the expectation is to be able to solve it within minutes. And you don’t have the hours we used to have for trying to reproduce the issue locally and trying to add a log line and waiting for another release. So modern debugging comes to fix all of that. Modern debugging is about getting the old experience, the same experience that an engineer is used to getting, of setting a breakpoint and getting data. But we also provide the ability to do that remotely, within an elastic application, where the Rookout SDK is deployed in hundreds of dynamically created pods. And when you set a breakpoint, you immediately start getting data through your application. So you get the user experience of traditional debugging, but the power of applying it to modern and complex applications.

Swapnil Bhartiya: Excellent! Now let’s just quickly talk about the cultural or people or team aspect. Considering the changing role of developers in today’s digital world, where does the buck for debugging stop? Which team is responsible for it? Or is it the entire organization’s responsibility?

Oded Keret: I grew up in a software engineering world where there was a very clear separation between developers and QA and Ops. And the transition into DevOps has blurred that line. And software engineers who developed the application are more than ever expected to be on call and to make sure that their app is running 24×7. So when a problem happens in a running application, yes, sometimes the first response will be the Ops team, the IT teams, just monitoring and making sure that we know when something happens.

But the actual solving of the problem, the actual reaching the root cause and understanding what change has caused the problem, is more and more a problem of the software engineer who developed the application. So that means that software engineers need to take into account that their code is going to run in production, that they are going to have to be able to troubleshoot it. And that when the PagerDuty wakes them up at 2:00 AM in the middle of the night on the weekend, they will have to be able to instantly reach the line of code that is causing a problem, to find the fix for it, and to deploy it as soon as possible.

Swapnil Bhartiya: There are certain terms that Rookout loves to use, for instance, dynamic observability and understandability. So can you talk about what all these terms mean? Are there really different disciplines for each term? Or are we just using different terms for different teams?

Oded Keret: Thank you for asking that. So observability used to mean being able to see what’s going on. Being able to have the right log line printed, have the right metric printed, have the right distributed tracing view graph generated and made available to engineers, and being able to see what’s happening at the level of log lines, is an important and challenging task. Understandability takes that challenge one step further. Understandability means not just seeing what happened, but also understanding how my application behaves. Understanding how my code behaves in real time. Being able to actually see my code in motion and not just see the aftereffect of my code running, namely the printed log lines and the collected metrics.

Swapnil Bhartiya: Now, let’s talk about the elephant in the room, which is the announcement of Rookout’s latest product Live Logger. What is it? And let’s also kind of compare it to other logging tools offered by the company. So talk about what it is, why you created it, what kind of unique problem it’s going to solve for your users?

Oded Keret: So Live Logger, the idea for it came out of conversations that we had with customers as we were pitching our classic offering, the Live Debugger. And very often, these conversations move into logging challenges. The customer will speak about how much they are spending, how much money they are spending on logs. Or how they are worried about the performance impact of adding a new log line. Or how much time it takes them to expose new logs, not to mention existing logs, which are already written in the code, but switching them on and off requires a restart of the whole application. And that is something that you just cannot do, especially when you’re handling a production incident.

And from those insights, we thought we could take our inherent ability of bytecode manipulation and change the way the code runs and add prints into a live application. And we could take that one step forward, and rather than doing it for a single line of code by adding a single line of log where a breakpoint is set, we could do that in a variable burst way into every single existing line of code that already prints a log line. In modern applications, very often the DevOps engineers will say, “This application is running in warning level and only log lines of warning or 0 level are printed because we want to save costs, because we want to reduce the performance impact.”

But Rookout’s bytecode manipulation ability actually gives us the ability to switch on debug log lines. To switch on trace log lines, to switch on log lines dynamically all around the application and expose data that was hidden so far, but is now needed in order to troubleshoot a problem. Now, the second question that a customer would ask us, “Well, now you’re going to cost me a lot of money. Now you’re going to impact performance because every single debugging trace log line is going to be printed.” And to address that we also added the ability to very dynamically and in high granularity, filter which log lines will be turned on.

We provide the ability to only turn on, for example, log lines that contain a specific string. Or to only turn on log lines that pertain to a specific user or account or service by integrating with existing tracing tools. So we provide the ability to instantly switch on, turn on the light, all over your application, without further hurting your application.
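Editor's sketch: the idea of switching on debug lines at runtime while filtering them by content can be illustrated with Python's standard `logging` module. This is not Rookout's implementation or API. The logger name `checkout_demo`, the `SubstringFilter` class, and the `enable_filtered_debug` helper are hypothetical names chosen for the example, and the approach assumes you can reach the logger object inside the running process.

```python
import io
import logging


class SubstringFilter(logging.Filter):
    """Pass WARNING and above unchanged; pass lower levels only if the
    rendered message contains the given substring."""

    def __init__(self, needle: str) -> None:
        super().__init__()
        self.needle = needle

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        return self.needle in record.getMessage()


def enable_filtered_debug(logger: logging.Logger,
                          handler: logging.Handler,
                          needle: str) -> SubstringFilter:
    """Lower the logger to DEBUG without a restart, but attach a filter so
    that only matching debug lines reach the handler."""
    flt = SubstringFilter(needle)
    handler.addFilter(flt)
    logger.setLevel(logging.DEBUG)
    return flt


# Demo setup: an application that starts at WARNING level to save cost.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("checkout_demo")  # hypothetical logger name
logger.propagate = False
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

logger.debug("cart loaded for user=alice")   # dropped: DEBUG is off

# During an incident, switch on only the debug lines for one user.
enable_filtered_debug(logger, handler, "user=alice")
logger.debug("cart loaded for user=alice")   # emitted
logger.debug("cart loaded for user=bob")     # dropped by the filter
logger.warning("payment retry")              # emitted as before

output_lines = stream.getvalue().splitlines()
print(output_lines)
```

The filter sits on the handler rather than the logger, so lines that do not match are discarded before any output is written, which is the part that keeps volume down.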

Swapnil Bhartiya: Since you already have a lot of tools in your arsenal, if I may use it as a word, how does it complement your existing tools? Live Debugger is a great example. So, talk about it. How does it integrate with your whole stack and how does it help and complement these tools?

Oded Keret: So we saw when working with customers that Live Debugger is something that will help you pinpoint a specific line of log. And it will be very effective when you already know where the problem is. If you are the engineer who wrote the code and are familiar with the code and can set a breakpoint at the right place, it will add a log line that was missing before that, and it will help you fetch the data you need. But in very advanced applications, in modern applications, more often than not, you only write one of dozens or hundreds of microservices, and you do not know where to even start the search.

In those cases, just switching everything on and seeing what looks suspicious is a much more robust and helpful way for engineers to start collecting data. You look at the logs in a very familiar log tail user interface. You see where you think the problem is, and then you zoom in on that. So it gives more of a macro approach compared to the micro approach provided by our Live Debugger and you start by searching the macro, then you zoom into the micro. So the tools really complement each other and help you solve both parts of the search for the problem.

Swapnil Bhartiya: Right. You alluded to this earlier, but I want to just reiterate that. Can you talk about traditional logging challenges that you are trying to address?

Oded Keret: So traditional logging means that you have to worry about logging cost. We speak to customers who know exactly how many megabytes, or gigabytes, or terabytes, or petabytes they are paying for and take tremendous efforts to invest entire sprints in reducing logging costs, in deleting logs, in hiding logs, that would have helped them because logging is expensive and it’s getting more expensive as applications scale up. Logging has also always had a performance impact. When my application is busy printing a log line and filtering it and shipping it to my logging service, it consumes CPU and memory and IO resources that could otherwise be invested in serving my customers. So these are traditional logging challenges that we are trying to address.

The third traditional logging challenge that we are trying to address is just that you can print everything. You can write everything. You can write very detailed log lines with every single variable printed just in case something happens. But then when something happens, you end up searching through a seemingly infinite number of nearly identical log lines. And being able to find that needle in the haystack has always been a challenge of logging. And we’re trying to address that by providing a more dynamic and flexible and robust approach to only printing the log lines that are needed. So, when you do need them, you can quickly and easily search through them and find the data point that you need.
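Editor's sketch: the performance cost of logging described above comes largely from building and shipping messages. The short Python example below, using only the standard `logging` module, shows that a disabled debug line with lazy `%s` arguments never formats its payload, and that the formatting cost appears only once the level is switched on. The `ExpensiveRepr` class and the logger name `cost_demo` are hypothetical stand-ins for a costly-to-render object and a real service logger.

```python
import io
import logging


class ExpensiveRepr:
    """Stand-in for an object that is costly to render; counts how many
    times it is converted to a string."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "large payload dump"


stream = io.StringIO()
logger = logging.getLogger("cost_demo")  # hypothetical logger name
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.WARNING)

payload = ExpensiveRepr()

# Lazy arguments: the message is not formatted because DEBUG is disabled.
logger.debug("payload=%s", payload)
calls_while_disabled = payload.calls

# Explicit guard: useful when even preparing the arguments is expensive.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("payload=" + str(payload))
calls_after_guard = payload.calls

# Once DEBUG is enabled, the same line pays the formatting cost.
logger.setLevel(logging.DEBUG)
logger.debug("payload=%s", payload)
calls_when_enabled = payload.calls

print(calls_while_disabled, calls_after_guard, calls_when_enabled)
```

Disabled lines written this way are cheap, which is one reason teams leave detailed debug statements in the code and control them by level.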

Swapnil Bhartiya: Now, when we do look at debugging, the thing is that we are looking at systems that are always running. You don’t shut the system down. So is it also possible to manipulate code on the fly while the code is still running and add logs and other content there so that your workload is not slowing down and developers can go ahead and continue to do that?

Oded Keret: I may have mentioned earlier because we love speaking about the Rookout Live Debugger, our traditional tool, which uses bytecode manipulation to dynamically change the code that is actually running while a developer’s application is running. We use that to create new log lines. Live Logger uses a similar technology, but it also adds an integration with existing logger services. We integrate with tools like Log4j, Winston, and log4net in order to understand where the other log lines are going to be printed. And we use the same bytecode manipulation technology to instrument these tools in real time and make sure that they start printing new log lines on the fly. So that allows us to change how the application runs, how the application prints logs, without requiring the developer to change code and restart the application and wait for the new data to flow.
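Editor's sketch: bytecode manipulation itself is beyond a short example, but the general idea of adding a log line to running code without editing its source or restarting can be approximated in Python by wrapping a function at runtime. This is a much simpler stand-in technique, not a description of how Rookout instruments applications. The `pricing` namespace, `apply_discount` function, and `add_log_line` helper are hypothetical.

```python
import functools
import io
import logging
import types


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business function with no logging in its source."""
    return price * (100 - percent) / 100


# Stand-in for a module that is already loaded in a running service.
pricing = types.SimpleNamespace(apply_discount=apply_discount)


def add_log_line(owner, func_name: str, logger: logging.Logger):
    """Replace owner.func_name with a wrapper that logs each call, then
    delegates to the original. Returns the original for later restore."""
    original = getattr(owner, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        logger.debug("%s called with args=%r kwargs=%r",
                     func_name, args, kwargs)
        return original(*args, **kwargs)

    setattr(owner, func_name, wrapper)
    return original


stream = io.StringIO()
logger = logging.getLogger("instrument_demo")  # hypothetical logger name
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.DEBUG)

before = pricing.apply_discount(100, 10)       # no log line yet
original = add_log_line(pricing, "apply_discount", logger)
after = pricing.apply_discount(100, 10)        # now logged

# Remove the instrumentation once the incident is over.
pricing.apply_discount = original
log_text = stream.getvalue()
print(log_text, end="")
```

Wrapping only affects callers that look the function up through `pricing` at call time, which is one reason production tools work at the bytecode level instead.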

Swapnil Bhartiya: Oded, thank you so much for talking about not only observability and understandability, debugging, live logging, but also the holistic approach to it. And of course, the new tool that you folks are announcing. Thank you for your time. And I would love to have you again on the show. Thank you.

Oded Keret: Thank you for having me. It’s been a pleasure.
