In this episode of TFiR Let’s Talk, Dmitry Petrov, Co-Founder and CEO of Iterative.ai, sits down with Swapnil Bhartiya to discuss the recently released DVC Extension for Visual Studio Code. The open source project by Iterative aims to bring a full machine learning (ML) experimentation platform to Visual Studio Code (VS Code).
Although experiment tracking tools are already available, they run as separate services, whereas Iterative aims to bring everything together into a one-stop shop. The DVC extension enables data scientists to manage their data, run and track experiments, create plots, and view metrics from their IDE.
“The purpose of the extension is to provide for experiment tracking right on your code editor for VS Code,” adds Petrov.
Key highlights from this video interview are:
- Experiment tracking tools are currently available, both open source and proprietary, but they run as separate services. Petrov explains why having the DVC Extension for VS Code improves the development experience. He takes Swapnil through the key features of the open source extension.
- Petrov believes that having a shorter feedback loop and being able to choose the development environment, whether in the cloud or local machine, help create a great experiment tracking experience. He takes a deep dive into the main reasons why he feels the extension was needed.
- Petrov discusses the benefits of the DVC extension for data scientists and the value it brings, such as a quick feedback loop and a system that tracks all the experiments automatically.
The summary of the show is written by Emily Nicholls.
Here is the automated and unedited transcript of the recording. Please note that the transcript has not been edited or reviewed.
Swapnil Bhartiya: Hi, this is your host, Swapnil Bhartiya, and welcome to another episode of TFiR Let’s Talk. And today we have with us, once again, Dmitry Petrov, co-founder and CEO of Iterative. And today we are going to talk about the DVC extension for Visual Studio Code that is now available on the VS Code Marketplace. It’s an open source project by Iterative. Dmitry, it’s good to have you on the show.
Dmitry Petrov: Oh, thank you. Happy to be here.
Swapnil Bhartiya: First of all, tell us a bit about the DVC Extension for Visual Studio Code: what it is, where it’s available, and why you are launching it now.
Dmitry Petrov: The DVC Extension for Visual Studio Code is an extension for the code editor, right? If you are a VS Code user, you can install VS Code, and add the extension. It’s open source. In addition to the extension that you can install for free, you can find the source code and see how we build the extension.
The purpose of the extension is to give you an experiment tracking experience right in your code editor, in VS Code. Today you’re probably familiar with multiple different experiment tracking tools, some open source, some proprietary; that’s a very common approach. When you do experimentation, you use experiment tracking to understand the many experiments you run: what metrics you got, how the metrics are growing right now while the experiment is running. So is it time to stop it? Or do I still need to wait and waste … and spend GPU resources on this?
So there are a lot of questions that experiment tracking tools can answer. And today there are a bunch of experiment tracking tools that you can deploy and use, or use as a SaaS in the cloud. Our biggest question is, do you really need a special separate service for that? Why don’t you have this experience right in your IDE, right next to your code editor? Together with the code, you will have the whole experience with the hundreds of experiments that you run, all the metrics, all the beautiful graphs, everything as part of your development experience. So that’s the motivation for this: to improve the development experience, to improve the way people work with model development.
A full VS Code native experimentation platform. Control data sets, run experiments, view metrics, create plots, and much more. It’s a one stop shop for everything relating to your ML experiments, all in one place in your IDE. The DVC extension uses Command Palette, reducing the need to learn syntax by heart.
Want to run a new experiment, or pull data from your remote repository? Give the word in the Command Palette, and the DVC extension will guide you all the way. Manage parameters, and compare both metrics and plots for different models. Easier analysis of your experiments, and finding the best model.
See at a glance which data sets and models have been changed with the DVC Tracked Explorer, and navigate through all the files contained in your DVC project. On top of Git version control, source control management lets you manage data sets and DVC-tracked models. See artifact changes, and synchronize versions with your remote repository. Use checkout, commit, add, push, and pull straight from the interface.
Swapnil Bhartiya: So basically you allow them to run machine learning experiments in the VS Code, without needing any external services. It’s like fully self-contained, is that true?
Dmitry Petrov: Absolutely, yeah. So you install VS Code, you install the extension, you run your code, everything is here, no external services. That’s the beauty of this tool. At the same time, you have all this great visual experience, right on your machine.
Swapnil Bhartiya: What’s the problem that you’re trying to solve? Why do you feel that, “Hey, you know what? Do we need to offer that?”
Dmitry Petrov: There are two reasons. First of all, you need to have this experience [inaudible 00:04:17] to you. It’s important to have a shorter feedback loop. It’s important to have a similar experience with the coding IDE: a similar design, a similar experience of experiment tracking. So it’s all about the best developer experience, how to provide the best UI for developers.
And when I talk about a local experience, it does not necessarily mean your local machine, your laptop, right? Because in many cases people work in the cloud, and this is the beauty of VS Code, because VS Code can work in the cloud through a web browser. Many companies, many users use VS Code this way, through web browsers. And we are working with some clients, and this is especially true of enterprises, who are saying, “We don’t want to give any piece of data to a local machine. We want to work 100% in the cloud when it comes down to data.” Jupyter Notebooks and the VS Code extension are a way to do this, right? You can have a workbench somewhere with JupyterLab, with the VS Code extension, and people can pick and choose which development environment they prefer, and work with it in the cloud, with all the great experiment tracking experience, without additional services.
Swapnil Bhartiya: So what value or what benefit is there for data scientists through this extension?
Dmitry Petrov: Yeah. So the experience in general is pretty much the same as with many other experiment tracking tools, right? When you run your experiment, you’ll see your metrics live. What is happening with your, let’s say, loss function? What is your accuracy right now, after, like, one minute of training? Because maybe your accuracy is not growing fast enough, and you can cancel your experiment, improve the model, or maybe fix some bugs, change the data set, or whatever. And you need this feedback loop to be very quick. You need not just numbers, you need a kind of image, a dynamic plot, to see what is happening. That’s one value of experiment tracking.
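The decision Petrov describes here, watching a live metric and cancelling a run whose accuracy has stopped growing fast enough, can be illustrated with a minimal Python sketch. It is not the DVC extension's implementation and calls no DVC API; the function name `should_stop`, the `window` and `min_gain` thresholds, and the accuracy values are all hypothetical, chosen only to make the idea concrete.

```python
from typing import List, Optional, Sequence


def should_stop(accuracies: Sequence[float], window: int = 3,
                min_gain: float = 0.01) -> bool:
    """Return True when accuracy gained less than `min_gain`
    over the last `window` training steps (hypothetical rule)."""
    if len(accuracies) <= window:
        return False  # not enough history to judge yet
    gain = accuracies[-1] - accuracies[-1 - window]
    return gain < min_gain


# Made-up stream of "live" accuracy readings, one per training step.
live_accuracy = [0.50, 0.60, 0.66, 0.70, 0.703, 0.704, 0.705, 0.705]

history: List[float] = []
stopped_at: Optional[int] = None
for step, acc in enumerate(live_accuracy):
    history.append(acc)
    if should_stop(history):
        stopped_at = step  # a person watching the plot would cancel here
        break

print("stopped at step:", stopped_at)
```

In practice the person watching a live plot makes this call by eye; the sketch only shows why a short feedback loop matters, since a stalled run keeps consuming GPU time until someone notices.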
And the second value is when you run, sometimes, dozens of experiments a day, in some cases even hundreds and even thousands of experiments a day, you need to remember what exactly you did this morning, or last week. And you need to find this experiment and say, “Okay, this experiment was good for that metric. And very likely it happened because my hyperparameter was set this way,” right? You need to remember; you need a kind of bookkeeping system for what you have done, what worked well, and what did not. And this is a good way of replacing your pen and paper. This is what I did before, when I was in academia 10 years ago.
And now you have a system which tracks all the experiments automatically. What we are doing that is special here is, first, we are bringing this experience to VS Code, to your IDE, and second, we make those experiments reproducible, fully reproducible. Because there is DVC under the hood. There is Git under the hood, and when you run new experiments, we are not just taking your hyperparameters, and metrics, and live metrics. We’re also tracking your code with Git. And when you need to get back to your previous experiments from the previous week, for example, you know the state of the hyperparameters you used, and you know exactly what code change you made. Sometimes there’s no connection between hyperparameters and code, but we make the connection, using Git and DVC under the hood.
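The bookkeeping idea in this answer, keeping each experiment's hyperparameters and metrics tied to the exact code version that produced them, can be sketched in plain Python. This is only an illustration under stated assumptions: it does not show how DVC or the extension actually store experiments, the `ExperimentRecord` class and `best_by` helper are hypothetical names, and the commit identifiers and numbers are made-up placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ExperimentRecord:
    """One ledger entry linking code, hyperparameters, and results."""
    commit: str                # placeholder for a Git commit identifier
    params: Dict[str, float]   # hyperparameters used for the run
    metrics: Dict[str, float]  # final metrics the run reported


def best_by(records: List[ExperimentRecord], metric: str) -> ExperimentRecord:
    """Return the record with the highest value of `metric`."""
    return max(records, key=lambda r: r.metrics[metric])


# Made-up ledger of three runs; commit strings are placeholders.
ledger = [
    ExperimentRecord("a1b2c3d", {"lr": 0.1}, {"accuracy": 0.81}),
    ExperimentRecord("e4f5a6b", {"lr": 0.01}, {"accuracy": 0.88}),
    ExperimentRecord("c7d8e9f", {"lr": 0.001}, {"accuracy": 0.84}),
]

best = best_by(ledger, "accuracy")
print(best.commit, best.params, best.metrics)
```

Because every entry carries a commit identifier alongside its hyperparameters, finding the best run also tells you which code state to return to, which is the connection between hyperparameters and code that Petrov describes.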
Swapnil Bhartiya: Dmitry, thank you so much for taking time out today to talk to me about this project. And as usual, I would love to have you back on the show. Thank you.
Dmitry Petrov: Thank you. It was a pleasure talking to you.