In this episode of TFiR Let’s Talk, Swapnil Bhartiya sits down with Dmitry Petrov, Co-Founder and CEO of Iterative.ai, to discuss the recently released machine learning engineering management tool, MLEM. MLEM aims to bridge the gap between ML engineers and DevOps teams by using the Git-based approach that developers are familiar with.
In discussing the need to break silos between ML teams and DevOps teams, Petrov says, “We need to break this wall, and the way it can be done is we need to give tools which are compatible with the software development stack. Basically, we don’t want to manage two different stacks: one for software developers, for DevOps, and a different technology stack for the AI world, for machine learning engineers, for data scientists.”
Key highlights from this video interview are:
- MLEM is an open source Git-based tool needed for model management and deployment. It enables users to save their model and then deploy it to some deployment platform like SageMaker.
- Petrov discusses how MLEM helps teams build a model registry on top of a Git repository, understand what is inside their models, and give the deployment system the information it needs to deploy them. He goes into detail about these benefits and the challenges MLEM is meant to solve.
- Petrov explains who Iterative is targeting with MLEM: machine learning teams, DevOps teams, and others who are responsible for the life cycle of the models.
- MLEM aims to bridge the gap between machine learning engineers and DevOps teams. Petrov goes into the silos of ML teams building models that the DevOps teams then need to deal with. He explains why it is important to break down those walls and how MLEM is helping tackle these problems.
- Petrov explains the reasons why they decided to make MLEM compatible with the software development stack. He describes how it is enabling data scientists to speak the same language as DevOps by giving them a set of commands that simplify this process, while using best practices.
- MLEM is an open source product, already hosted on GitHub as the company feels strongly about sharing their products so that anyone can use them. Petrov explains how MLEM fits in with its other offerings in DVC Studio.
The summary of the show is written by Emily Nicholls.
Here is the automated and unedited transcript of the recording. Please note that the transcript has not been edited or reviewed.
Swapnil Bhartiya: Hi. This is your host, Swapnil Bhartiya and welcome to another episode of TFiR Let’s Talk. And today we have with us once again, Dmitry Petrov, Co-Founder and CEO of Iterative.ai. You folks recently released machine learning engineering management, MLEM. Tell us a bit about what it is.
Dmitry Petrov: Oh, that’s an open source tool that is needed for model deployment. It’s a way you can save your model and then deploy it to some deployment platform like SageMaker, Heroku, or a custom platform. That’s one part of the functionality. The second part of the functionality is a prerequisite: it’s a part of the model registry that you may have on top of your Git experience, a model registry with Git repositories.
Swapnil Bhartiya: So talk about why you released it. What problem are you trying to solve?
Dmitry Petrov: Yeah. This particular problem… This particular product is about two parts of the machine learning deployment workflow. One part is a model registry. You need to know what models you have. You need to know the information about models; it’s about metadata management around machine learning models. It’s needed for visibility, it’s needed for productionization. You need to answer a question like, “What is in production right now?” And you might not believe it, but the majority of ML leaders cannot answer the question, “How many models do you have in production right now?” You need a special type of automation, and that’s what a model registry can help with. And MLEM is a way to build a model registry on top of your Git repository.
We are saying, “Do you need a model registry as a separate SaaS tool somewhere in the cloud? Why don’t you build your model registry right on your repository, on your GitHub or GitLab? And it can become the source of truth for your model registry, for statuses. What is in production? What is in staging, and what set of models do you have?” And this is part of the best engineering practices that people use for application development. People use GitOps, Gitflow for this, and that is what MLEM provides.
And the second part of the MLEM functionality is about the model itself. You need to know what is inside a model file: what libraries are needed for deploying this model? What versions of the… For example, if it [inaudible 00:02:47], you need to know what version of [inaudible 00:02:49] is needed to deploy this model. And this tool helps you to save this information, so when the model is ready for deployment, your deployment system can extract this information and properly deploy and start serving the model. So it’s like two values of the product: one is a model registry on top of GitOps practices, and the second is understanding models and helping deployment systems to deploy those.
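To make the point about knowing what is inside a model file more concrete, here is a minimal Python sketch of recording a model's requirements next to the model artifact. It is not MLEM's API or file format: the function names, the JSON layout, and the idea of passing in a list of package names are assumptions made purely for illustration, and only the Python standard library is used.

```python
import hashlib
import json
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def build_model_meta(model_bytes, packages, version_lookup=installed_version):
    """Describe a serialized model: a content hash, its size, and the versions
    of the libraries needed to load it (hypothetical layout, not MLEM's)."""
    return {
        "artifact_sha256": hashlib.sha256(model_bytes).hexdigest(),
        "size_bytes": len(model_bytes),
        "requirements": {name: version_lookup(name) for name in packages},
    }


def to_text(meta):
    """Render the metadata as human-readable text that can be committed to Git."""
    return json.dumps(meta, indent=2, sort_keys=True)
```

A deployment system could read such a file back and install the listed versions before serving the model. The lookup function is injectable so the sketch can be exercised without any particular ML library installed.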
Swapnil Bhartiya: Who are you targeting with this project and/or product?
Dmitry Petrov: We are targeting machine learning teams, and especially people who are responsible for the life cycle of the models, for productionization of the models, and for visibility and management of these models. So in many cases, it’s an MLOps or DevOps team. In other cases, it’s the ML team, because in general it’s good practice to own your own models and the life cycle of your own models.
Swapnil Bhartiya: There’s a lot of things that we preach. We talk about DevOps, the whole thing is shifting left. Things are moving towards developers or DevOps, but the fact is that there are still soft silos. You just talked about machine learning engineers. So talk about how you democratize it so that the wall between ML engineers and DevOps is getting shorter and shorter, trying to bridge the gap between two different disciplines or teams.
Dmitry Petrov: Yeah. I think this is the biggest challenge in AI in general, because today many teams live in these silos, where they have ML teams who build models, and then DevOps teams who need to deal with those models, with data sets, with deployment, or with maintaining the data artifacts people have. So that’s kind of… And today we see that there is a gap between those teams and what is needed, what the industry needs. We need to break this wall between the teams, and the way it can be done is we need to give tools which are compatible with the software development stack. Basically, we don’t want to manage two different stacks: one for software developers, for DevOps, and a different technology stack for the AI world, for machine learning engineers, for data scientists.
But right now this is what is happening. If you look at different AI platforms, they’re saying, “Okay, that’s a platform for AI engineers.” And they usually work and exist separately from the software development stack. Our vision is that the AI platforms have to be built on top of the software development stack. They have to use all the principles, all the best practices that are used in software development. What does it mean? It means you start to version your code, at least the code that you actually productionize or share, version your data sets, version your models and codify them properly in your repository, use Git as a source of truth for your model that goes to production at least, and use a regular CI/CD-based deployment workflow instead of inventing your own on the AI side.
So we just want to reuse the same set of best practices for AI teams, so they can speak the same language, the DevOps folks and machine learning folks. And in this sense, you can unify the processes. You can simplify your organization and get [inaudible 00:06:42] faster from your ML models.
Swapnil Bhartiya: Excellent. Can you also talk about what is the place, what is the role of MLEM for a Git based ML model registry?
Dmitry Petrov: If you’re talking about what is specific about MLEM and how it’s related to Git, there are a few design decisions we made to make it compatible with the software development stack. First, when we extract information about models, we put all the models… We do simple things: we put all the information in meta files, in a text… A human-readable meta file, which people can understand and downstream systems can pick up. So that’s the principle of codification, the principle of infrastructure as code, that we use for the extracted meta information for your models. The second principle is for statuses of your models. What is in production? What is in development? What is in staging? We use Git tags; we don’t use a separate set of APIs, as many other model registries do.
We are saying, “If you put some model in production, just create a tag, a Git tag, and this is the way you can notify your downstream systems, which is usually your CI/CD, about deployment, about the fact that this model needs to be deployed.” That’s the best practice in software development. Why don’t we deploy models absolutely the same way? This is how we can make data scientists speak the same language as DevOps. And we simplify this: we are saying there is a set of commands that make it simpler for you to do this, to use the best practices.
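The idea of using Git tags as the source of truth for model statuses can be sketched in a few lines of Python. The tag naming scheme below, `<model>@<version>#<stage>`, is a hypothetical convention chosen for this illustration and is not claimed to be the format MLEM uses; the function names are likewise assumptions. The input is a list of tag names in the order they were created, for example as produced by `git tag --sort=creatordate`.

```python
import re

# Hypothetical convention: "<model>@<version>#<stage>", e.g. "churn@v2#production".
TAG_PATTERN = re.compile(r"^(?P<model>[\w-]+)@(?P<version>[\w.]+)#(?P<stage>\w+)$")


def registry_from_tags(tags):
    """Build {model: {stage: version}} from tag names ordered oldest to newest.

    A later tag for the same model and stage overrides an earlier one;
    tags that do not follow the convention are ignored.
    """
    state = {}
    for tag in tags:
        match = TAG_PATTERN.match(tag)
        if match is None:
            continue  # an ordinary release tag, not a model status tag
        state.setdefault(match["model"], {})[match["stage"]] = match["version"]
    return state


def models_in_stage(state, stage):
    """List the models that currently have a version assigned to the given stage."""
    return sorted(model for model, stages in state.items() if stage in stages)
```

With a registry derived this way, the question "how many models do you have in production right now?" becomes `len(models_in_stage(state, "production"))`, and a CI/CD job triggered by a new tag can apply the same parsing to decide what to deploy.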
Swapnil Bhartiya: Excellent. And this is an open source product which is already hosted on GitHub. How is [inaudible 00:08:41]… Once again, I want to understand the commercial aspect of it. How are you offering it or what kind of relationship is there?
Dmitry Petrov: Yeah. So what I described is open source; you can go and use the project. On top of the project, and this is our general strategy: if something is needed for an individual, for a team, it has to be open source. It should be available, everyone can use it. What we built as a company for companies, for clients, is a collaboration layer on top, which is called DVC Studio. And we also build a management part, a data management part, as a part of this Studio experience. So what this means is that with the commercial offering, with Studio, you can have an overall picture of your organization: not just which model in this repository is in production and which is in staging, but a picture that is organization-wide.
What kind of models do we have today? It might be hundreds of models across dozens and hundreds of repositories. How many models do we have in production? Again, those might be different repositories, different products from different people, different organizations. And what we help you with is building these dashboards and a nice UI at the cross-organizational level, which is needed for big companies. It’s probably not needed for a team of five data scientists and for DevOps engineers in a small startup setting. But on a company level that’s a special part that’s needed for efficient collaboration.