If it weren’t for data scientists and their proclivity for experimentation, we likely would not have some of the most innovative technological solutions available today. Predictive analytics, the Internet of Things, and even aspects of AI itself are the results of data scientists asking questions, probing for answers, and developing practical solutions using insights gleaned from structured and unstructured data.
But many times, those experiments stay right where they began: in a lab, separate from the developers and operations managers who can bring data scientists’ work into the light.
That’s because, traditionally, there hasn’t been an easy way to get ideas from concept to deployment. Unlike in a DevOps environment, in which developers and operations managers openly collaborate, data scientists and developers usually work separately, without a pipeline connecting the data scientists’ work to the other teams.
But, as we move into the next generation of data processing, companies must be able to deliver a consistent workflow experience that makes it easy for everyone involved in the development of AI-powered applications to solve their own unique challenges. Today, that development process includes data scientists, developers, and operations managers. And they all need to work together to deliver applications that power the success of the companies they work for.
That’s why we have MLOps. MLOps adds data scientists to a traditional DevOps environment, providing the opportunity to create a true AI and machine learning development lifecycle. The practice incorporates model development using continuous integration/continuous deployment (CI/CD) methodologies, and model monitoring and retraining. With MLOps, concepts and experiments that begin in a lab are turned into actual, practical applications that can be deployed in the same agile manner as any other type of application.
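To make the CI/CD idea concrete, here is a minimal, hypothetical sketch of the kind of quality gate an MLOps pipeline might run before promoting a retrained model. The dataset, accuracy threshold, and pass/fail convention are illustrative assumptions, not any specific product’s workflow:

```python
# Illustrative model-promotion gate, as might run in a CI/CD pipeline step.
# The dataset and threshold are assumptions for the sake of the sketch.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_THRESHOLD = 0.90  # assumed quality bar a model must clear to deploy

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Retrain" the model, then evaluate it on held-out data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# The pipeline promotes the new model only if it meets the bar;
# otherwise the previously deployed model stays in place.
if accuracy >= ACCURACY_THRESHOLD:
    print(f"PASS: accuracy {accuracy:.2f}, promote model to deployment")
else:
    print(f"FAIL: accuracy {accuracy:.2f}, keep previous model and retrain")
```

In a real pipeline, the same gate would run automatically on every retraining job, so model monitoring and redeployment follow the same agile loop as any other application change.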
But MLOps is really only part of the story.
In addition to building an open landscape that welcomes collaboration and connectivity, it’s important to make it easier for data scientists to experiment on their own with different tools and develop and train their models within a trusted environment. Meanwhile, developers need to be able to take the models that data scientists provide, integrate them into their intelligent applications, and deploy them quickly, all while keeping their data scientist colleagues in the loop. Finally, operations managers need to be able to govern and ensure that any tools scientists and developers are using are secure and acceptable to their organizations.
Self-service, but together
Because data scientists are experimental by nature, most are likely going to want to use different tools to come up with the answers they’re seeking. And there are certainly a lot of data science and modeling tools for them to choose from, ranging from JupyterHub to Spark to Kubeflow and beyond.
Many of these tools originate from the open source community, so it’s possible for scientists to go out and download whatever they opt to use. But IT operations managers would likely prefer that those tools come from a curated and secure set of technologies.
Still, we need to make it possible for data scientists to readily deploy the tools they need without friction. They don’t have the time or inclination to track down and install the latest versions of Jupyter notebooks (for example). They need to be able to focus on their jobs, which means accessing secure technologies in a self-service manner, preferably through a common shared platform that operations managers can also access. This helps reduce or eliminate friction between the data science and operations sides of the development lifecycle.
A shared platform, serving as connective tissue between science and operations teams, benefits both parties in different ways. Data scientists can use the common tools they need, when they need them, without having to wait on operations. Simultaneously, operations managers can rest assured their scientist colleagues are using operationally sanctioned tools, not shadow IT applications that could pose a risk to their organizations, and don’t have to worry about continually addressing requests for additional data provisioning.
What about developers?
Usually, developers have to wait for data scientists to build their models and send them over to development (if they even get that far). But this process is often time-consuming and inefficient, and it undermines the benefits of agile development. A modern, shared application platform built on technologies like container orchestration systems also brings these formerly disparate teams together, enabling them to interact and collaborate throughout the development process.
Closer interaction can yield a couple of big dividends. First, it increases the likelihood that the models that data scientists are building will be put into action. Companies will be better able to realize returns from their AI investments because those investments will be incorporated into deployable applications. Second, it provides a more seamless and consistent workflow experience between data science and development. This makes it easier for developers to integrate the models into their intelligent applications and make them part of their automated, repeatable development pipelines. AI-powered applications can be brought to market faster than ever before and deployed on-premises, in the cloud, at the edge, or any combination of those environments.
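In practice, the handoff from data science to development can be as simple as a serialized model artifact plus a thin inference wrapper on the application side. The sketch below is a hypothetical illustration; the file name and model choice are assumptions, not a prescribed workflow:

```python
# Illustrative handoff: a data scientist saves a trained model artifact,
# and a developer loads that artifact inside an application for inference.
# The file name "model.pkl" and the model type are assumptions.
import pickle

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# --- data-science side: train and serialize a model ---
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# --- application side: load the artifact and serve predictions ---
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

prediction = loaded.predict(X[:1])
print("predicted class:", int(prediction[0]))
```

Once the artifact and wrapper live in a shared repository, the same automated pipeline that builds and deploys the application can pick up new model versions as data scientists publish them.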
A cultural change with real-world implications
Of course, all of this requires a cultural adjustment, since many data science teams are used to working on their own, sort of like artists developing their own masterpieces. Incorporating them into a larger DevOps team will take some effort.
But since MLOps is really a subset of DevOps, organizations that have already created their own DevOps teams should simply be able to integrate their data scientists into the mix. Most data scientists will likely welcome the opportunity to work more closely with their development and operational colleagues, knowing that it will allow them to see their work be applied to real-world applications. They’ll be able to transform their work from experimental, so-called “artisanal AI” into deployable, scalable applications.
The more the merrier
While we didn’t talk about data preparation, here’s where things can start to get really interesting. Data engineers can be added to the collaborative atmosphere, giving them access to data lakes, databases, and other components that can be added to the platform. Line of business leaders can also become part of the team (MLDevEngineerLeaderOps, anyone?).
Things can get complicated, but that’s all the more reason to have a common and curated open source tooling platform that everyone can access, work from, and collaborate over. It can make things much easier on data science, development, operations, and other teams. It also enables organizations to see their AI and ML investments through from conception to reality.
By Will McGrath, Red Hat