IBM has announced CodeFlare, an open-source framework for simplifying the integration and efficient scaling of big data and AI workflows onto the hybrid cloud. CodeFlare is built on top of Ray, an emerging open-source distributed computing framework for machine learning applications.
CodeFlare extends the capabilities of Ray by adding specific elements to make scaling workflows easier.
To create a machine learning model, researchers and developers first have to train and optimize it. This might involve data cleaning, feature extraction, and model optimization. CodeFlare simplifies this process with a Python-based interface for what is called a pipeline, making it simpler to integrate, parallelize, and share data.
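To make the idea of a pipeline concrete, here is a minimal sketch in plain Python. It does not use the CodeFlare or Ray APIs; the function names (`clean`, `extract_features`, `fit_and_score`, `run_pipeline`) and the toy data are assumptions made purely for illustration. It chains a cleaning step and a feature-extraction step, then evaluates several candidate model settings in parallel, which is the kind of work a pipeline framework would distribute.

```python
from concurrent.futures import ThreadPoolExecutor


def clean(rows):
    """Data cleaning: drop rows that contain a missing value."""
    return [r for r in rows if None not in r]


def extract_features(rows):
    """Feature extraction: scale x and pair it with its label y."""
    return [((x / 10.0,), y) for x, y in rows]


def fit_and_score(data, slope):
    """Score a toy one-parameter model y = slope * x by mean squared error."""
    errors = [(slope * features[0] - y) ** 2 for features, y in data]
    return slope, sum(errors) / len(errors)


def run_pipeline(rows, candidate_slopes):
    """Run the stages in order, scoring all candidates in parallel."""
    data = extract_features(clean(rows))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: fit_and_score(data, s), candidate_slopes))
    # Model optimization: keep the candidate with the lowest error.
    return min(results, key=lambda r: r[1])


# Hypothetical input: (x, y) pairs, one of which has a missing value.
raw_rows = [(10, 2.0), (20, 4.0), (None, 1.0), (30, 6.0)]
best_slope, best_error = run_pipeline(raw_rows, [1.0, 2.0, 3.0])
```

In a distributed framework, the thread pool would be replaced by workers spread across a cluster, while the stage functions themselves would stay ordinary Python.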
The goal of the new framework is to unify pipeline workflows across multiple platforms without requiring data scientists to learn a new workflow language.
CodeFlare pipelines run on IBM’s new serverless platform, IBM Cloud Code Engine, and on Red Hat OpenShift. Users can deploy them just about anywhere, which extends the benefits of serverless to data scientists and AI researchers.
CodeFlare also makes it easier to integrate and bridge with other cloud-native ecosystems by providing adapters for event triggers (such as the arrival of a new file) and for loading and partitioning data from a wide range of sources, such as cloud object storage, data lakes, and distributed filesystems.
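The two adapter roles described above, reacting to an event and partitioning data, can be sketched in a few lines of plain Python. This is not CodeFlare's adapter API; `new_files` and `partition` are hypothetical helpers, and the file names and records are made up for illustration.

```python
def new_files(seen, listing):
    """Event trigger: return file names in the listing that were not seen before."""
    return sorted(set(listing) - seen)


def partition(items, num_parts):
    """Split items into num_parts contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


# Hypothetical storage listing: b.csv has arrived since the last check.
already_seen = {"a.csv"}
arrivals = new_files(already_seen, ["a.csv", "b.csv"])

# Hypothetical records loaded from the new file, split across three workers.
records = list(range(7))
chunks = partition(records, 3)
```

Each chunk could then be handed to a separate worker, so a newly arrived file is processed in parallel rather than by a single process.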