Alluxio has announced the immediate availability of version 2.7 of its Data Orchestration Platform. According to the company, the new release delivers 5x better I/O efficiency for machine learning (ML) training, at lower cost, by parallelizing the data loading, data preprocessing, and training pipelines. Alluxio 2.7 also provides enhanced performance insights and adds support for open table formats such as Apache Hudi and Apache Iceberg, making it easier to scale access to data lakes for faster Presto- and Spark-based analytics.
In a separate announcement today, Alluxio also disclosed $50M in Series C financing.
NVIDIA’s Data Loading Library (DALI) is a widely used Python library that supports CPU and GPU execution for data loading and preprocessing to accelerate deep learning. With release 2.7, the Alluxio platform has been optimized to work with DALI for Python-based ML applications that include a data loading and preprocessing step ahead of model training and inference. By accelerating the I/O-heavy stages and allowing the compute-intensive training that follows to run in parallel, end-to-end training on the Alluxio data platform achieves significant performance gains over traditional solutions, according to the company. Unlike alternatives that suit only smaller data sets, the solution scales out.
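The general idea of overlapping an I/O-bound loading stage with a compute-bound training stage can be sketched in plain Python. This is a minimal sketch, assuming a single background loader thread feeding a bounded queue; it does not use DALI or any Alluxio API, and `load_and_preprocess`, `train`, and `run_pipeline` are hypothetical stand-ins for real pipeline stages.

```python
import queue
import threading

def load_and_preprocess(sample_ids, out_q):
    """Producer: hypothetical stand-in for the I/O-heavy loading/preprocessing stage."""
    for sid in sample_ids:
        # A real pipeline would read from storage and decode/augment here;
        # doubling the ID is only a placeholder transformation.
        out_q.put(sid * 2)
    out_q.put(None)  # sentinel: signals that no more items are coming

def train(in_q):
    """Consumer: hypothetical stand-in for the compute-intensive training stage."""
    processed = []
    while True:
        item = in_q.get()
        if item is None:
            break
        processed.append(item)  # a real loop would run a training step here
    return processed

def run_pipeline(sample_ids, prefetch=4):
    # The bounded queue caps how many preprocessed items wait in memory,
    # while letting loading run ahead of training by up to `prefetch` items.
    q = queue.Queue(maxsize=prefetch)
    loader = threading.Thread(target=load_and_preprocess, args=(sample_ids, q))
    loader.start()
    result = train(q)
    loader.join()
    return result

if __name__ == "__main__":
    print(run_pipeline(range(5)))  # [0, 2, 4, 6, 8]
```

Because the loader thread fills the queue while the consumer works, the two stages overlap instead of running strictly one after the other; in CPython this overlap is most effective when the loading stage spends its time waiting on I/O.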
At the heart of Alluxio’s value proposition are data management capabilities that complement caching and the unification of disparate data sources. As Alluxio deployments have grown to span compute and storage across multiple geographic locations, the software has continued to evolve to keep scaling, using a new technique for batching data management jobs. Batched jobs, executed by an embedded execution engine for tasks such as data loading, reduce the resource requirements of the management controller, which lowers the cost of provisioned infrastructure.
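Why batching reduces controller load can be illustrated with a toy scheduler that submits one job per group of paths instead of one job per path. This is a minimal sketch under stated assumptions: the `submit_load_jobs` helper, the job dictionary format, and the paths are hypothetical and do not reflect Alluxio's actual job service or its embedded execution engine.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of up to `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def submit_load_jobs(paths: Iterable[str], batch_size: int) -> List[Dict]:
    """Hypothetical scheduler: creates one job per batch rather than one per path."""
    jobs = []
    for batch in batched(paths, batch_size):
        jobs.append({"op": "load", "paths": batch})
    return jobs

paths = [f"/data/part-{i:05d}" for i in range(10)]
jobs = submit_load_jobs(paths, batch_size=4)
print(len(jobs))  # 3 jobs instead of 10
```

Each job the controller tracks carries bookkeeping cost, so cutting ten tracked jobs to three is the kind of reduction batching aims for; the batch size trades controller overhead against how finely work can be retried or rebalanced.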
Alluxio now supports a native Container Storage Interface (CSI) driver for Kubernetes, as well as a Kubernetes operator for ML, making it easier to operate ML pipelines on the Alluxio platform in containerized environments. The Alluxio volume type is now natively available in Kubernetes environments. Agility and ease of use are a constant focus in this release.
A new capability called Shadow Cache makes it easier to strike a balance between performance and cost by dynamically measuring how cache size affects response times. In large multi-tenant Presto environments, its self-managing capabilities significantly reduce management overhead.
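One way to estimate how cache size affects performance is to run "shadow" caches that track only keys, not data, at several candidate sizes over the same access trace and compare their hit ratios. The sketch below uses key-only LRU simulation as a stand-in technique; it is not Alluxio's Shadow Cache implementation, `ShadowLRU` and `estimate_hit_ratios` are hypothetical names, and hit ratio serves here only as a proxy for response time.

```python
from collections import OrderedDict
from typing import Dict, Iterable, Sequence

class ShadowLRU:
    """Tracks keys only (no cached bytes) to estimate an LRU cache's hit ratio at a given capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.keys = OrderedDict()
        self.hits = 0
        self.accesses = 0

    def access(self, key) -> None:
        self.accesses += 1
        if key in self.keys:
            self.hits += 1
            self.keys.move_to_end(key)  # mark as most recently used
        else:
            self.keys[key] = None
            if len(self.keys) > self.capacity:
                self.keys.popitem(last=False)  # evict the least recently used key

    @property
    def hit_ratio(self) -> float:
        return self.hits / self.accesses if self.accesses else 0.0

def estimate_hit_ratios(trace: Iterable, sizes: Sequence[int]) -> Dict[int, float]:
    """Replay one access trace against a shadow cache for each candidate size."""
    shadows = [ShadowLRU(s) for s in sizes]
    for key in trace:
        for shadow in shadows:
            shadow.access(key)
    return {shadow.capacity: shadow.hit_ratio for shadow in shadows}

trace = ["a", "b", "c", "a", "b", "c", "a", "b", "c"]
print(estimate_hit_ratios(trace, [2, 3]))  # {2: 0.0, 3: 0.6666666666666666}
```

In this trace a cyclic working set of three keys never fits in a two-entry LRU cache, so every access misses, while a three-entry cache hits on everything after the first pass. Because shadow caches hold only keys, several candidate sizes can be evaluated at once for far less memory than real caches of those sizes would need.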
Free downloads of the open source Alluxio 2.7 Community Edition and of Alluxio Enterprise Edition are now generally available.