The growing adoption of Kubernetes frameworks by VMware, Red Hat and other major infrastructure and cloud players is making it easier to support cloud-native analytics applications. This is an important development, as it enables organizations to unlock additional value from their data.
Because of the volume and velocity of data generated at the edge, the legacy model of moving data to a central location for analysis no longer holds up. Instead, analysis must occur as close as possible to where the data is generated. That’s why Kubernetes is key: one of its biggest benefits is agility, and with it organizations can quickly run analytical workloads directly on data residing in their data lakes.
Moreover, Kubernetes makes it easy to deploy the same workloads in different environments, including private clouds, public clouds and the edge. For example, a machine learning algorithm can be developed and deployed in the public cloud, then also deployed at the edge.
Kubernetes has significant potential to accelerate the use of analytics applications. Thanks to its inherent agility and portability, Kubernetes has seen increasing adoption to support an expanding range of use cases. According to a CNCF survey published earlier this year, 96% of organizations are either using or evaluating Kubernetes.
However, for organizations to maximize Kubernetes’ value for data analytics, they need to build a storage strategy that successfully bridges the gap between data repositories and application workloads.
Key Storage Considerations
By adopting a common storage platform across the Kubernetes ecosystem, IT teams can deliver the persistent capabilities that application developers need, make life easier for IT administrators and minimize storage costs. As stated by Enterprise Strategy Group (ESG), “maximizing the potential of container-based applications . . . mandates a modern persistent storage foundation — one that can deliver consistent cloud-like access across both on-premises data center infrastructure and the public cloud.”
In particular, an object storage system with S3 API compatibility that can readily scale up and down addresses key challenges in container-based environments such as storage infrastructure costs, overall storage performance, hybrid cloud storage management, data availability and data protection.
S3-based object storage can grow to very large capacities because of its horizontal, scale-out architecture. This architecture enables organizations to expand deployments by adding nodes whenever and wherever needed. Where the storage platform presents a single, global namespace, this scaling can also span multiple geographic sites at once.
S3 compatibility is also important for bolstering Kubernetes’ portability. As noted earlier, portability is one of the technology’s key benefits: organizations look to Kubernetes to help deploy the same apps quickly across different environments, such as on-premises and public clouds. S3-based storage eases integration between private and public cloud deployments, allowing enterprises to move apps and data between the two with the same API. This maximizes the agility of Kubernetes apps, making it easier to overcome data gravity and move workloads to data lakes.
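As a rough illustration of that portability, the sketch below assumes a containerized application that reads its S3 connection settings from environment variables, so the same image can point at a public cloud bucket or an on-premises S3-compatible store. The variable names (`S3_ENDPOINT_URL`, `S3_REGION`), the helper `s3_client_kwargs` and the example endpoint are hypothetical. The returned dictionary is meant to be passed to boto3 as `boto3.client("s3", **kwargs)`, where `endpoint_url` and `region_name` are standard arguments.

```python
import os
from typing import Mapping, Optional


def s3_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for boto3.client("s3", **kwargs).

    Hypothetical helper: the environment variable names are assumptions
    chosen for this example, not a Kubernetes or S3 convention.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("S3_REGION", "us-east-1")}
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        # Only set for S3-compatible stores; when omitted, boto3 falls back
        # to the public AWS endpoint for the region.
        kwargs["endpoint_url"] = endpoint
    return kwargs


# The same application code, configured for two different environments.
public_cloud = s3_client_kwargs({"S3_REGION": "eu-west-1"})
on_prem = s3_client_kwargs({"S3_ENDPOINT_URL": "https://s3.example.internal"})
```

In a Kubernetes deployment, values like these would typically come from a ConfigMap or Secret, so switching environments would not require rebuilding the application.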
Integration with enterprise analytics applications such as data warehouse solutions is another key factor to consider. Leading data analytics platforms — including Greenplum, Teradata, Vertica, Apache Druid, Microsoft SQL Server 2022, Splunk, Elastic and Cribl — now support running workloads directly on data housed in S3 object storage-based data lakes. Along with the mobility of Kubernetes, such an architecture enables separate scaling of compute and storage. Storage can be expanded by simply adding nodes or pods, in one site or across multiple sites. Users can manage all this storage within a single namespace, from a single management console, and search metadata across all sites with a single query.
Enterprise-grade security is also a crucial consideration for Kubernetes environments using persistent storage. Kubernetes has security functionality at multiple layers (e.g., node, cluster, network and role-based access control), and persistent storage brings additional security options to consider. For example, protection against ransomware attacks can be strengthened with storage system capabilities such as encryption and data immutability policies like S3 Object Lock. Immutable backup data cannot be altered or deleted by cybercriminals during its retention period, which enables quick recovery from the unchanged backup copy. Meanwhile, encrypting all sensitive data, both in flight and at rest, keeps that data unintelligible to attackers who do not hold the keys.
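To make the immutability idea concrete, here is a minimal sketch of writing a backup object with an Object Lock retention date and server-side encryption. It only builds the request parameters; `ObjectLockMode`, `ObjectLockRetainUntilDate` and `ServerSideEncryption` are standard arguments to boto3's `put_object`, while the helper `locked_put_kwargs`, the bucket name and the object key are hypothetical. It assumes the bucket was created with Object Lock enabled, and support for these parameters on a given S3-compatible platform should be checked rather than assumed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def locked_put_kwargs(bucket: str, key: str, body: bytes, retain_days: int,
                      now: Optional[datetime] = None) -> dict:
    """Build put_object arguments for an immutable, encrypted backup object.

    Hypothetical helper for illustration; pass the result to
    s3_client.put_object(**kwargs) on a boto3 S3 client.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # COMPLIANCE mode: the object version cannot be overwritten or
        # deleted by any user until the retention date passes.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retain_days),
        # Ask the storage system to encrypt the object at rest.
        "ServerSideEncryption": "AES256",
    }


fixed_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
kwargs = locked_put_kwargs("backups", "db/nightly.dump", b"example", 30, now=fixed_now)
```

Governance mode (`"GOVERNANCE"`) is the less strict alternative, since users with specific permissions can override the lock. Which mode fits depends on the organization's recovery and compliance requirements.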
Broadly speaking, to ensure their storage systems provide comprehensive protection for Kubernetes apps, organizations should look at storage platforms that have earned major security and compliance certifications. These include the Common Criteria for Information Technology Security Evaluation, the Federal Information Processing Standard (FIPS) and SEC Rule 17a-4, among others.
Conclusion
Kubernetes provides new opportunities for organizations to extract greater value from their data, but fully capitalizing on these opportunities requires a robust, modern storage foundation that supports cloud-native analytics applications. S3-compatible object storage solutions provide this foundation, which explains why such solutions are increasingly being used for primary workloads and not just data protection.
-Gary Ogasawara, CTO, Cloudian