The Big Picture: Maintaining Application Consistency in Containers

The modern application is dynamic, spanning virtual machines, containers, and cloud services. Enterprises therefore can't afford an outdated approach to data protection. Trying to back up an entire application by focusing on only one of its components, such as a single virtual machine, is like sharing the game plan with only one player on the team.

A Different Type of Beast

Containers are ephemeral: they pop up and disappear constantly. They are also stateless, meaning they retain no valuable data once they are gone. All of this is by design, but enterprise data does not behave the same way, which can make backing up and securing modern application data feel like an endless game of whack-a-mole. In fact, more than half of IT professionals say they have delayed deploying Kubernetes applications into production because of security concerns. Below, we explore why the ephemerality and statelessness of containers challenge data security, along with a proven approach to making data protection more effective for the modern enterprise app. While you may not have complete control over preventing security events, recovery from them can be entirely controlled with proper planning.

Ephemerality: Here One Minute, Gone the Next

Containers go in and out of existence all the time, whether because of scaling events, such as adding or removing replicas, or because of host and cluster events, such as node failures. Operationally this is not a problem, because the platform is designed for it, but it complicates protection. It means we must rely on the application's underlying taxonomy to capture the details needed to recreate a container (its source images, application configuration, and persistent storage data) rather than treating the container itself as the unit of protection.

Fortunately, in the case of Kubernetes, the running application's taxonomy is built into the platform. Because Kubernetes uses desired-state management, an application's entire taxonomy is already declared as code. Kubernetes cleanly separates the application's container images, its configuration, and its state (secrets, persistent storage, and other elements). This benefits both DevOps teams and those responsible for backing up enterprise data. However, to meet data protection and governance standards, organizations must be able to work through this taxonomy for every microservice (and its related data) that makes up the application. Otherwise, a recovery may restore only part of the application and its data.
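To make the taxonomy concrete, here is a minimal sketch that sorts Kubernetes-style manifests into images, configuration, and state for each microservice. The manifests are plain dictionaries shaped like the YAML objects, so no cluster access is involved. Grouping microservices by an `app` label and the particular kind-to-layer mapping are assumptions made for illustration; a real tool would query the Kubernetes API and handle many more resource kinds.

```python
from collections import defaultdict

# Illustrative kind-to-layer mapping (an assumption, not an exhaustive list).
WORKLOAD_KINDS = {"Deployment", "StatefulSet", "DaemonSet"}
CONFIG_KINDS = {"ConfigMap", "Service", "Ingress"}
STATE_KINDS = {"Secret", "PersistentVolumeClaim"}


def build_taxonomy(manifests, label_key="app"):
    """Group manifests per microservice into images, config, and state."""
    taxonomy = defaultdict(lambda: {"images": set(), "config": [], "state": []})
    for m in manifests:
        kind = m["kind"]
        name = m["metadata"]["name"]
        # The microservice is identified by a label (assumed key: "app").
        service = m["metadata"].get("labels", {}).get(label_key, "unlabeled")
        entry = taxonomy[service]
        if kind in WORKLOAD_KINDS:
            # Container images are declared in the workload's pod template.
            for container in m["spec"]["template"]["spec"]["containers"]:
                entry["images"].add(container["image"])
            # The workload spec itself is part of the configuration.
            entry["config"].append(f"{kind}/{name}")
        elif kind in CONFIG_KINDS:
            entry["config"].append(f"{kind}/{name}")
        elif kind in STATE_KINDS:
            entry["state"].append(f"{kind}/{name}")
    return dict(taxonomy)
```

Any microservice with a non-empty `state` list needs its persistent data protected alongside its images and configuration; backing up the workload alone would produce exactly the partial recovery described above.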

Statelessness: Keeping Tabs on Data

The second significant contributing factor to the cloud-native application challenge is containers' statelessness. To capture an application's state, we need to keep tabs on the application runtime (the container images), its current configuration, and its persistent data across on-premises and cloud storage. All these moving parts can fragment data across heterogeneous storage services and locations, including availability zones and on-premises data centers. Simply redeploying an application's containers to a cluster in a different region during recovery from a security event will not work unless all of these elements are already present, consistent, and unharmed in the target cluster.
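The requirement that every element be present, consistent, and unharmed before redeploying can be expressed as a pre-recovery check. This sketch is hypothetical: `AppCapture` and `readiness_problems` are illustrative names, and hashing whole configuration objects and volume contents in memory stands in for whatever integrity verification a real backup or replication tool performs.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


def digest(blob: bytes) -> str:
    """SHA-256 fingerprint used to confirm an element arrived unaltered."""
    return hashlib.sha256(blob).hexdigest()


@dataclass
class AppCapture:
    """What must travel together to recover one application (hypothetical model)."""
    images: set[str]          # runtime: container image references
    config: dict[str, str]    # configuration object name -> expected SHA-256
    volumes: dict[str, str]   # persistent volume name -> expected SHA-256 of its data


def _check(kind: str, expected: dict[str, str], present: dict[str, bytes],
           problems: list[str]) -> None:
    # Each element must exist on the target AND match its capture-time fingerprint.
    for name, want in sorted(expected.items()):
        blob = present.get(name)
        if blob is None:
            problems.append(f"missing {kind}: {name}")
        elif digest(blob) != want:
            problems.append(f"{kind} mismatch: {name}")


def readiness_problems(capture: AppCapture, target_images: set[str],
                       target_config: dict[str, bytes],
                       target_volumes: dict[str, bytes]) -> list[str]:
    """List every reason the target cluster cannot host the recovery yet (empty = ready)."""
    problems = [f"missing image: {i}" for i in sorted(capture.images - target_images)]
    _check("config", capture.config, target_config, problems)
    _check("volume", capture.volumes, target_volumes, problems)
    return problems
```

Checking all three layers together is the point: a target with the right images but stale configuration or missing volume data fails the check, just as it would fail a real recovery.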

A “Snapshot” Approach to Data Protection Won’t Work

Now that we've shown how intricate containers are, made up by nature of constantly moving, ever-changing parts, the pitfalls of traditional data protection techniques should be clear. If we don't capture the entire application (including the state surrounding even its stateless components), we cannot meaningfully recover it from a backup. Instead, organizations should focus on achieving application consistency: analyzing and grouping the entire state of an application, including its running configuration and any persistent data connections.

To guarantee a single source of truth, teams can create application-level consistency groups simply by annotating the elements of the application that need to be protected and recovered, including the persistent data sources. Journaling the persistent data, and pairing every journaled point with the corresponding application state, yields recovery points only seconds apart, so app owners can rewind the entire application (including its persistent data) to any previous point in time. If that point is not satisfactory, they can discard it and quickly pick another.
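A sketch of how annotation-based consistency groups might be resolved, assuming each protected resource carries an annotation naming its group. The key `example.com/consistency-group` is a placeholder, not an annotation defined by Kubernetes or by any particular product, and the resources are plain dictionaries rather than live API objects.

```python
from collections import defaultdict

# Hypothetical annotation key; real tools define their own.
GROUP_ANNOTATION = "example.com/consistency-group"


def consistency_groups(resources):
    """Map each group name to the sorted 'Kind/name' of its annotated members."""
    groups = defaultdict(list)
    for r in resources:
        meta = r["metadata"]
        group = meta.get("annotations", {}).get(GROUP_ANNOTATION)
        # Resources without the annotation belong to no group and are not protected.
        if group:
            groups[group].append(f'{r["kind"]}/{meta["name"]}')
    return {name: sorted(members) for name, members in groups.items()}
```

Anything left unannotated falls outside every group, which is why the annotation has to cover the persistent data sources as well as the workloads.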

Journaling also maintains write-order fidelity across multiple persistent volumes: changes anywhere in the application, however many microservices it spans, are protected in the order they were made, so the protected data stays consistent with the production application from second to second. This avoids the data loss inherent in the eventual-consistency model that applications must adopt when they build protection in natively. It also removes the need to adopt an active-active design pattern for every single application, reducing infrastructure and development costs.
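The mechanics of write-order fidelity and point-in-time rewind can be illustrated with a toy journal. This is a generic journaling technique written for illustration, not any vendor's implementation: every write to any volume in the group draws a number from one shared sequence counter, each checkpoint pairs a journal position with the application state, and rewinding replays writes in sequence order up to the chosen checkpoint. `Journal`, `recovery_point`, and `rewind` are hypothetical names, and checkpoints are assumed to be recorded in increasing timestamp order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Write:
    seq: int       # one sequence counter shared by ALL volumes in the group
    volume: str
    offset: int
    data: bytes


@dataclass
class Journal:
    """Toy in-memory journal for one consistency group (illustrative only)."""
    writes: list[Write] = field(default_factory=list)
    # Each checkpoint is (timestamp, journal position, application state).
    checkpoints: list[tuple[float, int, dict]] = field(default_factory=list)
    next_seq: int = 0

    def record(self, volume: str, offset: int, data: bytes) -> None:
        # A single counter across volumes preserves cross-volume write order.
        self.writes.append(Write(self.next_seq, volume, offset, data))
        self.next_seq += 1

    def checkpoint(self, timestamp: float, app_state: dict) -> None:
        # Pair the current journal position with the app state (e.g. images, config).
        self.checkpoints.append((timestamp, self.next_seq, dict(app_state)))

    def recovery_point(self, at_time: float, rejected=()):
        """Latest checkpoint at or before at_time whose timestamp was not rejected."""
        candidates = [c for c in self.checkpoints
                      if c[0] <= at_time and c[0] not in rejected]
        return candidates[-1] if candidates else None

    def rewind(self, checkpoint):
        """Replay writes in sequence order up to the checkpoint; return volumes and state."""
        _, upto, app_state = checkpoint
        volumes: dict[str, bytearray] = {}
        for w in self.writes:              # writes are stored in seq order
            if w.seq >= upto:
                break
            vol = volumes.setdefault(w.volume, bytearray())
            end = w.offset + len(w.data)
            if len(vol) < end:
                vol.extend(bytes(end - len(vol)))  # zero-fill any gap
            vol[w.offset:end] = w.data
        return {name: bytes(v) for name, v in volumes.items()}, dict(app_state)
```

Because every volume is cut at the same sequence number, the restored set never contains a later write on one volume without the earlier writes it depended on from another. Passing a rejected timestamp to `recovery_point` mirrors discarding an unsatisfactory point and picking another.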

In an enterprise setting, applying traditional backup methods to cloud-based or containerized applications will result in a lack of organizational resilience and a disconnect between the teams that lead application design and those responsible for data protection. In practice, this means more potential data loss, downtime, or both. Application consistency gives the modern enterprise sufficient data protection for backup, disaster recovery, and data mobility within modern DevOps practice.

Author: Deepak Verma, VP of Product, Zerto, a Hewlett Packard Enterprise company.

Bio: Deepak Verma is VP of Product at Zerto, a Hewlett Packard Enterprise company. He has 20 years of experience in the IT industry with a focus on disaster recovery and data protection. He has led product management teams responsible for building and delivering products for cloud platforms at multiple companies. In previous roles, he also architected, deployed, and managed data protection and disaster recovery technologies across various industry verticals. Deepak holds a Master of Computer Science in Data Science and a Bachelor of Engineering. He is certified in AWS, Microsoft Azure, and Google Cloud.

To hear more about cloud native topics, join the Cloud Native Computing Foundation and cloud native community at KubeCon+CloudNativeCon North America 2021 – October 11-15, 2021
