How AI Is Pushing Kubernetes Storage Beyond Its Limits

As enterprises rush to deploy AI and data-intensive applications in Kubernetes environments, standard Container Storage Interface (CSI) drivers aren’t enough to meet business requirements in the new operating model.

A decade ago, when Kubernetes first burst onto the scene, the majority of containerized workloads were stateless, saving no context across sessions. A typical Node.js or NGINX application would be reinstantiated based on available metadata, but it wouldn’t read from or write to a persistent store.

The Rise of Stateful Applications in Kubernetes

Such stateless patterns are relatively easy to apply to web applications, and designing your microservices to be as stateless as possible produces highly reliable, manageable systems.

However, as Brendan Burns, Joe Beda, Kelsey Hightower and Lachlan Evenson wrote in “Kubernetes: Up and Running,” “Nearly every system that has any complexity has state in the system somewhere, from the records in a database to the index shards that serve results for a web search engine. At some point, you have to have data stored somewhere.”

Integrating this data with containers and container orchestration solutions is often the most complicated aspect of building a distributed system. The “Kubernetes: Up and Running” authors suggest that this complexity stems from the fact that “the move to containerized architectures is also a move toward decoupled, immutable and declarative application development.”

Around five years ago, at Nutanix, we started to see an uptick in the number of stateful applications using containerized databases and streaming platforms, such as Cassandra, Redis, PostgreSQL, MySQL and Kafka. With that shift already underway, the recent rapid adoption of AI in enterprises has significantly accelerated the process.

The acceleration is unsurprising.
As Phil Winder, CEO and founder of Winder.AI, noted in “Reinforcement Learning,” AI is “a child of data science, which is an overarching scientific field that investigates data generated by phenomena.” In other words, your organization’s data is fundamental to the success of any initiative you might pursue using AI.

Data matters for AI, but it is also foundational to nearly every application, for things like personalized recommendations that improve user experience, user behavior analytics, security, observability (e.g., logs and metrics), Internet of Things (IoT) and edge.

A corollary comes from Gartner analyst Julia Palmer, who predicts that, “By 2027, 80% of Kubernetes deployments will require advanced features for persistent containers storage, compared to 30% in early 2023.”

Understanding CSI, the Foundation of Kubernetes Storage

The Kubernetes CSI is the standard mechanism for dealing with persistence in Kubernetes. This layer consists of a set of APIs that Kubernetes uses to provision, attach and mount volumes from the underlying storage system, which applications then read from and write to.

Since CSI is a standard, every storage vendor has its own implementation (Nutanix CSI, Dell CSI, Red Hat OpenShift CSI, Portworx CSI and so on), and every CSI driver has vendor-specific attributes offered via the built-in CSI extension mechanism.

Nutanix CSI provisions Nutanix Unified Storage (NUS) to containerized stateful applications. NUS is a software-defined data services platform that consolidates file, object and block storage into a single, high-performance, dense and cost-optimized platform, packaged according to a customer’s needs.

Nutanix CSI for stateful applications using Nutanix Unified Storage. (Source: Nutanix)

Limitations of CSI for Enterprise Workloads

CSI is fine for providing persistent storage to a single cluster, but beyond that, it has some limitations. Chiefly, it doesn’t provide a mechanism for data protection or business continuity and disaster recovery (BCDR).
This gap is particularly important in heavily regulated industries such as financial services and healthcare. The need for BCDR in regulated industries is not new, but it is becoming more pertinent alongside the increasing number of applications running within Kubernetes clusters.

Regulations also dictate where data must reside. In regions like EMEA, policies may mandate that all data copies remain within national boundaries, adding a layer of geo-specific compliance to an already complex technical challenge.

For any application, persistent data needs to reside as close as possible to where the application is running, necessitating data replication for BCDR and related use cases such as workload rebalancing and high availability. This is particularly important in heterogeneous deployment models, for example, cloud bursting from on premises to a public cloud to handle transient spikes in demand, such as Black Friday, university admissions deadlines, online ticket sales or media streaming surges. Cloud bursting requires rapid, consistent replication of the application environment and associated data to and from the cloud.

Synchronous vs. Asynchronous Data Replication

Data replication can be either synchronous or asynchronous, depending upon how write operations are managed:

Synchronous data replication means that each write is copied to the main server and to all replica servers simultaneously.

Asynchronous data replication means that data is first written to the main server and then copied to replica servers according to a preconfigured protection policy, which dictates the frequency of data replication and the duration of data retention.

Although synchronous replication ensures no data is lost, asynchronous replication requires substantially less bandwidth and is less expensive.

Filling the Gaps

Nutanix Data Services for Kubernetes (NDK) can fill in the gaps left by CSI, letting you manage, control and operate the disparate worlds of virtual machines (VMs) and containerized apps as a single entity, from one unified platform.

NDK uses familiar Kubernetes mechanisms to help reduce the learning curve. It is shipped as a Helm chart, and users interact with it from the command line using kubectl. The data services are Kubernetes distribution-agnostic. While we would prefer that customers use our Kubernetes distro, the data services will work with alternatives, such as Red Hat OpenShift or Amazon EKS Anywhere. NDK supports both synchronous and asynchronous data replication.

In NDK, asynchronous replication can be performed at a maximum frequency of once per hour. The policy is set at an application level rather than a cluster level, so different applications within a single cluster can run different data replication strategies.

Asynchronous replication is used in BCDR. In a typical example, you might have two data centers in different countries (say, a primary in Spain and a backup in Germany) so that you can switch from one to the other in the event of a major disaster.

Alongside BCDR, Nutanix also supports high availability using synchronous replication.

High availability using synchronous replication.
(Source: Nutanix)

Synchronous replication guarantees zero data loss in the event of a failure, but it requires both data centers to be located in close physical proximity. This means it is unsuitable for protection against natural disasters such as earthquakes or hurricanes, but depending on your business, it can be a valuable approach.

One of our customers, for example, runs cruise ships and has two separate data rooms on each ship. The rooms are in separate locations but physically close to each other, connected via a high-bandwidth network with a latency of under 10 ms. The benefit is that if one data room fails, perhaps due to a power outage or flooding, the ship can switch to the other one and continue to operate.

Beyond CSI: Why AI Demands More From Kubernetes Storage

The convergence of VMs and containers into a unified platform is a practical necessity for enterprises navigating the complexities of distributed, data-intensive applications. As stateful applications continue to proliferate in Kubernetes environments, a trend accelerated by AI adoption, the need for enterprise-grade data services becomes critical.

While CSI provides the foundation for persistent storage, solutions like NDK are essential for organizations that require the data protection, compliance and operational flexibility demanded by enterprise containerized workloads. NDK is offered as part of the Nutanix Kubernetes Platform (NKP), a full-stack solution that combines infrastructure, Kubernetes orchestration, storage, data services and application life cycle management in a single platform.

The post How AI Is Pushing Kubernetes Storage Beyond Its Limits appeared first on The New Stack.
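
As a supplementary illustration of the synchronous versus asynchronous trade-off described in the replication section, the following Python sketch models the difference in a toy setting. It is not NDK code and does not use any Nutanix or Kubernetes API: the names Site, write_sync, write_async, replicate and lost_on_failover are hypothetical, and the "replicate after every third write" schedule is an assumption standing in for a real protection policy.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical storage site holding an ordered log of writes."""
    name: str
    log: list = field(default_factory=list)


def write_sync(primary: Site, replica: Site, record: str) -> None:
    # Synchronous: the write lands on both sites before it is acknowledged.
    primary.log.append(record)
    replica.log.append(record)


def write_async(primary: Site, record: str) -> None:
    # Asynchronous: the write is acknowledged once the primary holds it.
    primary.log.append(record)


def replicate(primary: Site, replica: Site) -> None:
    # Scheduled job: ship every write the replica has not yet received.
    replica.log.extend(primary.log[len(replica.log):])


def lost_on_failover(primary: Site, replica: Site) -> list:
    # If the primary fails now, writes that exist only there are lost.
    return primary.log[len(replica.log):]


def demo():
    # Synchronous pair: every write reaches the replica immediately.
    p_sync, r_sync = Site("primary"), Site("replica")
    for i in range(5):
        write_sync(p_sync, r_sync, f"w{i}")

    # Asynchronous pair: replicate only after every third write
    # (an assumed schedule, standing in for a protection policy).
    p_async, r_async = Site("primary"), Site("replica")
    for i in range(5):
        write_async(p_async, f"w{i}")
        if (i + 1) % 3 == 0:
            replicate(p_async, r_async)

    return lost_on_failover(p_sync, r_sync), lost_on_failover(p_async, r_async)


sync_lost, async_lost = demo()
print(sync_lost, async_lost)  # prints: [] ['w3', 'w4']
```

In this model the synchronous pair loses nothing on failover, while the asynchronous pair loses the writes made since the last scheduled replication. With a real policy, that window is bounded by the replication interval, which is why the interval effectively sets the worst-case data loss.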