How To Get Bare-Metal GPU Performance in Confidential VMs

PARIS - At OpenInfra Summit Europe 2025, NVIDIA wanted to make it very clear to AI developers, operators and users: If you want to run sensitive AI workloads on GPUs anywhere, whether on premises, in public clouds or at the edge, you need both virtual machine (VM)-level sandboxing and hardware-backed memory confidentiality. That means, said Zvonko Kaiser, NVIDIA principal systems engineer, you should combine Kata Containers (lightweight VMs for containers) with Confidential Computing to preserve bare-metal GPU performance while preventing the cloud operator from inspecting your model and data.

Kata, for those of you who don’t know, is an open source project that combines lightweight VMs with container runtimes. It uses hardware virtualization technology to launch a separate VM for each container, providing strong isolation between containers. Each of those VMs, in turn, runs a minimal, stripped-down Linux kernel. Kata Containers aims to offer the performance benefits of containers along with the security and workload isolation of VMs.

Understanding Kata Containers and Lightweight VMs

“Kata is the micro-VM … it just fits into the cloud native space,” Kaiser told the audience. He argued that Kata provides the isolation that container runtimes lack while still integrating with Kubernetes workflows.

What Confidential Computing brings to the table is in-memory data and application encryption. We’ve long had security by encryption when data is at rest or in transit on the network. Now, we have it in memory as well.

The point of combining them, Kaiser explained, is a flip of the traditional threat model. Classic Kata usage assumes the workload is untrusted, so it protects the host from the container.
Confidential Computing, using CPU security features such as AMD SEV and Intel TDX, holds that: “We do not trust the infrastructure.” Thus, once the VM’s memory is encrypted, even your cloud provider cannot snapshot or inspect guest memory.

The Role of Confidential Computing and Attestation

To make sure this actually works, Kaiser emphasized attestation as the mechanism that glues the stack together. Secrets or keys should be released to a workload only after a cryptographic proof that the VM and its boot/guest state match an expected configuration. This enables a full-stack trust model across the control plane, worker nodes and pods. “The process of proving that your state … is really the state that you are measuring” is core to confidential deployments, said Kaiser.

Where AI and NVIDIA come in is in using these technologies to let you use GPUs inside confidential VMs as you would on bare metal. Kaiser explained how NVIDIA is working to make GPU workloads “lift-and-shift” into Kata/confidential VMs without losing performance or functionality.

Achieving Bare-Metal GPU Performance for AI Workloads

To do this, NVIDIA leverages Kubernetes building blocks, namely the GPU Operator and the Container Device Interface (CDI), so that drivers, libraries and device mappings are presented to containers exactly as they would be on bare metal. “We just took this pattern that we have already on bare metal and just put it into the end so that the container that’s running in Kata will feel and behave the very same as running on bare metal.”

That effort includes support for PCIe pass-through, Single Root IO Virtualization (SR-IOV), GPUDirect Remote Direct Memory Access (RDMA) and per-pod runtime configurations, so one pod can use physical function (PF) pass-through while another uses SR-IOV. Crucially, Kata’s reliance on the guest kernel decouples user space from host kernel changes.
This reduces the risk that a host update will break GPU drivers inside the workload VM.

Solving PCIe Topology Challenges With NVIDIA’s VRA

That may sound complex, but, according to Kaiser, the real hard part is the topology. NVIDIA’s answer is its Virtualization Reference Architecture (VRA). NVIDIA will soon publish more detail on this approach to the thorny problem of PCIe topology and peer-to-peer GPU communication inside VMs. It supports two approaches:

Flatten the hierarchy: In this approach, you simplify the topology to make provisioning easier. Cloud providers are already sometimes using this for confidential AI deployments, but it comes at the cost of hiding useful peer-to-peer links.

Host-topology replication: Detect the host’s PCIe/input–output memory management unit (IOMMU) layout and mirror it inside the guest, preserving PCIe Address Translation Services (ATS) and PCIe Access Control Services (ACS) flags, which enables GPU peer-to-peer DMA and GPUDirect behavior.

Why two? “You can either flatten the hierarchy because you say you don’t care about the hierarchy … or you can say ‘I want host replication because I’m doing P2P objects.’ So both modes are supported,” Kaiser explained.

NVIDIA also explained practical workarounds for IOMMU grouping and PCIe slot limits. For example, you can selectively map only the required GPU devices to guest root ports while leaving unrelated peripherals on bridge ports. This avoids unnecessary device pass-through and complexity.

Kaiser said NVIDIA is collaborating with Red Hat, IBM and the open source Kata community to upstream the VRA and tooling, including host-topology detection and performance guides. Other upcoming publications, he said, will cover CPU pinning, ACS/ATS settings and GPUDirect/RDMA tuning for confidential VMs, and will emphasize avoiding nested virtualization so operators can run VM as a Service patterns at L1 with consistent attestation across layers.
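
To make the IOMMU grouping constraint mentioned above a little more concrete, here is a minimal Python sketch of how an operator might check which devices share an IOMMU group with a GPU before planning pass-through. It is not NVIDIA’s VRA tooling, which had not been published at the time of the talk. It assumes a Linux host that exposes groups under /sys/kernel/iommu_groups; the function names list_iommu_groups and group_mates are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_id: [pci_address, ...]} from a sysfs-style directory tree.

    Assumes the usual Linux layout <root>/<group_id>/devices/<pci_address>.
    Returns an empty dict if the directory is missing (for example, when the
    IOMMU is disabled or the host is not running Linux).
    """
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            # Each entry name is a PCI address such as 0000:41:00.0.
            groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_mates(groups, pci_address):
    """Return the other devices in the same IOMMU group as pci_address.

    Returns None if the address does not appear in any group.
    """
    for members in groups.values():
        if pci_address in members:
            return [m for m in members if m != pci_address]
    return None
```

On a host, calling group_mates(list_iommu_groups(), "0000:41:00.0") with a real GPU address in place of that placeholder would list the devices that travel with the GPU. An empty list suggests the GPU can be mapped on its own, while a non-empty list points to peripherals that would have to be handled alongside it.
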
In short, “We want to upstream everything so that people can replicate it as a reference architecture,” said Kaiser.

Open Source Collaboration and Upstreaming Efforts

All that sounds great, but Kaiser was careful to note trade-offs. Combining Kata with Confidential Computing is not a silver bullet. VM breakouts remain a theoretical risk; confidential VMs reduce a provider’s ability to inspect memory but do not eliminate all attack surfaces. Still, the combined approach substantially reduces the opportunity for cloud operators or co-tenants to access sensitive model artifacts or training data.

Once published and available, NVIDIA’s approach to running sensitive AI workloads at scale will almost certainly lead to a new AI stack. That stack combines lightweight VM isolation (Kata), hardware memory encryption and attestation (Confidential Computing) and GPU device mapping abstractions (CDI + GPU Operator) with careful handling of PCIe topology and IOMMU constraints to preserve security and performance.

The post How To Get Bare-Metal GPU Performance in Confidential VMs appeared first on The New Stack.