
Tech • IA • Crypto
Nvidia has introduced Vera, a data center CPU designed to eliminate bottlenecks in agentic AI systems and improve GPU-driven performance.
The rise of agentic AI is redefining system architecture, positioning the CPU as a coordinator rather than the primary compute engine. GPUs now handle most heavy workloads, while CPUs orchestrate data flow, task execution, and system responsiveness. This shift makes CPU efficiency critical to overall system performance.
Traditional CPUs, optimized for virtualization and core density, struggle to keep up with GPU demands. This mismatch can limit token throughput, increase latency, and degrade user experience in AI applications, particularly those involving real-time inference and multi-step agent workflows.
Nvidia Vera is purpose-built for agentic AI loops, combining a custom CPU architecture with a scalable coherency fabric. It is designed to balance compute performance and bandwidth, ensuring GPUs remain fully utilized in AI “factory” environments.
At the core of Vera is the Nvidia Olympus CPU core, optimized for modern data center tasks such as Python-based runtimes, tool orchestration, and sandboxed execution. Features include a neural branch predictor capable of evaluating two branches per cycle, a 10-wide decode engine, and a large out-of-order execution system to sustain throughput.
Vera is the first CPU to adopt LPDDR5X memory in this context while maintaining strong error correction without sacrificing bandwidth. It delivers up to 40% lower peak memory latency than x86 systems, improving performance in data-intensive tasks like retrieval and analytics.
A second-generation scalable coherency fabric links all 88 cores in a unified mesh, avoiding chiplet fragmentation and enabling 50% faster core-to-core communication. Separate dies for memory and I/O further optimize data flow across the system.
NVLink chip-to-chip connectivity allows GPUs to directly access the CPU’s coherent fabric, improving coordination between compute units. The same technology enables multi-socket scaling with high-bandwidth CPU-to-CPU communication.
Vera delivers up to 1.8× higher performance in agentic sandbox workloads compared to traditional x86 CPUs. It is designed to handle orchestration tasks such as tool execution, data pipelines, and contextual processing alongside GPUs.
Combined with Rubin GPUs and BlueField-4 STX for networking and storage, Vera forms part of an integrated AI infrastructure stack. This approach targets end-to-end optimization across compute, memory, and data movement.
Nvidia’s Vera signals a shift toward CPUs optimized for orchestration in GPU-centric AI systems, aiming to remove bottlenecks and improve performance in emerging agent-driven workloads.
Agentic AI changes the role of the CPU. The CPU is now the conductor and the GPU is the orchestra. Traditional CPUs [music] were built for a different era, maximizing cores per socket. Slice them up, virtualize, [music] rent by the hour. In the age of agents, the CPU is now a bottleneck to GPU utilization, directly affecting token throughput, latency, and user experience. [music] Nvidia Vera is the CPU built for the agentic loop, combining Nvidia's custom [music] data center CPU core with a scalable coherency fabric for the right balance of performance cores [music] and bandwidth to maximize AI factory output. At the heart of Vera is the Nvidia Olympus [music] core, built for modern data center workloads, branch-heavy Python runtimes, tool calls, and sandboxed code execution. Each core is tuned for throughput. A neural branch predictor evaluating two taken branches per cycle. A 10-wide decode engine brings in more work each cycle. A large out-of-order engine keeps instructions moving. Advanced prefetchers with a novel graph engine anticipating the next data path. But fast cores only matter when data [music] arrives correctly and on time. Vera is the first CPU to use LPDDR5X memory while correcting multiple errors simultaneously without compromising bandwidth. Vera achieves 40% lower peak memory latency versus X86, keeping cores fed on time through retrieval, analytics, and sandbox execution. Nvidia's second-generation scalable coherency [music] fabric unifies all 88 Olympus cores on a monolithic mesh with separate dies for memory and IO. Cores are not split across chiplets, enabling 50% faster core-to-core communication than traditional [music] CPUs. And memory coherent NVLink chip-to-chip connects GPUs directly to the fabric. Beyond GPUs, NVLink chip-to-chip can scale Vera up to multiple sockets enabling massive bandwidth between CPUs. Vera delivers 1.8 [music] times the agentic sandbox performance of x86 CPUs. Standalone Vera racks run agent sandboxes, tools, code, and data pipelines. Tightly coupled to Rubin GPUs, Vera keeps accelerated workflows moving. NVIDIA Vera BlueField [music] 4 STX powers context memory and AI storage. Compute, networking, storage, Vera is the CPU for the age of agents.