F5 and NVIDIA are partnering to deliver a next-generation platform that makes AI infrastructure faster, more secure, and more cost‑efficient.
The collaboration combines F5 BIG‑IP Next for Kubernetes with NVIDIA BlueField‑3 DPUs to create an intelligent, telemetry‑aware infrastructure layer that optimises how AI workloads are managed and scaled across enterprises and GPU‑as‑a‑Service providers.
At the centre of this collaboration is a focus on improving the economics of AI inference — the process of turning trained models into real‑time outputs, or tokens. Tokens have become the key measure of AI system performance and a fundamental indicator of infrastructure efficiency and revenue potential.
As enterprises shift from experimentation to monetisation, metrics such as throughput, time to first token, cost per token, and revenue per GPU are emerging as the primary indicators of business success.
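To make those metrics concrete, the sketch below computes each of them for a single serving window. It is purely illustrative: the field names, prices, and figures are invented, and only the metric definitions (tokens per second, time to first token, cost per token, revenue per GPU, here normalised per GPU-hour) come from the article.

```python
# Hypothetical illustration of AI inference unit economics.
# All numbers and field names are invented for this sketch.

from dataclasses import dataclass


@dataclass
class InferenceWindow:
    tokens_generated: int               # output tokens produced in the window
    window_seconds: float               # wall-clock length of the window
    first_token_latencies: list[float]  # seconds from request arrival to first token
    gpu_count: int                      # GPUs serving the window
    gpu_hour_cost: float                # $ per GPU-hour (assumed)
    price_per_1k_tokens: float          # $ charged per 1,000 tokens (assumed)


def economics(w: InferenceWindow) -> dict[str, float]:
    """Derive the business metrics named in the article from raw serving data."""
    gpu_hours = w.gpu_count * (w.window_seconds / 3600)
    cost = gpu_hours * w.gpu_hour_cost
    revenue = (w.tokens_generated / 1000) * w.price_per_1k_tokens
    return {
        "throughput_tokens_per_s": w.tokens_generated / w.window_seconds,
        "mean_time_to_first_token_s": sum(w.first_token_latencies) / len(w.first_token_latencies),
        "cost_per_token_usd": cost / w.tokens_generated,
        "revenue_per_gpu_hour_usd": revenue / gpu_hours,
    }


if __name__ == "__main__":
    window = InferenceWindow(
        tokens_generated=1_200_000,
        window_seconds=3600,
        first_token_latencies=[0.42, 0.38, 0.55, 0.40],
        gpu_count=8,
        gpu_hour_cost=2.50,
        price_per_1k_tokens=0.01,
    )
    for name, value in economics(window).items():
        print(f"{name}: {value:.6f}")
```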
F5’s enhanced platform capitalises on NVIDIA telemetry and runtime data to make smarter, inference‑aware routing decisions before workloads are executed.
By automatically matching each request to the most suitable accelerator in real time, the system reduces latency, eliminates redundant processing, and improves sustained GPU utilisation.
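F5 has not published the routing algorithm itself, so the following is a minimal, hypothetical sketch of what inference‑aware routing can look like: each accelerator reports live telemetry (queue depth, sustained utilisation, free KV‑cache), and the router scores candidates and sends the request to the one expected to deliver the lowest time to first token. Every field name and weight here is an assumption made for illustration.

```python
# Hypothetical telemetry-aware request router. The scoring rule and
# weights are illustrative assumptions, not F5's or NVIDIA's implementation.

from dataclasses import dataclass


@dataclass
class AcceleratorTelemetry:
    name: str
    queue_depth: int      # requests already waiting on this accelerator
    utilisation: float    # 0.0-1.0 sustained GPU utilisation
    kv_cache_free: float  # 0.0-1.0 fraction of KV cache still available
    has_model: bool       # are the model weights already resident?


def route(fleet: list[AcceleratorTelemetry]) -> AcceleratorTelemetry:
    """Pick the accelerator with the lowest expected time to first token."""
    def score(a: AcceleratorTelemetry) -> float:
        cold_start_penalty = 0.0 if a.has_model else 10.0  # loading weights is slow
        return a.queue_depth + 5.0 * a.utilisation - 2.0 * a.kv_cache_free + cold_start_penalty

    # Skip accelerators whose KV cache is effectively exhausted.
    eligible = [a for a in fleet if a.kv_cache_free > 0.05]
    return min(eligible or fleet, key=score)


fleet = [
    AcceleratorTelemetry("gpu-a", queue_depth=3, utilisation=0.9, kv_cache_free=0.1, has_model=True),
    AcceleratorTelemetry("gpu-b", queue_depth=1, utilisation=0.4, kv_cache_free=0.6, has_model=True),
    AcceleratorTelemetry("gpu-c", queue_depth=0, utilisation=0.1, kv_cache_free=0.9, has_model=False),
]
print(route(fleet).name)  # -> "gpu-b": idle-ish, warm weights, ample cache
```

Note the trade-off the example encodes: the idlest accelerator ("gpu-c") loses because its model weights are cold, which is exactly the kind of decision that telemetry makes possible before a workload is executed.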
“AI infrastructure is no longer just about access to GPUs or scaling their deployments. It has evolved into maximising economic output per accelerator,” said Kunal Anand, Chief Product Officer of F5.
Testing by The Tolly Group showed a 40 percent increase in token throughput, a 61 percent improvement in time to first token, and a 34 percent reduction in overall request latency. These gains are made possible by offloading networking, encryption and traffic management tasks to NVIDIA BlueField‑3 DPUs, freeing GPUs to focus exclusively on high‑throughput inference.
As modern AI workloads become more persistent, context‑aware and agent‑driven, F5’s platform introduces a set of new capabilities designed to meet these requirements: inference‑aware routing for agentic workflows, integration with NVIDIA’s DOCA framework for simplified DPU lifecycle management, secure multi‑tenant segmentation through EVPN‑VXLAN with dynamic VRFs, and built‑in security, governance, and observability for Kubernetes‑based AI environments. These innovations allow enterprises and cloud providers to share GPU resources safely across teams or customers while maintaining performance isolation and predictable service levels.
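The segmentation model can be illustrated in miniature: in an EVPN‑VXLAN design, each tenant is assigned its own VXLAN network identifier (VNI) and its own VRF, so one tenant’s traffic is routed in a table the others never see. The allocator below is a hypothetical sketch of that tenant‑to‑segment mapping, not F5’s actual control plane; the identifier ranges and naming scheme are assumptions.

```python
# Hypothetical sketch of per-tenant EVPN-VXLAN segmentation: each tenant
# gets a dedicated VNI (overlay identifier) and VRF (isolated routing table).

from dataclasses import dataclass


@dataclass(frozen=True)
class TenantSegment:
    tenant: str
    vni: int   # VXLAN Network Identifier carried in the overlay header
    vrf: str   # per-tenant routing table on the DPU or switch


class SegmentAllocator:
    BASE_VNI = 10_000  # assumed starting point for tenant VNIs

    def __init__(self) -> None:
        self._segments: dict[str, TenantSegment] = {}

    def segment_for(self, tenant: str) -> TenantSegment:
        """Return the tenant's segment, allocating a fresh VNI/VRF on first use."""
        if tenant not in self._segments:
            vni = self.BASE_VNI + len(self._segments)
            self._segments[tenant] = TenantSegment(tenant, vni, f"vrf-{tenant}")
        return self._segments[tenant]


alloc = SegmentAllocator()
print(alloc.segment_for("team-research"))  # VNI 10000, vrf-team-research
print(alloc.segment_for("customer-acme"))  # VNI 10001, vrf-customer-acme
```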
“NVIDIA’s accelerated computing infrastructure coupled with F5’s AI-aware application delivery and security platform unlocks superior AI factory tokenomics — delivering scalable and cost-effective inference without making any changes to the models,” said Kevin Deierling, SVP of Networking at NVIDIA.
