F5 and NVIDIA Advance AI Factory Economics with New Inference Capabilities
F5 has announced expanded capabilities in collaboration with NVIDIA to enhance AI inference performance and economics, marking a significant step forward in the evolution of AI factory infrastructure.
The joint solution integrates F5 BIG-IP Next for Kubernetes with NVIDIA BlueField-3 DPUs, enabling enterprises to increase token throughput, reduce latency, and lower cost per token while supporting secure, multi-tenant AI platforms.
Driving efficiency through AI “tokenomics”
At the core of this advancement is a focus on AI token economics, where tokens (the units of AI-generated output, such as words or data fragments) serve as the key measure of performance, cost, and revenue generation.
As organizations shift from experimentation to monetizing AI services, success is increasingly defined by:
- Token throughput
- Time to first token (TTFT)
- Cost per token
- Revenue per GPU
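These metrics are linked by simple arithmetic. The sketch below works through the relationship; every figure in it (GPU-hour price, baseline throughput, token pricing) is an illustrative assumption, not an F5 or NVIDIA benchmark. Only the 40% throughput gain is taken from the testing cited later in the article.

```python
# Hypothetical tokenomics calculator. All dollar figures and throughput
# numbers are assumptions for illustration, not vendor data.

def cost_per_token(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Infrastructure cost attributed to each generated token."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour

def revenue_per_gpu_hour(tokens_per_second: float,
                         price_per_million_tokens: float) -> float:
    """Gross revenue a single GPU can earn per hour at full utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * price_per_million_tokens

baseline_tps = 1000.0               # assumed tokens/second per GPU
improved_tps = baseline_tps * 1.40  # the reported "up to 40%" throughput gain

gpu_hour = 3.00  # assumed $/GPU-hour
price = 2.00     # assumed $ per million tokens served

print(f"cost/token:  {cost_per_token(gpu_hour, baseline_tps):.2e} -> "
      f"{cost_per_token(gpu_hour, improved_tps):.2e}")
print(f"revenue/GPU-hour: ${revenue_per_gpu_hour(baseline_tps, price):.2f} -> "
      f"${revenue_per_gpu_hour(improved_tps, price):.2f}")
```

The point of the exercise: at fixed GPU pricing, cost per token falls and revenue per GPU rises in direct proportion to token throughput, which is why throughput is the headline metric.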
F5's Kunal Anand emphasized that AI infrastructure is now about maximizing economic output per accelerator, not just scaling GPU deployments.
Intelligent infrastructure boosts performance
The enhanced system uses real-time telemetry and AI-aware routing to optimize workload distribution across GPUs. By leveraging NVIDIA technologies, including NIM and Dynamo runtime signals, the platform ensures efficient resource utilization and faster processing.
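A telemetry-driven scheduler of this kind can be sketched in a few lines. The backend fields and the least-delay policy below are illustrative assumptions for the sake of the sketch; they are not the actual BIG-IP Next, NIM, or Dynamo interfaces:

```python
# Minimal sketch of inference-aware routing: send each request to the GPU
# backend with the lowest estimated queue delay, based on (assumed)
# runtime telemetry fields.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    queued_tokens: int        # tokens waiting in this backend's batch queue
    tokens_per_second: float  # current decode throughput

    @property
    def est_delay(self) -> float:
        """Rough time-to-first-token estimate for a newly routed request."""
        return self.queued_tokens / self.tokens_per_second

def route(backends: list[Backend]) -> Backend:
    """Pick the backend with the lowest estimated delay."""
    return min(backends, key=lambda b: b.est_delay)

pool = [
    Backend("gpu-0", queued_tokens=4000, tokens_per_second=1000.0),
    Backend("gpu-1", queued_tokens=1500, tokens_per_second=900.0),
]
print(route(pool).name)  # gpu-1: ~1.67 s estimated delay vs 4.0 s on gpu-0
```

Routing on live queue depth rather than round-robin is what lets such a layer improve time to first token without touching the models themselves.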
According to validated testing by The Tolly Group, the solution delivers:
- Up to 40% increase in token throughput
- 61% faster time to first token
- 34% reduction in latency
These gains are achieved without requiring any changes to existing AI models, making deployment immediate and scalable.
Enabling secure, multi-tenant AI platforms
The collaboration also addresses the growing demand for agent-driven AI workflows, which require more advanced infrastructure than traditional systems.
New capabilities include:
- Inference-aware routing for dynamic AI workloads
- Integration with NVIDIA DOCA Platform Framework
- Secure multi-tenancy using EVPN-VXLAN and dynamic VRFs
- Built-in security, observability, and token governance
This allows enterprises and cloud providers to share GPU infrastructure securely while maintaining performance and isolation.
A new control plane for AI factories
NVIDIA's Kevin Deierling noted that the collaboration enables scalable, cost-effective AI inference without modifying models.
By combining NVIDIA’s accelerated computing with F5’s application delivery and security platform, the solution acts as a control plane for AI factory economics, helping organizations:
- Optimize infrastructure ROI
- Reduce operational costs
- Increase revenue per GPU
Shaping the future of AI infrastructure
As AI adoption accelerates globally, the F5–NVIDIA collaboration positions enterprises to transform their infrastructure into efficient, monetizable AI factories.
With a focus on performance, scalability, and cost efficiency, the new capabilities are designed to support the next generation of agentic AI systems, enabling sustained growth in an increasingly competitive digital economy.