F5 and NVIDIA Advance AI Factory Economics with New Inference Capabilities
F5 has announced expanded capabilities in collaboration with NVIDIA to enhance AI inference performance and economics, marking a significant step forward in the evolution of AI factory infrastructure.
The joint solution integrates F5 BIG-IP Next for Kubernetes with NVIDIA BlueField-3 DPUs, enabling enterprises to increase token throughput, reduce latency, and lower cost per token while supporting secure, multi-tenant AI platforms.
Driving efficiency through AI “tokenomics”
At the core of this advancement is a focus on AI token economics, where tokens (the units of AI-generated output, such as words or data fragments) serve as the key measure of performance, cost, and revenue generation.
As organizations shift from experimentation to monetizing AI services, success is increasingly defined by:
- Token throughput
- Time to first token (TTFT)
- Cost per token
- Revenue per GPU
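These metrics are linked by simple arithmetic. The sketch below works through the relationship; every figure in it (GPU-hour price, baseline throughput, token pricing) is an illustrative assumption, not an F5 or NVIDIA benchmark. Only the 40% throughput gain is taken from the testing cited later in the article.

```python
# Hypothetical tokenomics calculator. All dollar figures and throughput
# numbers are assumptions for illustration, not vendor data.

def cost_per_token(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Infrastructure cost attributed to each generated token."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour

def revenue_per_gpu_hour(tokens_per_second: float,
                         price_per_million_tokens: float) -> float:
    """Gross revenue a single GPU can earn per hour at full utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * price_per_million_tokens

baseline_tps = 1000.0               # assumed tokens/second per GPU
improved_tps = baseline_tps * 1.40  # the reported "up to 40%" throughput gain

gpu_hour = 3.00  # assumed $/GPU-hour
price = 2.00     # assumed $ per million tokens served

print(f"cost/token:  {cost_per_token(gpu_hour, baseline_tps):.2e} -> "
      f"{cost_per_token(gpu_hour, improved_tps):.2e}")
print(f"revenue/GPU-hour: ${revenue_per_gpu_hour(baseline_tps, price):.2f} -> "
      f"${revenue_per_gpu_hour(improved_tps, price):.2f}")
```

The point of the exercise: at fixed GPU pricing, cost per token falls and revenue per GPU rises in direct proportion to token throughput, which is why throughput is the headline metric.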
F5's Kunal Anand emphasized that AI infrastructure is now about maximizing economic output per accelerator, not just scaling GPU deployments.
Intelligent infrastructure boosts performance
The enhanced system uses real-time telemetry and AI-aware routing to optimize workload distribution across GPUs. By leveraging NVIDIA technologies, including NIM and Dynamo runtime signals, the platform ensures efficient resource utilization and faster processing.
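A telemetry-driven scheduler of this kind can be sketched in a few lines. The backend fields and the least-delay policy below are illustrative assumptions for the sake of the sketch; they are not the actual BIG-IP Next, NIM, or Dynamo interfaces:

```python
# Minimal sketch of inference-aware routing: send each request to the GPU
# backend with the lowest estimated queue delay, based on (assumed)
# runtime telemetry fields.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    queued_tokens: int        # tokens waiting in this backend's batch queue
    tokens_per_second: float  # current decode throughput

    @property
    def est_delay(self) -> float:
        """Rough time-to-first-token estimate for a newly routed request."""
        return self.queued_tokens / self.tokens_per_second

def route(backends: list[Backend]) -> Backend:
    """Pick the backend with the lowest estimated delay."""
    return min(backends, key=lambda b: b.est_delay)

pool = [
    Backend("gpu-0", queued_tokens=4000, tokens_per_second=1000.0),
    Backend("gpu-1", queued_tokens=1500, tokens_per_second=900.0),
]
print(route(pool).name)  # gpu-1: ~1.67 s estimated delay vs 4.0 s on gpu-0
```

Routing on live queue depth rather than round-robin is what lets such a layer improve time to first token without touching the models themselves.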
According to validated testing by The Tolly Group, the solution delivers:
- Up to 40% increase in token throughput
- 61% faster time to first token
- 34% reduction in latency
These gains are achieved without requiring any changes to existing AI models, making deployment immediate and scalable.
Enabling secure, multi-tenant AI platforms
The collaboration also addresses the growing demand for agent-driven AI workflows, which require more advanced infrastructure than traditional systems.
New capabilities include:
- Inference-aware routing for dynamic AI workloads
- Integration with NVIDIA DOCA Platform Framework
- Secure multi-tenancy using EVPN-VXLAN and dynamic VRFs
- Built-in security, observability, and token governance
This allows enterprises and cloud providers to share GPU infrastructure securely while maintaining performance and isolation.
A new control plane for AI factories
NVIDIA's Kevin Deierling noted that the collaboration enables scalable, cost-effective AI inference without modifying models.
By combining NVIDIA’s accelerated computing with F5’s application delivery and security platform, the solution acts as a control plane for AI factory economics, helping organizations:
- Optimize infrastructure ROI
- Reduce operational costs
- Increase revenue per GPU
Shaping the future of AI infrastructure
As AI adoption accelerates globally, the F5–NVIDIA collaboration positions enterprises to transform their infrastructure into efficient, monetizable AI factories.
With a focus on performance, scalability, and cost efficiency, the new capabilities are designed to support the next generation of agentic AI systems, enabling sustained growth in an increasingly competitive digital economy.