DigitalOcean Launches Inference Engine with New Capabilities for Production AI, Including Inference Router for Efficient Scaling of Agentic Workloads
Built alongside early design partners, the Inference Engine gives AI developers unified control over performance, cost, and scale, with customers reporting significant cost and performance gains.
DigitalOcean’s Inference Engine is built around four core capabilities: Inference Router, Batch Inference, Serverless Inference, and Dedicated Inference, giving development teams a single engine to match every workload type to the right performance and cost profile, without stitching together separate providers.
New Capabilities: Built for How AI Actually Runs in Production
Inference Router is designed to solve one of the biggest inefficiencies in agentic AI: sending every request to the most expensive model. With Inference Router, AI builders can define a model pool, attach natural-language descriptions of tasks and priorities to the models in that pool, and optimize each request for cost and latency. Powered by DigitalOcean's purpose-built Mixture of Experts (MoE) router model, Inference Router matches each request to the right model, helping teams improve performance and unit economics without building or managing routing infrastructure themselves. Customers like LawVo are already benefiting from this new capability:
"DigitalOcean's Inference Router gives us the kind of intelligent model selection we would otherwise have had to build ourselves. It routes each request to the right model based on complexity, helping us reduce inference costs by more than
Dedicated Inference delivers predictable performance and exceptional unit economics for teams running high-scale, sustained workloads, with reserved capacity that eliminates the variability of shared infrastructure.
Serverless Inference provides a single API key to access dozens of models, with scale-to-zero elasticity and the industry’s first off-peak pricing, giving teams instant access to leading open-source models without managing infrastructure or paying for idle capacity.
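Serverless inference products are commonly exposed through an OpenAI-compatible API; assuming that convention here purely for illustration (the base URL and model name are placeholders, not confirmed details), switching between hosted models is a one-line change:

```python
# Sketch assuming an OpenAI-compatible serverless endpoint; the base
# URL and model name are placeholders, not confirmed details.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",  # one key covers the whole model catalog
)

# No servers to provision; scale-to-zero means idle time costs nothing,
# and trying a different model is just a different string.
reply = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # placeholder model ID
    messages=[{"role": "user", "content": "Name three uses for batch inference."}],
)
print(reply.choices[0].message.content)
```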
Batch Inference reduces the cost of offline AI workloads by processing large jobs asynchronously at lower per-token rates, as in the sketch below.
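This sketch is modeled on the JSONL file-plus-job pattern common to batch inference APIs; the endpoints and field names are assumptions, not DigitalOcean's documented interface.

```python
# Hypothetical batch submission sketch; endpoints and field names are
# assumptions modeled on common batch-inference APIs.
import json
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://inference.example.com/v1"  # placeholder endpoint
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# One JSONL line per independent request; offline jobs trade latency
# for a lower per-token price.
with open("requests.jsonl", "w") as f:
    for i, doc in enumerate(["first document...", "second document..."]):
        f.write(json.dumps({
            "custom_id": f"summary-{i}",
            "body": {
                "model": "small-open-model",  # placeholder model ID
                "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
            },
        }) + "\n")

# Upload the file, then create the batch job and poll it later.
with open("requests.jsonl", "rb") as f:
    upload = requests.post(f"{BASE_URL}/files", files={"file": f}, headers=HEADERS)
upload.raise_for_status()

job = requests.post(f"{BASE_URL}/batches", headers=HEADERS, json={
    "input_file_id": upload.json()["id"],
    "completion_window": "24h",  # results arrive asynchronously
})
print(job.json()["id"])  # poll this job ID until the output file is ready
```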
“Most teams building agentic systems today make a single model decision and apply it uniformly across their agentic workflows. They default to a frontier model and pay the generalization tax: premium prices and higher latency for work that often does not require the most expensive closed-source model. Inference Router is the essential AI middleware that removes that tax by intelligently matching requests to the right model based on task, context, and developer-defined preferences. The result is a smarter operating model for inference: one that gives developers more control over quality, speed, and cost while helping AI-native builders move faster and build more durable businesses on DigitalOcean.” — Vinay Kumar, CPTO, DigitalOcean
Performance Benchmarks: Independent Validation
The new Inference Engine was built around three core advances: hardware and software integrations, including vLLM, TensorRT, and SGLang to maximize token throughput; request-path and model-level optimizations that improve unit economics without compromising quality; and distributed scaling designed for the bursty, uneven demands of production AI applications.
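For context on that serving layer, the snippet below is standard vLLM batched generation (the model choice is arbitrary); it illustrates the continuous-batching style of serving the release refers to, not DigitalOcean's internal configuration.

```python
# Standard vLLM batched generation; the model choice is arbitrary and
# this illustrates the serving style, not DigitalOcean's internals.
from vllm import LLM, SamplingParams

prompts = [
    "Explain p99 latency in one sentence.",
    "What does scale-to-zero mean for serving costs?",
]
params = SamplingParams(temperature=0.7, max_tokens=64)

# vLLM schedules all prompts onto the GPU together (continuous
# batching), keeping it saturated to maximize token throughput.
llm = LLM(model="facebook/opt-125m")
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```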
According to Artificial Analysis, an independent AI inference benchmarking platform, DigitalOcean leads across key inference performance metrics, including 3x faster time-to-first-answer-token and 3x higher output speed than Amazon Bedrock on DeepSeek V3.2 at 10,000 input tokens. DigitalOcean also delivers stronger output speed and latency consistency than most hyperscaler and neo-cloud providers, and is one of only three providers ranked in the Most Favorable Quadrant of Artificial Analysis's Latency vs. Output Speed chart, with Amazon, SambaNova, Nebius, and six others falling outside it.
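The figures above are Artificial Analysis's measurements; for readers who want to reproduce the general methodology, time-to-first-token and output speed can be measured against any OpenAI-compatible streaming endpoint along these lines (the base URL and model ID are placeholders):

```python
# Measuring time-to-first-token and output speed against any
# OpenAI-compatible streaming endpoint; URL and model are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="https://inference.example.com/v1",
                api_key="YOUR_API_KEY")

start = time.perf_counter()
ttft, chunks = None, 0
stream = client.chat.completions.create(
    model="deepseek-v3.2",  # placeholder model ID
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if ttft is None:
            ttft = time.perf_counter() - start  # first answer token arrives
        chunks += 1
total = time.perf_counter() - start

if ttft is not None:
    print(f"time-to-first-token: {ttft:.2f}s")
    if total > ttft:
        # streamed chunks roughly track tokens, approximating tokens/sec
        print(f"output speed: ~{chunks / (total - ttft):.0f} chunks/s")
```

Percentile metrics such as p99 latency come from repeating this measurement across many requests and taking the 99th percentile.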
Customers Report Significant Cost and Performance Gains
The Inference Engine was co-developed with early design partners running real production workloads, and the gains are already showing up at scale.
Hippocratic AI, which runs safety-critical healthcare agents on the platform, achieved 2x production throughput.
"In healthcare AI, a node going down isn't just an SLA issue, it impacts patient experience. We've pressed DigitalOcean hard on reliability, access to the newest hardware, and the ability to scale efficiently. They've delivered." — Debajyoti Datta, Co-Founder, Hippocratic AI
Workato's Research Lab, which processes over 1 trillion automated workloads, saw meaningful performance and cost improvements:
"Through close collaboration on performance optimization, DigitalOcean helped us accelerate our inference performance and overall progress by two to three times." — Oscar Wu, AI Research Scientist, Technical Lead, Workato
At Deploy in
About DigitalOcean
DigitalOcean is the Agentic Inference Cloud built for AI-native and digital-native enterprises scaling production workloads. The platform combines production-ready GPU infrastructure with a full-stack cloud — all built on open source at every layer — to deliver operational simplicity and predictable economics at scale. More than 640,000 customers trust DigitalOcean to power their cloud and AI infrastructure. Learn more at digitalocean.com.
View source version on businesswire.com: https://www.businesswire.com/news/home/20260428279648/en/
Investor Relations
Radu Patrichi, CFA
investors@digitalocean.com
Media Relations
Meghan Grady
press@digitalocean.com
Source: DigitalOcean