
DigitalOcean’s Inference Cloud Platform, Powered by AMD Instinct GPUs, Delivers 2X Production Inference Performance for Character.ai


Key Terms

inference (technical)
Inference is the process of drawing a conclusion from available evidence or data, like a detective piecing together clues to form a likely story. For investors it matters because these judgments turn raw reports, test results, or market signals into expectations about future performance, risk, or regulatory outcomes—so how someone infers from the same facts can change investment decisions and valuation.
latency (technical)
Latency is the time delay between when information or an instruction is created and when it is received, processed, or acted on by a market system or data feed. For investors, that delay can alter the price you receive, cause missed trading opportunities, or increase execution risk — like sending a text to buy an item and the seller acting a few seconds later after the price has changed.
GPU (technical)
A GPU (graphics processing unit) is a specialized computer chip designed to handle many calculations at once, originally for rendering images and video but now widely used for tasks like artificial intelligence, data analysis and high-performance computing. Investors watch GPU demand and prices because strong sales often signal growth for chip makers and their customers, affect profit margins and capital spending, and can forecast wider trends in gaming, AI adoption and cloud services.
throughput (technical)
Throughput is the amount of stuff, like products or data, that a system can handle or move through in a certain period of time. For example, a factory’s throughput is how many items it produces each hour, and it matters because higher throughput usually means things are running efficiently and meeting demand quickly.
total cost of ownership (financial)
The total cost of ownership is the full lifetime expense of acquiring and keeping an asset or product, not just its sticker price — it includes purchase, installation, operating, maintenance, financing and disposal costs. For investors, it reveals the real economic burden or savings of a purchase so they can judge profitability, cash flow and competitive strength; like comparing cars by adding fuel, insurance and repairs, not just the sale price.

Platform-level inference optimization delivers higher throughput, lower latency, and a 50% improvement in cost efficiency for Character.ai

BROOMFIELD, Colo.--(BUSINESS WIRE)-- DigitalOcean (NYSE: DOCN) today announced that its Inference Cloud Platform is delivering 2X production inference throughput for Character.ai through a tightly integrated software and hardware collaboration with AMD. Character.ai, a leading AI entertainment platform, operates one of the most demanding production inference workloads in the market, handling over a billion queries per day.

Character.ai leverages both proprietary and open-source models to power its high-volume, high-concurrency, latency-sensitive applications. By migrating these workloads to DigitalOcean’s Inference Cloud Platform, Character.ai achieved significantly higher request throughput while adhering to rigorous latency targets. Compared to standard, non-optimized GPU infrastructure, the transition reduced cost per token by 50% and substantially expanded usable capacity for its end users.
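The cost claim follows directly from the throughput gain: on infrastructure with a fixed hourly price, doubling the tokens served per hour halves the cost per token. A minimal sketch of that arithmetic, using hypothetical node prices and token rates (the figures below are illustrative assumptions, not numbers from this announcement):

```python
# Illustrative only: doubling throughput on fixed-cost infrastructure
# halves cost per token. The prices and rates here are hypothetical.

def cost_per_token(hourly_node_cost: float, tokens_per_hour: float) -> float:
    """Cost to serve one token on a node with a fixed hourly price."""
    return hourly_node_cost / tokens_per_hour

# Same node price, 2X throughput after optimization.
baseline = cost_per_token(hourly_node_cost=10.0, tokens_per_hour=1_000_000)
optimized = cost_per_token(hourly_node_cost=10.0, tokens_per_hour=2_000_000)

reduction = 1 - optimized / baseline  # fractional cost-per-token reduction
print(f"cost per token fell by {reduction:.0%}")  # prints "cost per token fell by 50%"
```

The same relationship means any throughput gain extracted purely in software translates one-for-one into unit-cost savings, which is why platform-level tuning shows up directly in total cost of ownership.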

David Brinker, Senior Vice President of Partnerships at Character.ai, said the results exceeded expectations. “We pushed DigitalOcean aggressively on performance, latency, and scale. DigitalOcean delivered reliable performance that unlocked higher sustained throughput and improved economics, which directly supports the growth of our platform.”

This performance milestone builds on DigitalOcean’s growing momentum with large-scale AI customers like Character.ai, supporting platform expansion and richer multimodal experiences.

Platform and Hardware, Working Together

DigitalOcean worked closely with Character.ai and AMD to deploy AMD Instinct™ GPUs optimized specifically for inference workloads. Rather than treating GPUs as interchangeable infrastructure, DigitalOcean’s platform integrates hardware-aware scheduling and optimized inference runtimes to extract higher sustained performance per node. AMD has invested heavily in ROCm™, its open end-to-end AI software stack. Through this deep collaboration, the teams optimized ROCm with vLLM, AITER (AMD’s inference-focused runtime and optimization framework for transformer workloads), and deployment configurations for Character.ai’s workloads on DigitalOcean’s AMD Instinct™ MI300X and MI325X GPUs, contributing to the throughput improvement.

"This work demonstrates what’s possible when platform and silicon teams partner deeply to solve real production challenges for customers, as our collaboration with DigitalOcean helped Character.ai unlock higher sustained inference throughput and improved efficiency,” said Vamsi Boppana, Senior Vice President of Artificial Intelligence at AMD. "By combining AMD Instinct™ GPUs, the open ROCm™ software stack and platform-level optimization, DigitalOcean’s Inference Cloud is delivering a scalable, cost-effective foundation for running large-scale, latency-sensitive AI workloads in production. Together, we are accelerating the builders who are defining the next generation of AI applications.”

In collaboration with Character.ai, DigitalOcean engineers tuned distributed inference configurations to balance latency, throughput, and concurrency. In some production scenarios, these optimizations increased throughput by 2X under the same latency constraints, directly improving total cost of ownership.
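The balance among those three quantities is governed by Little's law: the number of requests a system must hold in flight (concurrency) equals request throughput times average latency. A small sketch with hypothetical request rates and a hypothetical latency target (none of these numbers come from the release):

```python
# Little's law for a serving system:
#   concurrency (in-flight requests) = throughput (req/s) * average latency (s)
# All numbers below are hypothetical, for illustration only.

def required_concurrency(throughput_rps: float, avg_latency_s: float) -> float:
    """In-flight requests needed to sustain a throughput at a latency."""
    return throughput_rps * avg_latency_s

# Holding the latency target fixed, doubling throughput doubles the
# concurrency the platform must schedule across its GPUs.
before = required_concurrency(throughput_rps=500.0, avg_latency_s=0.4)
after = required_concurrency(throughput_rps=1000.0, avg_latency_s=0.4)
print(before, after)  # prints "200.0 400.0"
```

This is why "2X throughput at the same latency" is a scheduling achievement, not just a hardware one: the platform has to keep twice as many requests in flight without letting queueing push latency past the target.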

Operating large-scale AI inference under real production constraints

This approach reflects DigitalOcean’s broader strategy: GPUs matter, but outcomes matter more. DigitalOcean designs, operates, and optimizes systems that deliver significantly more reliable performance for its customers.

Unlike traditional cloud approaches that emphasize GPU availability alone, DigitalOcean’s Inference Cloud is designed to operate AI applications in production. The platform unifies hardware and software so that orchestration and system-level tuning work together to deliver cost efficiency, observability, and operational simplicity across production AI workloads at scale.

“Character.ai runs one of the most demanding real-time inference workloads in the market,” said Paddy Srinivasan, Chief Executive Officer of DigitalOcean. “This work shows what happens when advanced hardware meets a platform designed specifically for production inference. We’re not just delivering faster models, we’re making large-scale AI applications easier and more economical to run.”

The Character.ai deployment reflects a broader shift in how AI infrastructure is built and evaluated. As inference workloads scale, customers are prioritizing predictable performance, operational simplicity, and cost efficiency over raw hardware specifications. For additional information on the specific testing methodologies, hardware configurations, and performance benchmarks used to achieve these results, as well as important information regarding performance variability, please see our technical deep-dive here.

Learn more about the DigitalOcean Inference Cloud.

About DigitalOcean

DigitalOcean is an inference cloud platform that helps AI and Digital Native Businesses build, run, and scale intelligent applications with speed, simplicity, and predictable economics. The platform combines production-ready GPU infrastructure, a full-stack cloud, model-first inference workflows, and an agentic experience layer to reduce operational complexity and accelerate time to production. More than 640,000 customers trust DigitalOcean to deliver the cloud and AI infrastructure they need to build and grow. To learn more, visit www.digitalocean.com.

Media Relations

Julie Wolf: press@digitalocean.com

Investor Relations

Melanie Strate: investors@digitalocean.com

Source: DigitalOcean
