STOCK TITAN

CoreWeave Achieves #1 Ranking for Inference Speed and Price-Performance for Moonshot AI’s Kimi K2.6 Model in Independent Benchmark


Key Terms

serverless inference technical
Serverless inference is a way to run artificial intelligence models on demand without a company owning or managing the underlying servers; the cloud provider automatically supplies computing power when a request comes in and bills only for actual usage. For investors, it matters because it lets businesses add AI features quickly, scale up or down with customer demand, and convert large upfront infrastructure costs into smaller, predictable operating expenses — which can improve margins and speed product rollout.
dedicated inference technical
Dedicated inference is the use of reserved computing resources or specialized hardware specifically for running AI models that make predictions or analyze data, separate from the machines used to build and train those models. For investors it matters because dedicated inference can speed up responses, improve reliability and security, and make costs more predictable—like owning a private delivery van instead of sharing a crowded service—which can affect a company’s product performance, operating expenses and competitive position.
kubernetes technical
Kubernetes is an open-source system that automates running and managing many pieces of software across groups of computers, like a conductor coordinating musicians so each piece plays at the right time and place. For investors, it matters because companies that use it can deploy updates faster, scale services up or down automatically, and cut infrastructure costs — factors that influence growth, reliability and operating margins.
quantization technical
Quantization is the process of representing numbers with fewer bits, for example storing an AI model's weights in a 4-bit format instead of a 16-bit one, like converting a smooth gradient into a set of visible steps. For investors this matters because it reduces the memory and compute needed to run a model, which can raise inference speed and lower cost per token, at the risk of a small loss in accuracy, much like reducing a photo’s colors can hide fine detail or create banding.
speculative decoding technical
Speculative decoding is a technique used in AI language models where a fast, rough model proposes likely next words and a slower, more accurate model double-checks or corrects them, allowing the system to produce results faster and cheaper. For investors, this matters because it can lower operational costs and improve response speed for AI products, but it may also change quality or reliability trade-offs that affect customer trust and regulatory risk.
mlperf technical
A standardized set of performance tests for machine learning systems that measures how fast and efficiently hardware and software can train and run AI models. Like a car mileage test for computers, MLPerf lets investors compare different vendors on speed, energy use and cost for AI workloads, helping predict which technologies will be more competitive, scalable and profitable as demand for AI grows.
agentic technical
In AI, "agentic" describes systems that plan and carry out multi-step tasks with limited human intervention, for example by calling tools, writing and running code, or chaining several model requests together. For investors, agentic workloads matter because a single task can trigger many inference requests, which makes inference speed and cost per token central to product economics.

Full-stack optimization across memory architecture, runtime, and interconnect translates into the speed and economics enterprises need to run open-source AI in production

LIVINGSTON, N.J.--(BUSINESS WIRE)-- CoreWeave, Inc. (Nasdaq: CRWV), The Essential Cloud for AI™, today announced it has achieved the strongest combination of speed and price-performance¹ for Moonshot AI’s Kimi K2.6 in independent inference benchmarking conducted by Artificial Analysis. Across 11 inference providers evaluated on the current top open-source model, CoreWeave simultaneously delivered the highest output speed at the most cost-efficient performance level measured.

CoreWeave ranked first in the most attractive quadrant for inference speed and price-performance on Kimi K2.6, as independently measured by Artificial Analysis.

As AI applications move from training into production, inference efficiency increasingly determines real-world product viability. For organizations running the full AI loop from training to inference to continuous improvement, throughput, latency, and cost per request directly shape how reliably and economically AI can scale in the real world. This is especially significant where performance is non-negotiable, like coding assistants, agentic systems, and real-time enterprise copilots.

“Training launched the first wave of AI, and inference will define the next one. That’s why the effectiveness and economics of inference are becoming critical to organizations bringing AI into the products people use every day,” said Chen Goldberg, Executive Vice President of Product and Engineering at CoreWeave. “This benchmark reflects the investments we’ve made across our full stack, and the deep expertise of CoreWeave engineers in optimizing performance and efficiency. This is a clear signal that speed, responsiveness, and predictable economics are attainable for customers today.”

“Performance gains in inference systems come from optimization across the full stack, including hardware, inference runtime, and model configuration,” said George Cameron, Co-founder at Artificial Analysis. “Artificial Analysis benchmarks are intended to give organizations transparency in how inference offerings perform. CoreWeave performed strongly across speed and price-performance dimensions in our benchmarking of providers of Kimi K2.6. For those deploying agents in production, inference speed and price are critical to user experience and to making open source models a viable choice at scale.”

The gap between theoretical compute capacity and actual production throughput is influenced by how well hardware, model optimization, and runtime execution are tuned together. CoreWeave has optimized its platform across all three layers.
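Speculative decoding, one of the runtime techniques named in this release, illustrates how runtime execution can narrow that gap. The following is a minimal toy sketch of the greedy variant, not CoreWeave's or Eagle3's implementation: `target_next` and `draft_next` are hypothetical stand-ins for a large model and a small draft model, and production systems typically verify all proposed tokens in one batched pass and use probabilistic acceptance rather than exact matching.

```python
# Toy greedy speculative decoding. Both "models" are hypothetical
# deterministic functions over integer tokens, used only for illustration.

def target_next(seq):
    """Stand-in for the slow, accurate model: returns the next token."""
    return (seq[-1] * 3 + 1) % 7

def draft_next(seq):
    """Stand-in for the fast draft model: agrees with the target except
    when the last token is divisible by 3."""
    tok = (seq[-1] * 3 + 1) % 7
    return tok if seq[-1] % 3 else (tok + 1) % 7

def greedy_decode(prompt, n_new):
    """Baseline: call the target model once per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out

def speculative_decode(prompt, n_new, k):
    """Draft proposes k tokens per round; the target keeps the longest
    matching prefix and supplies one token of its own each round."""
    out = list(prompt)
    verify_rounds = 0
    while len(out) - len(prompt) < n_new:
        # 1) Draft model proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) Target model checks the proposal (one round; a real system
        #    would do this in a single batched forward pass).
        verify_rounds += 1
        ctx = list(out)
        for tok in proposal:
            expected = target_next(ctx)
            if tok != expected:
                ctx.append(expected)  # replace the first wrong token
                break
            ctx.append(tok)           # accept the matching token
        else:
            ctx.append(target_next(ctx))  # all accepted: add a bonus token
        out = ctx
    return out[: len(prompt) + n_new], verify_rounds

if __name__ == "__main__":
    tokens, rounds = speculative_decode([1], n_new=12, k=4)
    print(tokens == greedy_decode([1], 12), rounds)
```

In this toy setup the output matches target-only greedy decoding exactly, while the target model is consulted in fewer rounds than there are generated tokens. That reduction in expensive model passes is where the speedup comes from.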

The benchmark result, as validated by Artificial Analysis, reflects the company's investment in full-stack infrastructure optimization for production AI workloads. CoreWeave's Inference and Applied Training teams achieved the top speed by training an in-house NVFP4 quantization paired with Eagle3 speculative decoding on NVIDIA GB300 NVL72 hardware, delivering 205 tokens/sec at a blended price of $0.70 per million tokens (7:2:1 agentic blend). Teams can access this performance directly through CoreWeave Inference offerings:

  • Serverless Inference, which provides immediate API access to optimized models with no infrastructure to manage.
  • Dedicated Inference, which provides a predictable path to production with explicit control over the number of GPUs for the required scale, while all inference services are still managed by CoreWeave.
  • Inference on CoreWeave Kubernetes Service (CKS), which gives developers direct, bare-metal access to AI infrastructure and deep control over the entire stack.
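To put the quoted figures in practical terms, the short sketch below converts 205 tokens/sec and $0.70 per million blended tokens into generation time and cost for a given workload. It also shows how a 7:2:1 blend could be computed as a weighted average. The release does not define the three categories behind the 7:2:1 ratio, so the per-category prices in the example are made-up placeholders, and the function names are illustrative only.

```python
# Figures quoted in the release.
TOKENS_PER_SEC = 205.0        # output speed
PRICE_PER_M_TOKENS = 0.70     # blended USD per million tokens

def generation_seconds(output_tokens):
    """Time to stream `output_tokens` at the quoted output speed."""
    return output_tokens / TOKENS_PER_SEC

def blended_cost_usd(total_tokens):
    """Cost of `total_tokens` at the quoted blended price."""
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS

def blended_price(prices, weights):
    """Weighted-average price per million tokens for a given blend.
    ASSUMPTION: a blend such as 7:2:1 is a weighted average over three
    token categories; the release does not say which categories."""
    return sum(p * w for p, w in zip(prices, weights)) / sum(weights)

if __name__ == "__main__":
    print(f"10,000 output tokens: {generation_seconds(10_000):.1f} s")
    print(f"50M blended tokens:   ${blended_cost_usd(50_000_000):.2f}")
    # Hypothetical per-category prices, NOT CoreWeave's actual rates.
    print(f"Example 7:2:1 blend:  ${blended_price((1.0, 2.0, 3.0), (7, 2, 1)):.2f}")
```

Because the blend weights the first category most heavily, the blended price is dominated by whichever token type that category represents.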

Artificial Analysis is an independent platform that benchmarks and analyzes AI models, API providers, and infrastructure. It provides data on model quality, speed, cost, and reliability, helping users (developers/enterprises) compare and select AI technologies. Artificial Analysis independently benchmarked Moonshot AI’s Kimi K2.6 by testing its performance across 10+ core metrics (including MMLU-Pro, GPQA, and agentic coding tasks) to evaluate speed, cost, and reasoning capability.
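A speed-versus-price comparison of the kind described in this release can be sketched as a simple quadrant filter. The example below is an illustration only: the provider names and numbers are invented, and treating the "most attractive quadrant" as faster than the median and cheaper than the median is an assumption, not Artificial Analysis's published method.

```python
from statistics import median

def attractive_quadrant(providers):
    """Return names of providers that are both faster than the median
    output speed and cheaper than the median blended price.
    `providers` maps name -> (tokens_per_sec, usd_per_million_tokens)."""
    speed_cut = median(speed for speed, _ in providers.values())
    price_cut = median(price for _, price in providers.values())
    winners = [
        name
        for name, (speed, price) in providers.items()
        if speed > speed_cut and price < price_cut
    ]
    # Fastest first, so the top entry leads the quadrant on speed.
    return sorted(winners, key=lambda n: providers[n][0], reverse=True)

if __name__ == "__main__":
    # Hypothetical data, not benchmark results.
    sample = {
        "A": (200.0, 0.7),
        "B": (150.0, 1.2),
        "C": (100.0, 0.9),
        "D": (60.0, 0.6),
        "E": (40.0, 1.5),
    }
    print(attractive_quadrant(sample))
```

With this sample, only provider "A" clears both cutoffs: "B" is fast but priced above the median, and "D" is cheap but slower than the median.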

The Artificial Analysis result is the latest in a series of independent validations of CoreWeave. The company is the only AI cloud to earn the top Platinum ranking in both SemiAnalysis ClusterMAX™ 1.0 and 2.0, which evaluate AI cloud performance, efficiency, and reliability, and it has also demonstrated record-breaking MLPerf® benchmark results.

Learn more about CoreWeave’s recognition on our blog or on Artificial Analysis’s website.

¹ Price-performance is measured as speed vs. price.

About CoreWeave
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to move at the pace of innovation, building and scaling AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave serves as a force multiplier by combining superior infrastructure performance with deep technical expertise to accelerate breakthroughs. Established in 2017, CoreWeave completed its public listing on Nasdaq (CRWV) in March 2025. Learn more at www.coreweave.com.

press@coreweave.com

Source: CoreWeave, Inc.