CoreWeave Sandboxes Launches to Accelerate Reinforcement Learning, Agent Tool Use, and Model Evaluation


Secure, isolated environments for running AI tool use and evaluation at scale

LIVINGSTON, N.J.--(BUSINESS WIRE)-- CoreWeave, Inc. (Nasdaq: CRWV), The Essential Cloud for AI™, today announced CoreWeave Sandboxes, an execution layer that gives AI researchers and platform teams secure, isolated environments for running reinforcement learning (RL), agent tool use, and model evaluation. The new offering is available on a customer's own CoreWeave infrastructure or as a serverless runtime through Weights & Biases (W&B).

As AI systems evolve from generating outputs to taking actions, training them requires more than compute alone. Advanced AI workflows such as RL and evaluation require isolated execution environments that run code safely, maintain information across steps, and scale across concurrent workloads.

What’s more, most organizations lack a unified execution layer for RL, agent tool use, and model evaluation. Instead, they rely on custom-built systems, loosely integrated tools, or third-party sandbox products that sit outside their core infrastructure. As scale, concurrency, and workflow complexity increase, those disconnected approaches become harder to manage, less reliable, and more difficult to govern.

CoreWeave Sandboxes provides that unified execution layer through two access models: on-cluster for platform teams running training on CoreWeave Kubernetes Service (CKS) and serverless through W&B for researchers and applied AI teams who want enterprise-grade isolation without the infrastructure overhead.

Designed for scale, simplicity, and control
Available now through the Cloud Console and the Python SDK, CoreWeave Sandboxes runs directly within a customer’s CKS cluster, allowing teams to run RL, agent tool use, and model evaluation workloads alongside their AI jobs without adding a separate execution stack. At launch, it includes a Python SDK for creating and managing isolated, secure environments that can handle complex back-and-forth tasks and run multiple jobs at the same time. Built-in session management, storage integration, and monitoring tools help teams run these workflows with less operational overhead.
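The release does not show the Python SDK's actual interface, so the following is a stdlib-only sketch of the pattern it describes: fanning out many isolated code-execution jobs concurrently, each with its own process boundary and timeout. The `run_in_sandbox` helper and all names here are hypothetical illustrations, not part of the `cwsandbox` package, and OS processes are only a weak stand-in for the product's isolation model.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_in_sandbox(code: str, timeout: float = 10.0) -> dict:
    """Run a code snippet in a separate OS process with a timeout.
    (A weak, process-level stand-in for per-sandbox isolation.)"""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }

# Fan out several evaluation jobs at once, as an RL rollout step might.
jobs = [f"print({i} * {i})" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_in_sandbox, jobs))

outputs = [r["stdout"].strip() for r in results]
```

In a real deployment, the SDK would additionally handle session state, storage integration, and monitoring, which this sketch omits.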

For teams without an existing CoreWeave cluster, or those looking to extend their current compute, CoreWeave Sandboxes is also available as a serverless runtime through Weights & Biases. Researchers authenticate with an existing W&B API key, install the Python client, and can start running sandboxes in minutes with no cluster provisioning or infrastructure decisions required. Every sandbox runs in its own fully isolated virtual environment by default – meaning a failure, memory spike, or runaway process in one sandbox cannot affect any other. When something does go wrong, teams don't have to hunt across disconnected systems to find out why: sandbox activity is captured directly in the same W&B run view as training metrics, so debugging happens in context rather than across tools.
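The fault-isolation guarantee above can be illustrated, at a much weaker process-level approximation (the product uses fully isolated virtual environments), with a stdlib sketch. Nothing here is the W&B client API; all names are hypothetical.

```python
import subprocess
import sys

def run_sandboxed(code: str) -> int:
    """Run a snippet in its own process, so a crash or runaway error
    in one snippet cannot affect any other. Returns the exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # keep failing snippets' stderr contained
    )
    return proc.returncode

snippets = [
    "print('ok')",
    "raise MemoryError('runaway allocation')",  # deliberately fails
    "print('still ok')",
]
codes = [run_sandboxed(s) for s in snippets]
```

The middle snippet fails with a nonzero exit code while its neighbors complete normally, mirroring the "a failure in one sandbox cannot affect any other" property described above.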

"CoreWeave Sandboxes solves a real gap in our AI research stack: secure, isolated code execution at scale directly in our existing compute," said Brian Belgodere, senior technical staff member, AI/ML Systems, IBM Research. "Our reinforcement learning workflows spin up thousands of sandboxes in parallel per training step, each with its own container image and resource boundaries. Researchers run sandboxes within minutes of a pip install cwsandbox, with no infrastructure knowledge required."

"As agent tool use and evaluation move to production scale, teams need an execution layer that behaves like the rest of their infrastructure — governed, observable, and close to the workflows already running on CoreWeave," said Chen Goldberg, EVP, Product and Engineering at CoreWeave. "CoreWeave Sandboxes closes the execution gap in reinforcement learning and agent workflows without requiring teams to build custom execution systems around them. And for teams that want these capabilities without managing their own clusters, the serverless path through Weights & Biases makes that same execution layer accessible in minutes."

Addressing the growing complexity of AI workflows
“We lacked a unified way to manage separate clusters and schedule sandboxes across different node types, and that cost us time and resources. CoreWeave Sandboxes eliminated that issue,” said Roman Soletskyi, AI scientist, Mistral. “We now run hundreds of concurrent sandboxes on CPU nodes and alongside Slurm training jobs on GPU nodes, all through a single setup. The Python SDK let our researchers get started immediately, and the CoreWeave team worked closely with us to adapt the open-source SDK to fit seamlessly into our codebase.”

“Enterprises are under pressure to build agentic AI automation as fast as possible, so they're looking for any help to accelerate the time from idea to live agent,” said Holger Mueller, VP and principal analyst, Constellation Research. “As they enter the next stages of agentic AI automation, they need to support reward verification and evaluation without adding custom infrastructure to the environments they already run. Purpose-built execution that stays inside existing training infrastructure reduces operational sprawl and removes the fragility of homegrown sandbox systems, a gap that general-purpose and CPU-only sandbox vendors are not designed to solve."

Built on proven AI infrastructure
CoreWeave consistently delivers industry-leading infrastructure performance, demonstrated by record-breaking MLPerf benchmark results, its position as the only AI cloud to earn the top Platinum ranking in both SemiAnalysis ClusterMAX™ 1.0 and 2.0, and its #1 ranking for inference speed and price-performance for Moonshot AI’s Kimi K2.6 in independent inference benchmarking conducted by Artificial Analysis.

About CoreWeave
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to move at the pace of innovation, building and scaling AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave serves as a force multiplier by combining superior infrastructure performance with deep technical expertise to accelerate breakthroughs. Established in 2017, CoreWeave completed its public listing on Nasdaq (CRWV) in March 2025. Learn more at www.coreweave.com.

Press Contact:
press@coreweave.com

Source: CoreWeave, Inc.