Arena · Data & evaluation

Mapped players: 2 · As of 2026-05-22

Weights & Biases

Experiment tracking and evaluation platform for model and agent teams. Used to compare behavior across model, prompt, and deployment iterations.

Founded 2017 · San Francisco, US · Subsidiary · Part of CoreWeave · ~200 employees

API-first · Observability

About

Weights & Biases builds an experiment tracking and evaluation platform used by model and agent teams. It is US-based and, since its acquisition, operates as a subsidiary of CoreWeave.

Strategy

W&B is widely adopted at the model training stage and is now expanding into agent evaluation. Its main competition comes from LangSmith, for teams already in the LangChain ecosystem, and from cloud-native eval tooling. W&B's advantage is breadth: it covers training, fine-tuning, and deployment monitoring in one platform, so teams do not need to stitch tools together.

May 5: CoreWeave completed acquisition of Weights & Biases.

Apr 15: Weights & Biases partnered with LG CNS on a finance-specialized LLM leaderboard.

Nov 10: W&B launched agent evaluation framework for production workflows.

Aug 20: Weights & Biases expanded integration with major cloud providers.

May 15: W&B added support for multimodal model evaluation.

Official site · Docs · Pricing

Humanloop

Prompt and evaluation management layer focused on production LLM reliability and testable iteration workflows.

Founded 2021 · London, UK · Private · Backed by Index Ventures, Basis Set · ~30 employees

API-first · Observability

About

Humanloop builds a prompt and evaluation management platform for production LLM workflows. It is a UK-based startup focused on reliability and governance use cases.

Strategy

Humanloop targets production LLM reliability. It is smaller than W&B but more focused on prompt governance, which makes it relevant for regulated industries where prompt changes need audit trails and teams want structured evaluation workflows.

May 10: Humanloop added enterprise-grade prompt governance features for regulated industries.

Apr 20: Humanloop expanded evaluation workflow templates for financial services.

Nov 15: Humanloop launched audit trail feature for prompt changes.

Aug 10: Humanloop added integration with major model providers.

May 20: Humanloop released structured evaluation framework for production LLMs.

Official site · Docs · Security