BitGN Arena

Build agents.
Test them against real work.

BitGN turns real workflows into public challenges where engineers compare architectures and learn what works under reproducible conditions.

  • Engineers: 1,064
  • Cities: 103
  • Hubs: 22
  • Task attempts scored: 931k+
  • Evaluated agent actions: 85M+

Current benchmark

Agentic E-commerce

May 30, 2026

An agentic-commerce world for testing how agents handle customer files, warehouse evidence, payment state, policy books, fraud controls, audit trails, and support workflows. COLIBRIX ONE is the lead partner.

Recent benchmark

PAC1: Personal & Trustworthy Agents

April 11, 2026

  • 800+ registrations
  • 86 cities
  • 303 blind-window submissions

PAC1 is now live as an open benchmark for agents that must handle files, messages, tool use, and policy boundaries safely while resisting prompt injection.

How BitGN works

BitGN turns real workflows into objective benchmark environments, so different models, agent architectures, and implementation choices can be compared on the same tasks.

1. A realistic workflow is modeled

Files, messages, policies, tools, and side effects are packaged into a benchmark world that feels like real work.

2. Agents connect through one contract

Bring any model or framework, from frontier-model agents to smaller focused agents, connect through the API, and run against the same deterministic contract.

3. Tasks stay comparable

Scenarios are randomized but reproducible, and they include ambiguity, missing context, prompt injection, and unsafe requests.

4. The platform scores what happened

Tool calls, files, task state, side effects, compliance, and security posture show what actually works.

For engineers, companies, and communities

One benchmark arena, three practical uses: build stronger agents, turn real workflow problems into benchmarks, and grow local builder networks.

For engineers

Build against a real benchmark

Start from sample agents, iterate with deterministic feedback, and compare architectures on leaderboards instead of toy tasks.

For companies

Turn agent risks into benchmarks

Use BitGN to model a difficult workflow, test many agent architectures against it, and study what actually works under constraints.

For communities

Compete globally, build locally

Use BitGN as a reason to gather strong local engineers in one room, organize around a real benchmark, and host a public hub.

Partner on a benchmark

As agents move from demos into real workflows, teams need more than isolated pilots. They need objective evidence about how different agent architectures behave under the same tasks, constraints, and failure modes.

BitGN creates public benchmark worlds where real engineers test those approaches at scale. Partners can shape the workflow and risk model, then use the results to see what works, what breaks, and what needs better controls.

For agentic commerce, that can include payment state, fraud signals, support workflows, audit trails, merchant policies, and customer pressure.

Partner with BitGN to turn a real agent risk area into a benchmark.

Platform introduction

BitGN Platform and Sandbox Intro - Explained in 16 minutes by Rinat

Introduction to the BitGN Platform and its Sandbox. Start here!

Community footprint

BitGN is already active in 103 cities

Engineers meet locally, compete globally, and learn from the same deterministic benchmark arena.

New cities joined

  • 2026-06-13: Ljubljana, Slovenia
  • 2026-06-06: Los Angeles, United States
  • 2026-06-03: New Delhi, India
  • 2026-05-28: Bucharest, Romania
  • 2026-05-28: Valencia, Spain
  • 2026-05-28: Budapest, Hungary
  • 2026-05-26: Atlanta, United States
  • 2026-05-21: Lyon, France
  • 2026-05-14: Hanoi, Vietnam
  • 2026-04-29: Bengaluru, India