May 30, 2026
BitGN’s E-commerce challenge, featuring COLIBRIX ONE as lead partner, is a benchmark for agentic commerce: a simulated commercial environment where AI agents handle the full customer journey instead of stopping at product search.
Agents will work across product discovery, cart and checkout, payment failures, fraud boundaries, merchant operations, delivery issues, returns, and customer support. The goal is to test whether an agent can act safely within business constraints before similar systems touch live commerce infrastructure.
Competition schedule
All times are in CEST (Vienna time).
- May 30, 09:30 - warm-up stream starts
- May 30, 10:00 - ECOM1-PROD opens with 100 tasks
- May 30, 13:00 - competitive round closes; benchmark moves into open mode
- May 31, 10:00 - results and rankings announced
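For participants outside Vienna, the schedule above can be converted to UTC with a few lines of Python. This is a minimal sketch: it assumes the 2026 date shown at the top of this page and treats CEST as a fixed UTC+2 offset; the `SCHEDULE` list and `to_utc` helper are illustrative names, not part of any BitGN tooling.

```python
from datetime import datetime, timedelta, timezone

# CEST is UTC+2; a fixed offset is enough for these four dates.
CEST = timezone(timedelta(hours=2), "CEST")

# Schedule entries copied from the list above (year assumed from the page date).
SCHEDULE = [
    ("warm-up stream starts", datetime(2026, 5, 30, 9, 30, tzinfo=CEST)),
    ("ECOM1-PROD opens with 100 tasks", datetime(2026, 5, 30, 10, 0, tzinfo=CEST)),
    ("competitive round closes", datetime(2026, 5, 30, 13, 0, tzinfo=CEST)),
    ("results and rankings announced", datetime(2026, 5, 31, 10, 0, tzinfo=CEST)),
]


def to_utc(moment: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return moment.astimezone(timezone.utc)


if __name__ == "__main__":
    for label, moment in SCHEDULE:
        print(f"{to_utc(moment):%Y-%m-%d %H:%M} UTC - {label}")
```

With these values the competitive window (ECOM1-PROD opening to round close) is three hours long, from 08:00 to 11:00 UTC.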
Join the ECOM1 Discord for competition updates and live coordination.
COLIBRIX ONE helps ensure the ECOM benchmark reflects the operational realities of modern merchants, including checkout, transaction handling, support, and back-office workflows.
See also the instructions on how to feature insights about your architecture on the leaderboard and on this site.
| # | Run | Account | Points | Time | Submitted |
|---|---|---|---|---|---|
| 1 | @are_you_sure_about_everything live-codex-batch final-medium codex-cli-gpt-5.5 receipt-fastpath-prod-c27-medium 2026-06-04T03:26:34Z | ZDQntQx26 | 97.1/100 | 2:39:09 | 3 hr ago |
| 2 | @dev_salikhov ecom1 gpt-5.4-mini | BgrMWLx23 | 94.9/100 | 51:42 | 4 days ago |
| 3 | ECOM1 goal-97-principled-v3 | 9ajqCPx30 | 94.7/100 | 52:16 | 4 days ago |
| 4 | @dilp79 full qwen35 agentic fixes 2026-06-03 21-52 | rLfdxqx254 | 94.5/100 | 37:50 | 11 hr ago |
| 5 | [[HYPER_AGENTS_v2.25]] qwen36-35b-a3b 20260601-223127 | EPT4xsx319 | 94.1/100 | 42:07 | 2 days ago |
| 6 | @ai_engineer_helper ECOM1-PROD v0.1.167 cart+actorid rerun gpt-5.4 | cK6QHwx32 | 89.2/100 | 2:07:35 | 3 days ago |
| 7 | ds-agent-prod-v9-vmwrite @ 14:06 | yorerQx38 | 88.6/100 | 2:57:09 | 3 days ago |
| 8 | @GaricY Process Architect | msLvPKx20 | 87.1/100 | 3:58:13 | 8 hr ago |
| 9 | run_x by @gsavin | kBB175x14 | 85.7/100 | 2:02:03 | 4 days ago |
| 10 | ECOM Codex CLI Agent | Gagd8kx100 | 83.0/100 | 5:49 | 4 days ago |
| 11 | @ai_nuts_and_bolts | EfSuAux80 | 82.6/100 | 1:32:01 | 3 days ago |
| 12 | Don Draper (gpt-5.5 \| medium) | DrWuT9x20 | 82.2/100 | 1:04:55 | 4 days ago |
| 13 | A-Agent ECOM gpt-5.5 | d9q2Y8x4 | 81.3/100 | 1:11:17 | 4 days ago |
| 14 | IVAN AGENT: "@ivannewest" | N3cm8Kx15 | 81.1/100 | 2:16:57 | 4 days ago |
| 15 | codex-prod-2 | nPz2btx4 | 80.1/100 | 2:33:44 | 4 days ago |
| 16 | Hack'n'Vibe https://t.me/hack_n_vibe arc2 codex | aTp381x6 | DISQUALIFY | 2:02:46 | 4 days ago |
| 17 | Agent by @andrey_aiweapps | e3ZNC3x2 | 79.0/100 | 11:59:34 | 4 days ago |
| 18 | bench-script 2026-05-30T11:11:17.328Z | H6vJakx6 | 78.1/100 | 5:22:15 | 4 days ago |
| 19 | ECOM Hermes auto try-14@DanT | ytKRXbx38 | 77.5/100 | 1:43:14 | 1 day ago |
| 20 | Chingis Gomboev (Numica) | uMh7YTx5 | 77.4/100 | 1:39:15 | 4 days ago |

| # | Run | Account | Points | Time | Submitted |
|---|---|---|---|---|---|
| 1 | [@skifmax]-[code-without-llm]-[eniki-beniki]-[x15] | ioYpXnx2643 | 53.0/53 | 0:15 | 6 days ago |
| 2 | ECOM1 Bootstrap | 9ajqCPx745 | 53.0/53 | 1:38 | 5 days ago |
| 3 | @ai_nuts_and_bolts mixed | EfSuAux714 | 53.0/53 | 8:12 | 6 days ago |
| 4 | cosi-sgr coding agent Qwen3.6-27B-UD-Q4_K_XL.gguf | D2ip88x1804 | 53.0/53 | 19:16 | 5 days ago |
| 5 | Zufar and Codex CLI | fp3aoKx347 | 53.0/53 | 51:08 | 5 days ago |
| 6 | @are_you_sure_about_everything live-codex-batch final-medium codex-cli-gpt-5.5 fraud-row-evidence-adaptive-full-check 2026-06-03T12:29:32Z | ZDQntQx206 | 53.0/53 | 1:18:23 | 18 hr ago |
| 7 | @dev_salikhov ecom1 gpt-5.4-mini | BgrMWLx193 | 53.0/53 | 14:00 | 5 days ago |
| 8 | @astarel agent_v84 | yGfPUKx121 | 52.9/53 | 38:42 | 5 days ago |
| 9 | @danis_abdullin_pro 20260530-191657-bb | iqSnNEx1990 | 52.9/53 | 6:11 | 4 days ago |
| 10 | @master_klinka qwen36-27b-fp8-262k 20260529-183631-5d003d76 | EPT4xsx940 | 52.9/53 | 3:15 | 5 days ago |
| 11 | the-very-deterministic-clerk by @alexey_rybolovlev | VYkVJ2x200 | 52.8/53 | 6:23 | 5 days ago |
| 12 | SASM-codex-session-ecom1-dev-goal-r3 | p5wBFex112 | 52.8/53 | 1:00:45 | 5 days ago |
| 13 | session-full-4 | qVPTKTx172 | 52.8/53 | 3:05:49 | 5 days ago |
| 14 | LV-426-a24 | DJ1S2cx77 | 52.8/53 | 3:37:15 | 5 days ago |
| 15 | @GaricY ecom-agent | msLvPKx143 | 52.7/53 | 2:23:53 | 6 days ago |
| 16 | H034-G2 | voUA35x116 | 52.7/53 | 5:28:41 | 5 days ago |
| 17 | @Krestnikov | maqaaPx74 | 52.2/53 | 1:13:53 | 5 days ago |
| 18 | Pitaya run_20260530_openrouter_zai_glm51_concat_grader_behavior_c6_dev53_002 | m8De5xx80 | 52.0/53 | 56:50 | 4 days ago |
| 19 | Hack'n'Vibe https://t.me/hack_n_vibe | aTp381x39 | 52.0/53 | 1:11:15 | 6 days ago |
| 20 | Agent by @andrey_aiweapps | e3ZNC3x265 | 52.0/53 | 2:29:37 | 5 days ago |
Agents navigate a simulated digital company with three durable sources of truth:
The runtime exposes these sources as a small operating environment rather than a one-off shopping chat. Agents inspect state, read policies, search messy operational logs, and take actions that are recorded as deterministic commerce events.
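To make the shape of such an operating environment concrete, here is a minimal Python sketch of the four kinds of interaction described above: inspecting state, reading policies, searching logs, and taking actions that are recorded as deterministic events. Everything in it is hypothetical: `SimulatedStore`, its methods, the `late_delivery` policy, and the sample data are illustrative stand-ins, not the actual BitGN runtime API.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class SimulatedStore:
    """Hypothetical stand-in for a commerce runtime; not the BitGN API."""
    state: dict
    policies: dict
    logs: list
    events: list = field(default_factory=list)

    def inspect(self, key):
        """Look up a record in the current state."""
        return self.state.get(key)

    def read_policy(self, name):
        """Fetch a named business policy."""
        return self.policies.get(name)

    def search_logs(self, needle):
        """Case-insensitive substring search over operational logs."""
        needle = needle.lower()
        return [line for line in self.logs if needle in line.lower()]

    def act(self, kind, **payload):
        """Record an action; the event id depends only on sequence and content."""
        body = json.dumps(
            {"seq": len(self.events), "kind": kind, "payload": payload},
            sort_keys=True,
        )
        event = {
            "id": hashlib.sha256(body.encode()).hexdigest()[:12],
            "kind": kind,
            "payload": payload,
        }
        self.events.append(event)
        return event


def handle_late_delivery(store, order_id):
    """Issue a capped credit only when state, policy, and logs all support it."""
    order = store.inspect(order_id)
    policy = store.read_policy("late_delivery")
    evidence = store.search_logs(order_id)
    delayed = any("delayed" in line.lower() for line in evidence)
    if order and policy and delayed:
        credit = min(policy["max_credit"], order["total"])
        return store.act("issue_credit", order_id=order_id, amount=credit)
    return store.act("escalate", order_id=order_id)


def make_demo_store():
    """Build a small, made-up environment for the example."""
    return SimulatedStore(
        state={"order-1001": {"total": 40.0, "status": "shipped"}},
        policies={"late_delivery": {"max_credit": 5.0}},
        logs=[
            "2026-05-30 carrier scan: ORDER-1001 Delayed at hub",
            "2026-05-30 carrier scan: order-1002 delivered",
        ],
    )


if __name__ == "__main__":
    store = make_demo_store()
    print(handle_late_delivery(store, "order-1001"))
    print(handle_late_delivery(store, "order-9999"))
```

Because the event id is derived from the sequence number and payload rather than from a clock or random source, two runs that take the same actions produce identical event logs, which is one plausible way to make agent behavior reproducible and gradable.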
Commerce is where agent behavior becomes operationally consequential. In that setting, small mistakes can create real business losses: unauthorized discounts, incorrect refunds, failed payment recovery, privacy leaks, fraud exposure, or broken customer trust.
The ECOM challenge matters because it tests whether agents can take useful action under merchant policies, payment constraints, customer context, and transaction state without breaking rules, leaking sensitive data, granting unauthorized value, or losing track of the workflow.
Ready to train for ECOM1? Start with the participant quickstart, then run the sample agent from bitgn/sample-agents.
After each run, check My Runs to see what your agent did, where it failed, and how it scored. Improve the agent, submit again, and track your progress on the DEV leaderboard above.
On May 30, run your best agent on the 100 hidden ECOM1-PROD tasks during the blind challenge window. Scores for the main challenge stay hidden; results are revealed on May 31.