Vibecode Benchmarks

A running collection of coding-agent benchmarks: the same two games — a small Real-Time Strategy (RTS) and Minesweeper — built one-shot by various LLMs. Click a test to play it, or grab the source from its gist. By Senko Rašić.

:: 2026

RTS · GLM-5.2

via Z.ai chatbot in agent mode · view gist

RTS · GLM-5.1

via Z.ai chatbot in agent mode · view gist

RTS · Kimi K2.7-code

via Kimi platform playground (zero-shot, no agent) · view gist

RTS · Fable 5

zero-shot via Claude Code on a phone · view gist

Minesweeper · Gemma 4-12B

4-bit quant; had to manually fix a few stupid syntax errors · view gist

RTS · Opus 4.8

via Claude Code (ultracode) · view gist

RTS · DeepSeek V4-Flash

2-bit quantized, run via DwarfStar4 · view gist

RTS · GPT-5.5

OpenAI · view gist

RTS · Qwen3.6 Max

view gist

RTS · Kimi K2.6

view gist

RTS · DeepSeek V4 Pro

view gist

Minesweeper · Qwen 3.6-27B

Q4 quantized · zero-shot · view gist

RTS · Opus 4.7

view gist

RTS · GPT-5.4

view gist

RTS · Opus 4.6

view gist

RTS · Codex 5.3

view gist

:: 2025

Minesweeper · GPT-5 nano

zero-shot · view gist

Minesweeper · GPT-5

zero-shot · view gist

Minesweeper · Qwen3-coder

single-shot · view gist

Minesweeper · Kimi K2

single-shot · on Groq · view gist

Minesweeper · GPT-4.1

single-shot · view gist

Minesweeper · Gemini 2.5 Pro

single-shot · view gist

:: Prompts

Each game has a single fixed prompt, used as-is with no follow-ups:

Minesweeper · one-shot, in a chat interface
Create a beautiful and fully functional Minesweeper clone in HTML/JS/CSS (all in one file).
RTS · one-shot, from a coding agent
Create a simple but functional real time strategy (RTS) game similar to old WarCraft, StarCraft or Command & Conquer games. The player should be able to build buildings, create units, gather resources and should uncover the whole map. No AI or multiplayer needed. Use simple but nice-looking graphics. No sound. Implement everything in HTML/CSS/JS, everything in a single file (you can use 3rd-party js or css libraries/frameworks via CDN).