Vibecode Benchmarks

A running collection of coding-agent benchmarks: the same two games, a small RTS and Minesweeper, each built in one shot by various LLMs. Click a test to play it, or grab the source from its gist. By Senko Rašić.

:: 2026

RTS · Opus 4.8

via Claude Code (ultracode) · view gist

RTS · DeepSeek V4-Flash

2-bit quantized, run via DwarfStar4 · view gist

RTS · GPT-5.5

OpenAI · view gist

RTS · Qwen3.6 Max

view gist

RTS · Kimi K2.6

view gist

RTS · DeepSeek V4 Pro

view gist

Minesweeper · Qwen 3.6-27B

Q4 quantized · zero-shot · view gist

RTS · Opus 4.7

view gist

RTS · GPT-5.4

view gist

RTS · Opus 4.6

view gist

RTS · Codex 5.3

view gist

:: 2025

Minesweeper · GPT-5 nano

zero-shot · view gist

Minesweeper · GPT-5

zero-shot · view gist

Minesweeper · Qwen3-coder

single-shot · view gist

Minesweeper · Kimi K2

single-shot · on Groq · view gist

Minesweeper · GPT-4.1

single-shot · view gist

Minesweeper · Gemini 2.5 Pro

single-shot · view gist