Founder Mode — A Main Sequence Experiment

How we built this

Built in a single Claude Code session. We wrote a game engine in Python, then an AI agent autonomously modified the strategy code using hill climbing — the LLM reads the current code, proposes a change, the system plays 50+ games to evaluate, keeps improvements, reverts regressions. No human in the loop. The agent iterated 67 times across multiple models, with statistical significance testing (Wilson CIs, chi-squared) to separate real gains from noise. The final breakthrough was a veto lookahead — simulate 12 ticks ahead and reject any move that leads to death. That took it from 96% to 100%. Everything you see runs in your browser at zero cost.

Model testing

We raced 15 models from 7 companies across 4 countries: US (Meta Llama, OpenAI GPT, Google Gemma, xAI Grok), China (DeepSeek, Alibaba Qwen, Zhipu GLM, MiniMax), France (Mistral), and Singapore (SEA-LION). Models ran on Groq, Cloudflare Workers AI (free), OpenAI, xAI, and OpenRouter. 8 models hit 100% win rate. 4 completely failed to produce valid code. The cheapest path: Cloudflare Workers AI at $0.00. Total spend across all experiments: $0.29.

Inspired by karpathy/autoresearch

We're Hiring

Main Sequence Ventures is hiring AI-aware Investment Managers and Evangelists for our deep tech portfolio companies.

Hosted on Cloudflare Pages (static, free). Global leaderboard on Cloudflare Workers + D1 (free). Game engine, AI strategy, charts, and drone mode all run client-side in JavaScript. Zero server compute per visitor.

APPLY NOW

Built by @mikenicholls88 — LinkedIn

Session

Games

Wins

Win Rate

Score

Killed

Accuracy

Shots

Hits

Ticks

Controls

SPD: 1x

Optimization Journey

67 iterations | 1.6% → 100% | Total LLM cost: $0.29

Leaderboard — Who Contributed What?

Rank	Agent	Contribution	Peak WR	Cost	Iters
1	Human + Claude Code	+70.0 pp	94%	$0	5
2	Baseline heuristic	+24.0 pp	24%	$0	1
3	Llama 3.3 70B (Groq)	+4.0 pp	98%	$0.22	57
4	Veto lookahead (Claude)	+2.0 pp	100%	$0	1

pp = percentage points of win rate improvement.
Human + Claude wrote the strategy shape (targeting, dodging, column clearing) — the biggest leap.
Llama 70B autonomously refined parameters over 57 iterations for $0.22.
The final 2% came from a veto lookahead (simulate 12 ticks, reject lethal moves) — $0 pure insight.

Breakthroughs

Iter	Agent	WR	What changed
1	Baseline	24%	Naive: move toward invader, shoot
2	Human+Claude	78%	Bottom-targeting, trajectory dodge
3	Human+Claude	92%	Multi-bullet dodge, edge columns
5	Human+Claude	94%	Wider fire threshold, faster aim
13	Llama 70B	96%	Autonomous parameter refinement
45	Llama 70B	98%	Adaptive fire cooldown, tracking
67	Veto system	100%	12-tick death simulation veto

AI State

Human Leaderboard

Click PLAY to try — Arrow keys + Space