How Google Simula Works: Mechanism Design Explained

Google Simula's core idea is mechanism design: planning the entire data set as a product before generating anything. Here's how it works, why it tends to produce higher-quality data than prompt-by-prompt generation, and how you can apply the same pattern to your own AI workflows even without using Simula directly.

Most AI data generation tools work one prompt at a time, like a factory worker making one shoe at a time without a plan — sometimes you get great shoes, sometimes you get a pile of left feet. Simula is different. It plans the whole data set top-down before making anything, and Google calls this approach mechanism design. This post explains how it works.

Why Google Simula Mechanism Design Matters

There are three problems mechanism design solves that random prompting can't.

The first is coverage gaps. Random generation misses entire parts of a domain because there's no map of what's been covered.

The second is quality variance. Some prompts produce great data, others produce garbage, and without planning you can't tell which is which until after the fact.

The third is diversity collapse. AI tends to repeat itself when generating in volume, which kills the variety that good training data needs.

Mechanism design addresses all three at the architectural level rather than in post-processing.

The 3 Stages of Google Simula Mechanism Design

Simula breaks data generation into three distinct stages.

Stage 1 — Global diversification

Map the entire domain first before generating anything.

Stage 2 — Local diversification

Zoom into each spot on the map and generate variety within that cell.

Stage 3 — Dual critic filter

Run quality control before saving anything to the final data set.

Stage 1 — Global Diversification

This is the planning stage where Simula uses a taxonomy. Think of it as a giant menu of every possible topic and subtopic in the area you care about.

For cyber security data, the taxonomy covers every type of attack, every type of defender, every type of system, and every corner of the space. For legal data, it covers every type of case, every type of legal question, every relevant jurisdiction, and every category of practice area.

This taxonomy ensures coverage. Without it, you'd miss whole topics.
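A taxonomy like this can be sketched as a nested dictionary whose leaves are the cells to fill. This is a minimal illustration, assuming a small hand-written cyber security taxonomy (the names `leaf_paths` and `CYBER_TAXONOMY` are hypothetical); it is not Simula's actual data structure. Enumerating every root-to-leaf path turns the taxonomy into an explicit checklist, which is what makes coverage gaps visible.

```python
def leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path; each path is one cell of the coverage map."""
    if isinstance(node, list):
        # A list holds the leaf topics under the current branch.
        for leaf in node:
            yield prefix + (leaf,)
    else:
        # A dict maps branch names to deeper sub-taxonomies or leaf lists.
        for name, child in node.items():
            yield from leaf_paths(child, prefix + (name,))


# Hypothetical, hand-written taxonomy for illustration only.
CYBER_TAXONOMY = {
    "attack": {
        "phishing": ["credential harvesting", "invoice fraud"],
        "malware": ["ransomware", "keylogger"],
    },
    "defense": {
        "detection": ["log analysis", "anomaly alerts"],
    },
}

cells = list(leaf_paths(CYBER_TAXONOMY))
for cell in cells:
    print(" > ".join(cell))
print(f"{len(cells)} cells to fill")
```

Each printed path is a generation target, so the data set is filled cell by cell instead of wherever the model happens to drift.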

Stage 2 — Local Diversification

Once the map is drawn, Simula zooms into each cell using two techniques.

The first is one-of-N meta prompting. For each cell in the taxonomy, Simula generates many different versions rather than just one, which keeps the data set from sounding repetitive.
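The post doesn't give Simula's exact prompts, so here is a minimal sketch of the idea as described above: ask for several explicitly different versions of one cell and discard repeats. The `generate` callable stands in for an LLM call and, like the viewpoint list and prompt wording, is a hypothetical assumption.

```python
def one_of_n_variants(cell, n, generate):
    """Request n distinct versions of one taxonomy cell and keep the unique ones.

    `generate` is a hypothetical stand-in for an LLM call: it takes a prompt
    string and returns generated text.
    """
    # Illustrative viewpoints used to nudge each version in a different direction.
    viewpoints = ["a beginner", "an expert", "a skeptic", "a manager", "an auditor"]
    variants, seen = [], set()
    for i in range(n):
        prompt = (
            f"Write version {i + 1} of {n} of an example about '{cell}', "
            f"from the viewpoint of {viewpoints[i % len(viewpoints)]}. "
            f"It must differ from the other versions."
        )
        text = generate(prompt)
        key = " ".join(text.lower().split())  # normalise case and whitespace
        if key not in seen:  # drop duplicates so the cell doesn't repeat itself
            seen.add(key)
            variants.append(text)
    return variants
```

With a generator that always returns the same text, only one variant survives, which is exactly the diversity collapse this step is meant to expose.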

The second is complexification. Simula takes simple examples and pushes them harder in stages: easy (a basic version of the scenario), medium (a more nuanced version), hard (an edge-case version), and "boss fight" (an extreme edge case). It's like levelling up in a video game, and a model trained on this data learns the full range of difficulty rather than just the easy cases.
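The difficulty ladder can be sketched as one rewrite prompt per level. The level descriptions come from the paragraph above; the function name `complexify` and the prompt wording are hypothetical, not Simula's internal prompts.

```python
# Level descriptions taken from the four stages described above.
LEVELS = {
    "easy": "a basic version of the scenario",
    "medium": "a more nuanced version of the scenario",
    "hard": "an edge-case version of the scenario",
    "boss fight": "an extreme edge case of the scenario",
}


def complexify(seed_example):
    """Return (level, prompt) pairs asking a model to rewrite the seed at each difficulty."""
    return [
        (
            level,
            f"Rewrite the following example as {description}. "
            f"Keep it correct and answerable.\n\nExample: {seed_example}",
        )
        for level, description in LEVELS.items()
    ]


for level, prompt in complexify("A client asks whether a contract signed by email is binding."):
    print(f"[{level}] {prompt.splitlines()[0]}")
```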

Stage 3 — Dual Critic Filter

The final stage is a quality check before anything is saved.

Two different critic models look at each example and decide whether it's good enough to keep or whether it should be thrown out. The numbers from Google's tests are striking — on the legal data set, 61% of generated data was rejected by the dual critic filter.

That's a serious filter. Most of what was generated wasn't good enough. The output quality is high BECAUSE the filter is strict.
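A strict filter has a cost on the generation side. As a worked example using the 61% rejection rate quoted above (the target of 1,000 kept examples is hypothetical), the number of raw generations needed on average is the target divided by the acceptance rate, 1 - 0.61 = 0.39:

```python
import math


def required_generations(target_kept, rejection_rate):
    """Raw examples to generate so that, on average, `target_kept` survive the filter."""
    if not 0 <= rejection_rate < 1:
        raise ValueError("rejection_rate must be in [0, 1)")
    acceptance_rate = 1 - rejection_rate
    return math.ceil(target_kept / acceptance_rate)


print(required_generations(1000, 0.61))  # 2565
```

That is roughly 2.6 generations for every example that makes it into the final data set.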

Why Two Critics, Not One

A single critic is a single point of failure. Two critics give you checks and balances.

If one critic accepts something the other rejects, it's flagged for closer review. If both reject, it's thrown out. If both accept, it's kept.
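The keep/discard/review rule above can be sketched directly. The post doesn't say how Simula's two critics are implemented, so here they are plain callables returning `True` for "good enough"; `route`, `dual_critic_filter`, and the toy critics in the usage are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    KEEP = "keep"
    DISCARD = "discard"
    REVIEW = "review"


def route(critic_a_accepts: bool, critic_b_accepts: bool) -> Verdict:
    """Combine two independent critic decisions using the rule described above."""
    if critic_a_accepts and critic_b_accepts:
        return Verdict.KEEP
    if not critic_a_accepts and not critic_b_accepts:
        return Verdict.DISCARD
    return Verdict.REVIEW  # the critics disagree: flag for closer review


def dual_critic_filter(examples, critic_a, critic_b):
    """Split examples into KEEP / DISCARD / REVIEW buckets.

    critic_a and critic_b are hypothetical callables (e.g. two different
    judge models) that return True when an example is good enough.
    """
    buckets = {verdict: [] for verdict in Verdict}
    for example in examples:
        buckets[route(critic_a(example), critic_b(example))].append(example)
    return buckets
```

In practice the two critics should be genuinely different (different models or different criteria); two copies of the same critic share the same blind spots.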

This matches how real research works — peer review involves multiple reviewers for a reason, and the same logic applies to AI data quality control.

What Mechanism Design Means For You

Three lessons applicable beyond Simula itself.

1 — Plan before generating

Whatever you're building with AI, sketch the full scope first. Don't just start prompting and hope coverage works out.

2 — Cover the full domain

Don't let AI default to the easy and common parts. Push it to cover edge cases deliberately.

3 — Use a critic step

Always have a second AI (or human) review before publishing. The dual critic pattern works whether you're generating training data or shipping content.

I apply this in Hermes Agent Swarm workflows.

Quality Vs Diversity Vs Complexity

This is one of Simula's biggest insights, and it applies broadly.

Most AI generation conflates quality, diversity, and complexity into one knob. Simula treats them as three separate knobs that you control independently, which means you can optimise for whichever one your specific use case demands.

When you want high quality plus low complexity, you're training a chatbot — you want safe, simple examples. When you want high complexity plus narrow scope, you're training specialist AI for legal or medical work — narrow but deep. When you want high diversity plus medium complexity, you're training general-purpose models — broad coverage with depth.

Different use cases need different settings, and Simula gives you that control rather than forcing one trade-off on every workload.
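One way to picture three separate knobs is a small config object with one field per knob. This is a sketch under stated assumptions: the 0 to 1 scale, the mapping of each knob to a pipeline stage (quality to critic strictness, diversity to taxonomy breadth, complexity to the difficulty ladder), and the preset numbers are all illustrative, not Simula's settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationKnobs:
    """Three independent settings on a hypothetical 0.0 to 1.0 scale."""
    quality: float     # assumed to map to how strict the critic filter is
    diversity: float   # assumed to map to how widely generation spreads over the taxonomy
    complexity: float  # assumed to map to how far up the difficulty ladder to push

    def __post_init__(self):
        for name in ("quality", "diversity", "complexity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


# Presets mirroring the three use cases above; the numbers are illustrative only.
PRESETS = {
    "chatbot": GenerationKnobs(quality=0.9, diversity=0.5, complexity=0.2),
    "specialist": GenerationKnobs(quality=0.9, diversity=0.2, complexity=0.9),
    "general_purpose": GenerationKnobs(quality=0.7, diversity=0.9, complexity=0.5),
}
```

Because each field is set separately, raising complexity for a specialist model doesn't force a change to its diversity or quality settings.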

How Mechanism Design Compares To Other Data Generation

Simula uses top-down planning, multi-stage refinement, and a dual critic filter. Most AI data generation tools work one prompt at a time with no taxonomy and a single critic at best. Manual data generation is slow, expensive, and inconsistent.

On all three counts (planning, refinement, and filtering), Simula's approach is stronger than most alternatives.

Real Numbers From Google's Tests

The math benchmark (GSM8K) showed striking results.

Comparing a low-complexity and a high-complexity data set of 64,000 data points each, the high-complexity data gave a 10% accuracy gain. That's massive in AI terms.

But there's a catch: it only worked with a strong teacher model. With a weak teacher (57% accurate), high-complexity data made performance worse.

The lesson is that complexity helps when the teacher can label correctly. Don't force complexity on a weak teacher.
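That lesson can be turned into a simple gate: only unlock the harder difficulty levels when the teacher is accurate enough to label them. The 0.8 default threshold is a hypothetical placeholder; the post only reports that a 57%-accurate teacher made high-complexity data hurt, not where the cut-off lies.

```python
def choose_complexity_levels(teacher_accuracy, min_accuracy=0.8):
    """Return the difficulty levels worth generating for a given teacher.

    min_accuracy is a hypothetical threshold; tune it against your own
    measurements of the teacher on held-out problems.
    """
    if teacher_accuracy >= min_accuracy:
        return ["easy", "medium", "hard", "boss fight"]
    return ["easy", "medium"]  # weak teacher: stay where its labels are reliable


print(choose_complexity_levels(0.57))  # ['easy', 'medium']
```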

Why "Real Reference Data" Sometimes Loses

Real-world data covers what people happen to write online. Simula covers what's needed on purpose.

The result is that Simula data sets sometimes have better coverage than real data sets, which is counter-intuitive but real.
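Coverage here can be measured as the fraction of taxonomy cells a data set touches at least once. The sketch below uses invented toy labels (not Google's data) to show how a real-world sample that clusters around popular topics can score lower than a smaller planned set.

```python
def coverage(target_cells, dataset_cells):
    """Fraction of the taxonomy's cells that the data set touches at least once."""
    targets = set(target_cells)
    if not targets:
        raise ValueError("taxonomy has no cells")
    return len(targets & set(dataset_cells)) / len(targets)


# Hypothetical legal taxonomy and toy labels, for illustration only.
TAXONOMY_CELLS = ["contract", "tort", "property", "employment", "tax"]
real_data = ["contract", "contract", "employment", "contract", "employment"]
planned_data = ["contract", "tort", "property", "employment", "tax"]

print(coverage(TAXONOMY_CELLS, real_data))     # 0.4
print(coverage(TAXONOMY_CELLS, planned_data))  # 1.0
```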

Applying Mechanism Design Beyond Data Generation

The pattern applies to anything you're producing in volume with AI — content like SEO posts, customer responses, code modules, marketing assets.

The three-step pattern that transfers is straightforward.

The first is to define the taxonomy. What categories does your output need to cover?

The second is to generate diverse examples per category. Don't let AI default to one style — push for variety.

The third is to apply a critic step. Always review before publishing.
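The three steps above fit in one small loop. This is a minimal sketch: `run_pattern`, `generate`, and `critic` are hypothetical names, with `generate` standing in for an LLM call and `critic` for an AI or human reviewer.

```python
def run_pattern(taxonomy, per_category, generate, critic):
    """Taxonomy -> several examples per category -> critic step.

    generate(category, i) and critic(text) are hypothetical stand-ins for
    an LLM call and a reviewer (AI or human).
    """
    published, rejected = [], []
    for category in taxonomy:              # step 1: explicit categories
        for i in range(per_category):      # step 2: several variants per category
            draft = generate(category, i)
            if critic(draft):              # step 3: review before publishing
                published.append(draft)
            else:
                rejected.append(draft)
    return published, rejected


published, rejected = run_pattern(
    ["how-to", "comparison", "FAQ"],
    per_category=2,
    generate=lambda category, i: f"{category} draft {i}",
    critic=lambda text: not text.endswith("0"),  # toy critic for the example
)
print(published)  # ['how-to draft 1', 'comparison draft 1', 'FAQ draft 1']
```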

I apply this principle in Claude Code SEO Agent workflows.

Complexification As A Pattern

Simula's "complexification" technique is broadly useful even outside training data.

Take simple AI outputs and push them harder. "Make this more nuanced." "Add edge cases." "Challenge the obvious answer." Output quality improves.
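Those follow-up instructions can be chained so each one works on the latest draft. In this sketch the instructions are the three quoted above; `push_harder` and the `revise` callable (a wrapper around whatever model you use) are hypothetical.

```python
PUSH_HARDER = (
    "Make this more nuanced.",
    "Add edge cases.",
    "Challenge the obvious answer.",
)


def push_harder(draft, revise, instructions=PUSH_HARDER):
    """Apply each follow-up instruction in turn, feeding the latest draft back in.

    revise(instruction, draft) is a hypothetical model wrapper that returns
    the revised text. The history keeps every intermediate version.
    """
    history = [draft]
    for instruction in instructions:
        draft = revise(instruction, draft)
        history.append(draft)
    return draft, history
```

Keeping the history makes it easy to compare versions and stop early if a later pass makes the output worse.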

This is something you can apply today even without Simula.

What This Reveals About AI's Future

Three predictions based on what Simula represents.

The first is that more products will use synthetic training data. Privacy, cost, and access concerns make synthetic data increasingly appealing.

The second is that the mechanism design pattern spreads beyond data. Production-quality AI workflows will adopt similar planning plus filtering architectures.

The third is that the quality bar rises industry-wide. When everyone uses better techniques, the floor rises and the marginal advantage shifts elsewhere.

What Solo Operators Can Take From Simula

Three lessons that translate cleanly to solo work.

The first is to plan before generating. Don't just throw prompts at AI — map your domain first.

The second is to cover the full domain. Push AI to handle edge cases instead of letting it default to the safe middle.

The third is to always use critics: a second pair of eyes (AI or human) on everything.

Output quality jumps when you do this consistently.

FAQ — Google Simula Mechanism Design

What is mechanism design?

Planning the full data set as a product before generating anything. Top-down approach versus prompt-by-prompt.

Why two critics, not one?

A single critic is a single point of failure. Two critics create checks and balances.

Can I use Simula myself?

Not directly — it's a research framework. But you can apply the pattern to your own AI workflows.

Will Simula become open source?

Possibly. Google often publishes its research as papers, though a paper is not the same as an open-source code release.

Is mechanism design slow?

Initial planning takes time. Then execution scales much faster than ad-hoc prompting.

Can Simula generate any type of data?

Best for structured domains. Less suited for highly creative or stylistic data.

What's the biggest insight from Simula?

Quality, diversity, and complexity should be separate knobs — not lumped together.
