LLM-Friendly Database Benchmark

Two agents, four metrics, one verdict

Lower is better on every metric. Each agent is compared only to itself across databases, never to the other agent.
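One way to picture the within-agent comparison is as a ratio against each agent's own baseline database. The sketch below is illustrative only: the function name, the nested-dict layout, and the choice of a baseline database are assumptions, not the benchmark's actual code, and it contains no benchmark results.

```python
def within_agent_ratios(results, baseline_db):
    """Scale each agent's metric by that same agent's value on baseline_db.

    results: {agent_name: {database_name: metric_value}} (hypothetical layout).
    Returns the same shape with ratios; because lower is better, a ratio
    above 1.0 means the agent did worse on that database than on its baseline.
    """
    ratios = {}
    for agent, per_db in results.items():
        base = per_db[baseline_db]  # the agent's own baseline, never another agent's
        ratios[agent] = {db: value / base for db, value in per_db.items()}
    return ratios
```

Dividing by the agent's own baseline removes differences in absolute scale between agents, so only the database-to-database change remains.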

Does the gap widen as the schema deepens?

The same business outcome is modeled at three Postgres schema depths. MongoDB stays a single document at every depth; Postgres normalizes further at each one.

Why — straight from the agent traces

Heuristic indicators parsed from the real captured transcripts.

How we kept it honest

Inspect the evidence

Every number traces to a hashed source file and is re-verified by the integrity gate.
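The page does not say how the integrity gate works. As a minimal sketch of what such a check might look like, assume a manifest that maps relative file paths to SHA-256 hex digests. The manifest format, the function names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the manifest paths that are missing or whose hash has changed.

    manifest: {relative_path: expected_sha256_hex} (hypothetical format).
    An empty list means every source file still matches its recorded hash.
    """
    mismatched = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatched.append(rel_path)
    return mismatched
```

A gate built this way would fail the build whenever `verify_manifest` returns a non-empty list, so a published number cannot silently drift from the file it was computed from.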