Two agents, four metrics, one verdict
Lower is better. Each agent is compared only to itself across databases.
Does the gap widen as the schema deepens?
Same business outcome, three Postgres schema depths. MongoDB stays one document; Postgres normalizes deeper.
Why — straight from the agent traces
Heuristic indicators parsed from the real captured transcripts.
How we kept it honest
Inspect the evidence
Every number traces to a hashed source file. Re-verified by the integrity gate.