MAC · Model Acceptance Criteria · v0 · coming soon

Your dbt tests
live in one cell.

TDD changed how we write code. Evals changed how we benchmark LLMs. MAC is how we prove our data models work. A free 7-email course on the testing matrix that turns "we have some not_nulls" into a deliberate coverage system.

Free. 7 emails over 14 days. Unsubscribe with one click.

Day 1 ships immediately. Day 14 wraps the course. Then you decide.


The pattern

TDD → code.
Evals → LLMs.
MAC → data.

Every era of software gets the testing discipline its problems demand. Web apps got TDD. Foundation models got evals. Analytical data has always needed a framework that handles aggregates, lineage, and reconciliation, and never had one. MAC is that framework.


The problem

Your tests are in one corner of a much larger room.

Pull up any schema.yml in your repo and count: how many of your tests check a single column in isolation against a fixed rule? In most codebases, north of 90%.

That means nearly every test you have lives in one cell of a much larger space. One corner of a matrix you didn’t know you were supposed to fill.

The X is where most teams test. The ?s are where the bugs hide.

On a recent gold-layer pipeline model, the inherited suite had 14 tests. Most were not_null, unique, and accepted_values. After walking the matrix, the model had 95. The 81 new ones weren’t edge cases. They were the cells nobody had thought to fill.

One of those new tests caught thousands of rows in production reporting where opportunity type and sub-type didn’t align for closed-won deals. The bug had been live for months. Nobody caught it because nobody had a framework that told them where to look.
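
The check that caught it is a row-scope test, nothing exotic. A minimal sketch as a dbt singular test - model, seed, and column names here are illustrative, not the client's - fails the build whenever the query returns rows:

    -- tests/assert_closed_won_type_alignment.sql
    -- Row scope: for closed-won deals, type and sub-type must form a valid pair.
    -- dbt marks a singular test as failed if this query returns any rows.
    select
        o.opportunity_id,
        o.opportunity_type,
        o.opportunity_sub_type
    from {{ ref('fct_opportunities') }} as o
    left join {{ ref('opportunity_type_map') }} as m   -- hypothetical seed of valid type / sub-type pairs
        on  o.opportunity_type = m.opportunity_type
        and o.opportunity_sub_type = m.opportunity_sub_type
    where o.stage = 'closed_won'
      and m.opportunity_type is null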


The framework

Scope × Basis.
Three rows. Six columns.
Eighteen cells.

Every meaningful test on an analytical model lives at the intersection of two questions. Where are you evaluating? And against what?

One cell per intersection. Walk every cell. Decide explicitly.

Scope

Column. One field, in isolation. amount > 0. stage is in an accepted list.

Row. One record, across columns. start_date < end_date. New-business opportunities must have positive ACV.

Aggregate. The whole dataset. Accounting identities. Row-count reconciliation. Total pipeline ties to the executive dashboard.
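
As a sketch of what each scope looks like in dbt terms - singular tests with illustrative model and column names, each living in its own tests/*.sql file and failing if it returns rows:

    -- Column scope: one field against a fixed rule (the built-in generics live here).
    select * from {{ ref('fct_pipeline') }} where amount <= 0

    -- Row scope: one record, across columns.
    select * from {{ ref('fct_pipeline') }} where start_date >= end_date

    -- Aggregate scope: the whole dataset against a total.
    -- Assumes exec_dashboard_totals is a one-row model holding the reported figure.
    select sum(amount) as total_pipeline
    from {{ ref('fct_pipeline') }}
    having abs(sum(amount) - (select reported_total from {{ ref('exec_dashboard_totals') }})) > 1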

Basis

Absolute. A fixed rule. No external reference.

Relative: Source. Did the transformation preserve what it was supposed to?

Relative: Production. Does the new model tie to the old truth?

Relative: Recon. Does the model tie out to an external system of record? Stripe, the bank, a vendor export.

Temporal. Did last quarter’s numbers retroactively change?

Human. Show the number to someone who knows the business and watch their face.
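
One of these bases in code, as a hedged sketch - a Relative: Source check at aggregate scope, with illustrative source and model names:

    -- tests/assert_row_counts_tie_to_source.sql
    -- Aggregate scope, Relative: Source basis: the transformation should not
    -- silently drop or duplicate opportunities. Fails if the counts diverge.
    with source_rows as (
        select count(*) as n from {{ source('crm', 'opportunities') }}
    ),
    model_rows as (
        select count(*) as n from {{ ref('fct_opportunities') }}
    )
    select s.n as source_count, m.n as model_count
    from source_rows as s
    cross join model_rows as m
    where s.n != m.n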

A model isn’t "tested" until you’ve walked every cell and decided explicitly whether it applies, what the check is, or why you’re skipping it. That’s the discipline.


The agent multiplier

95 tests is a lot.
Agents make it work.

If 95 tests fire alerts at 3am and a human triages every one, the on-call rotation burns out in a week. That’s why most data teams can’t run this many checks. The signal-to-noise ratio destroys them before the tests prove their worth.

Severity tiers change the math. Every test runs, but only Stops wake a human. Pauses go to an agent that knows the lineage, pulls row counts from the last seven runs, cross-references the affected models, and writes a triage note. By the time a human looks, the question isn’t "what happened?" It’s "do I accept this explanation?"
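
One way to express the tiers in dbt is the native severity config - a sketch with illustrative names, where a Stop keeps the default severity of error and pages a human, while a Pause warns and hands off to the agent:

    -- tests/assert_daily_volume_within_band.sql
    -- A "Pause" check: it runs on every build, but a failure raises a warning
    -- rather than an error, and the warning is what the triage agent picks up.
    -- A "Stop" check would keep the default severity = 'error'.
    {{ config(severity = 'warn') }}

    select load_date, row_count, trailing_7_day_avg
    from {{ ref('int_daily_row_counts') }}         -- hypothetical volume-monitoring model
    where row_count < 0.5 * trailing_7_day_avg     -- more than a 50% drop vs. the last week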

And the flip side: agents require this rigor. An agent making decisions on your data at 3am doesn’t Slack you when a number looks weird. It acts on whatever you served it. The old testing posture - catch the big stuff, trust the rest - stops being safe.

MAC makes the load bearable for humans and the surface legible for agents. It’s the testing discipline analytical data has always needed and now - finally - has the tooling to deploy.


The 7-email course

One cell at a time, end to end.

  1. Day 1 · Your tests are in one cell. The matrix problem. Why not_null + unique + accepted_values covers ~10% of the surface area, and the 3am question that exposes the gap.
  2. Day 3 · Walking the Scope axis. Column vs. Row vs. Aggregate. What each one catches, and the quick diagnostic to figure out which scope your current tests are covering.
  3. Day 5 · Walking the Basis axis. Absolute, Relative: Source, Relative: Production, Relative: Recon, Temporal, Human. The six lenses, with one example each.
  4. Day 7 · System of Record. The prerequisite question nobody answers. Why half of all dbt projects can’t write trustworthy Relative: Source tests until they fix this.
  5. Day 9 · Severity tiers. Stop, Pause, Go. Layer-aware (Bronze / Silver / Gold). The mapping that lets agents handle Pauses without waking humans.
  6. Day 12 · The agent multiplier. Why MAC works now and didn’t five years ago. Jevons paradox for data testing: lower the per-test triage cost, and the economically viable test surface expands.
  7. Day 14 · What to do Monday. The one-page diagnostic. Pick a model, walk the matrix, ship the gap-fill. Plus what comes next.

Get the course

Free.
No vendor noise.

7 emails over 14 days. Practitioner takes from real client engagements. No "AI will replace you" panic pieces. Unsubscribe with one click.



Who’s sending this

Ben Wilson.
Founder, Ray Data Co.

Twelve years building analytical data systems. Most recently as a senior contractor on enterprise SaaS engagements, owning gold-layer models that finance leaders read off in board decks. The MAC framework came out of that work: every quarter, the same shape of bug surfacing the same way, and no language for the testing gap that let it through.

Ray Data Co is the consulting and writing practice. The weekly newsletter is Sanity Check - field notes from the edge of agentic adoption in data engineering. MAC is the first piece of permissionless leverage coming out of that work: a course, a methodology, and the start of a longer book project.