Stay Skeptical, Save Budget: The Marketing Replication Crisis Exposed

Here’s a number that should bother every marketing leader: when independent teams try to rerun published marketing findings, only a small fraction hold up. One review in the marketing literature put the success rate near 15 percent. Even when elite journals try to replicate their own studies, the pass rate sits around a third.

Sit with that for a second. A large share of the “proven” tactics you’re funding may not survive a second look.

The replication crisis usually gets filed as an academic or medical problem. It’s not just that. The same weak-evidence habits leak straight into your media plans, your vendor decks, and your quarterly spend. The fix isn’t more data. It’s a sharper filter on the data you already have.

This is a piece about that filter: how to spot fragile evidence before it steers your budget, and how to make skepticism a repeatable process instead of a gut feeling.

A quick reality check on evidence

Most bad marketing decisions trace back to the same handful of tells. Learn to name them and you’ve done half the work.

What tends to fall apart on a second look:

p-hacking. Slicing the data until something crosses the significance line.
Tiny samples. Big claims built on a handful of accounts or sessions.
Walled-garden metrics. Platforms grading their own homework by reporting their own lift.
Publication bias. Only the wins get written up. The flat tests quietly disappear.

What tends to survive:

Hypotheses locked in before the test runs, not after the results land.
Open data and shared methodology you can actually inspect.
Effect sizes that look like the real world, not a miracle.
Independent replications by someone with no stake in the outcome.

None of this is exotic. It’s the same rigor a good analyst already wants. The shift is making it the price of entry for any study that touches your spend.

How I’d put skepticism to work

1. Interrogate every claim before you fund it

When a vendor or an internal team pitches a “proven” tactic, ask for the receipts before the meeting, not after:

Was the hypothesis registered before the test, or written to fit the result?
Is there a power analysis showing the sample was big enough to mean anything?
Can you see the raw cut? (A study run on enterprise B2B accounts tells you very little about your B2C funnel.)

A short note does the heavy lifting here. Something like: “Before we meet, can you send the pre-registration details, the power analysis, and agreement to share anonymized raw data so we can validate it on our side?” The reply you get back is itself a signal. Confident teams hand it over. The rest go quiet.
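If a power analysis does come back, it helps to know roughly what a sensible answer looks like. Here is a minimal sketch using the standard normal-approximation formula for comparing two conversion rates. The function name, the 2 percent baseline, and the 10 percent relative lift are my own illustrative assumptions, not figures from any particular study.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed in EACH arm of a two-sided, two-proportion test.

    baseline_rate: control conversion rate, e.g. 0.02 for 2 percent.
    relative_lift: claimed relative improvement, e.g. 0.10 for +10 percent.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must be strictly between 0 and 1 and must differ")

    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical inputs: 2 percent baseline, vendor claims a 10 percent relative lift.
needed = sample_size_per_arm(baseline_rate=0.02, relative_lift=0.10)
print(f"Users needed per arm: {needed:,}")
```

With those assumed inputs the answer lands on the order of 80,000 users per arm. That is the useful comparison to hold against a big lift claim built on a few hundred sessions.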

2. Re-test the playbook on your own audience

A claim that replicates somewhere else still has to replicate on your list, your offer, your market. So run it small before you run it everywhere.

Test the vendor’s claim on a holdout segment, around 10 percent of the relevant audience.
Compare what you see to what they published.
Scale only when your results land inside a reasonable band of theirs (I use roughly 20 percent variance as a gut check, not a law).

Half the “underperforming” tools I’ve seen weren’t broken. They got scaled before anyone checked the claim held locally.
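The local check above can be sketched in a few lines. I'm assuming here that "roughly 20 percent variance" means your observed lift sits within 20 percent of the published lift in relative terms. That reading, the function names, and the sample counts are all my own illustration.

```python
def relative_lift(control_conversions, control_size, test_conversions, test_size):
    """Relative lift of the test segment over the comparison group."""
    control_rate = control_conversions / control_size
    test_rate = test_conversions / test_size
    return (test_rate - control_rate) / control_rate


def within_band(observed_lift, published_lift, tolerance=0.20):
    """True when the observed lift is within `tolerance` (relative) of the published lift."""
    return abs(observed_lift - published_lift) <= tolerance * abs(published_lift)


# Hypothetical numbers: comparison group converts at 2.0 percent,
# the test segment at 2.2 percent, and the vendor published a 12 percent lift.
observed = relative_lift(
    control_conversions=200, control_size=10_000,
    test_conversions=220, test_size=10_000,
)
ready_to_scale = within_band(observed, published_lift=0.12)
print(f"Observed lift: {observed:.1%}, inside the band: {ready_to_scale}")
```

A band check like this only tells you the point estimates agree. It says nothing about noise, so it works best alongside a properly sized test rather than in place of one.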

3. Run the skeptic’s checklist

For any study driving a real budget decision, run it past four checkpoints:

Checkpoint	Red flag	Green flag
Hypothesis	Written after seeing the data	Registered before the test ran
Sample size	Under 1,000 respondents	Meets an 80 percent power threshold
Significance	p-hacking and cherry-picked cuts	Confidence intervals that clear zero
Effect size	Suspiciously large	In line with industry benchmarks

Two green flags and two red flags isn’t a “maybe.” It’s a “test it yourself before you believe it.”
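For the significance checkpoint, "confidence intervals that clear zero" can be checked directly when a vendor shares raw counts. Below is a minimal sketch using a simple Wald interval on the difference between two conversion rates. It is one common approximation, not the only valid method, and the function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lift_confidence_interval(control_conversions, control_size,
                             test_conversions, test_size, confidence=0.95):
    """Wald interval for (test rate - control rate), in absolute rate points."""
    p_control = control_conversions / control_size
    p_test = test_conversions / test_size
    standard_error = sqrt(
        p_control * (1 - p_control) / control_size
        + p_test * (1 - p_test) / test_size
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    difference = p_test - p_control
    return difference - z * standard_error, difference + z * standard_error


def clears_zero(interval):
    """True when the whole interval sits on one side of zero."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical: the same kind of rate gap, measured on a large and a tiny sample.
large_sample = lift_confidence_interval(200, 10_000, 260, 10_000)
tiny_sample = lift_confidence_interval(4, 200, 6, 200)
print("Large sample clears zero:", clears_zero(large_sample))
print("Tiny sample clears zero:", clears_zero(tiny_sample))
```

The tiny sample shows a bigger headline gap in rate points than the large one, yet its interval straddles zero. That is the pattern the sample-size and significance rows of the table are there to catch.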

4. Build a process, not a one-off

Skepticism that lives in one analyst’s head dies when that analyst changes jobs. Make it a system.

Triangulate. Don’t lean on a single study. Read user research, a real field A/B test, and a third-party panel together. When all three point the same way, you’ve got something. When they don’t, you’ve found exactly where the soft spot is.

Pressure-test platform lift. Self-reported lift from inside a walled garden is the marketing version of grading your own exam. Check it against clean-room measurement, a matched-market test, or a synthetic-control test. Incrementality is the only number that survives a CFO.

Audit your vendors on a schedule. Once or twice a year, score each partner on data transparency, replication track record, and methodology rigor (pre-registration, power analysis, open data). Then shift budget toward the partners who score well. The scorecard does the arguing for you.
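A vendor scorecard doesn't need to be fancy. Here is a rough sketch that scores each partner on the three criteria above and ranks them. The 1-to-5 scale, the equal weighting, and the vendor names are my assumptions, so adjust them to whatever your team agrees on.

```python
from statistics import mean

# The three audit criteria named above; the 1-5 scale is an assumed convention.
CRITERIA = ("data_transparency", "replication_track_record", "methodology_rigor")


def vendor_score(scores):
    """Equal-weight average of the three criteria for one vendor."""
    missing = [criterion for criterion in CRITERIA if criterion not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return mean(scores[criterion] for criterion in CRITERIA)


def rank_vendors(scorecards):
    """Vendor names sorted from highest to lowest average score."""
    return sorted(
        scorecards,
        key=lambda name: vendor_score(scorecards[name]),
        reverse=True,
    )


# Hypothetical audit results.
scorecards = {
    "Vendor A": {"data_transparency": 5, "replication_track_record": 4, "methodology_rigor": 4},
    "Vendor B": {"data_transparency": 2, "replication_track_record": 1, "methodology_rigor": 3},
    "Vendor C": {"data_transparency": 4, "replication_track_record": 3, "methodology_rigor": 3},
}
for name in rank_vendors(scorecards):
    print(f"{name}: {vendor_score(scorecards[name]):.2f}")
```

Keeping the same criteria and scale from audit to audit matters more than the exact weights, because the point is to see which partners improve and which don't.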

Measure the things that actually protect the budget

Vanity metrics are how weak evidence hides. A few quality KPIs are worth more than a wall of dashboards:

Replication pass-rate of the vendor studies and internal experiments you’ve actually re-tested.
Share of spend sitting under a real test-and-control review.
Cost per verified incremental conversion, set next to the modeled lift you were promised.
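The three KPIs above are simple enough to compute from a spreadsheet export. A minimal sketch follows. The function names and every number in the example are hypothetical, and the incremental-conversion estimate assumes the control group is a fair baseline for the test group.

```python
def replication_pass_rate(results):
    """Share of re-tested studies that held up. `results` is a list of booleans."""
    if not results:
        raise ValueError("no replications recorded yet")
    return sum(results) / len(results)


def share_of_spend_under_test(tested_spend, total_spend):
    """Fraction of total spend covered by a test-and-control review."""
    return tested_spend / total_spend


def cost_per_incremental_conversion(spend, test_conversions, test_size,
                                    control_conversions, control_size):
    """Spend divided by conversions beyond what the control rate predicts.

    Returns None when the test group shows no incremental conversions.
    """
    control_rate = control_conversions / control_size
    expected_without_campaign = control_rate * test_size
    incremental = test_conversions - expected_without_campaign
    if incremental <= 0:
        return None
    return spend / incremental


# Hypothetical quarter: 5 re-tests, 2 held up; 300k of 1.2M spend under review.
print(f"Pass rate: {replication_pass_rate([True, False, False, True, False]):.0%}")
print(f"Spend under test: {share_of_spend_under_test(300_000, 1_200_000):.0%}")
cost = cost_per_incremental_conversion(
    spend=50_000,
    test_conversions=2_200, test_size=100_000,
    control_conversions=200, control_size=10_000,
)
print(f"Cost per incremental conversion: {cost:,.2f}")
```

Setting that last figure next to the cost implied by the vendor's modeled lift is the comparison the third KPI asks for.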

When you set lift expectations, anchor them in meta-analyses and industry data rather than a single rosy case study. Realistic ranges keep you grounded: modest single-digit movement on awareness for established brands, a wider band on intent for new launches, more headroom on direct-response performance. The exact numbers shift by category and year, so treat published benchmarks as a sanity check, not a target.

And keep the receipts. Archive your pre-registrations, test designs, results, and replication attempts somewhere searchable (Notion, NotebookLM, a shared drive, whatever your team will actually open). Plan to re-test your “proven” tactics on a cycle. The Meta and Google of today are many algorithm updates removed from the versions that generated last year’s case studies, and a tactic that worked then can quietly stop working now.

This whole habit sits inside a bigger discipline: governing how AI and vendor claims get trusted, tested, and approved before they touch spend. I wrote the full framework in the AI marketing governance pillar, and if you’re still assembling the toolset around it, the AI marketing hub is where I keep the rest.

A 90-day way in

You don’t have to fix all of this at once. Here’s the order I’d run it.

This week

Run the skeptic’s checklist on your next ad or vendor proposal.
Re-read the claims behind your last three major campaigns. What was actually tested, and what was assumed?
Write down a one-page testing process your team can follow without you in the room.

This month

Ask every research partner for pre-registration and raw-data access as a default.
Stand up your evidence library, even if it starts as one messy folder.
Schedule the first vendor audit.

This quarter

Run small replications on your top three “proven” tactics.
Set a cross-platform measurement standard so lift means the same thing everywhere.
Calculate your baseline replication pass-rate so you have something to improve against.

The bottom line

Trust, then verify. Big ROI claims should come with big transparency, and the ones that don’t are telling you something. The replication crisis isn’t going anywhere, but the marketers who treat skepticism as a process get a quiet edge: they stop paying for tactics that only worked in a slide deck.

The brands winning over the long run aren’t the ones chasing quick wins. They’re grinding it out across search, social, video, and connected TV, fighting for real incremental lift. That work is harder, slower, and a lot more durable.

Stay skeptical,

Alec
