Markdown vs Database for Programmatic SEO: Running the experiment on myself
Looking for the ceiling. Starting to think there isn't one.
I'm running an A/B test on myself right now.
Two programmatic SEO builds. Similar structure: what I affectionately call the "SEO hell machine" approach. Hundreds or thousands of pages targeting long-tail keywords across niches, locations, or both.
Same concept. Different content backends.
Platform A: Markdown/MDX files living in the repo
Platform B: Postgres database with an admin UI
Everyone has opinions about this. "Just use markdown, it's simple." "Database scales better." "Have you tried [insert headless CMS]?"
The opinions usually come from people who built one way and never seriously tried the other. So I'm trying both. On real projects. With real traffic goals.
Here's what I'm finding.
First, what even is Programmatic SEO?
If you're new to this: programmatic SEO (pSEO) is the practice of generating large numbers of pages algorithmically to capture long-tail search traffic.
Think of how Yelp has a page for "best pizza in [every city]" or how Zapier has a page for "[App A] + [App B] integration" for thousands of app combinations. These aren't hand-written. They're generated from data: templates filled with structured content.
One page ranking for a 50-volume keyword is nothing. A thousand pages each ranking for 50-volume keywords is 50,000 monthly visitors. Ten thousand pages? You see where this goes.
The challenge is building the infrastructure to create, manage, and iterate on that many pages without losing your mind.
Which brings us to the content backend question.
The two approaches
Markdown/MDX in repo:
Your content lives as files in your codebase. Each niche or location is a file — /content/niches/pottery.mdx, /content/niches/woodworking.mdx, and so on.
Your build process reads these files, combines them with templates, generates static pages, deploys.
It's the default for most static site generators. Next.js, Astro, Gatsby — they all expect this pattern. The ecosystem is built around it.
Database:
Your content lives in Postgres (or MySQL, or Supabase, or whatever). You've got a niches table with columns like slug, title, description, meta_description, search_volume, is_published, created_at.
Your app queries the database at build time (for static generation) or runtime (for server-side rendering) to generate pages.
More infrastructure. More flexibility. Different scaling characteristics.
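For concreteness, here's roughly what the database path looks like at build time. This is a sketch, not either of my actual builds: the Next.js-style generateStaticParams and the Prisma-ish db.niches client are stand-ins, and the model and column names are assumptions.

```typescript
// Sketch: pre-render every published niche from Postgres at build time.
// Model and column names (niches, slug, is_published, title) are assumptions.
import { db } from "@/lib/db";

export async function generateStaticParams() {
  const niches = await db.niches.findMany({
    where: { is_published: true },
    select: { slug: true },
  });
  return niches.map((n) => ({ slug: n.slug }));
}

export default async function NichePage({ params }: { params: { slug: string } }) {
  // The same query works at runtime if you'd rather server-render than pre-build.
  const niche = await db.niches.findUnique({ where: { slug: params.slug } });
  return <h1>{niche?.title}</h1>;
}
```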
The quick technical difference
Here's how basic operations differ:
Adding pages: Markdown means new files, commit, push, CI runs, build runs, deploy happens. For one page. Database means open admin UI, add row, save. It's live on the next request or rebuild.
Non-developer access: Markdown is not friendly unless you add a CMS layer, which adds complexity and cost. Database with a basic admin UI lets marketing and SEO people self-serve.
Build times: Markdown builds scale with file count. 500 files build slower than 100 files. 5000 files? Hope you like waiting. Database builds stay relatively flat. You're querying rows, not parsing files.
A/B testing content: Markdown makes this painful. You're either duplicating files, building a variant system, or just not doing it. Database makes it trivial. Add a variants table, query by experiment ID, done.
Filtering and querying: Markdown means build-time scripts parsing frontmatter into arrays, filtering, sorting. Works, but clunky. Database means SELECT * FROM niches WHERE search_volume > 1000 AND is_published = false ORDER BY search_volume DESC. Native.
Where markdown makes sense
I'm not here to trash markdown. For the right scale and use case, it's genuinely the better choice. I use markdown on most of my sites, wherever the content volume is manageable or I just don't want to add another Supabase instance.
Version control is real version control.
Every content change is a git commit. You can see exactly who changed what, when, and why. You can revert a bad change in seconds. You can branch content experiments. You can require PR reviews for content changes if you want that level of control.
With a database, you can add audit logging, but it's something you build. Git gives you this for free.
The infrastructure is minimal.
No database to provision. No connection strings to manage. No credentials to rotate. No managed database bills. No "the database is down" incidents at 2am.
Your content deploys with your code. One artifact. One thing that can fail. Simple.
Static site generators expect it.
The entire JAMstack ecosystem assumes markdown. Plugins work with markdown. Syntax highlighting works with markdown. MDX components work with markdown. You're swimming with the current, not against it.
Type safety is achievable.
With tools like Contentlayer, Velite, or just manual Zod validation, you can get TypeScript to enforce your frontmatter schema. Add a required field to your niche type, and the build fails if any file is missing it.
This is surprisingly valuable when you have 300 files and need to add a new required property to all of them.
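Here's a minimal sketch of that with gray-matter plus Zod (the field names are illustrative; swap in your own frontmatter shape):

```typescript
import fs from "node:fs";
import path from "node:path";
import matter from "gray-matter";
import { z } from "zod";

// The frontmatter contract every niche file must satisfy.
const nicheSchema = z.object({
  slug: z.string(),
  title: z.string(),
  description: z.string(),
  metaDescription: z.string().max(160),
  searchVolume: z.number().int().nonnegative(),
  published: z.boolean().default(false),
});

export type Niche = z.infer<typeof nicheSchema>;

export function loadNiches(dir = "content/niches") {
  return fs.readdirSync(dir).map((file) => {
    const { data } = matter(fs.readFileSync(path.join(dir, file), "utf8"));
    // parse() throws at build time if any file is missing a required field.
    return { file, ...nicheSchema.parse(data) };
  });
}
```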
Local development is instant.
Clone the repo. Run the dev server. Everything works. No database seeding, no connection setup, no docker-compose. Your content is just... there.
Where markdown starts hurting
But then you scale. Or you want to move faster. And the friction compounds.
The deploy tax.
Every content change requires a deploy. New niche? Deploy. Fix a typo? Deploy. Update a meta description? Deploy.
For one change, this is fine. For twenty changes in a day while you're iterating on what's working? You spend more time waiting for CI than actually doing the work.
You start batching changes just to avoid the deploy cycle. That's a workflow smell.
Build times grow linearly (or worse).
100 markdown files with MDX processing, image optimization, and sitemap generation? Maybe 30 seconds.
500 files? A few minutes.
2000 files? Hope you're paying for beefy CI runners. Hope you've implemented incremental builds correctly. Hope your cache invalidation logic actually works.
There are solutions like incremental static regeneration, on-demand revalidation, distributed builds. But each solution adds complexity. Your "simple markdown setup" is now a distributed build system.
Non-developers are locked out.
Unless your marketing person is comfortable with git, they can't touch content. Every change flows through engineering. This creates bottlenecks and resentment on both sides.
"Can you add these 50 location pages?" "Can you update the meta descriptions on the plumbing niches?" "Can you publish the HVAC pages now?"
You can add a CMS layer like Contentful, Sanity, Tina, whatever. But now you have two sources of truth. Now you're syncing content to markdown files or querying the CMS directly. Now you're paying for another service. The simplicity argument is gone.
Querying is build-time hackery.
"Show me all niches with search volume over 1000 that aren't published yet, sorted by volume."
With markdown, you're writing a script that reads all files, parses frontmatter, filters the array, sorts, outputs. This logic lives in your build pipeline or a custom script.
Want a dashboard showing this? You're building a whole tool.
With a database, that's one query. In SQL, in your ORM, in a query builder. The database is designed for this.
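For what it's worth, the markdown version of that exact question ends up looking something like this, reusing the hypothetical loadNiches() helper sketched earlier:

```typescript
// Build-time report: niches with volume > 1000 that aren't published yet.
// Uses the hypothetical loadNiches() helper from the Zod example above.
import { loadNiches } from "./load-niches";

const backlog = loadNiches()
  .filter((n) => n.searchVolume > 1000 && !n.published)
  .sort((a, b) => b.searchVolume - a.searchVolume);

console.table(backlog.map(({ slug, searchVolume }) => ({ slug, searchVolume })));
```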
A/B testing content is DIY everything.
Want to test whether "Best X for Y" titles outperform "Top X in Y"? With markdown, you need to build a system. Variant files? Runtime title switching? A/B test tracking?
With a database, you add a page_variants table with variant_name, title, description, conversion_rate. Query a random variant. Log the results. It's just data.
Where database wins
Once you're past the markdown pain point, database starts feeling like relief.
No deploys to publish.
Add 500 niches through an admin UI. They're available immediately — either on the next ISR revalidation or instantly with SSR. No git. No CI. No waiting.
This sounds like a small thing until you're actually trying to move fast. Then it's everything. Ideas → live pages in minutes, not hours.
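With Next.js-style ISR, for example, the "live on the next request" part can be a one-line revalidation call after the admin UI writes the row. This is a sketch with assumed names, not my actual admin code:

```typescript
"use server";

// Sketch of an admin action: insert a niche, then invalidate its cached page.
// Assumes a Next.js App Router app and a Prisma-style client at @/lib/db.
import { revalidatePath } from "next/cache";
import { db } from "@/lib/db";

export async function createNiche(input: { slug: string; title: string; description: string }) {
  await db.niches.create({ data: { ...input, is_published: true } });

  // No build, no deploy: the page regenerates on its next request.
  revalidatePath(`/niches/${input.slug}`);
  revalidatePath("/niches"); // listing/index page, if you have one
}
```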
Non-developers can self-serve.
Marketing adds niches. SEO updates descriptions. Content writers toggle is_published. Nobody files a ticket. Nobody waits for the next sprint.
Engineering focuses on engineering. Content people focus on content. Everyone's happier.
Gradual rollout is native.
The is_published flag isn't just for drafts. It's for staged rollouts.
Publish 20 new niches. Watch Search Console for a month. The ones that get impressions? Keep them. The ones that don't? Unpublish and try different angles.
This iteration cycle is painful when every change is a deploy. It's trivial when it's flipping a boolean.
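If you also sync impression data into the table (the impressions_30d column here is an assumption), the rollback step can even be a scheduled query rather than a manual pass:

```typescript
// Sketch: unpublish niches that are at least 30 days old and got zero impressions.
// Assumes impressions_30d is periodically synced from Search Console.
import { db } from "@/lib/db";

export async function pruneUnderperformers() {
  const thirtyDaysAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);

  return db.niches.updateMany({
    where: {
      is_published: true,
      created_at: { lte: thirtyDaysAgo },
      impressions_30d: 0,
    },
    data: { is_published: false },
  });
}
```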
Search volume prioritization.
Store your SEMrush or Ahrefs data right in the content table. search_volume, keyword_difficulty, cpc — whatever you're tracking.
Now your backlog query is:
```sql
SELECT * FROM niches
WHERE is_published = false
  AND search_volume > 500
ORDER BY search_volume DESC
LIMIT 50;
```

You know exactly what to build next. No spreadsheet reconciliation. No "wait, did we publish that one already?"
Runtime filtering enables features.
Related niches widget? Query by category, exclude current, limit 5.
Niche browser for internal tools? Paginated query with search and filters.
"Show me everything in the Home Services category we published last month"? One query.
You're not pre-computing everything at build time and hoping you cached the right views.
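The related-niches widget, for instance, is a few lines with a Prisma-style client (the category and search_volume columns are assumptions):

```typescript
// Sketch: fetch up to 5 sibling niches for a "related niches" widget.
import { db } from "@/lib/db";

export function getRelatedNiches(category: string, currentSlug: string) {
  return db.niches.findMany({
    where: { category, is_published: true, slug: { not: currentSlug } },
    orderBy: { search_volume: "desc" },
    take: 5,
  });
}
```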
A/B testing is just data.
```sql
CREATE TABLE page_variants (
  id SERIAL PRIMARY KEY,
  niche_id INT REFERENCES niches(id),
  variant_name VARCHAR(50),
  title VARCHAR(200),
  meta_description TEXT,
  impressions INT DEFAULT 0,
  clicks INT DEFAULT 0
);
```

Serve a random variant. Track impressions and clicks. Compare CTR. Roll out winners.
This is a Tuesday afternoon with a database. It's a multi-week project with markdown.
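The serving side is similarly small. A sketch against the table above (picking uniformly at random and counting the impression on render is just one way to do it):

```typescript
// Sketch: pick a random title/description variant for a niche and log the impression.
// Table and column names match the page_variants example above.
import { db } from "@/lib/db";

export async function pickVariant(nicheId: number) {
  const variants = await db.page_variants.findMany({ where: { niche_id: nicheId } });
  if (variants.length === 0) return null;

  const variant = variants[Math.floor(Math.random() * variants.length)];
  await db.page_variants.update({
    where: { id: variant.id },
    data: { impressions: { increment: 1 } },
  });

  return variant; // render variant.title and variant.meta_description
}
```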
Localization without file multiplication.
Need content in 6 languages? With markdown, that's 6 files per niche. 500 niches × 6 languages = 3000 files. Your repo becomes unmaintainable.
With a database, that's a locale column or a separate translations table. Your schema handles it. Your file count stays sane.
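A sketch of the lookup with a separate translations table (niche_translations and its columns are assumptions):

```typescript
// Sketch: fetch a niche with its translation, falling back to English.
import { db } from "@/lib/db";

export async function getLocalizedNiche(slug: string, locale: string) {
  const niche = await db.niches.findUnique({ where: { slug } });
  if (!niche) return null;

  const translation =
    (await db.niche_translations.findFirst({ where: { niche_id: niche.id, locale } })) ??
    (await db.niche_translations.findFirst({ where: { niche_id: niche.id, locale: "en" } }));

  return { ...niche, ...(translation ?? {}) };
}
```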
Build times are decoupled from content volume.
Whether you have 100 pages or 10,000, your build is querying a database. The query might take 100ms instead of 10ms. Your build doesn't care.
Meanwhile, markdown builds are parsing 10,000 files, processing MDX, validating frontmatter, optimizing images. The scaling characteristics are fundamentally different.
Where database hurts
It's not free wins, though.
More infrastructure, more problems.
You need a database. Probably a managed one: Supabase, Neon, Aiven, PlanetScale, RDS, whatever you like. That's a monthly cost, connection string management, and environment variables across environments.
Your staging environment needs a staging database. Your preview deployments need database access. Your local development needs either a local instance or a dev database.
Simple projects genuinely don't need this complexity.
You need an admin UI.
Unless your content team writes SQL (they don't), you need a way to manage content. Options:
- Build a custom admin UI (time investment but doable)
- Use a tool like Directus, Forest Admin, or Retool (another service to pay for and maintain)
- Accept that everyone uses Prisma Studio or TablePlus (not great for non-developers)
More surface area. More stuff to keep working.
Migrations are a thing.
Adding a column? Migration. Renaming a field? Migration. Changing a type? Migration.
It's not hard with modern tools like Prisma Migrate, Drizzle Kit, raw SQL files, but it's overhead that markdown doesn't have. You're managing a schema now.
One more thing that can break.
Database goes down? Your site has problems. Connection pool exhausted during traffic spike? Problems. Forgot to run migrations in production? Spectacular problems.
Markdown files in a git repo don't have connection issues. They don't hit concurrent connection limits. They don't have pool timeout configurations.
There's less that can go wrong when there's less stuff.
The 30K page caveat
I have one site with over 30K pages running on markdown.
So clearly markdown can scale. But context matters.
That site required architecture refactors twice. The first time around 5K pages when builds became unbearably slow. The second around 15K when the incremental build logic was getting unreliable.
It works now. But "it works" involved:
- Distributed build caching
- Aggressive incremental regeneration
- Separate content processing pipelines
- Build-time sharding
It's not "markdown files in a repo" anymore. It's a custom build system that happens to read markdown files.
If I were starting that project today, with what I know now? I'd probably start with markdown for the core content and move the long-tail expansion to a database much earlier.
The hybrid architecture
Here's what I'm converging toward as the actual answer:
Core content stays in code.
Your foundational 100-200 niches - the ones that define what your product is about - live in TypeScript. Type-safe. Always available. Fast. Deployed with the code. Version controlled.
These are the pages that need to work when the database is having a bad day.
Expansion content goes in the database.
The long tail lives in Postgres: the 1000+ niches you're testing and iterating on, the location variations, the experimental categories.
Flexible. Queryable. No-deploy updates. Marketing can self-serve.
The system checks both.
```typescript
async function getNiche(slug: string) {
  // Check TypeScript first (fast, always available)
  const coreNiche = coreNiches[slug];
  if (coreNiche) return coreNiche;

  // Fall back to database
  return await db.niches.findUnique({ where: { slug } });
}
```

Database becomes your expansion layer, not your foundation.
You keep the reliability of code for what matters most. You gain the flexibility of database for scale and iteration.
Where's the breaking point?
This is what I'm actually testing with these two builds.
My current hypothesis: markdown starts hurting somewhere between 200 and 500 pages. That's when:
- Deploy friction becomes noticeable
- Build times start creeping up
- You start wanting to move faster than git-commit-push-wait-deploy allows
- Non-developer content requests become a bottleneck
But the real threshold probably depends on:
How often you're changing content. A 500-page site that rarely changes? Markdown might be fine forever. A 500-page site where you're adding 20 niches a week and killing underperformers? Database yesterday.
Whether non-devs need access. If you're the only one touching content, markdown friction is manageable. If marketing needs to self-serve, you're building a CMS layer anyway.
How much you're querying content. Simple "render this page" usage? Markdown is fine. Complex filtering, sorting, dashboards? You'll be happier with SQL.
Whether you're A/B testing. If you're serious about testing content variations, the database infrastructure pays for itself immediately.
What I'm watching for
As I run both builds in parallel:
Deploy frequency. How often am I wishing I could just update content without a full deploy?
Build time trajectory. At what page count does the markdown build start feeling slow?
Non-dev requests. How many "can you update this" requests am I getting, and how annoyed am I?
Feature friction. When I want to add something - related niches, content filtering, A/B tests - which platform makes it easier?
I'll report back when I have more data. For now, I'm building both and watching where the pain shows up first.
TL;DR
Markdown: Simple, version controlled, scales poorly, every publish requires a deploy.
Database: More infrastructure, flexible, scales well, publish without deploy.
Crossover point: Probably 200-500 pages depending on iteration speed. Can go higher with architecture work (I've done 30K) but you're building custom systems at that point.
Hybrid approach: Core content in code, expansion content in database. Probably the actual right answer for serious programmatic SEO.
Testing it now. Will let you know what breaks first.