Database CI/CD Basics (T-SQL Tuesday #177)

Tom

2024-08-13

4 minutes

Page content

This month’s invitation is from Mala Mahadevan (b) asking us how we manage the database code. Since this is my passion project, I have a few basic tips to share.

I have yet to see a perfect implementation but even a partial one benefits you. My experience is mostly with enterprise-scaled environments - large servers, minimal downtime, brownfield development and also mostly single-tenant.

But these concepts and building blocks should be generic enough.

Throughout this article, I’ll use the word schema. It’s an overloaded term but in this case, I’m referring to a general “shape” of the database (object definitions with their results sets, columns, etc.) and not the schema returned by sys.schemas.

With that out of the way, everything can be summed up by this meme:

SRGrafo pitchfork meme use same principals for database code as you would for application code

Source of Truth 🔗︎

You can’t handle the source of Truth — Me

We should take inspiration from application development because it’s very similar. Database code is also a code.

Mr Incredible meme “Math is math” but with code instead

It’s just a tiny bit harder because you have to persist the state, so you can’t just drop the DB and rebuild it from scratch for any change.

If you want automatic deployments, you have to store your scripts somewhere. The hint is in the name: source control should store source code.

And when I say source control I mean git. Even though no one understands git, you’ll soon enough learn the golden path scenario of about 5-6 commands. And if it doesn’t work, you’ll nuke the branch and start again until it does (it’s the only way to be sure).

You should strive to:

Move the source of Truth from the production database to the source code (no half-measures)
Create an empty environment just from your source-controlled scripts
- This will get you one step away from developing on a shared environment and one step closer to isolation

How should I version my scripts? 🔗︎

There are two main schools of thought: state-based and migration-based (or a hybrid). In the end eve

State-based approach 🔗︎

You define the start and end state (e.g. script out the table as is and how it should look in the end)
Usually, a schema compare tool generates the migration for you (e.g. alter table… add column)
Comparing different schemas (dev <-> test or dev <-> prod) will produce different migration scripts
State-based approach is great for collaboration
- You can see the desired state
- Since source code is usually stored as one script file per object, you can see git conflicts when two people try to edit at the same time
The migration scripts are only as good as the schema compare tool. It often struggles with renames, online deployments or anything beyond AdventureWorks-level complexity.

Migration-based approach 🔗︎

Mostly the opposite of the state-based approach
You write numbered migration scripts and apply them in order
You store a version table on the target and only apply the missing migrations
Better control over deployment script (manually created)
Unless you run the migration scripts against an environment, you don’t see the desired state
Conflicts are harder to detect as changes against the same object can be in different script files
Possible out-of-order merges and deploys

Hybrid approach 🔗︎

The best of both worlds
- state-based approach for source control
- migration-based approach for deployments
More moving parts mean it’s more brittle

In the end, it’s all migration scripts and you can switch back and forth between the approaches to some extent.

And the rest 🔗︎

There is much more to cover. From the top of my head:

git branch strategy
monorepo vs manyrepo
- Store application code along with the database code vs separate storage
- Store all databases in a single repo vs one repo per DB
- Variations of the above
- You have to pick your poison either scaling or dependency hell
Unit testing
Hotfixes and schema drift
Empty schema vs minimal dataset vs curated dataset vs generated data
Online deployments
Rollback vs rollforward
Deployment feature flags
Free tools, paid tools, homegrown tools
Monitoring and alerting

As you can see this is a pretty complex topic which doesn’t have a silver bullet answer yet and the tools and approaches are still evolving. I hope you found this crash course useful.

Thank you for reading.