Understanding Schema Migration: It's like Git, but for DBs

Cover image

I know you are big on ensuring requirements are pretty clear before you commence development, this includes making sure you get your database designs right, from relationships down to fleshing out your schema, you ensure due diligence and you always double check with clients to okay your assumptions periodically, and at the end of the day, you finally arrive at your “final” design — or maybe not.

This is real life, final is only as long as it lasts. Things change arbitrarily, feelings change and people even fall out of love, how much more changing database schemas?

Well, it’s not like you have a problem with modifying schemas though, after all, all you need to do is drop the db associated with the old schema, modify your schema and recreate your db. Pretty straightforward isn’t it? except that you were in development, the dummy data you dropped in your local DB was of no business importance.

Now that you’re in production, doing away with data here can literally mess up your life(and I mean it), depending on the kind of client you’re serving and the importance of the data you’re handling.

Schema migrations to your rescue!

What is Schema Migration like I’m Pro?

According to Wikipedia, this refers to the management of incremental, reversible changes and version control to relational database schemas.

It’d be nice to point out that the term Schema Migration is often used interchangeably with Database migration. While a lot of people agree that both terms indeed should refer to the same thing, however, there exist some conflicting views as well. Something also worthy of note, is that context matters in the discussion too, as Database Migration, in enterprise discussions, is likely to refer to the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another.

What is Schema Migration like I’m Noob?

Think about it like Git, but for Database Schemas. For Git, I track my source code changes, but for Schema Migrations, I track changes to my Database schemas.

Let’s say I started out designing a Database table to hold articles I publish on my blog, where I document my travel experiences. At the start of my design, I wanted my blog to be pretty simple and straightforward , nothing fancy. So I end up with a sample table schema(written in Django) like below.

class Articles(models.Model):
    id = models.AutoField(“id”, primary_key=True)
    title = models.TextField(default=””)
    content = models.TextField(default=””)
    created = models.DateTimeField(auto_now_add=True)
    updated = models.DateTimeField(auto_now=True)

This is a simple table to hold my articles, the fields are id(primary key), title of my article, the actual content, created which is the date it was created and updated, which is the date it was last updated.

Now, let’s assume I go ahead and settle for this design and finally launch my blog on the internet and start out putting out actual content, and thankfully, I start attracting a good number of readers. All seems fine, until I noticed my readers were reaching out and keenly asking for more visual content i.e they wanted me to at least include pictures in my articles to give them a better feel of my travel experiences.

Now, if I were still on development, without utilising schema migrations and somehow became privy of this reader needs, all I’d need to do is redesign my schema as thus:

class Articles(models.Model):
    id = models.AutoField(“id”, primary_key=True)
    title = models.TextField(default=””)
    content = models.TextField(default=””)
    image_url = models.TextField(default=””)
    created = models.DateTimeField(auto_now_add=True)
    updated = models.DateTimeField(auto_now=True)

adding a new field to hold my image URLs and then drop my existing db and recreate a new one, using the new schema.

Like I mentioned earlier in the opening story, this is perfectly fine as I am still in development and the test/dummy data stored on my DB possesses little to no business or personal value. But the moment I get on production and start publishing actual articles, attracting actual readers and getting those articles shared by readers who love them, I totally lose this luxury.

This is where migration comes in, it keeps track of all the changes you make to your schema in what’s called a migration file and in turn, uses this file as the single source of truth as regards the state of your database at every point in time, when it needs to make changes.

With Schema Migration, the previous scenario would play out as thus:

  1. I start out my travel blog project with my initial Schema design
  2. Run my initial schema migration to track the initial state of the schema and automatically create a matching table in my DB.
  3. After I get recommendations by readers to make my articles more visually appealing by adding images, I modify my schema to include a new image URL field to hold images.
  4. I run another migration to capture the new state of the schema and in turn reflect those changes in my DB.

The above process is basically how the migration workflow happens. You make incremental changes to your schema, you notify the migration engine by running a migration, it generates a migration file, which is basically a note of the current state of your DB at the time of running, alongside the new changes you add and finally, uses this note to make the relevant changes to your database.

This is somewhat similar to the git flow of staging and committing changes.

A few things to note

  • Migration engines typically work best with ORM systems that abstracts Database operations at a high level.
  • Different languages and frameworks provide their own migration engines.
  • Migration engines automatically modify your Database. Migration engines aren’t as smart as humans, sometimes they make mistakes and you’d sometimes need to double check generated migration files to ensure the actions they are to perform does tally with what you really need, and when they do not, you’re expected to modify them to reflect your needs.

Remember more noteworthy things about schema migrations and migration engines? Feel free to add yours in the comment section.

Closure

That was a brief, non-technical introduction to the concept of Schema migrations. It was aimed to be as code/language-agnostic as possible, aside the table schema demonstrated in Django. I hope this helps you get a grasp of what the heck migrations are and why you might(definitely) need them.

Thanks for reading :)