Chapter 1. Introduction

Table of Contents

What is dbmig?
Motivation
Inspiration

What is dbmig?

dbmig is a schema management tool for relational databases that is fully compatible with Continuous Delivery. It aims to set and fulfill a gold standard in this area by providing the capability for full automation throughout the software development lifecycle.

dbmig is free software, meaning that anyone can use, share, and modify it. It is licensed under the GNU General Public License (GPL), version 3.

Motivation

A standard approach to building and using software is generally a good thing. By having a sufficiently well-adopted standard, a group of people who are interested in developing software have fewer challenges facing them:

A good standard provides guidance for developing software so that a new joiner to the field is capable of focussing the majority of their efforts on the real problem they want to solve with software, rather than the tooling or supporting infrastructure.
A well-adopted standard will provide many examples of software that already uses that standard, to act as reference material.
A well-adopted standard will provide any software developer with a community of people already well-versed in that standard, able to offer assistance to anyone who may need it.
A well-adopted standard will allow a community of people to pool their resources and think about how to improve their standard and make it better.

At the time of writing (mid 2014!), there does not appear to be any prevalent standard for the management and deployment of relational database schema. In the author's experience of real software projects in the enterprise and corporate settings, unnecessary effort is still expended in devising custom "standards" and tooling that is not reusable, and some approaches repeat commonly-seen problems.

dbmig aims to be a gold standard in the field of managing and deploying schema for relational databases.

dbmig goes further than most existing attempts by providing a setup that is fully compatible with the idea of Continuous Delivery. It attempts to provide an automated solution to the common tasks performed at the beginning of the deployment pipeline, including, crucially, the management of non-linear development.

Inspiration

The author of dbmig has taken inspiration from a number of existing tools that perform a similar role to dbmig:

Chapter 2. Building and Installing

Hopefully, dbmig will one day be sufficiently popular to be available in the library of software available to package managers, or available as a binary download. But for the time being, if you want to use dbmig, you will need to build it from source.

dbmig is built using GNU Autotools. Normally, the following commands should be sufficient to configure and build the package:

$ ./bootstrap.sh
$ ./configure
$ make

You can then install the package (with sufficient privileges) by issuing:

# make install

Chapter 3. Essential Requirements of Database Schema Management

Table of Contents

Non-destructively upgrade to the next version
Roll back to a prior version
Create a new database installation from scratch
Awareness of what has already been deployed
Manage non-linear development

Non-destructively upgrade to the next version

Relational databases, like any data store, have a schema that can be considered separate from their data content. Unlike upgrading a binary, where we have the luxury of trashing what is there and starting again without penalty, any change to the schema of a database table must preserve data already within that store, and so a non-destructive upgrade is required.

Given a current state schema and a desired target schema, many tools attempt to automatically generate a synchronisation script that can upgrade a database installation from the current state to the target. However, although such utilities provide a helping hand, there will always be scenarios where manual intervention is required on the part of the developer - for example, in migrating data between an old and new format. For this reason, an upgrade script must be considered part of the essential source code for database schema management.

Roll back to a prior version

An upgrade of a relational database may not succeed (whether obvious immediately or only later), and so any software that claims to be robust must support a mechanism that will aim to restore the database to a previous working state.

Like an upgrade script, and for the same reasons, the actions to roll back cannot always be generated automatically by comparing current and desired target state. Therefore, they also form an essential part of the source code for database schema management.

Scripts that try roll back must aim to precisely reverse the effect of some prior upgrade work. In the general case, performing upgrade work immediately followed by performing the corresponding roll-back work should result in no logical change in the target database installation.

It is not always possible to perfectly roll back some prior upgrade: if an upgrade removes a defunct field from a table forever, a roll back script would not be expected to restore that deleted data (if it really were necessary to do so, the upgrade script would probably have put the data in an 'backup' table or similar, in which case it would not really have been removed).

Create a new database installation from scratch

Developers of software using relational databases will always come across scenarios where there is currently no database installed, but one is desired. For example, it could be a new installation at a brand new site, or because a developer wishes to spin up a test environment for feature development or debugging. A specific version of the database may be needed, or perhaps the latest bleeding-edge version is required. A method is needed to support this.

One common approach seen in some software development teams is to take a copy of the relational database schema (and perhaps data) from a live environment, and use it as a baseline in order to develop new changes. In such an approach, it is important to note that source control is no longer the master of the relational database schema! And a dependency on an existing installation has now been introduced into the very first stages of the deployment pipeline. This goes against the ideas in Continuous Delivery, and limits the scope of development testing that can be done.

dbmig takes an approach whereby source control is the master, and any installation of a database schema, whether a historic version or the bleeding edge version under development, is taken solely from source control.

Awareness of what has already been deployed

The order in which any upgrade work takes place is important: the running of scripts is non-commutative for a relational database.

As such, in order to have sufficient information about what has been deployed, it is necessary to have access to a record of precisely what scripts were run, and in what order. This record is known as the database changelog, and is a core concept to both dbmig and earlier tools.

Manage non-linear development

Transactions that take place against a database are committed in a linear fashion, but the way that software is developed isn't linear: teams of developers may work on changes simultaneously, and sometimes an emergency change may be needed to a live database whilst development is still ongoing for the next planned release.

dbmig allows developers to work on their own changes, and for changes from others to be merged in, whilst ensuring that differences between source control and deployed environments are properly recognised. dbmig makes it easy to find out when a deployed environment is in an inconsistent state, and lets you decide how to resolve it before it becomes a problem.

Chapter 4. A Versioning Scheme: SemVer

dbmig has adopted Semantic Versioning (SemVer) as the versioning scheme of choice. It is formally-specified, and comprehensive. Developers will code for a known version of their software, specified in SemVer. Since many scripts may form a particular release, dbmig makes use of the build meta-data portion of the version number to encode a script number n, as shown in Figure 4.1, “SemVer build metadata convention used by dbmig”.

Figure 4.1. SemVer build metadata convention used by dbmig

<major>.<minor>.<patch>+script.<n>

Several early attempts to automate the management of database schema source chose to use a simple linear sequence of numbers, but this does not lend itself well to handling non-linear development. Semantic versioning allows developers to target a known next version (e.g. 1.5.0 to 1.6.0), whilst still supporting the possibility for emergency patch releases (e.g. 1.5.0 to 1.5.1) without forcing everyone to re-number their scripts.

Chapter 5. Relational Database Project Structure

Table of Contents

The install directory
The upgrade directory
The latest directory

dbmig requires a standard way to arrange your database schema source code in your repository. This layout consists of three top-level directories, each with a specific purpose:

The install directory contains scripts that will deploy to a target environment where no existing database is already in place.
The upgrade directory contains scripts that will take an existing database installation to another version.
The latest directory contains scripts that represent the latest bleeding edge state of the database schema without any explicit version associated, designed to be used on a target environment where no existing database is already in place.

The `install` directory

The install directory contains scripts that will deploy to a target environment where no existing database is already in place. Each script has an explicit semantic version associated with it, representing the version of the database that will result if this install script were to be run on an empty target environment.

In order to associate a semantic version with a script, a filename convention must be adopted. One can either prefix a script with the full semantic version, or place the script under a sub-directory named after the major/minor/patch part of the version and then prefix the script with the script number. Figure 5.1, “Install directory using files in versioned sub-directories” and Figure 5.2, “Install directory using files with full version prefix” show example directories employing both naming conventions.

Figure 5.1. Install directory using files in versioned sub-directories

my_db/
  install/
    1.2.3/
      0052_install_some_stuff.sql

Figure 5.2. Install directory using files with full version prefix

my_db/
  install/
    1.2.3+script.0052_install_some_stuff.sql

Note that leading zeroes may be used in filenames as a convenience measure (to allow lexicographical order to match numerical order), but they will be stripped by dbmig when intrepreted to a real SemVer.

Since there are usually just a very small number of install scripts in place for an application (even a long-lived application), the latter style using full version prefix without sub-directories is preferred for install scripts.

Typically, an application will start with just one install script (to install the very first version of the application), and then author many piecemeal upgrades in individual upgrade scripts. In order to make the process of installation easier and quicker, an application can choose to create subsequent install scripts that compress many iterations' worth of upgrades into a single file.

The `upgrade` directory

The upgrade directory contains scripts that will take an existing database installation to another version. Each script has an explicit semantic version associated with it.

Each script contains both the actions needed to upgrade to a later version as well as the actions to roll back, in a single file. Thus, the version associated with a file represents either:

The version that will result if this script were run on an installation of the database that is at an earlier version, OR
The version that an installation of a database must be at in order to roll back to an earlier version.

In order to support upgrade and roll-back actions in the same file, a magic separator line consisting of the text "--//@UNDO" must be in place somewhere in the file. All the upgrade actions must be listed first, followed by the magic separator, followed by the roll-back actions. Figure 5.3, “An example upgrade script” shows a simple example.

Figure 5.3. An example upgrade script

-- Upgrade portions of the script go here.
ALTER TABLE human_being ADD weight_in_kg DECIMAL(5,2) NULL;

--//@UNDO

-- Roll-back statments now follow.
ALTER TABLE human_being DROP COLUMN weight_in_kg;

dbmig adopted this convention of using a magic separator from DBDeploy.

In order to associate a semantic version with a script, a filename convention must be adopted. One can either prefix a script with the full semantic version, or place the script under a sub-directory named after the major/minor/patch part of the version and then prefix the script with the script number. Figure 5.4, “Upgrade directory using files in versioned sub-directories” and Figure 5.5, “Upgrade directory using files with full version prefix” show example directories employing both naming conventions.

Figure 5.4. Upgrade directory using files in versioned sub-directories

my_db/
  upgrade/
    1.3.0/
      0001_new_columns_on_foo_table.sql
      0002_foo_field_lengths_increased.sql
    2.0.0/
      0001_add_the_quux_table.sql

Figure 5.5. Upgrade directory using files with full version prefix

my_db/
  upgrade/
    1.3.0+script.0001_new_columns_on_foo_table.sql
    1.3.0+script.0002_foo_field_lengths_increased.sql
    2.0.0+script.0001_add_the_quux_table.sql

Note that leading zeroes may be used in filenames as a convenience measure (to allow lexicographical order to match numerical order), but they will be stripped by dbmig when intrepreted to a real SemVer.

Since a given release of a database is usually composed of many scripts (usually covering different features and written by different developers), the versioned sub-directory style is preferred for upgrade scripts.

The `latest` directory

The latest directory contains scripts that represent the latest bleeding edge state of the database schema without any explicit version associated, designed to be used on a target environment where no existing database is already in place.

Caution

Usage of this directory is not yet implemented in dbmig. Documentation will be written accordingly once the feature is available.

Chapter 6. The Changelog Table in the Database Installation

Caution

Documentation incomplete - watch this space

Chapter 7. Basics For The dbmig Command-Line Client

Table of Contents

Specifying connection strings
See what is installed
Specifying the repository

The supplied command-line program, called dbmig, provides almost all functionality required in order to manage and deploy database source code. The following terminology is used:

A repository refers to the location on disk of a database's project directory structure, as described in Chapter 5, Relational Database Project Structure.
A target refers to an instance of a database on some database server that can be deployed to. A target is identified using a connection string, the format of which is documented in the section called “Specifying connection strings”.

Specifying connection strings

The connection string format that dbmig uses is governed in reality by the database library that is used internally. It is supplied using the --target option. Examples are shown in Figure 7.1, “Connection String Examples”.

Figure 7.1. Connection String Examples

postgresql://dbname=<mydb>
postgresql://dbname=<mydb> user=someone password=hunter2

See what is installed

dbmig's show command will print the currently-installed version of a given database target.

$ dbmig --target=<conn_str> show
Version installed: 1.3.0+script.1

Specifying the repository

By default, dbmig will assume that the root level of the repository (i.e. where your database schema source code is) to use is in the current working directory. This can be overridden by the --repo-dir option if needed.

$ cd ~/path/to/my/database/repository
$ dbmig ...
(assumes repository is current working directory)

$ cd ~
$ dbmig --repo-dir=~/path/to/my/database/repository ...
(takes repository directory as specified)

Chapter 8. Deploying a New Version With dbmig

dbmig's migrate command will allow you to attempt to move a given target database installation to a new version, using the various scripts in the source repository specified.

It doesn't matter what version is currently installed in the target environment (if any at all); dbmig will work out the right steps to take. If specified with no further arguments, dbmig will attempt to migrate to the latest available version in the repository.

$ dbmig --target=<conn_str> show
Version installed: 1.3.0+script.1

$ dbmig --target=<conn_str> migrate
Run upgrade script 1.3.0/0002_foo_field_lengths_increased.sql? [yn] y
Upgraded to 1.3.0+script.2

If you pass a specific version to the migrate command, using the --version argument, you can choose the target version to migrate to, which provides the ability to roll back.

$ dbmig --target=<conn_str> show
Version installed: 1.3.0+script.2

$ dbmig --target=<conn_str> migrate --version=1.2.3+script.52
Run rollback script 1.3.0/0002_foo_field_lengths_increased.sql? [yn] y
Run rollback script 1.3.0/0001_new_columns_on_foo_table.sql? [yn] y
Rolled back to 1.2.3+script.52

By default, dbmig will ask for confirmation before performing any task that will modify the target database. You can override this behaviour by specifying the --force argument when invoking dbmig.

Furthermore, dbmig is terse by default. If you want to see diagnostic and other potentially-interesting information about what is going on when you invoke a command, supply the --verbose argument.

$ dbmig --target=<conn_str> migrate --verbose --force
...
(loads of info here)
...

Chapter 9. Non-Linear Development With dbmig

Caution

Documentation incomplete - watch this space

dbmig User Guide

Adam Szmigin

Foreword

Warning

Chapter 1. Introduction

What is dbmig?

Motivation

Inspiration

Chapter 2. Building and Installing

Chapter 3. Essential Requirements of Database Schema Management

Non-destructively upgrade to the next version

Roll back to a prior version

Create a new database installation from scratch

Awareness of what has already been deployed

Manage non-linear development

Chapter 4. A Versioning Scheme: SemVer

Chapter 5. Relational Database Project Structure

The `install` directory

The `upgrade` directory

The `latest` directory

Caution

Chapter 6. The Changelog Table in the Database Installation

Caution

Chapter 7. Basics For The dbmig Command-Line Client

Specifying connection strings

See what is installed

Specifying the repository

Chapter 8. Deploying a New Version With dbmig

Chapter 9. Non-Linear Development With dbmig

Caution

dbmig User Guide

Adam Szmigin

Foreword

Warning

Chapter 1. Introduction

What is dbmig?

Motivation

Inspiration

Chapter 2. Building and Installing

Chapter 3. Essential Requirements of Database Schema Management

Non-destructively upgrade to the next version

Roll back to a prior version

Create a new database installation from scratch

Awareness of what has already been deployed

Manage non-linear development

Chapter 4. A Versioning Scheme: SemVer

Chapter 5. Relational Database Project Structure

The install directory

The upgrade directory

The latest directory

Caution

Chapter 6. The Changelog Table in the Database Installation

Caution

Chapter 7. Basics For The dbmig Command-Line Client

Specifying connection strings

See what is installed

Specifying the repository

Chapter 8. Deploying a New Version With dbmig

Chapter 9. Non-Linear Development With dbmig

Caution

The `install` directory

The `upgrade` directory

The `latest` directory