Table of Contents
List of Figures
Table of Contents
dbmig is a schema management tool for relational databases that is fully compatible with Continuous Delivery. It aims to set and fulfill a gold standard in this area by providing the capability for full automation throughout the software development lifecycle.
dbmig is free software, meaning that anyone can use, share, and modify it. It is licensed under the GNU General Public License (GPL), version 3.
A standard approach to building and using software is generally a good thing. By having a sufficiently well-adopted standard, a group of people who are interested in developing software have fewer challenges facing them:
A good standard provides guidance for developing software so that a new joiner to the field is capable of focussing the majority of their efforts on the real problem they want to solve with software, rather than the tooling or supporting infrastructure.
A well-adopted standard will provide many examples of software that already uses that standard, to act as reference material.
A well-adopted standard will provide any software developer with a community of people already well-versed in that standard, able to offer assistance to anyone who may need it.
A well-adopted standard will allow a community of people to pool their resources and think about how to improve their standard and make it better.
At the time of writing (mid 2014!), there does not appear to be any prevalent standard for the management and deployment of relational database schema. In the author's experience of real software projects in the enterprise and corporate settings, unnecessary effort is still expended in devising custom "standards" and tooling that is not reusable, and some approaches repeat commonly-seen problems.
dbmig aims to be a gold standard in the field of managing and deploying schema for relational databases.
dbmig goes further than most existing attempts by providing a setup that is fully compatible with the idea of Continuous Delivery. It attempts to provide an automated solution to the common tasks performed at the beginning of the deployment pipeline, including, crucially, the management of non-linear development.
Hopefully, dbmig will one day be sufficiently popular to be available in the library of software available to package managers, or available as a binary download. But for the time being, if you want to use dbmig, you will need to build it from source.
dbmig is built using GNU Autotools. Normally, the following commands should be sufficient to configure and build the package:
$ ./bootstrap.sh $ ./configure $ make
You can then install the package (with sufficient privileges) by issuing:
# make install
Table of Contents
Relational databases, like any data store, have a schema that can be considered separate from their data content. Unlike upgrading a binary, where we have the luxury of trashing what is there and starting again without penalty, any change to the schema of a database table must preserve data already within that store, and so a non-destructive upgrade is required.
Given a current state schema and a desired target schema, many tools attempt to automatically generate a synchronisation script that can upgrade a database installation from the current state to the target. However, although such utilities provide a helping hand, there will always be scenarios where manual intervention is required on the part of the developer - for example, in migrating data between an old and new format. For this reason, an upgrade script must be considered part of the essential source code for database schema management.
An upgrade of a relational database may not succeed (whether obvious immediately or only later), and so any software that claims to be robust must support a mechanism that will aim to restore the database to a previous working state.
Like an upgrade script, and for the same reasons, the actions to roll back cannot always be generated automatically by comparing current and desired target state. Therefore, they also form an essential part of the source code for database schema management.
Scripts that try roll back must aim to precisely reverse the effect of some prior upgrade work. In the general case, performing upgrade work immediately followed by performing the corresponding roll-back work should result in no logical change in the target database installation.
It is not always possible to perfectly roll back some prior upgrade: if an upgrade removes a defunct field from a table forever, a roll back script would not be expected to restore that deleted data (if it really were necessary to do so, the upgrade script would probably have put the data in an 'backup' table or similar, in which case it would not really have been removed).
Developers of software using relational databases will always come across scenarios where there is currently no database installed, but one is desired. For example, it could be a new installation at a brand new site, or because a developer wishes to spin up a test environment for feature development or debugging. A specific version of the database may be needed, or perhaps the latest bleeding-edge version is required. A method is needed to support this.
One common approach seen in some software development teams is to take a copy of the relational database schema (and perhaps data) from a live environment, and use it as a baseline in order to develop new changes. In such an approach, it is important to note that source control is no longer the master of the relational database schema! And a dependency on an existing installation has now been introduced into the very first stages of the deployment pipeline. This goes against the ideas in Continuous Delivery, and limits the scope of development testing that can be done.
dbmig takes an approach whereby source control is the master, and any installation of a database schema, whether a historic version or the bleeding edge version under development, is taken solely from source control.
The order in which any upgrade work takes place is important: the running of scripts is non-commutative for a relational database.
As such, in order to have sufficient information about what has been deployed, it is necessary to have access to a record of precisely what scripts were run, and in what order. This record is known as the database changelog, and is a core concept to both dbmig and earlier tools.
Transactions that take place against a database are committed in a linear fashion, but the way that software is developed isn't linear: teams of developers may work on changes simultaneously, and sometimes an emergency change may be needed to a live database whilst development is still ongoing for the next planned release.
dbmig allows developers to work on their own changes, and for changes from others to be merged in, whilst ensuring that differences between source control and deployed environments are properly recognised. dbmig makes it easy to find out when a deployed environment is in an inconsistent state, and lets you decide how to resolve it before it becomes a problem.
dbmig has adopted Semantic
Versioning (SemVer) as the versioning scheme of choice. It is
formally-specified, and comprehensive. Developers will code for a known
version of their software, specified in SemVer. Since many scripts may
form a particular release, dbmig makes use of the build meta-data portion
of the version number to encode a script number n
, as shown in Figure 4.1, “SemVer build metadata convention used by dbmig”.
Several early attempts to automate the management of database schema source chose to use a simple linear sequence of numbers, but this does not lend itself well to handling non-linear development. Semantic versioning allows developers to target a known next version (e.g. 1.5.0 to 1.6.0), whilst still supporting the possibility for emergency patch releases (e.g. 1.5.0 to 1.5.1) without forcing everyone to re-number their scripts.
Table of Contents
dbmig requires a standard way to arrange your database schema source code in your repository. This layout consists of three top-level directories, each with a specific purpose:
install
directory contains scripts
that will deploy to a target environment where no existing database is
already in place.
upgrade
directory contains scripts
that will take an existing database installation to another version.
latest
directory contains scripts
that represent the latest bleeding edge state of the database schema
without any explicit version associated, designed
to be used on a target environment where no existing database is already
in place.
The install
directory contains scripts
that will deploy to a target environment where no existing database is
already in place. Each script has an explicit semantic version
associated with it, representing the version of the database that will
result if this install script were to be run on an empty target
environment.
In order to associate a semantic version with a script, a filename convention must be adopted. One can either prefix a script with the full semantic version, or place the script under a sub-directory named after the major/minor/patch part of the version and then prefix the script with the script number. Figure 5.1, “Install directory using files in versioned sub-directories” and Figure 5.2, “Install directory using files with full version prefix” show example directories employing both naming conventions.
Figure 5.1. Install directory using files in versioned sub-directories
my_db/ install/ 1.2.3/ 0052_install_some_stuff.sql
Figure 5.2. Install directory using files with full version prefix
my_db/ install/ 1.2.3+script.0052_install_some_stuff.sql
Note that leading zeroes may be used in filenames as a convenience measure (to allow lexicographical order to match numerical order), but they will be stripped by dbmig when intrepreted to a real SemVer.
Since there are usually just a very small number of install scripts in place for an application (even a long-lived application), the latter style using full version prefix without sub-directories is preferred for install scripts.
Typically, an application will start with just one install script (to install the very first version of the application), and then author many piecemeal upgrades in individual upgrade scripts. In order to make the process of installation easier and quicker, an application can choose to create subsequent install scripts that compress many iterations' worth of upgrades into a single file.
The upgrade
directory contains scripts
that will take an existing database installation to another version.
Each script has an explicit semantic version associated with it.
Each script contains both the actions needed to upgrade to a later version as well as the actions to roll back, in a single file. Thus, the version associated with a file represents either:
In order to support upgrade and roll-back actions in the same file, a
magic separator line consisting of the text "--//@UNDO
" must be in place somewhere in the file.
All the upgrade actions must be listed first, followed by the magic
separator, followed by the roll-back actions.
Figure 5.3, “An example upgrade script” shows a simple example.
Figure 5.3. An example upgrade script
-- Upgrade portions of the script go here. ALTER TABLE human_being ADD weight_in_kg DECIMAL(5,2) NULL; --//@UNDO -- Roll-back statments now follow. ALTER TABLE human_being DROP COLUMN weight_in_kg;
dbmig adopted this convention of using a magic separator from DBDeploy.
In order to associate a semantic version with a script, a filename convention must be adopted. One can either prefix a script with the full semantic version, or place the script under a sub-directory named after the major/minor/patch part of the version and then prefix the script with the script number. Figure 5.4, “Upgrade directory using files in versioned sub-directories” and Figure 5.5, “Upgrade directory using files with full version prefix” show example directories employing both naming conventions.
Figure 5.4. Upgrade directory using files in versioned sub-directories
my_db/ upgrade/ 1.3.0/ 0001_new_columns_on_foo_table.sql 0002_foo_field_lengths_increased.sql 2.0.0/ 0001_add_the_quux_table.sql
Figure 5.5. Upgrade directory using files with full version prefix
my_db/ upgrade/ 1.3.0+script.0001_new_columns_on_foo_table.sql 1.3.0+script.0002_foo_field_lengths_increased.sql 2.0.0+script.0001_add_the_quux_table.sql
Note that leading zeroes may be used in filenames as a convenience measure (to allow lexicographical order to match numerical order), but they will be stripped by dbmig when intrepreted to a real SemVer.
Since a given release of a database is usually composed of many scripts (usually covering different features and written by different developers), the versioned sub-directory style is preferred for upgrade scripts.
The latest
directory contains scripts
that represent the latest bleeding edge state of the database schema
without any explicit version associated, designed
to be used on a target environment where no existing database is already
in place.
Usage of this directory is not yet implemented in dbmig. Documentation will be written accordingly once the feature is available.
Documentation incomplete - watch this space
The supplied command-line program, called
dbmig
, provides almost all functionality
required in order to manage and deploy database source code. The
following terminology is used:
The connection string format that dbmig uses is governed in reality by
the database library that is used internally. It is supplied using
the --target
option. Examples are
shown in Figure 7.1, “Connection String Examples”.
Figure 7.1. Connection String Examples
postgresql://dbname=<mydb> postgresql://dbname=<mydb> user=someone password=hunter2
dbmig's show
command will print the
currently-installed version of a given database target.
$ dbmig --target=<conn_str> show Version installed: 1.3.0+script.1
By default, dbmig will assume that the root level of the repository
(i.e. where your database schema source code is) to use is in the
current working directory. This can be overridden by the
--repo-dir
option if needed.
$ cd ~/path/to/my/database/repository $ dbmig ... (assumes repository is current working directory) $ cd ~ $ dbmig --repo-dir=~/path/to/my/database/repository ... (takes repository directory as specified)
dbmig's migrate
command will allow you
to attempt to move a given target database installation to a new
version, using the various scripts in the source repository specified.
It doesn't matter what version is currently installed in the target environment (if any at all); dbmig will work out the right steps to take. If specified with no further arguments, dbmig will attempt to migrate to the latest available version in the repository.
$ dbmig --target=<conn_str> show Version installed: 1.3.0+script.1 $ dbmig --target=<conn_str> migrate Run upgrade script 1.3.0/0002_foo_field_lengths_increased.sql? [yn] y Upgraded to 1.3.0+script.2
If you pass a specific version to the
migrate
command, using the
--version
argument, you can choose the
target version to migrate to, which provides the ability to roll back.
$ dbmig --target=<conn_str> show Version installed: 1.3.0+script.2 $ dbmig --target=<conn_str> migrate --version=1.2.3+script.52 Run rollback script 1.3.0/0002_foo_field_lengths_increased.sql? [yn] y Run rollback script 1.3.0/0001_new_columns_on_foo_table.sql? [yn] y Rolled back to 1.2.3+script.52
By default, dbmig will ask for confirmation before performing any task
that will modify the target database. You can override this behaviour
by specifying the
--force
argument when invoking dbmig.
Furthermore, dbmig is terse by default. If you want to see diagnostic
and other potentially-interesting information about what is going on when
you invoke a command, supply the
--verbose
argument.
$ dbmig --target=<conn_str> migrate --verbose --force ... (loads of info here) ...