Docs: Recipes

This page describes how to use Skeema to achieve common schema management tasks.

Linting database objects

Linting all object definitions is very easy in a CI pipeline: simply execute skeema lint. In addition to checking for schema design problems, by default this will also flag CREATE statements with non-canonical formatting, which may be undesirable in CI; you can disable that behavior with --skip-format.
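A minimal CI lint step might look like the following sketch (it assumes skeema and your schema repo are already available on the CI runner):

```shell
#!/bin/sh
# CI lint step (sketch): check schema design, but don't flag
# non-canonical formatting (--skip-format).
skeema lint --skip-format
status=$?
if [ "$status" -ne 0 ]; then
  echo "skeema lint failed with exit code $status" >&2
  exit "$status"
fi
```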

If you have many object definitions with low-importance legacy problems, it may be preferable to just lint objects that have changed in a given commit or pull request. By default, skeema diff and skeema push will lint objects that have modifications in the diff. See the recipe for generating DDL by comparing Git commits for advice on automating this linting and capturing STDERR.

Enforcing a strict SQL style guide for CREATE statements

If you want to enforce strict formatting of CREATE statements to match the canonical format of the database server’s SHOW CREATE output, use skeema format. By default, this will rewrite .sql files in-place, and return an exit code of 0 only if all CREATEs were already in the canonical format.

In a CI pipeline, if you only care about the exit code and don’t wish to actually rewrite the files, use --skip-write.
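For example, a CI formatting check could be sketched like this:

```shell
#!/bin/sh
# CI formatting check (sketch): --skip-write leaves the .sql files
# untouched; a nonzero exit code means at least one CREATE statement
# was not in canonical format.
if ! skeema format --skip-write; then
  echo "One or more .sql files are not in canonical format" >&2
  exit 1
fi
```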

Testing schema compatibility between MySQL and MariaDB

You can combine Skeema’s Docker workspaces feature with a command-line override of the flavor option in order to test cross-vendor compatibility. For example, if your current database server is MariaDB and you want to check schema compatibility on MySQL 8, simply run:

skeema lint --workspace=docker --docker-cleanup=destroy --flavor=mysql:8.0

Learn more in our MySQL vs MariaDB: Tables and DDL blog post.

Testing schema compatibility before a server major version upgrade

The technique described in the previous section on MySQL/MariaDB compatibility can also be used to compare different server versions with each other, for example before upgrading from MySQL 5.7 to MySQL 8.0.

If you need to override any session variables in the Docker workspace container, you can set --connect-options on the command-line.
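For example, to override a session variable while linting against a MySQL 8.0 Docker workspace (the specific variable here is purely illustrative):

```shell
# Sketch: lint against a throwaway MySQL 8.0 container, overriding
# one session variable via --connect-options
skeema lint --workspace=docker --docker-cleanup=destroy --flavor=mysql:8.0 \
  --connect-options="innodb_strict_mode=1"
```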

Detecting schema drift

Sometimes, your live databases may “drift” from the desired state expressed by your *.sql files. A few common causes for this include:

  • Emergency hotfix: due to an urgent production incident, an engineer may need to run a schema change directly, outside of Skeema and your normal workflow.
  • Accidental direct DDL change: if your database privileges are not sufficiently locked down, an engineer may mistakenly make a non-emergency schema change directly on the database.
  • Unsupported changes: Skeema cannot manage table renames, column renames, or certain rare ALTER TABLE cases such as sub-partitioning. In such cases, schema changes must be made directly.
  • Shard failures: in a sharded environment, each schema change must be made on each shard; it is possible that a server will fail at an inopportune time, causing a shard to miss out on a change. This becomes more likely over time if you have hundreds or thousands of shards, and perform schema changes frequently.

In all cases, Skeema’s declarative design makes drift detection trivial: simply run skeema diff, which returns an exit code of 0 only if no differences were detected between the filesystem definitions and the live database(s).

To resolve a drift situation caused by an “out of band” schema change (first 3 bullets), assuming the live database definition is the “correct” one, you can use skeema pull to make the filesystem reflect the live database again. If you have special formatting or inline comments that you need to preserve, consider using skeema pull --skip-format, which won’t overwrite CREATEs if their semantic contents already match the live definitions.

In the fourth case (shard failures), it is often desirable to identify which database servers have drifted, instead of outputting SQL DDL. Use skeema diff --brief for this purpose. See detecting schema drift among shards.
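A drift-detection cron job can be as simple as the following sketch (the environment name production is an assumption; adapt the alerting to your own tooling):

```shell
#!/bin/sh
# Drift check (sketch): skeema diff exits 0 when no differences exist,
# 1 when differences were found, and 2+ on errors.
skeema diff production > /tmp/drift.sql 2> /tmp/drift.log
case $? in
  0) ;;                                             # no drift
  1) echo "Schema drift detected:"; cat /tmp/drift.sql ;;   # alert here
  *) echo "skeema diff error:" >&2; cat /tmp/drift.log >&2; exit 2 ;;
esac
```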

Generating DDL by comparing Git commits

With skeema diff, it is very easy to compare the state of a live database environment with the state of the schema repo. However, in some workflows you may wish to compare different Git commits to each other, regardless of the current state of any of your actual database environments.

For example, you may wish to generate the DDL represented by a single Git commit relative to its parent commit, or generate the DDL represented by a series of commits in one feature branch or pull request. This is especially useful for these tasks:

  • Linting commits/PRs in an isolated CI environment: By default, skeema diff lints the definitions of any modified objects. This STDERR output is useful in CI, in order to provide targeted feedback on the changes in a pull request, as opposed to needlessly including linter annotations about all objects which weren’t necessarily touched by the PR. Additionally, for security and safety reasons, it is often desirable to isolate CI from real database environments.

  • Generating dry-run DDL in CI for code review purposes: Supplementing the previous bullet, the diff STDOUT output can also be captured and presented to code reviewers, so that they can see the generated SQL DDL corresponding to the pull request.

  • Auto-generating “migrations” for execution with another tool: Some companies choose to eschew skeema push and instead wish to use a traditional imperative migration tool for actual execution of DDL. Mechanically identical to the previous bullet, the diff STDOUT output can be captured and used for this purpose.

These goals can be accomplished by diffing changes against an ephemeral sandbox database, usually on the same machine as the CI runner. As a one-time setup step, configure this database as an environment in your .skeema files, for example adding a [ci] section containing host=127.0.0.1.
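For example, the [ci] environment section of a .skeema file might look like this (the user and port values are illustrative):

```ini
# Ephemeral sandbox database used only by the CI pipeline;
# adjust user/port to match your sandbox container
[ci]
host=127.0.0.1
port=3306
user=root
```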

At a high level, the basic CI pipeline steps include:

  1. Spin up an empty sandbox database server. Some CI platforms offer this via YAML configuration. Others allow you to use Docker, for example docker run -d -p 3306:3306/tcp -e MYSQL_ALLOW_EMPTY_PASSWORD=1 mysql:8.0. Be sure to match the database server flavor/version used in your other environments.
  2. Check out the older comparison commit. For a branch or pull request, you should git checkout the SHA of the merge-base, which many CI platforms provide in an environment variable. In other cases, it might make sense to check out the previous commit (HEAD~) or the first parent of a merge (HEAD^).
  3. Deploy that older comparison commit’s schema to the sandbox database server. One way to do this is with skeema push ci --skip-lint --skip-verify. Alternatively you can use the MySQL client directly, for example cat *.sql | mysql -h 127.0.0.1 --init-command="SET SESSION foreign_key_checks=0" mydb.
  4. Check out the new commit that triggered the CI run. On some CI platforms, this may be as simple as git checkout - to return to whichever commit was already checked out by default prior to step 2.
  5. Generate the DDL by diffing the working tree against the CI database server. Run skeema diff ci and capture the STDOUT, STDERR, and exit code as needed. Be sure to prevent exit code 1 from causing the pipeline to fail.
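Putting the steps above together, a pipeline script might look roughly like this sketch (the environment-variable name MERGE_BASE_SHA and the image version are assumptions; adapt them to your CI platform):

```shell
#!/bin/sh
set -e

# 1. Spin up an empty sandbox database server; match your real
#    environments' flavor/version. Wait for it to accept connections
#    before proceeding.
docker run -d --name ci-db -p 3306:3306/tcp \
  -e MYSQL_ALLOW_EMPTY_PASSWORD=1 mysql:8.0

# 2. Check out the older comparison commit (merge-base of the PR)
git checkout "$MERGE_BASE_SHA"

# 3. Deploy that commit's schema to the sandbox
skeema push ci --skip-lint --skip-verify

# 4. Return to the commit that triggered this CI run
git checkout -

# 5. Generate the DDL; exit code 1 just means differences exist,
#    so prevent it from failing the pipeline
skeema diff ci > ddl.sql 2> lint.txt || true
```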

There are several ways to process the generated DDL from STDOUT. For example, you could write custom automation logic to do any of the following:

  • Store it to a database, perhaps keyed to the commit SHA, pull request number, or branch name
  • Write it to a file and create a commit
  • Create a pull request comment, for example using GitHub Actions

Continuous delivery

The previous section covers CI, but what if you need fully automated CI/CD? Extra caution is required when automating schema changes upon main-branch commits/merges.

The simplest approach is to automate skeema push whenever code is committed. However, you will need to handle locking/queueing to ensure only one push occurs at a time. Otherwise, overlapping pushes may re-detect and re-attempt the same ALTER TABLEs from the previous set of changes that are still in-flight.
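One simple way to serialize pushes on a single deploy host is flock(1), for example:

```shell
#!/bin/sh
# Serialize pushes (sketch): -n fails fast if another push already
# holds the lock; drop -n to queue behind the in-flight push instead.
# The environment name "production" is an assumption.
flock -n /tmp/skeema-push.lock skeema push production \
  || echo "Another push is already in progress; skipping" >&2
```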

More advanced approaches build on the CI pipeline in the previous section, such as sending the STDOUT from step 5 to a system responsible for scheduling and executing schema changes. Another possibility is to use skeema push with the ddl-wrapper option, to have another system process each generated DDL statement individually. This avoids the need to tokenize/parse STDOUT to split up the separate SQL statements.
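For example, ddl-wrapper can hand each generated statement to an external executor (the run-ddl script is hypothetical, and the placeholder variables shown are only a subset; see the ddl-wrapper option documentation for the full list):

```shell
# Sketch: instead of executing DDL directly, invoke an external tool
# once per generated statement
skeema push production --ddl-wrapper='/usr/local/bin/run-ddl {HOST} {SCHEMA} {DDL}'
```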

Regardless of approach, when automating schema change execution, do not ever use allow-unsafe in a production automation pipeline. The risk of data loss is too high if a commit contains human error.

In Skeema Premium, SSH tunnels can be leveraged to allow communication between CI platforms and internal database systems. Take extreme caution with SSH key security, OS user permissions, and database privilege grants if using this technique.

Continuous delivery automation is an advanced topic, with many potential architectural differences depending on a company’s specific requirements. If your company needs assistance with configuring Skeema in complex automation, consider subscribing to Skeema Max in order to obtain private technical support, or contact us to discuss expert consulting, custom development services, or a monthly advisory retainer.

Detecting auto_increment columns at risk of maxing out

Auto_increment columns are extremely useful, but they present an operational risk if the column data type is too small: once the maximum value for the column type has been generated, subsequent INSERTs will fail. Skeema’s lint-auto-inc and allow-auto-inc are useful for restricting what data types are acceptable for auto_increment columns, for example limiting them to bigint unsigned. But what if you already have some non-bigint columns that you cannot adjust yet? Skeema’s lint-auto-inc can also be used as part of a cron script to detect these cases, but it requires some non-default settings.

Ordinarily, when Skeema’s dump commands (skeema init and skeema pull) write a CREATE TABLE to the filesystem, they exclude auto_increment counters, since these values change after every INSERT. But in order for lint-auto-inc to examine auto_increment value exhaustion, the counters must be present in the CREATE TABLE statements. You can include them (or refresh them with the latest prod values) by using skeema pull --include-auto-inc in your cron. You can just do this in a working copy of your schema repo – you don’t necessarily need to commit the result, as long as the up-to-date-from-production auto_increment counters are present in the filesystem for the next step.

Now that the counters are in-place, your cron script just needs to run skeema lint --lint-auto-inc=error, and a fatal error will be generated if any auto_increment counter exceeds 80% of its data type’s maximum value. Depending on your other .skeema file settings, you may also wish to use some additional command-line option overrides, for example to disable other linter checks or to use a more permissive allow-auto-inc.

Finally, your cron script should examine the exit code from skeema lint and take action if it was 2 or higher, indicating at least one linter error annotation was emitted.
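The whole check can be sketched as a short cron script (the environment name production is an assumption; run this from a working copy of your schema repo):

```shell
#!/bin/sh
# Auto_increment exhaustion check (sketch)
set -e
# Refresh auto_increment counters from production into the working copy
skeema pull production --include-auto-inc
set +e
# Exit code 2+ indicates at least one linter error annotation
skeema lint --lint-auto-inc=error
if [ $? -ge 2 ]; then
  echo "An auto_increment counter exceeds 80% of its type's max" >&2
  # send an alert here
fi
```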

Bulk-altering table character sets

For examples of how to convert the character set of many objects at once, see our blog post on safely switching character sets. This post describes how to combine Skeema with bulk search-and-replace tools like sed, in order to automatically generate DDL to manipulate all objects in your schema.
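As a rough sketch of the approach (the exact sed pattern depends on your table definitions, and GNU sed's -i flag is assumed; review the blog post before applying this to real files):

```shell
# Rewrite the default charset in every table file, then preview the
# DDL that Skeema would generate
sed -i 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4/' */*.sql
skeema diff
```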

Using Skeema only for stored procedures and functions

If you wish, you can use Skeema to manage deployment of stored procedures and functions, combined with a traditional imperative migration tool for managing other types of objects. Simply configure ignore-table=. in a top-level .skeema file, and all Skeema commands will now only interact with procs/funcs. Learn more about this technique in our blog post on versioning and deploying stored procedures.
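For example, the top-level .skeema file could contain just this one extra line (the value is an unanchored regular expression, and . matches every table name):

```ini
# Ignore all tables, so Skeema only manages procs/funcs
ignore-table=.
```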

Avoiding server OS incompatibilities from lower_case_table_names

Our blog post on understanding the lower_case_table_names server variable describes several techniques for avoiding operating system case sensitivity problems using Skeema.