Docs: Getting Started

Create a git repo of CREATE statements

Use the skeema init command to generate a “schema dump”: a directory tree of CREATE files (CREATE TABLE, CREATE PROCEDURE, CREATE FUNCTION) for all tables and routines found on a DB instance. A subdirectory will be created for each schema, containing .sql files corresponding to all objects in that schema.

Connectivity options for skeema init are identical to those for the standard MySQL and MariaDB command-line clients:

  • TCP/IP connection to default port 3306: skeema init -h <host> -u <user> -p[password]
  • TCP/IP connection to custom port: skeema init -h <host> -P <port> -u <user> -p[password]
  • Localhost unix domain socket: skeema init -h localhost -S <socketpath> -u <user> -p[password]

There are several ways to supply a password:

  • If you have an existing ~/.my.cnf MySQL client configuration file, Skeema will parse it to obtain user and password.
  • To interactively prompt for password on STDIN, supply -p without a value, just like when using the mysql command-line client.
  • To supply a password directly on the command-line, for example “foo”, use -pfoo or --password=foo; do not use a space between the flag and value. (This functionality matches how the standard mysql command-line client takes passwords.)
  • You may also supply a password using the $MYSQL_PWD environment variable, instead of using the password option on the command-line.

The -d (--dir) option tells skeema init what to call the host-level directory. If omitted, it will default to the hostname, but a descriptive name is usually preferable:

skeema init -h my.prod.db.hostname -u root -p -d schemas

Skeema saves the host information in a configuration file called .skeema in the top-level created directory. Each database schema subdirectory also has its own .skeema file, defining the schema name and character set defaults. Configuration options in .skeema files “cascade” down to subdirectories, allowing you to define options that only affect particular hosts or schemas.

If your DB instance only has one relevant database schema, you may use skeema init’s --schema option to create a “flat” single-directory layout. In this case, the generated .skeema file will contain DB host information and the schema information.

The above example assumes the host information corresponds to the production environment. Multi-environment examples are found later in this document.

This example also assumes you only have a single primary DB cluster. If you have several distinct clusters (that is, masters with completely different sets of schemas/tables), run skeema init once per cluster, supplying a different --dir each time reflecting the cluster name.

Skeema is not opinionated about your repo layout. You can choose to have a single repo dedicated to Skeema files, or multiple dedicated repos (e.g. one per database cluster), or a “mono-repo” which places Skeema files in a subdirectory alongside application code. In any case, simply cd to the desired location before running subsequent skeema commands.

cd schemas

All subsequent examples assume you are already in the directory created by skeema init. Skeema’s other commands all operate recursively from the current working directory. You can manipulate all schemas on a database instance by invoking commands from a host-level directory, or just a single schema by invoking commands from a schema-level subdirectory.

All examples below also omit password handling. If you supplied a password to skeema init on the command-line, you’ll need to repeat that for all other skeema commands below. Again, you can avoid this by configuring a password in ~/.my.cnf, or by setting the $MYSQL_PWD environment variable.

Alter a table

Skeema is a declarative system: your repo is a set of CREATE statements, representing a desired state. You never need to write ALTER statements or include them in your repo. The system figures out automatically how to transition a live database into the desired state represented by the repo.

To make a schema change to an existing table, first simply edit the table’s CREATE TABLE definition in its .sql file. Then, run skeema diff to preview the auto-generated ALTER TABLE DDL which corresponds to your change. skeema diff displays the generated DDL, but does not actually run it. To execute the ALTER, use skeema push.

vi product/users.sql
skeema diff
skeema push

Ordinarily, in between skeema diff and skeema push, you would want to make a commit to a new branch, open a pull request, get a coworker to review, and merge the pull request. These steps have been elided here for brevity.

If you have large tables, you may want to configure Skeema to use an external OSC tool. Otherwise, altering a large table can be a disruptive operation.

Create or drop a table

To create a new table, simply add a new .sql file containing the desired CREATE TABLE statement. The output of skeema diff will include the new table, and running skeema push will actually create it.

Similarly, to drop a table, delete its corresponding .sql file and then run skeema diff to preview or skeema push to execute. However, since a drop is a destructive action, it will be prevented unless the --allow-unsafe option is used.

vi product/comments.sql
skeema diff
rm product/tags.sql
skeema diff
skeema diff --allow-unsafe
skeema push --allow-unsafe

To aid in rapid development, you can configure Skeema to always allow dropping empty tables or small tables with the safe-below-size option. For example, putting safe-below-size=10m in ~/schemas/.skeema will remove the requirement of specifying --allow-unsafe when dropping any table under 10 megabytes in size. Or use safe-below-size=1 to only loosen safeties for tables that have no rows. (Skeema always treats zero-row tables as size 0 bytes, as a special-case.)

Check table definitions for problems

Skeema’s linter checks the CREATE statements in *.sql files for common problems, including SQL syntax errors, undesirable storage engine or character set usage, lack of primary key, and more.

skeema lint

By default, this will rewrite all of the CREATE statements in the *.sql files to match the canonical format shown by MySQL’s SHOW CREATE, but this behavior may be disabled via --skip-format. Conversely, if you only want to reformat statements, see the skeema format command.

Update .sql files with changes made manually / outside of Skeema

If you make changes outside of Skeema – either due to use of a language-specific migration tool, or to perform an operation unsupported by Skeema – you can use skeema pull to update the filesystem to match the database. Essentially, this is the opposite of skeema push.

skeema pull

By default, this also normalizes file format like skeema format, but you can skip that behavior with the --skip-format option (or equivalently set as --format=0 or --format=false).

Keep dev and prod in-sync

Let’s assume each engineer has a dev MySQL instance on their local dev server. This example shows how to add an environment named “development”, using the --socket (-S) option to reach a MySQL instance on localhost.

As a one-time setup to configure this new environment, use skeema add-environment from the host directory previously created by init:

skeema add-environment development -h localhost -S /var/lib/mysql/mysql.sock -u root

This automatically added a new section to the .skeema file, configuring the dev environment’s connection information. You could also do this by hand instead of using skeema add-environment.

In general, you can always manually add any other config directives to any .skeema file. If you put options at the top of the file (outside of any [section]), they’ll affect all environments. Or put them in an environment section to only affect that environment.

Once your additional environments are configured, you may now interact with them by supplying the environment name as a positional arg to most Skeema commands, such as skeema diff development. As an example, here is a more advanced team workflow for schema changes:

# make the schema change in dev using any preferred method
# this is for Django, just as an example
python manage.py migrate

# pull the changes from dev into the repo
cd ~/schemas
skeema pull development

# diff the changes against production, to safety check
skeema diff production

# commit using any desired git flow
git checkout -b my-new-branch
git commit -a -m 'Updating schema files from Django migration'
git push -u origin my-new-branch
# make a pull request (or whatever your normal code workflow is)
# get coworker review / sign-off before merging

# after PR is merged, push the changes to production
# diff'ing first is not strictly necessary, but recommended
skeema diff production
skeema push production

Advanced configuration

This example shows how to configure Skeema to use the following set of rules:

  • Development
    • MySQL instances are located on each engineer’s local dev box
    • Be fully permissive about dropping tables or columns in dev, regardless of table size
    • Just use standard ALTERs, no online DDL or external OSC tool in dev
  • Any environment EXCEPT development
    • Only automatically permit unsafe changes for tables that have no rows. For any table with at least one row, force the user to supply --allow-unsafe on the command-line to confirm when needed.
    • When ALTERing any table 1GB or larger, use pt-online-schema-change
    • When ALTERing any table below 1GB, use online DDL. (Some specific ALTERs will fail due to requiring offline DDL; in this case the user can supply --skip-alter-algorithm and --skip-alter-lock as needed.)
  • Staging
    • Has its own mysql instance, reached via TCP/IP, on a nonstandard port
  • Production
    • Has its own mysql instance, reached via TCP/IP, on the standard 3306 port
alter-wrapper="/usr/local/bin/pt-online-schema-change --execute --alter {CLAUSES} D={SCHEMA},t={TABLE},h={HOST},P={PORT},u={USER},p={PASSWORDX}"
alter-wrapper-min-size=1g
alter-algorithm=inplace
alter-lock=none
safe-below-size=1

[development]
host=localhost
socket=/var/lib/mysql/mysql.sock
allow-unsafe
skip-alter-wrapper
skip-alter-algorithm
skip-alter-lock

[staging]
host=staging-db.mycompany.com
port=3333

[production]
host=prod-db.mycompany.com

Note that the lines at the top of the file (prior to any named section) will apply to all environments. But then the [development] environment overrides a few settings, and these overrides take precedence when using the environment.

The above example is assuming a .skeema file located in a host-level directory. Alternatively, you could apply this configuration at an individual schema level by putting it in a schema directory’s .skeema file, minus all the host, port, and socket lines. This permits you to have a different configuration for specific schemas that need special-case logic.

Or if you have multiple distinct database clusters, and you want them to all follow this logic, you could put this configuration at a global level (/etc/skeema, /usr/local/etc/skeema, or ~/.skeema), again minus all the host, port, and socket lines. You can still override specific settings on a per-host and/or per-schema basis, in the .skeema file corresponding to their directories.