Skeema.io: Versioning and Deploying Stored Procedures

By Evan Elias | October 24, 2023

Stored procedures are a powerful database feature. However, they’re also somewhat controversial, in part due to operational challenges with their development workflow and lifecycle. In this post, we’ll review Git version control and deployment concepts for stored procedures, along with practical solutions when using Skeema.

Stored procedures and functions – sometimes collectively called “stored routines” or just “routines” – provide a mechanism for writing programmatic logic in SQL, and persisting this logic on the database server for later re-use. Procedures are invoked by using a CALL statement, whereas functions can be used in arbitrary SQL expressions. For simplicity’s sake, when we say “stored procedures” in this post, we’re generally including stored functions as well, unless otherwise noted.

Stored procedures are code. Manage them accordingly!

Engineers often take for granted how simple it is to edit source code in a normal application codebase. You can generally organize your code in a logical way, for example grouping related functions and methods in one file. When modifying existing logic, diff UIs show exactly which lines have changed. Version control systems, such as Git, provide the bedrock for each stage of the software development process.

Meanwhile, in the database world, the development lifecycle for stored procedures is trickier. Although you can modify routines directly using SQL DDL, this is extremely dangerous in production. It is much safer to adopt a formal pipeline for editing, testing, versioning, reviewing, and deploying stored routines. In this section, we’ll compare two different conceptual approaches: imperative migrations and declarative schema management.

Managing stored routines with imperative migrations

Traditional database “migration” tools use an imperative approach, modeling schema changes in an incremental fashion. For example, a migration might describe the operation of adding a new column to an existing table. Each migration typically consists of a new file, which defines the exact action that must be taken to modify the schema. In most migration tools, the file names need to follow a specific convention, such as having a numbered or timestamped prefix which reflects their desired order of execution.

Some of these imperative migration tools claim to be “infrastructure as code”, simply by virtue of storing the migrations in a Git repo. However, the imperative migration workflow is really quite different from the application code workflow discussed above. It becomes especially awkward for developing stored routines. Consider some common development activities using a migration tool:

Adding a new stored procedure using a migration – You need to put its definition in a new migration file, following the tool’s migration file naming scheme. Even if the new procedure is related to some other pre-existing routines, you can’t arbitrarily organize related logic into the same file.

Editing a single line within an existing stored procedure using a migration – You need to add another new migration file, which repeats the entire new definition of the procedure. Code review is difficult, since your Git commit includes the entire body of the proc, not just the line that changed. The old definition is still in the repo in its original migration, and cannot ever be edited in-place. (Some migration systems do allow you to “squash” the repo to purge old intermediate migrations, but at the cost of losing all Git logging and blame information.)

Removing a stored procedure using a migration – You need to add yet another new migration file, which performs a drop of the procedure. If your migration system requires “down” (rollback) migrations, you need to include the entire previous body of the procedure as well. The old definition also separately remains in your repo in its original migration file, so you now have at least two redundant definitions in your repo – all for a chunk of code that isn’t even used anymore.

With a migration tool, the underlying conceptual paradigm is a sequential list of migrations, not a cohesive codebase. Migrations establish their own versioning and changeset abstraction, and then redundantly place this same information into another version control layer such as Git. It’s convoluted. Fortunately, there’s a better way.

Managing stored routines declaratively

Declarative schema management provides a true infrastructure-as-code experience which inherently solves these problems. Your declarative schema repo only tracks CREATE statements. You can simply add, remove, or modify these statements directly, just like the lifecycle for application code development. When it comes time to deploy, the schema management tool figures out how to convert your changes to SQL DDL, while providing extensive guardrails to ensure your changes are safe and correct.

For example, to change one line of an existing procedure, you just edit that single line of whichever file stores its CREATE statement. The schema management tool knows how to convert that one-line change into a valid DDL statement which inherently includes the full proc body, but without you having to specify that manually.

Declarative tooling allows you to manage your stored routines just like any other codebase, using Git the way it is intended:

You can organize your routines into files however you see fit.
You can browse all routine definitions, as a cohesive codebase, at any time.
You can easily search the codebase using grep or a graphical IDE.
If you need to see historical state, you can simply git checkout older commits as needed.
If you need to un-do a change, you can use git revert, just like with application code. You never need to pollute your codebase with “down” migrations.
When you’re changing an existing routine, code reviewers can easily see which lines have been changed in a commit or pull request.
You can use git log and git blame to understand the who, when, what, and why of previous changes, on a line-by-line basis.
You can easily use Git branches for development. Multiple developers can work on the same codebase, without any risk of mis-numbering sequential migrations. Conflicting changes are surfaced as ordinary Git merge conflicts.
You never need to periodically “squash” the schema repo. Outdated historical changes are in the Git history, not intermingled in your working directory.

Using Skeema for deploying stored procedures

Skeema is a declarative schema management tool for MySQL and MariaDB. Although it was originally designed for managing tables, Skeema has also fully supported stored procedures and functions since 2019, when Psyonix (creators of Rocket League) generously sponsored the open source implementation of this functionality.

We won’t repeat our guides to getting started with Skeema or its workflow here, but we’ll cover some recipes and tips specific to procedures and functions.

You can still use migrations for tables, if you want

While we firmly believe that the declarative approach is ideal for all database objects, we recognize that many engineers wish to continue using their existing imperative migration solutions for tables. Luckily it’s actually very easy to configure Skeema to only manage stored routines, while leaving table management to a different tool.

The subtle trick here involves Skeema’s ignore-table option. This regex filter option was originally contributed by GitHub’s database team in order to exclude gh-ost shadow tables from schema management. But you can also configure a regex that matches all possible table names, i.e. ignore-table=., in order to make Skeema only manage stored procedures and functions.

If you wish, you can use this right from the start when creating your first schema dump: skeema init --ignore-table=. --host .... In this case the ignore-table option will automatically be persisted to the generated .skeema config file.

Re-organize your definitions as needed

When creating your schema dump, skeema init places each database object’s CREATE statement in its own .sql file. However, you can freely rename these files or move statements between files afterwards, as long as they remain in the same database schema subdirectory. This allows you to group and organize your stored routines in whatever fashion you prefer.

Even if you’ve adjusted the statement locations, commands such as skeema pull and skeema format still work properly. Skeema is capable of updating statements in-place, regardless of what file they’ve been moved to.

Coding with graphical database clients

If you prefer editing stored procedures directly on the database, in a graphical client such as MySQL Workbench, you can easily keep this as part of your workflow – all while still benefiting from a full deployment pipeline. In this case, the best practice is to use your graphical tool to edit proc/func definitions on your dev database only. When you’re ready to put your changes into a Git commit, run skeema pull development (or whichever name you’ve configured for the dev environment) to export your changes to the filesystem. This way, you can commit the changes and go through whatever pull request and code review process your company uses.

Use relative symlinks for common utility procs/funcs

If you have multiple database clusters, you may need to push a set of common stored routines to all clusters. However, you want to avoid duplicating their definitions for each cluster, since this makes it difficult to keep future modifications in sync.

Symbolic links can solve this problem. Simply create a file common.sql in a directory without a .skeema config file, and place all of your cross-cluster utility procs and funcs in there. Then, inside each cluster’s schema subdirectory, create a relative symlink to common.sql. As long as it’s all in the same Git repo, relative symlinks generally work properly in both Git and Skeema.

Catch SQL syntax errors early to avoid bad deploys

When deploying stored procedures and functions, it is essential to ensure they contain valid SQL. Skeema actually checks this automatically: in order to introspect your CREATE statements, they are executed in a temporary location early in the process for each Skeema command. If any SQL syntax errors are present, the command will catch this, fail loudly, and skip operation on the entire directory. In the case of skeema push, this happens prior to executing any DDL on your real schema, so you don’t need to worry about a partial deploy failing mid-way simply due to a typo in a procedure definition.

By default, both skeema diff and skeema push also run linter checks and safety checks on any modified objects. Separately, you can run linter checks on all objects in your repo at any time by using skeema lint. These commands all have predictable exit code behavior, permitting easy use in CI pipelines and other automation.

Pay attention to DEFINERs

All routines have a notion of a DEFINER user. If you don’t specify a DEFINER clause when creating a routine, the user who created the routine becomes the DEFINER. By default, routines are executed using the privileges of the DEFINER user, unless the routine definition explicitly includes SQL SECURITY INVOKER.

On managed database-as-a-service platforms, definers can be problematic. Elevated privileges are necessary in order to set a DEFINER to a different user than the one running the CREATE, and these privileges are often unavailable in DBaaS platforms. For example, although AWS Aurora for MySQL 8.0 actually does provide the necessary SET_USER_ID privilege, traditional non-Aurora RDS does not.

If elevated privileges are not available on your database platform, for ease of deployment you’ll likely need to use the same DEFINER for all of your routines, and also ensure that you run Skeema with that same user. You may also need to stick with the same username across dev/stage/prod environments to ensure your schema definitions are portable. Alternatively, Skeema Premium users can leverage its automatic DEFINER stripping functionality to ensure these mismatched user clauses are removed whenever the necessary privileges aren’t available.

Another potential headache is missing or non-existent definer users, whether due to a typo in a DEFINER clause, or an intentional-but-poorly-planned DROP USER. In this situation, the corresponding routines become orphaned, and any queries invoking them will error. Skeema’s definer linting functionality can help prevent this situation.

Operational considerations in MySQL and MariaDB

From an operations perspective, the physical act of deploying a brand new stored procedure or function is a safe operation. However, modification or deletion of an existing routine carries more risk, since it will affect existing callsites. In this section, we’ll review safety considerations and edge cases.

Dropping a routine

Before dropping a routine, you’ll need to ensure it is no longer in-use anywhere, checking for callsites in application code as well as your schema definitions (other procs/funcs, views, triggers). If you have a declarative schema repo, you can simply grep or use your IDE’s search functionality.

Skeema considers dropping a routine to be an unsafe action, requiring use of skeema push --allow-unsafe, just to confirm you are certain that it is safe to drop.

Modifying a routine

When adjusting an existing routine, the situation depends on what part of it is being modified.

Changing the SQL statements in the body

In MariaDB, CREATE OR REPLACE syntax is available to atomically update a routine, and Skeema will use this automatically when possible. However, in MySQL, this syntax is not available, and modifying the body requires a DROP followed by a re-CREATE. This is not atomic – there’s a split second where the routine does not exist, and any query referring to it in that exact moment will fail. For this reason, Skeema considers this to be an unsafe operation in MySQL.

In a mission-critical application without periodic maintenance windows, this means MySQL users must devise a custom solution at the application level. One approach is to implement retries whenever SQL error 1305 (PROCEDURE / FUNCTION does not exist) is seen by your application’s database query wrappers.

Adjusting the parameters or return type

If you’re changing the parameters of an existing routine, or adjusting the return data type of a function, this is very likely to break existing callsites! Skeema considers this type of change to be unsafe, even in MariaDB.

For example, if you’re changing a stored procedure to take 3 parameters instead of 2, you’ll need to update all occurrences of the proc in your queries. This is especially challenging since there’s no way to atomically push application changes and schema changes at the exact same moment. To solve this problem, one solution is to use a naming scheme for routines which includes a version number. For example, instead of modifying an existing routine my_routine_v1, use a three-step process: create and deploy a new separate routine my_routine_v2; adjust all callsites to use the new routine; and then drop my_routine_v1 once it is no longer used.

Other modifications

A few minor properties of a routine – such as the COMMENT clause – are considered “characteristics”, which may be modified with an ALTER PROCEDURE or ALTER FUNCTION statement. This is safe and atomic, even in MySQL. Skeema will automatically use this syntax when possible.

Binary logging and functions

By default, if binary logging is enabled on the database server, creating a stored function requires the SUPER privilege, among a few other requirements. (This restriction applies to creation of stored functions specifically, as well as triggers – but not stored procedures.)

To bypass this restriction, enable the dynamic global variable log_bin_trust_function_creators. This is essential on managed database platforms that don’t provide SUPER access, such as RDS.

The original purpose of this restriction related to ensuring that functions are safe and deterministic for statement-based replication. Now that statement-based replication is deprecated in MySQL, the log_bin_trust_function_creators variable is deprecated as well. Even so, it is still necessary to enable this variable for now. This will likely remain the case for some time, until statement-based replication is entirely removed from a future MySQL release.

C++ UDFs

In this post, we’ve only addressed stored procedures and functions, but not native or loadable UDFs, which are written in C++ and compiled into shared libraries. Skeema does not currently support management of these UDFs, but we’re open to exploring this functionality if there is sufficient community interest.

Some features described in this blog post were newly added in Skeema v1.11. For the latest Skeema updates, follow us on GitHub, Twitter, or RSS.