MySQL 5.7 to 8: Schema and DDL Changes

By Evan Elias  |  February 28, 2024

MySQL 5.7 reached its end-of-life in October 2023, and managed database products such as Amazon RDS are starting to transition to expensive “extended support” pricing for 5.7 users. Before you jump on a last-minute MySQL 8 upgrade, it is essential to understand the major functional differences between these database versions. In this post, we’ll provide a deep-dive on the upgrade implications for schema management and DDL operations.

MySQL 8 schema gotchas

MySQL 8 includes a massive number of excellent new features. However, several behavioral changes in MySQL 8 can affect schema management in subtle ways, causing surprising problems after upgrading. When planning an upgrade from MySQL 5.7, be sure to check your table definitions and server configurations for these common land-mines.

Spatial indexes and SRIDs

MySQL 8’s query planner only uses a spatial index if the indexed column has an SRID attribute. However, the SRID attribute did not exist in MySQL 5.7. This means that upon upgrading to MySQL 8, your existing spatial indexes are no longer usable for query optimization. You must drop and recreate all such indexes after upgrading.

Starting in Skeema v1.11, spatial indexes which lack an SRID are flagged by Skeema’s linter if the server is running MySQL 8.

Default collation for utf8mb4

The utf8mb4 character set is the modern preferred choice for arbitrary textual data, since it can encode any unicode character. Previously, its default collation was utf8mb4_general_ci, which doesn’t conform to any UCA standard. This old collation considers all supplementary characters (such as emoji) to be equal in comparisons, which is especially problematic for uniqueness constraints.

MySQL 8 introduces new collations to solve this issue, and the default collation for utf8mb4 has been updated accordingly: it is now utf8mb4_0900_ai_ci, which is based on UCA 9.0.0. This is a positive improvement, but the default change causes upgrade complications with logical dumps, as well as schema management. If you execute a CREATE TABLE statement which includes CHARACTER SET utf8mb4 columns, but doesn’t explicitly specify a COLLATE clause, that logical statement now differs in meaning between MySQL 5.7 vs MySQL 8. The exact impact of this depends on your chosen method of upgrade.

  • Dump-and-reload upgrade: Since mysqldump relies on SHOW CREATE TABLE (which omitted default-for-the-charset COLLATE clauses in 5.7), affected columns will change from utf8mb4_general_ci to utf8mb4_0900_ai_ci when you load the dump into MySQL 8. Switching to utf8mb4_0900_ai_ci is generally beneficial, but may require extra application testing for supplementary unicode input and the resulting sort order and string equivalency behaviors.

  • Binary in-place upgrade: These columns will stay on utf8mb4_general_ci. However, when using a schema management solution, you must now update your repo of table definitions to ensure your utf8mb4 columns are explicit about what collation is intended. Otherwise, re-running your schema definitions (e.g. when spinning up a test/CI sandbox database) will result in use of utf8mb4_0900_ai_ci, which now differs from your prod DB. If you use Skeema, this is easy to solve: after upgrading to MySQL 8, simply run skeema pull a single time to re-normalize your table definitions. This will rewrite your CREATE TABLE statements using MySQL 8’s SHOW CREATE TABLE behaviors, which now always include an explicit COLLATE clause for utf8mb4 columns.

Server default charset and collation

Previously, the server variables character_set_server and collation_server defaulted to latin1 and latin1_swedish_ci, respectively. In MySQL 8, these variables now default to utf8mb4 and utf8mb4_0900_ai_ci. These variables behave like cascading defaults for DDL: they affect CREATE DATABASE statements which don’t specify a charset or collation, and in turn the database-level defaults then affect CREATE TABLE statements which similarly omit charset and collation.

Generally these changes won’t affect your existing tables upon upgrade, regardless of your upgrade method. They typically don’t affect Skeema either, since skeema init will automatically track database-level defaults in .skeema config files, meaning that the server-level defaults are not relied upon. However, other database tools and migration systems may be affected if they aren’t careful about always specifying the charset or collation.

Foreign keys and metadata locking

When running some types of DDL on a table, MySQL 8 obtains additional metadata locks across foreign key constraint relationships to other tables, as noted in the manual:

Metadata locks are extended, as necessary, to tables related by a foreign key constraint to prevent conflicting DML and DDL operations from executing concurrently on the related tables. When updating a parent table, a metadata lock is taken on the child table while updating foreign key metadata. Foreign key metadata is owned by the child table.

Previous versions of MySQL did not perform this behavior. While this change is conceptually “correct”, it can have severe operational consequences on heavy users of foreign key constraints. DDL on a table can now be blocked by long-running SELECT queries on the other side of a foreign key. Worse still, since a blocked DDL statement is considered higher priority than other metadata lock requests, this blocked DDL will in turn block any other new incoming queries on these tables, even simple reads. Typically this manifests as a major query pile-up which lasts until the long-running SELECTs all complete, followed by the DDL completion.

Automatic behaviors of timestamp columns

In MySQL 8, the explicit_defaults_for_timestamp server variable now defaults to ON, which prevents some nonstandard legacy behaviors of the first timestamp column per table. In previous versions of MySQL, this variable defaulted to OFF, causing the first timestamp per table to assume special DEFAULT and ON UPDATE behaviors.

This change won’t affect your existing tables upon upgrading, regardless of how you perform the upgrade. Logical dump tools such as mysqldump or skeema init rely on SHOW CREATE TABLE, which will fully include any DEFAULT and ON UPDATE clauses which were automatically created by explicit_defaults_for_timestamp=OFF. However, this change can affect any old hand-written table definitions in your schema repo, as these CREATE TABLE statements may have been written assuming the nonstandard timestamp behaviors would apply. For this reason, when using Skeema we recommend running skeema pull once after upgrading to MySQL 8, to rewrite all table definitions using MySQL’s canonical dump format.

DDL statement overhead

MySQL 8’s new internal data dictionary implementation provides important robustness improvements, but that comes at a cost of more overhead per DDL statement. This isn’t problematic for one-off DDL usage, but it can become noticeable in situations which rely on rapidly executing hundreds or thousands of DDL statements.

Typically this can come up in integration test scenarios, where a database schema needs to be created and torn down for each test. With some traditional imperative migration tools, the entire history of migrations needs to be re-run each time, which gets increasingly slow. Some tools offer the ability to periodically “flatten” the full schema definition, which can mitigate the MySQL 8 performance loss in this situation. Meanwhile, declarative schema management tools such as Skeema avoid this problem by inherently always being flattened, using Git for tracking the history of each CREATE statement natively.

Integer display widths removed

Integer types in MySQL now omit display widths. For example, in SHOW CREATE TABLE and information_schema, you’ll just see int instead of int(11). Similarly, if you try to specify a display width on a new column, the server will just ignore that attribute.

Display widths typically have no effect whatsoever and are a common cause of developer confusion, so this is a positive change, but nonetheless it can be surprising. The first time you run skeema pull after upgrading to MySQL 8, display widths will be stripped from your *.sql files.

The only exception to this change is tinyint(1) or its alias bool. These will still show as tinyint(1) as a visual indicator that the column is likely used as a boolean value.

New table functionality in MySQL 8

Upgrade headaches aside, MySQL 8 provides a ton of benefits over earlier versions. Here are some of the impactful new features relating to table design.

CHECK constraints

CHECK constraints provide a data validation mechanism for writes to tables. Every inserted or updated row is checked for validity against the table’s CHECK constraint expressions. The write will fail with an error if any new or updated row causes any CHECK constraint expression to return false.

Column default value expressions

Prior versions of MySQL only supported literal constant values in the DEFAULT clause of a column definition. In MySQL 8, you may now use arbitrary expressions for column default values as well.

Expression defaults are indicated by wrapping the expression in parentheses. You can even use expression defaults for data types which prohibit literal constant defaults (BLOB, TEXT, JSON, GEOMETRY).

Invisible columns

Columns can be marked with the INVISIBLE attribute to exclude them from SELECT * queries. This is useful when you need to add a new column to a table without breaking legacy applications.

Spatial reference systems

MySQL 8 provides extensive support for spatial reference systems for geometry types and spatial indexes. Spatial columns may be restricted to a specific spatial reference system using the SRID attribute. Prior versions of MySQL did not provide this attribute, and only supported Cartesian calculations.

Descending index parts

Previously, all indexes in MySQL were stored in ascending order. This meant that if an ORDER BY clause specified a mix of ASC and DESC columns, an index could not be fully used to avoid a sort operation, even if the index otherwise contained the correct columns. MySQL 8’s support for descending index parts provides a solution in this situation.

Functional indexes

MySQL 8 supports functional indexes, which permit indexing of arbitrary expression values directly. This feature is similar to the idea of indexing a VIRTUAL column (available since MySQL 5.7), but without having to create or name a column explicitly.

Multi-valued indexes over JSON

As a special case of functional indexes, JSON arrays may be indexed using multi-valued indexes, where multiple index entries point to the same row. This feature also provides a subtle way to implement partial indexes in MySQL, since empty arrays are omitted from the index entirely.

DDL and operational benefits in MySQL 8

Some of MySQL 8’s most important changes center around execution of DDL, solving major operational hurdles from prior releases.

Instant ALTER TABLE

Several common forms of ALTER TABLE can now be completed as an instantaneous metadata-only operation, including most situations involving:

  • Adding a column
  • Dropping a column (8.0.29+)
  • Renaming a column (8.0.28+)
  • Adding or removing a column’s DEFAULT clause
  • Adding new elements to the end of an ENUM or SET value list
  • Renaming a table

Some limitations and edge cases may apply. Be sure to add an ALGORITHM=INSTANT clause to your DDL (or --alter-algorithm=instant in Skeema) if you want the statement to fast-fail if it doesn’t support instant mode.

Fast DROP TABLE

MySQL 8.0.23+ solved a longstanding performance issue when executing a DROP TABLE on a server that has a large InnoDB buffer pool. When combined with a disabled innodb_adaptive_hash_index and a filesystem that supports fast unlinking (e.g. XFS), table dropping is now nearly instantaneous, regardless of table size, buffer pool size, or presence of table pages in the buffer pool.

Multi-threaded secondary index creation

MySQL 8.0.27+ has the ability to build secondary indexes using multiple threads. This feature can improve ALTER TABLE ... ADD INDEX performance quite substantially, as long as your system has the spare resources to dedicate to this.

Invisible indexes

You can now mark an index as INVISIBLE to prevent it from being used by the query planner. This feature provides a useful mechanism for quickly testing whether it would be safe to drop an index.

Generated invisible primary key

In MySQL 8.0.30+, you can configure the database server to automatically add an invisible primary key to any new table which lacks one. However, we generally recommend using Skeema’s linter to catch PK-less tables instead, since explicit primary keys are clearer, and you can control annotation severity (warning vs error) and even configure permissible data types for primary key columns.

Auto-increment counter persistence

Prior to MySQL 8, if you explicitly increase a table’s next auto_increment counter using ALTER TABLE ... AUTO_INCREMENT = X, the elevated value would be lost upon any restart of the database server. In MySQL 8, these alterations persist across restarts.

Atomic DDL

Last but certainly not least, MySQL 8’s new internal data dictionary implementation ensures that DDL statements are atomic and crash-safe. Since the old table .frm files no longer exist, there is no risk of orphaned tables or inconsistencies between the filesystem and data dictionary.

Need assistance with a MySQL 8 upgrade? In addition to developing Skeema, we provide expert consulting services for MySQL and MariaDB. Reach out to enquire about a hands-on consulting engagement or advisory retainer.