Docs: Features: SSH Tunnels

Skeema Premium Edition includes the ability to dynamically create SSH tunnels. When enabled, this permits Skeema to launch database connections through an intermediate remote machine, and/or to establish database connections “locally” on a database server host which is actually remote.

Skeema’s SSH implementation includes automatic interactions with the OpenSSH ecosystem of configuration files, authentication agents, identity files (private keys), and known hosts files.

SSH tunnels are configured through the ssh and ssh-to-db options, which can be used individually (single-hop tunnel) or in combination (two-hop tunnel). The functionality is designed to handle any of these use-cases:

  • SSH tunnel through an intermediate bastion host / jump box. Typically this is motivated by security reasons, such as environments that restrict where direct database connections can be made. This use-case is handled by the ssh option.
  • SSH tunnel directly to each database server host, for situations where database administrative users (with elevated grants/privileges) are restricted to localhost connections. This use-case is handled by the ssh-to-db option.
  • Two-hop SSH tunnel, first to an intermediate bastion / jump box, and then to the database host. This use-case involves configuring both ssh and ssh-to-db options at the same time.
  • Executing all DDL wrapper scripts (alter-wrapper or ddl-wrapper) from a common location, either centralized (ssh option) or on each individual database host (ssh-to-db option). This has many potential benefits:
    • Security (restricting who can run OSC, and auditing these operations)
    • Locking (preventing concurrent execution)
    • Uniformity (ensuring the same version of pt-osc, gh-ost, or spirit)
    • Portability (removing the need to get Perl-based pt-osc working on Windows or MacOS)
    • Network performance (running an OSC tool directly on a database host)

Bastion hosts / jump boxes

SSH tunnels through a bastion host / jump box are configured using the ssh option. At minimum, set the option value to the name of the intermediate host to SSH to, or a hostname alias defined in your ~/.ssh/config OpenSSH configuration file.

You may optionally also include an SSH username, SSH password, and/or nonstandard SSH port by specifying the option value in format [username[:password]@]hostname[:port], for example ssh=mybastion.com:2222, ssh=myuser@mybastion.com, ssh=myuser:mypassword@mybastion.com:2222, and so forth. These optional components can be omitted to instead use values from an SSH configuration file, falling back to normal SSH defaults if omitted there as well (OS username, port 22, etc).

SSH tunnel on database host

In some companies, database administrative grants/privileges are restricted only to localhost connections on each database host. Skeema’s ssh-to-db boolean option can be used in such situations. When enabled, Skeema will establish an SSH tunnel to each database server host, for use in all database connections on that specific host. If a socket is also configured, database connections will use the specified UNIX domain socket for localhost connections. Otherwise, if no socket is configured, database connections will use TCP/IP to the configured database port over the loopback interface (127.0.0.1).

Since ssh-to-db is a boolean option, it does not permit directly configuring an SSH username or non-standard SSH port. By default, the current local OS username will be used, along with the standard SSH port of 22. To override this behavior, use an OpenSSH configuration file ~/.ssh/config with entries matching your host values. See the SSH configuration file section below for more information.

The ssh-to-db option is compatible with SSH hostname aliases. For example, in Skeema’s configuration, you could configure host=mydbalias and enable ssh-to-db, even if “mydbalias” is not defined in DNS, as long as your ~/.ssh/config contains a configuration section mapping “mydbalias” to a real HostName. This works because whenever ssh-to-db is enabled, database connections are made to localhost / 127.0.0.1, so no DNS lookup is performed for the database connection.

In managed database-as-a-service environments (AWS RDS, Google CloudSQL, etc), ssh-to-db cannot be used, as these services typically do not provide SSH access to the underlying database server host.

Two-hop tunnels

If both ssh and ssh-to-db are enabled, Skeema will create a two-hop SSH tunnel: first to the bastion/jump-box specified by ssh, and then to each database host. This combination is only necessary in environments which severely restrict both where database connections can be made and where tool binaries like skeema may be installed.

All SSH configuration and identity files will be used from your local machine, for both hops. If your local machine has an ssh-agent, it will be used for both hops automatically. (This does not require or use agent forwarding. Skeema does not ever set up agent forwarding, and it ignores the ForwardAgent OpenSSH configuration option.)

SSH agents

If an OpenSSH authentication agent (ssh-agent) is running, Skeema will communicate with it in order to supply your identities (SSH keys) for authentication with the remote server.

On Linux and MacOS, Skeema uses the UNIX socket location specified by $SSH_AUTH_SOCK.

On Windows, the named pipe \\.\pipe\openssh-ssh-agent is used. Please note that Skeema can communicate with the Windows port of the OpenSSH ssh-agent, but not the PuTTY agent pageant.

SSH configuration file

Skeema will use your ~/.ssh/config OpenSSH configuration file for the following settings:

  • User
    • If a username is specified directly in the ssh option value, that takes precedence over ~/.ssh/config.
    • If no username is specified in the ssh option value, nor in ~/.ssh/config, then the current OS username is used by default.
  • Port
    • If a port is specified directly in the ssh option value, that takes precedence over ~/.ssh/config.
    • If no port is specified in the ssh option value, nor in ~/.ssh/config, then port 22 is used by default.
  • HostName
    • Specifies the “real” hostname or IP address to use in place of an SSH alias.
  • IdentityFile
    • Specifies an SSH identity file (private key) to use, instead of (or in addition to) using an SSH agent.
    • If not specified, defaults to ~/.ssh/id_rsa and/or ~/.ssh/id_dsa if these files exist.
    • Without an SSH agent, encrypted private key files may be used if STDIN is a TTY; Skeema will interactively prompt you for the passphrase in this situation.
    • If an SSH agent is running, Skeema assumes that all encrypted keys are managed by the agent, and will not prompt for key decryption passphrases interactively.
  • UserKnownHostsFile
    • Specifies the location of the known_hosts file, or defaults to ~/.ssh/known_hosts if that file exists. If the file does not exist, Skeema will not create it.
    • If the file does not exist or cannot be read, and StrictHostKeyChecking is yes, this is treated as a fatal error.
    • If the file does not exist or cannot be read, and StrictHostKeyChecking is not yes, Skeema will log a warning and skip all behavior relating to known_hosts. This reduces security, but may be preferable in non-interactive environments, or when using ephemeral hosts.
    • Ordinarily, if a host’s public key has changed from what was recorded in known_hosts, Skeema will refuse to connect. Overriding this behavior is not recommended (due to risk of MitM attacks), but may be achieved by setting UserKnownHostsFile /dev/null along with StrictHostKeyChecking no.
  • StrictHostKeyChecking
    • Controls handling of connection to hosts that are not already found in the UserKnownHostsFile. See OpenSSH configuration manual for a description.
    • May be set to ask (the default), yes, or no.
    • With StrictHostKeyChecking ask, it is a fatal error if STDIN is not a TTY and the known_hosts file exists but lacks an entry for the host.

To debug SSH configuration and authentication problems, enable Skeema’s debug option to increase the amount of logging detail.

Special behaviors with SSH tunnels

Whenever an SSH tunnel is used (whether via ssh and/or ssh-to-db), several important modifications are automatically made to Skeema’s behavior.

OSC programs and DDL wrappers

Any configured alter-wrapper or ddl-wrapper will be executed on the remote host, rather than locally. Be sure to specify an absolute path for the wrapper. This should be a path on the remote host.

If you’re using a two-hop tunnel (setting both ssh and ssh-to-db at the same time), the wrapper will be executed on the database host, not the intermediate bastion/jump-box.

If your tables are large, consider putting a terminal multiplexer such as screen or tmux at the beginning of your configured alter-wrapper or ddl-wrapper command-line, in order to ensure that loss of SSH connectivity does not interrupt a long-lived online schema change operation.

If you intentionally detach/background a terminal multiplexer (e.g. CTRL-A CTRL-D in screen), Skeema will think your wrapper script has completed successfully, causing Skeema to immediately move on to the next operation – while the previous wrapper script continues to run remotely in the detached screen. If your push involves ALTERs to multiple tables, this means you may end up with multiple concurrent online schema changes running at once, for better or worse!

SSL / TLS

When using an SSH tunnel, if no ssl-mode has been configured, it will default to disabled instead of the usual default of preferred. This is done to prevent unnecessary double-encryption, since the SSH tunnel is already encrypted.

To override this behavior, for example if you need to connect with a database user account that has the REQUIRE SSL attribute on the server side, you can explicitly set ssl-mode to either “required”, “verify_ca”, or “verify_identity”. However, be aware that the double-encryption will increase latency.

Timeouts

Query timeouts and socket write timeouts specified in connect-options (readTimeout and writeTimeout, respectively) are ignored with an SSH tunnel. No timeout is enforced at the query or socket level. Regardless, Skeema automatically sends keep-alive messages to keep the SSH tunnel active, and will automatically abort if the SSH server remains non-responsive after a grace period of several minutes.

Non-interactive use

When using an SSH tunnel in a non-interactive / automated use-case, take caution to avoid situations that prompt on STDIN. Refer to the STDIN section of the Pipelines and Automation documention.