Gabriel Nessi

Gabriel Nessi

Renaming SQL tables with low production downtime

Renaming messy SQL tables and column names to a cleaner format following best practices and reducing production downtime to the minimum.


Context#

Let's start by giving a bit of context. 365Talents is a startup that was born 8 years ago and, like many lean startups, the development was done by interns and various developers that are no longer working here, the technology we used back then is no longer the same technology we use today, we have managed to migrate big parts of our legacy code, from PHP to NodeJS, from Angular to VueJS and also SQL tables to a cleaner format following best practices. Yes, our database tables and columns were written in a mix of snake_case and PascalCase as well as both in French and English. 🤦🏻‍♂️

I know, right? Actually this is only an aesthetic problem, of course it works, but for developers it was a real pain to read, specially for the newcomers for whom French is not their native language, like me. Some efforts were done, for example we mapped the French column names to English in our ORM models, but still a lot of legacy queries in Knex were still using the French names.

What's the plan?#

At 365Talents we have a specific day every two weeks to handle the technical debt, and renaming the SQL tables was a perfect candidate. Our starting point was to gather some best practices that fitted our needs:

  • English only
  • snake_case
  • Singular names for tables and columns, for instance user or skill
  • Relation tables for many to many relations have to be named table1_table2 table1 being the most important entity of the two, for instance user_skill
  • ORM models would also have a singular name

Once the rules were defined, well, everything was set, let's migrate some names! err... no... 😟 In the first migration, we had a 5 min downtime in production and it was just a tiny table having not so many columns, thankfully we started with something simple.

We still had plenty of tables to rename, we had to look for a solution to reduce the downtime, even though, the deployment is always done during lunch time in France, we also have users in other time zones that might still be affected.

What was the issue ?#

Let's say we have a table utilisateur that needs to be renamed to user. We create the migration that does exactly that, and separately, we replace by user all the instances in the codebase where we use utilisateur. As we merge our PR into master, the shipping process will first run the migration, renaming utilisateur to user. That happens instantly in the database of our clients, while the current production code is still pointing to the table utilisateur, so this creates a downtime as the table utilisateur cannot be found until the new code pointing to the user table is deployed. As shown in the diagram below.

Diagram #1 Diagram representating the downtime

The downtime happens from the moment we rename the table until the backend server is restarted with the new code as shown in the diagram #2. The time where your users cannot access the platform can be significant depending on the logic of the migration (renaming columns and triggers for instance).

Diagram #2 Diagram example of the downtime during deployment without using create view

Using SQL CREATE VIEW#

SQL CREATE VIEW allows you to copy the table as soon as you have renamed it, we have to give our view the old name, as at that point of the migration our production code is still pointing at utilisateur, meaning that the only downtime is between the renaming and the creation of the view, as shown in the diagram #3

Also, it supports inserts and updates meaning that no data is lost from the moment we create the view. The view table only serves its purpose until node is restarted, as the newly deployed code is now pointing at the table user. Meaning the view table can now be deleted by another migration later.

Diagram #3 Diagram example of the downtime during deployment using create view

Below, you can see a SQL migration example using CREATE VIEW. The function up is to migrate the database and down to rollback the migration.

import type { Knex } from 'knex';

export async function up(knex: Knex): Promise<void> {
  await knex.raw('RENAME TABLE utilisateur TO user');

  //...

  // create view
  return knex.raw('CREATE VIEW utilisateur AS SELECT id, name, lastname, email FROM user');
}

export async function down(knex: Knex): Promise<void> {
  // delete view
  await knex.raw('DROP VIEW utilisateur');

  //...

  return knex.raw('RENAME TABLE user TO utilisateur');
}

There is another tip to potentially reduce downtime depending on what version of MySQL you are using. At the time we were on 5.7 and by comparing the use of Knex rename functions with raw SQL, we found out that using knex.raw() was more performant due to Knex ensuring nothing is buggy on the MySQL side when renaming columns and tables and having constraints and triggers referencing these tables or columns.

For instance :

// This would take longer
export async function up(knex: Knex): Promise<void> {
  return knex.schema.renameTable('utilisateur', 'user');
}

// This would be faster
export async function up(knex: Knex): Promise<void> {
    return knex.raw('RENAME TABLE utilisateur TO user');
}

Conclusion#

There is still a downtime as you could see in the diagram #3, but it only takes around 20 seconds to rename a table.

There are different solutions to reduce downtime without using a view but from 5 minutes to 20 seconds, we considered it was not worthy to invest more time.

Thank you for reading folks! 🤠

Gabriel