Migrating Drupal to Mysql utf8mb4

Written by
Shubhra Prakash Nandi's picture

Drupal is a very popular content management system (CMS) or rather as they like to call it a content management framework (CMF). Drupal runs a very large number of websites around the world from high traffic websites to personal ones. This popularity of Drupal means that it should support as many languages and characters as possible to truly serve it's user base. Mysql among others is a popular database which powers Drupal's backend. But due to performance reasons Mysql left out true UTF8 (unicode) character support in it's databases by default and opted for it's own version of UTF8 character set implementation which is actually three bytes large instead of the usual four bytes. This article will talk about how to migrate to true UTF8 support in Mysql which Mysql calls as utf8mb4 (UTF8 Multibyte 4) from their three byte UTF8 character encoding. This migration is necessary since otherwise you may get truncated characters stored in database and loose data updated by the user if the character is a four byte long UTF8 character.

There exists a Drupal module https://www.drupal.org/project/utf8mb4_convert to take care of this migration but as a database developer and administrator I wanted more control over how exactly the data and tables are migrated. So I have outlined the steps below, which bypasses using this module completely and take care of migrating to uft8mb4 offline. Replace the placeholders enclosed in <> according to your database.

Important: Take backup of your database and webserver files before you attempt the below steps. Incorrectly executing the below steps can result in data loss. You have been warned.

1. Take Mysql dump of the database structure alone i.e. database creation and table creation statements only.

mysqldump --no-data --routines --databases <db name> -u <db user> -p > <db name>_DDL_dump.sql

 

2. Now manipulate the above dump file as below.

sed -i 's/\/\*!40100 DEFAULT CHARACTER SET latin1 \*\///g' <db name>_DDL_dump.sql
sed -i 's/ utf8 / utf8mb4 /g' <db name>_DDL_dump.sql
sed -i 's/ latin1 / utf8mb4 /g' <db name>_DDL_dump.sql
sed -i 's/ DEFAULT CHARSET=\(utf8\|utf8mb4\) / /g' <db name>_DDL_dump.sql
sed -i 's/ DEFAULT CHARSET=\(utf8\|utf8mb4\);/;/g' <db name>_DDL_dump.sql
sed -i 's/ COLLATE[ =]\(utf8_bin\|utf8mb4_bin\) / /g' <db name>_DDL_dump.sql
sed -i 's/ COLLATE[ =]\(utf8_bin\|utf8mb4_bin\);/;/g' <db name>_DDL_dump.sql
sed -i 's/^\/\*!.*\*\/;$//g' <db name>_DDL_dump.sql

 

3. The above statements will work incase Drupal was installed using default language 'English'. You can adjust the above according to the character set in which Drupal was installed if default language was not used during installation.

 

4. Now add the below statements in Mysql configuration file.

Under [mysqld]

innodb_large_prefix            = true
innodb_file_format            = barracuda
innodb_file_per_table        = true

character_set_server        = utf8mb4
collation_server            = utf8mb4_unicode_ci

Under [client]

default_character_set        = utf8mb4

Under [mysql]

default_character_set        = utf8mb4

 

5. Now take Mysql dump of all data in the database

mysqldump --no-create-info --no-create-db --complete-insert --quick --databases <db name> -u <db user> -p > <db name>_DML_dump.sql

 

6. If the above dump was created successfully, then login to Mysql as root user and execute the below SQL statement.

ALTER DATABASE <db name> CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci

 

7. Now import the DDL and DML dump files using mysql client in sequence as shown below.

mysql -u <db user> -p < <db name>_DDL_dump.sql
mysql -u <db user> -p < <db name>_DML_dump.sql

 

8. Once the DDL and DML files have been imported successfully in above step, execute the below statement to update database to tell Drupal that it can now issue SQL statements inline with utf8mb4 requirement.

echo "USE <db name>; INSERT INTO variable SET name='drupal_all_databases_are_utf8mb4', value='b:1;'; COMMIT;" | mysql -u <db user> -p

 

9. Now add the below parameters to the databases array in Drupal settings.php file.

'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',

 

10. Restart Mysql server. Restart webserver and/or php-fpm server.

Zircon - This is a contributing Drupal Theme
Design by WeebPal.