Saturday, November 16, 2019

Drupal 6 - Ubercart 6: Display Chinese Characters in MySQL Output

CentOS 6 / Drupal 6 / Ubercart 2 / MySQL 14




Exporting Chinese characters to a file only works if the process pays respect to the
encoding of the data, so it can "understand" the character set involved.

Otherwise you get a bunch of garbage in our output file, like I did.  

Here's the story of how I fixed that problem.

First, why this article exists.  As part of our commitment to making the life of our customers as easy as possible, we decided to shift our website from a 3-column desktop-oriented interface to a fluid, responsive, mobile-phone friendly interface.  At the same time, we also decided to upgrade our version of CentOS and Drupal, because both of these products were near (or had reached) their End of Life (EOL) date.

This required us to totally rebuild our eCommerce website from its constituent parts, because there is no easy or clear migration path from Drupal 6 to Drupal 7, a real pain point for most people who rely on Drupal to run their business.

Luckily, I was a Technical Architect for a couple of Fortune 50 companies in a previous career, so I took on the challenge of migrating us from Drupal 6 to Drupal 7 - mostly because I could not find a person or organization who could do it for me in a timely, cost-effective manner (see advertisement at top of article).

So a lot of the work involved meant porting data from Drupal 6 to Drupal 7, including data that included information that had been encoded in UTF8.

While porting over Ubercart Addresses, to avoid forcing our customers to re-enter their address data, I noticed that some of their addresses, which had been entered in Chinese, were not being migrated properly.  Instead of chinese characters, we got a bunch of ????? characters, which is a classic indication of an encoding or decoding problem.

Heres' an example.  One of our Chinese customers has an address in their native language (zh-hant, or Traditional Chinese) that renders correctly within the Drupal 6 System, but not at the MySQL command line:

Example:

mysql> select street1 from uc_addresses where uid = "7981";
+----------------+
| street1        |
+----------------+
| ?????????2206? |
+----------------+

1 row in set (0.00 sec)

But within the Drupal 6 website interface, I can see the following information:


馬鞍山錦英苑錦強閣2206室

This means that the information has been encoded and stored correctly, but it is not being decoded correctly when shown either in the CentOS Command Line Interface (CLI), or the MySQL Command Line Interface (MCLI)

To show Chinese characters properly, CentOS needs to be able to use an enriched character set called UTF-8, which was designed to accomodate as many languages as possible, incluing multi-byte characters like Chinese, Japanese and Korean.

Click here for some information on UTF-8.

How to Ensure a UTF-8 Compliant Command Line Interface on CentOS 6:

To make sure you have an UTF-8 compliant Command Line Interface (CLI) on CentOS 6, you just need to make sure that you have Chinese Support installed.  

Here's how you do that:

# yum groupinstall "Chinese Support"

This will install a lot (~56Mb in my case) of software on your machine.  

During the software download and installation, I just bat back and watched the flood of installation information scroll by, hoping it would all work out.  It did.


CLI Testing:


Next, I tested to see if the CLI could display Chinese characters correctly by copying/pasting the characters that rendered correctly on the Drupal 6 Sytem interface directly into the CLI:

# 馬鞍山錦英苑錦強閣2206室

Success.


MySQL Testing:


This is starting to look like a local MySQL configuration issue.

I say this because everything else seems to understand UTF-8 encoding, but not MySQL.

So, what version of MySQL are we working with, exactly?

# mysql --version
mysql  Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (x86_64) using readline 5.1

So,  how do we get MySQL 14.14 to display UTF-8 properly?

Well,  an obvious place to look is within its own configuration with respect to encoding.

Here's how you do that:


mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

So, my MySQL server was configured to display latin1 characters when producing query results.  Obviously, this won't work well with Chinese, so let's change that setting to UTF-8 and see what happens:

mysql> set character_set_results = utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

OK, now it's time to test to see if MySQL now displays Chinese characters correctly:

mysql> select street1 from uc_addresses where uid = "7981";
+------------------------------------+
| street1                            |
+------------------------------------+
| 馬鞍山錦英苑錦強閣2206室              |
+------------------------------------+

1 row in set (0.00 sec)

Great Success!


REFERENCES:





No comments:

Post a Comment