Fixing a MySQL Character Encoding Mismatch

Posted in: WordPress, Development

We ran into an interesting MySQL character encoding issue at Crowd Favorite today while working to upgrade and launch a new client site.

Here is what we were trying to do: copy the production database to the staging database so we could properly configure and test everything before pushing the new site live. Pretty simple right? It was, until we noticed a bunch of weird character encoding issues on the staging site.

Character Encoding Issue

It turned out that while the database tables were set to a Latin-1 (latin1), the content that populated those tables was encoded as UTF-8 (utf8). A variety of attempts to fix this failed, but what succeeded was as follows:

  1. Export the data as Latin-1. Because MySQL knows that the table is already using a Latin-1 encoding, it will do a straight export of the data without trying to convert the data to another character set. If you try to export as UTF-8, MySQL appears to attempt to convert the (supposedly) Latin-1 data to UTF-8 - resulting in double encoded characters (since the data was actually already UTF-8).
  2. Change the character set in the exported data file from ‘latin1′ to ‘utf8′. Since the dumped data was not converted during the export process, it’s actually UTF-8 encoded data.
  3. Create your new table as UTF-8 If your CREATE TABLE command is in your SQL dump file, change the character set from ‘latin1′ to ‘utf8′.
  4. Import your data normally. Since you’ve got UTF-8 encoded data in your dump file, the declared character set in the dump file is now UTF-8, and the table you’re importing into is UTF-8, everything will go smoothly.

I can confirm that a half-dozen or so variations on the above do not work. This includes INSERT INTO newdb.newtable SELECT * FROM olddb.oldtable;.

Also, if you’re doing this for a WordPress1 site (like we were), keep in mind that copying over the production database will generally mean that WP-Cache is enabled. You’ll want to remember to turn that off. Yeah. ;)

  1. This is a fairly common issue in older WordPress installs because the MySQL database default is commonly Latin-1, and older versions of WordPress did not specify the character set when creating the database tables (so they would default to Latin-1) and the default encoding in the WordPress settings is UTF-8. [back]

Popularity: 20% [?]

Posted March 6th, 2008 @ 12:13 AM

29 Replies

  1. Wp Wordpress » Blog Archive » Fixing a MySQL Character Encoding Mismatch adds this Pingback:

    […] Read it at the source […]

    March 6th, 2008 at 1:13 am

  2. Baris Unver adds this Comment:

    I didn’t understand the second step. How do we do that?

    March 6th, 2008 at 1:45 am

  3. bdtmz adds this Comment:

    baris unver, open the file in ur fav text editor and use find & replace function to change latin1 to utf8.

    March 6th, 2008 at 3:41 am

  4. jgb adds this Comment:

    Step 1 states “Export the data as Latin-1″.

    Forgive my obvious senility, but does this export take place from the source database, or from the destination database after an import has been done and the character encoding issue has been discovered?

    March 6th, 2008 at 5:27 am

  5. Alex adds this Comment:

    Step 1 should be from the source database.

    March 6th, 2008 at 7:25 am

  6. kosir adds this Comment:

    I had a similar problem on my blog and all I had to do was to add // to one line in wp-config.php

    I changed

    define(’DB_CHARSET’, ‘utf8′);

    to

    // define(’DB_CHARSET’, ‘utf8′);

    And everything was fine.

    March 6th, 2008 at 7:27 am

  7. Alex adds this Comment:

    That change won’t affect data that’s already in the database.

    March 6th, 2008 at 7:28 am

  8. shuron adds this Comment:

    Thank you I had this issues too (not with Wordpress).

    March 6th, 2008 at 7:59 am

  9. big Dog adds this Comment:

    I am not sure what this implies. I have an old WP installation. The options are set for UTF 8. In my database there is a column that reads collation and it is Latin 1 sweedish.

    Does this affect performance? Does it need to be corrected? What problems will it cause, if any?

    Thanks…

    Big Dog

    March 6th, 2008 at 9:24 am

  10. g30rg3_x adds this Comment:

    Well that method is kinda the risky and actually its not the preferable way to do this, because if your blog uses special some special chars like the “ñ” in Spanish languages this chars would be converted literally to trash…
    For this The Codex has a special page documenting this problem a the “preferable” way to do this:
    http://codex.wordpre[...]aracter_Sets

    Also deleting the charset setting inside wp-config its not a good solution, because the connections would use the default mysql charset (which is latin1 by default) so if some hosting provider or even you set this value to other things like utf8 the problem will persist.
    So if you don’t want to change the charset of your WordPress based website, its a preferable that you set explicitly the charset in wp-config.

    Greetings from Mexico

    March 6th, 2008 at 9:24 am

  11. Kirk M adds this Comment:

    Thank you Alex. You’ve solved a long standing conundrum I couldn’t track down so I followed your instructions and now I have an entire MySQL DB all encoded in utf8 (WordPress install that’s been around since 2.0).

    Basic maintenance and backing up of DB’s, exporting and importing DB’s tables etc is no problem, anything more than that I’m mostly in the dark so a question after the fact:

    It seemed that reimporting a DB table after modifying required the original table to be dropped. Attempting to import the modified table while the original table was still present in the DB resulted in a SQL error “table already exists”. I take it that dropping the original table before reimporting is the normal way of doing these types of things?

    March 6th, 2008 at 10:21 am

  12. cgilkison adds this Comment:

    There is a plugin that will do this all for you. I’ve used it successfully up to WP 2.3. Its in the WP repository - UTF-8 Database Converter

    March 6th, 2008 at 2:35 pm

  13. Quasi.dot › Lettere accentate, MySQL e Wordpress adds this Pingback:

    […] qualche altro elemento a piacere. Capita. Non dovrebbe, ma capita. Parte della soluzione è qua. E si traduce […]

    March 6th, 2008 at 3:07 pm

  14. Kerim Friedman adds this Comment:

    Thanks Alex (and all the commenters) for documenting and offering solutions to an issue which has lost me many nights of sleep in the past.

    March 6th, 2008 at 4:54 pm

  15. Tienshiao adds this Comment:

    The solution to this problem is also documented in MySQL’s docs:

    http://dev.mysql.com[...]r-table.html

    “The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:

    ALTER TABLE t1 CHANGE c1 c1 BLOB;
    ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

    The reason this works is that there is no conversion when you convert to or from BLOB columns.”

    March 7th, 2008 at 1:21 am

  16. » Apie WP DB koduotes ir problemų sprendimo būdus (nepamirškite … blogeriai.lt adds this Pingback:

    March 7th, 2008 at 4:35 am

  17. Alex adds this Comment:

    I should clarify, the reason we couldn’t use the database converter scripts or convert the columns to binary, etc. was because the source database was still live and in production. We had to copy the data from the production database to the staging database without modifying the production database.

    That is an important point and renders many of the other options mentioned here non-viable.

    March 7th, 2008 at 10:30 pm

  18. Jinge adds this Comment:

    I guess the same procedure will work from iso-8859-1 also?

    March 9th, 2008 at 12:16 pm

  19. Alex adds this Comment:

    Latin 1 is ISO-8859-1.

    March 9th, 2008 at 4:18 pm

  20. UTF-8 - Umstellung (WordPress-Optimierung, Teil 1) : fob marketing adds this Pingback:

    […] der Datenbank auf UTF-8 - nicht meinen - entdeckte ich wie gesagt kürzlich über den WP-Ticker auf alexking.org. Dort wurde empfohlen, die alte Datenbank bewusst Latin-1 kodiert zu exportieren, den Zeichensatz […]

    March 11th, 2008 at 2:17 am

  21. Exocert.com » Recuperar un Wordpress adds this Pingback:

    […] (Actualización: El error más común es que el juego de caracteres de tu mySQL no coincida con el de Wordpress - utf…) […]

    March 13th, 2008 at 8:38 am

  22. Nuno Leite adds this Comment:

    Where todo the export as latin-1 ?
    In Cpanel?
    In pnpMyAdmin?

    March 15th, 2008 at 6:15 am

  23. Kjetil Flekkoy adds this Comment:

    Hi
    Quite new to this, but still the same problem. I wonder:
    Why would I have to do step 3 (create a new table) instead of just importing the dump tables? (After all, they should now be in perfect shape)
    If I MUST create those tables manually first, should I basically create them with an equal setup as the export tables had?
    Thanks

    March 17th, 2008 at 2:39 pm

  24. Kjetil Flekkoy adds this Comment:

    Sorry - one more thing: Like Nuno Leite I can not see any option for setting export to either latin-1 (or any other format). Any hints?

    March 17th, 2008 at 3:36 pm

  25. brughagedis.nl » Stunt adds this Pingback:

    […] ik daarvoor een backup gemaakt, zoals ook wordt aangeraden. En op de blog van Alex King  wordt nog weer een andere manier aangeraden om de tekenset te veranderen die erop neerkomt dat […]

    March 21st, 2008 at 2:45 pm

  26. Boîte noire » Archive du blog » WordPress 2.5, point technique adds this Pingback:

    […] Le seul accroc du Release Candidate [review] est l’encodage UTF-8, qui ne fonctionne pas du premier coup ; heureusement, il suffit de supprimer/commenter la ligne DB_CHARSET du fichier wp-config.php [forum, solution]. […]

    March 24th, 2008 at 10:29 am

  27. DavidTan adds this Comment:

    So that was what happened to my database, wished i knew about that earlier. Cheers!

    March 30th, 2008 at 11:54 pm

  28. aw adds this Comment:

    Hi, this is a blogger from China. It seems an old trouble but I still cannot find a good solution.

    I’m using WordPress 2.2.2 and when I used “mysqldump” to export my data as a .sql file, the Chinese character is just in a mess-up.

    I tried a brand new WP 2.5, wrote some chinese character in it myself and used mysqldump to export, that’s okay, i can see the correct characters.

    But why my 2.2.2 cannot just be fine? What can I do to fix it? IMO, I believe that once I can get the right .sql file, i can solve the problem.

    Hope to get some tips from Alexa King, thanks very much.

    April 11th, 2008 at 9:10 am

  29. aw adds this Comment:

    And some more things to declare that may help you to figure out my problem:

    1) My server (online blog) is running very well on 2.2.2. I am just happen to meet this issue when looking for a “database-backup” solution

    2) My database backup plugin works fine but the .gz file cannot be extracted since i upgraded to 2.2.2

    3) My server (online blog) using “define(’DB_CHARSET’, ‘utf8′);”
    in the wp-config.php
    and it is just okay. But why the mysqldump result file (.sql) cannot show the right characters in my emeditor (with utf8)

    when I use “source” to resume the .sql file to my localhost mysql, the characters are also wrong, but when I use
    define(’DB_CHARSET’, ”);
    The troubles are gone. What’s that?

    It’s really making me a little desperated already :(

    April 11th, 2008 at 9:26 am

Add a Comment

Please note: Use of a non-personal web site or blog in the field below and/or comments that are off-topic, personal attacks, or support requests will likely be removed at my discretion.

Note: This post is over 2 months old. You may want to check later in this blog to see if there is new information relevant to your comment.

Around the web » « MacBook Air Notes

About This Site

This is the personal web site of Alex King, an independent developer based in Denver, Colorado USA. More...


Crowd Favorite

Crowd Favorite is my software and web development business.

We build web applications, design and develop custom WordPress themes and plugins, and build custom sites using WordPress as a CMS.


I also have a tumblog that aggregates my online content from other services (Twitter, Flickr, del.icio.us. etc.).

Ads