We ran into an interesting MySQL character encoding issue at Crowd Favorite today while working to upgrade and launch a new client site.
Here is what we were trying to do: copy the production database to the staging database so we could properly configure and test everything before pushing the new site live. Pretty simple right? It was, until we noticed a bunch of weird character encoding issues on the staging site.
It turned out that while the database tables were set to a Latin-1 (latin1), the content that populated those tables was encoded as UTF-8 (utf8). A variety of attempts to fix this failed, but what succeeded was as follows:
- Export the data as Latin-1. Because MySQL knows that the table is already using a Latin-1 encoding, it will do a straight export of the data without trying to convert the data to another character set. If you try to export as UTF-8, MySQL appears to attempt to convert the (supposedly) Latin-1 data to UTF-8 – resulting in double encoded characters (since the data was actually already UTF-8).
- Change the character set in the exported data file from ‘latin1′ to ‘utf8′. Since the dumped data was not converted during the export process, it’s actually UTF-8 encoded data.
- Create your new table as UTF-8 If your
CREATE TABLEcommand is in your SQL dump file, change the character set from ‘latin1′ to ‘utf8′. - Import your data normally. Since you’ve got UTF-8 encoded data in your dump file, the declared character set in the dump file is now UTF-8, and the table you’re importing into is UTF-8, everything will go smoothly.
I can confirm that a half-dozen or so variations on the above do not work. This includes INSERT INTO newdb.newtable SELECT * FROM olddb.oldtable;.
Also, if you’re doing this for a WordPress1 site (like we were), keep in mind that copying over the production database will generally mean that WP-Cache is enabled. You’ll want to remember to turn that off. Yeah.
- This is a fairly common issue in older WordPress installs because the MySQL database default is commonly Latin-1, and older versions of WordPress did not specify the character set when creating the database tables (so they would default to Latin-1) and the default encoding in the WordPress settings is UTF-8. [back]
Popularity: 15% [?]



Wp Wordpress » Blog Archive » Fixing a MySQL Character Encoding Mismatch adds this Pingback:
[...] Read it at the source [...]
March 6th, 2008 at 1:13 am
Baris Unver adds this Comment:
I didn’t understand the second step. How do we do that?
March 6th, 2008 at 1:45 am
bdtmz adds this Comment:
baris unver, open the file in ur fav text editor and use find & replace function to change latin1 to utf8.
March 6th, 2008 at 3:41 am
jgb adds this Comment:
Step 1 states “Export the data as Latin-1″.
Forgive my obvious senility, but does this export take place from the source database, or from the destination database after an import has been done and the character encoding issue has been discovered?
March 6th, 2008 at 5:27 am
Alex adds this Comment:
Step 1 should be from the source database.
March 6th, 2008 at 7:25 am
kosir adds this Comment:
I had a similar problem on my blog and all I had to do was to add // to one line in wp-config.php
I changed
define(’DB_CHARSET’, ‘utf8′);
to
// define(’DB_CHARSET’, ‘utf8′);
And everything was fine.
March 6th, 2008 at 7:27 am
Alex adds this Comment:
That change won’t affect data that’s already in the database.
March 6th, 2008 at 7:28 am
shuron adds this Comment:
Thank you I had this issues too (not with Wordpress).
March 6th, 2008 at 7:59 am
big Dog adds this Comment:
I am not sure what this implies. I have an old WP installation. The options are set for UTF 8. In my database there is a column that reads collation and it is Latin 1 sweedish.
Does this affect performance? Does it need to be corrected? What problems will it cause, if any?
Thanks…
Big Dog
March 6th, 2008 at 9:24 am
g30rg3_x adds this Comment:
Well that method is kinda the risky and actually its not the preferable way to do this, because if your blog uses special some special chars like the “ñ” in Spanish languages this chars would be converted literally to trash…
For this The Codex has a special page documenting this problem a the “preferable” way to do this:
http://codex.wordpre[...]aracter_Sets
Also deleting the charset setting inside wp-config its not a good solution, because the connections would use the default mysql charset (which is latin1 by default) so if some hosting provider or even you set this value to other things like utf8 the problem will persist.
So if you don’t want to change the charset of your WordPress based website, its a preferable that you set explicitly the charset in wp-config.
Greetings from Mexico
March 6th, 2008 at 9:24 am
Kirk M adds this Comment:
Thank you Alex. You’ve solved a long standing conundrum I couldn’t track down so I followed your instructions and now I have an entire MySQL DB all encoded in utf8 (WordPress install that’s been around since 2.0).
Basic maintenance and backing up of DB’s, exporting and importing DB’s tables etc is no problem, anything more than that I’m mostly in the dark so a question after the fact:
It seemed that reimporting a DB table after modifying required the original table to be dropped. Attempting to import the modified table while the original table was still present in the DB resulted in a SQL error “table already exists”. I take it that dropping the original table before reimporting is the normal way of doing these types of things?
March 6th, 2008 at 10:21 am
cgilkison adds this Comment:
There is a plugin that will do this all for you. I’ve used it successfully up to WP 2.3. Its in the WP repository – UTF-8 Database Converter
March 6th, 2008 at 2:35 pm
Quasi.dot › Lettere accentate, MySQL e Wordpress adds this Pingback:
[...] qualche altro elemento a piacere. Capita. Non dovrebbe, ma capita. Parte della soluzione è qua. E si traduce [...]
March 6th, 2008 at 3:07 pm
Kerim Friedman adds this Comment:
Thanks Alex (and all the commenters) for documenting and offering solutions to an issue which has lost me many nights of sleep in the past.
March 6th, 2008 at 4:54 pm
Tienshiao adds this Comment:
The solution to this problem is also documented in MySQL’s docs:
http://dev.mysql.com[...]r-table.html
“The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
The reason this works is that there is no conversion when you convert to or from BLOB columns.”
March 7th, 2008 at 1:21 am
» Apie WP DB koduotes ir problemų sprendimo būdus (nepamirškite … blogeriai.lt adds this Pingback:
[...] http://alexking.org/[...]..conversion [...]
March 7th, 2008 at 4:35 am
Alex adds this Comment:
I should clarify, the reason we couldn’t use the database converter scripts or convert the columns to binary, etc. was because the source database was still live and in production. We had to copy the data from the production database to the staging database without modifying the production database.
That is an important point and renders many of the other options mentioned here non-viable.
March 7th, 2008 at 10:30 pm
Jinge adds this Comment:
I guess the same procedure will work from iso-8859-1 also?
March 9th, 2008 at 12:16 pm
Alex adds this Comment:
Latin 1 is ISO-8859-1.
March 9th, 2008 at 4:18 pm
UTF-8 - Umstellung (WordPress-Optimierung, Teil 1) : fob marketing adds this Pingback:
[...] der Datenbank auf UTF-8 – nicht meinen – entdeckte ich wie gesagt kürzlich über den WP-Ticker auf alexking.org. Dort wurde empfohlen, die alte Datenbank bewusst Latin-1 kodiert zu exportieren, den Zeichensatz [...]
March 11th, 2008 at 2:17 am
Exocert.com » Recuperar un Wordpress adds this Pingback:
[...] (Actualización: El error más común es que el juego de caracteres de tu mySQL no coincida con el de Wordpress – utf…) [...]
March 13th, 2008 at 8:38 am
Nuno Leite adds this Comment:
Where todo the export as latin-1 ?
In Cpanel?
In pnpMyAdmin?
March 15th, 2008 at 6:15 am
Kjetil Flekkoy adds this Comment:
Hi
Quite new to this, but still the same problem. I wonder:
Why would I have to do step 3 (create a new table) instead of just importing the dump tables? (After all, they should now be in perfect shape)
If I MUST create those tables manually first, should I basically create them with an equal setup as the export tables had?
Thanks
March 17th, 2008 at 2:39 pm
Kjetil Flekkoy adds this Comment:
Sorry – one more thing: Like Nuno Leite I can not see any option for setting export to either latin-1 (or any other format). Any hints?
March 17th, 2008 at 3:36 pm
brughagedis.nl » Stunt adds this Pingback:
[...] ik daarvoor een backup gemaakt, zoals ook wordt aangeraden. En op de blog van Alex King wordt nog weer een andere manier aangeraden om de tekenset te veranderen die erop neerkomt dat [...]
March 21st, 2008 at 2:45 pm
Boîte noire » Archive du blog » WordPress 2.5, point technique adds this Pingback:
[...] Le seul accroc du Release Candidate [review] est l’encodage UTF-8, qui ne fonctionne pas du premier coup ; heureusement, il suffit de supprimer/commenter la ligne DB_CHARSET du fichier wp-config.php [forum, solution]. [...]
March 24th, 2008 at 10:29 am
DavidTan adds this Comment:
So that was what happened to my database, wished i knew about that earlier. Cheers!
March 30th, 2008 at 11:54 pm
aw adds this Comment:
Hi, this is a blogger from China. It seems an old trouble but I still cannot find a good solution.
I’m using WordPress 2.2.2 and when I used “mysqldump” to export my data as a .sql file, the Chinese character is just in a mess-up.
I tried a brand new WP 2.5, wrote some chinese character in it myself and used mysqldump to export, that’s okay, i can see the correct characters.
But why my 2.2.2 cannot just be fine? What can I do to fix it? IMO, I believe that once I can get the right .sql file, i can solve the problem.
Hope to get some tips from Alexa King, thanks very much.
April 11th, 2008 at 9:10 am
aw adds this Comment:
And some more things to declare that may help you to figure out my problem:
1) My server (online blog) is running very well on 2.2.2. I am just happen to meet this issue when looking for a “database-backup” solution
2) My database backup plugin works fine but the .gz file cannot be extracted since i upgraded to 2.2.2
3) My server (online blog) using “define(’DB_CHARSET’, ‘utf8′);”
in the wp-config.php
and it is just okay. But why the mysqldump result file (.sql) cannot show the right characters in my emeditor (with utf8)
when I use “source” to resume the .sql file to my localhost mysql, the characters are also wrong, but when I use
define(’DB_CHARSET’, ”);
The troubles are gone. What’s that?
It’s really making me a little desperated already
April 11th, 2008 at 9:26 am
Jeroen Pulles adds this Comment:
Doh, got bitten by this issue as well.
My upgrade failed because I never realised I was on one of those installations that needed a “character set upgrade”. Before I knew it the Wordpress Upgrade had encoded the utf-8 to utf-8 again. Luckily I had a backup, or so I thought.
My backup was messed up as well, as it turns out. That’s because my recent version of mysqldump encoded everything to utf-8, again, because that’s its default encoding and the source table is said to be in latin1. So I had double encoded utf-8 in the backup as well. Oh joy.
And since the original input text from my client really wasn’t latin1, but probably windows 1252, it took a while to figure out how to recover the original text. MySQL’s ‘convert to character set’ nor iconv can help you get from utf-8-encoded utf-8 to, well, utf-8, since the target encoding is really just a bunch of bytes with at least a few that are illegal in most codepages. So I programmed my own utf-8 decoder-helper-script.
It’s a pity that the Wordpress docs are not more specific about this issue. It just reads ‘check your backup’. Sure, I scanned a few lines. I just didn’t read column 16131 on line 300-something, where the first byte-garbage appeared. If the docs would have read something like “We used to do something half-arsed with text-encodings, but we read Joel Spolsky’s unicode blog post, and now we are trying to clean it up, but we need your help when you’re upgrading”, I would have been more careful.
my €0,02
Jeroen
June 1st, 2008 at 7:47 am
Brent adds this Comment:
I appreciate the comment that “create newtable/select from oldtable” won’t work, but I’m wondering how one goes about editing a large .sql textfile. For whatever reason, the (Windows) text editors I’ve used all top out around 100MB, and can’t do anything with files larger than that. A 100MB dump isn’t so much in my experience. Any recommendations on a text editor that can handle very large files (> 1GB)?
June 1st, 2008 at 12:15 pm
David Gilmour adds this Comment:
@Brent: http://www.vim.org/about.php
July 17th, 2008 at 8:34 am
How To Force DISQUS To Sync With Wordpress adds this Pingback:
[...] the plugin I tried to restore my comments from the previous days backup but unfortunately due to a character encoding issue that didn’t work either and I was left in an even bigger mess! Using the import feature of [...]
August 28th, 2008 at 11:18 am
» Converting MySQL Character Sets MaisonBisson.com adds this Pingback:
[...] Alex King solved a different problem: his apps were talking UTF8, but his tables were Latin1. His solution was to dump the tables, change the charset info in the dump file, then re-insert the contents. Related:Freaking MySQL Character Set EncodingsMySQL Performance Monitoring Tips From The MySQL NewsletterCopying MySQL Usernames and Database PrivelegesOptimizing Inserts/Updates On MySQL TablesMySQL Bug? [...]
October 9th, 2008 at 9:17 am
Panagiotis adds this Comment:
Hello people. Well here’s my case,
I’m running an IPB installation. The charset was set to iso-8859-7 in the Admin Panel so everything entered in the database was in that encoding. Tables and database collation was latin_swedish_ci.
In PhpMyAdmin every greek character was appearing like in this image: http://core-studios.net/dbprob.jpg
so I wanted to convert this database to UTF-8 (to work with joomla too). So I changed the tables’ and database’s collation to utf-8 (I think that was wrong). Then I tried to export the database to try to convert it with UniRed or Recode or some other program, but I can’t. Maybe it’s because I set the collation to utf-8?
Any ideas?
October 20th, 2008 at 4:39 am
CombatWombat adds this Comment:
http://combatwombat.[...]-utf8-issues a good general solution to this issue
October 28th, 2008 at 2:57 am
Mike Gorski adds this Comment:
I found another solution to this if you are using PHP >= 5.2.3.
In that version of PHP there is mysql_set_charset function. In the wp-includes/wp-db.php I’ve added a call to that function.
@mysql_set_charset(’latin1′,$this->dbh);
That solved the problem for me and made it so I didn’t have to convert my DB.
The issue is that the DB is still in latin1 but the connection to the DB was in utf-8.
October 28th, 2008 at 8:04 am
Putting right weird characters due to UTF encoding in a MySQL database | Likemind Web Services adds this Pingback:
[...] projects I often need to copy the contents of one database to another. Consequently I’ve seen the problem described on Alex Kings’s blog a few times, and been able to deal with it using the same [...]
November 5th, 2008 at 10:51 am
Paamayim adds this Comment:
1. Export the data as Latin-1
In phpMyAdmin there isn’t any option to choose charset in exporting phase.
November 19th, 2008 at 3:21 am
Marbles adds this Comment:
Well, I encountered this issue today, and I am using phpMyAdmin, and I know the host tech support knows nothing about character encoding issues at all. Here is what I am going to do, and I am pretty sure that it is going to work, even though I can’t specify any charset when exporting:
1. Use phpMyAdmin dump the database. It will be in UTF-8 by default.
2. open the dump with Notepad, and save it to ANSI format.
3. open the file in IE or any web browser then change the character encoding to UTF-8.
4. Select whatever on the webpage, copy and past to a new Notepad document, and then save it to UTF-8.
5. Modify the Create Table char set as mentioned in step 3.
6. Restore the db with this file.
November 26th, 2008 at 9:20 am
Marbles adds this Comment:
The step 5 in the previous comment mentioned a “step 3″. It actually refers to the step 3 in the Alexking’s original blog.
November 26th, 2008 at 9:28 am
Pleh » Blog Archive » WordPress MySQL Character Set Encoding Conversion adds this Pingback:
[...] the conversion on that page didn’t work. Luckily, I did have a backup and I did find this page. I copied that backup file, fired up vim, did a little “:%s/latin1/utf8/g”, saved, [...]
December 20th, 2008 at 11:26 pm
Partial Recall » Phoenix Rising adds this Pingback:
[...] backup of my database. The bad news is that my last backup was a year ago. Thanks to some tips from Alex King and figaro, I was able to import my old database. And, praise to Google Reader. Fortunately, I [...]
February 16th, 2009 at 7:32 am
Nouse.co.uk » Downtime adds this Pingback:
[...] solution was deceptively simple, and thanks to blogger Alex King, I solved this last [...]
June 18th, 2009 at 10:27 am