We ran into an interesting MySQL character encoding issue at Crowd Favorite today while working to upgrade and launch a new client site.
Here is what we were trying to do: copy the production database to the staging database so we could properly configure and test everything before pushing the new site live. Pretty simple right? It was, until we noticed a bunch of weird character encoding issues on the staging site.
It turned out that while the database tables were set to a Latin-1 (latin1), the content that populated those tables was encoded as UTF-8 (utf8). A variety of attempts to fix this failed, but what succeeded was as follows:
- Export the data as Latin-1. Because MySQL knows that the table is already using a Latin-1 encoding, it will do a straight export of the data without trying to convert the data to another character set. If you try to export as UTF-8, MySQL appears to attempt to convert the (supposedly) Latin-1 data to UTF-8 – resulting in double encoded characters (since the data was actually already UTF-8).
- Change the character set in the exported data file from ‘latin1’ to ‘utf8’. Since the dumped data was not converted during the export process, it’s actually UTF-8 encoded data.
- Create your new table as UTF-8 If your
CREATE TABLEcommand is in your SQL dump file, change the character set from ‘latin1’ to ‘utf8’.
- Import your data normally. Since you’ve got UTF-8 encoded data in your dump file, the declared character set in the dump file is now UTF-8, and the table you’re importing into is UTF-8, everything will go smoothly.
I can confirm that a half-dozen or so variations on the above do not work. This includes
INSERT INTO newdb.newtable SELECT * FROM olddb.oldtable;.
Also, if you’re doing this for a WordPress1 site (like we were), keep in mind that copying over the production database will generally mean that WP-Cache is enabled. You’ll want to remember to turn that off. Yeah. 😉
- This is a fairly common issue in older WordPress installs because the MySQL database default is commonly Latin-1, and older versions of WordPress did not specify the character set when creating the database tables (so they would default to Latin-1) and the default encoding in the WordPress settings is UTF-8. [back]
[…] Read it at the source […]
I didn’t understand the second step. How do we do that?
baris unver, open the file in ur fav text editor and use find & replace function to change latin1 to utf8.
Step 1 states “Export the data as Latin-1”.
Forgive my obvious senility, but does this export take place from the source database, or from the destination database after an import has been done and the character encoding issue has been discovered?
Step 1 should be from the source database.
I had a similar problem on my blog and all I had to do was to add // to one line in wp-config.php
// define(‘DB_CHARSET’, ‘utf8’);
And everything was fine.
That change won’t affect data that’s already in the database.
Thank you I had this issues too (not with WordPress).
I am not sure what this implies. I have an old WP installation. The options are set for UTF 8. In my database there is a column that reads collation and it is Latin 1 sweedish.
Does this affect performance? Does it need to be corrected? What problems will it cause, if any?
Well that method is kinda the risky and actually its not the preferable way to do this, because if your blog uses special some special chars like the “Ã±” in Spanish languages this chars would be converted literally to trash…
For this The Codex has a special page documenting this problem a the “preferable” way to do this:
Also deleting the charset setting inside wp-config its not a good solution, because the connections would use the default mysql charset (which is latin1 by default) so if some hosting provider or even you set this value to other things like utf8 the problem will persist.
So if you don’t want to change the charset of your WordPress based website, its a preferable that you set explicitly the charset in wp-config.
Greetings from Mexico
Thank you Alex. You’ve solved a long standing conundrum I couldn’t track down so I followed your instructions and now I have an entire MySQL DB all encoded in utf8 (WordPress install that’s been around since 2.0).
Basic maintenance and backing up of DB’s, exporting and importing DB’s tables etc is no problem, anything more than that I’m mostly in the dark so a question after the fact:
It seemed that reimporting a DB table after modifying required the original table to be dropped. Attempting to import the modified table while the original table was still present in the DB resulted in a SQL error “table already exists”. I take it that dropping the original table before reimporting is the normal way of doing these types of things?
There is a plugin that will do this all for you. I’ve used it successfully up to WP 2.3. Its in the WP repository – UTF-8 Database Converter
[…] qualche altro elemento a piacere. Capita. Non dovrebbe, ma capita. Parte della soluzione Ã¨ qua. E si traduce […]
Thanks Alex (and all the commenters) for documenting and offering solutions to an issue which has lost me many nights of sleep in the past.
The solution to this problem is also documented in MySQL’s docs:
“The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
The reason this works is that there is no conversion when you convert to or from BLOB columns.”
[…] https://alexking.org/blog/2008/…..conversion […]
I should clarify, the reason we couldn’t use the database converter scripts or convert the columns to binary, etc. was because the source database was still live and in production. We had to copy the data from the production database to the staging database without modifying the production database.
That is an important point and renders many of the other options mentioned here non-viable.
I guess the same procedure will work from iso-8859-1 also?
Latin 1 is ISO-8859-1.
[…] der Datenbank auf UTF-8 – nicht meinen – entdeckte ich wie gesagt kÃ¼rzlich Ã¼ber den WP-Ticker auf alexking.org. Dort wurde empfohlen, die alte Datenbank bewusst Latin-1 kodiert zu exportieren, den Zeichensatz […]
[…] (ActualizaciÃ³n: El error mÃ¡s comÃºn es que el juego de caracteres de tu mySQL no coincida con el de WordPress – utf…) […]
Where todo the export as latin-1 ?
Quite new to this, but still the same problem. I wonder:
Why would I have to do step 3 (create a new table) instead of just importing the dump tables? (After all, they should now be in perfect shape)
If I MUST create those tables manually first, should I basically create them with an equal setup as the export tables had?
Sorry – one more thing: Like Nuno Leite I can not see any option for setting export to either latin-1 (or any other format). Any hints?
[…] ik daarvoor een backup gemaakt, zoals ook wordt aangeraden. EnÂ op de blog van Alex KingÂ Â wordt nog weer een andere manier aangeraden om de tekenset te veranderen die erop neerkomt dat […]
[…] Le seul accroc du Release Candidate [review] est l’encodage UTF-8, qui ne fonctionne pas du premier coup ; heureusement, il suffit de supprimer/commenter la ligne DB_CHARSET du fichier wp-config.php [forum, solution]. […]
So that was what happened to my database, wished i knew about that earlier. Cheers!
Hi, this is a blogger from China. It seems an old trouble but I still cannot find a good solution.
I’m using WordPress 2.2.2 and when I used “mysqldump” to export my data as a .sql file, the Chinese character is just in a mess-up.
I tried a brand new WP 2.5, wrote some chinese character in it myself and used mysqldump to export, that’s okay, i can see the correct characters.
But why my 2.2.2 cannot just be fine? What can I do to fix it? IMO, I believe that once I can get the right .sql file, i can solve the problem.
Hope to get some tips from Alexa King, thanks very much.
And some more things to declare that may help you to figure out my problem:
1) My server (online blog) is running very well on 2.2.2. I am just happen to meet this issue when looking for a “database-backup” solution
2) My database backup plugin works fine but the .gz file cannot be extracted since i upgraded to 2.2.2
3) My server (online blog) using “define(‘DB_CHARSET’, ‘utf8’);”
in the wp-config.php
and it is just okay. But why the mysqldump result file (.sql) cannot show the right characters in my emeditor (with utf8)
when I use “source” to resume the .sql file to my localhost mysql, the characters are also wrong, but when I use
The troubles are gone. What’s that?
It’s really making me a little desperated already 🙁
Doh, got bitten by this issue as well.
My upgrade failed because I never realised I was on one of those installations that needed a “character set upgrade”. Before I knew it the WordPress Upgrade had encoded the utf-8 to utf-8 again. Luckily I had a backup, or so I thought.
My backup was messed up as well, as it turns out. That’s because my recent version of mysqldump encoded everything to utf-8, again, because that’s its default encoding and the source table is said to be in latin1. So I had double encoded utf-8 in the backup as well. Oh joy.
And since the original input text from my client really wasn’t latin1, but probably windows 1252, it took a while to figure out how to recover the original text. MySQL’s ‘convert to character set’ nor iconv can help you get from utf-8-encoded utf-8 to, well, utf-8, since the target encoding is really just a bunch of bytes with at least a few that are illegal in most codepages. So I programmed my own utf-8 decoder-helper-script.
It’s a pity that the WordPress docs are not more specific about this issue. It just reads ‘check your backup’. Sure, I scanned a few lines. I just didn’t read column 16131 on line 300-something, where the first byte-garbage appeared. If the docs would have read something like “We used to do something half-arsed with text-encodings, but we read Joel Spolsky’s unicode blog post, and now we are trying to clean it up, but we need your help when you’re upgrading”, I would have been more careful.
I appreciate the comment that “create newtable/select from oldtable” won’t work, but I’m wondering how one goes about editing a large .sql textfile. For whatever reason, the (Windows) text editors I’ve used all top out around 100MB, and can’t do anything with files larger than that. A 100MB dump isn’t so much in my experience. Any recommendations on a text editor that can handle very large files (> 1GB)?
[…] the plugin I tried to restore my comments from the previous days backup but unfortunately due to a character encoding issue that didn’t work either and I was left in an even bigger mess! Using the import feature of […]
[…] Alex King solved a different problem: his apps were talking UTF8, but his tables were Latin1. His solution was to dump the tables, change the charset info in the dump file, then re-insert the contents. Related:Freaking MySQL Character Set EncodingsMySQL Performance Monitoring Tips From The MySQL NewsletterCopying MySQL Usernames and Database PrivelegesOptimizing Inserts/Updates On MySQL TablesMySQL Bug? […]
Hello people. Well here’s my case,
I’m running an IPB installation. The charset was set to iso-8859-7 in the Admin Panel so everything entered in the database was in that encoding. Tables and database collation was latin_swedish_ci.
In PhpMyAdmin every greek character was appearing like in this image: http://core-studios.net/dbprob.jpg
so I wanted to convert this database to UTF-8 (to work with joomla too). So I changed the tables’ and database’s collation to utf-8 (I think that was wrong). Then I tried to export the database to try to convert it with UniRed or Recode or some other program, but I can’t. Maybe it’s because I set the collation to utf-8?
http://combatwombat.[...]-utf8-issues a good general solution to this issue
I found another solution to this if you are using PHP >= 5.2.3.
In that version of PHP there is mysql_set_charset function. In the wp-includes/wp-db.php I’ve added a call to that function.
That solved the problem for me and made it so I didn’t have to convert my DB.
The issue is that the DB is still in latin1 but the connection to the DB was in utf-8.
[…] projects I often need to copy the contents of one database to another. Consequently I’ve seen the problem described on Alex Kings’s blog a few times, and been able to deal with it using the same […]
1. Export the data as Latin-1
In phpMyAdmin there isn’t any option to choose charset in exporting phase.
Well, I encountered this issue today, and I am using phpMyAdmin, and I know the host tech support knows nothing about character encoding issues at all. Here is what I am going to do, and I am pretty sure that it is going to work, even though I can’t specify any charset when exporting:
1. Use phpMyAdmin dump the database. It will be in UTF-8 by default.
2. open the dump with Notepad, and save it to ANSI format.
3. open the file in IE or any web browser then change the character encoding to UTF-8.
4. Select whatever on the webpage, copy and past to a new Notepad document, and then save it to UTF-8.
5. Modify the Create Table char set as mentioned in step 3.
6. Restore the db with this file.
The step 5 in the previous comment mentioned a “step 3”. It actually refers to the step 3 in the Alexking’s original blog.
[…] the conversion on that page didn’t work. Â Luckily, I did have a backup and I did find this page. Â I copied that backup file, fired up vim, did a little “:%s/latin1/utf8/g”, saved, […]
[…] backup of my database. The bad news is that my last backup was a year ago. Thanks to some tips from Alex King and figaro, I was able to import my old database. And, praise to Google Reader. Fortunately, I […]
[…] solution was deceptively simple, and thanks to blogger Alex King, I solved this last […]
I just migrated from MySql 4.0 to 5.0. Data has been stored as UTF8 in MySql4.0. Without knowing this blog, I developed the same steps as described above, with one difference in step 4:
The MySql 4.0 dump file (dumped with mysqldump) contains already data in UTF8 format. Just importing it in a MySql 5.0 database with default charset UTF8 does not work correctly because the character set of the client connection is not defined correctly in the 4.0 dump. So, it would use a “latin 1” client connection, which will insert no UTF8 characters from the dump correctly. The solution: I defined the correct client connection at the beginning of the mysqldump file:
/*!40101 SET NAMES utf8 */;
With this definition, the utf8 encoded data has been imported correctly in the 5.0 database.
[…] solution was deceptively simple, and thanks to blogger Alex King, I solved this last […]
[…] Character Set Support Fixing a MySQL Character Encoding Mismatch […]
[…] some googling with a helpof one of my colleagues (a programmer). I was pointed to this nice article Mysql Latin1 Utf8 Conversion .What happens is that for some reason the dump I’ve made had latin1 character-set even though […]
Thanks guys! I can’t believe it how simple it is… I’ve been thinking about using more complex solutions, but it’s easy and cunning.
[…] reading some posts (e.g. this, this or this one) about how to fix it, I decided to solve it using one of the simple plugins out […]
[…] article on changing collation article by alex king VN:F [1.8.2_1042]please wait…Rating: 0.0/10 (0 votes cast)VN:F [1.8.2_1042]Rating: 0 (from 0 […]
Thanks Alex, this worked brilliant. I suspected it would as I did a similar thing with an Ushahidi installation as their older versions weren’t using utf8 (they are now).
I just used parts of your technique to transfer an old WP blog to my dev box. I didn’t need to change the create table charset as they were already in utf8 probably due to upgrades, but the content was all latin1 encoded.
I had to use mysqldump with the –default-character-set=latin1 flag to export the data, and then used Notepad++ to convert the file to UTF8. I then imported the sql file using phpmyadmin without issues.
Thank you this method worked well.
You saved me a lot of head scratching.
If you use InnoDB-tables (TYPO3, Magento etc.), then it could be possible, that relations between the tables will produce errors (in restore-process). To deactivate the checks you must insert an sql-statement: “set foreign_key_checks=0;”.
[…] some inspiration from https://alexking.org/blog/2008/03/06/mysql-latin1-utf8-conversion and http://codex.wordpre[...]aracter_Sets I realized that the reason the […]
Great tutorial, step #3 is critical. I forgot a couple of times to rename back to UTF-8 and it messed me up.
I have a big problem. My old server is gone, over, fini so I can’t re-export the data file. Is there any way to convert a backup file that already has the wacky characters in it?
[…] if you have seen any other posts on the internet about this topic, you know that there are horror stories enough to make a grown administrator quake in his […]
[…] King had wrote about this problem a few years ago. Older WordPress databases contain UTF-8 data, but have their defaults set to […]
Geek in chief, and everybody else here– here’s a silver bullet to automagically fix your dumps! —
It fixes double encoded mysqldumps and also removes any windows cp1252 gremlins.
[…] article by alex king This entry was posted in database administration, linux server administration, mysql, WordPress by chris. Bookmark the permalink. […]
[…] an excellent guide was available on Alex King’s blog. For more information and follow-up comments, you should definitely read the full post, but […]
Thanks! Saved me!