One of the things I still plan to do in the upcoming Tasks Pro™ and Tasks releases is change the default character set encoding (for English) from ISO-8859-1 to UTF-8. The trick is how best to handle this for existing customers. Here are the options I’m looking at:
1. I’ll still have an ISO-8859-1 version of the English language file, so I can change existing prefs to use the (renamed) english_iso88591 option instead of the english option. I guess the default language in the server settings should be changed as well. Users can then choose to switch to UTF-8 if they like, but special characters in their existing data will not be converted. New users (for whom the new default is UTF-8) get the most benefit. This is probably the lowest-impact option.
2. Do option 1 above, and also provide a separate utility script that converts existing data to UTF-8.
3. Have an option in the upgrade script to change users’ language prefs and data to UTF-8.
#2 and #3 seem like nicer options to me; however, I haven’t found a good solution for converting the data yet. Here are the requirements:
- Must be a PHP solution (no Perl or executables)
- Must be compatible with MySQL 3.2x – current
- UPDATE: Must be PHP 4.1+ compatible and included in the standard installation
I’ve looked at the MySQL CONVERT() docs, but I haven’t found an example of how to use it the way I need to. My test queries didn’t succeed and I’m tired… I’m hoping enlightenment will come to me in the form of a comment. 🙂
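For what it’s worth, the ISO-8859-1 to UTF-8 step itself can be done with nothing beyond core string functions, which would fit the requirements above. Here is a minimal sketch, assuming the stored data really is ISO-8859-1. The function name `latin1_to_utf8` is hypothetical and not part of Tasks, and the code deliberately avoids type hints and extensions so it should also run on old PHP versions.

```php
<?php
// Hypothetical helper (not part of Tasks): convert an ISO-8859-1 string
// to UTF-8 using only core string functions (no iconv, no mbstring).
function latin1_to_utf8($text)
{
    $out = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($text[$i]);
        if ($byte < 0x80) {
            // ASCII range: the byte is identical in both encodings.
            $out .= chr($byte);
        } else {
            // 0x80-0xFF map to U+0080-U+00FF, which UTF-8 encodes
            // as two bytes: 110000xx 10xxxxxx.
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

// "café" with é stored as the single ISO-8859-1 byte 0xE9.
$latin1 = "caf\xE9";
echo bin2hex(latin1_to_utf8($latin1)), "\n"; // prints 636166c3a9
```

Running this over data that is already UTF-8 would double-encode it, so a real conversion script would need some way to track which rows have already been converted.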
What you’re looking for are the iconv functions, which are directly available in PHP.
Ok, one more requirement added above. 🙂
If you’re storing UTF-8 strings in MySQL, I don’t think that will work unless you require the 4.1 family (maybe 4.0). I didn’t think it had Unicode support until that release, but I could be wrong.
Storing strings declared as UTF-8 from the browser in MySQL 3.23 seems to work fine, at least in my testing.
Are they actually UTF-8 high characters? Meaning, try Arabic letters or that type of thing, not just English. The UTF-8 encoding of English text matches ASCII byte for byte, so those characters might work while others might not.
Joel on Software has a good article explaining it all:
http://www.joelonsof[...]Unicode.html
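To make the point about high characters concrete, here is a small sketch that compares raw bytes. The helper `is_ascii_only` is a hypothetical name used only for illustration. A test that stores only English text proves little, because every byte stays below 0x80 in both encodings.

```php
<?php
// Hypothetical helper: true when the string contains no byte >= 0x80,
// i.e. it would look identical in ISO-8859-1 and UTF-8.
function is_ascii_only($text)
{
    return preg_match('/[\x80-\xFF]/', $text) == 0;
}

// ASCII-only text: the same bytes under either encoding label.
echo bin2hex("Tasks"), "\n";      // prints 5461736b73

// "é" is one byte in ISO-8859-1 but two bytes in UTF-8.
echo bin2hex("\xE9"), "\n";       // prints e9
echo bin2hex("\xC3\xA9"), "\n";   // prints c3a9

// Arabic letter alef (U+0627) has no ISO-8859-1 form at all;
// in UTF-8 it is the two bytes d8 a7.
echo strlen("\xD8\xA7"), "\n";    // prints 2
```

A more convincing test of the MySQL 3.23 behaviour would store a string for which `is_ascii_only` returns false and compare the bytes read back with the bytes written.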