
MySQL and character encoding

How to fix character display issues with MySQL

Created by: laetzer, Last modification: 06 Dec 2008 (01:04 UTC) by laetzer

If you are using MySQL, you might run into character encoding problems on your site or in your database: non-English characters might appear as question marks or look garbled (Ã¶, ÃŸ, Ã¼, Ã¤, or Ãœ, instead of characters like ä, Ü, ß, ç, or é). See FAQ, or read on for more details.
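The garbled forms arise when UTF-8 bytes are read as if they were a single-byte encoding. A minimal Python sketch of that effect, assuming the wrong decoder is Windows-1252 (the superset of Latin1 that browsers commonly apply); the function name `misread_utf8` is made up for illustration:

```python
def misread_utf8(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


# Each character becomes two bytes in UTF-8, and each byte is then
# shown as its own (wrong) character.
for ch in "äÜßö":
    print(ch, "->", misread_utf8(ch))
```

Every non-ASCII character turns into a two-character sequence starting with Ã, which is the pattern to look for when diagnosing this kind of problem.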

Database connection setup

The cause of this, as is sometimes stated, is that the encoding of the data is not the same all the way between website and database. The shortest and best solution is to have a UTF-8 system end-to-end.

However, it is entirely possible to have a database store its data in Latin1 and still display that data as correctly decoded UTF-8 characters on a webpage in a browser. The actual problems arise between database software, database abstraction software, and web applications, and, most importantly, from wrong settings (or from lack of access to the config files needed to apply the correct settings).

Some MySQL installations use Latin1 by default, for the database itself as well as for connections to it. This means that if you are storing your data as UTF-8, Bitweaver has to be told to request data as UTF-8. Unless told otherwise, Bitweaver uses your server's defaults. Since Bitweaver relies on the third-party library AdoDB for database abstraction, a fix would have to be included there.
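What goes wrong on a mismatched connection can be simulated without a database. The sketch below assumes the application sends UTF-8 bytes while the connection is declared as Latin1 and the table is utf8, so the server converts what it believes to be Latin1 into UTF-8. It models that conversion in plain Python and is not Bitweaver or AdoDB code; `store_via_connection` is a hypothetical helper.

```python
def store_via_connection(text: str, connection_charset: str) -> bytes:
    """Model the bytes a utf8 column ends up holding.

    The client always sends UTF-8. The server decodes those bytes using
    the declared connection charset and re-encodes them as UTF-8.
    """
    sent = text.encode("utf-8")
    understood = sent.decode(connection_charset)
    return understood.encode("utf-8")


good = store_via_connection("ä", "utf-8")   # connection declared correctly
bad = store_via_connection("ä", "cp1252")   # connection left at a Latin1 default
print(good)
print(bad)
```

With the correct declaration the column holds the two bytes of a single character. With the wrong one it holds four bytes, which is the double-encoded form that later shows up as garbled text.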

That means that if your server is generally set up for UTF-8, especially your database and database tables, but for some reason the database connection is not, see this FAQ entry. It describes how to edit the script that AdoDB uses to connect to MySQL databases.

Database and server setup

Databases and database tables can be set to utf8 upon creation. They can also be changed afterwards, even with tools like PhpMyAdmin. If the tables already contain user data, though, the ASCII characters will still be correct but the non-ASCII characters will again be wrong.
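Why only the non-ASCII characters break can be seen from the bytes themselves. A small Python sketch, assuming the existing data was stored as Latin1 and is then merely relabelled as UTF-8 without the bytes being converted:

```python
ascii_text = "Bitweaver"
umlaut_text = "Grüße"

# ASCII characters have identical bytes in both encodings.
same = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# Non-ASCII characters do not: Latin1 uses one byte each, UTF-8 uses two.
latin1_bytes = umlaut_text.encode("latin-1")
utf8_bytes = umlaut_text.encode("utf-8")

# Reading the Latin1 bytes as UTF-8 yields replacement characters.
relabelled = latin1_bytes.decode("utf-8", errors="replace")
print(same, len(latin1_bytes), len(utf8_bytes), relabelled)
```

The ASCII text survives the relabelling untouched, while the umlauts come out as replacement characters, so existing data has to be converted rather than just relabelled.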

If your whole server OS, server software, or database software is not set up for UTF-8 yet, it's a good idea to do that. Consider some of the following steps. Bear in mind that this is web server configuration info, and there are many extensive tutorials about settings like these online. This can only be a starting point.

my.cnf (MySQL config file)


[mysqld]
# --------------------------------------------
# add the lines below to enforce utf8 encoding
# then restart the MySQL engine
# --------------------------------------------
collation_server=utf8_general_ci
character_set_client=utf8
character_set_server=utf8
skip-character-set-client-handshake


.htaccess (bitweaver root)


AddDefaultCharset UTF-8


php.ini


default_charset = "UTF-8"


Also, your database may not be UTF-8 in its collation settings. Consider dumping the SQL to a file and using a UTF-8-compliant text editor to change CHARSET=Latin1 to CHARSET=utf8. Create a new database with its collation set to utf8_general_ci, and run SET NAMES 'UTF8'. Using the text editor, find and replace every bad character with the correct character. This can take hours. There are also scripts available that do this for you; search for convert + charset etc.

Then import the cleaned dump into the new, empty database. Adjust Bitweaver's /kernel/config_inc.php file to reflect any change in database name.
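The find-and-replace on the dump can be scripted. The following is a minimal Python sketch, not one of the existing conversion scripts mentioned above. It assumes the bad characters are UTF-8 text that was misread as Windows-1252 exactly once, and it works line by line; lines that do not fit that pattern are left unchanged. The names `repair_mojibake` and `convert_dump` are made up for illustration.

```python
import re


def repair_mojibake(text: str) -> str:
    """Undo one round of UTF-8-read-as-cp1252 garbling, if possible."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in this particular way; leave it alone.
        return text


def convert_dump(sql: str) -> str:
    """Relabel table charsets, then repair garbled characters per line."""
    sql = re.sub(r"CHARSET=latin1\b", "CHARSET=utf8", sql, flags=re.IGNORECASE)
    return "\n".join(repair_mojibake(line) for line in sql.split("\n"))


dump = (
    "CREATE TABLE t (x TEXT) DEFAULT CHARSET=latin1;\n"
    "INSERT INTO t VALUES ('GrÃ¼ÃŸe');"
)
print(convert_dump(dump))
```

Run such a script on a copy of the dump and compare the result with the original before importing it, since a line that mixes correct and garbled characters is skipped entirely by this approach.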

To see how your database is set up, run SHOW VARIABLES; at the MySQL command prompt on your server (or find the link in PhpMyAdmin).
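The full SHOW VARIABLES output is long, and the character set entries are the ones that matter here. As a sketch, the Python below scans (name, value) rows, such as those returned by SHOW VARIABLES LIKE 'character_set%', and reports the relevant settings that are not utf8. The sample rows are invented for illustration and not taken from a real server; `non_utf8_settings` is a hypothetical helper.

```python
# Settings that should agree for an end-to-end UTF-8 setup.
RELEVANT = {
    "character_set_client",
    "character_set_connection",
    "character_set_database",
    "character_set_results",
    "character_set_server",
}


def non_utf8_settings(rows):
    """Return the relevant variables whose value is not a utf8 charset."""
    return {
        name: value
        for name, value in rows
        if name in RELEVANT and not value.startswith("utf8")
    }


# Invented sample rows (name, value), for illustration only.
sample = [
    ("character_set_client", "latin1"),
    ("character_set_connection", "latin1"),
    ("character_set_database", "utf8"),
    ("character_set_results", "latin1"),
    ("character_set_server", "utf8"),
    ("character_set_system", "utf8"),
]
print(non_utf8_settings(sample))
```

In this invented sample the database and server are utf8 but the three connection-related settings are not, which is the situation described under "Database connection setup".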

More information
