Comparing versions
Version 4Current version

If you are using MySQL, you might run into character encoding problems on your site or in your database: non-English characters might appear as question marks or look strange (ö, ß, ü, ä, or Ãœ, instead of characters like ä, Ü, ß, ç, or é). See FAQ, or read on for more details.

Database connection setup

The problem for this, as sometimes stated, is that the encoding of the data is not the same all the way between website and database. The shortest and best solution is to have an UTF-8 system end-to-end.

However, it is no problem to have a database storing its data in Latin1, and displaying that data with correctly decoded UTF-8 characters on a webpage in a browser. The actual problems are between database software, database abstraction software, and web applications, and, most importantly, wrong settings (or lack of access to config files to actually apply the correct settings).

Some MySQL installations use Latin1 by default, for the database itself as well as for connections to it. This means, if you are storing your data as UTF-8, Bitweaver has to be told to request data as UTF-8. Unless told otherwise, Bitweaver is using your server's defaults. Since Bitweaver relies on the 3rd party application AdoDB for database abstraction, a fix would have to be included there.

That means, if your server is generally set up for UTF-8, especially your database and database tables, but for some reason the database connection is not, see this FAQ entry for a FAQ. It describes how to edit the script that AdoDB uses to connect to MySQL databases.

Database and server setup

Databases can be set to utf8 upon creation, as well as database tables. They can also be changed afterwards, even with tools like PhpMyAdmin. If the tables contain already user data, though, there will be yet again correct ASCII characters but wrong non-ASCII characters.

If your whole server OS, server software, or database software is not set up for UTF-8 yet, it's a good idea to do that. Consider some of the following steps. Bear in mind, that this is web server configuration info. There are many extensive tutorials about settings like these online. This can only be a starting point.

my.cnf (MySQL config file)


[mysqld]
# --------------------------------------------
# add the lines below to enforce utf8 encoding
# then restart the MySQL engine
# --------------------------------------------
collation_server=utf8_general_ci
character_set_client=utf8
character_set_server=utf8
skip-character-set-client-handshake


.htaccess (bitweaver root)


AddDefaultCharset UTF-8


php.ini


default_charset = "UTF-8"


Also: Your database may not be UTF-8 in it's collation settings. Consider dumping the SQL to a file, and use an UTF-8-compliant text editor to chang the CHARSET=Latin1 to CHARSET=utf8). Create a new database, with Collation set to utf8_general_ci, and SET NAMES 'UTF8'. Using the text editor, find/replace every bad character with the correct character. This can take hours. There are also scripts available to do that for you. Search for convert + charset etc.

Then upload the cleaned database to the new, empty database. Adjust Bitweaver's /kernel/config_inc.php file to reflect any change in database name.

To see how your database is set up, run SHOW VARIABLES; at the MySQL command prompt on your server (or find the link in PhpMyAdmin).

More information

 

Character encoding issues

If you are using MySQL, you might run into character encoding problems on your site or in your database: non-English characters might appear as question marks or look strange (ö, ß, ü, ä, or Ãœ, instead of characters like ä, Ü, ß, ç, or é).

Database connection setup

The reason for crunched special characters is most commonly seen as: encoding of the data is not the same all the way between website and database. The shortest and best solution, then, is to have an UTF-8 system end-to-end.

However, it is perfectly normal to have a database storing its data in Latin1, and displaying that data with correct UTF-8 characters on a webpage in a browser. If a database stores a character Latin1 encoded, but an application requests that character expecting UTF-8 encoding, the database sends that character UTF-8 encoded. The actual problems are bugs and wrong settings between database applications, database abstraction applications and web applications, or the lack of access to configuration files to actually apply the correct settings.

Unless told otherwise, Bitweaver is using your server's defaults. If your server is generally set up for UTF-8, especially your database and database tables, but the database connection is not, try to edit the script that AdoDB (a 3rd party application for database abstraction) uses to connect to MySQL databases:

yourbiteaver/util/adodb/drivers/adodb-mysql.inc.php


<?php
function _connect($argHostname$argUsername$argPassword$argDatabasename)
    {

    
// ... some stuff ...

        
if ($this->_connectionID === false) return false;

    
// edit adodb-mysql.inc.php around line 366 or 373
    // (depends on the version of that file)
    // function "_connect"
    //----------------------------------------------
    // THE FOLLOWING LINE IS THE ONE TO ADD:
        
if (mysql_query("SET NAMES 'UTF8'") === false) return false;
    
//-----------------------------------------------
        
if ($argDatabasename) return $this->SelectDB($argDatabasename);
        return 
true;
    }
?>


Database admin tools

MySQL usually determines character encoding via the environment (the web server's software). Only if "SET NAMES utf8" is set, the PHP application uses UTF-8 encoding to connect to the database. However, if UTF-8 is not set, but used regardlessly (and if there are no bugs interfering, depending on versions and setup), there still shouldn't be problems, as long as PhpMyAdmin (or similiar) is not used … because all valid UTF-8 byte sequences are also valid Latin1 characters.

When you install Bitweaver with a standard MySQL server, the database connections will all be Latin1. Bitweaver writes UTF-8 characters to that connection. MySQL will store them to the tables correctly. Now, when requesting them, they are reported to be Latin1 characters, but your browser, displaying the pages served by Bitweaver, interprets them as UTF-8 just fine. The problem is, PhpMyAdmin sets the connection to UTF-8, which makes MySQL try to convert the UTF-8 characters, thinking they are Latin1, to UTF-8, which results in weird characters. That means, you can never use any admin tool in UTF-8 mode to write to your Bitweaver database tables, but Bitweaver works fine if you just ignore it. It's just not the proper way to go about, setting the database connection to Latin1, but writing UTF-8 characters, and confusing when using PhpMyAdmin (or similiar).

That means, if you input data through a Bitweaver form like wiki page, and special characters seem to be written to, as well as read from the database correctly, everything is fine. Eventhough, when looking at the database with a tool like PhpMyAdmin, special characters seem wrongly decoded. In that case, PhpMyAdmin (or similiar tools) should not be used to transfer databases between systems, or input data directly through the input forms of these tools. If that is done anyway, the special characters will get messed up in the process.

Database and server setup

Databases can be set to UTF-8 upon creation, as well as database tables. They can also be changed afterwards, even with tools like PhpMyAdmin. If the tables contain already user data, though, there will be yet again correct ASCII characters but wrong non-ASCII characters.

If your whole server OS, server software, or database software is not set up for UTF-8 yet, it's a good idea to do that. Consider some of the following steps. Bear in mind, that this is web server configuration info. There are many extensive tutorials about settings like these online. This can only be a starting point.

my.cnf (MySQL config file)


[mysqld]
# --------------------------------------------
# add the lines below to enforce utf8 encoding
# then restart the MySQL engine
# --------------------------------------------
collation_server=utf8_general_ci
character_set_client=utf8
character_set_server=utf8
skip-character-set-client-handshake


.htaccess (bitweaver root)


AddDefaultCharset UTF-8


php.ini


default_charset = "UTF-8"


Also: Your database may not be UTF-8 in it's collation settings. Consider dumping the SQL to a file, and use an UTF-8-compliant text editor to chang the CHARSET=Latin1 to CHARSET=utf8). Create a new database, with Collation set to utf8_general_ci, and SET NAMES 'UTF8'. Using the text editor, find/replace every bad character with the correct character. This can take hours. There are also scripts available to do that for you. Search for convert + charset etc.

Then upload the cleaned database to the new, empty database. Adjust Bitweaver's /kernel/config_inc.php file to reflect any change in database name.

To see how your database is set up, run SHOW VARIABLES; at the MySQL command prompt on your server (or find the link in PhpMyAdmin).

More information

work for you
Page History
Date/CommentUserIPVersion
08 Dec 2008 (22:08 UTC)
added info from tommie-lie regarding utf8 and phpmyadmin
laetzer85.178.6.2497
Current • Source
laetzer85.178.55.1586
View • Compare • Difference • Source
laetzer85.178.55.1585
View • Compare • Difference • Source
laetzer85.178.55.1584
View • Compare • Difference • Source
Combat Wombat118.90.21.1263
View • Compare • Difference • Source
Combat Wombat118.90.21.1262
View • Compare • Difference • Source
Combat Wombat118.90.21.1261
View • Compare • Difference • Source