Case insensitive wiki titles

Lester Caine
Joined: 24 Apr 2004

Case insensitive wiki titles

Posted:01 May 2009 (07:28 UTC)
With the increased use of UTF8 for multi-lingual pages, I think we are starting to see a problem with 'pageExists' checks. The first one relates to stripping UPPER from MySQL queries, which is fine as long as the database has not been created using a UTF8 character set (I am not sure how that is done on MySQL; on Firebird we specify the character set as part of the connection string). But basically, once the database is in a 'binary' mode, the UPPER is needed. Switching case sensitivity on by changing the FALSE to TRUE gets around the problem and allows the 'Welcome' page to be found, but it will not match 'welcome'.

Additionally, this check uses strtoupper, which is not UTF8 compatible and so produces different results to the UPPER() used in the SQL. It may be better to add UPPER() around the parameters so that at least the same 'mangling' is applied, but this does impinge on performance, depending on how the database handles case insensitivity anyway!
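
To illustrate the mismatch, here is a minimal sketch in Python rather than PHP. It assumes the non-UTF8 conversion only touches the ASCII letters a-z, which is roughly how a byte-wise upper-casing behaves on UTF8 text; `ascii_upper` is a hypothetical stand-in, not the actual strtoupper implementation, and Python's own `upper()` stands in for a Unicode-aware database UPPER().

```python
def ascii_upper(text: str) -> str:
    """Upper-case only the ASCII letters a-z; leave every other character alone.

    Hypothetical stand-in for a conversion that is not UTF8 aware.
    """
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


title = "école"

app_side = ascii_upper(title)  # the accented letter is left in lower case
db_side = title.upper()        # Unicode-aware: the accented letter is converted too

print(app_side)
print(db_side)
print(app_side == db_side)     # False, so a lookup comparing the two would miss
```

For plain ASCII titles such as 'welcome' both conversions agree, which is why the problem only shows up once UTF8 titles are in use.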

Short term I think we should just disable case insensitive titles?
spiderr
Joined: 08 Feb 2004

Re: Case insensitive wiki titles

Posted:01 May 2009 (14:29 UTC)
This would break links where people use the lowercase title mid-sentence, which is a reasonable thing to do.

I would prefer we fix it the right way. Do we need a mono-cased (UPPER or lower) and transliterated content title column?

Drupal has a transliteration module that might offer some guidance...
Lester Caine
Joined: 24 Apr 2004

Re: Case insensitive wiki titles

Posted:01 May 2009 (19:20 UTC)
I think the first problem here is that while ASCII 'UPPER' is well defined and stable, upper case in other languages can be more of a problem. Actually, using UPPER rather than LOWER is part of the problem, since upper-casing can change the number of letters in a word.
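
A small Python sketch of the letter-count issue, using the German 'ß' as an example (the word is my own choice, and Python's built-in Unicode case mapping stands in for whatever the database or PHP would do):

```python
def case_lengths(word: str) -> tuple[int, int, int]:
    """Return the length of the word as given, upper-cased, and lower-cased."""
    return len(word), len(word.upper()), len(word.lower())


word = "straße"

print(word.upper())        # the single letter 'ß' becomes the two letters 'SS'
print(case_lengths(word))  # original, upper-cased, lower-cased lengths
```

Here the upper-cased form is one character longer than the original, while the lower-cased form keeps the same length, so a fixed-width column or a character-by-character comparison would behave differently depending on which direction is used.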

The correct solution is proper Unicode support, which PHP5 does not really supply. So while mb_strtoupper will give a different answer, it apparently still does not follow the rules, and you need to know which 'transliteration' you want - something we do not know unless we start asking users for the language selection.

What would work better IS to use a separate column for UPPER. That is also a solution in some databases for faster sorting in case insensitive alphabetical order anyway. Populating that column with the PHP conversion will ensure a match, even if the result is incorrect (there have been a few bugs reported on ordering errors in other languages recently), although using the database's internal 'UPPER' both to generate the column and to process the parameter before matching should work better.
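
As a rough sketch of the separate-column idea, here it is in Python with an in-memory SQLite database. The table name `wiki_pages`, the column `title_key` and the helper names are all made up for illustration and are not the real schema or API. The point is only that the same conversion is used when the key is stored and when the parameter is looked up, so the two always match.

```python
import sqlite3


def title_key(title: str) -> str:
    """Hypothetical normalisation, applied both when storing and when looking up."""
    return title.upper()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wiki_pages (title TEXT NOT NULL, title_key TEXT NOT NULL)"
)
# Index the normalised column so lookups do not need UPPER() in the query.
conn.execute("CREATE INDEX wiki_pages_key_idx ON wiki_pages (title_key)")


def add_page(title: str) -> None:
    """Store the title as typed, plus its normalised key."""
    conn.execute(
        "INSERT INTO wiki_pages (title, title_key) VALUES (?, ?)",
        (title, title_key(title)),
    )


def page_exists(title: str) -> bool:
    """Match on the normalised key, using the same conversion as add_page."""
    row = conn.execute(
        "SELECT 1 FROM wiki_pages WHERE title_key = ? LIMIT 1",
        (title_key(title),),
    ).fetchone()
    return row is not None


add_page("Welcome")
add_page("École")

print(page_exists("welcome"))
print(page_exists("école"))
print(page_exists("Missing"))
```

Because the query compares against a plain indexed column, the database does not have to apply UPPER() to every row, which is the performance concern raised earlier in the thread.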

Unicode does provide a clean definition of 'UPPER', which Firebird's UTF8 collation certainly follows fully, but until PHP does as well ...