Version 4

LanguagesResearch

What techniques should be used to support TikiPro in multiple languages?

Created by: Jan Lindåker, Last modification: 10 Jan 2005 (19:35 UTC) by Jan Lindåker

Internationalization


I have investigated three possibilities for TikiPro internationalization:

  • GNU gettext
  • IntSmarty (described in PHP Architect, April 2004, by John Coggeshall)
  • The internationalization method used in TikiWiki 1.8 (getstrings), which is the method currently used in TikiPro

I have also investigated some improvements.

GNU gettext
I ruled out GNU gettext almost immediately, since the other two methods use Smarty prefilters. A Smarty prefilter replaces the strings during template compilation, which means that there is no translation overhead when the compiled template is used. Since templates are only compiled when they change, this method is much preferred over doing the translation at runtime.

IntSmarty

Apart from providing some utility functions, which are described later, IntSmarty's advantage over GNU gettext is that the replacement of strings is not done at runtime but at template compilation. (The advantage of gettext is that it supports many programming languages, but since we have already chosen PHP and Smarty, we can use their advantages to the fullest.)

IntSmarty uses a prefilter function that replaces {l}Text to translate{/l} with the translation. The translation is looked up in a translation array, where the MD5 sum of the original text is the index and the translated text is the value. Other Smarty tags, such as variables, are left for the translator to handle correctly. Since the translator function is a prefilter, all tags that are left in the translation will be handled by Smarty.

If the MD5 sum of the text to translate is not found in the array, the array is regenerated with the new string added. A typical language array looks like this:


$__LANG = array(
    '12345678901234567890123456789012' => 'Text to translate',
    // ...
);
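
The lookup described above can be sketched in a few lines of PHP. This is only an illustration of the idea, not IntSmarty's actual code: the function name translate_l_blocks and the $dirty flag are hypothetical, and the sketch works on a plain string instead of being registered as a Smarty prefilter.

```php
<?php
// Hypothetical helper (not part of IntSmarty): translate {l}...{/l} blocks
// at template compile time using an MD5-keyed translation array.
function translate_l_blocks(string $source, array &$lang, bool &$dirty): string
{
    $result = preg_replace_callback(
        '/\{l\}(.*?)\{\/l\}/s',
        function (array $m) use (&$lang, &$dirty): string {
            $key = md5($m[1]);
            if (!isset($lang[$key])) {
                // Unknown string: store it untranslated so that a
                // translator can find it in the array later.
                $lang[$key] = $m[1];
                $dirty = true;
            }
            // Smarty tags inside the translation are left untouched.
            return $lang[$key];
        },
        $source
    );
    // preg_replace_callback returns null on a regex error; keep the source then.
    return $result ?? $source;
}

$lang  = array(md5('Hello') => 'Hej');
$dirty = false;
$out   = translate_l_blocks('<h1>{l}Hello{/l}</h1> {l}Bye {$name}{/l}', $lang, $dirty);
echo $out, "\n";
```

Here the known string is replaced, while the unknown one is added to the array and passed through unchanged, giving `<h1>Hej</h1> Bye {$name}`. The $dirty flag marks that the array must be written back, which corresponds to the saveLanguage step that is easy to forget.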


The translator then translates 'Text to translate' to whatever that string is in the desired language. One of the downsides of this method is that there is no complete list of strings that need to be translated until all templates have been compiled. Another downside is that if changes or additions are made in the templates, it is hard (at least in the current implementation) to find the new strings, especially when they only differ in spelling. Furthermore, the method saveLanguage needs to be called at the end of every script that uses IntSmarty. If the developer forgets this, the language array is not updated even when an update is needed. This will normally be detected by an annoyed translator. There also does not seem to be any way to detect when translations can be removed.

IntSmarty also redefines the _compile_template method, which I do not think is necessary (and may be problematic when upgrading Smarty). The way to do this is most probably to redefine the compiler class, but I am currently not 100% sure that this approach will work.

For the above reasons I do not recommend a direct use of IntSmarty.

Useful things in IntSmarty

There are some useful things in IntSmarty that should be adopted.

  • IntSmarty has support for determining the preferred language of the browser, so that language is used if it exists (there is, however, no way to select a default fallback language, but this is a minor improvement).

  • It has a function, i18nfile, which selects a file from a directory that depends on the selected language.
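
As a rough sketch of browser language detection with the missing default fallback, the function below parses an Accept-Language header value. The name pick_language and its parameters are assumptions made for illustration and are not taken from IntSmarty.

```php
<?php
// Hypothetical helper: pick the best supported language from an
// Accept-Language header value, falling back to a default language.
function pick_language(string $acceptHeader, array $supported, string $default): string
{
    $candidates = array();
    foreach (explode(',', $acceptHeader) as $i => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '' || $tag === '*') {
            continue;
        }
        $q = 1.0; // quality defaults to 1 when no q parameter is given
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $candidates[] = array($q, $i, $tag);
    }
    // Highest quality first; header order breaks ties.
    usort($candidates, function (array $a, array $b): int {
        return ($b[0] <=> $a[0]) ?: ($a[1] <=> $b[1]);
    });
    foreach ($candidates as $c) {
        if ($c[0] <= 0) {
            continue; // q=0 means "not acceptable"
        }
        if (in_array($c[2], $supported, true)) {
            return $c[2];
        }
        // Try the primary subtag, e.g. "sv" for "sv-se".
        $primary = explode('-', $c[2])[0];
        if (in_array($primary, $supported, true)) {
            return $primary;
        }
    }
    return $default;
}

$chosen = pick_language('sv-SE,sv;q=0.9,en;q=0.8', array('en', 'sv'), 'en');
```

In a real request the header value would come from $_SERVER['HTTP_ACCEPT_LANGUAGE']. The resulting language code could then be used to choose a language-specific directory, in the spirit of i18nfile.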

TikiWiki and getstrings

The method for TikiWiki 1.8 is quite similar to IntSmarty, with the following main exceptions:

  • The tags used for strings that should be translated in templates are {tr} and {/tr}. This is not a big deal, but in my opinion they stand out a little better than {l} and {/l}. Porting of TikiWiki features will also be much simpler if the templates do not need to be overhauled.

  • TikiWiki uses a script, getstrings, to extract the strings that need to be translated from both templates and PHP scripts (more on that later on). The strings are placed in sections where translated, untranslated, possibly untranslated and (possibly) unused strings are separated, which makes it much easier for a translator to update a translation. Furthermore, all the strings are collected in one invocation of the getstrings program.

  • As mentioned above, the getstrings program also collects strings to translate from PHP scripts. These strings are parameters to the tra function. Unfortunately, these strings are looked up in the translation table at runtime in TikiWiki 1.8. This can, however, be remedied by writing a postfilter.

  • TikiWiki also has a block function for translation. Unfortunately, I have not been able to figure out its use.

  • TikiWiki has the option to store its translation tables in a database. I have a hard time seeing any use for this feature, since translation string lookup is only done at compile time, so its impact should be minimal. This is only true if an output filter that replaces tra("string"); in PHP scripts is provided, but that should not be a problem.

  • TikiWiki's translation table is in principle the same as in IntSmarty. The only difference is that the translation key is not the MD5 sum but the original string. This is a big advantage, since for a translator it is much easier to make corrections to translations that look like this:
"English string" => "Badly translated string"
than like this:
"12345678901234567890123456789012" => "Badly translated string"


  • Tiki 1.8 also has specific language selection code (see langmapping.php), which enables the language selection dropdown to show each language both in its native language and in the currently selected language.
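
To make the extraction step more concrete, here is a minimal sketch of what a getstrings-like pass could do. It is not the real getstrings script: the function names extract_strings and classify are hypothetical, only three of the sections are modelled, and the simple regular expression for tra() does not handle escaped quotes inside the string.

```php
<?php
// Hypothetical sketch, not the real getstrings script: collect translatable
// strings from template and PHP source text.
function extract_strings(string $source): array
{
    $found = array();
    // Smarty templates: {tr}text{/tr}
    if (preg_match_all('/\{tr\}(.*?)\{\/tr\}/s', $source, $m)) {
        $found = array_merge($found, $m[1]);
    }
    // PHP code: tra('text') or tra("text") with a literal argument.
    if (preg_match_all('/\btra\(\s*([\'"])(.*?)\1\s*\)/s', $source, $m)) {
        $found = array_merge($found, $m[2]);
    }
    return array_values(array_unique($found));
}

// Sort the strings into sections, keyed by the original string.
function classify(array $strings, array $table): array
{
    $sections = array('translated' => array(), 'untranslated' => array(), 'unused' => array());
    foreach ($strings as $s) {
        if (isset($table[$s])) {
            $sections['translated'][$s] = $table[$s];
        } else {
            $sections['untranslated'][$s] = $s;
        }
    }
    foreach ($table as $orig => $translated) {
        // Cast because PHP turns numeric string keys into integers.
        if (!in_array((string) $orig, $strings, true)) {
            $sections['unused'][$orig] = $translated;
        }
    }
    return $sections;
}

$tpl      = '<b>{tr}Save{/tr}</b> {tr}Cancel{/tr}';
$php      = '$msg = tra("Page saved"); $btn = tra(\'Save\');';
$strings  = extract_strings($tpl . "\n" . $php);
$sections = classify($strings, array('Save' => 'Spara', 'Delete' => 'Ta bort'));
```

With this input, "Save" ends up in the translated section, "Cancel" and "Page saved" in the untranslated section, and "Delete" in the unused section, all from one invocation.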

All in all, I do not think that the current method should be abandoned.

Improvements

There are a number of improvements that could be made to the current translation feature in TikiPro.

  • Replace the tra function in the code with a Smarty postfilter, which takes care of the translation.

  • Adopt the useful things in IntSmarty.

  • Make a new plugin in TikiPro for translations, so that translators can handle the translations through the web browser and translations can be performed much like in a wiki (i.e. many translators are able to help out with the translation to a specific language, and they do not even have to be technically oriented).

  • Make a "gibberish" translator that makes an automatic translation into "gibberish". "Gibberish" is a language where, for example, the strings in the original language are reversed. I almost did this for TikiWiki, but I was not able to finish it, because the problem was not as easy as I had first imagined: Smarty and string variables need special treatment. This will help developers verify that all strings are correctly marked with either PHP or Smarty translation tags.

  • Handle "dynamic" strings (see UnusedWords). Dynamic strings are strings that exist neither in PHP code nor in Smarty templates. Instead they are extracted from some external source (that must be static either in a specific Tiki version or a specific Tiki setup). The most prominent example of this is the flags feature in Tiki. The flag names are not encoded in the code but are extracted from the filename of each image file. Currently translators have solved this by adding these strings manually. In later versions of Tiki, the strings have been added in specially crafted comments so that getstrings can extract them. This solution is, however, inelegant and error-prone. To solve this problem satisfactorily, each module that uses "dynamic" strings should provide an interface function in a specifically named file that allows getstrings to extract these "dynamic" strings.

  • Add plugin support for translations. Currently only one file that contains the translation table is generated. To support the plugin concept completely, getstrings has to be able to generate one language file for each plugin, but at the same time reuse translations from other plugins and make consistency checks, so that if two plugins have the same original strings, the translations are also the same.
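
The "gibberish" idea from the list above could look roughly like the sketch below. It assumes that the parts needing special treatment are Smarty tags in braces and the printf-style placeholders %s and %d. The function name gibberish is hypothetical, and real templates may well contain other constructs that must be protected.

```php
<?php
// Hypothetical sketch of a "gibberish" translator: reverse each run of
// literal text, but leave Smarty tags ({...}) and printf-style
// placeholders (%s, %d) intact and in their original positions.
function gibberish(string $text): string
{
    $token = '\{[^}]*\}|%[sd]';
    // The delimiter-capture flag keeps the protected tokens in the result.
    $parts = preg_split('/(' . $token . ')/', $text, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $out = '';
    foreach ($parts as $part) {
        if (preg_match('/^(?:' . $token . ')$/', $part) === 1) {
            $out .= $part; // protected token, copy unchanged
        } else {
            // Reverse character by character; the /u split is UTF-8 safe.
            $chars = preg_split('//u', $part, -1, PREG_SPLIT_NO_EMPTY);
            $out .= implode('', array_reverse($chars));
        }
    }
    return $out;
}

$sample = gibberish('Hello {$name}, you have %d messages');
```

Because placeholders stay in their original order, the arguments passed to a printf-style call still line up. Any string that shows up unreversed in the user interface was probably not marked for translation.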