LanguagesResearch

Techniques for multiple language support (Internationalization)

Created by: Jan Lindåker, Last modification: 20 May 2008 (06:23 UTC) by laetzer
Three possibilities for bitweaver internationalization:

  • GNU gettext
  • IntSmarty (described in PHP Architect, April 2004, by John Coggeshall)
  • The internationalization method used in TikiWiki 1.8 (getstrings), which is also the method currently used in bitweaver. This is the method we ended up choosing for the LanguagesPackage.

GNU gettext

I ruled out GNU gettext almost immediately, since the other two methods use a Smarty prefilter. A Smarty prefilter replaces the strings during template compilation, which means that there is no translation overhead when a compiled template is used. Since templates are only compiled when they change, this is much preferred over translating at runtime.

IntSmarty

Apart from providing some utility functions, which are described later, IntSmarty's advantage over GNU gettext is that strings are replaced at template compilation rather than at runtime. The advantage of gettext is that it supports many programming languages, but since we have already chosen PHP and Smarty, we can use their advantages to the fullest.

IntSmarty uses a prefilter function that replaces {l}Text to translate{/l} with the translation. The translation is looked up in a translation array where an MD5 sum of the original text is used as the index and the translated text as the value. Other Smarty tags, such as variables, are left for the translator to handle correctly. Since the translator function is a prefilter, all tags that are left in the translation will be handled by Smarty.

If the MD5 sum of the text to translate is not found in the array, the array is regenerated with the new string added. A typical language array looks like this:


$__LANG = array(
    '12345678901234567890123456789012' => 'Text to translate',
    // ...
);


The translator then translates 'Text to translate' into whatever that string is in the desired language. One downside of this method is that there is no complete list of strings that need to be translated until all templates have been compiled. Another downside is that when changes or additions are made in the templates it is hard, at least in the current implementation, to find the new strings, especially when they only differ in spelling. Furthermore, the method saveLanguage needs to be called at the end of every script that uses IntSmarty; if the developer forgets this, the language array is not updated even when it needs to be. This will normally be detected by an annoyed translator. There also does not seem to be any way to detect when translations can be removed.
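
The lookup described above can be sketched in plain PHP. This is a minimal illustration, not IntSmarty's actual code: the function name translate_prefilter, the $LANG table and the Spanish sample translation are all hypothetical, and the table is passed in explicitly instead of being registered with Smarty as a real prefilter would be.

```php
<?php
// Hypothetical translation table: MD5 of the original text => translated text.
$LANG = array(
    md5('Text to translate') => 'Texto para traducir',
);

/**
 * Prefilter-style function (assumed name): takes template source and returns
 * it with every {l}...{/l} block replaced by its translation. Strings that
 * are not yet in the table are added with the original text as the value,
 * so that a translator can find and translate them later.
 */
function translate_prefilter($source, array &$lang)
{
    return preg_replace_callback(
        '/\{l\}(.*?)\{\/l\}/s',
        function ($m) use (&$lang) {
            $key = md5($m[1]);
            if (!isset($lang[$key])) {
                $lang[$key] = $m[1]; // new string: keep the original for now
            }
            return $lang[$key];
        },
        $source
    );
}

// Smarty tags inside the text, such as {$name}, are passed through untouched
// and would be handled by Smarty after the prefilter has run.
$template = '<h1>{l}Text to translate{/l}</h1><p>{l}Hello {$name}{/l}</p>';
$compiled = translate_prefilter($template, $LANG);
```

Because the replacement happens on the template source, the translated text ends up in the compiled template and no lookup is needed when the page is served. The sketch also shows the weakness noted above: the table only grows when a template containing the new string is actually compiled.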

IntSmarty also redefines the compile_template method, which I do not think is necessary (and which may be problematic when upgrading Smarty). The way to do this is most probably to redefine the Compiler class, but I'm currently not 100% sure that this approach will work.

For the above reasons I do not recommend using IntSmarty directly.

Useful things in IntSmarty

There are some useful things in IntSmarty that should be adopted:

  • IntSmarty has support for determining the preferred language of the browser, so that language is used if it exists (there is however no way to select a default fallback language, but adding one would be a minor improvement)

  • It has a function, i18nfile, which selects a file from a directory depending on the selected language. This allows, e.g., for localized pictures.
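
Both ideas can be sketched in a few lines of PHP. This is not IntSmarty's implementation: pick_language and localized_file are hypothetical names, the directory layout dir/lang/name is an assumption, and the sketch adds the default fallback language that IntSmarty lacks.

```php
<?php
/**
 * Pick the best language from an Accept-Language header value.
 * $available lists the language codes that have translations;
 * $default is the fallback when nothing matches.
 */
function pick_language($header, array $available, $default)
{
    $prefs = array();
    foreach (explode(',', $header) as $i => $part) {
        $pieces = explode(';', trim($part));
        $code = strtolower(trim($pieces[0]));
        if ($code === '') {
            continue;
        }
        $q = 1.0; // quality defaults to 1 when no q= parameter is given
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strpos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $prefs[] = array($code, $q, $i);
    }
    // Highest quality first; header order breaks ties.
    usort($prefs, function ($a, $b) {
        if ($a[1] == $b[1]) {
            return $a[2] - $b[2];
        }
        return ($a[1] > $b[1]) ? -1 : 1;
    });
    foreach ($prefs as $pref) {
        list($code, $q) = $pref;
        if ($q <= 0) {
            continue; // q=0 means "not acceptable"
        }
        if (in_array($code, $available, true)) {
            return $code;
        }
        $parts = explode('-', $code); // fall back from "sv-se" to "sv"
        if (in_array($parts[0], $available, true)) {
            return $parts[0];
        }
    }
    return $default;
}

/** Return the localized variant of a file, or the default-language one. */
function localized_file($dir, $name, $lang, $default)
{
    $candidate = "$dir/$lang/$name";
    return is_file($candidate) ? $candidate : "$dir/$default/$name";
}
```

In a web request the header value would typically come from $_SERVER['HTTP_ACCEPT_LANGUAGE'], for example pick_language($header, array('en', 'sv'), 'en').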

TikiWiki and getstrings


The method used in TikiWiki 1.8 is quite similar to IntSmarty, with the following main exceptions:

  • The tags used for strings that should be translated in templates are {tr} and {/tr}. This is not a big deal, but in my opinion they stand out a little better than {l} and {/l}. Porting TikiWiki features will also be much simpler if the templates do not need to be overhauled.

  • TikiWiki uses a script, getstrings, to extract the strings that need to be translated from both templates and PHP scripts (more on that later on). The strings are placed in sections where translated, untranslated, possibly untranslated and (possibly) unused strings are separated, which makes it much easier for a translator to update a translation. Furthermore, all the strings are collected in one invocation of the getstrings program.

  • As mentioned above, the getstrings program also collects strings to translate from PHP scripts. These strings are parameters to the tra function. Unfortunately these strings are looked up in the translation table at runtime in TikiWiki 1.8. This can however be remedied by writing a postfilter.

  • TikiWiki also has a block function for translation. Unfortunately I have not been able to figure out its use.

  • TikiWiki has the option to store its translation tables in a database. I have a hard time seeing any use for this feature: if translation string lookup is only done at compile time, its impact should be minimal. That only holds, however, if an output filter that replaces tra("string"); calls in PHP scripts is provided, which has proved to be a problem.

  • TikiWiki's translation table is in principle the same as in IntSmarty. The only difference is that the translation key is not the MD5 sum but the original string. This is a big advantage, since it is much easier for a translator to correct translations that look like this:

"English string" => "Badly translated string"


than like this:

"12345678901234567890123456789012" => "Badly translated string"


  • Tiki 1.8 also has specific language selection code (see langmapping.php), which enables the language selection dropdown to show each language both in its native language and in the currently selected language.
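
The extraction step can be sketched as follows. This is not the getstrings script itself: collect_strings and classify are hypothetical names, the sample table is invented, and the sectioning is simplified to three groups. In particular, treating a string whose translation equals the original as untranslated is an assumption of this sketch.

```php
<?php
/**
 * Collect translatable strings from template and PHP source text.
 * Limitation of this sketch: tra() arguments containing escaped quotes
 * are not handled.
 */
function collect_strings($templateSource, $phpSource)
{
    $found = array();
    // {tr}...{/tr} blocks in templates
    if (preg_match_all('/\{tr\}(.*?)\{\/tr\}/s', $templateSource, $m)) {
        foreach ($m[1] as $s) {
            $found[$s] = true;
        }
    }
    // tra("...") or tra('...') calls with a literal string argument
    if (preg_match_all('/\btra\(\s*(["\'])(.*?)\1\s*\)/s', $phpSource, $m)) {
        foreach ($m[2] as $s) {
            $found[$s] = true;
        }
    }
    return array_map('strval', array_keys($found));
}

/**
 * Sort strings into sections, using the original string as the key.
 * $table is the existing translation table (original => translation).
 */
function classify(array $strings, array $table)
{
    $sections = array(
        'translated'   => array(),
        'untranslated' => array(),
        'unused'       => array(),
    );
    foreach ($strings as $s) {
        if (isset($table[$s]) && $table[$s] !== $s) {
            $sections['translated'][$s] = $table[$s];
        } else {
            $sections['untranslated'][$s] = $s;
        }
    }
    // Entries in the table that no longer occur in any source file.
    foreach ($table as $orig => $translation) {
        if (!in_array((string) $orig, $strings, true)) {
            $sections['unused'][$orig] = $translation;
        }
    }
    return $sections;
}

$tpl = '<a>{tr}Save{/tr}</a> {tr}Cancel{/tr}';
$php = 'echo tra("Page not found"); $x = tra(\'Save\');';
$strings = collect_strings($tpl, $php);
$sections = classify($strings, array('Save' => 'Spara', 'Old string' => 'Gammal text'));
```

Since every source file is scanned in one pass, the translator gets the complete list up front, and because the keys are the original strings, the resulting table stays readable.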

All in all, I do not think that the current method should be abandoned.

Improvements

There are a number of improvements that could be made to the current translation feature in bitweaver.

  • Replace the tra function in the code with a Smarty postfilter that takes care of the translation. Unfortunately this has proved harder than expected, since the PHP code that includes the template is never compiled by Smarty and is thus not accessible to the tra postfilter.
There are solutions to this problem, but as mentioned before they require quite a lot of work:
    • Use the {php} and {/php} tags in the templates for at least the code that handles translated strings. This means that all code that contains tra functions has to be rewritten.
    • Write a custom include function that embeds PHP code in the template. Smarty already has an include function, but the included file does not end up in the compiled template, so it is not usable for our purposes. This method has the same downside as the method above, but gives cleaner code. See The tra function for further info.
    • Extract the tra functions separately, either to make the lang file smaller or to make the runtime database table smaller and improve runtime performance.

  • Adopt the useful things in IntSmarty.

  • Make a new plugin in bitweaver for translations, so that translators can handle the translations through the web browser and translations can be performed much like in a wiki (i.e. so that many translators are able to help out with the translation to a specific language and they do not even have to be technically oriented).

  • Make a "gibberish" translator that automatically translates into "gibberish". "Gibberish" is a language where, e.g., the strings in the original language are reversed. I almost did this for TikiWiki, but I was not able to finish it since the problem was not as easy as I had first imagined: Smarty and string variables need special treatment. This will help developers verify that all strings are correctly marked with either PHP or Smarty translation tags.

  • Handle "dynamic" strings, see UnusedWords. Dynamic strings are strings that exist neither in PHP code nor in Smarty templates. Instead they are extracted from some external source (which must be static either in a specific Tiki version or in a specific Tiki setup). The most prominent example of this is the flags feature in Tiki. The flag names are not encoded in the code but are extracted from the filename of each image file. Currently translators have solved this by adding these strings manually. In later versions of Tiki, the strings have been added in specially crafted comments so that getstrings can extract them. This solution is however inelegant and error prone if the external source changes. To solve this problem satisfactorily, each module that uses "dynamic" strings should provide an interface function in a specifically named file that allows getstrings to extract these "dynamic" strings.

  • Add plugin support for translations. Currently only one file that contains the translation table is generated. To support the plugin concept completely, getstrings has to be able to generate one language file for each plugin, but at the same time reuse translations from other plugins and make consistency checks, so that if two plugins have the same original strings, the translations are also the same.
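
The "gibberish" translator from the list above could look roughly like the following sketch. The function name gibberish is hypothetical, and the set of protected tokens (anything in braces, plus %s and %d) is an assumption about what needs the special treatment mentioned above. Each run of plain text is reversed in place, so variables and placeholders keep their positions.

```php
<?php
/**
 * Turn a string into "gibberish" by reversing every run of plain text,
 * while leaving Smarty tags ({...}) and printf-style placeholders (%s, %d)
 * untouched so the result still works in templates and sprintf() calls.
 * Limitation: strrev() works on bytes, so the input is assumed to be ASCII.
 */
function gibberish($text)
{
    $token = '\{[^}]*\}|%[sd]';
    // Split into protected tokens and plain text, keeping the tokens.
    $parts = preg_split(
        '/(' . $token . ')/',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
    $out = array();
    foreach ($parts as $part) {
        $protected = preg_match('/\A(?:' . $token . ')\z/', $part) === 1;
        $out[] = $protected ? $part : strrev($part);
    }
    return implode('', $out);
}
```

With a table generated this way, any text that still reads normally on a rendered page is a string that was never marked for translation.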

Additional Links and Resources

A few URLs I've found while researching internationalization techniques for PHP.

PHP Internationalization Mailing list & Newsgroup
Discussions on PHP Internationalization (i18n) and localization (l10n) issues and features. A searchable archive is available. Light Traffic.

Flaimo's little package (FLP)
Free collection of PHP classes offering message catalogs, date formatting and message formatting facilities.

FLP I18N/L10N
The i18n package is a bunch of classes for internationalization. It lets you maintain multilanguage web pages more easily. The translation strings are stored in flat text files, in special Gettext files (which are basically precompiled translation files), or in a MySQL database. It works independently of PHP's setlocale function.

STPhp, internationalization tool for PHP
STPhp is a PHP-based string translation suite designed as a simple internationalization tool to work without requiring non-standard dependencies. Open source project created by Jacob Moorman.

Internationalization and Localization with PHP
Internationalization (often abbreviated I18N--there are 18 letters between the first "i" and the last "n") is the process of taking an application designed for just one locale and restructuring it so that it can be used in many different locales.

Internationalization Using PHP and GetText
(This is from the founder of TikiWiki.)

ZPTInternationalizationSupport
This document is a proposal to extend Zope Page Templates to provide internationalization support. Note that statements of fact below should be read as proposals.

Comments

Definition: i18n

by Stephan Borg, 20 Jan 2005 (20:52 UTC)
Commonly used to abbreviate the word "internationalization". There are eighteen letters between the "i" and the "n".