PHP and Internationalization

I was recently tasked with supporting multiple languages in a PHP script. I found an article on about.com that was a good starting point. The author basically suggested creating associative arrays with different translations:
(P.S. I modified his code just a little to better conform to PHP 5 standards)

class base {
    protected $messages;
    function msg($m)
    {
        if (isset($this->messages[$m]))
           return $this->messages[$m];
        else
           error_log("$m is not set");
    }
}

class en_US extends base
{
    function __construct()
    {
        $this->messages = array('hello'=>'hello world!', 'bye'=>'goodbye world!');
    }
}

class fr_FR extends base
{
    function __construct()
    {
        $this->messages = array('hello'=>'bonjour a tout le monde!', 'bye'=>'au revoir a tout le monde!');
    }
}

//usage, in english
$eng = new en_US();
echo $eng->msg('hello'); //outputs 'hello world!'

//in french
$fr = new fr_FR();
echo $fr->msg('hello'); //outputs 'bonjour a tout le monde!'

As the author, Adam Trachtenberg, states in his blog, a translator does not want to modify code, nor should he (or even be able to). One misplaced semicolon or an omitted parenthesis spells disaster for your script.

He goes on to suggest that translators should never even be able to enter their translations directly. It is, in fact, the engineer’s job to take the text from the translator and manually input it into the code.
I argue that this is not a very good suggestion because it takes up engineering time (which can be very expensive compared to translators’ time). Instead, I propose that text be placed in XML files. This is a nice compromise since it requires that the translator know only XML. I agree that there certainly is room for error in XML files, and PHP is unforgiving when it comes to badly-formed XML, but that can be fixed with XML editors such as XmlSpy or XmlNotepad.
Now, the translator only needs to know how to use the XML editor, and possibly insert some <br /> tags. (And yes, most XML editors will insert the correct escape sequences, such as &lt;br /&gt;.)
Ok, enough philosophizing. How about some code?
First, we don’t need a separate subclass for every language. We just need the base class.

include_once(dirname(__FILE__) . '/Xml_Parser.php');

class Language_Base
{
    protected $messages;

    /**
     * Language_Factory($locale_code) creates a new instance of the
     * Language_Base class for the specified locale code, e.g. en_US, nl_BE.
     * The function then looks for the matching XML file.
     * If the locale is en_US, the script looks for en_US.xml.
     */
    static function Language_Factory($locale_code)
    {
        $filename = dirname(__FILE__) . "/$locale_code.xml";

        if (file_exists($filename))
        {
            $parser = new Xml_Parser($filename);
            $lang_arr = $parser->getArray();
            return new Language_Base($lang_arr);
        }
        else
        {
            error("i18n error: Language file $filename not found");
            return false;
        }
    }

    function __construct($arr)
    {
        $this->messages = $arr;
    }

    function msg($key)
    {
        if (isset($this->messages[strtoupper($key)]))
            return $this->messages[strtoupper($key)];
        else
        {
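            // The key is the only context available here: the class
            // stores no locale, so the message reports the key alone.
            error("l10n error: message '$key' not found");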
        }
    }
}

So, as you can see, every language has its own XML file named after the locale. For example, for US English, en_US.xml needs to exist with some data inside it. Also, the error($msg) function used above is my own helper that outputs to the screen AND to the log file; this is not necessary, just a convenience to me. You can just use the error_log() function instead.
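The error() helper itself is not reproduced in this post, so here is a minimal sketch of what such a function might look like. The body is an assumption based only on the description above (print to the page and write to the log); adjust it to taste.

```php
<?php
// Hypothetical stand-in for the error() helper mentioned above.
// Assumption: it echoes the message into the page and also logs it.
function error($msg)
{
    // Escape the message so stray markup in a key cannot break the page.
    echo '<p class="error">' . htmlspecialchars($msg, ENT_QUOTES, 'UTF-8') . '</p>';

    // Send the same message to PHP's configured error log.
    error_log($msg);
}
```

On a production site you would probably drop the echo and keep only the error_log() call.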
I trust anyone can make their own XML parser (mine is called Xml_Parser.php and returns an array of messages indexed by tag name), and if not, there are plenty of references.
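For readers who would rather not write one from scratch, the following is a rough sketch of a parser that fits the calls made in Language_Base. Only the class name Xml_Parser and the method getArray() come from the code above; the implementation is an assumption that leans on PHP's SimpleXML extension, and it upper-cases the tag names because msg() looks keys up with strtoupper().

```php
<?php
// Sketch of a possible Xml_Parser. The class and method names follow the
// article; everything inside them is an assumption built on SimpleXML.
class Xml_Parser
{
    private $filename;

    function __construct($filename)
    {
        $this->filename = $filename;
    }

    // Returns array('HELLO' => 'Hello World!', ...), or an empty array
    // when the file cannot be parsed as XML.
    function getArray()
    {
        // Collect parse errors quietly instead of emitting warnings.
        $previous = libxml_use_internal_errors(true);
        $xml = simplexml_load_file($this->filename);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        if ($xml === false) {
            return array();
        }

        $messages = array();
        foreach ($xml->children() as $node) {
            // Upper-case the tag name to match the lookup in msg().
            $messages[strtoupper($node->getName())] = (string) $node;
        }
        return $messages;
    }
}
```

Note that the string cast keeps only the text of each element, so any HTML in a translation has to be escaped in the XML file (for example &lt;br /&gt;) to come through intact.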
The XML file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<Language>
<hello>Hello World!</hello>
<bye>Goodbye World!</bye>
</Language>

The important thing here is the UTF-8 encoding. Without that, the browsers will render in default Latin-1, and you’ll get little squares or question marks for foreign characters. Also, make sure to save it as UTF-8 encoded (from the drop-down menu), which means Notepad is NOT supported (thank God!).
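If you want a safety net on the PHP side as well, something like the following could help. This is only a sketch: is_valid_utf8() is a hypothetical helper name, and the header() call assumes nothing has been sent to the browser yet.

```php
<?php
// Hypothetical helper: returns true when $text is valid UTF-8.
// With the /u modifier, preg_match() fails on invalid byte sequences
// and returns false instead of 1.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

// Tell the browser explicitly that the page is UTF-8.
// This only works before any output has been sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
```

Running each translated string through a check like this when the file is loaded catches a file that was accidentally saved as Latin-1.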
Finally, the code to print “Hello World!” and “Goodbye World!” should look like this:

$obj = Language_Base::Language_Factory('en_US');
echo $obj->msg('hello');
echo $obj->msg('bye');

That’s it. Simple, right?
In one of my implementations, I have to display the correct page based on the top-level domain (de, fr, etc.), so I modified my Language_Factory method to look up the top-level domain in another XML file to find the locale code.
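A rough sketch of that lookup is shown here. The function name locale_for_host() and the en_US fallback are assumptions made for illustration, and a plain array stands in for the second XML file.

```php
<?php
// Hypothetical helper: maps a host name's top-level domain to a locale code.
// $map stands in for the data read from the second XML file.
function locale_for_host($host, array $map, $default = 'en_US')
{
    // Drop an optional port, then take everything after the last dot.
    $host = strtolower(preg_replace('/:\d+$/', '', $host));
    $tld = substr(strrchr('.' . $host, '.'), 1);

    return isset($map[$tld]) ? $map[$tld] : $default;
}

$map = array('de' => 'de_DE', 'fr' => 'fr_FR');

// Typical use (assumes the Language_Base class from earlier is loaded):
// $obj = Language_Base::Language_Factory(
//     locale_for_host($_SERVER['HTTP_HOST'], $map)
// );
```

Because the result ends up in a file name, looking it up in a fixed map like this also keeps arbitrary input out of the path.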
Overall, this is a very simple solution. There is a potential for a lot of improvements (like differentiating between different official languages of one country, like Belgium), but it’s also very flexible in that it allows HTML to be inserted into the XML and allows “regular” users to modify the XML translations with XML editors without actually looking at code (with far less potential of breakage).
