Multilingual site development: Part I language detection

I’ve had this post on the back burner for months now, in fact ever since I re-implemented the publishing system for Anna’s comic. Asterisk’s latest post finally kicked me into documenting what I did. I’ll probably write a series of entries on developing multilingual sites. I won’t concentrate on the inherent usability issues, as there is already plenty of documentation and discussion on those.

These posts will be much more technical and dig into how we can implement multilingual sites that work without splash pages (select your language before proceeding) and without users ending up on pages in gobbledygook (from their perspective), hunting for a link to switch languages. So, on with the technical details and nit-picking.

The approach I use with Anna’s comic is based on what I feel is sensible. Many automatic language guessers annoy me to no end. Google and many other high-profile sites use IP-based geolocation to pick the language, and possibly the mirror, that will be used. This completely ignores the fact that users can set their preferred languages in the browser, and that’s what should be honoured. My approach to language detection is the following:

  1. Check to see if a cookie has been set with the language information.
  2. If the user has logged in (in a site with registration) see what language they prefer from the user profile.
  3. See if the URL has any hints on the language that should be used (e.g. /archives/foo/fi/ and /archives/foo/en/)
  4. Check the browser’s preferred languages and their order.
  5. If you really must, use location-based determination to see what language should be used.
  6. Now finally use your own default language.

The first two items reflect an explicit choice by the user. The third option gives power to the referrer, but only if your site has a system that allows localization information in the URL. The fourth option gives power to the user (by changing browser settings or selecting the localized version of the browser) or to the local system administrator. The fifth option may be a good choice, but I’m still undecided on its benefits. I don’t use it (at least not yet) with any of my sites. And naturally the last option is your choice and depends on your default audience.
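To make the ordering concrete, here is a minimal Python sketch of the fallback chain. It is not the code behind Anna’s comic. The names `pick_language` and `language_from_url`, the `SUPPORTED` tuple, the `DEFAULT` value and the `lang` cookie key are all made up for illustration. Each source is a small callable that returns a language code or `None`, and the first supported answer wins.

```python
from typing import Callable, Iterable, Optional

SUPPORTED = ("en", "fi")  # hypothetical set of languages the site offers
DEFAULT = "en"            # hypothetical site default (the last step in the list)


def pick_language(
    sources: Iterable[Callable[[], Optional[str]]],
    supported: Iterable[str] = SUPPORTED,
    default: str = DEFAULT,
) -> str:
    """Return the first supported language offered by the sources, in order."""
    allowed = set(supported)
    for source in sources:
        candidate = source()
        if candidate and candidate.lower() in allowed:
            return candidate.lower()
    return default


def language_from_url(
    path: str, supported: Iterable[str] = SUPPORTED
) -> Optional[str]:
    """Look for a language code as a path segment, e.g. /archives/foo/fi/."""
    allowed = set(supported)
    # Walk the segments from the end, so a trailing /fi/ wins.
    for segment in reversed(path.strip("/").split("/")):
        if segment.lower() in allowed:
            return segment.lower()
    return None


# Example request: no cookie, not logged in, Finnish hinted in the URL.
cookie = {}
profile_language = None
path = "/archives/foo/fi/"

lang = pick_language([
    lambda: cookie.get("lang"),       # 1. cookie
    lambda: profile_language,         # 2. user profile
    lambda: language_from_url(path),  # 3. URL hint
    # 4. browser preferences and 5. location lookup would slot in here
])
```

With these inputs `lang` ends up as `"fi"`. Keeping each step as a separate callable means a step you don’t want (location lookup, in my case) can simply be left out of the list.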

I already mentioned that Google’s location-based mirroring and language selection annoys me. Let me elaborate on the problems it creates. Consider a user in Finland sitting down at their personal computer and heading to google.com. The browser is automatically redirected to google.fi and the user interface is now in Finnish. The option to switch to Swedish is fairly prominent (Finland is a bilingual country after all), but switching to English requires a bit more effort.

[Image: Google.fi front page]

While Google does allow users to change their default settings, and even the server that is used (results differ depending on the server), it ignores the language preferences already set in the user’s browser. The HTTP/1.1 RFC states that clients (browsers) may send the Accept-Language header if the user can select the languages that are sent. An intelligent web application will honour the user’s choices.
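As a rough illustration of honouring that header, here is a small Python sketch that orders the languages in an Accept-Language value by their `q` weights and picks the first one the site supports. The function names `parse_accept_language` and `match_language` are my own for this example, and the matching is deliberately simple (exact tag first, then the primary subtag such as `en` for `en-GB`). A production site would likely want something more thorough.

```python
from typing import List, Optional, Sequence


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language value, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:  # q=0 marks a language the user does not accept
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def match_language(header: str, supported: Sequence[str]) -> Optional[str]:
    """Pick the first supported language in the user's order of preference."""
    for tag in parse_accept_language(header):
        if tag == "*":
            # Wildcard: any language is fine, so use the site's first choice.
            return supported[0] if supported else None
        primary = tag.split("-")[0]
        for candidate in (tag, primary):
            if candidate in supported:
                return candidate
    return None


preferred = parse_accept_language("fi, sv;q=0.8, en;q=0.5")
chosen = match_language("sv-FI, en-GB;q=0.7", ["en", "fi"])
```

Here `preferred` is `["fi", "sv", "en"]`, and `chosen` is `"en"` because the site in the example offers no Swedish. When `match_language` returns `None`, the site simply falls through to its next step.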

These are just my opinions on the matter. I’d love to hear from others how they’ve decided to implement intelligent language switching and why.
