Comparison of your localization options in Drupal 5 and 6

As a maintainer of Drupal's locale module, I try to find creative ways to help people localize their sites. Our focus in Drupal 6 was on more features for content translation and interface translation imports, while the built-in locale interface was nearly untouched. We even complicated it a bit with the textgroups feature, which might or might not get used by contributed modules in the end.

In a previous post, I announced the new localization client module which strives to solve some of the problems with the built-in locale module translation interface by bringing an AJAX powered widget close to the site translator. While this module is a very good looking way to solve the translation problem, it has two weaknesses:

  • You can only translate what you see on the site pages you browse. Some text is only shown in exceptional situations: when form values are not filled in properly, when some backend data is not accessible, etc. Some text is even restricted to certain user groups. So you can only translate the most visible parts of your site.
  • A closely connected but slightly different issue is that you cannot translate strings with plural versions at once. If your page shows 3 years ago, you can translate @count years ago but not 1 year ago (the singular form) or @count[2] years ago and friends, which are used when the language in use has more than two plural forms. The Drupal database gives no clue for relating these strings to each other, so we cannot help users intending to translate all of them at once.
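To make the plural problem more concrete, here is a minimal sketch in Python (not Drupal's actual PHP code) of how a count is mapped to one of several separately stored source strings. The plural formula is the standard Gettext one for Russian, a language with three plural forms; the function names and the exact lookup logic are assumptions made for illustration, based on the @count[2] convention described above.

```python
def russian_plural_index(n: int) -> int:
    """Standard Gettext plural formula for Russian (three plural forms)."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def source_string_for(singular: str, plural: str, index: int) -> str:
    """Pick the source string that is looked up for a given plural index.

    Assumption for this sketch: index 0 uses the singular, index 1 the
    plural as written, and higher indexes use the plural with @count
    rewritten to @count[index].
    """
    if index == 0:
        return singular
    if index == 1:
        return plural
    return plural.replace("@count", f"@count[{index}]")


def lookup_keys(singular: str, plural: str, counts):
    """Show which stored string each count would be translated through."""
    return {
        n: source_string_for(singular, plural, russian_plural_index(n))
        for n in counts
    }


if __name__ == "__main__":
    for n, key in lookup_keys("1 year ago", "@count years ago", [1, 3, 5]).items():
        print(n, "->", key)
```

Each of the three keys is stored as an independent string, which is why a page showing only "3 years ago" exposes just one of them to an on-page translation widget.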

Although locale module provides a more complete solution, allowing you to have a translation percentage overview as well as filter untranslated strings and work on them, you are still restricted to the same old, hard to use interface. If you'd like to improve on the interface issue, you can switch to use potx module to extract Gettext translation templates from your modules, then use some desktop Gettext editor which suits your taste and then import the translation back to your site. For most people though, the "favorite Gettext PO editor" question is like asking about the best time to go to the dentist. If we can do better, then why not?

Enter the localization server module suite, also developed as part of my Google Summer of Code 2007 involvement (watch the video of the interface). Back in the SoC days, I made some bad decisions about the architecture of the module, which I am trying to recover from now. The changes make the module suite look much more complex, but at the same time allow you to use the fruits of the work, so this tool will not be restricted to drupal.org, and I'll not be the sole maintainer for life ;)

Warning: the localization server module suite is in heavy development. Some modules might be renamed along the way, complete functionalities might be broken out to submodules and so on. So for now, only use it to alpha test the functionality.

In recent weeks, I took a decent amount of my free time to rearchitect the module suite, and although the process is still underway, I needed to stop to ask for your help in making one of the most user facing components work. The module suite now consists of the following components:

l10n_community module
As it stands now, this should have been named l10n_editor or l10n_ui. It basically provides a user interface on top of some tables defined for storage of projects, releases, files, lines and strings, as well as string translations. I hope we get the Young Hahn touch for the interface soon, but functionally this works nicely. It allows you to translate strings to languages set up in locale module with some default role based permissions. It supports importing PO files as well as exporting PO packages in the form required by Drupal 6. It only requires a connector to work (see below), and the PEAR Tar classes for the package generation.
l10n_groups module
Builds on the simple role based permission model in l10n_community and maps languages to organic groups, adding a permission layer in which organic group membership and administrator status can restrict translation and suggestion submission permissions. Different groups can have different permission models: an open model where all members can suggest and translate as well as approve suggestions and a controlled model where only admins can approve suggestions and members can only post suggestions.
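The two group permission models described for l10n_groups can be summarized in a small Python sketch. This is only an illustration of the rules as stated above, not the module's code; the names PermissionModel, can_suggest and can_approve are hypothetical.

```python
from enum import Enum


class PermissionModel(Enum):
    OPEN = "open"              # members may suggest, translate and approve
    CONTROLLED = "controlled"  # members may only suggest; admins approve


def can_suggest(is_member: bool, is_admin: bool) -> bool:
    """Both models let group members (and admins) post suggestions."""
    return is_member or is_admin


def can_approve(model: PermissionModel, is_member: bool, is_admin: bool) -> bool:
    """Admins can always approve; plain members only in the open model."""
    if is_admin:
        return True
    return model is PermissionModel.OPEN and is_member


if __name__ == "__main__":
    print(can_approve(PermissionModel.OPEN, is_member=True, is_admin=False))
    print(can_approve(PermissionModel.CONTROLLED, is_member=True, is_admin=False))
```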

There are two connectors ready to be used and one connector planned to connect l10n_community to different environments:

l10n_drupalorg connector module
Synchronizes projects and releases, downloads tarballs from drupal.org and parses the files extracted from them to provide translatable strings. Also provides a nice welcome screen. Intended to be used on drupal.org itself to provide a community interface (with l10n_groups) for central translation of all modules and themes to lots of languages.
l10n_localpacks connector module
Looks into a configurable directory on the local file system for subdirectories (projects) and tarballs (releases). Uses the drupal.org standard tarball naming convention to identify project names and release versions from the file names. Extracts translatable strings from the tarballs. Intended to be used at companies for translating in-house modules and/or doing custom project translations for client needs (where forking official project translations is a requirement).
l10n_onsite connector module (planned!)
This would look into the projects installed locally (just as update_status can identify projects installed), would parse files installed locally and allow usage of the same UI for local site translation. There are a few open questions which need to be answered though.
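As an illustration of how a connector like l10n_localpacks can identify projects and releases from tarball names, here is a minimal Python sketch. It assumes file names of the form project-version.tar.gz, where the project name contains no hyphen (for example views-6.x-2.0-beta1.tar.gz); the regular expression and the parse_tarball_name helper are assumptions for this example, not the module's actual parsing code.

```python
import re
from typing import NamedTuple, Optional


class Release(NamedTuple):
    project: str
    version: str


# Assumed shape: lowercase project name without hyphens, then a version
# starting with a digit, then the .tar.gz extension.
_TARBALL = re.compile(
    r"^(?P<project>[a-z][a-z0-9_]*)-(?P<version>\d[A-Za-z0-9.\-]*)\.tar\.gz$"
)


def parse_tarball_name(filename: str) -> Optional[Release]:
    """Return the project and version encoded in a tarball name, or None."""
    match = _TARBALL.match(filename)
    if match is None:
        return None
    return Release(match.group("project"), match.group("version"))


if __name__ == "__main__":
    for name in ("views-6.x-2.0-beta1.tar.gz", "drupal-6.0.tar.gz", "notes.txt"):
        print(name, "->", parse_tarball_name(name))
```

Files that do not match the pattern are simply ignored in this sketch, which seems the safest default for a directory that may contain unrelated files.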

All in all, the direction is to let the localization server operate in different environments: the drupal.org server, where more than a thousand projects need to be translated into dozens of languages; a company intranet, where some modules need to be translated; and an actual Drupal site, where the maintainer would like to make sure everything is properly translated. Download the comparison chart if you are interested in how the different modules compare.

The basic problem I am stuck at now is that there is a considerable mismatch between what structure Drupal needs for optimized translation lookups and what my module suite components need for an optimized translation interface, and when we'd like to marry the two in l10n_onsite, this shows.

If you are interested in solving this issue and helping bring a complete localization tool set to your site, continue in the drupal.org issue I opened for this problem: Make l10n_server support local Drupal site translations.

Comments

Benjamin Melançon's picture

Hello Gábor, thank you for your amazing work on translation and localization, including bringing together and highlighting others' efforts to improve this critical field.

Doing the World Social Forum site right now has brought up two questions for me:

First, is there a way to tell either the translation interface or a PO export to ignore all strings in /admin pages?

The use case for this is you have Drupal core translated, but then you add contributed modules and some custom strings get thrown in there, which you have to hand off to various teams to translate-- but mixed in are hundreds or thousands of contributed module strings which only site administrators will see, and for most multilingual sites these might as well stay in English.

Second, is there any tool or interface planned for merging translations? For instance, the WSF team adds a ton of translation for Hindi-- it would be great to contribute that back, but we'd have to drop strings we intentionally modified the meaning in localizing, and also drop custom strings and make sure strings from contributed modules went to those projects and not the Drupal core translation.

Or, conversely, the Russian translation gets updated after we've already added some of our custom strings to the previously imported translation. Could we bring it in and not overwrite what we've modified? Even an interface optimized for going down a list of translations, with English original, translation 1 and a radio button, translation 2 and a radio button, would be better than nothing.

Oh, and a third question thrown in-- if you want to use customized English on a multilingual site, there's no way to use it at "en", is there? We set up WSF2008 to use eng as a custom language, and that lets us localize the English to the site.

Thanks again and look forward to your thoughts!

Gábor Hojtsy's picture

No, unfortunately we don't know what ends up on /admin pages. By looking at the module code, even a human cannot tell what will end up on admin pages, let alone a machine. There is no way we can tell whether forms, altered forms, blocks, etc. display only on an admin page, or elsewhere too. Drupal reuses functions, form fragments, etc. extensively. We can only tell what is used in the installer because it needs to be marked explicitly with st() or $t() - which is not a clean solution either.

Yes, the localization server suite handles strings, so if you import a translation, it can save new translations properly. For strings which already have translations, the server saves the imported text as a suggestion, so admins can decide whether the newly imported translation is better or the existing one. (There are also plans to connect l10n_client installations with l10n_server, so the translations you submit there can instantly go back to the community, but that is not the only option :).
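The import behavior described above (new translations are saved, translations for already translated strings become suggestions) can be sketched roughly as follows in Python. The StringRecord structure and the import_translations function are hypothetical names for this illustration, and skipping strings the server does not know about is an assumption of the sketch, not a statement about l10n_server.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StringRecord:
    source: str
    translation: str = ""
    suggestions: List[str] = field(default_factory=list)


def import_translations(
    store: Dict[str, StringRecord], imported: Dict[str, str]
) -> Tuple[int, int]:
    """Merge imported translations; return (saved, suggested) counts."""
    saved = suggested = 0
    for source, text in imported.items():
        record = store.get(source)
        if record is None or not text:
            continue  # assumption: unknown sources and empty texts are skipped
        if not record.translation:
            record.translation = text  # no translation yet: save directly
            saved += 1
        elif text != record.translation and text not in record.suggestions:
            record.suggestions.append(text)  # keep existing, queue for review
            suggested += 1
    return saved, suggested


if __name__ == "__main__":
    store = {
        "Home": StringRecord("Home", translation="Kezdőlap"),
        "Log out": StringRecord("Log out"),
    }
    print(import_translations(store, {"Home": "Címlap", "Log out": "Kilépés"}))
    print(store["Home"].suggestions)
```

Because the existing translation is never overwritten, locally modified strings survive an import and the administrators decide later which version to keep.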

No, there is no way to use 'en' for anything but the built-in English language. You need to come up with some creative name for yourself instead. If you'd like to use the language in browser language negotiation too, then I'd suggest en-US or en-GB or something along those lines, so negotiation works nicely.

Benjamin Melançon's picture

1. The l10n client certainly helps with the goal of translating only "user-visible" strings, but let's keep an eye on the admin separation possibility-- I understand there are attempts to split up modules by admin sections (which don't have to be loaded for ordinary users), and localization could piggy-back on this.

2. The possibility of customizing English without losing access to the "en" space (on single-language sites it doesn't matter) is something I would urge consideration of in future versions-- either that or a system for customizing strings that is separate from translating strings. (I will probably use it some, but I don't think overrides in settings.php is what I'm looking for.)

Thanks again for, well, everything!

moshe weitzman's picture

I submitted a patch which is now in D6 which vastly improves the use case where a client wants to just change a few strings. For example, change 'blog' to 'journal' and change 'profile' to 'space' and so on. You no longer have to enable locale module for this, and suffer all its extra queries and UI overhead.

In D6, you just hard code those replacements into an array at bottom of settings.php. The documentation is right in settings.php. See end of http://cvs.drupal.org/viewvc.py/drupal/drupal/sites/default/default.sett...

Thanks to Gabor for all his hard work on localization. I'll look at the graphic and the issue and see if I can come up with some ideas.

wwwclaes's picture

I'm kind of a newbie at i18n, but I just wanted to mention the handling of dates, numbers, currencies etc. Is that considered in the localization efforts for Drupal?

As a reference, Java seems to have pretty decent i18n handling, see:

http://java.sun.com/docs/books/tutorial/i18n/

Wikipedia mentions what might be considered during i18n:

http://en.wikipedia.org/wiki/Internationalization_and_localization

From what I have seen in PHP, it does not have that great support for i18n. Gettext solves the translation of messages (but requires PHP core compiled with Gettext support for efficient handling). PHP strftime offers some support for dates etc, but is not excellent.

Anyhow, this might be old news for you. Just wanted to inform you in case it might be helpful.

Jose A Reyero's picture

Thanks Gabor, great report and great work!

About the localization client, which is a great tool, we at Development Seed have been doing some improvements to the Drupal 5 version, like:
- A pair of additional pages for all pending translations
- Besides the texts that show up in the page, add the texts that are also stored for the same path, which should fix the issues you mention with strings showing up only occasionally.
They're just waiting for me to find some time to roll out the patch. But in the meanwhile, they're already being tested in a real live site so I hope the result may be still better when I finally post the patch :-)

As for the textgroups feature, it will definitely be used in the next version of i18n suite for Drupal 6, which is still in its early stages. It won't make a difference though if it's just kept as a backend-only feature for core, which can be used and made displayable by contrib modules later.

About the rest of the localization suite it is just amazing how it's starting to look and the wide possibilities it will open for boosting translations quantity and quality in all languages.

About these community translation tools, one feature that would be on my wish list, and I don't know if it's done or is coming or is at least in mind, as I haven't had the chance to try the tools lately, is some 'translation approval workflow', like having the translation tool open to all users, and some administrators/maintainers deciding which ones are eventually approved or not...?

Whatever, thanks for all these amazing tools you're doing.
