l10n

Dear Drupal interface translators!

Your valuable work helps Drupal on its way to actual world domination, so we try to support you in every way possible, so that you can organize your time more efficiently when translating Drupal projects (the Drupal core system itself, as well as contributed modules, themes and install profiles).

There are big changes planned and in development for Drupal project translations. Make sure to read my Drupal Groups post if you are a Drupal translator or you would like to become one, but the current toolset scares you.

Bryan Ruby points out that many open source content management systems have recently started to think about multilanguage support as a core building block. Drupal 6 is one of these systems, and although it does not come with complete internationalization and translation features, it goes a long way compared to Drupal 5. Jose A. Reyero pulled together a nice comparison table of the Drupal 5 and 6 core multilanguage features. As his table shows, right to left (RTL, e.g. Arabic, Hebrew) language support is improved considerably. Now we know whether each post is written in an RTL language, and we know whether the language used to present the page is RTL. All that is left is complete theme coverage, so themes can be RTL-aware. Drupal 6 comes with automatic discovery of RTL CSS files, so a theme can easily support RTL styles.
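To make the automatic discovery idea concrete, here is a minimal sketch in Python (not Drupal's PHP code). It assumes the convention that an RTL override sits next to the base style sheet with a `-rtl` suffix, for example `style.css` and `style-rtl.css`. The function names and the sample paths are hypothetical and only illustrate the lookup.

```python
import os


def rtl_counterpart(css_path):
    """Return the assumed RTL override name for a style sheet path."""
    root, ext = os.path.splitext(css_path)
    return f"{root}-rtl{ext}"


def stylesheets_for(css_paths, language_is_rtl, exists=os.path.exists):
    """List the style sheets to load, adding RTL overrides that exist.

    The override is loaded after its base file, so it only needs to
    contain the rules that differ for right to left display.
    """
    result = []
    for path in css_paths:
        result.append(path)
        if language_is_rtl:
            rtl = rtl_counterpart(path)
            if exists(rtl):
                result.append(rtl)
    return result
```

With this approach a theme without an RTL file simply gets its normal style sheet, which is why missing RTL files in a theme do not break anything but leave the layout left to right.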

The basic core themes, such as Bluemarine, Marvin, and Chameleon, already include RTL style sheets, but the three (actually two) bigger ones, Garland/Minnelli and Pushbutton, lack RTL support. The efforts to bring RTL styles to Pushbutton are blocked on small CSS bugs, so if you care, please look into this issue: http://drupal.org/node/148084 (the Pushbutton RTL issue). There is also a garlandrtl theme released for Drupal 5, which has some custom hacks to recognize an RTL language, and otherwise needs to be cleaned up to get into Drupal 6 (as part of the Garland/Minnelli theme, not as a separate theme).

Help with these issues would be very welcome, so that we can have a complete RTL theme offering in Drupal 6!

I have been to REMIX 07 today, which was basically a rehash of some of the Microsoft MIX conference topics and presentations for a road show stop in Budapest, Hungary. Although I use far less Microsoft technology than my conference attendance would suggest, I mostly enjoy going because it inspires me. I see cool new stuff, which of course escalates into cool new stuff I would like to implement with my own tool set.

One of the simple ideas that occurred to me today was about in-place interface translation editing. In-place editing tools for translations are a very common request in Drupal. Just recently, Boris Mann posted a pointer to SLS, which aims to be a more generic solution for the problem (although I commented there why I don't think it fits Drupal well). We only know inside t() that we are working with a localized string, so that is where we could print a span or div with metadata to allow some jQuery to tap in and allow translation editing. Unfortunately, in t() we don't know whether we are dealing with output for email, SMS, watchdog logs and so on, where the extra div or span would look very unprofessional (in a text medium!). So that rules out the PHP way of adding an in-place translation widget (forget about introducing another wrapper, t() is wrapped deep enough).

This problem is so much on people's minds that Konstantin Käfer included it in his Google Summer of Code 2006 proposal (but unfortunately did not get around to finding a way to deal with it). [Unfortunately, I only found a broken link to that proposal.]

So what can we do? The locale() function (in PHP) stores a cache of all the text used for translation on the page. So the basic idea is to allow that function to return that cache once the page is done. Then a contributed module could request it in the page footer (once all page content and blocks are done), and export it as JSON. A simple block can be provided to give the user some overview of the strings on the page (translated, untranslated), as well as allowing the user to search for a string used on the page whose translation needs fixing (an autocomplete field would suffice here).

So I thought this was all simple and dandy. Let $string be NULL in locale(), and if so, return the cache. This all sounds good, but the cache includes all "short strings" for quick lookup, whether used on the page or not... Either we need some hidden setting to bypass that when using "in-place interface translation", or we need to track what was actually used from that cache. Any good ideas?
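The second option, tracking what was actually used, could look roughly like the sketch below. It is a toy model in Python, not Drupal's locale() code: the class `LocaleCache`, its methods, and the sample strings are all hypothetical. It only shows the idea of calling the lookup without a string to get back the strings used on the page, and of exporting them as JSON split into translated and untranslated.

```python
import json


class LocaleCache:
    """Toy translation cache that records which strings a page looked up."""

    def __init__(self, preloaded):
        # preloaded maps source string -> translation (None = untranslated).
        # It stands in for the "short strings" loaded up front.
        self._cache = dict(preloaded)
        self._used = set()

    def locale(self, string=None):
        # Called without a string: report only what this page looked up,
        # not the whole preloaded cache.
        if string is None:
            return {s: self._cache.get(s) for s in sorted(self._used)}
        self._used.add(string)
        translation = self._cache.get(string)
        # Fall back to the source string when there is no translation.
        return string if translation is None else translation

    def export_json(self):
        # What a page footer hook could hand over to client-side code.
        used = self.locale()
        return json.dumps(
            {
                "translated": {s: t for s, t in used.items() if t is not None},
                "untranslated": [s for s, t in used.items() if t is None],
            },
            ensure_ascii=False,
            sort_keys=True,
        )


cache = LocaleCache({"Home": "Címlap", "Log in": "Belépés", "Search": "Keresés"})
cache.locale("Home")
cache.locale("Recent posts")
report = json.loads(cache.export_json())
```

Here `report` lists "Home" as translated and "Recent posts" as untranslated, while the preloaded but unused "Log in" and "Search" stay out of it. The cost is one extra set insertion per lookup, which is the trade-off against a hidden setting that disables preloading.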

On the heels of my recent announcement that, thanks to Raimund Bauer stepping in, the translation template extractor is now a separate project, I decided to look into where this change needs to be propagated in the Drupal Handbooks. To be honest, I have not really been around in this part of the handbooks before (although I am a lead member of the Drupal Hungarian translation project), and what I found was not actually pleasing. The Translator's guide seemed to be intimidating for newcomers, and basic questions are sprinkled all around the guide's pages. The first paragraph of the start page talked about what is *not* documented there, instead of trying to help people grasp how things work here.

So I decided we need a little rewrite, and I should put my keyboard and mouse where my mouth is (pun intended). While writing up the new introduction page, it turned out that a figure would show a lot more than what can be described in a reasonably short introduction (so people will actually read it). I tried to come up with a figure showing how Drupal core and contrib translations work, how translation templates are generated, where translators should put these files, and in what package these files will end up, while emphasizing that existing work should be reused. This resulted in a seemingly complex figure, but by using some hopefully sensible colors and text, I managed to simplify it as much as I was able to. The most important thing for me was to provide an overview and to communicate the tasks of a translator and their connection to the packaging system.

Drupal translation processes

Doing the actual figure was easy, given a great tool to visualize my thoughts. Gliffy is an incredibly fun tool, and it does a lot! It is a complex Flash application, and so desktop-like that I kept right-clicking (and got the Flash context menu, which did not help in creating the figure). It is a free and easy to use online tool to create figures such as the one shown above. And it is not a closed tool, as far as exporting as SVG, PNG and JPEG goes. Unfortunately it does not export in a diagram format, so you cannot reuse your figures in Dia for example, but that was perfectly acceptable in this case.

The Hungarian Drupal interface translation team used to use a private Subversion repository to store translations. Our reason for that was that we initially had many people contributing, and it seemed to be difficult to apply for CVS accounts for each of them. It also happened that we had developed and used some tools of our own. Now there are not that many contributors, and many of our tools have already been migrated to Drupal.org (and the others can be migrated too), so we are moving to Drupal.org. Most importantly, this will be better for our users, who can find Hungarian translations in the tarballs downloaded from Drupal.org, without browsing through our own translation repository.

The 'problem' with moving to Drupal.org is that it is quite hard to have an overview of what is happening with Hungarian translations. Although there is a Hungarian translation project, that only hosts the Drupal core files. Module and theme translations are under those module and theme projects. Although there is a move to have these module translations as their own projects (which would result in an explosion of projects there), until that is done, we would still need an overview of what is happening around Hungarian translations.

Yahoo! Pipes to the rescue! We need to watch CVS commit messages, but these commit messages are from all kinds of projects, so we need to watch the general CVS commit page. The patterns we need to watch for are translations/$language, /$language.po and .$language.po. Note that the last two are delimited at the start, so that substring matches on longer file names are not possible.
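The matching rule itself is simple enough to sketch outside of Yahoo! Pipes. The Python below is only an illustration of the three patterns for a given language code; the function names and the sample file paths are hypothetical, and this is not how the pipe is implemented.

```python
def translation_patterns(language):
    """Build the three path fragments to look for, for one language code."""
    return [
        f"translations/{language}",  # the core translation project directory
        f"/{language}.po",           # a file named exactly <language>.po
        f".{language}.po",           # a file named <something>.<language>.po
    ]


def is_translation_commit(text, language):
    """True if the commit text contains any of the patterns."""
    return any(pattern in text for pattern in translation_patterns(language))


def filter_commits(items, language):
    """Keep only the feed items that touch translations for the language."""
    return [item for item in items if is_translation_commit(item, language)]


commits = [
    "contributions/modules/views/po/hu.po",
    "contributions/modules/views/po/views.hu.po",
    "contributions/translations/hu/README.txt",
    "contributions/modules/views/po/zhu.po",
    "contributions/modules/views/views.module",
]
hungarian = filter_commits(commits, "hu")
```

In this sample, `hungarian` keeps the first three items. The made-up `zhu.po` is skipped because the leading `/` or `.` in the last two patterns prevents a match in the middle of a longer file name.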

Yahoo! Pipes editing screen

I needed to formulate these into Yahoo! Pipes objects and publish the pipe. The language code needed to be a user specified value, so that any translation team can use this pipe. The search strings needed to be built dynamically as a result of this, and the CVS RSS feed was filtered with these patterns. The result is the Drupal translation commits for a given language code pipe.

Finally, the only problem with this pipe is that it works 'live' on the given feed, and historical information is not kept. To have an overview of what happened in the past, I have added the resulting RSS feed to the aggregator at drupal.hu, which stores historical data of our commits. A possible problem here is that the refresh interval can only be set as short as 15 minutes, which could be too long, given the frequency of commits at drupal.org. If commits fall out of the ten-commit window shown in the RSS feed, we don't see them in Yahoo! Pipes and, as a result, we don't aggregate them at drupal.hu. Thankfully aggregator.module can be form_alter()-ed, so we can set a shorter interval if need be.

For now, however, we monitor how our new pipe works, and encourage other translation teams to get an overview of their work this way (unless they know of something better, in which case we would be interested too).

In a recent blog entry titled The future of Drupal interface localization lies in install profiles, I showed you a proof of concept for a new Drupal interface translation packaging format. As the Drupal 5 release is closing in on us, and we were able to fix quite a few small glitches around interface translation, I decided to clean up the packaging scripts and release them to the public, so other translation groups can try this distribution format and we might eventually get this up at drupal.org as the default.

I have uploaded the packager shell and PHP scripts to the tricks contributions area. I hope I have provided adequate comments in there to let you know what directory structure the script expects in order to work right. If everything goes well, it should generate packages like our hu-5.0.tar.gz, which is our download for Hungarians interested in more advanced interface localization. We don't even have the previously used package format for download anymore.

First Drupal user registration in Hungarian

Drupal 5 comes out with a nifty new feature (among a lot of others): it only creates database tables and imports CSS files for modules that are turned on. It is a logical step to do the same with interface translation files. The practice up to Drupal 4.7 was to generate smaller translation template files for translators, so they could better work with strings and collaborate with version tracking tools. These smaller files were merged into one big translation file, which was given to end users to import if they needed the Drupal package to work in their language. What should be the new model, and how do we support it? Do I have a working (starter) solution? Yes. Read on!