localization

I intended to announce this development at Drupalcon San Francisco, but unfortunately the session on it was merged into a more general i18n session, and on top of that the ash cloud above Europe meant I could not go. Evidently, collaborative software translation is not a mainstream topic. On the other hand, about every two weeks I receive questions about the general applicability of the tools Drupal.org uses, and this interest always amazes me. While the localization server tool used on localize.drupal.org grew out of the needs of the Drupal community, the solutions were architected to be useful in a general purpose software translation environment. The architecture was there, but it lacked the UI controls needed to simply run it as a generic software translation tool.

Existing non-Drupal users like the Gallery 2/3 project and the Musescore desktop app use custom data connector modules, which the localization server nicely supports, allowing custom code to gather data for translation. Gallery even uses a custom Localization client port for clients to submit translations to the server, even though their software is not Drupal based. However, translating arbitrary software without writing your own custom connector code was not possible until now.

I remember how skeptical I was a decade or so ago, looking at presenters traveling around to multiple conferences with "the same" presentation. Having been a course instructor for years and a presenter for even longer, I see it completely differently now. It's not only that the topics you cover under the same-looking umbrella can be quite different; you also find much better ways to express whatever you want to tell your audience as you gather feedback.

Of course it would be best to present your story crystal clear from the start, but even if you are an enthusiastic follower of Garr Reynolds and Nancy Duarte, you'll undoubtedly need lots of time to take a relaxed look at your story and distill it to the level needed for a great presentation. I've actually found it quite hard to refine my slides without presenting them to an audience. The faces, questions, smiles and sometimes plain staring expressions you get tell you how you did, and from those you can work out how to improve.

Two interesting examples are my slides on Drupal 7 and localize.drupal.org.

Finally, the promise of a centralized localization interface for Drupal modules and themes looks to be coming true. I started work on this project around two years ago under Google Summer of Code sponsorship and have continued maintaining and improving it ever since. While I was spreading the word about it, not many people signed up to help clean up some possible performance problems, so it has not made it onto Drupal.org yet.

However, earlier this year I got reviews from some key people in the infrastructure team, especially Gerhard Killesreiter, who persuaded me that setting this up is more important than it being perfect. Software is always evolving anyway, and we should improve it as we see the problems. So I've started to set up localize.drupal.org. While we work out some of the kinks like single sign-on with drupal.org (one of the promises of the drupal.org redesign which will be delivered here), I thought it would be a good idea to discuss the implications.

Moshe Weitzman recently started the D7CX movement to rally people to get Drupal 7 contributed modules upgraded and released in time for the Drupal 7 release. This would be a great boon to the Drupal 7 release, which is already shaping up to be a huge improvement over Drupal 6. He also suggests a contributed module release manager who can help with (among other things) ensuring that tools are available to help people upgrade.

As the maintainer of modules important for the translation of Drupal interfaces, such as l10n_client, potx and l10n_server, I looked at this movement from my own angle. While my modules are not among the highly popular top 40 modules Moshe highlights, these tools are used to translate them and help them reach many parts of the globe. So, partly on Acquia sponsored time and partly in my free time, I went ahead and ported both the Localization client and the Translation template extractor to Drupal 7. Both had their own challenges. The client port is just a direct functionality port, so it does not include real string context support yet (a feature new to Drupal 7). The template extractor, however, got full context support to match what Drupal 7 currently supports. It also got its coder_review integration updated. Both modules now have their 7.x-1.x development snapshots available for download.
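To illustrate what string context is for: the same English source string can need different translations depending on where it is used, so a context-aware store keys translations by context plus source instead of by source alone. Here is a minimal Python sketch of that idea. The TranslationStore class, the context names and the German sample strings are all made up for illustration; this is not how Drupal 7, the extractor or the localization server actually store strings.

```python
from typing import Dict, Tuple


class TranslationStore:
    """Toy lookup table keyed by (context, source string). Hypothetical."""

    def __init__(self) -> None:
        self._strings: Dict[Tuple[str, str], str] = {}

    def add(self, source: str, translation: str, context: str = "") -> None:
        """Store a translation for a source string in a given context."""
        self._strings[(context, source)] = translation

    def translate(self, source: str, context: str = "") -> str:
        """Return the stored translation, or the source if none exists."""
        return self._strings.get((context, source), source)


# Sample data: one source string, two meanings, two translations.
store = TranslationStore()
store.add("Order", "Bestellung", context="commerce")
store.add("Order", "Reihenfolge", context="sorting")

print(store.translate("Order", context="commerce"))
print(store.translate("Order", context="sorting"))
print(store.translate("Order"))  # no context stored: falls back to the source
```

Without the context part of the key, the second add() call would simply overwrite the first, which is the limitation a context-unaware tool runs into.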

Thanks to these updates, I hope people working on the D7CX movement will not be in the dark about localization API changes and usage limitations of the new API. You can run the template extractor on your code or run the code review from Coder review to get error reports wherever the localization API is not used properly. While Coder review includes some rules of its own to test for some common errors, only using the actual tool translators use can tell you all the errors you can make while writing your code.

Future plans include backporting the Drupal 7 parsing support from the template extractor to Drupal 6 (which would be easy apart from a small API change it requires), so that contexts are properly stored when it is integrated with the localization server. That would require an update to the localization server too. The goal is to have a Localization server system which can be used to translate Drupal 7 core and modules, so translators can work on their part before the release. The localization client would also pass the context information on to the server, so people can keep using it to translate and share their work right away.

People keen on magic names can just refer to this effort as D7TX, the translator experience ;) Have fun using these tools, and as always report bugs and provide patches please!

I had the luck again to join Jose A. Reyero and present on the multilanguage features of Drupal and its contributions at Drupalcon DC last week. I gave a presentation on the topic at Drupalcon Szeged last August, and Jose and I had a session on the topic at Drupalcon Boston last March. So looking back, it almost felt like we were going to repeat ourselves.

What made this time special however is that we have a huge amount of experience gathered from users of the modules and Drupal core itself, and we see our strengths, real use cases and problems better. Previous sessions covered the concepts, but this time we had the fantastic Roger Lopez join us as third panelist, who talked about the Drupal 6 based multilanguage platform used to host Britney Spears', Pink's and other Sony stars' sites already. There was a nice Drupal.org front page post on their solutions and contributions while we were in DC. It is well worth a read!

UN Flags photo by clearrants on Flickr

So the panel ended up with me talking about Drupal core and some broader issues, Jose talking about Drupal 7 plans and i18n features and Roger talking about solving some of the tasks they have faced on Drupal 6 with core, the well known contributed modules and some custom development. He also called for involvement in some of the unresolved issues in his presentation. Finally we took some great questions and wrapped up.

I believe this was our best Drupal multilanguage talk so far, but unfortunately it was not videotaped. So all we can share with you are our slides in presentation order. Enjoy!

Just over a week ago, I was in New Orleans to talk about multilingual Drupal website building at the Do It With Drupal event organized by Lullabot. I was happy to join fellow Acquians for a short time at the office and then at the Do It With Drupal seminar to represent the company. It was a fun experience to meet so many other people looking into using Drupal for the first time or even selling Drupal services already. It was a good mix, and a very different target audience compared to Drupalcons. This event was more focused on the path seekers and the beginners, with highly detailed and cross-discipline talks over four days. I enjoyed several sessions, including Robert's session on Solr.

Unfortunately (for my enjoyment of the conference), my session was in one of the last slots, but it had a good turnout nonetheless. I had prepared well in advance with a completely rethought storyline (compared to previous, more developer focused events) and a slideshow built from the ground up. So despite having talked about this topic elsewhere before, I needed to take a totally fresh look at it and present all the latest developments to date.

Since I do not have the permissions to upload my session to the website of the event, and the slides I sent in by email have not been uploaded yet, I figured I'd better share them here with those eager to look into them soon. Happy holiday reading if you are about to take time to learn more about multilingual Drupal solutions!

This story started almost a year ago, when I published my cheat sheet for the Drupal 6 localization API. Although Drupal 6 was not ready at that time, the localization API was so stable that the cheat sheet is useful without modification even today.

My intention with the cheat sheet was to start a localization API guide on Drupal.org and get the intricate details of this API documented for the general good. Over the past few weeks, I've managed to find time to actually sit down and document best practices and tips for these functions, and published the Localization API guide as part of the Drupal.org developer handbook (it is worth checking out the printer friendly version for a quick glance). While some parts of the guide are still under discussion and finalization (and I still plan two more pages: one on emails and the other on pointers for people looking to translate user provided data), the guide is pretty much complete as far as localizing the interface goes.

Another side of that old blog post of mine was the new Translation template extractor support for the coder module. Well, that was basically piping the existing errors into coder and making you figure out the rest. The existing error messages in the extractor were, however, quite cryptic, like Invalid marker: t($joe). This is not really helpful in finding out what the issue at hand is when you are not familiar with the finer details. It was unhelpful for both module authors and code reviewers, who were eager to fix these problems. I got several support requests in the extractor issue queue to clarify the guidelines. So updating the error messages was clearly in order.
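To make the t($joe) example concrete: a static extractor can only collect strings that appear literally in the source, so a call whose first argument is a variable gives it nothing to extract. The Python sketch below flags such calls with a simple regular expression. It is only an approximation written for this post (the function name and the sample code are invented), not the way the extractor itself parses PHP.

```python
import re
from typing import List, Tuple

# Finds "t(" that is not part of a longer name, a method call or a variable,
# and captures the first non-space character of its first argument.
T_CALL = re.compile(r"(?<![\w>$])t\(\s*(.)")


def find_invalid_markers(php_source: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs where t() is not given a literal."""
    problems = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        for match in T_CALL.finditer(line):
            # A translatable string must start with a quote character.
            if match.group(1) not in ("'", '"'):
                problems.append((lineno, line.strip()))
    return problems


# Invented sample module code.
sample = """<?php
$a = t('Hello world');
$b = t($joe);
$c = format_date($t);
"""

for lineno, line in find_invalid_markers(sample):
    print("line %d: %s" % (lineno, line))
```

A check like this only catches the most obvious case; quoted strings with embedded variables or concatenation would need real tokenizing to detect.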

The result of these two efforts is that the latest development version of the extractor on the 6.x-2.x branch (update from CVS or wait for today's tarball to materialize) now supports nicely understandable error messages for coder module (and way better error messages for its standalone mode just as well) with links to the actual documentation explaining the underlying causes and details. This will hopefully end up in a new release very soon.

So do you have any excuses left to not write nicely translatable Drupal module interfaces?

I am way behind in blogging about DrupalCon Boston 2008, which was truly a blast. It was the biggest and best organized Drupal conference so far, and was put together in record time. I was happy to come to Boston early and stay a bit longer, touring the city with people who had their flights cancelled and others who simply live in Boston.

The conference provided lots of opportunities to be productive on-site in the BoFs and at the code sprint which followed the conference. Honestly, I intended to work on some of my core modifications for filters which (unfortunately) are still not in patch form, but without a network connection for a considerable time, I needed to look into what I had on my computer, and figured I should work on the top priority contrib issue in my projects, as identified at the BoFs. Read on to find out more.

It looks like the list of sessions for DrupalCon Boston is finalized, so I am happy to announce that we are going to have a session titled Multilanguage Drupal: a status report and a discussion, which is going to cover the current state of Drupal 6 and a short overview of contributed modules, and should end up in a vibrant discussion on where Drupal 7 is headed as far as language support goes. There is huge interest in multilingual support, with around 20 modules hosted on drupal.org already. Come and discuss where Drupal is heading; Drupal 7 needs more hands to advance in this area.

While most of what Drupal core lacks is user entered content translation and localization, and the above session will focus on this, I also added a BoF suggestion which deals with (built-in) interface localization exclusively. Localization tools for Drupal teams and users is expected to focus on tools like l10n_client and l10n_server and related technologies.

In my working hours, I am busy with better support for WYSIWYG editors in Drupal 7 these days, so I am co-hosting a working group BoF with Doug Green titled WYSIWYG Working Group for 7.x core which should be a discussion of proposals on fixing current WYSIWYG integration problems and weaknesses.

Last but not least, Kristof Van Tomme is proposing Szeged, Hungary for DrupalCon Europe 2008, and he intends to hold a discussion BoF on this. The Drupal Association also intends to have a discussion meeting (not open to the public) on the next DrupalCon, so whether this BoF happens is still to be seen. In any case, I am one of the firm supporters of a DrupalCon in Szeged, and I am confident Kristof would be able to lead the effort effectively and deliver a quality event. The easily digestible version of the proposal is up at Proposing Szeged, Hungary for Drupalcon Europe 2008 (look for the attached PDF).

And, well, honestly this is all just peanuts compared to everything DrupalCon Boston has to offer. So if you are still wondering whether to go or not, make sure you reserve your place! It's a must.

Several people asked me to post about the status of the localization server, so here it goes. This project was started originally by Bruno Massa, then picked up by me as part of Google Summer of Code 2007, aiming to replace the Gettext and CVS based workflow for translators with a fully web based translation interface. One of the cool things about working full time on Drupal at Acquia is that I have capacity for spare time developments like this one. That's great.

The ultimate goal of the localization server is to have a central translation server running on Drupal.org which knows about relevant releases of all drupal.org projects, can share translations between projects and provide a collaboration interface for translators. This server should generate tarballs of translations for download and make information available on drupal.org project pages about translation status (among several smaller integration goals).

During the development of the Localization Server project, I decided that it is important to use icons instead of boring text links, especially since we need to communicate lots of different things and provide action buttons for multiple options in a small space.

We do not (yet?) have a graphics artist to help out here, so it turned out that whatever icon set we choose, there will be some problem with the icons' size, the exact set of icons available, their color, and so on. So it occurred to me that we already have a huge set of symbols in the Unicode character set which Drupal is using, so why not use those as icons?

GMail's labels, Mint's Peppermill site and others already use a trick to wrap a few tags with specific margins to get a rounded cornered button feel, and putting a Unicode symbol in as text makes for a useful button. It is definitely not as perfect as specially tailored icons, but it allows for a few neat things. Let's see...

The current Drupal process of translating with Gettext PO files, trying to get them into CVS before a release file is generated and then jumping through hoops to update them properly is far from ideal. There are lots of drawbacks, so this summer I started working on a web interface, sponsored by the Google Summer of Code program, to improve this situation. Unfortunately the server is not yet ready for prime time (on drupal.org), but there are a number of beta testing servers where some translation teams already try to leverage the cool things this tool offers, so I have lots of feedback in the issue queue.

In the last two weeks, I spent a sizable amount of my free time on improving the navigation user interface and adding team features to the localization server, which resulted in a huge changeset, and consequently a 5.x-1.0-alpha2 release of the module, which is now available for download.

I put a lot of thought into designing an interface which is easier on both newcomers and experienced translators, but honestly I focused more on giving experienced translators the easiest possible access to their work, implementing "quick jump forms", direct links to the translation filter pages, and so on. Note that I am not a professional interface designer; I make plans up as I go along, based on user feedback and my own focus areas.

While there is still a lot of room for improvement, I believe this user interface update makes using the application easier. I tried to concentrate on emphasizing the application aspects, but honestly this is not easy when you don't have control over the theme your application is displayed with. I played with adding a web application theme into the mix and requiring it for Localization Server from now on, but then decided that this can be done later if desired. For now the navigation changes can live well with themes designed for web sites rather than web applications. I can see, however, that in the not so distant future I might need to tie the interface to a theme, because that allows proper focus on a usable application interface.

Check out some screenshots of how the current interface looks on my Flickr account. Next up is fixing some remaining bugs, as well as new bugs introduced with this navigation interface update and finally improving on the translation interface itself.

In my few free hours over the last few days, I was brainstorming on new features for the translation template extractor (this little module which extracts translatable strings from Drupal modules) to make the lives of both translators and Drupal coders easier. Today I am proud to announce that I released the old stable code as Potx-5.x-1.0 and Potx-6.x-1.0 (which signifies that the development code has been quite stable for some time now) and moved on to implementing new features for the 2.0 versions of the modules. From today, the 6.x-2.0-dev branch contains the two new features I developed over the last few days:

  • The module now extracts translation templates for themes too, not only modules. This was an obvious feature request, but the original implementation was quite shortsighted, so the relevant part needed a full code rethink to support themes. This is good for translators.
  • The bigger news for module and theme developers is that potx now comes with (experimental) coder module integration. For those who have not heard about coder module, this little piece of software helps you to upgrade modules and ensure they conform to coding guidelines. It even helps you avoid some common security problems. But until now, it did not help you review your translatability errors. In fact, I got bug reports on the translation template extractor that if a module passed coder's review, it should not have any localization errors. Well, when used together with potx-6.x-2-dev, coder module now offers a new code review option. You can check translatability errors of your modules right there!

How can we make this even better? Well, there are still some TODO items for potx module, which will be implemented later (and I am sure people would like to see a 5.x-2.0-dev backport of the new features), but obviously people will not be better off being told they make mistakes if we don't tell them what to do instead. So I sat down and carefully crafted the Drupal 6 translation cheat sheet for your consumption. This fine piece contains the PHP and JavaScript interface translation API functions as well as the functions used in the installer (such as in .install files and install profiles). I also collected the three most common errors and provided two tools to help you ensure you do the best you can. This cheat sheet also includes an explanation of the different placeholder syntaxes used in t()-ed strings, which even I have still not been able to get used to.
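As a quick refresher on those placeholder syntaxes: in Drupal 6 the first character of a placeholder name decides how its value is treated. A value for an @ placeholder is escaped as plain text, a value for a % placeholder is escaped and emphasized, and a value for a ! placeholder is inserted as is. The Python sketch below imitates that behavior for illustration only. The function name is made up, Python's html.escape stands in for Drupal's escaping, and a plain em tag stands in for the themed placeholder, so the exact output may differ from Drupal's in details.

```python
import re
from html import escape
from typing import Mapping


def format_placeholders(text: str, args: Mapping[str, str]) -> str:
    """Substitute placeholders, choosing the escaping by the key's prefix."""
    replacements = {}
    for key, value in args.items():
        if key.startswith("@"):
            # Escaped as plain text.
            replacements[key] = escape(value)
        elif key.startswith("%"):
            # Escaped and emphasized.
            replacements[key] = "<em>" + escape(value) + "</em>"
        elif key.startswith("!"):
            # Inserted as is: only safe for values you fully control.
            replacements[key] = value
        else:
            raise ValueError("Unknown placeholder prefix in %r" % key)
    if not replacements:
        return text
    # Longest keys first so "@count" cannot match inside "@counter";
    # a single regex pass so substituted values are never substituted again.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: replacements[match.group(0)], text)


print(format_placeholders("Hello @name", {"@name": "<b>Bob</b>"}))
print(format_placeholders("Deleted %title", {"%title": "A & B"}))
print(format_placeholders("See !link", {"!link": '<a href="/x">x</a>'}))
```

The third call shows why the ! form deserves care: whatever markup is passed in ends up in the page untouched.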

I hope you will find the new features and the cheat sheet useful, and take some extra time to ensure your modules are properly coded for interface translation, when you upgrade them for Drupal 6. Remember, we are going to have a "multilingual release" with all the new language features, so it becomes increasingly important that contributed modules use the interface translation API properly.

Update: Replaced the file with the 1.1 version, as I noticed that the !html placeholder needs a security warning to ensure people are aware that usage of this placeholder is not advised.

Happy hacking!

Our localization tools and approach can help us a lot in making the Drupal interface better, but we have not made use of these great features so far. I hope to involve you in making Drupal 6 even better with two simple ideas, which only require very simple tools, so anyone can contribute.

A few days ago, a simple idea came to me: export all the Drupal interface texts as one big text file and get people to spell check and correct things in it, in the hope that we can turn the fixes into patches. While there are tools for spell-checking Gettext files (the format we use for translations), spell checking a simple text file can be easily done in most office suites or with simple command line programs, and the fixes are easy to integrate into Drupal. Thankfully a few guys jumped on the idea, and the first batch of trivial typo fixes is now in Drupal 6, leaving space for debates on the spelling and wording of some remaining parts.

The logical next step is to improve the wording of Drupal interface elements. Sometimes a shorter explanation would do better, because it would actually be more welcoming for readers, while in other places the existing explanations leave something to be desired. We can spot these problem areas much more easily when browsing around, so if we had a tool to "touch up" interface text while browsing, it would help our work. It turns out that what helps people localize sites is also a time saver here. See how Localization client is useful for improving the English interface itself:

  • Install Drupal 6 normally, in English.
  • Enable locale module, add your custom English language. Go to Administer - Site configuration - Languages - Add language - Custom language. Add "en-my" with "My English" as native and English name. Provide "en-my" as path prefix. Back on the language listing page, choose this language as your default.
  • Download Localization client for Drupal 6 and enable as usual.
  • Now for every user with proper permission (including user 1), a tool will appear at the bottom of the page. This allows you to modify any text displayed on the page, effectively fixing the interface for yourself.

Of course the fun does not stop at making all these fixes for yourself. To save you time and effort when you update your site later and some strings have been modified to serve users better, it is best to share your results with the community. You can easily export "My English" on the Locale module export page: Administer - Site building - Translate interface - Export. From here, you get a Gettext PO file with all the empty translations and the touch-ups you provided.

Now all that is left before submitting this to the Drupal issue queue is to narrow this file down to the actual fixes. The command line msgattrib helper enables you to do just that: msgattrib "en-my.po" --translated > "Drupal-fixes.po". Unfortunately this part is not as easy as clicking around (for people without gettext tools on their machines), but let me assure you that if this is the only part stopping you from submitting considerable fixes for Drupal 6, you can let me know and I'll help you out or get someone to help you.
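For those without gettext tools at hand, the filtering step can be roughly approximated with a short script. The Python sketch below keeps only the entries of a PO file that have a non-empty msgstr. It is a simplified stand-in written for this post, with invented function names, and makes no attempt to handle every PO feature the real msgattrib supports (fuzzy flags and obsolete entries are not treated specially, for example).

```python
import re

# Captures the text between the first and last double quote on a line.
QUOTED = re.compile(r'"(.*)"\s*$')


def is_translated(entry: str) -> bool:
    """True if any msgstr line (or its continuation lines) has content."""
    in_msgstr = False
    for raw in entry.splitlines():
        line = raw.strip()
        if line.startswith("msgstr"):
            in_msgstr = True
        elif not line.startswith('"'):
            # msgid, msgctxt or a comment: no longer inside a msgstr.
            in_msgstr = False
        if in_msgstr:
            match = QUOTED.search(line)
            if match and match.group(1):
                return True
    return False


def keep_translated(po_text: str) -> str:
    """Drop entries whose translation is empty; entries are blank-line separated."""
    entries = [e for e in re.split(r"\n\s*\n", po_text.strip()) if e.strip()]
    return "\n\n".join(e for e in entries if is_translated(e)) + "\n"


# Invented sample export: one touched-up string, one left empty.
sample = '''msgid "Log in"
msgstr "Sign in"

msgid "Log out"
msgstr ""
'''

print(keep_translated(sample))
```

To use it on a real export, read en-my.po into a string, pass it to keep_translated() and write the result to Drupal-fixes.po.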

Of course, because all of the above is pretty much automatable and could even live on a central server, an enterprising person could easily go and set up a temporary site for interface touch-up collaboration. At the end, the result of the group effort can be submitted. Although these suggestions will be made into patches at some point, this is again a good way to help out without knowing anything about writing Drupal patches.

As a maintainer of Drupal's locale module I try to find creative ways to help people localize their sites. Our focus in Drupal 6 was on more features for content translation and interface translation imports, while the built-in locale interface was nearly untouched. We even complicated it a bit with the textgroups feature which might or might not get used by contributed modules at the end.

In a previous post, I announced the new localization client module which strives to solve some of the problems with the built-in locale module translation interface by bringing an AJAX powered widget close to the site translator. While this module is a very good looking way to solve the translation problem, it has two weaknesses:

  • You can only translate what you see on the site pages you browse. Some text is only shown in exceptional situations: when form values are not filled in properly, when some backend data is not accessible, etc. Some text is even restricted to different user groups. So you can only translate the most visible parts of your site.
  • A closely connected but slightly different issue is that you cannot translate strings with plural versions at once. If your page shows 3 years ago, you can translate @count years ago but not 1 year ago (the singular form) or @count[2] years ago and friends, which are used when the language in use has more than two plural forms. The Drupal database gives no clue for relating these to each other, so we cannot help users who intend to translate all of them at once.
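To show why the plural variants end up as unrelated strings, here is a small Python sketch of how a count could be mapped to the source string that gets looked up. The three-form plural rule is only an illustrative example of the kind of formula languages with several plural forms use, and the function names are invented. It follows my understanding of the naming scheme described above (singular, @count and @count[2]) rather than reproducing Drupal's code.

```python
def plural_index(n: int) -> int:
    """Illustrative three-form plural rule (an assumption for this example)."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


def source_string_for(count: int, singular: str, plural: str) -> str:
    """Pick which source string would be looked up for this count."""
    index = plural_index(count)
    if index == 0:
        return singular
    if index == 1:
        return plural
    # Further plural forms are separate source strings with an indexed marker.
    return plural.replace("@count", "@count[%d]" % index)


for count in (1, 3, 5):
    print(count, "->", source_string_for(count, "1 year ago", "@count years ago"))
```

Each count leads to a different lookup string, and a page only ever displays one of them, so a tool that works from what is shown on the page never sees the other variants.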

Although locale module provides a more complete solution, allowing you to have a translation percentage overview as well as filter untranslated strings and work on them, you are still restricted to the same old, hard to use interface. If you'd like to improve on the interface issue, you can switch to use potx module to extract Gettext translation templates from your modules, then use some desktop Gettext editor which suits your taste and then import the translation back to your site. For most people though, the "favorite Gettext PO editor" question is like asking about the best time to go to the dentist. If we can do better, then why not?