User provided vs. code provided translatables and translation sets

In my previous post, titled Drupal's multilingual problem - why t() is the wrong answer, posted on my blog and on groups.drupal.org for feedback, I detailed the issues with using t() as a translation tool for "user provided data". This post goes into further detail and discusses current solutions, which could form the basis for a discussion of future solutions.

How can we even tell the difference between code and user provided translatables?

It is fair to assume that many multilingual sites will not have English as their default language (many will not even have it among their supported languages), so we cannot assume that blocks, menus, and so on are entered in English. However, source code based strings are considered part of the user interface, and as such are assumed to be written in English. What does this have to do with default configurations set up by modules, and how do we reconcile it with the growing popularity of exportables and features (as in versioned export packages generated by the Features module)? Let's look at these two questions.

Preset configuration from distributions and module installs

When you set up a localized site with .po files under your Drupal source tree per the Drupal instructions, you'll get your default "user provided" (preset) configuration localized. Most install files use t() or st() before they insert their data into the Drupal tables. Therefore default content types, admin shortcuts, etc. are saved in the language you install Drupal in (apart from some current bugs). This is nicely in line with how Drupal assumes all your user provided data is in the site default language, and the assumption is that you'll keep building out your site in that language going forward. It becomes an issue, however, if you enable a module while using an admin language that is different from the site default, and the module adds default configuration using translation functions. That default configuration will be saved in the active language and go against our assumption. To fix this, we could always pass the site default language to the translation functions in install routines. Granted, it is not always easy to tell what counts as an install routine, and API functions are used in many situations, not all of which are language aware. I think these can and should be hunted down on an individual basis.
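To make the proposed fix concrete, here is a minimal sketch in Python rather than Drupal code. The catalog, the translate() helper and install_default_content_type() are all hypothetical names made up for illustration; the only point is that the install routine receives the site default language explicitly instead of relying on whatever language the administrator happens to be using.

```python
# Hypothetical translation catalog: English source string -> translations.
CATALOG = {
    "Article": {"hu": "Cikk", "de": "Artikel"},
}


def translate(source, langcode):
    """Return the translation of an English source string.

    Falls back to the English source when no translation exists.
    """
    return CATALOG.get(source, {}).get(langcode, source)


def install_default_content_type(site_default_langcode):
    """Create a preset content type localized to the site default language.

    The language is passed in explicitly, so the active (admin) language
    never leaks into the stored configuration.
    """
    return {
        "machine_name": "article",
        "label": translate("Article", site_default_langcode),
        "langcode": site_default_langcode,
    }


# The administrator browses in German, but the site default is Hungarian.
active_langcode = "de"
site_default_langcode = "hu"

content_type = install_default_content_type(site_default_langcode)
print(content_type["label"], content_type["langcode"])
```

Storing the language code next to the label is an extra assumption in this sketch; it anticipates the idea that each piece of configuration should know which language it was entered in.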

I also think that every piece of configuration, like a menu item, your site's name or a contact form category should know the language it was entered in. This definitely needs a lot of work and even Drupal 7 can be augmented in some (limited) ways to make this a property of each configuration component universally.

Exported configuration

Let's consider a more interesting question: exported views. (I pick Views because you are probably familiar with the situation, not because Views is anything special compared to other exportables.) When you run an exported view from code, it sounds like the view should have t() calls so that the labels, empty text, headers, etc. are translated when displayed in different languages. This sounds like a desirable format for exported features, so they can support multilingual use. It is all code after all, so t() is best, right? In these cases, we would then mandate that the exported configuration is in English. A far stretch? Maybe. Well, if it is not English, the export should definitely not include t() calls at all. Once each configuration component (such as a view) knows which language it is in, as discussed above, we can tell which way to export it.
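As a rough illustration of that export decision, the following Python sketch renders one label of an exported object as a line of PHP-like export code. The function export_label() is hypothetical and not how Views or any other module generates exports; it only shows that knowing the object's language is enough to decide whether a t() wrapper belongs in the export.

```python
def export_label(label, langcode):
    """Render a label as a PHP-like string literal for an export file.

    English labels are wrapped in t() so they can be translated as user
    interface text; labels in any other language are emitted as plain
    literals, because t() assumes an English source string.
    """
    escaped = label.replace("\\", "\\\\").replace("'", "\\'")
    literal = "'" + escaped + "'"
    if langcode == "en":
        return "t(" + literal + ")"
    return literal


print(export_label("Recent content", "en"))
print(export_label("Friss tartalom", "hu"))
```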

When the user overrides the view however, it should always be imported in the database in the site default language in Drupal 7 at least, and its runtime value not run through t() anymore, because now it became user edited/editable, and all the usual permissions and workflows for editable configuration should apply. Suddenly we have a specific language requirement, and we should store with the view the language it was saved with, and make it translatable as configuration, not as user interface.

Now this is a pretty big difference. For your code based exportable, we assumed it would ideally use t() and be written in English and therefore translatable on the user interface translation screens. However, as soon as you override it, you'd go to the views UI to translate it there as configuration. How should we avoid this mess?

Well, there are a couple of options. We can let source code text be written in a language other than English, but then the translation call should specify which source language it was written in. That still does not solve the problem of changing workflows, nor the permission, editing and other issues I covered in my previous post.

So to be able to handle this consistently, a general rule we can deduce is that any user editable data should be treated as if it was user provided, even if it is sourced from code. Then, to provide translations (which can of course come from code as well), we can use the same user interfaces we'd use if the data had been entered on the UI to start with.

Application in practice

Unfortunately there are a few pieces missing compared to the ideal scenario.

  • We need to be able to tell the language of each configuration object. This is implemented for some use cases in the Internationalization module suite, but I think it needs a rethink in how it is applied.
  • When objects are exported, the language of the object should be exported with it.
  • When objects are used directly from their exports, instead of using t() for translation, they should use configuration translation APIs (currently only available from the Internationalization module) to translate their pieces. As far as I know, this is not implemented at all, because the exports would then depend on the APIs of other contributed modules. I think exports should eventually be possible both with and without these APIs, for single language use and for code-based multi-language use.
  • When an object is imported to the database, a certain language version should be imported. Once we know the language of each object, there is no strict requirement to save them in the site default language, but that sounds like the ideal approach for the site builder's sanity. The default editing UI for the object would show that language, so it makes most sense to use the site default language.

For the exportables, this only requires that they are uniquely identifiable for the configuration translation APIs, which needs machine names for them, instead of incremental IDs, which is already a requirement for sane exportable implementations anyway.
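The pieces listed above can be sketched together in a few lines of Python. This is a toy model under stated assumptions, not the Internationalization module's API: ConfigObject and ConfigTranslationStore are hypothetical names, and the fallback to the original text when no translation exists is my own choice. What it does show is why machine names matter: the translation key stays the same whether the object lives in code or in the database.

```python
from dataclasses import dataclass


@dataclass
class ConfigObject:
    """A configuration object that knows the language it was entered in."""
    object_type: str
    machine_name: str
    langcode: str
    properties: dict


class ConfigTranslationStore:
    """Translations keyed by (type, machine name, property, language)."""

    def __init__(self):
        self._translations = {}

    def set_translation(self, obj, prop, langcode, text):
        key = (obj.object_type, obj.machine_name, prop, langcode)
        self._translations[key] = text

    def get(self, obj, prop, langcode):
        """Return a property in the requested language.

        The original text is returned when the requested language is the
        object's own language or when no translation has been stored.
        """
        if langcode == obj.langcode:
            return obj.properties[prop]
        key = (obj.object_type, obj.machine_name, prop, langcode)
        return self._translations.get(key, obj.properties[prop])


# A view entered in Hungarian, identified by machine name, not a serial ID.
view = ConfigObject("view", "recent_content", "hu", {"title": "Friss tartalom"})

store = ConfigTranslationStore()
store.set_translation(view, "title", "en", "Recent content")

print(store.get(view, "title", "hu"))
print(store.get(view, "title", "en"))
print(store.get(view, "title", "de"))
```

Because the key never includes an incremental ID, the same stored translations keep applying after the object is overridden and imported into the database.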

Expanding beyond simple object property translation

Now most contributed modules only focus on simple object translation if they care about translation at all. However, there are three scenarios for foreign language sites which are generally considered when building the pieces, as I've enumerated before in multiple posts, most recently in my post on blocks and textgroups:

  1. Being able to mark an object as in one language. With node translation this was achieved by language enabling nodes.
  2. Being able to mark an object as in one language and relate it to others as being a translation set. For nodes, this is supported by Drupal core's content translation module.
  3. Finally, being able to translate pieces of the object that need translation and leave the rest alone. Load the right language variant of the object dynamically as needed. In the case of nodes, this is achieved with the contributed entity_translation module (formerly translation.module).

Now, the previous post on the dangers of t() and this post have only considered the first and the third scenario so far. We discussed that each object should have its language associated as a property, went over some difficulties in handling code based configuration vs. UI based configuration, and concluded that we should treat them the same.

The second scenario, applied to generic Drupal modules, is about limiting certain configuration objects to certain languages and organizing them into sets. Think about having a menu tree in one language and another tree in another language; when you switch languages, the menu trees should switch too. You need to have sets for your primary menus, secondary menus, etc. This applies to a diverse set of objects, but clearly not to all types. Having one content type for one language and a different content type for other languages sounds a bit far-fetched.
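A translation set for menus could be modeled roughly as follows. This is a Python sketch with made-up menu names and a hypothetical menu_for() helper, not an existing Drupal or Internationalization API. Returning nothing when a set has no member for the current language is an assumption on my part; a real implementation might prefer some fallback.

```python
# Hypothetical translation sets: each set groups the per-language menus
# that fill the same role on the site.
MENU_SETS = {
    "primary": {"en": "primary-menu-en", "hu": "primary-menu-hu"},
    "secondary": {"en": "secondary-menu-en"},
}


def menu_for(set_name, langcode):
    """Return the menu of a set for the given language, or None if absent."""
    return MENU_SETS[set_name].get(langcode)


# Switching the page language switches every menu tree along with it.
for langcode in ("en", "hu"):
    print(langcode, menu_for("primary", langcode), menu_for("secondary", langcode))
```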

For Drupal 7, this is implemented on a one-by-one basis for some core objects by the Internationalization module, and as I said, most other contributed modules are simply unaware of the possibility. I don't know how we could build a generic API for that scenario, given the diversity of Drupal 7 data structures. With the Drupal 8 Configuration Management Initiative in full swing though, it looks like the current proposal is to redo all user configuration pieces under a common API, which could make it possible for us to do object property translation as well as object sets as translations in a universal way. I've asked translation related questions on the proposal, to be considered as it is being worked out. You can help there too by validating the approach against your translation needs.

While I promised to do a run-down of the current i18n_string approach to object property translation, I think there is plenty to discuss here, so I'll post about that next instead. What do you think? Please share your opinion in the comments at http://groups.drupal.org/node/151169.