UTF-8 RAP I18N

I always wanted to write a blog post with a title consisting of just acronyms and numeronyms – here it is!

Obviously, this post is about internationalizing web applications based on the Eclipse Remote Application Platform (RAP), using properties files encoded in UTF-8.

Usually, Eclipse developers use resource bundles (pure Java) or message bundles (when working on Eclipse) to internationalize strings. Both approaches share one limitation: the properties files that store the translations are expected to be encoded in ISO-8859-1.

The ISO-8859-1 character encoding is intended for western European languages and covers only the first 256 Unicode code points (U+0000 to U+00FF). All other Unicode characters have to be written as \uXXXX escape sequences.
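If you want to check whether a given translation fits into ISO-8859-1 at all, the JDK's charset API can tell you. This is just a small sketch; the class name Latin1Check and the method fitsLatin1 are made up for this post, and the non-ASCII strings are written as escapes so that the example does not depend on the encoding of the Java source file.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {

    // Returns true if every character of the text can be stored in ISO-8859-1.
    static boolean fitsLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // English: plain ASCII, which is a subset of ISO-8859-1
        System.out.println(fitsLatin1("To be, or not to be"));   // prints "true"
        // Portuguese: \u00E3 is the a with tilde, still below U+0100
        System.out.println(fitsLatin1("Ser ou n\u00E3o ser"));   // prints "true"
        // Bulgarian: Cyrillic letters are outside ISO-8859-1
        System.out.println(fitsLatin1("\u0414\u0430"));          // prints "false"
    }
}
```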

Obviously ISO-8859-1 works great for English:

cool_quote=To be, or not to be

And Portuguese:

cool_quote=Ser ou não ser

However, translating into Bulgarian is a mess:

cool_quote=\u0414\u0430 \u0431\u044A\u0434\u0435\u0448 \u0438\u043B\u0438 \u0434\u0430 \u043D\u0435 \u0431\u044A\u0434\u0435\u0448

As it is in Chinese:

cool_quote=\u751F\u5B58\u8FD8\u662F\u6BC1\u706D
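For the curious, the escaping itself is simple. The following is a rough sketch of what a converter has to do, not the actual implementation of any tool: the class UnicodeEscaper is hypothetical, and it only escapes characters above U+00FF, which matches the examples above (converters such as native2ascii may also escape characters between U+0080 and U+00FF). It makes no attempt to handle backslashes or other special characters of the properties format.

```java
public class UnicodeEscaper {

    // Replaces every character outside ISO-8859-1 with a \uXXXX escape.
    // Works on UTF-16 code units, so a surrogate pair becomes two escapes,
    // which is the form the properties format expects.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0xFF) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Chinese quote from above, written with escapes in the Java source
        String quote = "\u751F\u5B58\u8FD8\u662F\u6BC1\u706D";
        System.out.println("cool_quote=" + escape(quote));
    }
}
```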

When creating translations for more and more languages, you get more and more encoded characters, resulting in unreadable properties files.
A better character encoding would be UTF-8, where you can actually see the real characters:

cool_quote=Да бъдеш или да не бъдеш

And

cool_quote=生存还是毁灭

Unfortunately, neither Java resource bundles nor Eclipse message bundles read UTF-8 properties files out of the box. Both mechanisms can be made to load their files differently, but you have to program this yourself. Another possibility is to always convert the properties files using external tools (like native2ascii) or additional Eclipse plug-ins (like the Eclipse ResourceBundle editor).
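To give an idea of what "program it yourself" means in plain Java, here is a minimal sketch that reads a properties file through a Reader with an explicit UTF-8 charset, assuming Java 7 or later. The class Utf8PropertiesLoader and the temporary file are invented for the example; hooking this into ResourceBundle or Eclipse message bundles takes more work than shown here, and this is not meant to describe how RAP does it internally.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class Utf8PropertiesLoader {

    // Loads a properties file, decoding its bytes as UTF-8.
    // Properties.load(InputStream) would decode them as ISO-8859-1 instead.
    static Properties load(Path file) throws IOException {
        Properties properties = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            properties.load(reader);
        }
        return properties;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 encoded properties file to a temporary location
        Path file = Files.createTempFile("messages_bg", ".properties");
        String content = "cool_quote=\u0414\u0430 \u0431\u044A\u0434\u0435\u0448\n";
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));

        // Read it back; the value holds the real Cyrillic characters
        System.out.println(load(file).getProperty("cool_quote"));
        Files.delete(file);
    }
}
```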

Luckily for programmers who develop for RAP or single-source for both platforms (RCP/RAP), support for UTF-8 properties files is already included in RAP's i18n API. RAP needs a different internationalization API than RCP anyway, because a web application serves many users at once, possibly in different locales; the RAP team used this opportunity to include UTF-8 support as well. A message bundle in RAP using UTF-8 resources looks like this:

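   // Package name as of RAP 2.x; RAP 1.x used org.eclipse.rwt.RWT
   import org.eclipse.rap.rwt.RWT;

   public class Messages {

      private static final String BUNDLE_NAME
         = "org.eclipse.rap.helloworld.messages"; //$NON-NLS-1$

      // Filled from the key of the same name in the properties file
      public String cool_quote;

      private Messages() {
         // prevent instantiation
      }

      // Returns the messages for the locale of the current user session
      public static Messages get() {
         Class<?> clazz = Messages.class;
         return ( Messages )RWT.NLS.getUTF8Encoded( BUNDLE_NAME, clazz );
      }
   }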

That paves the way for creating UTF-8 encoded properties files with Eclipse’s properties editor … but beware: there is one pitfall in the Eclipse IDE. By default, it assumes that all properties files are encoded in ISO-8859-1. If you now start putting UTF-8 encoded properties files into your workspace, you’re going to end up in encoding hell! First of all, you need to tell Eclipse that properties files are now in UTF-8 (in the preferences under General > Content Types > Text > Java Properties File, set the default encoding to UTF-8):

[Screenshot: Eclipse preferences]

Finally, editing your Klingon translation properties files in Eclipse works like a dream:

[Screenshot: Klingon alphabet in Eclipse properties editor]

1 Comment
  • Paul Verest
    Posted at 11:43 am, July 1, 2013

    I think this result should be done as external library for example on GitHub
    A lot of developers will benefit if this effort become de-facto standard for UTF-8 in properties
