We have a Java project which contains a large number of English-language strings for user prompts, error messages and so forth. We want to extract all the translatable strings into a properties file so that they can be translated later.

For example, we would want to replace:

Foo.java

String msg = "Hello, " + name + "! Today is " + dayOfWeek;

with:

Foo.java

String msg = Language.getString("foo.hello", name, dayOfWeek);

language.properties

foo.hello = Hello, {0}! Today is {1}

I understand that doing this in a completely automated way is pretty much impossible, as not every string should be translated. However, we were wondering whether there is a semi-automated way that removes some of the laboriousness.
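
For context, here is a minimal sketch of what a helper like Language.getString might look like. It assumes the standard java.util.ResourceBundle and java.text.MessageFormat classes; the Language class itself is our own hypothetical wrapper, and the bundle is loaded from an inline string only to keep the example self-contained (normally it would come from language.properties on the classpath).

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical helper matching the Language.getString(...) call above.
final class Language {
    // Assumption: a real project would use ResourceBundle.getBundle("language")
    // to load language.properties; the inline text stands in for that file.
    private static final ResourceBundle BUNDLE =
            load("foo.hello = Hello, {0}! Today is {1}\n");

    private Language() {}

    private static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks up the template for the key and fills in {0}, {1}, ...
    static String getString(String key, Object... args) {
        try {
            return MessageFormat.format(BUNDLE.getString(key), args);
        } catch (MissingResourceException e) {
            // Make missing keys visible instead of crashing.
            return "!" + key + "!";
        }
    }

    public static void main(String[] argv) {
        System.out.println(getString("foo.hello", "Ann", "Monday"));
    }
}
```

One thing to keep in mind with this approach: MessageFormat treats a single quote as a quoting character, so an apostrophe in a translated template has to be written as two single quotes.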

A: 

I think Eclipse has an option to externalize all strings into a properties file.

Noam Gal
Whether that correctly handles string concatenation in a meaningful way is doubtful, though. But that's a problem which should have been anticipated earlier :)
Joey
+2  A: 

Eclipse will externalize every individual string and does not automatically build the parameter substitutions you are looking for. If you have a very consistent convention for how you build your strings, you could write a Perl script to do some intelligent replacement on .java files. But this script will get quite complex if you want to handle all of the following:

  • String msg = new String("Hello");
  • String msg2 = "Hello2";
  • String msg3 = new StringBuffer().append("Hello3").toString();
  • String msg4 = "Hello" + 4;
  • etc.
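
To give a feel for the simplest of these cases, here is a rough sketch (in Java rather than Perl) that turns a plain concatenation expression into a MessageFormat-style template plus an argument list. The class name ConcatToTemplate and its methods are made up for illustration. It splits on every top-level `+`, so it cannot tell string concatenation from numeric addition, and it does not translate Java escapes or MessageFormat quoting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: "Hello, " + name + "!"  ->  template "Hello, {0}!" and args [name].
final class ConcatToTemplate {
    final String template;
    final List<String> args;

    private ConcatToTemplate(String template, List<String> args) {
        this.template = template;
        this.args = args;
    }

    static ConcatToTemplate convert(String expr) {
        StringBuilder template = new StringBuilder();
        List<String> args = new ArrayList<>();
        for (String operand : splitOnPlus(expr)) {
            String op = operand.trim();
            if (op.length() >= 2 && op.startsWith("\"") && op.endsWith("\"")) {
                // String literal: copy its body (escapes are left untouched).
                template.append(op, 1, op.length() - 1);
            } else {
                // Anything else becomes the next numbered placeholder.
                template.append('{').append(args.size()).append('}');
                args.add(op);
            }
        }
        return new ConcatToTemplate(template.toString(), args);
    }

    // Splits on '+' characters that are outside string literals and parentheses.
    static List<String> splitOnPlus(String expr) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inString = false;
        int depth = 0;
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (inString) {
                cur.append(c);
                if (c == '\\' && i + 1 < expr.length()) {
                    cur.append(expr.charAt(++i)); // keep the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '+' && depth == 0) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                if (c == '"') inString = true;
                if (c == '(') depth++;
                if (c == ')') depth--;
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    public static void main(String[] argv) {
        ConcatToTemplate r = convert("\"Hello, \" + name + \"! Today is \" + dayOfWeek");
        System.out.println(r.template + "  " + r.args);
    }
}
```

Even this toy version needs a small scanner rather than a single regular expression, which is a fair indication of how quickly the script grows once the other forms in the list are added.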

I think there are some paid tools that can help with this. I remember evaluating one, but I don't recall its name. I also don't remember if it could handle variable substitution in external strings. I'll try to find the info and edit this post with the details.

EDIT: The tool was Globalyzer by Lingoport. The website says it supports string externalization, but not specifically how. I'm not sure whether it supports variable substitution. There is a free trial version, so you could try it out and see.

Brad C
A: 

It shouldn't be too hard to write a Python or Perl script to traverse the file structure and replace all the strings in .java files with a pretty good success rate. The first few builds will inevitably fail, but you can look at the place where it failed and fix it, and after a few iterations of that you should be matching enough that the remaining failures can be cleaned up by hand.
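
As a starting point for the "find the strings" half of such a script, here is a small sketch, written in Java to match the code base rather than in Python or Perl. The LiteralScanner class is hypothetical. It uses a line-based regular expression, skips lines that look like comments or already carry a $NON-NLS- marker, and will be fooled by things like block comments in the middle of a line or the char literal '"'.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: list candidate string literals in .java files.
final class LiteralScanner {
    // A double-quoted literal: ordinary characters or backslash escapes.
    private static final Pattern LITERAL =
            Pattern.compile("\"(?:[^\"\\\\\r\n]|\\\\.)*\"");

    static List<String> findLiterals(String source) {
        List<String> found = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String t = line.trim();
            // Naive filters: comment lines and lines already marked as non-translatable.
            if (t.startsWith("//") || t.startsWith("/*") || t.startsWith("*")
                    || line.contains("$NON-NLS-")) {
                continue;
            }
            Matcher m = LITERAL.matcher(line);
            while (m.find()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> javaFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths.filter(p -> p.toString().endsWith(".java"))
                             .collect(Collectors.toList());
        }
        for (Path file : javaFiles) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            for (String literal : findLiterals(text)) {
                System.out.println(file + ": " + literal);
            }
        }
    }
}
```

Printing the candidates first, before rewriting anything, makes it easier to decide which strings should be left alone.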

If you paypal me $100 and send me your code base I'll do it for you. :)

Imagist
+1  A: 

As well as Eclipse's string externalizer, which generates properties files, Eclipse has a warning for non-externalized strings, which is helpful for finding files that you haven't internationalized.

String msg = "Hello " + name;

gives the warning "Non-externalized string literal; it should be followed by //$NON-NLS-<n>$". For strings that truly do belong in the code, you can add an annotation (@SuppressWarnings("nls")) or a comment:

String msg = "Hello " + name; //$NON-NLS-1$

This is very helpful for converting a project to proper internationalization.
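
A slightly fuller sketch of both options follows; the class and method names are invented for the example. The number in each marker refers to the position of the literal on its line, so a line with two literals needs two markers.

```java
// Hypothetical example of marking strings that should NOT be translated.
final class NlsMarkers {
    // Two literals on one line: one marker per literal, numbered left to right.
    static String cacheKey(String user) {
        return "user" + ":" + user; //$NON-NLS-1$ //$NON-NLS-2$
    }

    // Eclipse-specific token: silences the warning for the whole method.
    @SuppressWarnings("nls")
    static String sqlFor(String table) {
        return "SELECT * FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(cacheKey("bob"));
        System.out.println(sqlFor("orders"));
    }
}
```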

Mr. Shiny and New
A: 
OscarRyz
A: 

You can use the "Externalize Strings" feature in Eclipse. Open your Java file in the editor and then click on "Externalize Strings" in the "Source" main menu. It generates a properties file for you with all the strings you checked in the selection.

Hope this helps.

Lars
+2  A: 

What you want is a tool that replaces every expression involving string concatenations with a library call, with the obvious special case of expressions involving just a single literal string.

A program transformation system in which you can express your desired patterns can do this. Such a system accepts rules in the form of:

         lhs_pattern -> rhs_pattern  if condition ;

where patterns are code fragments with syntax-category constraints on the pattern variables. This causes the tool to look for syntax matching the lhs_pattern and, where found, replace it with the rhs_pattern. The pattern matching is over language structures rather than text, so it works regardless of code formatting, indentation, comments, etc.

Sketching a few rules (and oversimplifying to keep this short) following the style of your example:

  domain Java;

  nationalize_literal(s1:literal_string):
    " \s1 " -> "Language.getString1(\s1 )";

  nationalize_single_concatenation(s1:literal_string,s2:term):
    " \s1 + \s2 " -> "Language.getString1(\s1) + \s2"; 

  nationalize_double_concatenation(s1:literal_string,s2:term,s3:literal_string): 
      " \s1 + \s2 + \s3 " -> 
      "Language.getString3(\generate_template1\(\s1 + "{1}" + \s3\), \s2)"
   if IsNotLiteral(s2);

The patterns are themselves enclosed in "..."; these aren't Java string literals, but rather a way of telling the multi-computer-lingual pattern-matching engine that the stuff inside the "..." is (domain) Java code. Meta-stuff is marked with \, e.g., the metavariables \s1, \s2, \s3 and the embedded pattern call \generate_template1, with \( and \) to denote its meta-parameter list :-}

Note the use of the syntax-category constraints on the metavariables s1 and s3 to ensure that they match only string literals. Whatever the metavariables match in the left-hand-side pattern is substituted into the right-hand side.

The sub-pattern generate_template1 is a procedure that, at transformation time (i.e., when the rule fires), evaluates its known-to-be-constant first argument into the template string you suggested, inserts it into your library, and returns a library string index. Note that the first argument to generate_template1 in this example is composed entirely of concatenated literal strings.
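
To make that step concrete, here is a rough Java sketch of what such a procedure could do. This is not DMS code; the TemplateLibrary class, the "msg.N" key scheme and the properties output format are all assumptions made for illustration. Placeholders are numbered from {0}, following the example in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the library that generate_template1 fills.
final class TemplateLibrary {
    // template text -> generated key, in insertion order
    private final Map<String, String> keyByTemplate = new LinkedHashMap<>();

    // Joins the constant fragments with {0}, {1}, ... between them,
    // stores the result if it is new, and returns its library key.
    String generateTemplate(String... fragments) {
        StringBuilder t = new StringBuilder();
        for (int i = 0; i < fragments.length; i++) {
            if (i > 0) {
                t.append('{').append(i - 1).append('}');
            }
            t.append(fragments[i]);
        }
        String template = t.toString();
        String key = keyByTemplate.get(template);
        if (key == null) {
            key = "msg." + keyByTemplate.size(); // assumed key scheme
            keyByTemplate.put(template, key);
        }
        return key;
    }

    // Renders the library as properties-style lines (no escaping applied).
    String toProperties() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : keyByTemplate.entrySet()) {
            out.append(e.getValue()).append(" = ").append(e.getKey()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TemplateLibrary lib = new TemplateLibrary();
        System.out.println(lib.generateTemplate("Hello, ", "! Today is ", ""));
        System.out.print(lib.toProperties());
    }
}
```

Reusing the key when the same template shows up again keeps the library small, and it is one way the transformation could be rerun without piling up duplicate entries.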

Obviously, somebody will have to hand-process the templated strings that end up in the library to produce the foreign-language equivalents.

You're right that this may over-templatize the code, because some strings shouldn't be placed in the nationalized string library. To the extent that you can write programmatic checks for those cases, they can be included as conditions in the rules to prevent them from triggering. (With a little bit of effort, you could place the untransformed text into a comment, making individual transformations easier to undo later.)

Realistically, I'd guess you have to code ~100 rules like this to cover the combinatorics and special cases of interest. The payoff is that your code gets automatically enhanced. If done right, you could apply this transformation to your code repeatedly as it goes through multiple releases; it would leave previously nationalized expressions alone and just revise the new ones inserted by the happy-go-lucky programmers.

A system which can do this is the DMS Software Reengineering Toolkit. DMS can parse/pattern match/transform/prettyprint many languages, including Java and C#.

Ira Baxter
A: 

Since everyone is weighing in with an IDE, I guess I'd better stand up for NetBeans :)

Tools-->Internationalisation-->Internationalisation Wizard

very handy..

Nico
+1  A: 

Globalyzer has extensive capabilities to detect, manage and externalize strings, and it speeds up the work dramatically compared with looking at strings and externalizing them one by one. You can filter the strings as well as see them in context, and then externalize them either one by one or in batches. It works for a wide variety of programming languages and resource types, including Java of course. Plus, Globalyzer finds much more than embedded strings for your internationalization projects. You can read more at http://lingoport.com/globalyzer, and there are links there to sign up for a demo account. Globalyzer was first built for performing big internationalization service projects, and over the years it has grown into a full-scale enterprise tool for making sure development gets and stays internationalized.

Adam