Static Analysis tool to detect Internationalization issues

+1 Q:

Are there any tools (free/commercial) that can audit an application for internationalization? (or localization-readiness, if you prefer)

Primarily interested in:

Multilingual Implementation tests

    Examples:  
    * [javascript] alert('Oops wrong choice!');  
    * [java] String msg = resourcebundle.getString("key.x").concat("4");  
    * [jdbc] String query=".. order by abc"; //should be NLS_SORT or equiv.

Date Implementation tests

    Examples:  
    * SimpleDateFormat used without Locale  
    * Apache's DateFormatUtils used

Numeric Implementation tests

    Examples:
    * NumberFormat used without Locale

javascript-validation tests

    Examples:
    * [javascript] checkIsDecimal { //decimal point checked against "." }  
    * [javascript] hardcoded character range [A-z]

Cheers.

I have looked at IntelliJ IDEA's code analyzers, and it does have the ones you asked for. It's a commercial IDE that specializes in Java but knows other languages as well.

http://www.jetbrains.com/idea/

Mercer Traieste 2009-07-07 06:13:31

Maybe I'm missing something: http://www.jetbrains.com/idea/documentation/inspections.jsp doesn't do points 3 and 4.

Ryan Fernandes 2009-07-07 07:44:45

+1 A:

Based on your examples, you mostly want to diagnose functions that produce output, whose input isn't somehow internationalized.

So for the alert case, you want to find any print call that acquires a string that is not produced by one of possibly several well-known translation routines.
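A minimal sketch of that check in Java, assuming the simplest heuristic: flag any alert() call whose first argument starts with a quoted literal rather than a call to a translation routine. The class name and the `i18n.get` lookup in the sample input are made up for illustration, and string literals containing escaped quotes are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HardcodedAlertScan {
    // alert( followed directly by a single- or double-quoted literal.
    // Group 1 is the quote character, group 2 the literal text.
    private static final Pattern ALERT_LITERAL =
        Pattern.compile("\\balert\\s*\\(\\s*(['\"])(.*?)\\1");

    /** Returns the literal text of every alert() call whose first argument starts with a string literal. */
    public static List<String> findHardcodedAlerts(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = ALERT_LITERAL.matcher(source);
        while (m.find()) {
            hits.add(m.group(2));
        }
        return hits;
    }

    public static void main(String[] args) {
        // i18n.get is a hypothetical translation routine, not a real API.
        String js = "alert('Oops wrong choice!');\n"
                  + "alert(i18n.get('wrong.choice'));";
        System.out.println(findHardcodedAlerts(js));
    }
}
```

Only the first alert() is reported; the second one gets its string from the (hypothetical) translation routine, so the pattern does not match it.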

For the jdbc case, you want to identify ordering constraints that are not locale specific.

For the various date cases, you want to find calls to date routines that are known to produce answers tied to one fixed (or default) locale.

For the JavaScript validation it is harder to guess at intent; presumably you want to diagnose functions that are known to be wired to a particular locale, which seems a lot like the date case. For range checks, you want to capture anything that compares a character to another for less than or greater than.
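To make "wired to a particular locale" concrete, here is a small Java sketch of why the two validation examples from the question are suspect. It uses standard JDK calls only; the class and method names are my own. The question's examples are JavaScript, but the character range [A-z] behaves the same way in Java's regex engine.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.regex.Pattern;

public class LocaleWiredChecks {
    /** Decimal separator the JDK reports for the given locale. */
    public static char decimalSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    /** True when the one-character string is matched by the hardcoded range [A-z]. */
    public static boolean inHardcodedRange(String ch) {
        return Pattern.matches("[A-z]", ch);
    }

    /** True when the one-character string is a letter in any script. */
    public static boolean isUnicodeLetter(String ch) {
        return Pattern.matches("\\p{L}", ch);
    }

    public static void main(String[] args) {
        // A check against "." is only right for some locales.
        System.out.println(decimalSeparator(Locale.US));      // .
        System.out.println(decimalSeparator(Locale.GERMANY)); // ,

        // [A-z] also covers the punctuation between 'Z' and 'a' ...
        System.out.println(inHardcodedRange("_"));      // true
        // ... and rejects accented letters such as e-acute (U+00E9).
        System.out.println(inHardcodedRange("\u00e9")); // false
        System.out.println(isUnicodeLetter("\u00e9"));  // true
    }
}
```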

For the wired-locale functions, it seems just knowing their name would be enough (although perhaps there has to be some overload resolution, e.g., by number of arguments), so NumberFormat(?,?) is bad, and NumberFormat(?,?,?) is OK.
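As a rough sketch of matching on name plus argument count, assuming the JDK overloads I am sure of: `new SimpleDateFormat(pattern)` versus `new SimpleDateFormat(pattern, locale)`, and the `NumberFormat.get...Instance()` factories with and without a `Locale`. The scanner class is illustrative only. It misses calls whose single argument contains a comma or nested parentheses, and it cannot tell when the default locale is actually intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocaleArityScan {
    // new SimpleDateFormat(...) with zero or one simple argument (no comma, no nested parens).
    private static final Pattern DATE_FORMAT_NO_LOCALE =
        Pattern.compile("new\\s+SimpleDateFormat\\s*\\(\\s*[^(),]*\\)");

    // NumberFormat.getInstance(), getCurrencyInstance(), etc. called with no arguments.
    private static final Pattern NUMBER_FORMAT_NO_LOCALE =
        Pattern.compile("NumberFormat\\s*\\.\\s*get\\w*Instance\\s*\\(\\s*\\)");

    /** Returns the matched text of each call that falls back to the default locale. */
    public static List<String> findSuspects(String source) {
        List<String> hits = new ArrayList<>();
        for (Pattern p : new Pattern[] {DATE_FORMAT_NO_LOCALE, NUMBER_FORMAT_NO_LOCALE}) {
            Matcher m = p.matcher(source);
            while (m.find()) {
                hits.add(m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String java =
              "DateFormat a = new SimpleDateFormat(\"yyyy-MM-dd\");\n"
            + "DateFormat b = new SimpleDateFormat(\"yyyy-MM-dd\", Locale.FRANCE);\n"
            + "NumberFormat c = NumberFormat.getInstance();\n"
            + "NumberFormat d = NumberFormat.getInstance(Locale.FRANCE);\n";
        // Reports the declarations of a and c only.
        findSuspects(java).forEach(System.out::println);
    }
}
```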

Why can't you write a regular expression to look (heuristically) for the bad cases?

For the range case, you just need to recognize expressions of the form of [exp] < [literal-char] or [exp] < [literal-string]. A regexp to look for just "< '.+" would seem adequate.
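Something along these lines, as a sketch in Java. The pattern is a slightly stricter variant of the one above (it also accepts >, <= and >=, and requires a complete character literal), and the class name is invented. It deliberately stays a line-by-line heuristic, so it will not see a literal on the left-hand side such as 'a' <= c.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CharRangeScan {
    // <, >, <= or >= followed by a character literal such as 'a' or '\n'.
    private static final Pattern CHAR_COMPARE =
        Pattern.compile("[<>]=?\\s*'(?:\\\\.|[^'\\\\])'");

    /** Returns the 1-based numbers of the lines that compare something against a character literal. */
    public static List<Integer> findLines(String source) {
        List<Integer> lines = new ArrayList<>();
        String[] split = source.split("\n");
        for (int i = 0; i < split.length; i++) {
            if (CHAR_COMPARE.matcher(split[i]).find()) {
                lines.add(i + 1);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String code =
              "if (c >= 'a' && c <= 'z') { accept(c); }\n"
            + "if (n < 10) { retry(); }\n"
            + "boolean ok = Character.isLetter(c);\n";
        System.out.println(findLines(code)); // [1]
    }
}
```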

Are there common cases that these would miss?

EDIT (from comment below: "I've been using regexp but...") If you want a tool that is deeper than regexp, you pretty much have to go to language parsing, name/type resolution, and having data flow analysis would be helpful. Since you want to process multiple (computer) languages, the tool has to be multi-lingual capable. And it appears you want to be able to customize it to check for the specific cases relevant to your application.

The DMS Software Reengineering Toolkit has all these properties, including parsers for Java, JavaScript and SQL. It is designed to be customized, so you have to configure it for your checks before you can use it.

Ira Baxter 2009-07-23 06:11:31

Unfortunately, I have been using regex to look for these bad cases, but I would love a PMD- or FindBugs-type tool. It would make a good continuous integration component.

Ryan Fernandes 2009-07-23 09:07:25

Can you show a specific case that regexp does poorly? I've added the description of a much more sophisticated analysis engine, but you don't really want to go there if regexps are good enough.

Ira Baxter 2009-07-23 17:12:12

Have a look at Globalyzer (http://lingoport.com/globalyzer). It is just that: a tool for performing static analysis on code specifically for internationalization, and it works with a variety of programming languages. It supports detection and correction of embedded strings (with string externalization capabilities), potentially locale-limiting methods, functions and classes (depending on the programming language and your requirements), and other issues such as programming patterns and embedded images. The default "rule sets" give you a good start, and you can then customize your rules for both detection and filtering of issues. An underlying database helps you tag and keep track of i18n issues as you work through them. There is a server component where you create and share rule sets with your team members, plus desktop and command-line clients that run locally to analyze your source, so you are not sending any code or reports off your local machine.

Adam 2009-12-09 23:08:45

Interesting. Can you confirm with your team (based on your profile, I'm assuming you are from Lingoport) that it will catch all the problems I've listed in my question? I will accept your answer once you confirm.

Ryan Fernandes 2009-12-11 03:55:42
