views:

347

answers:

6

I try to externalize all strings (and other constants) used in any application I write, for many reasons that are probably second-nature to most stack-overflowers, but one thing I would like to have is the ability to automate spell checking of any user-visible strings. This poses a couple problems:

  • Not all strings are user-visible, and it's non-trivial to spearate them, and keep that separation in place (but it is possible)
  • Most, if not all, string externalization methods I've used involve significant text that will not pass a spell checker such as aspell/ispell (eg: theStrName="some string." and comments)
  • Many spellcheckers (once again, aspell/ispell) don't handle many words out of the box (generally technical terms, proper nouns, or just 'new' terminology, like metadata).

How do you incorporate something like this into your build procedures/test suites? It is not feasible to have someone manually spell check all the strings in an application each time they are changed -- and there is no chance that they will all be spelled correctly the first time ;)

+1  A: 

We do it manually, if errors aren't picked up during testing then they're picked up by the QA team, or during localization by the translators, or during localization QA. Then we lodge a bug.

Most of our developers are not native English speakers, so it's not an uncommon problem for us. The number that slip through the cracks is so small that this is a satisfactory solution for us.

Nothing over a few hundred lines is ever 100% bug-free (well... maybe the odd piece of embedded code), just think of spelling mistakes as bugs and don't waste too much time on it.

As soon as your application matures, over 90% of strings won't change between releases and it would be a reasonably trivial exercise to compare two versions of your resources, figure out what'ts new (check them first), what's changed/updated (check next) and what hasn't changed (no need to check these)

So think of it more like I need to check ALL of these manually the first time, and I'm only going to have to check 10% of them next time. Now ask yourself if you still really need to automate spell checking.

sascha
A: 

Use aspell. It's a programme, it's available for unixoids and cygwin, it can be run over lots of kinds of source code. Use it.

Marcin
-1, he specifically called out aspell as being a problem in the question
slf
A: 

First point, please don't put it into you build process. I would be a vengeful coder if I (meaning my computer) had to spell check all the content on the site every time I tried to debug or build a new feature. I don't even think this kind of operation belongs as a unit test (you're testing a human interface, not a computerised one).

Second point, don't write a script. You're going to have so many false positives fall through the cracks that people will stop reading the reports and you are no better off than when you started.

Third point, this is probably most easily solved by having humans do it: QA team, copy writers, beta testers, translators, etc. All the big sites with internationalised content that I've built had the same process: we took the copy from the copy writers, sent it to the translating service/agency, put it into the persistence layer, and deployed it. Testers (QA, developers, PMs, designers, etc.) would find spelling or grammatical mistakes and lodge bug reports. There is just too much red tape and pairs of eyes for that many spelling/grammar errors to slip through.

Fourth point, there will always be spelling and grammar mistakes on your page. Even major newspaper web sites haven't gotten around this and they have whole office buildings filled with editors.

Sam Bisbee
+1  A: 

I can think of two ways to approach this semi-automatically:

Have the compiler help you differentiate between strings used in the UI and strings used elsewhere. Overload different variants of the string datatype depending on it's purpose, and overload the output methods to only accept that type - that way you can create a fake UI that just outputs the UI strings, and do the spell checking on that.

If this is doable of course depends on the platform and the overall architecture of the application.

Another approach could be to simply update the spell checkers database with all the strings that appear in the code - comments, xpaths, table names, you name it - and regard them as perfectly cromulent. This will of course reduce the precision of the spell checking.

Anders Lindahl
A: 

First thing, regarding string externalization - GNU GetText (if used properly) creates string files that are contain almost no text other then the actual content of the strings (there are some headers but its easy to cause a spell checker to ignore them).

Second thing, what I would do is to run the spell checker in a continuous integration environment and have the errors fed externally, probably through a web interface but email will also work. Developers can then review the errors and either fix them in the code or use some easy interface to let the spell check know that a misspelling should be ignored (a web interface can integrate both the error view and the spell checker interface).

Guss
+1  A: 

If you're using java and are storing your localized strings in resource bundles then you could check the Bundle.properties files and validate the bundle strings. You could also add a special comment annotation that your processor could use to determine if an entry should be skipped.

This method will allow you to give a hint as to the locale and provide a way of checking multiple languages within the one build process.

I can't answer how you would perform the actual spell checking itself, though I think what I've presented will guid you as for the method of performing the spell checking.

I hope this helps.

Brett Ryan