I need good word-wrapping handling for Java. Not too difficult, except for one wrinkle: since I'm working on an internationalized application, it needs to handle Chinese, Japanese and Korean text properly. In those languages, word wrapping occurs between characters, since the characters themselves are words and there are no spaces. Not only that, but since that text may include foreign words rendered with Latin characters, those words must be treated specially and not broken between characters like the rest of the text. Wrapping needs to be supported in both text and graphics contexts (widths expressed in either character or pixel units).

Is there an existing package that does this? I haven't seen one. If not, can anyone show me a good algorithm for handling this scenario? The code would have access to a Locale object corresponding to the language of the text to be wrapped, if needed. A greedy algorithm (each line takes as much text as possible) is fine.

+1  A: 

It appears that the ICU4J library may do what you need. See its boundary analysis documentation. The examples given are for ICU4C, and are therefore in C/C++, but the same approach should work from the Java package as well.

Matthew Talbert
+3  A: 

java.text.BreakIterator should help here with breaking character sequences into words or into legal line-break positions. If this is insufficient, I'd check the ICU project to see whether it has something better (some of the Java implementation comes from there). Graphics handling is going to depend on your GUI library, but the AWT/Swing Font API has support for determining line metrics. (If you didn't have 'Locale' instances, you could probably do something heuristic using Unicode blocks.)
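
As a rough illustration of the BreakIterator approach, here is a minimal greedy wrapper for character units. It is only a sketch: the class name GreedyWrap and the method wrap are made up for this example, width is counted in UTF-16 code units (not display columns, so full-width CJK characters count as one), hard newlines in the input are not treated specially, and a single unbreakable run longer than the limit is allowed to overflow its line. Where breaks fall inside CJK text depends on the rules shipped with your JDK.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class GreedyWrap {

    /** Removes trailing whitespace so it does not count toward the line width. */
    private static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    /**
     * Greedy wrap: each line takes as many break-delimited segments as fit
     * within maxChars. Break opportunities come from the locale-aware line
     * BreakIterator, so Latin words are kept whole.
     */
    static List<String> wrap(String text, int maxChars, Locale locale) {
        BreakIterator breaks = BreakIterator.getLineInstance(locale);
        breaks.setText(text);

        List<String> lines = new ArrayList<>();
        int lineStart = 0;            // index where the current line begins
        int prev = breaks.first();    // last break opportunity that still fit

        for (int pos = breaks.next(); pos != BreakIterator.DONE; pos = breaks.next()) {
            int width = rtrim(text.substring(lineStart, pos)).length();
            // If adding this segment overflows, close the line at the previous break.
            if (width > maxChars && prev > lineStart) {
                lines.add(rtrim(text.substring(lineStart, prev)));
                lineStart = prev;
            }
            prev = pos;
        }
        if (lineStart < text.length()) {
            lines.add(rtrim(text.substring(lineStart)));
        }
        return lines;
    }
}
```

For example, GreedyWrap.wrap("the quick brown fox", 10, Locale.ENGLISH) yields the two lines "the quick" and "brown fox". For pixel units, the same loop should work if the length check is replaced by a measured width, for instance FontMetrics.stringWidth in AWT; java.awt.font.LineBreakMeasurer is another option worth looking at for the graphics case.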

McDowell