Silverlight's TextWrapping feature is documented, but I'm trying to find out the details of its line-breaking algorithm.

Obviously a space allows the text to break (split) and wrap to the next line. Through trial and error I've found that these characters also cause a split:

  • \t (tab)
  • -
  • !
  • ?

But I doubt this is the full list. Has anyone found the full list of split characters (including Unicode)? Or do you have any clever suggestions for figuring this out that I haven't thought of yet? Trial and error can be slow.

+2  A: 

I'd guess it's every character that counts as whitespace or punctuation, excluding the explicitly non-breaking characters.

However, there is a standard algorithm for line breaking: Unicode Standard Annex #14, "Unicode Line Breaking Algorithm".

Joey
+4  A: 

TextWrapping = Wrap attempts to conform to Unicode Standard Annex #14, which Johannes has already linked.

However, if the text still doesn't fit the available width using this approach (because the width is very small or a word is very long), it breaks the word across two lines; it doesn't hyphenate or do anything clever. As soon as placing a letter would overrun the width, and there is no break opportunity in the line that the standard algorithm could use, it continues the word on the next line.
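The fallback described above can be modeled as a greedy wrap that hard-splits any word wider than the line. This is a minimal VB.NET sketch of that behavior, not Silverlight's implementation: it assumes every character is one unit wide and treats only spaces as break opportunities, and `WrapText` is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic

Module WrapSketch
    ' Hypothetical model of the described behavior, NOT Silverlight's code.
    ' Assumptions: every character is one unit wide, and the only break
    ' opportunity is a space (a real implementation would use UAX #14).
    Function WrapText(content As String, width As Integer) As List(Of String)
        If width < 1 Then Throw New ArgumentOutOfRangeException("width")
        Dim lines As New List(Of String)()
        Dim current As String = ""
        For Each word As String In content.Split(" "c)
            ' Line that would result from appending this word after a space.
            Dim candidate As String = If(current.Length = 0, word, current & " " & word)
            If candidate.Length <= width Then
                current = candidate
                Continue For
            End If
            ' The word does not fit: flush the current line first.
            If current.Length > 0 Then
                lines.Add(current)
                current = ""
            End If
            ' Fallback: no break opportunity inside the word, so cut it at
            ' the last character that still fits (no hyphen is inserted).
            Dim rest As String = word
            While rest.Length > width
                lines.Add(rest.Substring(0, width))
                rest = rest.Substring(width)
            End While
            current = rest
        Next
        If current.Length > 0 Then lines.Add(current)
        Return lines
    End Function

    Sub Main()
        ' Prints [a], [veryl], [ongwo], [rd], [fits] on separate lines.
        For Each line As String In WrapText("a verylongword fits", 5)
            Console.WriteLine("[" & line & "]")
        Next
    End Sub
End Module
```

Note that the tail of a hard-split word ("rd") stays on its own line here because the next word doesn't fit after it; a proportional-font renderer would measure glyph widths instead of counting characters.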

AnthonyWJones
+1  A: 

I wrote a small test app to help me determine which characters cause a split. It's still a manual process, but at least the results are easier to inspect now. The script starts at character code 1, and so far I've gone up to 3000.

These cause a split after the character:

Private arrSplitAfter() As Char = {CChar(" "), CChar("-"), ChrW(9), CChar("!"), CChar("?"), CChar("%"), CChar(")"), CChar("/"), CChar("]"), CChar("|"), CChar("}"), ChrW(133), ChrW(162), ChrW(176), ChrW(1418), ChrW(1478), ChrW(1547), ChrW(1548), ChrW(1563), ChrW(1566), ChrW(1567), ChrW(1642), ChrW(1748), ChrW(2404), ChrW(2405)}

And these cause a split before the character:

Private arrSplitBefore() As Char = {CChar("$"), CChar("("), CChar("+"), CChar("["), CChar("\"), CChar("{"), ChrW(163), ChrW(164), ChrW(165), ChrW(177), ChrW(180), ChrW(712), ChrW(716), ChrW(2546), ChrW(2547), ChrW(2801)}
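As an illustration of how the two lists could be used (for example, to locate wrap points in a string), here is a minimal VB.NET sketch. `FindBreakOpportunities` is a hypothetical helper, the arrays are trimmed to the ASCII entries above, and because it only consults the observed lists it ignores the context-dependent pair rules of UAX #14, so it is an approximation rather than Silverlight's actual algorithm.

```vbnet
Imports System
Imports System.Collections.Generic

Module SplitSketch
    ' ASCII subsets of the empirically observed lists above; extend as needed.
    Private ReadOnly arrSplitAfter As Char() = {" "c, "-"c, ChrW(9), "!"c, "?"c, "%"c, ")"c, "/"c, "]"c, "|"c, "}"c}
    Private ReadOnly arrSplitBefore As Char() = {"$"c, "("c, "+"c, "["c, "\"c, "{"c}

    ' Hypothetical helper: returns every index i at which a new line could
    ' start, i.e. a split is allowed between line(i - 1) and line(i).
    Function FindBreakOpportunities(line As String) As List(Of Integer)
        Dim result As New List(Of Integer)()
        For i As Integer = 1 To line.Length - 1
            ' Split after the previous character, or before the current one.
            Dim afterPrev As Boolean = Array.IndexOf(arrSplitAfter, line(i - 1)) >= 0
            Dim beforeCur As Boolean = Array.IndexOf(arrSplitBefore, line(i)) >= 0
            If afterPrev OrElse beforeCur Then result.Add(i)
        Next
        Return result
    End Function

    Sub Main()
        Dim breaks As List(Of Integer) = FindBreakOpportunities("a-b(c)")
        ' Prints "2,3": after "-" and before "(".
        Console.WriteLine(String.Join(",", breaks.ConvertAll(Function(x) x.ToString()).ToArray()))
    End Sub
End Module
```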

Obviously there are many more characters to go. Unfortunately, I've hit a snag: I was trying to write very efficient highlighting code that works on word-wrapped text, and until I find a solution to my highlighting issues I won't bother continuing.

Steve Wortham