I've written a Windows program in Delphi that places and wraps text very precisely on both the screen and the printer using GetCharWidth and the em square. This has worked well with ANSI text, where you only need to retrieve and calculate the widths of 255 characters, but with Unicode and its 65535 characters it's too slow. The problem is made worse by having to build two width arrays, one for normal and one for bold.

  //Set up a reference DC for measuring purposes
  RefDC := CreateCompatibleDC ( FCanvas.Handle ) ;
  DPI := GetDeviceCaps ( RefDC , LOGPIXELSY ) ;

  //font height in pixels; must be set before any width is scaled
  PixelSize := abs ( FCanvas.Font.Size * DPI / 72 ) ;

  //find EmSquare (select the canvas font first so the metrics describe it
  //rather than the DC's default font)
  hold := SelectObject ( RefDC , FCanvas.Font.Handle ) ;
  GetOutlineTextMetrics ( RefDC , sizeof(otm) , @otm[0] ) ;
  EmSq := otm[0].otmEmSquare ;
  SelectObject ( RefDC , hold ) ;

  //calc NORMAL char sizes, using a font whose height equals the em square
  GetObject ( FCanvas.Font.Handle , SizeOf ( lf ) , @lf ) ;

  lf.lfHeight := -EmSq ;
  lf.lfWidth  := 0 ;
  lf.lfWeight := FW_NORMAL ;

  hf := CreateFontIndirect ( lf ) ;
  hold := SelectObject ( RefDC , hf ) ;

  GetCharWidth ( RefDC , 0 , $FFFF , nCharWidth ) ;
  for a := 0 to $FFFF do
    fCharWidth[a] := nCharWidth[a] * PixelSize / EmSq ;

  //calculate line height while the em-square font is still selected
  GetOutlineTextMetrics ( RefDC , sizeof(otm) , @otm[0] ) ;
  LineHt := round ( ( otm[0].otmTextMetrics.tmHeight +
                      otm[0].otmTextMetrics.tmExternalLeading ) *
                      PixelSize / EmSq ) ;

  SelectObject ( RefDC , hold ) ;
  DeleteObject ( hf ) ;

  //calculate BOLD char sizes
  lf.lfWeight := FW_BOLD ;
  hf := CreateFontIndirect ( lf ) ;
  hold := SelectObject ( RefDC , hf ) ;

  GetCharWidth ( RefDC , 0 , $FFFF , nCharWidth ) ;
  for a := 0 to $FFFF do
    fBoldWidth[a] := nCharWidth[a] * PixelSize / EmSq ;

  SelectObject ( RefDC , hold ) ;
  DeleteObject ( hf ) ;

  DeleteDC ( RefDC ) ;
A: 

Have you used a profiler to actually see where the bottleneck might be?
A common approach when lookup tables seem too costly to build dynamically is to build them once and store them, for instance as a resource.

François
There's about a 7-second initialization hit just in this routine. It could be optimized, but that probably wouldn't gain much. Calling it once would be acceptable, but I will actually need to call it multiple times for other text blocks that use a different font and/or text size.
Mitch
+1  A: 

Apart from questioning the premise of the question itself, I think you could cut the processing significantly by fetching the normal and bold widths into two separate nCharWidth arrays (with separate font objects, etc.) and then processing them together, reducing your two 0..65535 loops to a single loop.

Also, you can exclude the range $D800-$DFFF from your loops, since these code units can never represent characters on their own (they are the two halves of a surrogate pair, which your code doesn't appear to be designed to handle).
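
As a rough sketch of both suggestions, the scaling could look something like the following. It assumes the raw widths have already been fetched into two separate buffers, one per font; NormalW, BoldW and ScaleWidths are hypothetical names, while fCharWidth, fBoldWidth, PixelSize and EmSq are taken from the question.

```delphi
//Sketch only: NormalW and BoldW are hypothetical buffers, each filled by
//one GetCharWidth call (normal font and bold font respectively).
//All four arrays are assumed to have the same length.
procedure ScaleWidths ( const NormalW , BoldW : array of Integer ;
                        var fCharWidth , fBoldWidth : array of Double ;
                        PixelSize : Double ; EmSq : Integer ) ;
var
  a : Integer ;
  Scale : Double ;
begin
  Scale := PixelSize / EmSq ;  //hoisted out of the loop
  for a := 0 to High ( NormalW ) do
  begin
    //surrogate code units are never characters on their own, so skip them
    if ( a >= $D800 ) and ( a <= $DFFF ) then
      Continue ;
    fCharWidth[a] := NormalW[a] * Scale ;
    fBoldWidth[a] := BoldW[a] * Scale ;
  end ;
end ;
```

This only trims the scaling loop; if most of the time is spent inside GetCharWidth itself, the gain may be small.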

Deltics
+6  A: 

Calculating individual character widths and adding them up in Unicode is not only very slow, it's also wrong and will not work correctly. Unicode text can combine base characters with combining marks, sometimes in complex ways, so the width of a string is not simply the sum of the widths of its code units.

With Unicode, the correct way to do it is to pass your whole string to the Windows API function GetTextExtentExPoint along with the width of your line, and it will figure out how many characters will fit on the line for you.

There is an example of its use here.
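
In case the linked example is unavailable, here is a minimal sketch of such a call from Delphi. CharsThatFit is a hypothetical helper name, and the sketch assumes the font you want to measure with is already selected into DC.

```delphi
//requires Windows in the uses clause

//Returns how many leading characters of S fit within MaxWidth logical
//units on DC, measured with the font currently selected into DC.
function CharsThatFit ( DC : HDC ; const S : string ;
                        MaxWidth : Integer ) : Integer ;
var
  Fit : Integer ;
  Size : TSize ;
begin
  Result := 0 ;
  if S = '' then
    Exit ;
  Fit := 0 ;
  //the partial-extents array (6th parameter) is not needed here, so pass nil
  if GetTextExtentExPoint ( DC , PChar ( S ) , Length ( S ) , MaxWidth ,
                            @Fit , nil , Size ) then
    Result := Fit ;
end ;
```

The count it returns is a character count, not a word boundary, so a wrapping routine would still have to back up to the previous break opportunity. For mixed styles, one option is to call it once per run with that run's font selected and subtract each run's width from the space remaining on the line.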

lkessler
Thanks, this looks promising but I assume that like other Windows Text Extent functions it won't yield the same results when I scale the graphic up to printer resolution. In addition I'll need a way to break up sections with mixed styles (Bold, Italic, etc.). I'll give it a try.
Mitch
This is more than just promising. It is the way Windows does WYSIWYG text wrapping. It is the API everything calls: IE, Word, you name it.
lkessler
That's what I wanted to hear!
Mitch
+2  A: 

I take it that a typical session is unlikely to use thousands of different characters. If so, calculate only the widths of the first 128 characters up front and store a sentinel, for instance -1, for all the rest. When a lookup is made, test whether the width is -1, and only then calculate the width, height, etc. for that character.
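
A minimal sketch of that idea, for one font, might look like this. WidthCache, NOT_MEASURED, ResetWidthCache and CachedCharWidth are hypothetical names, and the sketch assumes the font being measured is selected into DC on every call.

```delphi
//requires Windows in the uses clause
const
  NOT_MEASURED = -1 ;  //sentinel: no real character width is negative

var
  WidthCache : array [ 0..$FFFF ] of Integer ;

//Call once per font: marks everything as unmeasured, then measures
//the first 128 characters in a single call
procedure ResetWidthCache ( DC : HDC ) ;
var
  a : Integer ;
begin
  for a := Low ( WidthCache ) to High ( WidthCache ) do
    WidthCache[a] := NOT_MEASURED ;
  //fills entries 0..127; if the call fails they are measured lazily later
  GetCharWidth ( DC , 0 , 127 , WidthCache ) ;
end ;

//Returns the width of code unit Ch, measuring it on first use only
function CachedCharWidth ( DC : HDC ; Ch : Word ) : Integer ;
var
  W : Integer ;
begin
  if WidthCache[Ch] = NOT_MEASURED then
  begin
    W := 0 ;
    if not GetCharWidth ( DC , Ch , Ch , W ) then
      W := 0 ;  //treat a failed measurement as zero width
    WidthCache[Ch] := W ;
  end ;
  Result := WidthCache[Ch] ;
end ;
```

Each character outside the first 128 costs one extra API call the first time it appears and a plain array lookup after that. A second cache would be needed for the bold font.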

Sertac Akyuz
Basically only calculate what you need to know the first time you need to know it.
Gerry
@Gerry - Can be.. But Mitch already said calculating for 255 characters is not a problem, so the ascii range should not pose a performance problem either.
Sertac Akyuz
I thought about only doing the ASCII range and then doing characters outside of the range only when needed (rarely) but that seems to penalize anyone using non-ASCII characters with a big performance hit.
Mitch
@Mitch - I'm a bit surprised to learn that getting the width etc. of just one character is a big performance hit, but anyway, so it is..
Sertac Akyuz
@Sertac - Anyone who goes outside the ASCII range is going to use lots of characters that haven't been pre-calculated. I haven't tested the speed of doing 1 character at a time, but I suspect it will be slow. And yes, I also considered scanning the text to see what characters are being used. I may come back to that if other solutions don't work.
Mitch