i have a function whose job is to convert an ADO Recordset into html:

class function RecordsetToHtml(const rs: _Recordset): WideString;

And the guts of the function involves a lot of wide string concatenation:

   while not rs.EOF do
   begin
      Result := Result+CRLF+
         '<TR>';

      for i := 0 to rs.Fields.Count-1 do
         Result := Result+'<TD>'+VarAsWideString(rs.Fields[i].Value)+'</TD>';

      Result := Result+'</TR>';
      rs.MoveNext;
   end;

With a few thousand results, the function takes what any user would feel is too long to run. The Delphi Sampling Profiler shows that 99.3% of the time is spent in widestring concatenation (@WStrCatN and @WStrCat).

Can anyone think of a way to improve widestring concatenation? i don't think Delphi 5 has any kind of string builder. And Format doesn't support Unicode.


And to make sure nobody tries to weasel out: pretend you are implementing the interface:

IRecordsetToHtml = interface(IUnknown)
    function RecordsetToHtml(const rs: _Recordset): WideString;
end;

Update One

I thought of using an IXMLDOMDocument, to build up the HTML as xml. But then i realized that the final HTML would be xhtml and not html - a subtle, but important, difference.

Update Two

Microsoft knowledge base article: How To Improve String Concatenation Performance

+1  A: 

Hi,

Yup, your algorithm is clearly in O(n^2).
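To make the quadratic cost concrete, here is a rough estimate. It assumes (my simplification, not something measured in this thread) that each of the m appends adds k characters and that every append copies the whole string built so far into a fresh allocation:

```latex
\text{characters copied} \;=\; \sum_{j=1}^{m} jk \;=\; k\,\frac{m(m+1)}{2}
\qquad\text{e.g. } m = 10\,000,\; k = 20:\quad
20 \cdot \frac{10\,000 \cdot 10\,001}{2} = 1\,000\,100\,000
```

Under those assumptions, roughly a billion characters get copied to produce a final string of only 200,000 characters.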

Instead of returning a string, try returning a TStringList, and replace your loop with

   while not rs.EOF do
   begin
      Result.Add('<TR>');

      for i := 0 to rs.Fields.Count-1 do
         Result.Add( '<TD>'+VarAsString(rs.Fields[i].Value)+'</TD>' );

      // Add returns the index of the new item, so don't assign it back to Result
      Result.Add('</TR>');
      rs.MoveNext;
   end;

You can then save your Result using TStringList.SaveToFile

LeGEC
Stringlist doesn't support WideStrings
Ian Boyd
A: 

WideString is not reference counted, so any assignment or modification means allocating and copying a string. If your content does not need Unicode, you can internally use the native string type (which is reference counted) to do the concatenation and then convert the result to a WideString. An example follows:

var
  NativeString: string;
begin
   // ...
   NativeString := '';

   while not rs.EOF do
   begin
     NativeString := NativeString + CRLF + '<TR>';

     for i := 0 to rs.Fields.Count-1 do
       NativeString := NativeString + '<TD>'+VarAsString(rs.Fields[i].Value) + '</TD>';

     NativeString := NativeString + '</TR>';
     rs.MoveNext;
   end;

   Result := WideString(NativeString);
end;

I have also seen another approach: encode the Unicode text as UTF8String (which is reference counted), concatenate, and finally convert the UTF8String to a WideString. But I am not sure whether two UTF8Strings can be concatenated directly. The time spent on encoding should also be considered.

Anyway, although WideString concatenation is much slower than native string operations, it is IMO still acceptable. Too much tuning of this kind of thing should be avoided. If you are seriously concerned about performance, you should upgrade your Delphi to at least 2009. In the long term, buying a tool costs less than doing heavy hacks on an old Delphi.

stanleyxu2005
i don't have a few tens of thousands of dollars at my disposal for all the costs of upgrading Delphi. (Also, i can't stand the IDE, or its performance. And we are already transitioning to a new IDE: Microsoft's free C#-based one.)
Ian Boyd
And the notion of "too much tuning" or "early optimization is the root of all evil" doesn't apply here. i couldn't show a screenshot at the time, but profiling pinpointed the exact bottleneck in something that even end users complained about. If it hadn't been so slow i wouldn't have cared. This wasn't some flight of fancy.
Ian Boyd
+1  A: 

WideStrings are inherently slow because they were implemented for COM compatibility and go through COM calls. If you look at the code, it keeps reallocating the string and calls SysAllocStringLen() and related APIs from oleaut32.dll. It doesn't use the Delphi memory manager; AFAIK it uses the COM memory manager. Because most HTML pages don't use UTF-16, you may get better results using the native Delphi string type and a string list, although you should be careful about the conversion between Unicode and the actual code page, and the conversion will degrade performance as well. Also, you're using a VarAsString() function that probably converts the variant to an AnsiString, which is then converted to a WideString. Check whether your version of Delphi has a VarAsWideString() or similar function to avoid that, or rely on Delphi's automatic conversion if you can be sure your variant will never be NULL.

ldsandon
`SysAllocString` isn't inherently slower than allocating memory from Delphi's internal heap.
Ian Boyd
Going through the COM APIs may IMHO be a little slower than in-process memory block allocations from the memory manager pool. I have no benchmarks, though.
ldsandon
+1  A: 

i found the best solution. The open source HtmlParser for Delphi has a helper TStringBuilder class. It is used internally to build what the author calls a TDomString, which is actually an alias of WideString:

TDomString = WideString;

With a little bit of fiddling with that class:

TStringBuilder = class
public
   constructor Create(ACapacity: Integer);
   function EndWithWhiteSpace: Boolean;
   function TailMatch(const Tail: WideString): Boolean;
   function ToString: WideString;
   procedure AppendText(const TextStr: WideString);
   procedure Append(const value: WideString);
   procedure AppendLine(const value: WideString);
   property Length: Integer read FLength;
end;

The guts of the routine becomes:

while not rs.EOF do
begin
   sb.Append('<TR>');

   for i := 0 to rs.Fields.Count-1 do
      sb.Append('<TD>'+VarAsWideString(rs.Fields[i].Value));

   sb.AppendLine('</TR>');

   rs.MoveNext;
end;

The code then feels like it runs infinitely faster. Profiling shows much improvement; the WideString manipulation and length-counting became negligible. In their place were FastMM's own internal operations.

Notes

  1. Nice catch on the mistaken forcing of all strings into current code-page (VarAsString rather than VarAsWideString)
  2. Some HTML closing tags are optional; i omitted the ones that logically make no sense.
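For anyone who can't pull in HtmlParser, here is a minimal sketch of the idea behind such a builder. This is not the HtmlParser implementation: TWideBuilder and AsWideString are hypothetical names made up for illustration, and unlike the class above it keeps its buffer in a WideString, so it still allocates through oleaut32. It just does so only when the capacity has to double, rather than on every append.

```pascal
type
  // Hypothetical class for illustration only; not HtmlParser's TStringBuilder.
  TWideBuilder = class
  private
    FBuffer: WideString;  // over-allocated storage
    FLength: Integer;     // number of characters actually in use
  public
    constructor Create(ACapacity: Integer);
    procedure Append(const Value: WideString);
    function AsWideString: WideString;
  end;

constructor TWideBuilder.Create(ACapacity: Integer);
begin
  inherited Create;
  if ACapacity < 16 then
    ACapacity := 16;
  SetLength(FBuffer, ACapacity);
  FLength := 0;
end;

procedure TWideBuilder.Append(const Value: WideString);
var
  Len, NewCapacity: Integer;
begin
  Len := Length(Value);
  if Len = 0 then
    Exit;

  // Grow geometrically, so reallocations happen only a logarithmic
  // number of times instead of once per append.
  if FLength + Len > Length(FBuffer) then
  begin
    NewCapacity := Length(FBuffer) * 2;
    if NewCapacity < FLength + Len then
      NewCapacity := FLength + Len;
    SetLength(FBuffer, NewCapacity);  // keeps the existing characters
  end;

  // Move counts bytes, not characters.
  Move(Value[1], FBuffer[FLength + 1], Len * SizeOf(WideChar));
  Inc(FLength, Len);
end;

function TWideBuilder.AsWideString: WideString;
begin
  // Return only the part of the buffer that is in use.
  Result := Copy(FBuffer, 1, FLength);
end;
```

Usage would look like the loop above: create it once with a generous capacity, call Append for each piece, read AsWideString at the end, and free the builder in a try..finally.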
Ian Boyd
+1  A: 

I'm unable to spend the time right now to give you the exact code.

But I think the fastest thing you can do is:

  1. Loop through all the strings and total their lengths, adding the length of the extra table tags you'll need.

  2. Use SetString (or SetLength) to allocate one string of the proper length.

  3. Loop through all the strings again and use the "Move" procedure to copy each string to its proper place in the final string.

The key thing is that many concatenations to a string take longer and longer because of the constant allocating and freeing of memory. A single allocation will be your biggest timesaver.
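A rough sketch of those three steps, assuming the pieces (tags and field values) have already been collected into an array. ConcatWide is a hypothetical helper name, not an RTL function, and this version uses SetLength for the single allocation:

```pascal
// Hypothetical helper for illustration: joins all pieces with one allocation.
function ConcatWide(const Parts: array of WideString): WideString;
var
  i, Total, Len: Integer;
  Dest: PWideChar;
begin
  // Step 1: total the lengths (in characters).
  Total := 0;
  for i := Low(Parts) to High(Parts) do
    Inc(Total, Length(Parts[i]));

  // Step 2: allocate the final string once.
  SetLength(Result, Total);
  if Total = 0 then
    Exit;

  // Step 3: copy each piece into place.
  Dest := PWideChar(Result);
  for i := Low(Parts) to High(Parts) do
  begin
    Len := Length(Parts[i]);
    if Len > 0 then
    begin
      // Move counts bytes, so scale by the character size.
      Move(Parts[i][1], Dest^, Len * SizeOf(WideChar));
      Inc(Dest, Len);  // typed pointer: advances by Len characters
    end;
  end;
end;
```

For example, ConcatWide(['<TR>', '<TD>', 'abc', '</TD>', '</TR>']) builds one row. With a recordset you would either have to walk it twice or buffer the field values first, which is the price of knowing the final length up front.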

lkessler
The ideal solution, a *string builder*, uses these concepts.
Ian Boyd