I'm hoping for some pointers on the quickest way to do a replace when the document that needs bits replacing is constant (a sort of mail-merge scenario).

Of course there are lots of ways of doing replaces using string.Replace and regular expressions, but it looks like they need to parse the input document each time to find the match. That's the bit I'm trying to optimise.

+2  A: 

I'd say your best bet would probably be to split the document into an array with each element being the text that's in between the previous replacement and the next. Then instead of replacing, you simply interleave the contents of your split array with each of the replacement tokens using string concatenation.

Some pseudocode:

doc_array = split(input_doc, "token marker")

for each replace_array in set_of_replace_arrays:
    this_doc = ""

    while elements remain in doc array:
        this_doc.concat(next doc element)

        if any elements remain in replace array:
            this_doc.concat(next replace element)

    output this_doc
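
For readers who want something concrete, here is a minimal C# sketch of the pseudocode above. It is only an illustration under stated assumptions: the class name `TemplateMerge`, the method names, and the `{}` token marker are all hypothetical, and it uses `StringBuilder` for the concatenation step.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TemplateMerge
{
    // Split the constant document once, up front.
    // The marker (e.g. "{}") is a hypothetical placeholder token.
    public static string[] Prepare(string template, string marker)
    {
        return template.Split(new[] { marker }, StringSplitOptions.None);
    }

    // Interleave the fixed pieces with one set of replacement values.
    // No searching happens here; the template was parsed in Prepare.
    public static string Render(string[] pieces, IList<string> values)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < pieces.Length; i++)
        {
            sb.Append(pieces[i]);
            if (i < values.Count)
            {
                sb.Append(values[i]);
            }
        }
        return sb.ToString();
    }
}
```

Usage would look like `var pieces = TemplateMerge.Prepare("Dear {} {},", "{}");` once, followed by `TemplateMerge.Render(pieces, new[] { "Mr.", "Farias" })` for each recipient.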
Amber
Well, maybe replace "string concatenation" with "StringBuilder" ;-p
Marc Gravell
Right. I use the term string concatenation in a loose sense, in that in some way or another the result is going to be a concatenation of the substrings. The pseudocode uses concats simply because it's more language-agnostic that way, but the C#-specific StringBuilder would be more efficient, or the equivalent in other languages if available.
Amber
Thanks, that got me thinking along the right lines. The fastest way I could do this was to create a fixed array of strings (string[] s = new string[10]), populate the fixed sections (s[1] = "Dear Mr. "), then loop through our variables, swapping them in at the correct points in the array (e.g. s[2] = firstName), and finally Join the whole thing up (return string.Join(string.Empty, s)). Superfast! The StringBuilder was a non-starter because in C# you can't overwrite the various strings. What I didn't want to do is rebuild the whole string each time. Thanks.
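
A rough sketch of the array-plus-Join approach described in this comment might look like the following. The class name `LetterTemplate`, the slot layout, and the letter text are assumptions made for illustration, not the commenter's actual code.

```csharp
using System;

public sealed class LetterTemplate
{
    // Fixed text is stored once; the null entries are slots that
    // get overwritten with the variable values on every render.
    private readonly string[] parts =
    {
        "Dear ", null, " ", null, ", your order has shipped."
    };

    public string Render(string title, string lastName)
    {
        parts[1] = title;     // swap in the first variable
        parts[3] = lastName;  // swap in the second variable
        return string.Join(string.Empty, parts);
    }
}
```

Because the array is reused between calls, an instance like this is not safe to share across threads without extra care.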
Ali Starfish
A: 

Well, as you don't want to parse and your input document is constant, you could use a MemoryStream to handle your original document and change your bits by using their absolute position.

Another way could be to use String.Format markers as placeholders:

string input = "Dear {0} {1}";
//...
return String.Format(input, "Mr.", "Farias");
Rubens Farias
I think the MemoryStream idea would only work if your variables were always the same length? The string.Format is very clean, but I wonder how optimized it is? I suspect that each time it is run, it has to search the entire text for the {0} and the {1} bits, which is what I was trying to avoid.
Ali Starfish
MemoryStream: yes, same length, but you can do something like `{0 }` (note the whitespace); with any technique you'll probably need to parse the input template somehow. I recommend creating some performance tests so you can compare each approach fairly.
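
The padded-placeholder idea could be sketched roughly as below. Note that this swaps the MemoryStream for a plain `char[]` buffer to keep the example short; the class name `FixedWidthTemplate` and its members are hypothetical, and it handles only a single placeholder.

```csharp
using System;

public sealed class FixedWidthTemplate
{
    private readonly char[] buffer;  // working copy of the constant document
    private readonly int offset;     // absolute position of the placeholder
    private readonly int width;      // padded width reserved for the value

    public FixedWidthTemplate(string template, string placeholder)
    {
        // The template is searched exactly once, here.
        offset = template.IndexOf(placeholder, StringComparison.Ordinal);
        if (offset < 0)
            throw new ArgumentException("Placeholder not found.", nameof(placeholder));
        width = placeholder.Length;
        buffer = template.ToCharArray();
    }

    public string Render(string value)
    {
        if (value.Length > width)
            throw new ArgumentException("Value is wider than the placeholder.", nameof(value));

        // Overwrite the reserved region in place, then pad with spaces.
        value.CopyTo(0, buffer, offset, value.Length);
        for (int i = offset + value.Length; i < offset + width; i++)
        {
            buffer[i] = ' ';
        }
        return new string(buffer);
    }
}
```

The trade-off is visible in the output: values shorter than the reserved width leave trailing spaces behind, and longer values are rejected.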
Rubens Farias
A: 

For increased flexibility, you could use an XslCompiledTransform and have it output text. It's optimized for fast XML and text generation, and you could include some logic too if required.
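
As a rough illustration, a compiled stylesheet that emits plain text could be set up along these lines. The stylesheet, the `<person>` input shape, and the `XsltMerge` helper are all assumptions invented for this sketch; only `XslCompiledTransform`, `XmlReader`, and the standard reader/writer classes come from the framework.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltMerge
{
    // Hypothetical stylesheet: turns <person> input into a greeting as text.
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/person'>" +
        "<xsl:text>Dear </xsl:text><xsl:value-of select='title'/>" +
        "<xsl:text> </xsl:text><xsl:value-of select='name'/>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Compile once and reuse the result for every document.
    public static XslCompiledTransform Compile()
    {
        var transform = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(reader);
        }
        return transform;
    }

    // Apply the compiled transform to one XML record and return the text.
    public static string Render(XslCompiledTransform transform, string personXml)
    {
        using (var input = XmlReader.Create(new StringReader(personXml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

The compile step is the expensive part, so the transform returned by `Compile()` would normally be kept around and passed to `Render` for each recipient.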

Lucero