Hi,

In my .NET program I allow a user to define "fields", which are values calculated by the business logic. Each field has a position and a length, so that they can all be inserted into a single output string at a given index. I also allow the user to specify the default content of this output string. If no field is defined to replace a given position, the default character is output instead.

My question is, how can I do this efficiently? The StringBuilder class has an Insert(int index, string value) method, but this lengthens the output string each time rather than overwriting it. Am I going to have to set each char one at a time using the StringBuilder[int index] indexer, and is this inefficient? Since I am going to be doing this a lot of times I would like it to be as fast as possible.

Thanks.

+1  A: 

Since strings are immutable, every manipulation of one adds GC load, and even StringBuilder Insert/Remove calls have a cost. I would cut the source string at the insertion points and then "zip" the pieces with the data that needs to be inserted. After that you can just concatenate the strings in the list to get the resulting string.

Here is some sample code that does the split/zip operations. It assumes that fields are defined as a tuple of (position, length, value).

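using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Field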
{
    public int pos { get; set; }
    public int len { get; set; }
    public string value { get; set; }
    public string tag { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var source = "Your order price [price] and qty [qty].";
        var fields = new List<Field>();
        fields.Add(new Field()
        {
            pos = 17,   // index of the '[' that opens "[price]"
            len = 7,    // length of "[price]"
            value = "15.99$",
            tag = "price"
        });
        fields.Add(new Field()
        {
            pos = 33,   // index of the '[' that opens "[qty]"
            len = 5,    // length of "[qty]"
            value = "7",
            tag = "qty"
        });
        Console.WriteLine(Zip(Split(source, fields), fields));
        Console.WriteLine(ReplaceRegex(source, fields));

    }

    static IEnumerable<string> Split(string source, IEnumerable<Field> fields)
    {
        var index = 0;
        foreach (var field in fields.OrderBy(q => q.pos))
        {
            yield return source.Substring(index, field.pos - index);
            index = field.pos + field.len;
        }
        yield return source.Substring(index, source.Length - index);
    }
    static string Zip(IEnumerable<string> splitted, IEnumerable<Field> fields)
    {
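        // Sort the fields by position so each one pairs with the segment Split produced for it.
        var items = splitted.Zip(fields.OrderBy(q => q.pos), (l, r) => new string[] { l, r.value }).SelectMany(q => q).ToList();
        items.Add(splitted.Last());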
        return string.Concat(items);
    }
    static string ReplaceRegex(string source, IEnumerable<Field> fields)
    {
        var fieldsDict = fields.ToDictionary(q => q.tag);
        var re = new Regex(@"\[(\w+)\]");
        return re.Replace(source, new MatchEvaluator((m) => fieldsDict[m.Groups[1].Value].value));
    }
}

BTW, wouldn't it be better to replace special user markers, like [price] and [qty], using a regex?

Valera Kolupaev
StringBuilders, unlike normal strings, are not immutable.
Alex Zylman
But they operate on an array internally, and an insertion in the middle of the array will cause memory to be moved or reallocated.
Valera Kolupaev
@Valera : +1 for an interesting solution. But it would be too much work to change my existing code to work like this, especially as I have a tight deadline.
James
+2  A: 

The StringBuilder class lets you build a mutable string. Try using the Remove method before doing the Insert. Since it's randomly accessible, it should be very quick. As long as the StringBuilder keeps the same capacity, it won't be taking time copying strings around in memory. If you know the string will become longer, try setting the capacity to be larger when you call new StringBuilder().
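A minimal sketch of the Remove-then-Insert idea follows. The `Overwrite` helper and its padding rule are assumptions made for illustration, not part of StringBuilder: the value is truncated or space-padded to the field length so the overall length of the output never changes. Both calls still shift the characters that follow the field.

```csharp
using System;
using System.Text;

public static class RemoveInsertDemo
{
    // Hypothetical helper: replaces `length` characters at `index` with `value`,
    // truncating or right-padding `value` so the builder keeps its length.
    public static StringBuilder Overwrite(StringBuilder sb, int index, int length, string value)
    {
        string fitted = value.Length > length
            ? value.Substring(0, length)
            : value.PadRight(length);

        sb.Remove(index, length);   // shifts the tail of the buffer left
        sb.Insert(index, fitted);   // shifts the tail right again
        return sb;
    }

    public static void Main()
    {
        var sb = new StringBuilder("AAAAAAAAAA");
        Overwrite(sb, 2, 4, "xy");
        Console.WriteLine(sb.ToString()); // prints "AAxy  AAAA"
    }
}
```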

Justin
Using `Remove` and `Insert` *will* involve moving memory around.
LukeH
Also, if a field is defined as a (position, length) tuple, you need to do some math when the inserted text is shorter or longer than the text it replaces.
Valera Kolupaev
+1 Valera, thanks. I didn't even think to mention that.
Justin
+1 I had not thought to use the Remove method in conjunction with the Insert method. It is at least an option. Thanks.
James
A: 

I would recommend using the StringBuilder class. You can do it with a string instead, but there can be side effects. Here are a couple of blog posts that show how to manipulate strings in place and the possible side effects.

http://philosopherdeveloper.wordpress.com/2010/05/28/are-strings-really-immutable-in-net/

http://philosopherdeveloper.wordpress.com/2010/06/13/string-manipulation-in-net-epilogue-plus-new-theme/

Jerod Houghtelling
A: 

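If your string is already preformatted to the right length, then the StringBuilder class has

public StringBuilder Replace(string oldValue, string newValue, int startIndex, int count)

Just set startIndex to the field's position and count to the field's length. Note that count is the length of the range that is searched, not the number of replacements, so only occurrences inside that range are replaced.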
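As a rough sketch of that overload: `startIndex` and `count` delimit the part of the builder that is searched, so limiting the range to one field leaves identical placeholder text elsewhere untouched. The `ReplaceField` wrapper and the sample template are assumptions for illustration.

```csharp
using System;
using System.Text;

public static class RangeReplaceDemo
{
    // Hypothetical wrapper: replaces `oldValue` only inside the field that
    // starts at `position` and is `length` characters long.
    public static StringBuilder ReplaceField(StringBuilder sb, int position, int length,
                                             string oldValue, string newValue)
    {
        return sb.Replace(oldValue, newValue, position, length);
    }

    public static void Main()
    {
        var sb = new StringBuilder("____-____");
        ReplaceField(sb, 0, 4, "____", "2024");
        Console.WriteLine(sb.ToString()); // prints "2024-____"
    }
}
```

This only works when the current content of the field is known, and the length stays fixed only if the new value is as long as the old one.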

Another thing you could do is use String.Format(). Convert all your predefined fields into indexes so you get a string like "This {0} is very {1}", then match up the parameters to the specific indexes and call String.Format(myString, myParams);
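A small sketch of the String.Format approach, assuming the template has already been converted to indexed placeholders. The `Render` helper is hypothetical. The second call uses the alignment component of a format item (`{0,-6}` pads to six characters on the right, `{1,4}` pads to four on the left), which helps with fixed-width fields but does not truncate values that are too long.

```csharp
using System;

public static class FormatDemo
{
    // Hypothetical helper: fills an indexed template with the given values.
    public static string Render(string template, params object[] values)
    {
        return string.Format(template, values);
    }

    public static void Main()
    {
        Console.WriteLine(Render("This {0} is very {1}", "answer", "short"));
        // prints "This answer is very short"

        Console.WriteLine(Render("[{0,-6}][{1,4}]", "abc", 42));
        // prints "[abc   ][  42]"
    }
}
```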

-Raul

HaxElit
+1  A: 

Doing it one character at a time is likely your best bet. I say this because calling Insert and Remove on a StringBuilder results in characters being shifted right/left, just as the analogous methods would in any mutable indexed collection such as a List<char>.

That said, this is an excellent candidate for an extension method to make your life a bit easier.

public static StringBuilder ReplaceSubstring(this StringBuilder stringBuilder, int index, string replacement)
{
    if (index + replacement.Length > stringBuilder.Length)
    {
        // You could throw an exception here, or you could just
        // append to the end of the StringBuilder -- up to you.
        throw new ArgumentOutOfRangeException();
    }

    for (int i = 0; i < replacement.Length; ++i)
    {
        stringBuilder[index + i] = replacement[i];
    }

    return stringBuilder;
}

Usage example:

var builder = new StringBuilder("My name is Dan.");
builder.ReplaceSubstring(11, "Bob");

Console.WriteLine(builder.ToString());

Output:

My name is Bob.
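
A related variant, swapped in here as an alternative to the indexer loop rather than something from the answer itself, is to keep the default content in a plain char[] and copy each field over it with String.CopyTo. The `Write` helper and the sample template are assumptions for illustration.

```csharp
using System;

public static class FixedWidthLine
{
    // Hypothetical helper: copies `value` over the defaults at `position`,
    // writing at most `length` characters so neighbouring fields stay intact.
    public static void Write(char[] buffer, int position, int length, string value)
    {
        int count = Math.Min(length, value.Length);
        value.CopyTo(0, buffer, position, count);
    }

    public static void Main()
    {
        // The template holds the user's default characters.
        char[] buffer = "NAME:.......... QTY:....".ToCharArray();
        Write(buffer, 5, 10, "Widget");
        Write(buffer, 20, 4, "7");
        Console.WriteLine(new string(buffer)); // prints "NAME:Widget.... QTY:7..."
    }
}
```

Positions that no field covers keep their default character, which matches the behaviour asked for in the question.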
Dan Tao
A: 

If replacing substrings is going to be a big bottleneck, you may want to ditch the substrings thing altogether. Instead, break up your data into strings that can be independently modified. Something like the following:

class DataLine
{
    public string Field1;
    public string Field2;
    public string Field3;

    public string OutputDataLine()
    {
        return Field1 + Field2 + Field3;
    }
}

That's a simple static example, but I'm sure that could be made more generic so that if every user defines fields differently you could handle it. After breaking your data into fields, if you still need to modify individual characters in the fields at least you're not touching the whole set of data.

Now, this may push the bottleneck to the OutputDataLine function, depending on what you're doing with the data. But that can be handled separately if necessary.
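
One way the more generic version might look, as a sketch only: the fields live in a list, each can be replaced independently, and the line is assembled on demand. The `GenericDataLine` class and its member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class GenericDataLine
{
    private readonly List<string> fields = new List<string>();

    // Adds a field with its default content and returns its index.
    public int AddField(string defaultContent)
    {
        fields.Add(defaultContent);
        return fields.Count - 1;
    }

    // Replaces one field without touching the others.
    public void SetField(int index, string value)
    {
        fields[index] = value;
    }

    // Joins all fields into the output line.
    public string OutputDataLine()
    {
        return string.Concat(fields);
    }
}

public static class GenericDataLineDemo
{
    public static void Main()
    {
        var line = new GenericDataLine();
        line.AddField("AAA");
        int middle = line.AddField("BBB");
        line.AddField("CCC");

        line.SetField(middle, "xyz");
        Console.WriteLine(line.OutputDataLine()); // prints "AAAxyzCCC"
    }
}
```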

Joe