views:

366

answers:

5

Say I have an array of Strings declared like this:

String[] strings = new String[ 1024 ];

And given an array of key value pairs declared like this:

KeyValuePair<String, String>[] pairs = new KeyValuePair<String, String>[ 50 ];

What are the possible workarounds for a solution that would do this:

for ( int i = 0; i < 1024; ++i )
{
  foreach ( var kv in pairs )
  {
      strings[ i ] = strings[ i ].Replace( kv.Key, kv.Value );
  }
}

This code is only arbitrary and just for showing what the actual problem is. Given a lot of key values, which are known at compile time, how can I do an efficient String.Replace, and possibly reduce the method calls and copying the String around ( each call to String.Replace will produce a new immutable String, not very efficient when having a lot of those replace calls )?

+2  A: 

I'd say dump the strings into a List and then with each of the stringbuilders perform the replace calls. That will save on the creation of extra immutable string objects. Most likely a little overhead if you're only going to have a small number of replacements, but since you've stated there will be a lot of them then this should help.

Agent_9191
StringBuilder is good suggestion. I'll benchmark it against regex and see which comes better. I'll let you know tomorrow.
Ivan Zlatanov
+2  A: 

It doesn't help with the number of loops, but if you use a StringBuilder as an intermediate it has a .Replace call with the same set of parameter signatures.

Edit:

Not sure if it's faster, but you can use Regex.Replace with an evaluator delegate.

If you build a search regex with your keys: (key1|key2|key3|key4...)

and then pass in the delegate to .Replace, you can return a lookup based on the Match's Value property.

  public string ReplaceData(Match m)
  {
      return pairs[m.Value];         
  }

...

  pairs.Add("foo","bar");
  pairs.Add("homer","simpson");
  Regex r = new Regex("(?>foo|homer)");
  MatchEvaluator myEval = new MatchEvaluator(class.ReplaceData);
  string sOutput = r.Replace(sInput, myEval);
Joe
You could speed this up further by using an atomic group: `(?>foo|homer)` and then by making sure the keys are in alphabetical order.
Tinister
@Tinister - yes, thanks. Edited.
Joe
I'll test this and the StringBuilder suggestion and see which performs better. I'll let you know tomorrow.
Ivan Zlatanov
A: 

This should do less string manipulation by selecting the values

 String[] strings = new String[1024];
 KeyValuePair<String, String>[] pairs = new KeyValuePair<String, String>[ 50 ];

 String[] replaced = strings.Select(x => 
                                         pairs.Any( y => y.Key == x ) ? 
                                         pairs.Where( z => z.Key == x).Select( val =>  val.Value).First() :  x )
                            .ToArray();
Stan R.
A: 

I think what Joe and Agent_9191 are suggesting is essentially the same:

const int NUM_STRINGS = 1024;
string[] strings = new string[NUM_STRINGS];
StringBuilder[] stringBuilders = new StringBuilder[NUM_STRINGS];

// ensure that all StringBuilder objects are initialized
for (int i = 0; i < NUM_STRINGS; i++) {
    stringBuilders[i] = new StringBuilder(strings[i]);
}

KeyValuePair<string, string>[] pairs = new KeyValuePair<string, string>[50];

for (int i = 0; i < NUM_STRINGS; i++) {
    foreach (var kv in pairs) {
         // StringBuilder is mutable;
         // this actually modifies the instance
         stringBuilders[i].Replace(kv.Key, kv.Value);
    }

    strings[i] = stringBuilders[i].ToString();
}
Dan Tao
+1  A: 

You can use the advices received (use StringBuilder for example) and with Parallel extensions, use how many cores you have in your machine to do the work in parallel.

Look at this code:

class Program {

    static void Main(String[] args) {

        // Filling the data
        List<KeyValuePair<String, String>> map = new List<KeyValuePair<String, String>>();
        List<StringBuilder> strings = new List<StringBuilder>();
        List<StringBuilder> strings2 = new List<StringBuilder>();

        for (Int32 i = 0; i < 50; i++) {
            String key = String.Format("[KEY{0}]", i);
            String value = String.Format("Text of KEY{0}", i);
            KeyValuePair<String, String> keyValuePair = new KeyValuePair<String, String>(key, value);
            map.Add(keyValuePair);
        }

        for (Int32 i = 0; i < 1024; i++) {
            StringBuilder text = new StringBuilder();

            foreach (KeyValuePair<String, String> keyValuePair in map) {
                text.AppendFormat("Some text before - {0} - Some text after.", keyValuePair.Key);
                text.AppendLine();
            }

            strings.Add(text);
            strings2.Add(text);
        }


        // Measuring the normal loop
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        foreach (StringBuilder text in strings) {
            foreach (KeyValuePair<String, String> eachMap in map) {
                text.Replace(eachMap.Key, eachMap.Value);
            }
        }

        stopwatch.Stop();

        Console.WriteLine("Time with normal loop: {0}", stopwatch.Elapsed);


        // Measuring the parallel loop
        stopwatch.Reset();
        stopwatch.Start();

        Parallel.ForEach(strings2, text => {
            foreach (KeyValuePair<String, String> eachMap in map) {
                text.Replace(eachMap.Key, eachMap.Value);
            }
        });

        stopwatch.Stop();

        Console.WriteLine("Time with parallel: {0}", stopwatch.Elapsed);
        Console.ReadLine();
    }
}

And look at some measures running on my nootebook (AMD Turion64 X2 - 2 cores):



Time with normal loop: 00:00:03.5956428
Time with parallel: 00:00:01.8707367

Time with normal loop: 00:00:02.1467821
Time with parallel: 00:00:01.4627365

Time with normal loop: 00:00:03.4123084
Time with parallel: 00:00:01.6704408



Hope this helps.

Ricardo Lacerda Castelo Branco

Ricardo Lacerda Castelo Branco