using System;
using System.IO;
using System.Reflection;
using System.Text.RegularExpressions;

namespace regex
{
  class MainClass
  {
    public static void Main(string[] args)
    {
      Regex exp = new Regex(@"e(-)?m[a@]il(s)?|input|output|padr(ão|ões)|máquina(s)?|reconhecimento",
                            RegexOptions.IgnoreCase | RegexOptions.Compiled |
                            RegexOptions.Multiline  | RegexOptions.ExplicitCapture);

      for (int filecount = 0 ; filecount < 22 ; filecount++)
      {
        string file = "/home/files/file"+ string.Format("{0:0#}",filecount) + ".txt";
        StreamReader reader = new StreamReader(file);

        string text = reader.ReadToEnd();
        int c=0;

        MatchCollection matchList = exp.Matches(text);
        c = matchList.Count;

        Console.WriteLine("Reading " + file + " -> " + c + " matches");
      }
    }
  }
}

If I comment out the line

c = matchList.Count;

it is pretty fast. But I need to know the number of matches it has found.

Is this the fastest way to do this? For the group of files I have, it takes 14 seconds to parse all of them. Perl takes 1 second to output exactly the same information.

PS: Each text file is about 1 MB, so there is roughly 20 MB to process.

Thanks ;)

+2  A: 

You could use BackgroundWorker to search the files in parallel. You will have to keep track of each worker's count and aggregate the counts at the end. You could have one BackgroundWorker per file or per group of files. .NET Framework 4.0 will make this code easier, as it has parallel data structures.
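A minimal sketch of the parallel approach, assuming .NET Framework 4.0 or later and using Parallel.For in place of hand-managed BackgroundWorker instances. The class name ParallelMatchCounter and the method CountAll are made up for illustration; the pattern and options are copied from the question. Whether this narrows the gap to Perl depends on how many cores are available, so treat it as something to measure rather than a guaranteed fix.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original question.
static class ParallelMatchCounter
{
    // Same pattern and options as in the question. A Regex instance is
    // thread-safe for matching, so all workers can share this one.
    static readonly Regex Exp = new Regex(
        @"e(-)?m[a@]il(s)?|input|output|padr(ão|ões)|máquina(s)?|reconhecimento",
        RegexOptions.IgnoreCase | RegexOptions.Compiled |
        RegexOptions.Multiline | RegexOptions.ExplicitCapture);

    // Counts the matches in every file in parallel and returns the grand total.
    public static long CountAll(string[] files)
    {
        long total = 0;

        Parallel.For(0, files.Length, i =>
        {
            string text = File.ReadAllText(files[i]);
            int c = Exp.Matches(text).Count;

            Console.WriteLine("Reading " + files[i] + " -> " + c + " matches");

            // Aggregate atomically, since several iterations run at once.
            Interlocked.Add(ref total, c);
        });

        return total;
    }
}
```

The per-file lines are printed in whatever order the iterations finish, so they will not necessarily come out in file order.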

Adam Fyles
-1: foosnazzy -- add code example (per stackoverflow guidelines) and I will give you back +1.
This sounds suspiciously like you're holding his +1 for ransom. Deposit the code sample behind the 3rd pillar from the south-east side of the bridge. Come alone.
Dinah
+1 Dinah! I entirely agree.
jrista
A: 

One thing that might be working against you here is that you are leaving your files open, which adds some unnecessary overhead.

Be sure to call reader.Close(); after calling ReadToEnd().

Mitchel Sellers
I tried that, but it didn't make a difference ;) thanks anyway
Ricardo Mestre
A: 

Using

StreamReader reader = new StreamReader(file);

on its own is risky: nothing closes the file handle until the garbage collector finalizes the reader.

Use:

using (StreamReader reader = new StreamReader(file))

to be sure your file handles are closed as soon as the block ends.
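As a sketch of how that could look in the question's loop body, the read can be wrapped in a small helper. The names FileMatchCounter and CountMatches are hypothetical; only StreamReader, ReadToEnd and Regex.Matches come from the original code. Note that the asker reported that closing the reader made no difference to the timing, so this is about releasing handles promptly, not about speed.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
static class FileMatchCounter
{
    // Reads the whole file and returns how many times exp matches in it.
    public static int CountMatches(Regex exp, string file)
    {
        string text;

        using (StreamReader reader = new StreamReader(file))
        {
            text = reader.ReadToEnd();
        } // reader is disposed here, even if ReadToEnd throws

        return exp.Matches(text).Count;
    }
}
```

In the original loop this would be called as `int c = FileMatchCounter.CountMatches(exp, file);`.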

Carra
A: 

Intriguing. Jeff to the rescue?

jeroenh