
When a Regex instance is used in a method that is called a couple of thousand times to parse input in a certain way, should the method create the Regex itself, or should the Regex instance be a static member of the class?

I get the feeling that constructing the same Regex thousands of times might add overhead, but I am mainly concerned about best practice.

Where should I declare and define the Regex?

EDIT: Pseudo code:

static Regex regex ...

IEnumerable<string> Parse (string str)
{
    var matches = regex.Matches(str); // use the shared regex
    foreach (var match in matches)
    {
        ...
    }
}

void Main()
{
    foreach (var page in pages)
    {
         Parse (page); ...
    }
}
+1  A: 

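If you're calling the regex thousands of times in a loop, the static Regex methods (Regex.Match, Regex.Matches, Regex.IsMatch and so on) implement LRU-based caching for you.

I'd just rely on that, unless your method gets called thousands of times sporadically over the life of your application, in which case you'd probably be better off putting a static reference in your class, since other patterns used between calls could evict yours from the cache (its size is Regex.CacheSize, 15 entries by default). It depends on your particular use case.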

see: http://blogs.msdn.com/bclteam/archive/2006/10/19/regex-class-caching-changes-between-net-framework-1-1-and-net-framework-2-0-josh-free.aspx
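A minimal C# sketch of relying on the built-in cache, assuming .NET 2.0 or later, where only the static Regex methods consult the cache. The class name CachedParser and the \w+ pattern are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch: relies on the Regex class's internal cache for static methods.
// CachedParser and WordPattern are hypothetical names for illustration.
public static class CachedParser
{
    // Hypothetical pattern: runs of word characters.
    private const string WordPattern = @"\w+";

    public static IEnumerable<string> Parse(string str)
    {
        // Regex.Matches(string, string) is a static method, so the parsed
        // pattern is looked up in (and added to) Regex's internal cache
        // instead of being rebuilt on every call.
        foreach (Match match in Regex.Matches(str, WordPattern))
        {
            yield return match.Value;
        }
    }
}
```

Calling CachedParser.Parse(page) inside the question's foreach loop reuses the cached pattern after the first call. If an application uses many different patterns, raising Regex.CacheSize keeps more of them cached.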

Jimmy
Thanks, I added some sample code to show how it looks.
Joan Venge
The automatic cache is nice, but I like the idea of a private static field to indicate "this regex is used an awful lot, let's keep it around".
clintp
+2  A: 

If you're concerned about that kind of thing, the class that wraps your parsing functionality should hold a private reference to the regex (possibly static, if the class is used statically).

EDIT:

To me, it's not really about performance. Given the internal caching and all that jazz that Jimmy mentioned, I'd imagine creating a regex is probably not as expensive as the actual regex processing. It's more about design principles: the factory method or parsing utility conceptually uses some internal filter (the regex) to generate a list for you. If it's the same filter used over and over, that's conceptually something you create once, keep around, and reuse.
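A sketch of this design, where the regex is the parser's internal filter, created once and then reused. PageParser is a hypothetical name, and the pattern is supplied by the caller rather than assumed.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch of "create the filter once, keep it around".
// PageParser is a hypothetical class name for illustration.
public sealed class PageParser
{
    // The parser's internal filter: built once per parser instance,
    // then reused for every page it parses.
    private readonly Regex filter;

    public PageParser(string pattern)
    {
        filter = new Regex(pattern);
    }

    public IEnumerable<string> Parse(string page)
    {
        foreach (Match match in filter.Matches(page))
        {
            yield return match.Value;
        }
    }
}
```

Constructing one PageParser before the loop in the question's Main and calling parser.Parse(page) for each page builds the regex exactly once. Making the field static instead would share one regex across all parser instances.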

Tanzelax
Is that because creating new Regex instances is expensive? I seem to remember reading something like that, but I'm not sure.
Joan Venge
@Joan: It's more about design principles, to me. I'll edit my answer.
Tanzelax
+1  A: 

I almost always create the regex as a static member of the class where it will be used, or in a common utils class if it is used in many places.
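A sketch of the "common utils class" approach. RegexUtils, its field names, and both patterns are hypothetical. RegexOptions.Compiled is optional: it makes construction slower in exchange for faster matching, which tends to pay off only for regexes that are used many times.

```csharp
using System.Text.RegularExpressions;

// Sketch of a shared utility class for regexes used in many places.
// The class name, field names, and patterns are hypothetical examples.
public static class RegexUtils
{
    // Static readonly fields are initialized once, before first use.
    // Regex instances are immutable and thread-safe, so a single
    // instance can be shared by every caller.
    public static readonly Regex Digits =
        new Regex(@"\d+", RegexOptions.Compiled);

    public static readonly Regex Whitespace =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Replaces each run of whitespace with a single space.
    public static string CollapseWhitespace(string input)
    {
        return Whitespace.Replace(input, " ");
    }
}
```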

Ray
Is that because creating new Regex instances is expensive? I seem to remember reading something like that, but I'm not sure.
Joan Venge
Creating any object over and over again can be expensive. I don't know whether regexes are more or less expensive than other useful objects. However, if I am going to use something many times, I want to keep it around, ready to go, so that I can respond to users' requests as quickly as possible.
Ray
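To check the cost for a specific case rather than guess, a rough Stopwatch comparison like the following can help. It is only a sketch: RegexCostCheck, the email-like pattern, the input, and the iteration count are arbitrary assumptions, and absolute timings will vary by runtime and machine. Constructing a Regex with new is not served from the static-method cache in .NET 2.0 and later, so the first loop parses the pattern on every iteration.

```csharp
using System.Diagnostics;
using System.Text.RegularExpressions;

// Rough micro-benchmark sketch: new Regex per call vs one reused instance.
// RegexCostCheck and Pattern are hypothetical; results are only indicative.
public static class RegexCostCheck
{
    private const string Pattern = @"\b\w+@\w+\.\w+\b"; // hypothetical pattern
    private static readonly Regex Shared = new Regex(Pattern);

    // Builds a brand-new Regex on every iteration.
    public static int CountWithNewInstance(string input, int iterations)
    {
        int total = 0;
        for (int i = 0; i < iterations; i++)
        {
            total += new Regex(Pattern).Matches(input).Count;
        }
        return total;
    }

    // Reuses the single static instance on every iteration.
    public static int CountWithSharedInstance(string input, int iterations)
    {
        int total = 0;
        for (int i = 0; i < iterations; i++)
        {
            total += Shared.Matches(input).Count;
        }
        return total;
    }

    // Returns elapsed milliseconds for each approach.
    public static (long newMs, long sharedMs) Measure(string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        CountWithNewInstance(input, iterations);
        long newMs = sw.ElapsedMilliseconds;

        sw.Restart();
        CountWithSharedInstance(input, iterations);
        long sharedMs = sw.ElapsedMilliseconds;

        return (newMs, sharedMs);
    }
}
```

For a meaningful comparison, run it in a Release build and call Measure more than once so JIT warm-up does not skew the first result.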