I need to be able to check for patterns that contain a | character. For example, an expression like d*|*t should return true for a string like "dtest|test".

I'm no regular expression hero so I just tried a couple of things, like:

Regex Pattern = new Regex("s*\|*d"); //unable to build because of single backslash
Regex Pattern = new Regex("s*|*d"); //argument exception error
Regex Pattern = new Regex(@"s*\|*d"); //returns true when I use "dtest" as input, so incorrect
Regex Pattern = new Regex(@"s*|*d"); //argument exception error
Regex Pattern = new Regex("s*\\|*d"); //returns true when I use "dtest" as input, so incorrect
Regex Pattern = new Regex("s*" + "\\|" + "*d"); //returns true when I use "dtest" as input, so incorrect
Regex Pattern = new Regex(@"s*\\|*d"); //argument exception error

I'm out of options. What should I use instead? I know this is a pretty basic regular expression, but for some reason I'm not getting it.

A: 

How about s.*\|.*d?
The problem with your attempts is that you wrote s*, which means "match any number of s characters (including zero)". You need to describe the characters that follow the s by using . as in my example. If you only want to allow word characters (letters, digits, and underscore), you can use \w instead.

tanascius
+4  A: 

In regular expressions, * means "zero or more of the pattern before it", e.g. a* means zero or more a, and (xy)* matches strings of the form xyxyxyxy....
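
As a rough illustration of those quantifier rules, here is a minimal sketch. The class name StarDemo and the helper IsFullMatch are made up for this example; the helper anchors the pattern with ^ and $ so the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

static class StarDemo
{
    // Hypothetical helper: true only when the pattern matches the entire input.
    public static bool IsFullMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    static void Main()
    {
        Console.WriteLine(IsFullMatch("", "a*"));          // True: zero a's is allowed
        Console.WriteLine(IsFullMatch("aaa", "a*"));       // True
        Console.WriteLine(IsFullMatch("xyxyxy", "(xy)*")); // True
        Console.WriteLine(IsFullMatch("xyx", "(xy)*"));    // False: trailing x is not a full xy
        Console.WriteLine(IsFullMatch("stuff", "s*"));     // False: * is not a wildcard
        Console.WriteLine(IsFullMatch("stuff", "s.*"));    // True: . supplies "any character"
    }
}
```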

To match any sequence of characters, you should use .*, i.e.

Regex Pattern = new Regex(@"s.*\|.*d");

(Also, an unescaped | means "or", which is why the literal pipe has to be written as \|.)

Here . will match any character[1], including |. To avoid this, you need to use a negated character class:

new Regex(@"s[^|]*\|[^d]*d");

Here [^x] means "any character except x".
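
To see what difference the character classes make, the sketch below prints the substring each of the two patterns above actually matches. The input string, the class name ClassVsDot and the helper Matched are invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ClassVsDot
{
    // Hypothetical helper: the substring the pattern matched ("" when nothing matched).
    public static string Matched(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        string input = "sa|b|cd|ed";

        // Greedy .* runs as far as it can, so it swallows the extra pipes.
        Console.WriteLine(Matched(input, @"s.*\|.*d"));       // sa|b|cd|ed

        // [^|]* stops at the first pipe and [^d]* stops at the first d.
        Console.WriteLine(Matched(input, @"s[^|]*\|[^d]*d")); // sa|b|cd
    }
}
```

Both patterns report a match for this input; they only disagree about which part of the string was matched, which matters once you use captures or anchors.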

You may read http://www.regular-expressions.info/tutorial.html to learn more about RegEx.

[1]: Except a newline (\n). However, . will match \n if you pass the Singleline option, but that is more advanced stuff...
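
For the curious, a small sketch of that footnote. The input string and the names SinglelineDemo and IsMatch are made up for illustration; RegexOptions.Singleline is the option referred to above.

```csharp
using System;
using System.Text.RegularExpressions;

static class SinglelineDemo
{
    const string Pattern = @"s.*\|.*d";

    // Hypothetical helper: runs the pattern from this answer with the given options.
    public static bool IsMatch(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, Pattern, options);
    }

    static void Main()
    {
        // The only d comes after the line break.
        string input = "s|first\nsecond d";

        Console.WriteLine(IsMatch(input, RegexOptions.None));       // False: . stops at \n
        Console.WriteLine(IsMatch(input, RegexOptions.Singleline)); // True: . also matches \n
    }
}
```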

KennyTM
It might be easier to understand to use non-greedy wildcards instead of a negative character class (eg. `.*?d` instead of `[^d]*d`).
eyelidlessness
@eye: non-greedy matching can lead to slow backtracking, and some flavors of RegEx (e.g. ERE) don't support non-greedy patterns.
KennyTM
I found the first point counter-intuitive because the two patterns should be identical in practice, but I looked it up and it seems you're right (for most engines, some do optimize); the second point doesn't seem relevant as it's clear which engine is being used by the questioner (though I can understand providing a more general-purpose answer for future searchers).
eyelidlessness
Mixed greediness in REs is a formally ill-defined concept. In practice, that means if you're mixing your greediness then you're writing REs that different engines will match in different ways (and which can surprise you when the planets align wrong). Sticking to *just* greedy *or* non-greedy is fine though; that just means you use different types of automata to do the matching. I prefer to use greedy REs with character classes so that things say exactly what I want; YMMV of course.
Donal Fellows
+1  A: 

A | inside a character class is treated literally, so you can try the regex:

[|]
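
A quick sketch of how that compares with the backslash form. The class name PipeLiteral and the helper ContainsPipe are invented for this example; Regex.Escape is the built-in method that escapes metacharacters for you.

```csharp
using System;
using System.Text.RegularExpressions;

static class PipeLiteral
{
    // Hypothetical helper: true when the input contains a literal pipe,
    // using whichever spelling of the pattern is passed in.
    public static bool ContainsPipe(string input, string pipePattern)
    {
        return Regex.IsMatch(input, pipePattern);
    }

    static void Main()
    {
        Console.WriteLine(ContainsPipe("a|b", "[|]"));  // True
        Console.WriteLine(ContainsPipe("a|b", @"\|"));  // True
        Console.WriteLine(ContainsPipe("ab", "[|]"));   // False
        Console.WriteLine(ContainsPipe("ab", @"\|"));   // False

        // Regex.Escape produces the backslash form.
        Console.WriteLine(Regex.Escape("|"));           // \|
    }
}
```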
codaddict
A: 

Try this.

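// Requires: using System; using System.Text.RegularExpressions;
string test1 = "dtest|test";
string test2 = "apple|orange";
string pattern = @"d.*?\|.*?t"; // a d, then a literal |, then a t, with lazy wildcards in between

Console.WriteLine(Regex.IsMatch(test1, pattern)); // True
Console.WriteLine(Regex.IsMatch(test2, pattern)); // False: there is no d before the |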
Anthony Pegram
A: 

Regex Pattern = new Regex(@"s*\|*d"); compiles, but \|* means "0 or more pipes" (and s* means "0 or more s"), so it matches any string that contains a d. You probably want Regex Pattern = new Regex(@"s.*\|.*d");
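
To tie this back to the attempts in the question, here is a minimal sketch of what is going on. The class name AttemptsExplained, the helper IsInvalid and the second test input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class AttemptsExplained
{
    // Hypothetical helper: true when the Regex constructor rejects the pattern.
    public static bool IsInvalid(string pattern)
    {
        try
        {
            new Regex(pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    static void Main()
    {
        // The verbatim and the double-backslash spellings are the same string.
        Console.WriteLine(@"s*\|*d" == "s*\\|*d");                    // True

        // Zero s, zero pipes, then d: only the d itself is matched.
        Console.WriteLine(Regex.Match("dtest", @"s*\|*d").Value);     // d

        // Unescaped | is alternation, so the * after it has nothing to repeat.
        Console.WriteLine(IsInvalid("s*|*d"));                        // True

        Console.WriteLine(Regex.IsMatch("dtest", @"s.*\|.*d"));       // False
        Console.WriteLine(Regex.IsMatch("stest|testd", @"s.*\|.*d")); // True
    }
}
```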

Tesserex