ansaurus

Question

C#: Determining if string is like this pattern; possible regex

Answer 1

+6 A:

This should work:

^[Rr][Rr]\d+ *[Ss]\d+ *[Cc]\d+$

or, as suggested in another comment:

^[Rr][Rr][0-9]+ *[Ss][0-9]+ *[Cc][0-9]+$

What it all means:

^ - start of string
[Rr] - next char must be an R or r
[Rr] - next char must be an R or r
\d+ or [0-9]+ - next part must be 1 or more digits
(space)* - allow for 0 or more spaces
[Ss] - next char must be an S or s
\d+ or [0-9]+ - next part must be 1 or more digits
(space)* - allow for 0 or more spaces
[Cc] - next char must be a C or c
\d+ or [0-9]+ - next part must be 1 or more digits
$ - end of string

There might be a more elegant solution but this is pretty easy to read.
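As a rough illustration of how the pattern above might be applied in C#, here is a minimal sketch. The class name `RuralRouteCheck` and the method name `LooksLikeRuralRoute` are made up for this example and do not come from the answer itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RuralRouteCheck
{
    // Pattern from the answer above: RR<digits>, S<digits>, C<digits>,
    // with zero or more spaces between the parts.
    private const string Pattern = @"^[Rr][Rr][0-9]+ *[Ss][0-9]+ *[Cc][0-9]+$";

    // Hypothetical helper: true when the whole input fits the pattern.
    public static bool LooksLikeRuralRoute(string input)
    {
        return input != null && Regex.IsMatch(input, Pattern);
    }
}
```

Because of the `^` and `$` anchors, an input such as "Holy Cow RR12 S53 C21" is rejected, while "RR12 S53 C21" and "rr1s2c3" are accepted.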

Edit: Updated to include input from some of the comments

Kelsey 2009-08-26 22:12:50

Simplicity is a good thing with regexes.

Ron Warholic 2009-08-26 22:26:58

Definitely... I wish more people would break down their solutions as I have above to make them easier to understand, since regex is not the most readable syntax.

Kelsey 2009-08-26 22:36:50

Answer 2

+3 A:

How about...

someString = someString.Trim(); // eliminate leading/trailing whitespace
// Regex.IsMatch returns a bool; Regex.Match returns a Match object instead
bool isRural = Regex.IsMatch(
   someString,
   @"^rr\d+\s*s\d+\s*c\d+$",
   RegexOptions.IgnoreCase);

This eliminates the uppercase/lowercase switching within the pattern and uses \s to allow any whitespace character (e.g. tabs; note that \s also matches newlines). If you want spaces only, change '\s' to ' '.

bobbymcr 2009-08-26 23:03:04

+1, this is the simplest and most correct answer yet, **but** be aware that `\d` matches more than just `[0-9]`. It matches any character for which char.IsDigit returns true, which by my count includes some **230** Unicode code points.

P Daddy 2009-08-27 01:14:03

Yeah, that's true, and a similar claim can be made for `\s` (`char.IsWhiteSpace`).

bobbymcr 2009-08-27 03:23:20

@P - thanks for the insight on the `\d`!

p.campbell 2009-08-27 15:25:03
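The point raised in these comments about `\d` can be checked with a small sketch. The class and method names below are invented for illustration. The example uses U+0663 (ARABIC-INDIC DIGIT THREE) as a sample non-ASCII digit, and `RegexOptions.ECMAScript`, which restricts `\d` to `[0-9]` and can still be combined with `RegexOptions.IgnoreCase`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    // U+0663 ARABIC-INDIC DIGIT THREE: a Unicode decimal digit outside 0-9.
    public const string ArabicIndicThree = "\u0663";

    // Default behaviour: \d matches any Unicode decimal digit.
    public static bool MatchesDefault(string s)
    {
        return Regex.IsMatch(s, @"^\d$");
    }

    // With RegexOptions.ECMAScript, \d is restricted to [0-9].
    public static bool MatchesEcmaScript(string s)
    {
        return Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);
    }

    // An explicit character class sidesteps the question altogether.
    public static bool MatchesAsciiClass(string s)
    {
        return Regex.IsMatch(s, @"^[0-9]$");
    }
}
```

If only ASCII digits should be accepted, either the explicit `[0-9]` class or the ECMAScript option would appear to do the job; which one reads better is a matter of taste.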

Answer 3

+1 A:

Let's state the following assumptions:

There are three sections to the string.
Section 1 always starts with RR, uppercase or lowercase, and ends with one or more decimal digits.
Section 2 always starts with S, uppercase or lowercase, and ends with one or more decimal digits.
Section 3 always starts with C, uppercase or lowercase, and ends with one or more decimal digits.

For simplicity, the following would suffice.

[Rr][Rr][0-9]+[ ]+[Ss][0-9]+[ ]+[Cc][0-9]+

[Rr] means exactly one letter R, upper or lower case.
[0-9] means exactly one decimal digit.
[0-9]+ means one or more decimal digits.
[ ]+ means one or more spaces.

However, a regex is usually more useful when it also captures the individual sections, so that the matcher can hand each section's value to its own variable.

Therefore, the following regex is more helpful.

([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)

Let's apply that regex to the string

string inputstr = "Holy Cow RR12 S53 C21";

This is what your regex matcher would let you know:

start pos=9, end pos=21
Group(0) = RR12 S53 C21
Group(1) = RR12
Group(2) = S53
Group(3) = C21

There are three pairs of parentheses. Each pair encloses a section of the string, which the regex engine calls a group.

The regex compiler would call the match of

the whole matched string as group 0
rural route as group 1
site as group 2 and
compartment as group 3.

Naturally, groups 1, 2 & 3 will encounter matches, if and only if group 0 has a match.

Therefore, your algorithm could exploit that with the following C# code (it needs using System.Text.RegularExpressions):

Match match = Regex.Match(
  inputstr,
  @"([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)");
if (match.Success)
{
  // Groups[0] is the whole match; Groups[1] to Groups[3] are the
  // parenthesized sections, in order of their opening parenthesis.
  string postalstr = match.Groups[0].Value;
  string rroute = match.Groups[1].Value;
  string site = match.Groups[2].Value;
  string compart = match.Groups[3].Value;
}

Further, you may want to insert the values into a database table with the columns rr, site, compart, storing only the numerals without the letters "rr", "s" or "c". This is the regex to use, with nested grouping.

([Rr][Rr]([0-9]+))[ ]+([Ss]([0-9]+))[ ]+([Cc]([0-9]+))

And the matcher will let you know the following when a match occurs for group 0:

start=9, end=21
Group(0) = RR12 S53 C21
Group(1) = RR12
Group(2) = 12
Group(3) = S53
Group(4) = 53
Group(5) = C21
Group(6) = 21
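One way to pull out just the numerals in C# is sketched below. It is a variation on the nested-group idea rather than the exact pattern above: the inner numeric groups are given names (a .NET regex feature) so they can be read by name instead of by index. The class name `RuralRouteParser` and the method `TryParse` are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RuralRouteParser
{
    // Same shape as the nested-group pattern, but only the numeric parts
    // are captured, using the named groups rr, site and compart.
    private static readonly Regex Pattern = new Regex(
        @"[Rr][Rr](?<rr>[0-9]+)[ ]+[Ss](?<site>[0-9]+)[ ]+[Cc](?<compart>[0-9]+)");

    // Hypothetical helper: returns false when there is no match
    // or when a number is too large to fit in an int.
    public static bool TryParse(string input, out int rr, out int site, out int compart)
    {
        rr = site = compart = 0;
        if (input == null) return false;

        Match m = Pattern.Match(input);
        if (!m.Success) return false;

        return int.TryParse(m.Groups["rr"].Value, out rr)
            && int.TryParse(m.Groups["site"].Value, out site)
            && int.TryParse(m.Groups["compart"].Value, out compart);
    }
}
```

Since this pattern is not anchored with `^` and `$`, it finds the address inside a longer string such as "Holy Cow RR12 S53 C21" and yields 12, 53 and 21.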

Blessed Geek 2009-08-27 00:54:01

Answer 4

A:

FYI: If you're going to be using this RegEx to test a lot of data, your best bet would be to tell .NET to precompile it - it will be compiled into IL and grant a performance boost, rather than simply interpreting the RegEx pattern each time. Specify it as a static member on whichever class contains your method, like so:

private static readonly Regex re = new Regex("pattern", RegexOptions.Compiled | RegexOptions.IgnoreCase);

...and the method to test whether a string matches the pattern is...

bool matchesString = re.IsMatch("string");
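Put together, the two fragments might look like the sketch below. The class name `AddressValidator`, the method name `IsRuralRoute`, and the choice of pattern (borrowed from the earlier answers, with `[0-9]` in place of `\d`) are assumptions made for illustration. Whether `RegexOptions.Compiled` actually pays off for a given workload is something to measure rather than assume.

```csharp
using System;
using System.Text.RegularExpressions;

public class AddressValidator
{
    // Constructed once and reused by every call; Compiled trades
    // extra construction cost for potentially faster matching.
    private static readonly Regex RuralRoute = new Regex(
        @"^rr[0-9]+\s*s[0-9]+\s*c[0-9]+$",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: trims the input, then tests it against the pattern.
    public static bool IsRuralRoute(string candidate)
    {
        return candidate != null && RuralRoute.IsMatch(candidate.Trim());
    }
}
```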

Good luck.

Tullo 2009-08-27 02:08:56

*Maybe*. `RegexOptions.Compiled` isn't always a win, and profiling is necessary. See: http://www.codinghorror.com/blog/archives/000228.html and http://stackoverflow.com/questions/414328/using-static-regex-ismatch-vs-creating-an-instance-of-regex/414411#414411

P Daddy 2009-08-27 02:35:20

Thanks Tullo and P Daddy. I've updated the question to describe the expected usage!

p.campbell 2009-08-27 03:38:35
