Let's state the following assumptions:
- There are three sections to the string.
- Section 1 always starts with RR, uppercase or lowercase, and ends with one or more decimal digits.
- Section 2 always starts with S, uppercase or lowercase, and ends with one or more decimal digits.
- Section 3 always starts with C, uppercase or lowercase, and ends with one or more decimal digits.
For a simple match, the following regex would suffice:
[Rr][Rr][0-9]+[ ]+[Ss][0-9]+[ ]+[Cc][0-9]+
- [Rr] means exactly one letter R, upper or lower case.
- [0-9] means exactly one decimal digit.
- [0-9]+ means one or more decimal digits.
- [ ]+ means one or more spaces.
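As a quick check of the pattern above, here is a minimal Java sketch. The class name `SimplePostalMatch` and the method `containsPostal` are made up for illustration. It uses `find()`, so the pattern may occur anywhere in the input.

```java
import java.util.regex.Pattern;

class SimplePostalMatch {
    // The ungrouped pattern from the text above.
    static final Pattern POSTAL =
            Pattern.compile("[Rr][Rr][0-9]+[ ]+[Ss][0-9]+[ ]+[Cc][0-9]+");

    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    static boolean containsPostal(String input) {
        return POSTAL.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPostal("Holy Cow RR12 S53 C21")); // true
        System.out.println(containsPostal("rr7   s1 c300"));         // true: mixed case, several spaces
        System.out.println(containsPostal("RR S53 C21"));            // false: no digits after RR
    }
}
```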
However, a regex is most useful when it also captures the individual sections, so that the match can assign each section's value to its own variable.
Therefore, the following regex is more helpful:
([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)
Let's apply that regex to the string:
String inputstr = "Holy Cow RR12 S53 C21";
This is what your regex matcher would report:
start pos=9, end pos=21
Group(0) = RR12 S53 C21
Group(1) = RR12
Group(2) = S53
Group(3) = C21
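To reproduce a report like this, a minimal Java sketch could look as follows. It assumes `java.util.regex`; the class name `GroupDemo` and the method `report` are illustrative only. Note that `end()` is the index just past the last matched character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // The three-group pattern from the text above.
    static final Pattern POSTAL =
            Pattern.compile("([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)");

    // Hypothetical helper: describes the first match, or returns "no match".
    static String report(String input) {
        Matcher m = POSTAL.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("start pos=").append(m.start())
          .append(", end pos=").append(m.end()).append('\n');
        // groupCount() excludes group 0, so the loop bound is inclusive.
        for (int i = 0; i <= m.groupCount(); i++) {
            sb.append("Group(").append(i).append(") = ").append(m.group(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report("Holy Cow RR12 S53 C21"));
    }
}
```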
There are three pairs of round brackets (parentheses).
Each pair encloses a section of the string, which the regex engine calls a group.
The regex engine numbers the matches as follows:
- the whole matched string is group 0
- the rural route is group 1
- the site is group 2
- the compartment is group 3.
Naturally, groups 1, 2 and 3 can have matches if and only if group 0 has a match.
Therefore, your algorithm could exploit that with the following pseudocode:
String postalstr, rroute, site, compart;
// Assumes a Java-style matcher: a match must be found before group(), start() or end() are used.
if (match.find())
{
    // Group 0: the whole matched string.
    int start = match.start(0);
    int end = match.end(0);
    postalstr = inputstr.substring(start, end);
    // Group 1: the rural route.
    start = match.start(1);
    end = match.end(1);
    rroute = inputstr.substring(start, end);
    // Group 2: the site.
    start = match.start(2);
    end = match.end(2);
    site = inputstr.substring(start, end);
    // Group 3: the compartment.
    start = match.start(3);
    end = match.end(3);
    compart = inputstr.substring(start, end);
}
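In Java specifically, `Matcher.group(n)` returns the same text as `inputstr.substring(match.start(n), match.end(n))`, so the extraction can be shortened. The following is a sketch only; the class name `PostalParts` and the method `parse` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostalParts {
    static final Pattern POSTAL =
            Pattern.compile("([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)");

    // Hypothetical helper: returns {postalstr, rroute, site, compart},
    // or null when the input contains no match.
    static String[] parse(String inputstr) {
        Matcher match = POSTAL.matcher(inputstr);
        if (!match.find()) {
            return null;
        }
        // group(n) is equivalent to inputstr.substring(match.start(n), match.end(n)).
        return new String[] {
            match.group(0), match.group(1), match.group(2), match.group(3)
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("Holy Cow RR12 S53 C21");
        System.out.println(String.join(" | ", parts)); // RR12 S53 C21 | RR12 | S53 | C21
    }
}
```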
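Further, you may want to insert the values into a database table with the columns rr, site and compart, storing only the numerals without the letters "rr", "s" or "c".
This would be the regex to use, with nested grouping. Groups are numbered by the position of their opening bracket, so each digits-only group comes right after the section group that encloses it.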
([Rr][Rr]([0-9]+))[ ]+([Ss]([0-9]+))[ ]+([Cc]([0-9]+))
And the matcher will let you know the following when a match occurs for group 0:
start=9, end=21
Group(0) = RR12 S53 C21
Group(1) = RR12
Group(2) = 12
Group(3) = S53
Group(4) = 53
Group(5) = C21
Group(6) = 21
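A minimal Java sketch of pulling the numerals out of the inner groups (2, 4 and 6) follows. The class name `PostalNumbers` and the method `parseNumbers` are illustrative, and converting to `int` assumes the digit runs are short enough to fit.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostalNumbers {
    // The nested-group pattern from the text above.
    static final Pattern POSTAL = Pattern.compile(
            "([Rr][Rr]([0-9]+))[ ]+([Ss]([0-9]+))[ ]+([Cc]([0-9]+))");

    // Hypothetical helper: returns {rr, site, compart} as ints, or null when
    // there is no match. Integer.parseInt throws NumberFormatException if a
    // digit run is too long for an int.
    static int[] parseNumbers(String input) {
        Matcher m = POSTAL.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Groups 2, 4 and 6 are the inner, digits-only groups.
        return new int[] {
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(4)),
            Integer.parseInt(m.group(6))
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseNumbers("Holy Cow RR12 S53 C21"))); // [12, 53, 21]
    }
}
```

The resulting values could then be bound to the rr, site and compart columns, for example as parameters of a prepared statement.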