views:

118

answers:

5

I'll preface this question by mentioning that while I'm far from a regular expressions guru, they are not completely foreign to me. Building a regular expression to search for a pattern inside a particular string generally isn't a problem for me, but I have a (maybe?) unique situation.

I have a set of values, say:

028938
DEF567987
390987.456
GHI345928.039

I want to match a certain set of strings, such as:

  • Strings composed of exactly 6 digits
  • Strings composed of exactly 6 digits, a decimal point, followed by exactly 3 more digits

In the above examples, the first and third values should be matched.

I'm using the regular expressions:

[0-9]{6}
[0-9]{6}.[0-9]{3}

Unfortunately, since all the above examples contain the specified pattern, all values are matched. This is not my intention.

So my question, in a nutshell, is how to write a regular expression that matches a string exactly and completely, with no additional characters to the right or left of the matched pattern? Is there a term for this type of matching? (Google was no help.) TIA

+16  A: 

Use ^ and $ to anchor the match to the start and end of your string:

^[0-9]{6}$
^[0-9]{6}\.[0-9]{3}$

Reference: http://www.regular-expressions.info/anchors.html

Also, as noted by Mikael Svenson, you can use the word boundary \b if you are searching for this pattern in a larger chunk of text.

Reference: http://www.regular-expressions.info/wordboundaries.html

You could also combine both of those regexes into one:

^\d{6}(\.\d{3})?$
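As a rough illustration of the difference the anchors make, here is a minimal Python sketch run against the four sample values from the question. Python and the variable names are my own choice for the example; the question does not say which regex engine is in use.

```python
import re

# The four sample values from the question.
values = ["028938", "DEF567987", "390987.456", "GHI345928.039"]

# Without anchors, the pattern may match anywhere inside the string.
unanchored = re.compile(r"[0-9]{6}")

# With ^ and $, the whole string has to fit the pattern.
anchored = re.compile(r"^\d{6}(\.\d{3})?$")

unanchored_hits = [v for v in values if unanchored.search(v)]
anchored_hits = [v for v in values if anchored.search(v)]

print(unanchored_hits)  # all four values
print(anchored_hits)    # ['028938', '390987.456']
```

One caveat for Python specifically: `$` also matches just before a trailing newline, so `re.fullmatch` (without the anchors) is the stricter option if the input might end in `\n`.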
Chad
What the ....? This is being voted down by some people?
Chad
Welcome to SO, Chad, there is no explaining some user actions. *sigh*
msw
your answer is wrong, you need to escape the dot, it should also use \d instead of [0-9]
fuzzy lollipop
@msw, true, I did just copy/paste craig's *working* regex, I'll fix it now
Chad
Thanks a lot Chad, both for a working regex and for explaining *why* it works.
craig
@craig, no problem.
Chad
@fuzzy: Using `[0-9]` instead of `\d` is not necessarily an error. Especially not in Java, where they mean exactly the same thing.
Alan Moore
+3  A: 

You can use ^ to require the match to begin at the start of a line and $ to require it to finish at the end of a line:

^[0-9]{6}\.[0-9]{3}$

[0-9] can also be written as \d

^\d{6}\.\d{3}$

You can also use \b for word boundaries if you want to match your pattern inside a line that contains, e.g., spaces:

\btest\b

will match the word test in this line

this is a test for matching
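A small Python sketch of the word-boundary behaviour, in case it helps. The longer sample sentence and the list of IDs are made up for the illustration; only `\btest\b` comes from the answer above.

```python
import re

# Hypothetical sample text: "test" also appears inside "contest" and "testing".
text = "this is a test for matching, not a contest or testing"

unbounded = re.findall(r"test", text)     # finds "test" inside the other words too
bounded = re.findall(r"\btest\b", text)   # only the standalone word

print(len(unbounded))  # 3
print(bounded)         # ['test']

# The same idea applied to six-digit values embedded in a longer line
# (hypothetical data).
ids = re.findall(r"\b\d{6}\b", "ids: 028938 DEF567987 1234567")
print(ids)             # ['028938']
```

Note that `\b` treats a dot as a boundary, so `\b\d{6}\b` would still find `390987` inside `390987.456`; for whole-string checks the ^ and $ anchors are the safer tool.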
Mikael Svenson
+1 for mentioning word boundaries
Chad
this won't do what he wants; you need to escape the dot
fuzzy lollipop
@fuzzy: true... and you could edit the question to make it proper :) It will still work, since the . wildcard also matches a literal dot ;) But of course it will also match wrong values if they are present.
Mikael Svenson
the question is wrong and needs to stay wrong, that is part of the question, he has the wrong syntax, fixing his question won't let people that have the same type of problem understand the __correct__ answers
fuzzy lollipop
@fuzzy: The question is about matching the start and end of a pattern/line, not how the . wildcard works.
Mikael Svenson
no, the question is about how to EXACTLY match an input with a regular expression; the un-escaped dot prevents that EXACT match, and the begin and end markers are a red herring to the correct answer. his example was not doing what he thought it was doing and needs to be corrected for a totally useful correct solution.
fuzzy lollipop
Some people just love to be annoying don't they?
Chad
+1  A: 
^\d{6}$
^\d{6}\.\d{3}$

are the correct patterns; you can test them. The first matches 6 digits only and the second matches 6 digits, a dot, and 3 digits.

^\d{6}((\.\d{3}$)|$)

will match either 6 digits or 6 digits dot 3 digits
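For what it's worth, the alternation form above and the shorter optional-group form `^\d{6}(\.\d{3})?$` suggested elsewhere in this thread appear to accept the same inputs. A quick Python sketch to check that on a handful of values (the extra test strings are invented for the comparison):

```python
import re

# Alternation form from this answer and the optional-group form from the thread.
alternation = re.compile(r"^\d{6}((\.\d{3}$)|$)")
optional = re.compile(r"^\d{6}(\.\d{3})?$")

# Values from the question plus a few made-up near misses.
samples = [
    "028938", "390987.456",          # should match
    "DEF567987", "GHI345928.039",    # leading letters
    "390987.", "390987.45",          # incomplete decimal part
    "0289381", "390987x456",         # extra digit, wrong separator
]

# Map each sample to (alternation result, optional-group result).
results = {
    s: (bool(alternation.search(s)), bool(optional.search(s)))
    for s in samples
}

matched = [s for s, (a, _) in results.items() if a]
print(matched)  # ['028938', '390987.456']
```

Both patterns agree on every sample here, so the choice between them is mostly a matter of readability.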

Rubular is your friend!

fuzzy lollipop
+1  A: 

Match this regex:

"^\d{6}((\.\d{3}$)|$)"
Gopi
@Gopi, or I think, just `^\d{6}(\.\d{3})?$`
Chad
@Chad No. This would not match the first condition - "Strings composed of exactly 6 digits"
Gopi
Yes now that you added a '?' it would work
Gopi
+1  A: 

I think you want something like this:

"^\d{6}(\.\d{3})?$"

You need to escape the dot, because an unescaped dot matches any character in a regexp.
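To make the effect of the unescaped dot concrete, here is a short Python sketch comparing the pattern from the question (with anchors added) against the escaped version. The non-matching test strings are invented for the example.

```python
import re

# Unescaped dot: "." stands for any single character (except a newline).
loose = re.compile(r"^[0-9]{6}.[0-9]{3}$")

# Escaped dot: "\." only matches a literal dot.
strict = re.compile(r"^[0-9]{6}\.[0-9]{3}$")

candidates = ["390987.456", "390987x456", "3909870456"]

loose_hits = [c for c in candidates if loose.search(c)]
strict_hits = [c for c in candidates if strict.search(c)]

print(loose_hits)   # all three candidates
print(strict_hits)  # ['390987.456']
```

So the unescaped version still accepts the intended value, but it also lets through strings with any other character in the separator position.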

Ryan Conrad