I'm trying to match a SEDOL (exactly 7 chars: 6 alphanumeric chars followed by 1 numeric char).

My regex

([A-Z0-9]{6})[0-9]{1}

matches correctly, but strings longer than 7 chars that begin with a valid match also match (if you see what I mean :)). For example:

B3KMJP4

matches correctly but so does:

B3KMJP4x

which shouldn't match.

Can anyone show me how to avoid this?

+5  A: 

A dollar sign at the end of the regex (called an anchor) signifies end of string:

^([A-Z0-9]{6})\d$

I also added "^" at the start, which signifies start of string and prevents matching xB3KMJP4. I also simplified the original regex by replacing [0-9]{1} with \d.
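To see the effect of the anchors, here is a minimal sketch in Python. Python is my assumption for illustration only, since the question does not say which regex engine is in use, and the names UNANCHORED, ANCHORED and is_sedol_shaped are made up for this example.

```python
import re

# Pattern from the question: no anchors, so it can match inside a longer string.
UNANCHORED = re.compile(r"([A-Z0-9]{6})[0-9]{1}")

# Anchored pattern from this answer. re.ASCII restricts \d to 0-9; without it,
# Python's \d would also accept non-ASCII digits in str patterns.
ANCHORED = re.compile(r"^([A-Z0-9]{6})\d$", re.ASCII)


def is_sedol_shaped(candidate: str) -> bool:
    """Return True if candidate is 6 alphanumerics followed by 1 digit.

    This checks the shape only; it does not verify the SEDOL check digit.
    """
    return ANCHORED.search(candidate) is not None


for text in ["B3KMJP4", "B3KMJP4x", "xB3KMJP4"]:
    # Columns: input, unanchored result, anchored result
    print(text, bool(UNANCHORED.search(text)), is_sedol_shaped(text))
```

One Python-specific caveat: "$" also matches just before a trailing newline, so "B3KMJP4\n" still passes this check. If that matters, re.fullmatch (or \Z in place of $) is probably the safer choice.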

By the way, as per Wikipedia, for the first character, vowels are not used. I'm not quite sure if that's a rule or a convention.
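If the no-vowel rule for the first character does turn out to be a hard rule, the pattern could be tightened along these lines. This is only a sketch in Python (my choice of language, not the asker's), it assumes the rule applies to the first character exactly as described above, and NO_LEADING_VOWEL and has_no_leading_vowel are hypothetical names.

```python
import re

# Hypothetical stricter pattern: the first character may be a digit or a
# consonant, but not A, E, I, O or U. The next five characters stay
# unrestricted alphanumerics, and the last character is a digit.
NO_LEADING_VOWEL = re.compile(r"^[0-9B-DF-HJ-NP-TV-Z][A-Z0-9]{5}[0-9]$")


def has_no_leading_vowel(candidate: str) -> bool:
    """Return True if candidate is SEDOL-shaped and does not start with a vowel."""
    return NO_LEADING_VOWEL.search(candidate) is not None


for text in ["B3KMJP4", "A3KMJP4", "0263494"]:
    print(text, has_no_leading_vowel(text))
```

The consonant ranges B-D, F-H, J-N, P-T and V-Z simply skip over the five vowels.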

DVK
I have removed the space as per Tim's comment above (Cut'n'paste - the source of 78.3% of all bugs). BUT... I'm at a bit of a loss to actually verify whether it is a valid character for a SEDOL or not - "Alphanumeric" can include spaces under some interpretations. I'm inclined to believe that Tim's interpretation is correct.
DVK
I believe this is correct; none of the hundreds of SEDOLs I have listed begin with a vowel char.
Simon
+3  A: 

You need to use both start and end anchors like this:

^([A-Z 0-9]{6})[0-9]{1}$

This will match a string that consists of 6 characters, each an uppercase letter, a digit or a space, followed by one digit. It does not match if such a string appears only as a prefix or suffix of a longer string.

Also, you can get rid of {1} because [0-9] matches a single digit by itself.

Also \d represents a single digit. So you can shorten your regex as follows:

^([A-Z \d]{6})\d$
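The space inside the character class changes what is accepted, which is easy to miss. A small Python sketch of the difference (Python is assumed here for illustration, and WITH_SPACE and WITHOUT_SPACE are made-up names):

```python
import re

# Same regex with and without the space inside the character class.
# re.ASCII keeps \d limited to 0-9.
WITH_SPACE = re.compile(r"^([A-Z \d]{6})\d$", re.ASCII)
WITHOUT_SPACE = re.compile(r"^([A-Z\d]{6})\d$", re.ASCII)

for text in ["B3KMJP4", "B3 MJP4"]:
    # Columns: input, result with space allowed, result without
    print(repr(text), bool(WITH_SPACE.search(text)), bool(WITHOUT_SPACE.search(text)))
```

Both patterns accept B3KMJP4, but only the one with the space accepts "B3 MJP4", so the space should stay only if spaces are genuinely valid in the identifier.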
codaddict
@downvoter: care to explain?
codaddict
It seems that "A-Z \d" looks less readable and maintainable than A-Z0-9.
Lemurik
+1 for noting the space in the char class.
Webdev
+3  A: 
 ^([A-Z\d]{6})\d$
  • Use ^ for start of string
  • Use $ for end of string
  • Removed the extra space (just noticed that one)
  • Swapped out 0-9 for \d
  • Removed {1} since it is redundant
gmcalab
+3  A: 

You're forgetting that a regex matches anywhere in the string unless it is anchored. To fix it, try this:

^([A-Z 0-9]{6})[0-9]{1}$

The ^ means to match the beginning of the string, and the $ means to match the end of the string.

Kibbee