Hi,

I am trying to write a regex for the following condition:

A string may contain any letter, any digit, ' and ?, and it must start with a letter or a digit.

For example:

adsfj
asfj's
jfkd'sdf?
df
ds?
afjdk?

are all valid.

I use C# 2.0.

I tried something like this:

  ^[a-zA-Z0-9]+[']*[a-zA-Z0-9]*[?]*[a-zA-Z0-9]*$

but it did not solve the problem. Any ideas?

+1  A: 

How about:

Regex rx = new Regex(@"^[a-z\d][a-z\d'?]*$", RegexOptions.IgnoreCase);

This matches exactly what you describe: the first character must be a letter or a digit, and every character after it, up to the end of the string, may be a letter, a digit, ' or ?.
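
As a quick check, here is a minimal sketch that runs the pattern against the samples from the question plus a few strings that should be rejected. The names `StartCheck` and `IsValid` are made up for illustration, and the code sticks to C# 2.0 syntax.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class StartCheck
{
    private static readonly Regex Rx =
        new Regex(@"^[a-z\d][a-z\d'?]*$", RegexOptions.IgnoreCase);

    public static bool IsValid(string input)
    {
        return Rx.IsMatch(input);
    }

    public static void Main()
    {
        // The first six are the valid samples from the question.
        // The last three start with ' or ?, or are empty, so they should be rejected.
        string[] samples = { "adsfj", "asfj's", "jfkd'sdf?", "df", "ds?", "afjdk?",
                             "'abc", "?x", "" };
        foreach (string s in samples)
        {
            Console.WriteLine("\"{0}\": {1}", s, IsValid(s));
        }
    }
}
```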

Huppie
you have an extra \w in that first part, no?
Jimmy
Warning: `[a-Z]` is NOT the same as `[a-zA-Z]`. The ranges are simple ASCII and there are characters between `z` and `A`.
Max Shawabkeh
`\w` allows underscores too.
Amarghosh
Also be aware that `\w` and `\d` match **Unicode** letters and digits, not just the ASCII characters the OP specified.
Alan Moore
Wow, I don't know what went wrong but I must've fallen asleep while writing that ;) Corrected.
Huppie
`<Nitpicking>` @Max `[a-Z]` is wrong even if you want to allow the characters between upper and lowercase letters. `A-z` would be the correct one in that case. (A is 65, a is 97).`</Nitpicking>`
Amarghosh
@Amarghosh: Right. I was concentrating on the obvious problem and missed the larger one.
Max Shawabkeh
A: 
^[a-zA-Z0-9][a-zA-Z0-9?']*$

This also allows strings of length one. To specify minimum and maximum lengths, change the regex to:

^[a-zA-Z0-9][a-zA-Z0-9?']{minlen-1,maxlen-1}$

For example, the following allows strings of 5 to 10 characters:

^[a-zA-Z0-9][a-zA-Z0-9?']{4,9}$
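
If the limits are not fixed, the pattern can be built at run time. This is only a sketch: `LengthCheck` and `Build` are hypothetical names, and the argument check is my own assumption about what a sensible range looks like.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class LengthCheck
{
    // minLen and maxLen are total string lengths. The first character is
    // matched separately, so the quantifier uses minLen-1 and maxLen-1.
    public static Regex Build(int minLen, int maxLen)
    {
        if (minLen < 1 || maxLen < minLen)
        {
            throw new ArgumentOutOfRangeException("minLen");
        }
        string pattern = "^[a-zA-Z0-9][a-zA-Z0-9?']{"
            + (minLen - 1) + "," + (maxLen - 1) + "}$";
        return new Regex(pattern);
    }

    public static void Main()
    {
        Regex rx = Build(5, 10); // same as ^[a-zA-Z0-9][a-zA-Z0-9?']{4,9}$
        Console.WriteLine(rx.IsMatch("ab'c"));        // 4 characters, too short
        Console.WriteLine(rx.IsMatch("asfj's"));      // 6 characters, within range
        Console.WriteLine(rx.IsMatch("abcdefghijk")); // 11 characters, too long
    }
}
```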
Amarghosh
+5  A: 

It's simpler than you made it: ^[a-zA-Z\d][a-zA-Z\d'?]*$

^            # Start of string anchor.
[a-zA-Z\d]   # First character is a letter or a digit.
[a-zA-Z\d'?] # Subsequent characters are letters, digits, apostrophes or question marks...
*            # ...repeated any number of times.
$            # Until the end of the string.

If you allow underscores with your other characters, it can be simplified to: ^\w[\w'?]*$
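
To see what the simplification changes, here is a small sketch comparing the two patterns on strings that contain underscores. `WordCharCheck`, `IsStrict` and `IsLoose` are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class WordCharCheck
{
    private static readonly Regex Strict = new Regex(@"^[a-zA-Z\d][a-zA-Z\d'?]*$");
    private static readonly Regex Loose = new Regex(@"^\w[\w'?]*$");

    public static bool IsStrict(string input)
    {
        return Strict.IsMatch(input);
    }

    public static bool IsLoose(string input)
    {
        return Loose.IsMatch(input);
    }

    public static void Main()
    {
        // "asfj's" passes both patterns; the strings containing an
        // underscore pass only the \w version.
        string[] samples = { "asfj's", "_abc", "ab_c?" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}: strict={1}, loose={2}", s, IsStrict(s), IsLoose(s));
        }
    }
}
```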

Max Shawabkeh
`\w` is a superset of `\d`.
Alan Moore
Right you are. Edited accordingly. It would still work, of course, but this is definitely cleaner.
Max Shawabkeh