views:

186

answers:

7

Is the regex [a-Z] valid, and if so, is it the same as [a-zA-Z]? Please note that in [a-Z] the a is lowercase and the Z is uppercase.

Edit:

I received some answers specifying that while [a-Z] is not valid, [A-z] is valid (but won't be the same as [a-zA-Z]), and this is really what I was looking for, since I wanted to know in general whether it's possible to replace [a-zA-Z] with a more compact version. Thanks to all who contributed to the answer.

+2  A: 

You could always try it:

 print "ok" if "monkey" =~ /[a-Z]/;

Perl says

Invalid [] range "a-Z" in regex; marked by <-- HERE in m/[a-Z <-- HERE ]/ at a-z.pl line 4.
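For comparison, here is a minimal sketch of the same experiment in Python (an assumption, since the question does not name a language). The helper name `compiles` is made up for illustration. Python's `re` module also rejects the reversed range when the pattern is compiled, while it accepts `[A-z]` without complaint:

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles("[a-Z]"))  # reversed range: 'a' (97) sorts after 'Z' (90)
print(compiles("[A-z]"))  # accepted, but it matches more than letters
```

Note that "compiles without error" only says the range is well-formed, not that it matches what you intended.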
Kinopiko
Exactly what I said. My favorite saying is "try it 'n c" because if you happen to be developing in C at the time it has two meanings.
Robert Massaioli
I don't like "try it and see" because if he had tried `[A-z]` there'd be no error message but it wouldn't work right either.
John Kugelman
This is because in ASCII, uppercase comes first. So, [A-z] is valid, but [a-Z] is not.
jheddings
But he's not asking that question. The question is very clear. Why are you deliberately misinterpreting it?
Kinopiko
+2  A: 

I'm not sure about other languages' implementations, but in PHP you can do

"/[a-z]/i"

and it will be case-insensitive. There is probably something similar for other languages.
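As one example of "something similar", here is a hedged sketch in Python, where `re.IGNORECASE` plays the role of the `i` modifier. The variable name `letters` is illustrative. `re.ASCII` is added on the assumption that only the 52 ASCII letters are wanted; without it, Python's case-insensitive `[a-z]` on text strings also matches a few non-ASCII letters.

```python
import re

# re.IGNORECASE corresponds to the /i modifier; re.ASCII keeps the
# match limited to the ASCII letters a-z and A-Z.
letters = re.compile(r"[a-z]+", re.IGNORECASE | re.ASCII)

print(bool(letters.fullmatch("AbCdefG")))   # mixed case is accepted
print(bool(letters.fullmatch("Ab_CdefG")))  # the underscore is still rejected
```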

contagious
Most of PHP's features come from Perl, including this one. (PHP used to be written in Perl. Actually one of the P's used to stand for Perl)
Brad Gilbert
A: 

Um, why not try it and find out? It looks like it should be valid, and it looks like it should match.

Try:

my $string = "AbCdefG";
$string =~ s/[a-Z]+//g;
print "$string\n";

And see if that works. That will tell you straight away. You should end up with the empty string.

Robert Massaioli
This misses the failure case for most regex engines. a-z is not back to back with A-Z in ASCII.
Stefan Kendall
...and just because it "works" it doesn't mean that it will not match too much, which is the case if you do `[A-z]` (will also accept some punctuation).
Lucero
+1  A: 

No, it's not valid, probably because the ASCII values are not consecutive from z to A.

ennuikiller
+16  A: 

No, a (97) is higher than Z (90), so [a-Z] isn't a valid character class. However, [A-z] wouldn't be equivalent to [a-zA-Z] either, but for a different reason. It would cover all the letters but would also include the six characters that sit between the uppercase and lowercase letters in ASCII: [ \ ] ^ _ `
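The code points are easy to check for yourself. A small sketch in Python (chosen only for illustration; the variable name `between` is made up):

```python
# Code points of the two range endpoints.
print(ord("Z"), ord("a"))

# Characters strictly between 'Z' and 'a' in ASCII (codes 91 to 96).
between = "".join(chr(code) for code in range(ord("Z") + 1, ord("a")))
print(between)
```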

John Kugelman
Adding link to http://web.cs.mun.ca/~michael/c/ascii-table.html for reference, beat me by 15 seconds ;) - Fast fingers... +1
gnarf
That isn't what he asked though.
Kinopiko
Yes it is... `[a-Z]` is invalid because `Z` comes before `a`
gnarf
"Yes it is"? How do you make "is the regex [a-Z] valid and if yes then is it the same as [a-zA-Z]?" into a question about [A-z]?
Kinopiko
The original poster has even specified that the a is lowercase and the Z is capital.
Kinopiko
I explained why both `[a-Z]` and `[A-z]` are invalid. Don't downvote me for doing extra credit. :-)
John Kugelman
You seem to be muddying the waters.
Kinopiko
It was perfectly clear to me. No muddy waters.
vmarquez
I'll wait until the original poster comments.
Kinopiko
I am unsure whether regexes are only specified for ASCII. Couldn't this also be dependent on the encoding and collation?
Svante
+2  A: 

You don't specify what language, but in general [a-Z] won't be a valid range, as in ASCII the lower-case alpha characters come after the upper-case ones. [A-z] might be a valid range (indicating all upper- and lower-cased alphas as well as the punctuation that appears between Z and a), but it might not be, depending on your particular implementation. The i flag can be added to the regex to make it case-insensitive; check your particular implementation for instructions on how to specify that flag.

Ether
+2  A: 

If it's valid, it won't do what you expect.

The character code of Z is lower than the character code of a, so if the codes are swapped to mean the range [Z-a], it will be the same as [Z\[\\\]^_`a], i.e. it will include the characters Z and a, and the characters between.

If you use [A-z] to get all upper and lower case characters, that is still not the same as [A-Za-z], it's the same as [A-Z\[\\\]^_`a-z].
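That equivalence can be checked by brute force over the printable ASCII characters. A minimal sketch in Python, assuming its `re` module as the engine (the helper name `matched` is hypothetical):

```python
import re
import string

def matched(pattern):
    """Return the printable ASCII characters that `pattern` matches."""
    regex = re.compile(pattern)
    return {ch for ch in string.printable if regex.fullmatch(ch)}

loose = matched(r"[A-z]")
explicit = matched(r"[A-Z\[\\\]^_`a-z]")
strict = matched(r"[A-Za-z]")

print(loose == explicit)       # the two spellings match the same set
print(sorted(loose - strict))  # the extra, non-letter characters
```

Other engines may treat ranges differently (for example under locale-dependent collation), so this only demonstrates the behavior for one implementation.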

Guffa