I had this regex in Java that matched either a run of alphanumeric characters or the tilde (~):

^([a-z0-9])+|~$

Now I also have to add the characters - and _. I've tried a few combinations, none of which work, for example:

^([a-zA-Z0-9_-])+|~$

^([a-zA-Z0-9]|-|_)+|~$

Sample input strings that must match:

woZOQNVddd

00000

ncnW0mL14-

dEowBO_Eu7

7MyG4XqFz-

A8ft-y6hDu ~

Any clues or suggestions?

+3  A: 

You need to escape the -, as \-, since inside a character class it is a special character (the range operator). The _ is fine as is.

So: ^([a-z0-9_\-])+|~$
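In Java source code the backslash must itself be escaped, so \- is written as "\\-" inside a string literal. Below is a minimal sketch using java.util.regex; the class name EscapedDash and the sample strings are illustrative only, and the character class is kept lowercase exactly as in this answer:

```java
import java.util.regex.Pattern;

class EscapedDash {
    // "\\-" in the Java literal becomes \- in the regex: a literal dash, not a range
    static final Pattern WORD_OR_TILDE = Pattern.compile("^([a-z0-9_\\-])+|~$");

    static boolean isValid(String s) {
        // matches() only succeeds if the entire input is consumed
        return WORD_OR_TILDE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"00000", "abc_def-1", "~", "abc~"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first three inputs match; "abc~" does not, because each alternative has to cover the whole string on its own.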

Edit: your last input string will not match, because the regular expression you are using matches either a string of alphanumeric characters (plus - and _) or a tilde (because of the pipe), but not both. If you want to allow an optional tilde at the end, change it to:

^([a-z0-9_\-])+(~?)$
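The effect of the pipe is easiest to see with Matcher.find(), which (unlike String.matches()) accepts a match anywhere in the input. A sketch with hypothetical class and field names, comparing the ungrouped pattern, a grouped version, and the optional-tilde pattern:

```java
import java.util.regex.Pattern;

class PipePrecedence {
    // | binds loosest: this is (^[a-z0-9_\-]+) OR (~$), each anchored at one end only
    static final Pattern UNGROUPED = Pattern.compile("^[a-z0-9_\\-]+|~$");
    // Grouping the alternatives makes both anchors apply to both branches
    static final Pattern GROUPED = Pattern.compile("^(?:[a-z0-9_\\-]+|~)$");
    // The variant from this answer: word characters optionally followed by one tilde
    static final Pattern OPTIONAL_TILDE = Pattern.compile("^([a-z0-9_\\-])+(~?)$");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(found(UNGROUPED, "abc!!!"));     // true: "abc" satisfies the first branch
        System.out.println(found(GROUPED, "abc!!!"));       // false: the whole input must fit
        System.out.println(found(OPTIONAL_TILDE, "abc~"));  // true
        System.out.println(found(OPTIONAL_TILDE, "~"));     // false: needs at least one word character
    }
}
```

Grouping with (?:...) is one way to make both anchors cover both alternatives; with String.matches() the anchors are redundant anyway, since the whole input must match.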

danben
Or you could put the dash as the first character, which tells the parser not to treat it as a range delimiter.
Max Shawabkeh
Tried using `^([a-zA-Z0-9_\\-])+|~$`. Doesn't work
Pablo Fernandez
@Max tried `^([-a-zA-Z0-9_])+|~$`. Doesn't work
Pablo Fernandez
What does it do?
danben
If it isn't throwing an error, and is just not matching, please post the input string.
danben
I cannot group everything, because I need either a number or just the tilde. If there's a tilde, it must be the only character there.
Pablo Fernandez
Ok, I moved that out. Not sure where you want the - and _. But if you're having problems, please post the input and observed behavior.
danben
+3  A: 

If you put the - first, it won't be interpreted as the range indicator.

^([-a-zA-Z0-9_])+|~$

Using the following code, this matches all of your examples except the last one:

String str = "A8ft-y6hDu ~";
System.out.println("Result: " + str.matches("^([-a-zA-Z0-9_])+|~$"));

That last example won't match because it doesn't fit your description. The regex will match any combination of alphanumerics, -, and _, OR a ~ character.
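The two-line snippet above, expanded into a complete program that runs every sample input from the question plus a lone tilde. The class and method names are just for illustration; the regex is the one from this answer:

```java
class SampleCheck {
    // Dash placed first in the class, so it is a literal '-', not a range operator
    static final String REGEX = "^([-a-zA-Z0-9_])+|~$";

    static boolean check(String s) {
        // String.matches() requires the whole string to match the regex
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "woZOQNVddd", "00000", "ncnW0mL14-", "dEowBO_Eu7",
            "7MyG4XqFz-", "A8ft-y6hDu ~", "~"
        };
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + check(s));
        }
    }
}
```

Every input prints true except "A8ft-y6hDu ~", which fails because the space and the tilde cannot be covered by either alternative.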

Bill the Lizard
fails at `ncnW0mL14-`
Pablo Fernandez
That's because I hadn't added the A-Z yet. It matches now.
Bill the Lizard
Thanks @Bill, it works! (I'm using @cletus's since it's more succinct.)
Pablo Fernandez
@Pablo: Yeah, I upvoted @cletus, since his answer is the one I would use, too. :)
Bill the Lizard
+4  A: 

The - is a special character within square brackets: it indicates a range. If it's not at either end of the character class, it needs to be escaped by putting a \ before it.
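A quick way to confirm where a dash is literal and where it forms a range inside a character class. This sketch only exercises java.util.regex (other flavors may differ in edge cases); the helper m is a hypothetical name:

```java
class DashPosition {
    // Full-string match of s against regex
    static boolean m(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        // Dash at either end of the class: literal
        System.out.println(m("[-az]", "-"));   // true
        System.out.println(m("[az-]", "-"));   // true
        // Escaped dash in the middle: literal, so the class is just {a, -, z}
        System.out.println(m("[a\\-z]", "-")); // true
        System.out.println(m("[a\\-z]", "b")); // false
        // Unescaped dash in the middle: a range from a to z
        System.out.println(m("[a-z]", "b"));   // true
        System.out.println(m("[a-z]", "-"));   // false
    }
}
```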

It's worth pointing out a shortcut: \w is equivalent to [0-9a-zA-Z_], so I think this is more readable:

^[\w-]+|~$
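The \w shortcut as it would appear in Java, where the backslash is doubled in the string literal. This sketch assumes Matcher.matches(), which already requires a full match, so the ^ and $ anchors are left out; the class name WordOrTilde is illustrative:

```java
import java.util.regex.Pattern;

class WordOrTilde {
    // \w is [a-zA-Z_0-9] in Java by default; the trailing '-' is literal
    static final Pattern P = Pattern.compile("[\\w-]+|~");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"dEowBO_Eu7", "7MyG4XqFz-", "~", "a~", "A8ft-y6hDu ~"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first three inputs match; "a~" and "A8ft-y6hDu ~" do not, since a tilde is only accepted on its own.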
cletus
Actually I had a typo. It works, thanks
Pablo Fernandez
+1 - Somehow it had escaped (no pun intended) my notice that `\w` would also match an `_` character. I guess that's so we can use it to match really bad variable names.
Bill the Lizard
@Bill: it's often forgotten or simply not known. It's a weird one. I'd prefer it weren't the case: many times I haven't used `\w` because of it, yet it would be easy to write `[\w_]` if you needed the underscore. And I think most of the time you don't care about `_`.
cletus
Note: `\w` might match far more than just ASCII in some regex flavors (not JavaScript, though). In those flavors it's closer to `[\p{L}\p{N}_]` (that is, any Unicode letter or number, plus the underscore).
Joey
@Johannes: what flavours?
cletus
.NET, Perl, Tcl and a few others: http://www.regular-expressions.info/refflavors.html
Joey
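For Java in particular, \w is ASCII-only by default (equivalent to [a-zA-Z_0-9]); since Java 7 the Pattern.UNICODE_CHARACTER_CLASS flag switches it to a Unicode-aware definition. A small sketch showing the difference on a non-ASCII input; the class name UnicodeWord is illustrative:

```java
import java.util.regex.Pattern;

class UnicodeWord {
    // Default: \w means [a-zA-Z_0-9]
    static final Pattern ASCII_WORD = Pattern.compile("[\\w-]+");
    // UNICODE_CHARACTER_CLASS (Java 7+): \w follows the Unicode definition of word characters
    static final Pattern UNICODE_WORD =
            Pattern.compile("[\\w-]+", Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        String s = "caf\u00e9-1"; // "café-1"
        System.out.println(ASCII_WORD.matcher(s).matches());   // false: é is not ASCII
        System.out.println(UNICODE_WORD.matcher(s).matches()); // true
    }
}
```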