I need to create a regular expression that allows a string to contain any number of:

  • alphanumeric characters
  • spaces
  • (
  • )
  • &
  • .

No other characters are permitted. I used RegexBuddy to construct the following regex, which works correctly when I test it within RegexBuddy:

\w* *\(*\)*&*\.*

Then I used RegexBuddy's "Use" feature to convert this into Java code, but it doesn't appear to work correctly using a simple test program:

public class RegexTest
{
  public static void main(String[] args)
  {
    String test = "(AT) & (T)."; // Should be valid
    System.out.println("Test string matches: "
      + test.matches("\\w* *\\(*\\)*&*\\.*")); // Outputs false
  }
}
I must admit that I have a bit of a blind spot when it comes to regular expressions. Can anyone explain why it doesn't work, please?
+4  A: 

Maybe I'm misunderstanding your description, but aren't you essentially defining a class of characters in no particular order rather than a specific sequence? Shouldn't your regexp have the structure [xxxx]+, where xxxx are the actual characters you want?

Uri
+13  A: 

That regular expression tests for any number of word characters (\w), followed by any number of spaces, followed by any number of open parens, followed by any number of close parens, followed by any number of ampersands, followed by any number of periods, in exactly that order.

What you want is...

test.matches("[\\w \\(\\)&\\.]*")

As mentioned by mmyers, this allows the empty string. If you do not want to allow the empty string...

test.matches("[\\w \\(\\)&\\.]+")

That will still allow a string that is only spaces, or only periods, etc. If you want to ensure at least one alphanumeric character...

test.matches("[\\w \\(\\)&\\.]*\\w+[\\w \\(\\)&\\.]*")

So you understand what the regular expression is saying: anything within square brackets ("[]") denotes a set of characters. So, where "a*" means 0 or more a's, "[abc]*" means 0 or more characters, each of which is an a, b, or c.
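To make the difference between the three variants concrete, here is a minimal sketch that runs each of them against a few inputs. The class name CharacterClassDemo and its helper method are made up for illustration; the patterns are copied from the answer as written.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the three pattern strings come from the answer.
final class CharacterClassDemo {
    // Any mix of the permitted characters, including the empty string.
    static final Pattern ANY = Pattern.compile("[\\w \\(\\)&\\.]*");
    // Same character set, but at least one character is required.
    static final Pattern NON_EMPTY = Pattern.compile("[\\w \\(\\)&\\.]+");
    // Same character set, with at least one word character somewhere.
    static final Pattern WITH_WORD_CHAR =
        Pattern.compile("[\\w \\(\\)&\\.]*\\w+[\\w \\(\\)&\\.]*");

    // True only if the whole input matches the pattern.
    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = { "(AT) & (T).", "", " . ", "a#b" };
        for (String input : inputs) {
            System.out.println("\"" + input + "\""
                + " any=" + matches(ANY, input)
                + " nonEmpty=" + matches(NON_EMPTY, input)
                + " withWordChar=" + matches(WITH_WORD_CHAR, input));
        }
    }
}
```

Two things worth knowing if you adapt this: inside square brackets the backslashes before the parens and the period are optional, and in Java \w also matches the underscore, so it is slightly broader than "alphanumeric".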

Illandril
Good, but you should mention that this allows the empty string, which may not be his intent.
Michael Myers
It's okay, it only gets to the regex test if the string isn't empty.
John Topley
Well I expanded the answer anyway to include a few other alternatives, and explain how it works a little.
Illandril
Excellent, thanks.
John Topley
+2  A: 

the regex

\w* *\(*\)*&*\.*

matches the items you described, but only in the order you listed them, each repeated as many times as you like. So "skjhsklasdkjgsh((((())))))&&&&&....." works, but a string that mixes the characters does not.
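A small sketch of that behaviour, using the original regex with String.matches(). The class and method names are invented for this example; the two input strings are the ones from this thread.

```java
// Hypothetical demo class showing that the original regex is order-sensitive.
final class OrderedSequenceDemo {
    // The regex from the question, written as a Java string literal.
    static final String ORIGINAL = "\\w* *\\(*\\)*&*\\.*";

    // String.matches() succeeds only if the whole input fits the pattern.
    static boolean matchesOriginal(String input) {
        return input.matches(ORIGINAL);
    }

    public static void main(String[] args) {
        // Characters appear in the same order as the regex lists them.
        String ordered = "skjhsklasdkjgsh((((())))))&&&&&.....";
        // Characters are interleaved, as in the question's test string.
        String mixed = "(AT) & (T).";

        System.out.println("ordered: " + matchesOriginal(ordered));
        System.out.println("mixed:   " + matchesOriginal(mixed));
    }
}
```

The ordered string is accepted because every group of characters arrives in the position the regex expects; the mixed string is rejected because a letter follows the first open paren, and the regex has already moved past \w* by then.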

You want a regex like this:

\[\w\(\)\&\.]+\

which will allow a mix of all characters.

edit: my regex knowledge is limited, so the above syntax may not be perfect.

Scott M.
Tomalak
The square brackets are not escaped; the regex is contained in backslashes. Other than that, thanks for the info. I'm not sure what needs to be escaped and what doesn't, so thanks.
Scott M.
Why on Earth would anybody want to contain a regular expression in backslashes? Nothing in your regex needs to be escaped.
Jan Goyvaerts
Alan Moore
Well, at least every day is a learning experience at stackoverflow :). Like I said, I'm just trying to help, but I do realize my regex knowledge is limited. Thanks for clearing those issues up.
Scott M.
+4  A: 

The difference between your Java code snippet and the Test tab in RegexBuddy is that the matches() method in Java requires the regular expression to match the whole string, while the Test tab in RegexBuddy allows partial matches. If you use your original regex in RegexBuddy, you'll see multiple blocks of yellow and blue highlighting. That indicates RegexBuddy found multiple partial matches in your string. To get a regex that works as intended with matches(), you need to edit it until the whole test subject is highlighted in yellow, or if you turn off highlighting, until the Find First button selects the whole text.

Alternatively, you can use the anchors \A and \Z at the start and the end of your regex to force it to match the whole string. When you do that, your regex always behaves in the same way, whether you test it in RegexBuddy, or whether you use matches() or another method in Java. Only matches() requires a full string match. All other Matcher methods in Java allow partial matches.
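The following sketch illustrates both points in plain Java, without RegexBuddy. The class PartialMatchDemo and its two helper methods are assumptions made for this example; Pattern, Matcher, find() and group() are the standard java.util.regex API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting partial matches with anchored matches.
final class PartialMatchDemo {
    // The regex from the question, written as a Java string literal.
    static final String ORIGINAL = "\\w* *\\(*\\)*&*\\.*";
    // A character-class regex of the kind suggested in the other answers.
    static final String CHARACTER_CLASS = "[\\w ()&.]*";

    // Collects every partial match that find() reports, in order.
    static List<String> partialMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Wraps the regex in \A ... \Z so that even find() must cover the input.
    // Note: \Z tolerates one trailing line terminator; \z does not.
    static boolean findAnchored(String regex, String input) {
        Pattern anchored = Pattern.compile("\\A(?:" + regex + ")\\Z");
        return anchored.matcher(input).find();
    }

    public static void main(String[] args) {
        String test = "(AT) & (T).";
        // Several short pieces, comparable to RegexBuddy's highlighted blocks.
        System.out.println(partialMatches(ORIGINAL, test));
        System.out.println(findAnchored(ORIGINAL, test));
        System.out.println(findAnchored(CHARACTER_CLASS, test));
    }
}
```

With the anchors in place, find() on the original regex fails for the same reason matches() does, while the character-class version succeeds on the whole string.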

Jan Goyvaerts
Great tip - thanks!
John Topley