views:

172

answers:

4

Below is example of text:

String id = "A:abc,X:def,F:xyz,A:jkl";

Below is regex:

Pattern p = Pattern.compile("(.*,)?[AC]:[^:]+$");
if(p.matcher(id).matches()) {
  System.out.println("Hello world!")
}

When executed above code should print Hello world!.

Does this regex can be modified to gain more performance?

+7  A: 

As I can't see your entire code, I can only assume that you do the pattern compilation inside your loop/method/etc. One thing that can improve performance is to compile at the class level and not recompile the pattern each time. Other than that, I don't see much else that you could change.

JasCav
Pattern.compile was inside loop and I have moved out, But I am checking if there is chance to improve in regex.
Jaydeep
+2  A: 
Pattern p = Pattern.compile(".*[AC]:[^:]+$");
if(p.matcher(id).matches()) {
  System.out.println("Hello world!")
}

As you seem to only be interested if it the string ends in A or C followed by a colon and some characters which aren't colons you can just use .* instead of (.*,)? (or do you really want to capture the stuff before the last piece?)

If the stuff after the colon is all lower case you could even do

Pattern p = Pattern.compile(".*[AC]:[a-z]+$");

And if you are going to match this multiple times in a row (e.g. loop) be sure to compile the pattern outside of the loop.

e,g

Pattern p = Pattern.compile(".*[AC]:[a-z]+$");
Matcher m = p.matcher(id);
while(....) {
  ...
  // m.matches()
  ...
  // prepare for next loop m.reset(newvaluetocheck);
}
jitter
Yeah that's more like what I'd use. If the part before colons can be more than one character you should probably add a word boundary too. Or a look-behind that checks for beginning of string/comma.
Blixt
One reason to leave the comma check is to avoid finding something like NA:jkl at the end of the string.
PSpeed
+1  A: 

Move Pattern instantiation to a final static field (erm, constant), in your current code you're recompiling essentially the same Pattern every single time (no, Pattern doesn't cache anything!). That should give you some noticeable performance boost right off the bat.

Esko
A: 

Do you even need to use regualr expressions? It seems there isn't a huge variety in what you are testing.

If you need to use the regex as others have said, compiling it only once makes sense and if you only need to check the last token maybe you could simplify the regex to: [AC]:[^:]{3}$.

Could you possibly use something along these lines (untested...)?

private boolean isId(String id)
    {
     char[] chars = id.toCharArray();
     boolean valid = false;
     int length = chars.length;

     if (length >= 5 && chars[length - 4] == ':')
     {
      char fifthToLast = chars[length - 5];

      if (fifthToLast == 'A' || fifthToLast == 'C')
      {
       valid = true;

       for (int i = length - 1; i >= length - 4; i--)
       {
        if (chars[i] == ':')
        {
         valid = false;
         break;
        }
       }
      }
     }

     return valid;
    }
Ed