
I need a regex that will match a Java method declaration. I have come up with one that matches a method declaration, but it requires the opening brace of the method to be on the same line as the declaration. If you have suggestions for improving my regex, or simply have a better one, please submit an answer.

Here is my regex: "\w+ +\w+ *\(.*\) *\{"

For those who do not know what a java method looks like I'll provide a basic one:

int foo()
{

}

There are several optional parts that may be added to Java methods as well, but those are the only parts that a method is guaranteed to have.

Update: My current regex is "\w+ +\w+ *\([^\)]*\) *\{", which prevents the situation that Mike and akdom described.
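As a rough illustration (a sketch, not part of the original question; the class name `BraceCheck` is hypothetical), here is the updated regex run through `java.util.regex`. It finds the declaration when `{` is on the same line, but not when it is on the next line, because ` *` only consumes spaces. One possible tweak, assumed here rather than taken from the thread, is to use `\s*` before `\{` so the gap may include line breaks; `[^\)]*` already crosses newlines, since a negated character class also matches `\n`.

```java
import java.util.regex.Pattern;

public class BraceCheck {
    // The question's updated regex: only spaces may appear between ")" and "{".
    static final Pattern SAME_LINE = Pattern.compile("\\w+ +\\w+ *\\([^\\)]*\\) *\\{");

    // Assumed variant (not from the thread): \s* also accepts tabs and line breaks.
    static final Pattern ANY_LINE = Pattern.compile("\\w+ +\\w+ *\\([^\\)]*\\)\\s*\\{");

    public static void main(String[] args) {
        String sameLine = "int foo() {\n}";
        String nextLine = "int foo()\n{\n\n}";

        System.out.println(SAME_LINE.matcher(sameLine).find()); // true
        System.out.println(SAME_LINE.matcher(nextLine).find()); // false
        System.out.println(ANY_LINE.matcher(nextLine).find());  // true
    }
}
```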

+2  A: 

Have you considered matching the actual possible keywords, such as:

(?:(?:public|private|static|protected)\s+)*

It might be a bit more likely to match correctly, though it might also make the regex harder to read...
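A hedged Java sketch of the keyword idea, assuming the intent is that any run of modifiers, each followed by whitespace, may precede the return type. The `\s+` has to sit outside the alternation so that it applies after every keyword, not just the last one. The class name and the rest of the pattern (adapted from the question's regex) are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModifierPrefix {
    // Group 1: zero or more modifiers, each followed by whitespace.
    // Group 2: return type. Group 3: method name.
    static final Pattern DECL = Pattern.compile(
            "((?:(?:public|private|protected|static)\\s+)*)(\\w+)\\s+(\\w+)\\s*\\([^)]*\\)\\s*\\{");

    public static void main(String[] args) {
        Matcher m = DECL.matcher("public static int foo(int x) {");
        if (m.find()) {
            System.out.println(m.group(1).trim()); // public static
            System.out.println(m.group(2));        // int
            System.out.println(m.group(3));        // foo
        }
    }
}
```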

Mike Stone
That regex ended up matching all of my method calls, such as System.out.println(), instead of just the method declarations.
Anton
+2  A: 

I'm pretty sure Java's regex quantifiers are greedy by default, meaning that "\w+ +\w+ *\(.*\) *\{" will never match, since the .* inside the parentheses will eat everything after the opening paren. I recommend you replace the .* with [^)]*; that way it will match only non-closing-paren characters.

NOTE: Mike Stone corrected me in the comments. Since most people don't really open the comments (I know I frequently don't notice them), here is his correction:

Greedy doesn't mean it will never match... but it will eat parens if there are more parens after to satisfy the rest of the regex... so for example "public void foo(int arg) { if (test) { System.exit(0); } }" will not match properly...
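To make this concrete, here is a small Java sketch (class and helper names are hypothetical) that runs both versions against the example string above. The greedy `.*` gives back characters until it reaches the last `)` that is followed by ` *{`, which is the one after `(test)`; `[^)]*` cannot cross a `)`, so it stops at the end of the parameter list. Both matches start at the return type `void`, since the question's pattern has no slot for the `public` modifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    static final String INPUT =
            "public void foo(int arg) { if (test) { System.exit(0); } }";

    // Greedy .* backtracks only as far as the LAST ")" followed by " *{".
    static final Pattern GREEDY = Pattern.compile("\\w+ +\\w+ *\\(.*\\) *\\{");

    // [^)]* cannot consume a ")", so the match ends at the first closing paren.
    static final Pattern NEGATED = Pattern.compile("\\w+ +\\w+ *\\([^)]*\\) *\\{");

    // Returns the first match in INPUT, or null if there is none.
    static String firstMatch(Pattern p) {
        Matcher m = p.matcher(INPUT);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(GREEDY));  // void foo(int arg) { if (test) {
        System.out.println(firstMatch(NEGATED)); // void foo(int arg) {
    }
}
```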

akdom
A: 

From the little testing I did, my regex seemed to work with .* in it, but [^)]* does seem like a better idea, so I'm replacing it.

Anton
+1  A: 

I came up with this:

\b\w*\s*\w*\(.*?\)\s*\{[\x21-\x7E\s]*\}

I tested it against a PHP function, but it should work just the same. This is the snippet of code I used:

function getProfilePic($url)
 {
    if(@open_image($url) !== FALSE)
     {
     @imagepng($image, 'images/profiles/' . $_SESSION['id'] . '.png');
     @imagedestroy($image);
     return TRUE;
     }
    else 
     {
     return FALSE;
     }
 }

MORE INFO:

Options: case insensitive

Assert position at a word boundary «\b»
Match a single character that is a “word character” (letters, digits, etc.) «\w*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a “word character” (letters, digits, etc.) «\w*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “(” literally «\(»
Match any single character that is not a line break character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “)” literally «\)»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “{” literally «\{»
Match a single character present in the list below «[\x21-\x7E\s]*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   A character in the range between ASCII character 0x21 (33 decimal) and ASCII character 0x7E (126 decimal) «\x21-\x7E»
   A whitespace character (spaces, tabs, line breaks, etc.) «\s»
Match the character “}” literally «\}»


Created with RegexBuddy
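A Java sketch of this pattern (the class name and sample input are hypothetical). Because `[\x21-\x7E\s]*` is greedy and its class includes `}`, it gives back characters only until the final `}` in the input, so when two methods sit back to back a single match spans both. The class is also limited to printable ASCII plus whitespace, so a match cannot extend past any non-ASCII character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyMatchDemo {
    // Unkwntech's pattern; CASE_INSENSITIVE mirrors the "case insensitive" option above.
    static final Pattern WITH_BODY = Pattern.compile(
            "\\b\\w*\\s*\\w*\\(.*?\\)\\s*\\{[\\x21-\\x7E\\s]*\\}",
            Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        // Two hypothetical methods, one after the other.
        String source = "int a() { return 1; }\nint b() { return 2; }";
        Matcher m = WITH_BODY.matcher(source);
        if (m.find()) {
            // The greedy class runs to the end, then backtracks to the LAST "}",
            // so the single match covers both methods.
            System.out.println(m.group().equals(source)); // true
        }
    }
}
```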
Unkwntech
+1  A: 

A tip:

If you are going to write the regex in Perl, use the "xms" options so that you can add whitespace and comments to document the regex. For example, you can write a regex like:

 m<\w+ \s+       #return type
   \w+ \s*       #function name
   [(] [^)]* [)] #params
   \s* [{]       #opening brace
  >xms

The x option is the one that allows # comments inside a regex. Also, use \s instead of a literal " ": \s matches any whitespace character, so tabs (and line breaks) also match, which is what you want here. In Perl you don't need to use / / as delimiters; you can use { }, < >, or | | instead.

Not sure if other languages have this ability. If they do, then please use it.

Is there a similar option in Java?
Anton
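Java's counterpart to Perl's x modifier is the `Pattern.COMMENTS` flag, or the embedded flag `(?x)`: unescaped whitespace in the pattern is ignored, and `#` starts a comment that runs to the end of the line (Perl's s and m correspond to `Pattern.DOTALL` and `Pattern.MULTILINE`). Below is a minimal sketch porting the Perl example above; the class name is hypothetical. Since each comment runs to a line break, every piece of the Java string ends with `\n`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedRegex {
    // COMMENTS ignores unescaped whitespace and treats '#' to end-of-line as a comment,
    // so each line below ends with "\n" to terminate its comment.
    static final Pattern METHOD_DECL = Pattern.compile(
              "\\w+ \\s+        # return type\n"
            + "\\w+ \\s*        # method name\n"
            + "[(] [^)]* [)]    # parameter list\n"
            + "\\s* [{]         # opening brace\n",
            Pattern.COMMENTS);

    // The same mode can be switched on inline with (?x).
    static final Pattern INLINE =
            Pattern.compile("(?x) \\w+ \\s+ \\w+ \\s* [(] [^)]* [)] \\s* [{]");

    public static void main(String[] args) {
        Matcher m = METHOD_DECL.matcher("int foo()\n{\n}");
        System.out.println(m.find() ? m.group() : "no match");
        System.out.println(INLINE.matcher("int foo()\n{\n}").find()); // true
    }
}
```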
+3  A: 
(public|protected|private|static|\s) +[\w\<\>\[\]]+\s+(\w+) *\([^\)]*\) *(\{?|[^;])

I think the above regexp can match almost all possible combinations of Java method declarations, even those with generics and arrays as return types, which the regexp provided by the original author did not match.
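A small Java harness for this regexp (the class and helper names are hypothetical, and the test lines are made up to exercise the generics and array cases mentioned above). Group 2 captures the method name. One thing visible in the pattern itself: `\{?` can match the empty string, so the final group never actually requires a `{`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeclarationFinder {
    // Georgios Gousios's regexp, unchanged; group 2 captures the method name.
    static final Pattern DECL = Pattern.compile(
            "(public|protected|private|static|\\s) +[\\w\\<\\>\\[\\]]+\\s+(\\w+) *\\([^\\)]*\\) *(\\{?|[^;])");

    // Returns the captured method name, or null when nothing matches.
    static String methodName(String line) {
        Matcher m = DECL.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(methodName("public List<String> names() {")); // names
        System.out.println(methodName("private int[] values(int n) {")); // values
        System.out.println(methodName("System.out.println(x);"));         // null
    }
}
```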

Georgios Gousios
I think that would do it. It also excludes constructors, which is nice :-)
James Camfield