I have a requirement where each component of a method signature has to be identified by its own regular expression. For example, there needs to be a regex for the return type, one for the method name, one for the argument type and one for the argument name. So far I have come up with the following expression:

([^,]+) ([^,]+)\((([^,]+) ([^,]+))\)

It works well for a method signature like:

ReturnType foo(Arg parameter)

The regular expression identifies ReturnType, foo, Arg and parameter separately.

Now the problem is that a method can have zero, one or multiple arguments separated by commas. I am not able to come up with a repeating expression for this. Any help will be appreciated.

+1  A: 

If you choose to go down the road of using regex/String manipulation, you could pull out the entire argument string, split it on commas and split the resulting strings on white space.
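A minimal sketch of that split-based approach, in case it helps. The class name SignatureSplitter and the SIGNATURE pattern are my own illustration, not something from the question, and the sketch assumes a bare signature with no modifiers, annotations or generics:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls out the argument string, then splits it.
class SignatureSplitter {
    // Group 1: return type, group 2: method name, group 3: raw argument string (may be empty).
    private static final Pattern SIGNATURE =
            Pattern.compile("\\s*(\\S+)\\s+(\\w+)\\s*\\(([^)]*)\\)\\s*");

    /** Returns [returnType, methodName, argType1, argName1, ...]. */
    static List<String> parse(String signature) {
        Matcher m = SIGNATURE.matcher(signature);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized signature: " + signature);
        }
        List<String> parts = new ArrayList<>();
        parts.add(m.group(1));
        parts.add(m.group(2));

        String args = m.group(3).trim();
        if (!args.isEmpty()) {
            // Split the argument string on commas, then each piece on white space.
            for (String arg : args.split("\\s*,\\s*")) {
                String[] typeAndName = arg.trim().split("\\s+");
                if (typeAndName.length != 2) {
                    throw new IllegalArgumentException("Unrecognized argument: " + arg);
                }
                parts.add(typeAndName[0]);
                parts.add(typeAndName[1]);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("ReturnType foo(Arg parameter)")); // [ReturnType, foo, Arg, parameter]
        System.out.println(parse("void bar()"));                    // [void, bar]
        System.out.println(parse("int baz(String a, int b)"));      // [int, baz, String, a, int, b]
    }
}
```

Because the splitting happens outside the regex, the zero/one/many argument cases all go through the same loop.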

That said, I agree with JaredPar's comment on your question, at least if you expect to handle everything that is valid in a Java API.

For example, there is a series of keywords that can prefix your method (public/private, static, final). Annotations can also appear on either the method or the parameters. Even something as simple as a tab or newline instead of a space between the return type and the method name will break your current regex.

Good Luck

Angelo Genovese
+1  A: 

Let's abstract this out a bit, and say we want to match a (possibly empty) list of digits separated by commas.

(empty)
12
12,34
12,34,56

The pattern is therefore

^$|^\d+(,\d+)*$

Now you can try to replace the components to match what you want:

  • Instead of \d+, whatever regex you use to match type name and identifier
  • Maybe allow \s* around the comma
  • Maybe you'd even add the special varargs last argument (which can also be the first and only)
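Here is a rough sketch of the first two substitutions. The names ArgumentListPattern, ARG, ARG_LIST and ONE_ARG are made up for illustration, and ARG assumes each argument is just two plain identifiers ("Type name"), with no varargs, arrays or generics. One thing to be aware of: a repeated capturing group only retains its last iteration, so the sketch validates the list with one pattern and then extracts the individual arguments with a second find() loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of swapping \d+ for an argument pattern.
class ArgumentListPattern {
    // Stand-in for \d+: one argument written as "Type name" (simple identifiers only).
    static final String ARG = "\\w+\\s+\\w+";

    // Same shape as ^$|^\d+(,\d+)*$ with \s* allowed around each comma.
    static final Pattern ARG_LIST =
            Pattern.compile("^$|^" + ARG + "(\\s*,\\s*" + ARG + ")*$");

    // Captures the type and the name of a single argument.
    static final Pattern ONE_ARG = Pattern.compile("(\\w+)\\s+(\\w+)");

    static boolean isArgumentList(String s) {
        return ARG_LIST.matcher(s).matches();
    }

    /** Returns one {type, name} pair per argument; the list is empty for "". */
    static List<String[]> arguments(String s) {
        if (!isArgumentList(s)) {
            throw new IllegalArgumentException("Not an argument list: " + s);
        }
        List<String[]> result = new ArrayList<>();
        Matcher m = ONE_ARG.matcher(s);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2)});
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] arg : arguments("String a, int b")) {
            System.out.println("type=" + arg[0] + " name=" + arg[1]);
        }
    }
}
```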

Note that if you allow generic type parameters, then you definitely can't use regex in general, since you can nest the <...> and the language of balanced parentheses of arbitrary depth is not regular.

Although you can argue that in practice, no one would ever nest type parameters deeper than, say, 3 levels, so then it becomes regular again.
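To illustrate the bounded-depth argument, here is a small sketch that builds a type pattern by expanding the nesting a fixed number of times. BoundedGenericType and typePattern are hypothetical names, and the pattern deliberately ignores wildcards, bounds, arrays and qualified names. Note that the generated pattern roughly doubles in length with every extra level, which is one more reason a parser scales better.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: generics become regular once the nesting depth is capped.
class BoundedGenericType {
    /** Builds a pattern for a type name with at most `depth` levels of <...> nesting. */
    static String typePattern(int depth) {
        String type = "\\w+"; // depth 0: a plain identifier
        for (int i = 0; i < depth; i++) {
            // An identifier, optionally followed by <T, T, ...> where T is the previous level.
            type = "\\w+(?:<\\s*" + type + "(?:\\s*,\\s*" + type + ")*\\s*>)?";
        }
        return type;
    }

    static boolean isType(String s, int depth) {
        return Pattern.matches(typePattern(depth), s);
    }

    public static void main(String[] args) {
        System.out.println(isType("Map<String, List<Integer>>", 2)); // true
        System.out.println(isType("Map<String, List<Integer>>", 1)); // false: nested too deep
    }
}
```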

That said, a proper parser is really the best tool for this. Just look for an implementation of the Java grammar, say, in ANTLR.


polygenelubricants
Thank you polygenelubricants for the pointers. I was able to generate the expected output for the mentioned problem along similar lines.
nitesh