I have the following pattern:

(COMPANY) -277.9887 (ASP,) -277.9887 (INC.)

I want the final output to be:

COMPANY ASP, INC.

Currently I have the following code, and it keeps returning the original pattern (I assume because the whole group falls between the first '(' and the last ')'):

Pattern p = Pattern.compile("((.*))",Pattern.DOTALL);
Matcher matcher = p.matcher(eName);
while(matcher.find())
{
    System.out.println("found match:"+matcher.group(1));
}

I am struggling to get the results I need and appreciate any help. I am not worried about concatenating the results after I get each group, just need to get each group.

+1  A: 
Pattern p = Pattern.compile("\\((.*?)\\)",Pattern.DOTALL);
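
For context, here is a minimal sketch of that pattern dropped into the loop from the question. The class name ParenGroups and the helper extract are made up for illustration, and the joining step is just one possible way to build the final string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original post.
class ParenGroups {
    // Escaped parentheses match literally; the lazy .*? stops at the nearest ')'.
    private static final Pattern GROUP = Pattern.compile("\\((.*?)\\)", Pattern.DOTALL);

    // Collects the text captured inside each pair of parentheses.
    static List<String> extract(String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = GROUP.matcher(input);
        while (matcher.find()) {
            groups.add(matcher.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String eName = "(COMPANY) -277.9887 (ASP,) -277.9887 (INC.)";
        // Prints: COMPANY ASP, INC.
        System.out.println(String.join(" ", extract(eName)));
    }
}
```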
chaos
chaos, you rock! I never thought to try that pattern and it worked exactly as I needed. Thanks for taking the time to answer!
northpole
You're welcome. :)
chaos
A: 

Not a direct answer to your question but I recommend you use RegxTester to get to the answer and any future question quickly. It allows you to test in realtime.

Oliver
It supports .NET regexes though, not java.
wds
+4  A: 

Your .* quantifier is 'greedy', so yes, it's grabbing everything between the first and last available parentheses. As chaos says, tersely :), .*? is the non-greedy form, so it will grab as little as possible while still allowing the match.

You also need to escape the parentheses within the regex; otherwise they form another group. That's assuming there are literal parentheses in your string. I suspect that what you referred to in the initial question as your pattern is in fact your string.
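
To make the difference concrete, here is a small sketch comparing the greedy and non-greedy forms, both with escaped parentheses, on the string from the question. The class name GreedyVsLazy and the helper firstGroup are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class GreedyVsLazy {
    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String eName = "(COMPANY) -277.9887 (ASP,) -277.9887 (INC.)";
        // Greedy: runs from the first '(' to the last ')'.
        // Prints: COMPANY) -277.9887 (ASP,) -277.9887 (INC.
        System.out.println(firstGroup("\\((.*)\\)", eName));
        // Non-greedy: stops at the first ')' it can.
        // Prints: COMPANY
        System.out.println(firstGroup("\\((.*?)\\)", eName));
    }
}
```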

Query: are "COMPANY", "ASP," and "INC." required?

If you must have values for them, then use + instead of *. The + is one-or-more and the * is zero-or-more, so a * would also match the literal string "()".

eg: "\\((.+?)\\)"
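
A quick sketch of the * versus + difference, using a made-up input that ends with an empty pair of parentheses. The class StarVsPlus and the helper captures are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class StarVsPlus {
    // Returns group 1 of every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "(A) ()"; // assumed sample with a trailing empty pair
        // .*? accepts the empty pair, so two groups are found: "A" and "".
        System.out.println(captures("\\((.*?)\\)", input).size()); // 2
        // .+? needs at least one character, so only "A" is found here.
        System.out.println(captures("\\((.+?)\\)", input).size()); // 1
    }
}
```

One caveat: when an empty pair is followed by more text containing a ')', .+? can stretch across it. For example, on "() (X)" it captures ") (X".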

ptomli
this is a great tip thanks, I will only have () if there is in fact a value between them. But I will mark this for the future. Thanks!
northpole
You can and possibly should try to be more specific than '.' in the matching of your groups too. If the string is expected to have only uppercase letters and maybe the illustrated punctuation, then you can try something like "\(([A-Z,.]+?)\)". In this way, dodgy data is at least noticed and can be corrected.
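
As a rough illustration of the stricter character class, assuming the groups really are limited to uppercase letters, commas and periods (the class name StrictGroups and the "dodgy" sample are invented here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class StrictGroups {
    // Only uppercase letters, commas and periods are allowed inside the parentheses.
    private static final Pattern STRICT = Pattern.compile("\\(([A-Z,.]+?)\\)");

    static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = STRICT.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // All three groups fit the character class.
        System.out.println(extract("(COMPANY) -277.9887 (ASP,) -277.9887 (INC.)").size()); // 3
        // "(a1b)" contains lowercase letters and a digit, so it is not matched.
        System.out.println(extract("(COMPANY) -277.9887 (a1b)").size()); // 1
    }
}
```

A shorter-than-expected result is then a hint that some group did not have the expected shape.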
ptomli
A: 

If your strings are always going to look like that, you could get away with just using a couple calls to replaceAll instead. This seems to work for me:

String eName = "(COMPANY) -277.9887 (ASP,) -277.9887 (INC.)";
String eNameEdited = eName.replaceAll("\\).*?\\("," ").replaceAll("\\(|\\)","");
System.out.println(eNameEdited);

Probably not the most efficient thing in the world, but fairly simple.
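
If efficiency does matter, one option is to compile the two patterns once and reuse them, since String.replaceAll compiles its regex on every call. This is only a sketch; the class name StripParens and the method clean are made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original post.
class StripParens {
    // Same two regexes as above, compiled once and reused.
    private static final Pattern BETWEEN = Pattern.compile("\\).*?\\(");
    private static final Pattern PARENS = Pattern.compile("\\(|\\)");

    static String clean(String input) {
        // Replace everything from a ')' up to the next '(' with a single space...
        String joined = BETWEEN.matcher(input).replaceAll(" ");
        // ...then drop the remaining outer parentheses.
        return PARENS.matcher(joined).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints: COMPANY ASP, INC.
        System.out.println(clean("(COMPANY) -277.9887 (ASP,) -277.9887 (INC.)"));
    }
}
```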

Brent Nash