If I have a string like this:

FOO[BAR]

I need a generic way to get the "BAR" string out of the string so that no matter what string is between the square brackets it would be able to get the string.

e.g.

FOO[DOG] = DOG
FOO[CAT] = CAT
+10  A: 

You should be able to use non-greedy quantifiers, specifically *?. You'll probably want the following:

Pattern MY_PATTERN = Pattern.compile("\\[(.*?)\\]");

This will give you a pattern that will match your string and put the text within the square brackets in the first group. Have a look at the Pattern API Documentation for more information.

To extract the string, you could use something like the following:

Matcher m = MY_PATTERN.matcher("FOO[BAR]");
while (m.find()) {
    String s = m.group(1);
    // s now contains "BAR"
}
Bryan Kyle
It's worth mentioning that if there is a newline between the square brackets, this will fail and you should use the Pattern.DOTALL flag to avoid that.
cletus
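To illustrate the newline comment above, here is a minimal sketch comparing the pattern with and without Pattern.DOTALL. The class name, constant names, and the firstGroup helper are made up for the example; only the pattern and the flag come from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotAllExample {
    // Without DOTALL, '.' does not match line terminators.
    static final Pattern SINGLE_LINE = Pattern.compile("\\[(.*?)\\]");
    // With DOTALL, '.' matches any character, including '\n'.
    static final Pattern MULTI_LINE = Pattern.compile("\\[(.*?)\\]", Pattern.DOTALL);

    // Hypothetical helper: text inside the first bracket pair, or null if nothing matches.
    static String firstGroup(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "FOO[BAR\nBAZ]";
        System.out.println(firstGroup(SINGLE_LINE, input)); // prints null
        System.out.println(firstGroup(MULTI_LINE, input));  // prints BAR and BAZ on two lines
    }
}
```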
Using the above pattern, how would you then use that to extract the string containing the string BAR? I'm looking at the Pattern API and the Matcher API but I'm still not sure how to get the string itself.
digiarnie
@cletus: Good call! @digiarnie: I've added a revision to the answer that contains some straw-man code for getting the match.
Bryan Kyle
+6  A: 

the non-regex way:

String input = "FOO[BAR]", extracted;
extracted = input.substring(input.indexOf("[") + 1, input.indexOf("]"));

alternatively, for slightly better performance/memory usage (thanks Hosam):

String input = "FOO[BAR]", extracted;
extracted = input.substring(input.indexOf('[') + 1, input.lastIndexOf(']'));
zaczap
I would use `lastIndexOf(']')` instead, which would handle nested brackets. Additionally, I believe using the `indexOf(char)` would be faster than `indexOf(String)`.
Hosam Aly
agreed, i'll add in an edit
zaczap
You're welcome. Your note about performance is also very relevant, since `lastIndexOf` will certainly be faster to find the closing bracket.
Hosam Aly
+1  A: 

I think your regular expression would look like:

/FOO\[(.+)\]/

Assuming that FOO is going to be constant.

So, to put this in Java:

Pattern p = Pattern.compile("FOO\\[(.+)\\]");
Matcher m = p.matcher(inputLine);
lacqui
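The snippet above compiles the pattern and creates the matcher but stops before reading the match. A possible sketch of the remaining step follows (the class and method names are invented for illustration). Note that .+ is greedy, so if a line holds more than one bracket pair the group runs through to the last ].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FooBracketExample {
    static final Pattern FOO_PATTERN = Pattern.compile("FOO\\[(.+)\\]");

    // Hypothetical helper: the text captured by group 1, or null if the line does not match.
    static String extract(String inputLine) {
        Matcher m = FOO_PATTERN.matcher(inputLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("FOO[BAR]"));              // prints BAR
        // Greedy .+ swallows everything up to the last ']' on the line.
        System.out.println(extract("FOO[BAR] and FOO[CAT]")); // prints BAR] and FOO[CAT
    }
}
```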
A: 

assuming that no other closing square bracket is allowed within, /FOO\[([^\]]*)\]/

Manu
A: 

I'd define that I want a maximum number of non-] characters between [ and ]. These need to be escaped with backslashes (and in Java, these need to be escaped again), and the definition of non-] is a character class, thus inside [ and ] (i.e. [^\\]]). The result:

FOO\\[([^\\]]+)\\]
Fabian Steeg
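As a rough sketch of how the negated character class behaves in practice, the following collects every FOO[...] occurrence in a string. The class and method names are assumptions for the example. Because the quantifier is + rather than *, an empty FOO[] is not matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NegatedClassExample {
    // [^\]]+ means "one or more characters that are not ']'".
    static final Pattern FOO_PATTERN = Pattern.compile("FOO\\[([^\\]]+)\\]");

    // Hypothetical helper: the bracket contents of every FOO[...] in the input, in order.
    static List<String> extractAll(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = FOO_PATTERN.matcher(input);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("FOO[DOG] FOO[CAT]")); // prints [DOG, CAT]
        System.out.println(extractAll("FOO[]"));             // prints []
    }
}
```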
+1  A: 

If you simply need to get whatever is between [], then you can use \[([^\]]*)\] like this:

Pattern regex = Pattern.compile("\\[([^\\]]*)\\]");
Matcher m = regex.matcher(str);
if (m.find()) {
    result = m.group(1); // group(1) is the text inside the brackets; group() would include the brackets
}

If you need it to be of the form identifier + [ + content + ], then you can restrict the extraction to cases where the identifier is alphanumeric:

[a-zA-Z][a-zA-Z0-9_]*\s*\[([^\]]*)\]

This will match things like Foo [Bar], or myDevice_123["input"] for instance.
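A minimal sketch of the identifier form in Java, with the tail of the identifier written as [a-zA-Z0-9_]*. The class and method names are placeholders, and find() is used here, so the identifier may start anywhere in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IdentifierBracketExample {
    // identifier, optional whitespace, then [content]
    static final Pattern ID_PATTERN =
            Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*\\s*\\[([^\\]]*)\\]");

    // Hypothetical helper: the bracket content, or null when no identifier precedes the '['.
    static String content(String input) {
        Matcher m = ID_PATTERN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(content("Foo [Bar]"));                 // prints Bar
        System.out.println(content("myDevice_123[\"input\"]"));   // prints "input" (quotes included)
        System.out.println(content("[Bar]"));                     // prints null
    }
}
```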

Main issue

The main problem is when you want to extract the content of something like this:

FOO[BAR[CAT[123]]+DOG[FOO]]

The Regex won't work and will return BAR[CAT[123 and FOO.
If we change the Regex to \[(.*)\] then we're OK but then, if you're trying to extract the content from more complex things like:

FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]

None of the Regexes will work.

The most accurate Regex to extract the proper content in all cases would be a lot more complex, as it would need to balance [] pairs and give you their content.
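To make the failure modes concrete, this small sketch runs both patterns over the two nested inputs from this answer. The class name and the allGroups helper are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestedBracketDemo {
    static final Pattern NEGATED = Pattern.compile("\\[([^\\]]*)\\]");
    static final Pattern GREEDY = Pattern.compile("\\[(.*)\\]");

    // Hypothetical helper: group 1 of every match, in order.
    static List<String> allGroups(Pattern pattern, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String nested = "FOO[BAR[CAT[123]]+DOG[FOO]]";
        System.out.println(allGroups(NEGATED, nested)); // prints [BAR[CAT[123, FOO]
        System.out.println(allGroups(GREEDY, nested));  // prints [BAR[CAT[123]]+DOG[FOO]]

        String assignment = "FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]";
        // The greedy pattern now spans both top-level groups in a single match.
        System.out.println(allGroups(GREEDY, assignment));
    }
}
```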

A simpler solution

If your problem is getting complex and the content of the [] is arbitrary, you could instead balance the pairs of [] and extract the string using plain old code rather than a Regex:

int depth = 0;
StringBuilder result = new StringBuilder();
int start = input.indexOf('[');
// Walk from the first '[' and stop at the ']' that balances it.
for (int i = start; start >= 0 && i < input.length(); i++) {
    char c = input.charAt(i);
    if (c == '[') {
        depth++;
        if (depth == 1) continue; // skip the outermost '[' itself
    } else if (c == ']') {
        depth--;
        if (depth == 0) break;    // matching close of the outermost '['
    }
    result.append(c);
}
// result.toString() now holds the content of the first balanced [...] pair

This is more of a sketch than polished code, but it should be easy enough to improve upon.
What counts is that it lets you extract the content of the [], however complex it is.

Renaud Bompuis
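For inputs with several top-level groups, such as the assignment example in this answer, the same bracket-counting idea can be extended to return each group separately. This is only a sketch under that assumption; the class and method names are invented, and unbalanced input is handled by silently dropping the unclosed group.

```java
import java.util.ArrayList;
import java.util.List;

final class BalancedBrackets {
    // Returns the content of every top-level [...] group, keeping nested brackets intact.
    static List<String> topLevelGroups(String input) {
        List<String> groups = new ArrayList<>();
        int depth = 0;
        int start = -1; // index just after the '[' that opened the current top-level group
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                if (depth == 0) {
                    start = i + 1;
                }
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
                if (depth == 0) {
                    groups.add(input.substring(start, i));
                }
            }
        }
        return groups; // a group that is never closed is not included
    }

    public static void main(String[] args) {
        String input = "FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]";
        System.out.println(topLevelGroups(input)); // prints [BAR[CAT[123]]+DOG[FOO], BAR[5]]
    }
}
```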
A: 
String input = "FOO[BAR]";
String result = input.substring(input.indexOf("[")+1,input.lastIndexOf("]"));

This will return the value between first '[' and last ']'

Foo[Bar] => Bar

Foo[Bar[test]] => Bar[test]

Note: You should add error checking if the input string is not well formed.
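One way the error checking mentioned in the note might look, as a sketch: return null when a bracket is missing or when the last ']' comes before the first '['. The class name, method name, and the choice of null as the failure value are assumptions for the example.

```java
final class SafeBracketSubstring {
    // Text between the first '[' and the last ']', or null if the input is not well formed.
    static String between(String input) {
        if (input == null) {
            return null;
        }
        int open = input.indexOf('[');
        int close = input.lastIndexOf(']');
        // close < open also covers the case where ']' is missing (close == -1).
        if (open < 0 || close < open) {
            return null;
        }
        return input.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(between("Foo[Bar]"));       // prints Bar
        System.out.println(between("Foo[Bar[test]]")); // prints Bar[test]
        System.out.println(between("Foo]Bar["));       // prints null
    }
}
```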