If I have a string like this:
FOO[BAR]
I need a generic way to get the "BAR" string out of it, so that it works no matter what string is between the square brackets.
e.g.
FOO[DOG] = DOG
FOO[CAT] = CAT
You should be able to use non-greedy quantifiers, specifically *?. You're probably going to want the following:
Pattern MY_PATTERN = Pattern.compile("\\[(.*?)\\]");
This will give you a pattern that will match your string and put the text within the square brackets in the first group. Have a look at the Pattern API Documentation for more information.
To extract the string, you could use something like the following:
Matcher m = MY_PATTERN.matcher("FOO[BAR]");
while (m.find()) {
String s = m.group(1);
// s now contains "BAR"
}
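To see why the non-greedy *? matters, here is a small self-contained sketch that runs the reluctant pattern above and a greedy \[(.*)\] variant over an input containing two bracketed parts. The class and method names (NonGreedyDemo, allGroups) are hypothetical, chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing reluctant and greedy quantifiers.
class NonGreedyDemo {
    // Reluctant: ".*?" stops at the first ']' after each '['.
    static final Pattern NON_GREEDY = Pattern.compile("\\[(.*?)\\]");
    // Greedy: ".*" runs as far as possible, i.e. to the last ']' in the input.
    static final Pattern GREEDY = Pattern.compile("\\[(.*)\\]");

    // Collects group(1) of every match that find() reports.
    static List<String> allGroups(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "FOO[DOG] and BAR[CAT]";
        System.out.println(allGroups(NON_GREEDY, input)); // [DOG, CAT]
        System.out.println(allGroups(GREEDY, input));     // one element: DOG] and BAR[CAT
    }
}
```

With the greedy version the single match swallows everything from the first '[' to the last ']', which is why the reluctant *? is the better default here.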
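The non-regex way:
String input = "FOO[BAR]", extracted;
extracted = input.substring(input.indexOf("[") + 1, input.indexOf("]"));
Alternatively, for slightly better performance/memory usage (thanks Hosam):
String input = "FOO[BAR]", extracted;
extracted = input.substring(input.indexOf('[') + 1, input.lastIndexOf(']'));
The + 1 skips the '[' itself; without it the result would be "[BAR". Note that lastIndexOf finds the last ']' rather than the first one, which changes the result when the brackets are nested.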
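The same substring idea can be wrapped in a helper that also handles missing brackets. This is a minimal sketch, assuming you want null back when there is no [...] pair; the names SubstringDemo and between are hypothetical. It searches for ']' only after the '[' so that a stray ']' earlier in the string is ignored.

```java
// Hypothetical helper class illustrating indexOf/substring extraction.
class SubstringDemo {
    // Returns the text between the first '[' and the first ']' that follows it,
    // or null when either bracket is missing.
    static String between(String input) {
        int open = input.indexOf('[');
        if (open < 0) {
            return null;
        }
        // Look for ']' only after the '[' (indexOf with a fromIndex argument).
        int close = input.indexOf(']', open + 1);
        if (close < 0) {
            return null;
        }
        // open + 1 skips the '['; substring's end index is exclusive, so ']' is excluded.
        return input.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(between("FOO[BAR]")); // BAR
        System.out.println(between("FOO[DOG]")); // DOG
        System.out.println(between("FOO"));      // null
    }
}
```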
I think your regular expression would look like:
/FOO\[(.+)\]/
Assuming that FOO is going to be constant.
So, to put this in Java:
Pattern p = Pattern.compile("FOO\\[(.+)\\]");
Matcher m = p.matcher(inputLine);
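To actually read the captured text you still need to call find() (or matches()) and then group(1). A minimal sketch, assuming the prefix is always the literal FOO as stated above; the class name FooPatternDemo and method extract are hypothetical. The last call shows that .+ is greedy and runs to the last ']' on the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the FOO\[(.+)\] pattern.
class FooPatternDemo {
    // Assumes the prefix is always the literal "FOO".
    static final Pattern P = Pattern.compile("FOO\\[(.+)\\]");

    // Returns the bracket content, or null if the line has no FOO[...] match.
    static String extract(String inputLine) {
        Matcher m = P.matcher(inputLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("FOO[DOG]"));      // DOG
        System.out.println(extract("FOO[CAT]"));      // CAT
        // ".+" is greedy, so it extends to the LAST ']' on the line.
        System.out.println(extract("FOO[A] FOO[B]")); // A] FOO[B
    }
}
```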
Assuming that no other closing square bracket is allowed within: /FOO\[([^\]]*)\]/
I'd define that I want a maximum number of non-] characters between [ and ]. These need to be escaped with backslashes (and in Java, these need to be escaped again), and the definition of non-] is a character class, thus inside [ and ] (i.e. [^\\]]). The result:
FOO\\[([^\\]]+)\\]
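Because the character class can never match a ']', each match stops at the first closing bracket, so a find() loop picks up every FOO[...] on a line separately. A minimal sketch; CharClassDemo and extractAll are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the FOO\[([^\]]+)\] pattern.
class CharClassDemo {
    // [^\]]+ : one or more characters that are not ']', so a match
    // can never run past the first closing bracket.
    static final Pattern P = Pattern.compile("FOO\\[([^\\]]+)\\]");

    // Returns group(1) of every FOO[...] occurrence, in order.
    static List<String> extractAll(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("FOO[A] FOO[B]")); // [A, B]
        System.out.println(extractAll("FOO[DOG]"));      // [DOG]
    }
}
```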
If you simply need to get whatever is between [], then you can use \[([^\]]*)\] like this:
Pattern regex = Pattern.compile("\\[([^\\]]*)\\]");
Matcher m = regex.matcher(str);
if (m.find()) {
    // group(1) is the text inside the brackets; group() would include the brackets too
    result = m.group(1);
}
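The difference between group() and group(1) is easy to miss: group() (the same as group(0)) returns the whole match including the brackets, while group(1) returns only what the parentheses captured. A small sketch; the names GroupDemo and firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting group() and group(1).
class GroupDemo {
    static final Pattern REGEX = Pattern.compile("\\[([^\\]]*)\\]");

    // Returns {whole match, group 1} for the first match, or null if there is none.
    static String[] firstMatch(String str) {
        Matcher m = REGEX.matcher(str);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = firstMatch("FOO[BAR]");
        System.out.println(r[0]); // [BAR]  (whole match, brackets included)
        System.out.println(r[1]); // BAR    (capturing group 1 only)
    }
}
```

Since [^\]]* uses *, an empty pair such as "FOO[]" also matches, with group(1) equal to "".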
If you need it to be of the form identifier + [ + content + ], then you can limit extraction to cases where the identifier is alphanumeric (a letter followed by letters, digits or underscores):
[a-zA-Z][a-zA-Z0-9_]*\s*\[([^\]]*)\]
This will validate things like Foo [Bar] or myDevice_123["input"], for instance.
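Here is that identifier pattern as a Java string literal, tried against the two examples above and one input whose prefix is not an identifier. A minimal sketch; IdentifierDemo and content are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the identifier + [content] pattern.
class IdentifierDemo {
    // Identifier (a letter, then letters/digits/underscores), optional whitespace, then [content].
    static final Pattern P = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*\\s*\\[([^\\]]*)\\]");

    // Returns the bracket content of the first match, or null if there is none.
    static String content(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(content("Foo [Bar]"));               // Bar
        System.out.println(content("myDevice_123[\"input\"]")); // "input" (quotes included)
        System.out.println(content("123[Bar]"));                // null
    }
}
```

find() is not anchored, so it can match an identifier that starts partway through the string; use matcher(input).matches() if the whole string must have this form.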
Main issue
The main problem is when you want to extract the content of something like this:
FOO[BAR[CAT[123]]+DOG[FOO]]
The regex won't work and will return BAR[CAT[123 and FOO.
If we change the regex to \[(.*)\], then we're OK, but if you're trying to extract the content from more complex things like:
FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]
then none of the regexes will work.
The most accurate regex to extract the proper content in all cases would be a lot more complex, as it would need to balance [] pairs and give you their content. Java's java.util.regex has no recursion or balancing-group feature, so in practice it cannot do this for arbitrary nesting depth.
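The failures described above can be reproduced directly. This sketch runs \[([^\]]*)\] and the greedy \[(.*)\] over the two example inputs with a find() loop; NestedFailureDemo and allGroups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing how both regexes break on nested brackets.
class NestedFailureDemo {
    static final Pattern NO_CLOSE = Pattern.compile("\\[([^\\]]*)\\]"); // stops at first ']'
    static final Pattern GREEDY = Pattern.compile("\\[(.*)\\]");        // runs to last ']'

    static List<String> allGroups(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String nested = "FOO[BAR[CAT[123]]+DOG[FOO]]";
        String twoSides = nested + " = myOtherFoo[BAR[5]]";
        System.out.println(allGroups(NO_CLOSE, nested)); // [BAR[CAT[123, FOO]
        System.out.println(allGroups(GREEDY, nested));   // [BAR[CAT[123]]+DOG[FOO]]
        // One element spanning both sides of the '=':
        // BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]
        System.out.println(allGroups(GREEDY, twoSides));
    }
}
```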
A simpler solution
If your problem is getting complex and the content of the [] is arbitrary, you could instead balance the pairs of [] and extract the string using plain old code rather than a regex:
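String input = "FOO[BAR[CAT[123]]+DOG[FOO]]";
StringBuilder result = new StringBuilder();
int brackets = 0; // current nesting depth
// Start at the first '[' (indexOf returns -1 if there is none, so the loop is skipped)
for (int i = input.indexOf('['); i >= 0 && i < input.length(); i++) {
    char c = input.charAt(i);
    if (c == '[') {
        brackets++;
        if (brackets == 1) {
            continue; // don't copy the outermost '['
        }
    } else if (c == ']') {
        brackets--;
        if (brackets == 0) {
            break; // reached the ']' that closes the first '['
        }
    }
    result.append(c);
}
// result.toString() is now "BAR[CAT[123]]+DOG[FOO]"
The loop tracks the nesting depth: every [ increases it, every ] decreases it, and it stops at the ] that matches the first [.
What counts is that this code extracts the content of the outermost [], however deeply nested it is, as long as the brackets are balanced.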
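The same depth-counting idea extends to inputs with several top-level groups, such as the FOO[...] = myOtherFoo[...] example from the main issue. A sketch, assuming you want every top-level group's content with nested brackets kept intact; the names TopLevelBrackets and topLevelGroups are hypothetical, and the handling of unbalanced input (a stray ']' is ignored, an unclosed trailing group is dropped) is a design choice, not the only option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: content of every top-level [...] group.
class TopLevelBrackets {
    static List<String> topLevelGroups(String input) {
        List<String> out = new ArrayList<>();
        int depth = 0;
        int start = -1; // index just after the '[' that opened the current top-level group
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                if (depth == 0) {
                    start = i + 1;
                }
                depth++;
            } else if (c == ']' && depth > 0) { // a ']' with no open '[' is ignored
                depth--;
                if (depth == 0) {
                    out.add(input.substring(start, i));
                }
            }
        }
        return out; // an unclosed trailing group is simply not added
    }

    public static void main(String[] args) {
        String s = "FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]";
        System.out.println(topLevelGroups(s)); // [BAR[CAT[123]]+DOG[FOO], BAR[5]]
    }
}
```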
String input = "FOO[BAR]";
String result = input.substring(input.indexOf("[")+1,input.lastIndexOf("]"));
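This will return the value between the first '[' and the last ']':
Foo[Bar] => Bar
Foo[Bar[test]] => Bar[test]
Note: You should add error checking for input strings that are not well formed. If a bracket is missing or the brackets are out of order, substring either throws a StringIndexOutOfBoundsException or silently returns the wrong text (for example, "Foo]" gives "Foo", because indexOf("[") returns -1 and -1 + 1 is 0).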
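One way to add that error checking is to return an Optional that is empty when the first '[' / last ']' pair does not exist. This is a sketch under that assumption; SafeBetween and firstToLast are hypothetical names, and returning Optional rather than throwing is just one possible design.

```java
import java.util.Optional;

// Hypothetical helper: first '[' to last ']' with basic validation.
class SafeBetween {
    // Content between the first '[' and the last ']', or Optional.empty()
    // when a bracket is missing or the last ']' comes before the first '['.
    static Optional<String> firstToLast(String input) {
        if (input == null) {
            return Optional.empty();
        }
        int open = input.indexOf('[');
        int close = input.lastIndexOf(']');
        // close < open also covers close == -1 whenever open >= 0
        if (open < 0 || close < open) {
            return Optional.empty();
        }
        return Optional.of(input.substring(open + 1, close));
    }

    public static void main(String[] args) {
        System.out.println(firstToLast("Foo[Bar]").orElse("(none)"));       // Bar
        System.out.println(firstToLast("Foo[Bar[test]]").orElse("(none)")); // Bar[test]
        System.out.println(firstToLast("Foo]").orElse("(none)"));           // (none)
    }
}
```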