views:

713

answers:

5

How should I parse the following String using Java to extract the file path?

? stands for any number of random characters

_ stands for any number of white spaces (no new line)

?[LoadFile]_file_=_"foo/bar/baz.xml"?

Example:

10:52:21.212 [LoadFile] file = "foo/bar/baz.xml"

should extract foo/bar/baz.xml

+1  A: 

java.util.regex is your friend.

starblue
That's only slightly helpful
jjnguy
Some people, when confronted with a Stack Overflow question, answer "java.util.regex is your friend." Now the person asking the question has two problems. (Liberally paraphrased from http://blogs.msdn.com/oldnewthing/archive/2006/03/22/558007.aspx) -- If you're going to suggest using regular expressions, provide an example.
Grant Wagner
@Grant Wagner I don't see anything wrong with pointing people in the right direction, even if I don't have time to work out a full solution. If you are not happy with the answer then give a better one instead of wasting your time complaining.
starblue
+12  A: 
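import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = ".*\\[LoadFile\\]\\s+file\\s+=\\s+\"([^\"].+)\".*";

String result = null;
Matcher m = Pattern.compile(regex).matcher(inputString);
if (m.find()) {
    result = m.group(1); // text captured between the quotes
} else {
    System.out.println("No match found.");
}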

The String in result should be your file path. (assuming I didn't make any mistakes)

You should take a look at the Pattern class for some regular expression help. Regular expressions can be a very powerful string manipulation tool.
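A minimal, self-contained sketch of the same idea, in case a complete runnable version helps. It is not the exact regex above: it assumes zero or more spaces or tabs around `file` and `=`, and it stops the capture at the first closing quote. The class and method names (`LoadFilePathExtractor`, `extractPath`) are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoadFilePathExtractor {
    // Assumed format: "[LoadFile]", optional spaces/tabs, "file", "=", then a quoted path.
    // [^\"]* captures everything up to (not including) the next double quote.
    private static final Pattern LOAD_FILE =
            Pattern.compile("\\[LoadFile\\][ \\t]*file[ \\t]*=[ \\t]*\"([^\"]*)\"");

    // Returns the quoted path, or Optional.empty() if the line does not match.
    static Optional<String> extractPath(String line) {
        Matcher m = LOAD_FILE.matcher(line);
        // find() searches anywhere in the line, so no leading/trailing ".*" is needed.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "10:52:21.212 [LoadFile] file = \"foo/bar/baz.xml\"";
        System.out.println(extractPath(line).orElse("No match found."));
        // prints: foo/bar/baz.xml
    }
}
```

Compiling the Pattern once in a static field avoids recompiling it for every line when processing a whole log file.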

jjnguy
".*\\[LoadFile\\]\\s*file\\s*=\\s*\"([^\\\"].*)\".*" would be better to match any number of white spaces
Jean
".*\"([^\\\"].*)\".*" would be even better, as we don't care about the prefix format at all (it is known by default) and it does not contain any quotes.
gizmo
FYI, Jean's regex would match no white space as well, e.g. [LoadFile]file="foo/bar/baz.xml". So if you want at least one white space character, use + instead of * as jjnguy originally specified.
Peter Di Cecco
@weenaak: I would have read "any number of white spaces" as meaning any number INCLUDING ZERO.
Stephen C
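A small sketch of the `+` versus `*` difference discussed in the comments above, under the assumption that the two candidate line shapes are the spaced example from the question and a compact variant with no white space at all. The class name and the two pattern constants are hypothetical. Note that `\s` in java.util.regex also matches line breaks, which only matters if the input can span several lines.

```java
import java.util.regex.Pattern;

class WhitespaceQuantifierDemo {
    // \s+ requires at least one white space character at each position.
    static final Pattern AT_LEAST_ONE =
            Pattern.compile("\\[LoadFile\\]\\s+file\\s+=\\s+\"([^\"]+)\"");
    // \s* accepts any number of white space characters, including zero.
    static final Pattern ZERO_OR_MORE =
            Pattern.compile("\\[LoadFile\\]\\s*file\\s*=\\s*\"([^\"]+)\"");

    // True if the pattern is found anywhere in the line.
    static boolean matches(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        String spaced = "[LoadFile] file = \"foo/bar/baz.xml\"";
        String compact = "[LoadFile]file=\"foo/bar/baz.xml\"";
        System.out.println(matches(AT_LEAST_ONE, spaced));  // true
        System.out.println(matches(AT_LEAST_ONE, compact)); // false
        System.out.println(matches(ZERO_OR_MORE, spaced));  // true
        System.out.println(matches(ZERO_OR_MORE, compact)); // true
    }
}
```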
Two corrections: the method that creates the Matcher object is `.matcher(inputString)` (not capitalized), and you have to *apply* the regex by calling `.matches()` or `.find()` on the Matcher.
Alan Moore
Thanks for the input
jjnguy
+1  A: 

You could make the regular expression a bit shorter than jjnguy's. Basically, match just the right-hand side of the assignment and capture it without the quotes.

    String regex = ".* = \"(.*)\"";
willcodejavaforfood
I think jjnguy assumed that the path should only be extracted if the line has [LoadFile] in it ...
Jean
When I write a regex, I try to be as specific as possible.
jjnguy
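To illustrate the trade-off raised in the comments above, here is a rough sketch comparing the short regex with a more specific one. The `[SaveFile]` line is an invented example of "some other log line that also contains a quoted value", and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecificityDemo {
    // The short regex: anything, then = "...".
    static final Pattern LOOSE = Pattern.compile(".* = \"(.*)\"");
    // A stricter variant that insists on the [LoadFile] tag and the "file" key.
    static final Pattern SPECIFIC =
            Pattern.compile(".*\\[LoadFile\\]\\s*file\\s*=\\s*\"([^\"]+)\".*");

    // Returns group 1 if the whole line matches, otherwise null.
    static String capture(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String load = "10:52:21.212 [LoadFile] file = \"foo/bar/baz.xml\"";
        String other = "10:52:21.212 [SaveFile] file = \"other.xml\""; // invented line
        System.out.println(capture(LOOSE, load));     // foo/bar/baz.xml
        System.out.println(capture(SPECIFIC, load));  // foo/bar/baz.xml
        System.out.println(capture(LOOSE, other));    // other.xml
        System.out.println(capture(SPECIFIC, other)); // null
    }
}
```

Whether the looser match is a problem depends on whether such other lines can actually occur in the input.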
+2  A: 

While regular expressions are nice and all, you can also use the java.util.StringTokenizer class to do the job. The advantage is more human-friendly code.

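import java.util.StringTokenizer;

StringTokenizer tokenizer = new StringTokenizer(inputString, "\"");
tokenizer.nextToken(); // skip the text before the first quote
String path = tokenizer.nextToken(); // the text between the first pair of quotes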

And there you go.

Yuval
Another advantage of StringTokenizer is that it will probably be more efficient ... provided that it is capable of doing the job at hand.
Stephen C
It's just that if there happen to be any " characters in the first set of random characters, the tokenizer will happily return the text between them as the next element. However, the example suggests the first part of the input line is just a timestamp. A regex is harder to write, but much more capable of handling wildly different input.
Jeroen van Bergen
I agree that a StringTokenizer is not an ideal solution to every parsing problem, but in this case it really seems to me that using a regex is a little like hunting flies with a cannon...
Yuval
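A quick sketch of the caveat raised in the comments above: taking the second quote-delimited token works for the example line, but returns the wrong text as soon as the prefix itself contains a quote. The second input line is invented for illustration, and the class and method names are hypothetical.

```java
import java.util.StringTokenizer;

class TokenizerPitfallDemo {
    // Returns the second token when splitting on double quotes.
    // Throws NoSuchElementException if the line has fewer than two tokens.
    static String secondQuoteToken(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line, "\"");
        tokenizer.nextToken();        // text before the first quote
        return tokenizer.nextToken(); // text after the first quote
    }

    public static void main(String[] args) {
        String simple = "10:52:21.212 [LoadFile] file = \"foo/bar/baz.xml\"";
        String quotedPrefix = "user \"bob\" [LoadFile] file = \"foo/bar/baz.xml\""; // invented line
        System.out.println(secondQuoteToken(simple));       // foo/bar/baz.xml
        System.out.println(secondQuoteToken(quotedPrefix)); // bob
    }
}
```

If the prefix is known to be just a timestamp, as in the question's example, the tokenizer approach is fine.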