Is it better to use a regex or a StringTokenizer to separate the author and title in this string:

William Faulkner - 'Light In August'

Is this the simplest regex that would work?

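// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("^\\s*([^-]+)-(.*)$"); // second group added so group(2) exists
Matcher matcher = pattern.matcher("William Faulkner - 'Light In August'");
if (matcher.matches()) { // group() throws IllegalStateException unless a match was attempted first
    String author = matcher.group(1).trim();
    String bookTitle = matcher.group(2).trim();
}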

Is that overkill, or is there a simpler way to do this with a StringTokenizer?

Basically I'm looking for the most transparent and maintainable solution since I don't have a good understanding of regex and got help with the one above.

+1  A: 

It depends on what the input looks like. Your regex, for example, would fail on author names that contain a hyphen.

Perhaps something like

Pattern.compile("^\\s*(.*?)\\s+-\\s+'(.*)'\\s*$")

might fit a little better.
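As a rough sketch of how that pattern could be used, assuming the title is always wrapped in single quotes as in the sample string (the class name `AuthorTitleRegex` and the helper `parse` are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AuthorTitleRegex {
    // Author, then a dash with at least one whitespace character on each side,
    // then a single-quoted title. The quoting is an assumption taken from the sample input.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(.*?)\\s+-\\s+'(.*)'\\s*$");

    // Hypothetical helper: returns {author, title}, or null if the input does not match.
    static String[] parse(String input) {
        Matcher m = LINE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("William Faulkner - 'Light In August'");
        System.out.println(parts[0]); // William Faulkner
        System.out.println(parts[1]); // Light In August
    }
}
```

Because the dash must have whitespace on both sides, a hyphenated author such as "Jean-Paul Sartre" stays in one piece, and the surrounding quotes are stripped from the title.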

Tim Pietzcker
+1  A: 

How about using String.split?

String s = "William Faulkner - 'Light In August'";
String[] parts = s.split(" - ", 2);
String author = parts[0];
String title = parts[1];

One thing to watch out for: some authors' names and book titles contain hyphens, so splitting on a bare hyphen won't always work.
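To illustrate the difference, here is a small sketch that splits on " - " and checks that the separator was actually present. The class name `AuthorTitleSplit`, the helper `parse`, and the hyphenated sample input are only for illustration:

```java
class AuthorTitleSplit {
    // Hypothetical helper: splits on the first " - " only.
    // Returns {author, title}, or null when the separator is missing.
    static String[] parse(String input) {
        String[] parts = input.split(" - ", 2);
        if (parts.length != 2) {
            return null;
        }
        return new String[] { parts[0].trim(), parts[1].trim() };
    }

    public static void main(String[] args) {
        String s = "Jean-Paul Sartre - 'No Exit'";

        String[] good = parse(s);
        System.out.println(good[0]); // Jean-Paul Sartre
        System.out.println(good[1]); // 'No Exit'

        // Splitting on a bare hyphen cuts the author's name in half.
        String[] bad = s.split("-", 2);
        System.out.println(bad[0]); // Jean
    }
}
```

Note that the title keeps its single quotes here; strip them separately if you don't want them.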

Mark Byers
+1  A: 

How much control do you have over the input? Can you guarantee that author and title will always be separated by " - " (a space, a dash, and a space)? Do you know for sure that the author won't contain " - "? And so on.

If the input is quite rigid, then you can simply use String#split(), which should make it very clear what you're doing. Don't use a StringTokenizer; its Javadoc says:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

Mark Byers' answer shows you how to use split().

However, if you have to worry about more variation in the input (e.g., can the amount of whitespace around the dash vary, or be missing altogether?), then a regex will be more concise. The tradeoff is code readability and clarity of intent.
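For instance, a more lenient pattern might look like the sketch below. It assumes the title is always single-quoted, which is what lets it tell the separator apart from a hyphen inside the author's name; the class name `LenientAuthorTitle` and the helper `parse` are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LenientAuthorTitle {
    // Whitespace around the dash is optional (\s* instead of \s+).
    // The opening quote right after the dash marks where the title starts;
    // this relies on the assumption that titles are always single-quoted.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(.*?)\\s*-\\s*'(.*)'\\s*$");

    // Hypothetical helper: returns {author, title}, or null if the input does not match.
    static String[] parse(String input) {
        Matcher m = LINE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("William Faulkner-'Light In August'");
        System.out.println(parts[0]); // William Faulkner
        System.out.println(parts[1]); // Light In August
    }
}
```

This handles more input shapes than split(" - ", 2), but it is also harder to read at a glance, which is exactly the tradeoff described above.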

Matt Ball