views:

53

answers:

2

Is there a default/easy way in Java to split strings while taking quotation marks or other symbols into account?

For example, given this text:

There's "a man" that live next door 'in my neighborhood', "and he gets me down..."

I want to obtain:

There's
a man
that
live
next
door
in my neighborhood
and he gets me down
+1  A: 

Doubtful, given your logic: you need to differentiate between an apostrophe and single quotes, i.e. the one in There's versus the pair around 'in my neighborhood'.

You'd have to develop some kind of pairing logic if you wanted what you have above. I'm thinking regular expressions, or some kind of two-part parse.
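One way to picture the pairing logic is a small hand-written scanner. This is only a sketch under stated assumptions, not a built-in Java facility: the class name `QuoteAwareSplitter` and the method `split` are made up for illustration, tokens are assumed to be separated by whitespace or commas, and a quote character is assumed to open a quoted span only when it appears at the start of a token, so the apostrophe in There's stays part of the word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
class QuoteAwareSplitter {

    // Splits on whitespace and commas, keeping quoted spans together.
    // Assumption: a quote only opens a span at the start of a token.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char openQuote = 0; // 0 means "not inside a quoted span"

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (openQuote != 0) {
                if (c == openQuote) {
                    // Matching close quote: emit the span without its quotes.
                    tokens.add(current.toString());
                    current.setLength(0);
                    openQuote = 0;
                } else {
                    current.append(c);
                }
            } else if ((c == '"' || c == '\'') && current.length() == 0) {
                // Quote at the start of a token opens a quoted span.
                openQuote = c;
            } else if (Character.isWhitespace(c) || c == ',') {
                // Separator: finish the current bare word, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Ordinary character, including an apostrophe inside a word.
                current.append(c);
            }
        }
        // Emit whatever is left, including an unterminated quoted span.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Calling `QuoteAwareSplitter.split(...)` on the example text from the question returns There's, a man, that, live, next, door, in my neighborhood, and "and he gets me down..." with the trailing dots kept. An apostrophe inside a single-quoted span would still close the span early, so this does not cover every case.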

Jason McCreary
Yes. I tried to include a simple edge case where single quotes have two "meanings". I thought the logic wasn't obvious, which is why I asked.
Sinuhe
+2  A: 

Something like this works for your input:

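    import java.util.Scanner;
    import java.util.regex.Pattern;

    String text = "There's \"a man\" that live next door "
        + "'in my neighborhood', \"and he gets me down...\"";

    Scanner sc = new Scanner(text);
    Pattern pattern = Pattern.compile(
        "\"[^\"]*\"" +     // double-quoted token
        "|'[^']*'" +       // single-quoted token
        "|[A-Za-z']+"      // bare word, apostrophes allowed
    );
    String token;
    // findInLine returns null once there is no further match on the line
    while ((token = sc.findInLine(pattern)) != null) {
        System.out.println("[" + token + "]");
    }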

The above prints (as seen on ideone.com):

[There's]
["a man"]
[that]
[live]
[next]
[door]
['in my neighborhood']
["and he gets me down..."]

It uses Scanner.findInLine, where the regex pattern is one of:

"[^"]*"      # double quoted token
'[^']*'      # single quoted token
[A-Za-z']+   # everything else: a run of letters and apostrophes

This certainly won't work in every case: nested quotes, for instance, will be tricky. Note also that the quoted tokens are returned with their quote characters still attached.
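If the surrounding quotes should be dropped, as in the output the question asks for, one possible variation is to add capturing groups to the same three alternatives and loop with `Matcher.find()`. This is a sketch rather than a definitive solution; the class name `QuotedTokenizer` and the method `tokenize` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK.
class QuotedTokenizer {

    // Same three alternatives as above, each with a capturing group
    // around the part of the token we want to keep.
    private static final Pattern TOKEN = Pattern.compile(
        "\"([^\"]*)\""      // group 1: contents of a double-quoted token
        + "|'([^']*)'"      // group 2: contents of a single-quoted token
        + "|([A-Za-z']+)"   // group 3: bare word, apostrophes allowed
    );

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Exactly one group participates in each match; the others are null.
            if (m.group(1) != null) {
                tokens.add(m.group(1));
            } else if (m.group(2) != null) {
                tokens.add(m.group(2));
            } else {
                tokens.add(m.group(3));
            }
        }
        return tokens;
    }
}
```

For the example text, `QuotedTokenizer.tokenize(text)` returns There's, a man, that, live, next, door, in my neighborhood, and "and he gets me down..." without the quotes but with the trailing dots, since they sit inside the quoted span. The same limitations around nested quotes apply.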

polygenelubricants
Good solution and good page!
Sinuhe