How would I separate the string below into its parts? I need to separate each <word>, including its angle brackets, from the rest of the string. For the string below I would end up with five strings:

1. "I have to break up with you because "
2. "<reason>"
3. " . But let's still "
4. "<disclaimer>"
5. " ."

I have to break up with you because <reason> . But let's still <disclaimer> .

Below is what I currently have (it's ugly...):

    int begin = 0;
    int end = 0;
    // Stop at the end of the string so charAt(end) never runs out of bounds.
    while (end < s.length()) {
        char c = s.charAt(end);
        if (c == '<') {
            // Everything before '<' is a terminal (skip empty runs).
            if (end > begin) {
                stack.add(new Terminal(s.substring(begin, end)));
            }
            begin = end;
        } else if (c == '>') {
            // end + 1 keeps the closing '>' inside the non-terminal.
            stack.add(new NonTerminal(s.substring(begin, end + 1)));
            begin = end + 1;
        }
        end++;
    }
    // Classify whatever is left after the last bracket.
    if (begin < s.length()) {
        String rest = s.substring(begin);
        if (isTerminal(rest)) {
            stack.add(new Terminal(rest));
        } else {
            stack.add(new NonTerminal(rest));
        }
    }
A: 

Have a look at using a StringTokenizer.
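
A minimal sketch of how that could look, assuming the three-argument constructor with returnDelims set to true so that "<" and ">" come back as tokens of their own and can be glued back onto the word between them. The class and method names are made up for illustration. Note that the Javadoc describes StringTokenizer as a legacy class and recommends String.split or java.util.regex for new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerSketch {
    // Splits text into plain runs and <bracketed> tokens, keeping the brackets.
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        // returnDelims = true: each '<' and '>' is returned as a one-character token.
        StringTokenizer st = new StringTokenizer(text, "<>", true);
        StringBuilder bracketed = null; // non-null while we are inside <...>
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals("<")) {
                bracketed = new StringBuilder("<");
            } else if (tok.equals(">") && bracketed != null) {
                parts.add(bracketed.append('>').toString());
                bracketed = null;
            } else if (bracketed != null) {
                bracketed.append(tok);
            } else {
                parts.add(tok);
            }
        }
        if (bracketed != null) {
            parts.add(bracketed.toString()); // unterminated "<..." is kept as-is
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "I have to break up with you because <reason> . But let's still <disclaimer> .";
        for (String part : split(s)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

The sketch does not try to handle nested brackets; a second "<" before the closing ">" simply restarts the bracketed token.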

Ben S
Thanks! That should work wonders :)
defn
A: 

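Use a regex: split at the zero-width positions just before each < (lookahead) and just after each > (lookbehind), so the brackets stay attached to their token.

String text = "I have to break up with you because <reason> . But let's still <disclaimer> .";
for (String token : text.split("(?=<)|(?<=>)")) {
    // A token that starts with '<' is a non-terminal; everything else is a terminal.
    boolean isNT = token.startsWith("<");
    System.out.format("%s |%s|%n", isNT ? "NT" : " T", token);
}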
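
To check the split against the five parts listed in the question, it can be wrapped in a small self-contained class. This is only a sketch: the class name, method names, and the empty-token guard are illustrative additions, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;

final class BracketSplitSketch {
    // (?=<)  matches the zero-width position right before a '<'
    // (?<=>) matches the zero-width position right after a '>'
    private static final String BOUNDARY = "(?=<)|(?<=>)";

    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        for (String token : text.split(BOUNDARY)) {
            // Skip empty tokens (Java 7 and earlier can yield one when the text starts with '<').
            if (!token.isEmpty()) {
                parts.add(token);
            }
        }
        return parts;
    }

    // A token counts as a non-terminal when it is wrapped in angle brackets.
    static boolean isNonTerminal(String token) {
        return token.startsWith("<") && token.endsWith(">");
    }

    public static void main(String[] args) {
        String s = "I have to break up with you because <reason> . But let's still <disclaimer> .";
        for (String token : split(s)) {
            System.out.format("%s |%s|%n", isNonTerminal(token) ? "NT" : " T", token);
        }
    }
}
```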
polygenelubricants
A: 

Actually, using a BreakIterator would be a better way of doing this.

The BreakIterator class also provides static getCharacterInstance(), getWordInstance(), and getLineInstance() methods. These methods return BreakIterator instances that allow you to parse at the character, word, and line level, respectively.
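
A rough sketch of what this might look like. BreakIterator only reports boundary positions and knows nothing about angle brackets, so the grouping into <...> tokens still has to be done by hand. The sketch assumes the character-level iterator and plain text without combining characters; the class and method names are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

final class BreakIteratorSketch {
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);

        StringBuilder current = new StringBuilder();
        int start = it.first();
        // Walk the boundaries; each [start, end) range is one user-visible character.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String ch = text.substring(start, end);
            if (ch.equals("<") && current.length() > 0) {
                // A '<' closes the plain run collected so far.
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(ch);
            if (ch.equals(">")) {
                // A '>' closes the bracketed token, brackets included.
                parts.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString()); // trailing text after the last '>'
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "I have to break up with you because <reason> . But let's still <disclaimer> .";
        for (String part : split(s)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Since the brackets are matched by hand either way, this ends up longer than the split-based approach; whether it is preferable depends on whether the locale-aware boundaries are actually needed.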

fuzzy lollipop