views:

485

answers:

6
+1  Q: 

String Tokenizer

Can anybody help me understand how this string tokenizer works by adding some comments into the code? I would very much appreciate any help thanks!

public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    StringBuffer buffer = new StringBuffer();
    Stack stringStack = new Stack();

    for (int i = 0; i < toSplit.length(); i++) {
        if (toSplit.charAt(i) != delim) {
            buffer.append((char) toSplit.charAt(i));
        } else {
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
            } else {
                stringStack.addElement(buffer.toString());
            }
            buffer = new StringBuffer();
        }
    }

    if (buffer.length() !=0) {
        stringStack.addElement(buffer.toString());
    }

    String[] split = new String[stringStack.size()];
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    stringStack = null;
    buffer = null;

//        System.out.println("There are " + split.length + " Words");
    return split;
}
+3  A: 

Not the best written method in the world! But comments below. Overall, what it does is to split a string into "words", using the character delim to delimit them. If ignoreEmpty is true, then empty words are not counted (i.e. two consecutive delimiters act as one).

public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    // Buffer to construct words
    StringBuffer buffer = new StringBuffer();
    // Stack to store complete words
    Stack stringStack = new Stack();

    // Go through input string one character at a time
    for (int i = 0; i < toSplit.length(); i++) {
        // If next character is not the delimiter,
        // add it to the buffer
        if (toSplit.charAt(i) != delim) {
            buffer.append((char) toSplit.charAt(i));
        // Else it is the delimiter, so process the
        // complete word
        } else {
            // If the word is empty (0 characters) we
            // have the choice of ignoring it
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
            // Otherwise, we push it onto the stack
            } else {
                stringStack.addElement(buffer.toString());
            }
            // Clear the buffer ready for the next word
            buffer = new StringBuffer();
        }
    }

    // If there are remaining characters in the buffer,
    // then a word rather than the delimiter ends the
    // string, so we push that onto the stack as well
    if (buffer.length() !=0) {
        stringStack.addElement(buffer.toString());
    }

    // We set up a new array to store the contents of
    // the stack
    String[] split = new String[stringStack.size()];

    // Then we pop each element from the stack into an
    // indexed position in the array, starting at the
    // end as the last word was last on the stack
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    stringStack = null;
    buffer = null;

    // Then return the array
//        System.out.println("There are " + split.length + " Words");
    return split;
}

You could write a far more efficient one using the string.split method, translating the delimiter into a suitable regular expression (ending with + if ignoreEmpty is true).

David M
Ok thank you very much, that is very helpful.
Jaron787
A: 

This code loops through a string, splits it in words by looking for a delimiter and returns a string array with all found words.

In C# you could to write same code as:

toSplit.Split(
    new char[]{ delim }, !ignoreEmpty ? 
        StringSplitOptions.None:
        StringSplitOptions.RemoveEmptyEntries);
Rubens Farias
If I'm not mistaken, this isn't quite equivalent because the OP's method filters out whitespace.
McDowell
A: 
public String[] split(String toSplit, char delim, boolean ignoreEmpty) { 

    StringBuffer buffer = new StringBuffer(); //Make a StringBuffer
    Stack stringStack = new Stack();          //Make a set of elements, a stack

    for (int i = 0; i < toSplit.length(); i++) { //For how many characters are in the string, run this loop
        if (toSplit.charAt(i) != delim) { //If the current character (while in the loop, is NOT equal to the specified delimiter (passed into the function), add it to a buffer
            buffer.append((char) toSplit.charAt(i));
        } else { //Otherwise...
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) { //If it's whitespace do nothing (only if ignoreempty is true
            } else { //otherwise...
                stringStack.addElement(buffer.toString()); //Add the previously found characters to the output stack
            }
            buffer = new StringBuffer(); //Make another buffer.
        }
    }

    if (buffer.length() !=0) { //If nothing was added
        stringStack.addElement(buffer.toString()); //Add the whole String
    }

    String[] split = new String[stringStack.size()]; //Split
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    stringStack = null;
    buffer = null;

//        System.out.println("There are " + split.length + " Words");
    return split;
}
CodeJoust
A: 

This piece of code splits a string into substrings based on a given delimiter. For example, the string:

String str = "foo,bar,foobar";
String[] strArray = split(str, ',' true);

would get returned as this array of strings:

strArray ==> [ "foo", "bar", "foobar" ];


public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    StringBuffer buffer = new StringBuffer();
    Stack stringStack = new Stack();

    // Loop through each char in the string (so 'f', then 'o', then 'o' etc).
    for (int i = 0; i < toSplit.length(); i++) {
        if (toSplit.charAt(i) != delim) {
            // If the char at the current position in the string does not equal 
            // the delimiter, add this char to the string buffer (so we're 
            // building up another string that consists of letters between two 
            // of the 'delim' characters).
            buffer.append((char) toSplit.charAt(i));
        } else {
            // If the string is just whitespace or has length 0 and we are 
            // removing empty strings, do not include this substring
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
            } else {
                // It's not empty, add this substring to a stack of substrings.
                stringStack.addElement(buffer.toString());
            }
            // Reset the buffer for the next substring.
            buffer = new StringBuffer();
        }
    }

    if (buffer.length() !=0) {
        // Make sure to add the last buffer/substring to the stack!
        stringStack.addElement(buffer.toString());
    }

    // Make an array of string the size of the stack (the number of substrings found)
    String[] split = new String[stringStack.size()];
    for (int i = 0; i < split.length; i++) {
        // Pop off each substring we found and add it into the array we are returning.
        // Fill up the array backwards, as we are taking values off a stack.
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    // Unnecessary, but clears the variables
    stringStack = null;
    buffer = null;

//        System.out.println("There are " + split.length + " Words");
    return split;
}
Callum Rogers
+1  A: 
public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    // Holds each character efficiently while parsing the string
    // in a temporary buffer
    StringBuffer buffer = new StringBuffer();
    // Collection for holding the intermediate result
    Stack stringStack = new Stack();

    // for each character in the string to split
    for (int i = 0; i < toSplit.length(); i++) 
    {
        // if the character is NOT the delimeter
        if (toSplit.charAt(i) != delim) 
        {
            // add this character to the temporary buffer
            buffer.append((char) toSplit.charAt(i));
        } else { // we are at a delimeter!
            // if the buffer is empty and we are ignoring empty
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
              // do nothing
            } else { // if the buffer is not empty or if ignoreEmpty is not true
                // add the buffer to the intermediate result collection and
                stringStack.addElement(buffer.toString());
            }
            // reset the buffer 
            buffer = new StringBuffer();
        }

    }
    // we might have extra characters left in the buffer from the last loop
    // if so, add it to the intermediate result
    // IMHO, this might contain a bug
    // what happens when the buffer contains a space at the end and 
    // ignoreEmpty is true?  Seems like it would still be added
    if (buffer.length() !=0) {
        stringStack.addElement(buffer.toString());
    }
    // we are going to convert the intermediate result to an array
    // we create a result array the size of the stack
    String[] split = new String[stringStack.size()];
    // and each item in the stack to the return array
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    // release our temp vars
    // (to let the GC collect at the earliest possible moment)
    stringStack = null;
    buffer = null;

    // and return it
    return split;
}

Is this directly from String.Split or is it something else? Because it seems to me there's a bug in the code (empty result added if left over at the end even when IgnoreEmpty is true)?

Will
A: 

Ok, before proceeding to the answer, I should point out that there are multiple issues with this code. Here goes:

/**
*
*/
public String[] split(   
    String toSplit       //string to split in tokens, delimited by delim
,   char delim           //character that delimits tokens
,   boolean ignoreEmpty  //if true, tokens consisting of only whitespace are ignored
) {

StringBuffer buffer = new StringBuffer();
Stack stringStack = new Stack();

for (int i = 0; i < toSplit.length(); i++) {     //examine each character
    if (toSplit.charAt(i) != delim) {            //no delimiter: this char is part of a token, so add it to the current (partial) token.
        buffer.append((char) toSplit.charAt(i)); 
    } else {
        if (buffer.toString().trim().length() == 0 && ignoreEmpty) {   //'token' consists only of whitespace, and ignoreEmpty was set: do nothing
        } else {
            stringStack.addElement(buffer.toString());  //found a token, so save it.
        }
        buffer = new StringBuffer();                    //reset the buffer so we can store the next token.
    }
}

if (buffer.length() !=0) {                              //save the last (partial) token (if it contains at least one character)
    stringStack.addElement(buffer.toString());
}

String[] split = new String[stringStack.size()];        //copy the stack of tokens to an array
for (int i = 0; i < split.length; i++) {
    split[split.length - 1 - i] = (String) stringStack.pop();
}

stringStack = null;                                     //uhm?...
buffer = null;

//        System.out.println("There are " + split.length + " Words");
return split;                                           //return the array of tokens.

}

Issues:

  1. There is a perfectly good buuilt-in string tokenizer, java.util.StringTokenizer
  2. The code allocates a new StringBuffer for each token! it should simply reset the length of the StringBuffer
  3. The nested ifs inside the loop can be written more efficiently, more readable at least
  4. The tokens are copied to an array for return. Any callers should simply be satisfied with being passed some structure that can be iterated. If you need an array, you can copy it outside of this function. This may save considereable memory and cpu resources

Probably all issues should be resolved by simply using the built-in java.util.StringTokenizerr

Roland Bouman