views:

432

answers:

7

Here's the method. I want to know if I am violating any best practices here or if I am doing something wrong as far as the language is concerned.

private List<String> breakStringInChunks(String text, int chunkSize) {
        List<String> chunks = new ArrayList<String>();
        String temporary = "";
        int numberOfChunks = text.length() / chunkSize;
        int beginIndex = 0;
        int endIndex = 0;

        // Add one iteration if numberOfChunks*chunkSize is less than the length of text.
        if ((numberOfChunks * chunkSize) < text.length()) {
            numberOfChunks++;
        }

        // Cut strings and add in the list.
        for (int i = 0; i < numberOfChunks; i++) {
            endIndex+=chunkSize;
            if ((i + 1) == numberOfChunks) {
                temporary = text.substring(beginIndex);
            }
            else {
                temporary = text.substring(beginIndex, endIndex);
            }
            beginIndex=endIndex;
            chunks.add(temporary);
        }

        return chunks;
    }
+1  A: 

It's a bit verbose, and there is no need to declare the temporary string at the start of your method, which could make garbage collection a bit slower. The following would be briefer:

private List<String> breakStringInChunks(String text, int chunkSize) {
    int nChunks = (int)Math.ceil(((double)text.length())/chunkSize));
    List<String> chunks = new ArrayList<String>(nChunks);
    // Cut strings and add in the list.
    for (int i = 0; i < text.length(); i+=chunkSize) {
        int endIndex=i+chunksize;
        if (endIndex >= text.length()) {
            chunks.add(text.substring(i));
        } else {
            chunks.add(text.substring(i, endIndex));
        }
    }
    return chunks;
}

One good thing about your method and the text above is that because you always call substring() on the original String, Java will only reference the original character array, so it will save you some memory allocations.

I think the } else { is a more common coding standard for Java.

Nick Fortescue
If there will be many chunks, it makes sense to allocate the ArrayList with an initialCapacity we know will suffice,e.g., final List<String> chunks = new ArrayList<String>(1 + (text.length() / chunkSize));
Hank Gay
agreed, I'll edit. Strictly speaking I'd guess it should be a ceil
Nick Fortescue
I think that should be i += chunkSize rather than i++ in the for loop, otherwise you'll have one chunk for each character in the string. I agree that the beginIndex variable isn't necessary though.
Mike Houston
And correspondingly, the test should be i < text.length()
Mike Houston
The last chunk is not right if length isn't divisible by the chunkSize.
Joset
+1  A: 

Unless I'm misunderstanding your intent, it seems like a massive overkill, and uses string creation quite a lot of times, making the algorithm quite inefficient in java, since strings are immutable.

Try this:

public List<String> breakStringsInChunks(String text,int chunkSize) {
    if (chunkSize<=1) {
        throw new IllegalArgumentException("Chunk size must be positive");
    }
    if (text==null || text.isEmpty()) {
        return Collections.emptyList();
    }

    List<String> chunks= new LinkedList<String>();

    int index=0;
    int len = text.length();

    //guaranteed to succeed at least once since 0 length strings we're taken care of
    do {
        chunks.add(text.substring(index, Math.min(index + chunkSize, len)));
        index+=chunkSize;
    } while (index<len);

    return chunks;
}
This is nice too. Why did you choose LinkedList over ArrayList?
Joset
Surely it /is/ fairly efficient /because/ strings are immutable? String.substring() can return a new string which points to the original data, without needing to copy it. Also, your answer makes the same number of calls to substring.
Mike Houston
LinkedList avoids a large memory allocation up front (though that might slow down reads later due to poor locality of reference) by allocating memory for the nodes as they are created/added, and avoids unpredictable slowdowns on adds to the List (because sometimes ArrayList needs to grow its backing array which involves copying everything that's already there to the new, larger location). In *general* ArrayList is better if you know how much space to allocate or if you're reasonably sure it won't need to resize very often.
Hank Gay
Since 90%+ of what I do with list is use them as default collection types to iterate over, I don't use get(i) anyway. I tend to use ArrayLst when the task is fairly localized and I need good performance, since then I dump the for..each pattern and iterate directly on the indices.
Even just iterating over the results is likely to be faster in an ArrayList (the memory is contiguous, whereas a LinkedList introduces the *possibility* of fragmentation). I personally prefer LinkedList in situations where I don't know how large the list will wind up being, but in this instance I do know, so I'd favor ArrayList.
Hank Gay
To be clear, I'd know how big the list will be in my example; in your example it's not explicitly known, and in those instances, I usually fall back on LinkedList as well.
Hank Gay
+3  A: 

Briefer still, and avoids potential resizing of the resulting list.

private static List<String> breakStringInChunks(final String text, final int chunkSize) {
    final int numChunks = 0 == (text.length() % chunkSize) ? text.length() / chunkSize : 1 + (text.length() / chunkSize);
    final List<String> chunks = new ArrayList<String>(numChunks);
    for (int startIndex = 0; startIndex < text.length(); startIndex += chunkSize) {
        final int endIndex = Math.min(text.length(), startIndex + chunkSize);
        chunks.add(text.substring(startIndex, endIndex));
    }
    return chunks;
}
Hank Gay
Nick, this is a nice solution.
Joset
Actually, I'm the commenter on Nick's original answer. I went off and modified his solution before I realized that he'd already seen my comment.
Hank Gay
Cool Hank. Thanks.
Joset
I think if text.length() is directly divisible by chunksize() this allocates an array that is 1 element too long, and puts an empty String ("") as the last chunk
Nick Fortescue
Right you are, Nick. Updated accordingly.
Hank Gay
Cleaner to loop on startIndex directly?for (int startIndex = 0; startIndex < text.length(); startIndex += chunkSize)
John Pirie
A: 
public static final List<String> chunk(final String text, final int chunkSize) {
 // Figure out how many chunks we are going to make.
 final int textLength = text.length();
 final int numberOfChunks =
  textLength % chunkSize == 0
  ? textLength / chunkSize
  : textLength / chunkSize + 1;

 // Create an array list of just the right size.
 final ArrayList<String> chunks = new ArrayList<String>(numberOfChunks);

 // Do all the chunking but the last one - here we know that all chunks
 // are exactly chunkSize long.
 for (int i = 0; i < numberOfChunks - 1; i++) {
  chunks.add(text.substring(i * chunkSize, (i + 1) * chunkSize));
 }

 // Add final chunk, which may be shorter than chunkSize, so we use textLength
 // as the end index.
 chunks.add(text.substring((numberOfChunks - 1) * chunkSize, textLength));

 return chunks;
}
Zarkonnen
A: 

Here is my solution. I tried to implement this to be very efficient:

public static List<String> breakStringInChunks(String text, int chunkSize) {
 if (chunkSize < 2) {
  throw new IllegalArgumentException("Chunk size must be > 1");
 }
 if (null == text || text.isEmpty()) {
  return Collections.emptyList();
 }

 List<String> chunks = new ArrayList<String>(1 + (text.length() / chunkSize));

 int length = text.length() - (text.length() % chunkSize);

 for (int i = 0; i < length;) {
  chunks.add(text.substring(i, i += chunkSize));
 }
 if (length < text.length())
  chunks.add(text.substring(length));

 return chunks;
}
bruno conde
A: 

How about something like this?

private List<String> breakStringInChunks(String text, int chunkSize)
{
    List<String> chunks = new ArrayList<String>();
    while (text.length() > 0)
    {
        if (chunkSize > text.length())
        {
            chunkSize = text.length();
        }
        chunks.add(text.substring(0, chunkSize));
        text = text.substring(chunkSize);
    }
    return chunks;
}
Harry Lime
That would lock in an infinite loop adding the first chunk over and over again - the String.length() doesn't change. I think you're thinking of a stream interface, which I suspect would be quite slow in comparison due to actually performing copies of the original data.
Mike Houston
The String length *does* change. The last line of the loop essentially removes the already processed characters from the "text" variable.
Harry Lime
The problem here is, the arguments are being modified.
Joset
NO! The arguments are only being modified within the method. The values will not change for the caller of this method! "String" is immutable and "int" is passed by value. Thanks for your concern though :)
Harry Lime
This is a clear and simple algorithm, but the creation of the new Strings being assigned to 'text' can be very costly in time and memory. If the Strings are large, the chunk size is small, and/or the method is called a lot, I'd opt for a substring() approach.
Jim Ferrans
A: 

Here's mine. Not much different from some of the other answers, but test-driven, fwiw.

public class ChunkTest extends TestCase {
    public void testEmpty() throws Exception {
     assertEquals(0, breakStringInChunks("", 1).size());
    }

    public void testOneChunk() throws Exception {
     String s = "abc";
     List<String> chunks = breakStringInChunks(s, s.length());
     assertEquals(s, chunks.get(0));
     assertEquals(1, chunks.size());
    }

    public void testPartialChunk() throws Exception {
     String s = "abc";
     List<String> chunks = breakStringInChunks(s, s.length() + 1);
     assertEquals(s, chunks.get(0));
     assertEquals(1, chunks.size());
    }

    public void testTwoChunks() throws Exception {
     String s = "abc";
     List<String> chunks = breakStringInChunks(s, 2);
     assertEquals("ab", chunks.get(0));
     assertEquals("c", chunks.get(1));
     assertEquals(2, chunks.size());
    }

    public void testTwoEvenChunks() throws Exception {
     String s = "abcd";
     List<String> chunks = breakStringInChunks(s, 2);
     assertEquals("ab", chunks.get(0));
     assertEquals("cd", chunks.get(1));
    }

    private List<String> breakStringInChunks(String text, int chunkSize) {
     if (text.isEmpty())
      return Collections.emptyList();
     int n = (text.length() + chunkSize - 1) / chunkSize;
     List<String> chunks = new ArrayList<String>(n);
     for (int i = 0; i < n; ++i)
      chunks.add(text.substring(i * chunkSize, Math.min((i + 1) * chunkSize, text.length())));
     return chunks;
    }
}
Carl Manaster