tags:

views:

870

answers:

6

Right Now I have

;; buffer->string: BufferedReader -> String
(defn buffer->string [buffer]
 (loop [line  (.readLine buffer) sb (StringBuilder.)]
     (if(nil? line)
        (.toString sb)
        (recur (.readLine buffer) (.append sb line)))))

This is too slow.

Edit:

  • I have a BufferedReader

  • when i try to do (str BufferedReader) it gives me "java.io.BufferedReader@1ce784b"

  • the above loop is too slow and I run out of memory space.

A: 

buffer.ToString()? Or in your case, maybe (.toString buffer)?

Michael Schmitz
I get 1:5 Trial=> (str *yyy*) => "java.io.BufferedReader@c79809"
kunjaan
+2  A: 

I don't know Clojure, so I can't tell if you have some detail wrong in your code, but using StringBuffer and appending the input line by line is the correct way to do it (well, using a StringBuilder initialized to the expected final size if known would bring significant but not dramatic improvements).

If you run out of memory, then maybe the content of your BufferedReader is simply too large to fit into your memory and there is no way to have it as a single string - in that case, you'll either have to increase your heap size or find a way to process the data one small chunk at a time.

BTW, if you know the size of your input, a more efficient method would be to use a CharBuffer and fill it by using Reader.read() (you'll have to pay attention to the return method and use it in a loop).

Michael Borgwardt
A: 

in java you would do something like;

public String getStringFromBuffer(){
BufferedReader bRead = new BufferedReader();
String line = null;
StringBuffer theText = new StringBuffer();
while((line=bRead.readLine())!=null){
   theText.append(line+"\n);
}

return theText.toString();
}
Johnny Boy
A: 

I don't know clojure, just Java. Lets work from there.

Some points to consider:

  • If your target JVM version is >= 1.5 you can use StringBuilder instead of StringBuffer for a small performance improvement (no synchronization and you don't need it). Read about it here

    http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html

  • But your big performance cost is probably on the buffer expansion. When you instantiate a StringBuffer/StringBuilder without using the constructor with the capacity argument you get a small capacity.

    When starting with a small capacity (the internal buffer size) you have many expansions - every time you exceed that capacity, its internal buffer is reallocated to a new capacity, just large enough to hold the newly appended text, which means copying all previously held text to the new buffer.

    This is very slow when you are appending more text to an already very large string.

    If you have access to the size of the text you are reading (a file size would be an approximation) you can significantly reduce the amount of expansions.

  • I could also tell you to use read() method of the BufferedReader, the one with 3 arguments, this one:

    BufferedReader.read(char[], int, int)

    You could then use one of the String's class constructors that accept a char array to convert the char buffer into a String:

    String.String(char[], int, int)

    ...however, I suspect that the performance improvement will not be that big, especially when compared with the one of reducing how many StringBuilder expansions you'll have.

  • Whatever the approximation, you seem to have memory capacity problem:

    In the end you will need at least twice as much memory as the whole text occupies.

    Either if you use the StringBuilder/StringBuffer approach or the other one, in the end you will have to copy the text contents to the new String holding the result.

In the end you will probably need to work out of this box:

  • Are you sure you only have a BufferedReader as a start and a String as an end? You should provide the broader picture!

If this is the broadest you have, you will need at least a JVM instance configured with more heap since you will probably run out of memory with any of this solutions anyway.

Paulo Gaspar
Actually, StringBuffer and StringBuilder will *not* expand just enough to hold the new length - the underlying array will *double* in size (unless that is enough), which puts a tight bound on the number of expansions.
Michael Borgwardt
You are right and I stand corrected.I had that idea and then did misread the AbstractStringBuilder source code (when double checking) by missing the 1st line of expandCapacity().But for the current case that still means:1) An increased risk of memory overflow;2) Still a lot of time wasted on expansion.On 2) well... been there and done that and it was like that... a couple of CPU generations ago.
Paulo Gaspar
+5  A: 
(clojure.contrib.duck-streams/slurp* your-buffer) ; is what you want

Your code is slow because buffer isn't hinted.

cgrand
How do I hint it?
kunjaan
(defn buffer->string [#^java.io.BufferedReader buffer]see http://clojure.org/java_interop#toc35
cgrand
Make sure that clojure tells you of anything that's not hinted:Turn on *warn-on-reflection*.http://clojure.org/api#toc27
Leonel
A: 

use slurp to read (reasonably sized files) in
use spit to write them back out again.

Arthur Ulfeldt