I've been using this idiom for some time now, and it seems to be the most widespread, at least on the sites I've visited.

Does anyone have a better/different way to read a file into a string in Java?

Thanks

 private String readFile( String file ) throws IOException {
    BufferedReader reader = new BufferedReader( new FileReader (file));
    String line  = null;
    StringBuilder stringBuilder = new StringBuilder();
    String ls = System.getProperty("line.separator");
    try {
        while( ( line = reader.readLine() ) != null ) {
            stringBuilder.append( line );
            stringBuilder.append( ls );
        }
        return stringBuilder.toString();
    } finally {
        // Close the reader even if readLine() throws
        reader.close();
    }
 }
A: 

Better? I don't know. Different? Sure. Instead of reading line by line, you can read char by char, with an InputStream instead of a Reader.

luiscubal
That would read byte by byte. Streams are for binary data, readers are for text data.
Jon Skeet
+1  A: 

You could try:

FileInputStream input = new FileInputStream(filePath);

byte[] fileData = new byte[input.available()];

input.read(fileData);
input.close();

return new String(fileData, "UTF-8");

I'm not sure what problems might occur with the bytes and character sets etc, but it works for me.

Richie_W
I have always wondered: is it possible for input.available() to return fewer bytes than the file contains? I guess it could with big files.
OscarRyz
Probably. Like I said, it works for me. Perhaps there is a more suitable java.io.File method (something like getLength()?) which could provide a more reliable value.
Richie_W
Given the doco for read(byte[]) this is very risky code. What if the file is on a network share or SAN? Then you might get an available count of less than the file size. You need to use a readFull() method and File.length().
Software Monkey
Like I said, it works for me and my purposes. It is concise and I've yet to encounter a problem with it. That said, there are probably a hundred better ways to do it (just, this method has less code).
Richie_W
Actually, this is very similar to the way the library posted by Willi aus Rohr does it. I don't see why the downvote.
OscarRyz
Thanks! Neither did I. :)
Richie_W
Problems with this code: 1) Stream is left open when there's an exception. 2) Use of available() to guess at file size (and assume it's constant). 3) Assumption that a single call to read() will read everything. 4) Use of platform-default character encoding.
Jon Skeet
In short, please don't do this :)
Jon Skeet
I agree with Jon on all points. No offense intended, but this is like a checklist of what NOT to do.
Alan Moore
Cry me a river, I'll do it this way all the time just to spite you all. Joking aside, I've NEVER had a problem with it so I'm just going to wave my hands like I just don't care (because I don't)
Richie_W
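
For reference, here is a rough sketch of the same "read all the bytes, then decode" approach with the four problems from Jon Skeet's comment addressed: the stream is closed in a finally block, the size comes from File.length() instead of available(), read() is called in a loop, and the charset is always named by the caller. The class and method names (SafeByteRead, readFile) are made up for illustration, and the sketch assumes the file fits in memory and does not grow while it is being read.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class SafeByteRead {
    // Hypothetical helper, not taken from any library.
    static String readFile(String path, String charsetName) throws IOException {
        File file = new File(path);
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File too large: " + length + " bytes");
        }
        byte[] data = new byte[(int) length];
        InputStream in = new FileInputStream(file);
        try {
            int total = 0;
            // A single read() may return fewer bytes than requested,
            // so keep reading until the buffer is full or EOF is hit.
            while (total < data.length) {
                int n = in.read(data, total, data.length - total);
                if (n == -1) {
                    break; // file turned out shorter than File.length() reported
                }
                total += n;
            }
            // Decode only the bytes actually read, with an explicit charset.
            return new String(data, 0, total, charsetName);
        } finally {
            in.close(); // runs even when read() throws
        }
    }
}
```

Calling SafeByteRead.readFile(path, "UTF-8") returns the whole file as a String. It is longer than the original snippet, which is the trade-off discussed in the comments above.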
+7  A: 

Commons IOUtils:

http://commons.apache.org/io/api-1.4/index.html?org/apache/commons/io/IOUtils.html

public static String readFileToString(File file)
                           throws IOException

Reads the contents of a file into a String using the default encoding for the VM. The file is always closed.

Parameters:
    file - the file to read, must not be null 
Returns:
    the file contents, never null 
Throws:
    IOException - in case of an I/O error
Since:
    Commons IO 1.3.1

Edit by Oscar Reyes

I've found the code used ( indirectly ) by that class:

IOUtils.java under Apache Licence 2.0

    public static long copyLarge(InputStream input, OutputStream output)
           throws IOException {
       byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
       long count = 0;
       int n = 0;
       while (-1 != (n = input.read(buffer))) {
           output.write(buffer, 0, n);
           count += n;
       }
       return count;
   }

Very similar to the one used by Richie_W

Willi aus Rohr
I don't find that method in the URL you provide.
OscarRyz
It's in the class org.apache.commons.io.FileUtils
ckarmann
I'm using FileUtils too, but I'm wondering which is better: using FileUtils or the accepted nio answer?
Guillaume
+2  A: 

Java attempts to be extremely general and flexible in all it does. As a result, something which is relatively simple in a scripting language (your code would be replaced with "open(file).read()" in python) is a lot more complicated. There doesn't seem to be any shorter way of doing it, except using an external library (like Willi aus Rohr mentioned). Your options:

  • Use an external library.
  • Copy this code into all your projects.
  • Create your own mini-library which contains functions you use often.

Your best bet is probably the 2nd one, as it has the least dependencies.

Claudiu
Yeap. It makes the "high" level language take a different meaning. Java is high level compared with C but low compared with Python or Ruby
OscarRyz
+29  A: 

In general, you should specify the character encoding to use when converting the bytes of a file to text. There are some special cases when you just want to use the platform default, but they are rare, and you should be able to explicitly justify why this is okay.

Anyway, here's an efficient way to do it:

private static String readFile(String path) throws IOException {
  FileInputStream stream = new FileInputStream(new File(path));
  try {
    FileChannel fc = stream.getChannel();
    MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
    /* Instead of using default, pass in a decoder. */
    return Charset.defaultCharset().decode(bb).toString();
  }
  finally {
    stream.close();
  }
}
erickson
Quite interesting. What does the channel mean? I know it is meant to avoid blocking the thread, and that channels can be bidirectional (or that's what I understood). But, in simpler words, what are they? Can you elaborate further?
OscarRyz
In many ways, a ReadableByteChannel is like an InputStream, and WritableByteChannel is like an OutputStream. Many concrete Channels implement both of these interfaces, so one object is bi-directional. Some channels (SocketChannel) support non-blocking IO, but this isn't true of all channels.
erickson
Do you know the time- and memory-efficiencies of this idiom, or can at least estimate? It's a beautiful idiom!
Beau Martínez
Technically speaking, it's O(n) in time and space. Qualitatively, due to the immutability requirement of Strings, it's pretty hard on memory; temporarily there are two copies of the char data in memory, plus the room for the encoded bytes. Assuming some single-byte encoding, it will (temporarily) require 5 bytes of memory for each character in the file. Since the question asks specifically for a String, that's what I show, but if you can work with the CharBuffer returned by "decode", the memory requirement is much less. Time-wise, I don't think you'll find anything faster in the core Java libs.
erickson
Possible typo? NIO has a Charset (not CharSet) class called java.nio.charset.Charset. Is this what CharSet should have been?
Jonathan Wright
Note: after exercising that code a bit, I found out that you can't reliably delete the file right after reading it with this method, which may be a non-issue in some cases, but not in mine. Might it be related to this issue: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4715154 ? I finally went with Jon Skeet's proposal, which doesn't suffer from this bug. Anyway, I just wanted to share the info for other people, just in case...
Sébastien Nussbaumer
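
As a sketch of the two suggestions made in this answer and its comments (pass the charset in rather than using the default, and keep the CharBuffer returned by decode when a String isn't strictly needed), something like the following might do. The class and method names (MappedRead, readChars) are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

class MappedRead {
    // Hypothetical variant of the readFile method above: the caller picks the
    // charset, and the decoded CharBuffer is returned without copying it into a String.
    static CharBuffer readChars(String path, Charset cs) throws IOException {
        FileInputStream stream = new FileInputStream(path);
        try {
            FileChannel fc = stream.getChannel();
            MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            // Charset.decode() replaces malformed input instead of reporting it.
            return cs.decode(bb);
        } finally {
            stream.close();
        }
    }
}
```

CharBuffer implements CharSequence, so the result can be passed to many APIs as it is; calling toString() on it gives the String the question asked for. If malformed input should raise an error rather than be replaced, a CharsetDecoder obtained from cs.newDecoder() could be used in place of cs.decode(bb). The file-deletion caveat from the last comment presumably applies here too, since the file is still mapped.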
+2  A: 

If you're looking for an alternative that doesn't involve a 3rd party library (e.g. commons IO), you can use the Scanner class

private String readFile(String pathname) throws IOException {
    StringBuilder stringBuilder = new StringBuilder();
    Scanner scanner = new Scanner(new File(pathname));

    try {
        while(scanner.hasNextLine()) {        
            stringBuilder.append(scanner.nextLine() + "\n");
        }
    } finally {
        scanner.close();
    }
    return stringBuilder.toString();
}
Don
I think this is the best way. Check out http://java.sun.com/docs/books/tutorial/essential/io/scanning.html
Tarski
Doesn't this drop line terminators? Let me try ...
OscarRyz
I've updated the code to add the line terminators
Don
The Scanner constructor that accepts a String doesn't treat the string as the name of a file to read, but as the text to be scanned. I make that mistake all the time. :-/
Alan Moore
@Alan, good catch. I edited Don's answer slightly to fix that (I hope).
Jonik
missing a semicolon
Brandon Thomson
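
To make the pitfall Alan Moore describes concrete, here is a small sketch contrasting the two Scanner constructors. The class and method names are invented for the example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class ScannerSource {
    // new Scanner(String) scans the characters of the string itself;
    // it never opens a file with that name.
    static String firstLineOfText(String text) {
        Scanner scanner = new Scanner(text);
        try {
            return scanner.hasNextLine() ? scanner.nextLine() : "";
        } finally {
            scanner.close();
        }
    }

    // new Scanner(File) opens the file and scans its contents
    // (using the platform default charset).
    static String firstLineOfFile(String pathname) throws FileNotFoundException {
        Scanner scanner = new Scanner(new File(pathname));
        try {
            return scanner.hasNextLine() ? scanner.nextLine() : "";
        } finally {
            scanner.close();
        }
    }
}
```

With these, firstLineOfText("notes.txt") simply returns the text "notes.txt", whereas firstLineOfFile("notes.txt") returns the first line stored in that file.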
+5  A: 

That code will normalize line breaks, which may or may not be what you really want to do.

Here's an alternative which doesn't do that, and which is (IMO) simpler to understand than the NIO code (although it still uses java.nio.charset.Charset):

public static String readFile(String file, String csName)
            throws IOException {
    Charset cs = Charset.forName(csName);
    return readFile(file, cs);
}

public static String readFile(String file, Charset cs)
            throws IOException {
    // No real need to close the BufferedReader/InputStreamReader
    // as they're only wrapping the stream
    FileInputStream stream = new FileInputStream(file);
    try {
        Reader reader = new BufferedReader(new InputStreamReader(stream, cs));
        StringBuilder builder = new StringBuilder();
        char[] buffer = new char[8192];
        int read;
        while ((read = reader.read(buffer, 0, buffer.length)) > 0) {
            builder.append(buffer, 0, read);
        }
        return builder.toString();
    } finally {
        // Potential issue here: if this throws an IOException,
        // it will mask any others. Normally I'd use a utility
        // method which would log exceptions and swallow them
        stream.close();
    }        
}
Jon Skeet
Which one is "that" code?
OscarRyz
The code in the question.
Jon Skeet
+2  A: 

There is a variation on the same theme that uses a for loop, instead of a while loop, to limit the scope of the line variable. Whether it's "better" is a matter of personal taste.

for(String line = reader.readLine(); line != null; line = reader.readLine()) {
    stringBuilder.append(line);
    stringBuilder.append(ls);
}
Dan Dyer
This will change the newlines to the default newline choice. This may be desirable, or unintended.
Peter Lawrey
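
A small sketch of what that newline change looks like in practice, using a StringReader so that no file is needed. The class and method names are made up; readByLines is the loop from this answer, and readByChars is a buffer-based loop in the style of Jon Skeet's answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class LineBreakDemo {
    // Line-based loop: every line ending becomes the platform separator,
    // and a separator is appended after the last line as well.
    static String readByLines(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String ls = System.getProperty("line.separator");
        StringBuilder stringBuilder = new StringBuilder();
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            stringBuilder.append(line);
            stringBuilder.append(ls);
        }
        return stringBuilder.toString();
    }

    // Buffer-based loop: characters are copied through unchanged.
    static String readByChars(Reader in) throws IOException {
        StringBuilder builder = new StringBuilder();
        char[] buffer = new char[8192];
        int read;
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            builder.append(buffer, 0, read);
        }
        return builder.toString();
    }
}
```

For the input "a\r\nb\nc", readByLines(new StringReader(...)) returns the three lines each followed by the platform line separator (including one after "c", which the input did not have), while readByChars returns the input exactly as it was.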
+1  A: 
public static String slurp (final File file)
throws IOException {
    StringBuilder result = new StringBuilder();

    // Declared outside the try block so that it is visible in finally
    BufferedReader reader = new BufferedReader(new FileReader(file));

    try {
        char[] buf = new char[1024];

        int r = 0;

        while ((r = reader.read(buf)) != -1) {
            result.append(buf, 0, r);
        }
    }
    finally {
        reader.close();
    }

    return result.toString();
}
Scott S. McCoy
I think this has the inconvenience of using the platform default encoding. +1 anyway :)
OscarRyz
+4  A: 

Guava has a method similar to the one from Commons IOUtils that Willi aus Rohr mentioned:

import com.google.common.base.Charsets;
import com.google.common.io.Files;

// ...

String text = Files.toString(new File(path), Charsets.UTF_8);

EDIT by Oscar Reyes

This is the ( simplified ) underlying code on the cited library:

InputStream in = new FileInputStream(file);
byte[] b  = new byte[(int) file.length()]; // length() returns a long, so a cast is needed
int len = b.length;
int total = 0;

while (total < len) {
  int result = in.read(b, total, len - total);
  if (result == -1) {
    break;
  }
  total += result;
}

return new String( b , Charsets.UTF_8 );
finnw
A: 

To read a File as binary and convert at the end

public static String readFileAsString(String filePath) throws IOException {
    DataInputStream dis = new DataInputStream(new FileInputStream(filePath));
    try {
        long len = new File(filePath).length();
        if (len > Integer.MAX_VALUE) throw new IOException("File "+filePath+" too large, was "+len+" bytes.");
        byte[] bytes = new byte[(int) len];
        dis.readFully(bytes);
        return new String(bytes, "UTF-8");
    } finally {
        dis.close();
    }
}
Peter Lawrey