ansaurus

Question

Read large amount of data from file in Java

Answer 1

+1 A:

How much memory do you have in the computer? You could be running into GC issues.

The best thing to do is to process the data one line at a time if possible. Don't load it into an array. Load what you need, process, write it out, and continue.

This will reduce your memory footprint while using the same amount of file I/O.

Pyrolistical 2010-04-22 18:37:01
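
A minimal sketch of the line-at-a-time approach described above. It assumes a file with one integer per line, which differs from the asker's format, and the class and method names are hypothetical. Only a running total is kept in memory, never the full data set.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class LineByLineSum {

    // Reads one line at a time and keeps only a running total in memory.
    // Assumption: every non-blank line holds exactly one decimal integer.
    static long sum(Reader source) throws IOException {
        long total = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    total += Integer.parseInt(line);
                }
            }
        }
        return total;
    }
}
```

This only helps when the computation can be done incrementally; if the whole array is needed at once, the memory cost is unavoidable.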

It looks like his second line is one looong line that contains a million numbers.

SB 2010-04-22 18:39:14

If my calculations are correct, 1 million `int` values cost me only about 7 MB of memory, which is not much. I just need to load that data from the file into memory, because some of my calculations require the whole data set to be loaded.

Crozin 2010-04-22 18:45:07

Answer 2

+1 A:

StreamTokenizer may be faster, as suggested here.

trashgod 2010-04-22 18:39:47
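
A rough sketch of what the StreamTokenizer suggestion above might look like for this input. It assumes the layout described elsewhere in the thread (two header integers, then the data, with the first header value used as the count); the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;

final class TokenizerRead {

    // Reads two header integers followed by n whitespace-separated integers.
    // Assumption: the first header value n is the number of integers that follow.
    static int[] readAll(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            StreamTokenizer st = new StreamTokenizer(in);
            int n = nextInt(st);
            nextInt(st); // second header value, not used in this sketch
            int[] values = new int[n];
            for (int i = 0; i < n; i++) {
                values[i] = nextInt(st);
            }
            return values;
        }
    }

    // StreamTokenizer reports numbers as double (nval), so each value is cast back to int.
    private static int nextInt(StreamTokenizer st) throws IOException {
        if (st.nextToken() != StreamTokenizer.TT_NUMBER) {
            throw new IOException("expected a number");
        }
        return (int) st.nval;
    }
}
```

Because `nval` is a double, this is only safe for values that fit in an `int`.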

In fact, StreamTokenizer seems to be the fastest solution so far (please check my question update). But it still needs about 1400 ms to read the necessary data.

Crozin 2010-04-22 18:59:58

thanks TG, StreamTokenizer is very nice.

KevinDTimm 2010-04-22 19:07:34

Excellent. See also @Kevin Brock's informative answer: http://stackoverflow.com/questions/2693223/read-large-amount-of-data-from-file-in-java/2694507#2694507

trashgod 2010-04-23 03:00:39

Answer 3

+1 A:

If it's possible to reformat the input so that each integer is on a separate line (instead of one long line with one million integers), you should see much better performance using Integer.parseInt(BufferedReader.readLine()), thanks to smarter buffering by line and not having to split the long string into a separate array of Strings.

Edit: I tested this and managed to read the output produced by seq 1 1000000 into an array of int in well under half a second, but of course this depends on the machine.

Arkku 2010-04-22 18:47:02
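
A sketch of the one-integer-per-line reading described above, assuming input like the output of seq (no header, no blank lines). The class name and the initial array size are illustrative choices, not taken from the answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

final class OnePerLine {

    // Parses one integer per line into an int[], doubling the array when it
    // fills up because the number of lines is not known in advance.
    // Assumption: no blank lines (Integer.parseInt would reject them).
    static int[] read(Reader source) throws IOException {
        int[] values = new int[1024];
        int count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (count == values.length) {
                    values = Arrays.copyOf(values, values.length * 2);
                }
                values[count++] = Integer.parseInt(line);
            }
        }
        return Arrays.copyOf(values, count); // trim to the number actually read
    }
}
```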

Unfortunately, I cannot change the file format. It has to be two integers separated by a single space on the first line and 1 million integers on the second line (also separated by single spaces).

Crozin 2010-04-22 19:04:42

Answer 4

A:

Defrag your drive. Close all other applications. Use hdparm to optimize the drive. Try Java's NIO package.

Dave Jarvis 2010-04-22 18:54:10
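
The NIO suggestion above is not spelled out, so the following is only one possible reading of it: memory-map the file with FileChannel and parse digits straight from the buffer. The class name, the header handling (first value taken as the count), and the digit parser are all assumptions, and whether this beats buffered stream I/O depends on the machine and JVM.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedIntReader {

    // Maps the whole file read-only and parses ASCII integers from it.
    // Assumption: the first header value is the number of integers that follow.
    static int[] read(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int n = nextInt(buf);
            nextInt(buf); // second header value, not used in this sketch
            int[] values = new int[n];
            for (int i = 0; i < n; i++) {
                values[i] = nextInt(buf);
            }
            return values;
        }
    }

    // Skips non-digit bytes, then accumulates digits until the next non-digit.
    // Returns 0 if the buffer runs out before any digit is seen.
    private static int nextInt(ByteBuffer buf) {
        int result = 0;
        boolean seenDigit = false;
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (b >= '0' && b <= '9') {
                seenDigit = true;
                result = result * 10 + (b - '0');
            } else if (seenDigit) {
                break;
            }
        }
        return result;
    }
}
```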

Answer 5

A:

I would extend FilterReader and parse the string as it is read in the read() method. Have a getNextNumber method return the numbers. Code left as an exercise for the reader.

Skip Head 2010-04-22 19:00:21
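
One way the FilterReader idea above could be sketched. This simplified version does the parsing inside getNextNumber by pulling characters through the inherited read(), rather than parsing inside read() itself as the answer proposes; the class name is hypothetical and only non-negative integers are handled.

```java
import java.io.EOFException;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

final class NumberReader extends FilterReader {

    NumberReader(Reader in) {
        super(in);
    }

    // Returns the next non-negative integer in the stream.
    // Throws EOFException when the input ends before another digit is found.
    int getNextNumber() throws IOException {
        int c = read();
        // Skip everything that is not a digit (spaces, line breaks, ...).
        while (c != -1 && (c < '0' || c > '9')) {
            c = read();
        }
        if (c == -1) {
            throw new EOFException("no more numbers");
        }
        int result = c - '0';
        // Accumulate digits until a non-digit or end of input.
        while ((c = read()) != -1 && c >= '0' && c <= '9') {
            result = result * 10 + (c - '0');
        }
        return result;
    }
}
```

As with any character-at-a-time reader, wrapping the underlying Reader in a BufferedReader before passing it in is likely to matter for speed.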

Answer 6

+1 A:

You can reduce the time for the StreamTokenizer result by using a BufferedReader:

Reader r = null;
try {
    r = new BufferedReader(new FileReader(file));
    final StreamTokenizer st = new StreamTokenizer(r);
    ...
} finally {
    if (r != null)
        r.close();
}

Also, don't forget to close your files, as I've shown here.

You can also shave some more time off by using a custom tokenizer just for your purposes:

import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;

public class CustomTokenizer {

    private final Reader r;

    public CustomTokenizer(final Reader r) {
        this.r = r;
    }

    // Returns the next non-negative integer; throws EOFException if the
    // input ends before a number starts.
    public int nextInt() throws IOException {
        int i = r.read();
        if (i == -1)
            throw new EOFException();

        char c = (char) i;

        // Skip any whitespace
        while (c == ' ' || c == '\n' || c == '\r') {
            i = r.read();
            if (i == -1)
                throw new EOFException();
            c = (char) i;
        }

        // Accumulate decimal digits until whitespace or end of input
        int result = (c - '0');
        while ((i = r.read()) >= 0) {
            c = (char) i;
            if (c == ' ' || c == '\n' || c == '\r')
                break;
            result = result * 10 + (c - '0');
        }

        return result;
    }

}

Remember to use a BufferedReader for this. This custom tokenizer assumes the input data is always completely valid and contains only spaces, new lines, and digits.

If you read these results often and they do not change much, you should probably save the array and keep track of the file's last-modified time. Then, if the file has not changed, just use the cached copy of the array; this will speed things up significantly. For example:

public class ArrayRetriever {

    private File inputFile;
    private long lastModified;
    private int[] lastResult;

    public ArrayRetriever(File file) {
        this.inputFile = file;
    }

    public int[] getResult() {
        if (lastResult != null && inputFile.lastModified() == lastModified)
            return lastResult;

        lastModified = inputFile.lastModified();

        // do logic to actually read the file here

        lastResult = array; // the array variable from your examples
        return lastResult;
    }

}

Kevin Brock 2010-04-22 21:07:12
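
A self-contained variation on the caching idea above, offered as a sketch rather than the answer's own code. The Loader callback, the class name, and the extra file-length check are assumptions added here; the length check is there because last-modified timestamps can be coarse on some file systems.

```java
import java.io.File;
import java.io.IOException;

final class CachedIntArray {

    // Hypothetical callback that performs the actual file parsing.
    interface Loader {
        int[] load(File file) throws IOException;
    }

    private final File file;
    private final Loader loader;
    private long cachedModified;
    private long cachedLength;
    private int[] cached;

    CachedIntArray(File file, Loader loader) {
        this.file = file;
        this.loader = loader;
    }

    // Re-reads the file only when its timestamp or length has changed
    // since the last successful load; otherwise returns the cached array.
    synchronized int[] get() throws IOException {
        long modified = file.lastModified();
        long length = file.length();
        if (cached == null || modified != cachedModified || length != cachedLength) {
            cached = loader.load(file);
            cachedModified = modified;
            cachedLength = length;
        }
        return cached;
    }
}
```

Note that callers share the same array instance, so anyone who modifies it changes the cached copy too.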

Thanks for the answer - I'll check it tomorrow - I hope that this is what I am looking for.

Crozin 2010-04-22 21:32:30

+1 It might be worth specifying the buffer size when constructing the `BufferedReader`, too.

trashgod 2010-04-23 02:57:06
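
For illustration, the buffer size mentioned above is the second constructor argument of `BufferedReader`. The 64 KiB value and the class name below are arbitrary assumptions, not a measured optimum; the default buffer is 8192 characters.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

final class BigBufferReader {

    // Illustrative size only; measure before settling on a value.
    static final int BUFFER_SIZE = 64 * 1024;

    // Opens the file with an explicitly sized character buffer.
    // The caller is responsible for closing the returned reader.
    static BufferedReader open(File file) throws IOException {
        return new BufferedReader(new FileReader(file), BUFFER_SIZE);
    }
}
```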

Answer 7

+1 A:

Thanks for every answer but I've already found a method that meets my criteria:

BufferedInputStream bis = new BufferedInputStream(new FileInputStream("./path"));
int n = readInt(bis); // first header value: number of integers to read
int t = readInt(bis); // second header value
int[] array = new int[n];
for (int i = 0; i < n; i++) {
    array[i] = readInt(bis);
}
bis.close();

// Skips non-digit characters, then accumulates digits until the next
// non-digit or end of stream. Handles non-negative integers only.
private static int readInt(InputStream in) throws IOException {
    int ret = 0;
    boolean dig = false;

    for (int c = 0; (c = in.read()) != -1; ) {
        if (c >= '0' && c <= '9') {
            dig = true;
            ret = ret * 10 + c - '0';
        } else if (dig) break;
    }

    return ret;
}

It requires only about 300 ms to read 1 million integers!

Crozin 2010-04-23 13:12:41


My first attempt was `java.util.Scanner`:

Then I tried `java.io.BufferedReader`:

According to trashgod answer:
