views: 7518

answers: 7

While googling, I see that using java.io.File.length() can be slow. FileChannel has a size() method that is available as well.

Is there an efficient way in Java to get the file size?
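For reference, the two calls in question look like this (a minimal sketch; the class name is just illustrative):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class SizeDemo {
    public static void main(String[] args) throws IOException {
        File f = new File(args[0]);

        // java.io.File: a single metadata query, no stream is opened
        long viaFile = f.length();

        // FileChannel: requires opening (and closing) a stream first
        FileInputStream fis = new FileInputStream(f);
        try {
            long viaChannel = fis.getChannel().size();
            System.out.println(viaFile + " " + viaChannel);
        } finally {
            fis.close();
        }
    }
}
```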

+5  A: 

Measure it first - is it really slow for you in your particular circumstances? No point going off on a wild goose chase if it's not causing a problem!

JeffFoster
That's true. I was hoping, though, that there would also be some documentation describing why it is implemented in both java.io.File and FileChannel.
joshjdevl
A: 

This is kind of an off-the-wall solution: capture the output of an ls or dir command from a system call and then parse the size out from there. That seems like it would be fairly fast. Write a function in a static library called QuickFileSize and then share it with us. Again, just throwing the idea out there.

Yes, because spooling up a new process, reading in the results, then parsing those results is surely quicker than the system call to stat the file. ;) No, please don't do this.
jsight
+6  A: 

Well, I tried to measure it with the code below:

For runs = 1 and iterations = 1 the URL method is fastest most of the time, followed by channel. I ran this fresh, with some pause, about 10 times. So for one-time access, using the URL is the fastest way I can think of:

LENGTH sum: 10626, per Iteration: 10626.0

CHANNEL sum: 5535, per Iteration: 5535.0

URL sum: 660, per Iteration: 660.0

For runs = 5 and iterations = 50 the picture looks different.

LENGTH sum: 39496, per Iteration: 157.984

CHANNEL sum: 74261, per Iteration: 297.044

URL sum: 95534, per Iteration: 382.136

File must be caching the calls to the filesystem, while channels and URL have some overhead.

Hope this helped out,

Greetz GHad

Code:

import java.io.*;
import java.net.*;
import java.util.*;

public enum FileSizeBench {

    LENGTH {
        @Override
        public long getResult() throws Exception {
            File me = new File(FileSizeBench.class.getResource(
                    "FileSizeBench.class").getFile());
            return me.length();
        }
    },
    CHANNEL {
        @Override
        public long getResult() throws Exception {
            FileInputStream fis = null;
            try {
                File me = new File(FileSizeBench.class.getResource(
                        "FileSizeBench.class").getFile());
                fis = new FileInputStream(me);
                return fis.getChannel().size();
            } finally {
                if (fis != null) { // avoid a NullPointerException if the open failed
                    fis.close();
                }
            }
        }
    },
    URL {
        @Override
        public long getResult() throws Exception {
            InputStream stream = null;
            try {
                URL url = FileSizeBench.class
                        .getResource("FileSizeBench.class");
                stream = url.openStream();
                return stream.available();
            } finally {
                if (stream != null) { // avoid a NullPointerException if the open failed
                    stream.close();
                }
            }
        }
    };

    public abstract long getResult() throws Exception;

    public static void main(String[] args) throws Exception {
        int runs = 5;
        int iterations = 50;

        EnumMap<FileSizeBench, Long> durations = new EnumMap<FileSizeBench, Long>(FileSizeBench.class);

        for (int i = 0; i < runs; i++) {
            for (FileSizeBench test : values()) {
                if (!durations.containsKey(test)) {
                    durations.put(test, 0L);
                }
                long duration = testNow(test, iterations);
                durations.put(test, durations.get(test) + duration);
            }
        }

        for (Map.Entry<FileSizeBench, Long> entry : durations.entrySet()) {
            System.out.println();
            System.out.println(entry.getKey() + " sum: " + entry.getValue() + ", per Iteration: " + ((double) entry.getValue() / (double) (runs * iterations)));
        }
    }

    private static long testNow(FileSizeBench test, int iterations)
            throws Exception {
        long result = -1;
        long before = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            long current = test.getResult();
            if (result == -1) {
                result = current;
            } else if (current != result) {
                throw new Exception("variance detected!");
            }
        }
        return (System.nanoTime() - before) / 1000;
    }

}
GHad
interesting, here are my results (ubuntu 8.04):

For runs = 1 and iterations = 1:

LENGTH sum: 97442, per Iteration: 97442.0
CHANNEL sum: 15789, per Iteration: 15789.0
URL sum: 522, per Iteration: 522.0

For runs = 5 and iterations = 50:

LENGTH sum: 127074, per Iteration: 508.296
CHANNEL sum: 51582, per Iteration: 206.328
URL sum: 61334, per Iteration: 245.336
joshjdevl
Seems like the URL way is the best one to go for single access, whether it's XP or Linux. Greetz GHad
GHad
`stream.available()` does not return the file length. It returns the number of bytes that can be read without blocking, which is not necessarily the same as the file length. To get the real length from a stream, you really need to **read** it (and count the bytes read as you go).
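A minimal sketch of that read-and-count approach (the helper name is hypothetical, not from the thread):

```java
import java.io.IOException;
import java.io.InputStream;

public class StreamLength {
    // Consumes the stream and counts the bytes read; the only reliable
    // "length" of an arbitrary InputStream, unlike available().
    public static long lengthOf(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }
}
```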
BalusC
Good point, and you are right, but I never experienced any difference for files, as I expect all bytes to be readable when I want to read a file this way. Well, at least if the size is less than Integer.MAX_VALUE.
GHad
+2  A: 

When I modify your code to use a file accessed by an absolute path instead of a resource, I get a different result (for 1 run, 1 iteration, and a 100,000-byte file; times for a 10-byte file are identical to those for 100,000 bytes):

LENGTH sum: 33, per Iteration: 33.0

CHANNEL sum: 3626, per Iteration: 3626.0

URL sum: 294, per Iteration: 294.0

tdavies
+2  A: 

The benchmark given by GHad measures lots of other stuff (such as reflection, instantiating objects, etc.) besides getting the length. If we try to get rid of these things then for one call I get the following times in microseconds:

file sum: 19.0, per Iteration: 19.0
raf sum: 16.0, per Iteration: 16.0
channel sum: 273.0, per Iteration: 273.0

For 100 runs and 10000 iterations I get:

file sum: 1767629.0, per Iteration: 1.7676290000000001
raf sum: 881284.0, per Iteration: 0.8812840000000001
channel sum: 414286.0, per Iteration: 0.414286

I ran the following modified code, giving the name of a 100MB file as an argument.

import java.io.*;
import java.nio.channels.*;
import java.util.*;

public class FileSizeBench {

  private static File file;
  private static FileChannel channel;
  private static RandomAccessFile raf;

  public static void main(String[] args) throws Exception {
    int runs = 1;
    int iterations = 1;

    file = new File(args[0]);
    channel = new FileInputStream(args[0]).getChannel();
    raf = new RandomAccessFile(args[0], "r");

    HashMap<String, Double> times = new HashMap<String, Double>();
    times.put("file", 0.0);
    times.put("channel", 0.0);
    times.put("raf", 0.0);

    long start;
    for (int i = 0; i < runs; ++i) {
      long l = file.length();

      start = System.nanoTime();
      for (int j = 0; j < iterations; ++j)
        if (l != file.length()) throw new Exception();
      times.put("file", times.get("file") + System.nanoTime() - start);

      start = System.nanoTime();
      for (int j = 0; j < iterations; ++j)
        if (l != channel.size()) throw new Exception();
      times.put("channel", times.get("channel") + System.nanoTime() - start);

      start = System.nanoTime();
      for (int j = 0; j < iterations; ++j)
        if (l != raf.length()) throw new Exception();
      times.put("raf", times.get("raf") + System.nanoTime() - start);
    }
    channel.close();
    raf.close();
    for (Map.Entry<String, Double> entry : times.entrySet()) {
        System.out.println(
            entry.getKey() + " sum: " + 1e-3 * entry.getValue() +
            ", per Iteration: " + (1e-3 * entry.getValue() / runs / iterations));
    }
  }
}
Actually, while you are correct in saying it measures other aspects, I should have been clearer in my question. I'm looking to get the file size of multiple files, and I want the quickest possible way, so I really do need to take object creation and overhead into account, since that is a real scenario.
joshjdevl
About 90% of the time is spent in that getResource thing. I doubt you need to use reflection to get the name of a file that contains some Java bytecode.
+1  A: 

In response to rgrig's benchmark, the time taken to open/close the FileChannel & RandomAccessFile instances also needs to be taken into account, as these classes will open a stream for reading the file.

After modifying the benchmark, I got these results for 1 iteration on an 85MB file:

file totalTime: 48000 (48 us)
raf totalTime: 261000 (261 us)
channel totalTime: 7020000 (7 ms)

For 10000 iterations on the same file:

file totalTime: 80074000 (80 ms)
raf totalTime: 295417000 (295 ms)
channel totalTime: 368239000 (368 ms)

If all you need is the file size, file.length() is the fastest way to do it. If you plan to use the file for other purposes like reading/writing, then RAF seems to be a better bet. Just don't forget to close the file connection :-)

import java.io.File;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;

public class FileSizeBench
{    
    public static void main(String[] args) throws Exception
    {
        int iterations = 1;
        String fileEntry = args[0];

        Map<String, Long> times = new HashMap<String, Long>();
        times.put("file", 0L);
        times.put("channel", 0L);
        times.put("raf", 0L);

        long fileSize;
        long start;
        long end;
        File f1;
        FileChannel channel;
        RandomAccessFile raf;

        for (int i = 0; i < iterations; i++)
        {
            // file.length()
            start = System.nanoTime();
            f1 = new File(fileEntry);
            fileSize = f1.length();
            end = System.nanoTime();
            times.put("file", times.get("file") + end - start);

            // channel.size()
            start = System.nanoTime();
            channel = new FileInputStream(fileEntry).getChannel();
            fileSize = channel.size();
            channel.close();
            end = System.nanoTime();
            times.put("channel", times.get("channel") + end - start);

            // raf.length()
            start = System.nanoTime();
            raf = new RandomAccessFile(fileEntry, "r");
            fileSize = raf.length();
            raf.close();
            end = System.nanoTime();
            times.put("raf", times.get("raf") + end - start);
        }

        for (Map.Entry<String, Long> entry : times.entrySet()) {
            System.out.println(entry.getKey() + " totalTime: " + entry.getValue() + " (" + getTime(entry.getValue()) + ")");
        }
    }

    public static String getTime(Long timeTaken)
    {
        if (timeTaken < 1000) {
            return timeTaken + " ns";
        } else if (timeTaken < (1000*1000)) {
            return timeTaken/1000 + " us"; 
        } else {
            return timeTaken/(1000*1000) + " ms";
        } 
    }
}
kchellap
A: 

Actually, the 'ls' option is something I've considered a lot. For a single file it would be pointless, but if you have a directory with several hundred files and you need each of their sizes and last-modified times, then 'ls' is more efficient. Make that 20 directories with several hundred files in varying sub-directories, and a recursive 'ls' is amazingly efficient, while the options mentioned above become unusable.
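For comparison, a sketch of the pure-Java equivalent of a non-recursive directory listing (each `length()` and `lastModified()` call is a separate filesystem query per file, which is where the cost goes on large directories):

```java
import java.io.File;

public class DirSizes {
    // Prints name, size, and last-modified time for every entry in a directory.
    public static void main(String[] args) {
        File dir = new File(args[0]);
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File f : entries) {
            // each of these calls hits the filesystem again
            System.out.println(f.getName() + "\t" + f.length() + "\t" + f.lastModified());
        }
    }
}
```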

timo