While googling, I see that using java.io.File.length()
can be slow.
FileChannel
has a size()
method that is available as well.
Is there an efficient way in java to get the file size?
While googling, I see that using java.io.File.length()
can be slow.
FileChannel
has a size()
method that is available as well.
Is there an efficient way in java to get the file size?
Measure it first - is it really slow for you in your particular circumstances? No point going off on a wild goose chase if it's not causing a problem!
This is kinda an off the wall solution. If you can capture the output of a ls or dir command from a system call and then parse it out from there. That seems like it would be fairly fast. Write a function in a static library Called QuickFileSize and then share it with us. Again, just throwing the idea out there.
Well, I tried to messure it up with the code below:
For runs = 1 and iterations = 1 the URL method is fastest most times followed by channel. I runned this with some pause fresh about 10 times. So for one time access, using the URL is the fastest way I can think of:
LENGTH sum: 10626, per Iteration: 10626.0
CHANNEL sum: 5535, per Iteration: 5535.0
URL sum: 660, per Iteration: 660.0
For runs = 5 and iterations = 50 the picture draws different.
LENGTH sum: 39496, per Iteration: 157.984
CHANNEL sum: 74261, per Iteration: 297.044
URL sum: 95534, per Iteration: 382.136
File must be caching the calls to the filesystem, while channels and URL have some overhead.
Hope this helped out,
Greetz GHad
Code:
import java.io.*;
import java.net.*;
import java.util.*;
public enum FileSizeBench {
LENGTH {
@Override
public long getResult() throws Exception {
File me = new File(FileSizeBench.class.getResource(
"FileSizeBench.class").getFile());
return me.length();
}
},
CHANNEL {
@Override
public long getResult() throws Exception {
FileInputStream fis = null;
try {
File me = new File(FileSizeBench.class.getResource(
"FileSizeBench.class").getFile());
fis = new FileInputStream(me);
return fis.getChannel().size();
} finally {
fis.close();
}
}
},
URL {
@Override
public long getResult() throws Exception {
InputStream stream = null;
try {
URL url = FileSizeBench.class
.getResource("FileSizeBench.class");
stream = url.openStream();
return stream.available();
} finally {
stream.close();
}
}
};
public abstract long getResult() throws Exception;
public static void main(String[] args) throws Exception {
int runs = 5;
int iterations = 50;
EnumMap<FileSizeBench, Long> durations = new EnumMap<FileSizeBench, Long>(FileSizeBench.class);
for (int i = 0; i < runs; i++) {
for (FileSizeBench test : values()) {
if (!durations.containsKey(test)) {
durations.put(test, 0l);
}
long duration = testNow(test, iterations);
durations.put(test, durations.get(test) + duration);
// System.out.println(test + " took: " + duration + ", per iteration: " + ((double)duration / (double)iterations));
}
}
for (Map.Entry<FileSizeBench, Long> entry : durations.entrySet()) {
System.out.println();
System.out.println(entry.getKey() + " sum: " + entry.getValue() + ", per Iteration: " + ((double)entry.getValue() / (double)(runs * iterations)));
}
}
private static long testNow(FileSizeBench test, int iterations)
throws Exception {
long result = -1;
long before = System.nanoTime();
for (int i = 0; i < iterations; i++) {
if (result == -1) {
result = test.getResult();
//System.out.println(result);
} else if ((result = test.getResult()) != result) {
throw new Exception("variance detected!");
}
}
return (System.nanoTime() - before) / 1000;
}
}
When I modify your code to use a file accessed by an absolute path instead of a resource, I get a different result (for 1 run, 1 iteration, and a 100,000 byte file -- times for a 10 byte file are identical to 100,000 bytes)
LENGTH sum: 33, per Iteration: 33.0
CHANNEL sum: 3626, per Iteration: 3626.0
URL sum: 294, per Iteration: 294.0
The benchmark given by GHad measures lots of other stuff (such as reflection, instantiating objects, etc.) besides getting the length. If we try to get rid of these things then for one call I get the following times in microseconds:
file sum: 19.0, per Iteration: 19.0 raf sum: 16.0, per Iteration: 16.0 channel sum: 273.0, per Iteration: 273.0
For 100 runs and 10000 iterations I get:
file sum: 1767629.0, per Iteration: 1.7676290000000001 raf sum: 881284.0, per Iteration: 0.8812840000000001 channel sum: 414286.0, per Iteration: 0.414286
I did run the following modified code giving as an argument the name of a 100MB file.
import java.io.*;
import java.nio.channels.*;
import java.net.*;
import java.util.*;
public class FileSizeBench {
private static File file;
private static FileChannel channel;
private static RandomAccessFile raf;
public static void main(String[] args) throws Exception {
int runs = 1;
int iterations = 1;
file = new File(args[0]);
channel = new FileInputStream(args[0]).getChannel();
raf = new RandomAccessFile(args[0], "r");
HashMap<String, Double> times = new HashMap<String, Double>();
times.put("file", 0.0);
times.put("channel", 0.0);
times.put("raf", 0.0);
long start;
for (int i = 0; i < runs; ++i) {
long l = file.length();
start = System.nanoTime();
for (int j = 0; j < iterations; ++j)
if (l != file.length()) throw new Exception();
times.put("file", times.get("file") + System.nanoTime() - start);
start = System.nanoTime();
for (int j = 0; j < iterations; ++j)
if (l != channel.size()) throw new Exception();
times.put("channel", times.get("channel") + System.nanoTime() - start);
start = System.nanoTime();
for (int j = 0; j < iterations; ++j)
if (l != raf.length()) throw new Exception();
times.put("raf", times.get("raf") + System.nanoTime() - start);
}
for (Map.Entry<String, Double> entry : times.entrySet()) {
System.out.println(
entry.getKey() + " sum: " + 1e-3 * entry.getValue() +
", per Iteration: " + (1e-3 * entry.getValue() / runs / iterations));
}
}
}
In response to rgrig's benchmark, the time taken to open/close the FileChannel & RandomAccessFile instances also needs to be taken into account, as these classes will open a stream for reading the file.
After modifying the benchmark, I got these results for 1 iterations on a 85MB file:
file totalTime: 48000 (48 us)
raf totalTime: 261000 (261 us)
channel totalTime: 7020000 (7 ms)
For 10000 iterations on same file:
file totalTime: 80074000 (80 ms)
raf totalTime: 295417000 (295 ms)
channel totalTime: 368239000 (368 ms)
If all you need is the file size, file.length() is the fastest way to do it. If you plan to use the file for other purposes like reading/writing, then RAF seems to be a better bet. Just don't forget to close the file connection :-)
import java.io.File;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;
public class FileSizeBench
{
public static void main(String[] args) throws Exception
{
int iterations = 1;
String fileEntry = args[0];
Map<String, Long> times = new HashMap<String, Long>();
times.put("file", 0L);
times.put("channel", 0L);
times.put("raf", 0L);
long fileSize;
long start;
long end;
File f1;
FileChannel channel;
RandomAccessFile raf;
for (int i = 0; i < iterations; i++)
{
// file.length()
start = System.nanoTime();
f1 = new File(fileEntry);
fileSize = f1.length();
end = System.nanoTime();
times.put("file", times.get("file") + end - start);
// channel.size()
start = System.nanoTime();
channel = new FileInputStream(fileEntry).getChannel();
fileSize = channel.size();
channel.close();
end = System.nanoTime();
times.put("channel", times.get("channel") + end - start);
// raf.length()
start = System.nanoTime();
raf = new RandomAccessFile(fileEntry, "r");
fileSize = raf.length();
raf.close();
end = System.nanoTime();
times.put("raf", times.get("raf") + end - start);
}
for (Map.Entry<String, Long> entry : times.entrySet()) {
System.out.println(entry.getKey() + " totalTime: " + entry.getValue() + " (" + getTime(entry.getValue()) + ")");
}
}
public static String getTime(Long timeTaken)
{
if (timeTaken < 1000) {
return timeTaken + " ns";
} else if (timeTaken < (1000*1000)) {
return timeTaken/1000 + " us";
} else {
return timeTaken/(1000*1000) + " ms";
}
}
}
Actually, the 'ls' option is something I've considered alot. For a single file, it would be pointless. But if you have a directory with several hundred files and you need each of their sizes and last modified times, then 'ls' is more efficient. Make that 20 directories with several hundred files in varying sub-directories and a recursive 'ls' is amazingly efficient and any of the options mentioned above are unusable.