ansaurus

Question

How can I list a directory containing 2 million files in Java without getting an OutOfMemoryError?

Answer 1

+1 A:

At first, you could try to increase the memory available to your JVM by passing, e.g., -Xmx1024m.
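To confirm that a flag such as -Xmx1024m actually reached the JVM, a minimal sketch can print the maximum heap reported by Runtime.maxMemory(). The class name HeapInfo is only illustrative, and on some JVMs the reported figure is slightly below the -Xmx value.

```java
public class HeapInfo {
    /** Maximum heap the JVM will attempt to use, in megabytes (roughly the -Xmx setting). */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as "java -Xmx1024m HeapInfo" and again without the flag shows whether the setting took effect.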

InsertNickHere 2010-06-29 08:41:41

I have a feeling this won't fix the problem, and the JVM will just run out of memory *slightly* later.

Piskvor 2010-06-29 09:11:42

@Piskvor If so, I guess there is no way to solve this issue. Whatever you use to read the OS file system will need a certain amount of memory, and with 2 million files this can quickly become too much.

InsertNickHere 2010-06-29 09:36:18

@InsertNickHere: you don't need to keep all your data in RAM at the same time.
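A sketch of that idea, assuming Java 7 or later (which the asker says further down is not an option for them): java.nio.file.DirectoryStream hands out entries one at a time instead of building an array of all two million names. The class name StreamingListing and the counting are only illustrative; real processing would go where the comment is.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamingListing {
    /** Visits every entry in dir without collecting the entries into an array or list. */
    static long countEntries(Path dir) throws IOException {
        long count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                // process entry here instead of storing it
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countEntries(Paths.get(args[0])) + " entries");
    }
}
```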

Piskvor 2010-06-29 10:42:53

Answer 2

+3 A:

Why do you store 2 million files in the same directory anyway? I imagine it already slows down access terribly at the OS level.

I would definitely want to have them divided into subdirectories (e.g. by date/time of creation) before processing. But if that is not possible for some reason, could it be done during processing? E.g. move 1000 files queued for Process1 into Directory1, another 1000 files for Process2 into Directory2, etc. Then each process/thread sees only the (limited number of) files allotted to it.
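A rough sketch of the batching idea, assuming Java 7+ (java.nio.file) and that moving files out while a DirectoryStream is open is acceptable (its iterator is documented as weakly consistent). The batch size and the DirectoryN naming follow the example above; the class name BatchSplitter is hypothetical, and the sketch does not account for batch directories left over from an earlier run.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BatchSplitter {
    /**
     * Moves regular files from sourceDir into subdirectories Directory1, Directory2, ...
     * holding at most batchSize files each. Returns the number of files moved.
     */
    static int split(Path sourceDir, int batchSize) throws IOException {
        int moved = 0;
        Path currentBatch = null;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(sourceDir)) {
            for (Path entry : entries) {
                // Skip subdirectories, including the batch directories created below
                if (!Files.isRegularFile(entry)) {
                    continue;
                }
                // Start a new batch directory every batchSize files
                if (moved % batchSize == 0) {
                    currentBatch = sourceDir.resolve("Directory" + (moved / batchSize + 1));
                    Files.createDirectories(currentBatch);
                }
                Files.move(entry, currentBatch.resolve(entry.getFileName()));
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Moved " + split(Paths.get(args[0]), 1000) + " files");
    }
}
```

Each worker can then be pointed at its own DirectoryN, which stays small enough to list in one go.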

Péter Török 2010-06-29 08:45:28

Dividing them is a problem in its own right; I'm also looking into doing that with bash at the OS level. It is not possible to do it during processing, because the exception is thrown as soon as I try to list the directory programmatically.

Fgblanch 2010-06-29 08:50:13

Answer 3

A:

Please post the full stack trace of the OOM exception to identify where the bottleneck is, as well as a short, complete Java program showing the behaviour you see.

It is most likely because you collect all of the two million entries in memory, and they don't fit. Can you increase heap space?

Thorbjørn Ravn Andersen 2010-06-29 08:48:59

Answer 4

+8 A:

aioobe 2010-06-29 08:51:00

Java 7 is not an option right now. Currently I'm trying the filter option. Fortunately, the file names encode a hierarchy, so this option could work.

Fgblanch 2010-06-29 09:23:30

@aioobe: indeed, it didn't work. I've found that the file names are "guessable" :) so I'll do it the other way around: generate the file names, then go to the folder and try to access them. Thanks a lot for your help.
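The "generate the names, then probe" approach could look roughly like this. The naming pattern in candidateName (file-0000000.dat and so on) and the upper bound of 2,000,000 in main are purely hypothetical, since the thread never shows the real names; only java.io.File is used, so it stays within Java 6.

```java
import java.io.File;

public class GuessedNames {
    // Hypothetical naming scheme: replace with the real rule for the file names
    static String candidateName(int i) {
        return String.format("file-%07d.dat", i);
    }

    /** Probes candidate names 0..maxIndex-1 without ever listing the directory. */
    static int processExisting(File dir, int maxIndex) {
        int found = 0;
        for (int i = 0; i < maxIndex; i++) {
            File f = new File(dir, candidateName(i));
            if (f.isFile()) {
                // process f here
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(processExisting(new File(args[0]), 2000000) + " files found");
    }
}
```

Each probe is a separate file-system lookup, so this trades memory for many stat calls.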

Fgblanch 2010-06-29 09:58:28

Answer 5

+1 A:

Use File.list() instead of File.listFiles() - the String objects it returns consume less memory than the File objects, and (more importantly, depending on the location of the directory) they don't contain the full path name.

Then, construct File objects as needed when processing the result.
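A minimal sketch of this advice, using java.io.File only: the names come back as plain strings, and each File object is created just for the iteration that needs it. Summing file sizes in totalSize is a hypothetical stand-in for whatever processing is actually required, and the class name ListNames is illustrative.

```java
import java.io.File;

public class ListNames {
    /** Lists names only; File objects are created one at a time while processing. */
    static long totalSize(File dir) {
        String[] names = dir.list(); // short names, not full paths
        if (names == null) {
            throw new IllegalArgumentException(dir + " is not a readable directory");
        }
        long total = 0;
        for (String name : names) {
            File f = new File(dir, name); // becomes garbage after this iteration
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }
}
```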

However, this will not work for arbitrarily large directories either. It's an overall better idea to organize your files in a hierarchy of directories so that no single directory has more than a few thousand entries.

Michael Borgwardt 2010-06-29 08:53:07

Answer 6

A:

If file names follow certain rules, you can use File.list(filter) instead of File.listFiles() to get the listing in manageable portions.
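A minimal sketch with a FilenameFilter, assuming (hypothetically) that the names share a prefix, such as the hierarchy the asker says is encoded in them. One caveat: in the OpenJDK implementation, File.list(FilenameFilter) first builds the full array of names and only then applies the filter, so peak memory is not reduced; the filter only shrinks what is returned. That may explain why the asker reports above that this approach did not help.

```java
import java.io.File;
import java.io.FilenameFilter;

public class PrefixListing {
    /** Returns names in dir that start with prefix, or an empty array if dir cannot be listed. */
    static String[] listWithPrefix(File dir, final String prefix) {
        String[] names = dir.list(new FilenameFilter() {
            @Override
            public boolean accept(File parent, String name) {
                return name.startsWith(prefix);
            }
        });
        return names == null ? new String[0] : names;
    }
}
```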

atzz 2010-06-29 09:06:19

Answer 7

A:

Try this; it works for me, but I didn't have that many files...

import java.io.File;

public class ListDirectory {
    public static void main(String[] args) {
        File dir = new File("directory");
        String[] children = dir.list();
        if (children == null) {
            // Either dir does not exist or is not a directory
            System.out.println("Directory doesn't exist");
        } else {
            for (int i = 0; i < children.length; i++) {
                // Get the name of the file or directory
                String filename = children[i];
                System.out.println(filename);
            }
        }
    }
}

mujer esponja 2010-06-29 09:20:56

Answer 8

+1 A:

This is untested and an absolute hack, but you might want to try something like this anyway:

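import java.io.BufferedReader;
import java.io.InputStreamReader;

Process process = Runtime.getRuntime().exec(new String[]{"ls", "/path"});
BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
String line;
while (null != (line = reader.readLine())) {
    // ls writes one name per line when piped; handle each name as it arrives
}
reader.close();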

Jörn Horstmann 2010-06-29 09:59:55
