When I list the files of a directory that contains 300,000 files in Java, I get an out-of-memory error.

String[] fileNames = file.list();

What I want is a way to list all the files of a directory incrementally, no matter how many files that directory contains, without running into an "out of memory" problem under the default 64M heap limit.

I have Googled for a while and cannot find such a way in pure Java.
Please help me!!

Note, JNI is a possible solution, but I hate JNI.

+5  A: 

I know you said "with the default 64M heap limit", but let's look at the facts - you want to hold a (potentially) large number of items in memory, using the mechanisms made available to you by Java. So, unless there is some dire reason that you can't, I would say increasing the heap is the way to go.

Here is a link to the same discussion at JavaRanch: http://www.coderanch.com/t/381939/Java-General/java/iterate-over-files-directory

Edit, in response to comment: the reason I said he wants to hold a large number of items in memory is because this is the only mechanism Java provides for listing a directory without using the native interface or platform-specific mechanisms (and the OP said he wanted "pure Java").

danben
The call that James is making returns an array. The question boils down to whether you can somehow get the equivalent of an iterator for the names in the directory, without allocating the full array at once. It's a reasonable question; I don't know the answer off the top of my head.
Dan Breslau
You cannot with the core Java API.
danben
Yes, what I want is exactly a FileIterator
James
A: 

Having 300 000 files in a dir is not a good idea - AFAIK filesystems are not good at having that many sub-nodes in a single node. Interesting question, though.

EDIT: THE FOLLOWING DOES NOT HELP, see comments.

I think you could use a FileFilter, reject all files, and process them in the filter.

        new File("c:/").listFiles(new FileFilter() {
            @Override
            public boolean accept(File pathname) {
                // Handle the file here, then reject it so it is not collected.
                processFile(pathname);
                return false;
            }
        });
Ondra Žižka
XFS supports large numbers of files in a single directory. Also, this answer is pretty far off topic.
danben
Just checked the source for java.io.File. It will call list prior to filtering anyway so the original problem persists.
Gennadiy
Yes, I wish people would at least verify answers that "look right" before modding up. No offense intended to the poster.
danben
All right, who could know that the JDK programmers did it that silly way? Leaving the answer here to warn the others.
Ondra Žižka
I keep forgetting about the JDK's FileSystem abstraction. The actual list method that returns an array of String file names is native, so there is little hope of being able to retrieve a partial list of files in a dir.
Gennadiy
FileFilter does not work here, since listFiles(FileFilter filter) is implemented on top of list()
James
@danben - actually, while most file systems "support" huge numbers of files in a single directory, many of them do it in a way that results in expensive file lookup. Putting lots of files in one directory is NOT a good idea.
Stephen C
Sorry that my wording was slightly off. I will rephrase - XFS **efficiently** supports large numbers of files in a single directory. What a useless statement my initial comment would have been if I was simply saying that in a given file system, it is possible to put a large number of files into a directory.
danben
+1  A: 

You are a bit out of luck here. At the very least, 300k strings will need to be created. With an average length of 8-10 chars and 2 bytes per char, that's about 6 MB at the minimum. Add object pointer overhead per string (8 bytes) and you run into your memory limit.
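The arithmetic above can be written out as a small sketch. It uses only the figures quoted in this answer (300k names, 10 chars each, 2 bytes per char, 8 bytes of per-string overhead); the class and method names are made up for illustration. Treat the result as a lower bound: the real per-object overhead depends on the JVM and is likely higher.

```java
class ListingMemoryEstimate {
    /**
     * Lower-bound size in bytes of the names alone, using the figures
     * quoted in the answer. Real JVM object overhead is not modelled.
     */
    static long lowerBoundBytes(long files, long avgChars,
                                long bytesPerChar, long overheadPerString) {
        return files * (avgChars * bytesPerChar + overheadPerString);
    }

    public static void main(String[] args) {
        // 300,000 * (10 * 2 + 8) = 8,400,000 bytes
        long bytes = lowerBoundBytes(300_000, 10, 2, 8);
        System.out.println(bytes + " bytes");
    }
}
```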

If you absolutely must have that many files in a single dir, which I would not recommend as your file system will have problems, your best bet is to run a native process (not JNI) via Runtime.exec. Keep in mind that you will tie yourself down to the OS (ls vs dir). You will be able to get the list of files as one large string and will be responsible for post-processing it into what you want.
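A rough sketch of the native-process idea, assuming a Unix-like system with `ls` on the PATH. It uses ProcessBuilder, which starts a process the same way Runtime.exec does, and reads the output line by line rather than as one large string, so only one name is held in the JVM at a time. The class and method names are hypothetical, and file names containing newlines would be miscounted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

class LsStreamer {
    /** Streams the output of `ls -1 dir` and returns how many names were read. */
    static long countNames(String dir) throws IOException, InterruptedException {
        // Assumption: Unix-like OS; -1 asks ls for one name per line.
        Process process = new ProcessBuilder("ls", "-1", dir)
                .redirectErrorStream(true) // merge stderr so the child cannot block on it
                .start();
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String name;
            while ((name = reader.readLine()) != null) {
                // Process 'name' here instead of storing it.
                count++;
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("ls exited with status " + exitCode);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String dir = args.length > 0 ? args[0] : ".";
        System.out.println(countNames(dir) + " names in " + dir);
    }
}
```

Note that `ls` sorts its output by default, so the child process may still buffer every name itself; that memory is outside the Java heap, though.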

Hope this helps.

Gennadiy
+2  A: 

In JDK7 you may find that the [file walker][1] will work (I haven't tested it). For now, I think you are stuck doing something horribly platform-specific, such as running /bin/ls and streaming in the result.

[1]: http://download.java.net/jdk7/docs/api/java/nio/file/Files.html#walkFileTree(java.nio.file.Path, java.nio.file.FileVisitor)
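For reference, a minimal sketch of incremental listing with the java.nio.file API, assuming Java 7 or later. In the API as released, the directory-stream factory is `Files.newDirectoryStream`. The class and method names below are made up for illustration. The stream hands out entries through an iterator instead of building one array of all names, which is what keeps heap use small.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class IncrementalLister {
    /** Visits each entry of dir one at a time and returns how many were seen. */
    static long countEntries(Path dir) throws IOException {
        long count = 0;
        // The stream holds an open directory handle, so it must be closed.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                // Process entry.getFileName() here instead of collecting it.
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(countEntries(dir) + " entries in " + dir);
    }
}
```

The entries come back in no guaranteed order, and the iterator can only be obtained once per stream.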

Tom Hawtin - tackline
Thanks very much! The method java.nio.file.Path#newDirectoryStream is great!
James
There's some long overdue stuff in "More NIO Features", whenever JDK7 sees the light of day...
Tom Hawtin - tackline