I have some files in a directory tree that is served over HTTP. Given some sub-directory A in that tree, I want to download A together with all the subdirectories and files it contains.

It seems likely that a simple/direct/atomic solution exists in some dark corner of Java. Does anyone know how to do this?

A web crawler will not solve my problem, since files in sub-directories may link to directories that are not subdirectories of A.
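The crawler concern above comes down to a scope check: resolve each link against the page it came from and follow it only if the result still lies under A. Below is a minimal sketch of that check. The class name `SubtreeScope`, the method `isUnderRoot`, and the example URLs in the usage note are all made up for illustration, and the sketch assumes the root URI ends with a trailing slash.

```java
import java.net.URI;

final class SubtreeScope {
    /**
     * Resolves href against the page it was found on and reports whether the
     * result stays inside root. Assumes root ends with "/" so that a sibling
     * such as ".../AB/" is not mistaken for something inside ".../A/".
     * Throws IllegalArgumentException if href is not a valid URI reference.
     */
    static boolean isUnderRoot(URI root, URI page, String href) {
        // normalize() collapses "." and ".." segments before the comparison
        URI target = page.resolve(href).normalize();
        return target.toString().startsWith(root.normalize().toString());
    }
}
```

With a hypothetical root of `http://example.com/files/A/`, a link `sub/file.txt` is accepted, while `../B/` and links to other hosts are rejected.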

==Update==

The directories and files must be hosted in a static manner.

The server hosts the files statically in a directory tree; the client runs Java and attempts to copy some branch of that tree over HTTP.

VFS is the answer to this. Unfortunately, I answered the question myself and so can't choose it as the answer until two days from now. If someone would write up my answer, I would be happy to mark their write-up as the answer.

==Further Update==

VFS is in fact not the answer. VFS will not list directories over HTTP, as stated here. There do seem to be a few people who are interested in that functionality.

A: 

If I am not terribly mistaken, HTTP does not tell you anything about the "structure" of the server side - if such a thing even exists.

Think about REST where the URI does not really tell you where to find a file on the server, but could merely trigger some action, retrieve data or the like.

So I do not think what you are trying to achieve can be done reliably, be it with Java or any other language. Or maybe I am getting you wrong here?

Daniel Schneller
Most web servers will tell you the directory structure. If no index.html exists for a directory, they build a simple HTML page listing its files and subdirectories.
e5
+6  A: 

My first suggestion would be to create a servlet/JSP which recursively reads the directory structure (using java.io.File), reads all files, puts them in one zip (java.util.zip), and sends it to the browser for download.
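As a rough sketch of the zipping part only: the code below walks a directory and writes every regular file into one zip on a given OutputStream, which in a servlet could be the response stream. The class name `DirectoryZipper` and the method `zipTree` are made up here, and it uses java.nio.file rather than java.io.File for brevity. It assumes Java 8 or later and that the whole tree is readable.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class DirectoryZipper {
    /** Writes every regular file under root into one zip; closes out when done. */
    static void zipTree(Path root, OutputStream out) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            // Collect first so the directory stream is closed before zipping
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names use forward slashes regardless of platform
                String name = root.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }
}
```

Empty directories are not written, since only regular files become entries.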

Bozho
The files are too large to zip for every request. We wish to host the files in a static manner.
e5
Then you can make a .jsp that represents the directory structure and offers the files for download one by one, again using java.io.File recursively.
Bozho
You are still assuming that the server is running Java; these files are hosted completely statically.
e5
Well, the server IS running Java, because you said that in your question. It can be done with any server-side technology. You can read ANY directory on the server from a Java servlet. Just pass the path as a POST/GET parameter, or have / as the default, then call new File(path) and recurse.
Bozho
Where in the question did I say the server is running Java? The client is running Java, but not the server.
e5
Well, it was assumed. Anyway, in that case you can use apache-commons HttpClient and browse an Apache file listing. (I guess the server is at least using Apache?)
Bozho
@Bozho apache-commons doesn't imply that it must be run against Apache. For instance, IIS also supports this.
e5
Hah, yes, a coincidence here. Any HTTP server that provides a listing would do.
Bozho
+2  A: 

I don't know of an atomic solution, but the most straightforward one would be to use a URLConnection to fetch the sub-directory (assuming the server lists the directory), parse the response for the contents of that directory, and then use URLConnection again to fetch each of the files under it.
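A rough sketch of that approach, using only the JDK (Java 9+ for `readAllBytes`). Everything named here (`ListingMirror`, `extractHrefs`, `childName`, `mirror`) is made up for illustration. It assumes the server returns an HTML listing whose links are double-quoted href attributes, that directory links end in "/", and that the starting URI itself ends in "/". It does not decode HTML entities inside hrefs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ListingMirror {
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the raw href values found in the page, in document order. */
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) hrefs.add(m.group(1));
        return hrefs;
    }

    /** Returns the name of a direct child of dir ("x" or "x/"), or null for any other link. */
    static String childName(URI dir, String href) {
        URI rel;
        try {
            rel = dir.relativize(dir.resolve(href).normalize());
        } catch (IllegalArgumentException e) {
            return null; // href is not a valid URI reference
        }
        // relativize() hands back the absolute URI when the link is outside dir
        if (rel.isAbsolute() || rel.getQuery() != null) return null;
        String name = rel.getPath();
        if (name == null || name.isEmpty()) return null;
        String bare = name.endsWith("/") ? name.substring(0, name.length() - 1) : name;
        if (bare.isEmpty() || bare.contains("/") || bare.equals(".") || bare.equals("..")) return null;
        return name;
    }

    /** Recursively copies the listing at dir into the local directory target. */
    static void mirror(URI dir, Path target) throws IOException {
        Files.createDirectories(target);
        String html;
        try (InputStream in = open(dir)) {
            html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        for (String href : extractHrefs(html)) {
            String name = childName(dir, href);
            if (name == null) continue; // parent, sort, or external link
            URI child = dir.resolve(href).normalize();
            if (name.endsWith("/")) {
                mirror(child, target.resolve(name.substring(0, name.length() - 1)));
            } else {
                try (InputStream in = open(child)) {
                    Files.copy(in, target.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    private static InputStream open(URI uri) throws IOException {
        URLConnection conn = uri.toURL().openConnection();
        return conn.getInputStream();
    }
}
```

Only direct children of each listing are followed, so parent links, sort links with a query string, and links to other hosts are skipped. A call would look like `ListingMirror.mirror(URI.create("http://example.com/files/A/"), Paths.get("A"))`, where the URL is a placeholder.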

Based on these answers, now I am wondering if you meant the Java to be on the client side or server side!

Murali VP
+1, you are understanding my question correctly. Your answer is what I am attempting to avoid implementing, as I assume some library already exists to do this.
e5
Thanks for clarifying. I doubt that one exists, since it doesn't sound like a very common need, but of course I could be wrong.
Murali VP
+2  A: 

So, from the client side, you want to retrieve a list of all files and directories under a particular URL on the server, as if it were a folder on a local disk file system? That's usually not possible when the server doesn't have directory indexing enabled. And even then, you still need to parse the HTML page that represents the directory index and extract all the <a> elements representing the files and folders yourself. There's no normal java.io.File approach for this. That would have been a huge security hole: one would, for example, be able to download all source files from http://gmail.com. HTTP is not meant as a file transfer protocol. Use FTP. That's what it is for.

BalusC
+1 for pointing out Indexing has to be enabled. We shut them off here with very few exceptions.
Andy Gherna
Why would allowing a java.io.File approach create a security hole?
e5
Many people use http for serving files since many companies block both incoming and outgoing ftp connections. One would think that I am not the first person to encounter this.
e5
@e5: if it were possible, you could request everything from the web content, including secured files and the files in WEB-INF and so on. In any case, the **best** way to do this would be FTP, not HTTP.
BalusC
+1  A: 

For the first time in a while Google beat Stack Overflow: Apache Commons VFS does exactly what I need.

Commons VFS provides a single API for accessing various different file systems. It presents a uniform view of the files from various different sources, such as the files on local disk, on an HTTP server, or inside a Zip archive.

http://commons.apache.org/vfs/

==Update==

As stated in the question, VFS only pretends to solve this problem, since it doesn't allow the listing of HTTP directories.

e5
Well, in my view, the sequence should be this: 1. try Google for 10 minutes, 2. ask others for help. Vice versa is a little selfish :)
Bozho
@Bozho I did google for a while and didn't find anything, then I remembered that apache-commons is always the answer. I googled my previous queries with the word apache-commons appended and found VFS.
e5
+1  A: 

Assuming you have control over both the server and the client, I would write a page (in the technology of your choice: ASP, JSP, PHP, etc.) that reads the server directory structure and dynamically returns a page consisting of links to each file to be downloaded.

Then, on the client side, you can trigger a download of each link.

What is the client-side technology? Is the thing doing the downloading an application of some sort, or a web browser? Does it have to have a client interface?


If this is some sort of in-house utility program, maybe you can just FTP instead? Having FTP access open on a server and downloading a directory would be easy...


Adding another possible answer:

If the server does not have directory listings turned on, then you basically have to make a modification server side. The easiest thing would be to just make a page that returns the dir structure to the client in a known format (see my 1st answer above).

If you control the server and have directory listings on, and you are always using the same server program (IIS, Tomcat, JBoss, etc) then you might be able to just make the client webcrawl the directory listings. For example, in a directory listing from IIS, you can tell which links are directories and which are files because it always puts a '/' at the end of a directory link, and shows 'dir' instead of a file size:

 Friday, October 16, 2009 03:55 PM        &lt;dir&gt; <A href="Unity/">Unity</A>
 Thursday, July 02, 2009 10:42 AM           95 <A href="Global.asax">Global.asax</A>

You can tell here that the 1st link is a directory, and the 2nd is an actual file.

So if you are using a consistent server app, just take a look at how the directory listing is returned. Maybe you'll get lucky.
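To make the IIS example above concrete, here is a small sketch that classifies entries from listing lines shaped like the two shown. The class `IisListingParser` and its nested `Entry` type are made up for illustration, and the pattern is derived only from those two sample lines, so other IIS versions or configurations may need a different one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IisListingParser {
    static final class Entry {
        final String href;
        final boolean directory;
        final long size; // -1 when the listing shows a dir marker instead of a size

        Entry(String href, boolean directory, long size) {
            this.href = href;
            this.directory = directory;
            this.size = size;
        }
    }

    // Either the "&lt;dir&gt;" marker or a byte count, then the link itself
    private static final Pattern ROW = Pattern.compile(
            "(&lt;dir&gt;|\\d+)\\s+<A\\s+href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    static List<Entry> parse(String html) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            String sizeOrDir = m.group(1);
            String href = m.group(2);
            boolean hasSize = Character.isDigit(sizeOrDir.charAt(0));
            // Treat either signal as "directory": the dir marker or a trailing slash
            boolean directory = !hasSize || href.endsWith("/");
            long size = hasSize ? Long.parseLong(sizeOrDir) : -1;
            entries.add(new Entry(href, directory, size));
        }
        return entries;
    }
}
```

Fed the two lines above, `parse` yields one directory entry for `Unity/` and one file entry for `Global.asax` with size 95.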

rally25rs
FTP would solve the problem, but unfortunately many corporate firewalls block FTP.
e5
Is there something wrong with my 1st suggestion? Or are you just trying to avoid writing any code? Making an ASP or JSP or PHP page that returns the file system structure in a known format would probably take less time than it took to post this question and monitor the responses... What is the web server? Is it ALWAYS the same? Or are you trying to just connect to any random server that might be out there, whatever it is? (Are you in control of the server?)
rally25rs
Certainly if one changes the parameters of the question then the question becomes very simple, but the question assumes that no code runs on the server. I control the server in the sense that I can upload and download, but I'm attempting to do everything on the client. I was hoping that this problem had been solved generally.
e5
+1  A: 

Talk about low-hanging fruit ;-) Thanks for the offer, e5!

Commons VFS provides a single API for accessing various different file systems. It presents a uniform view of the files from various different sources, such as the files on local disk, on an HTTP server, or inside a Zip archive.

http://commons.apache.org/vfs/

Benjamin Cox