views:

3688

answers:

4

I have a number of CSV files that I want to download from Yahoo finance each day. I want my application to read the file's creation date (on my computer, not the server). If the creation date is prior to today then the new file should be downloaded (as it will have new data). If not then the new file should not be downloaded, and the correlation calculator (which is essentially what my application is), should use the last downloaded file for the particular stock code.

I have done some googling and have found the Apache POI project.

Is this the best way to go, or is there a better way? What would you recommend? Is JNI at all relevant here?

+4  A: 

I might be missing something but I can't see why you would need JNI or POI to download a file. If you are downloading the file with HTTP, you can use an HttpURLConnection with the "If-Modified-Since" request header.
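A minimal sketch of that idea, assuming the server honours conditional requests (a dynamically generated CSV may not). The class name `ConditionalDownload` and its method names are made up for illustration; `setIfModifiedSince`, `getResponseCode` and `HTTP_NOT_MODIFIED` are standard `java.net` API.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class ConditionalDownload {

    /** Builds (but does not send) a GET request carrying If-Modified-Since. */
    static HttpURLConnection prepare(URL url, File localCopy) throws IOException {
        // openConnection() only creates the connection object; nothing is sent yet.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (localCopy.exists()) {
            // Ask the server to send the body only if it changed after our local copy.
            conn.setIfModifiedSince(localCopy.lastModified());
        }
        return conn;
    }

    /** Returns true if a fresh copy was written, false if the server replied 304. */
    static boolean downloadIfNewer(URL url, File localCopy) throws IOException {
        HttpURLConnection conn = prepare(url, localCopy);
        try {
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return false; // keep using the existing file
            }
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, localCopy.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        } finally {
            conn.disconnect();
        }
    }
}
```

When no local copy exists, the header is simply not set and the file is downloaded unconditionally.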

Maurice Perry
Sorry, my explanation is probably not clear. I want to check whether the creation date, i.e. the date I downloaded the file, is prior to today; if it is, I want to redownload it (because it has stock prices, which will be updated each day). I am not really interested in the modification date of the file on the server.
Ankur
Is the If-Modified-Since technique applicable in that situation?
Ankur
I don't get it: unless you have invented a time machine, after you downloaded a file, the download date can only be prior to the current date.
Maurice Perry
This is also a very elegant, simple solution, Ankur. Do have a closer look at this. +1
Peter Perháč
+3  A: 

Did you consider creating an FTP account for access to that particular folder and then using an FTP client like SmartFTP or FileZilla to synchronize your local folder with the remote one? Should be well easy to set up and also convenient to use... Also, you could simply create an FTP command script and execute that from your Java code, if absolutely necessary...

Or I'll try to point you in another direction: md5() or other message-digest algorithms could help you. You wouldn't have to rely on timestamps. Calculate the md5() hash of the file you have and of the file you are about to download. Then you know whether to download or not.
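For the local half of that comparison, here is a rough sketch using the standard `java.security.MessageDigest` class. The class name `FileDigest` and the method `md5Hex` are hypothetical. Note that this only hashes bytes you already have; comparing against the remote file presumes the server publishes a checksum somewhere, which is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileDigest {

    /** Reads the stream to the end and returns its MD5 digest as lowercase hex. */
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            md.update(buffer, 0, read); // feed the digest chunk by chunk
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Convenience overload for a file on disk. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in);
        }
    }
}
```

Two files with the same digest can be treated as identical for this purpose; any message-digest algorithm supported by `MessageDigest` (for example "SHA-256") could be substituted for "MD5".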

Peter Perháč
Yes, taking the hash is an interesting idea. But to take the hash of the file stored on Yahoo's server, won't I have to download it first? Since this is running on my personal computer, I am only doing this to minimise downloads.
Ankur
Ooops, I must have missed the fact it's yahoo finance. Try having a look at this: http://www.gummy-stuff.org/Yahoo-data.htm there seems to be a code c6 Change (Real-time) that could be of help to you
Peter Perháč
The answer you left in the comment below, is probably the right way to go.
Ankur
"Use an FTP client" does not answer this programming question. Also, in order to have an md5 of the remote file, he needs to download it first, which defeats the purpose.
foljs
+1  A: 

JNI is definitely irrelevant, and so is Apache POI, unless the creation date is stored in the file itself (unlikely). Otherwise, it's external metadata and either accessible via the HTTP headers (possible using pure Java), or not accessible at all.

Michael Borgwardt
I am interested in the creation date of the file on my computer - to check whether I should redownload - how will HTTP headers help in that case?
Ankur
Also, I am using Windows, and I can check the creation date in Windows Explorer, so that must mean it is stored somewhere on my computer. Am I correct?
Ankur
Oh, you just want to create a File f = new File("myfile"); and then have a look at f.lastModified();
Peter Perháč
As long as Java doesn't do what Windows does and update the last modified time every time the file is read, then that should be it.
Ankur
+2  A: 

I have a number of CSV files that I want to download from Yahoo finance each day. I want my application to read the file's creation date (on my computer, not the server). If the creation date is prior to today then the new file should be downloaded (as it will have new data).

In order to detect changes to the local file, you need the file's last modification date, which is more generic than the creation date for this kind of check (since it also shows changes to the file after it has been created).

You can get that in Java by using the

public long lastModified()

method on a File object.
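As a rough sketch of the "older than today" check built on `lastModified()`: the class name `StaleFileCheck` and the method `needsDownload` are made up for illustration, and passing in the date and time zone explicitly is just one possible design.

```java
import java.io.File;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class StaleFileCheck {

    /**
     * Returns true if the file is missing or was last written on a calendar
     * day before 'today' in the given time zone.
     */
    static boolean needsDownload(File file, LocalDate today, ZoneId zone) {
        if (!file.exists()) {
            return true; // nothing cached yet
        }
        // lastModified() is milliseconds since the epoch; convert to a local date.
        LocalDate lastWritten = Instant.ofEpochMilli(file.lastModified())
                .atZone(zone)
                .toLocalDate();
        return lastWritten.isBefore(today);
    }
}
```

A typical call might look like `needsDownload(new File("myfile.csv"), LocalDate.now(zone), zone)` with `zone = ZoneId.systemDefault()`. Because the file is only written when it is downloaded, its last modification date doubles as the download date.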

Note that there is no method to get the creation date in the File API, probably because this information is not available in all filesystems.
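Separately from `java.io.File`, the `java.nio.file` package (Java 7 and later, so newer than the `File` API discussed here) does expose a creation timestamp through `BasicFileAttributes`. A small sketch, with the class name `CreationTime` invented for illustration; on filesystems that do not record a creation time, the returned value is implementation specific (typically the last-modified time or the epoch), so it should be treated with care.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

public class CreationTime {

    /** Reads the creation timestamp reported by the filesystem, if it keeps one. */
    static FileTime creationTime(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        return attrs.creationTime();
    }
}
```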

If you absolutely need to have a file creation date, then (if you create the files yourself or you can ask those who do) you could encode the creation date by convention in the file name, like this: myfile_2009_04_11.csv.

Then you will have to parse the file name and determine the creation date.
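One way the file-name convention could be handled, as a sketch only: the class `DatedFileName` and its methods are hypothetical, and the pattern assumes names shaped exactly like myfile_2009_04_11.csv.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DatedFileName {

    // Matches a trailing "_YYYY_MM_DD.csv" and captures the date part.
    private static final Pattern DATE_SUFFIX =
            Pattern.compile("_(\\d{4}_\\d{2}_\\d{2})\\.csv$");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu_MM_dd");

    /** Builds a name such as "myfile_2009_04_11.csv". */
    static String nameFor(String stockCode, LocalDate date) {
        return stockCode + "_" + date.format(FORMAT) + ".csv";
    }

    /** Extracts the encoded date, or returns empty if the name does not follow the convention. */
    static Optional<LocalDate> dateOf(String fileName) {
        Matcher m = DATE_SUFFIX.matcher(fileName);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(m.group(1), FORMAT));
        } catch (DateTimeParseException e) {
            return Optional.empty(); // digits present but not a real date
        }
    }
}
```

Comparing the extracted date with `LocalDate.now()` then tells you whether the cached file predates today, independently of any filesystem timestamp.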

I have done some googling and have found the Apache POI project. Is this the best way to go, or is there a better way? What would you recommend?

The Apache POI project is a library for reading and writing MS Office files (Excel files in this case). CSV is a simple textual format, so you don't need POI to read it.

Also, the information you need (creation date or last modification date) is available as metadata on the file itself, not in the file's data, so you don't need POI to get to it.

Is JNI at all relevant here?

Theoretically, you could use a custom JNI extension (a bridge to native code) to get the file's creation date on those filesystems that support it.

However, you're best off using the portable last modification date that's already in the Java SDK API and/or the "creation date encoded in the filename" convention.

Using JNI will make your program non-portable for no real added benefit.

foljs
Thanks, this helps a lot
Ankur