Hi.

I have the URL of a text file and I want my Java program to read that text file. But the plot thickens! The file is constantly being appended with new lines and I want to read these lines as they come in.

I think the right approach is to open a URLConnection to the URL of the file and somehow put that URLConnection under the 'supervision' of some sort of StreamReader or StreamBuffer type of object.

This is where my Java skills become questionable and I was wondering if anyone cares to donate an answer or two.

Thanks.

A: 

Get the InputStream from the URLConnection and wrap it in an InputStreamReader and then the ISR in a BufferedReader:

InputStream is = urlConnection.getInputStream();
InputStreamReader isr = new InputStreamReader(is, <encoding>);
BufferedReader br = new BufferedReader(isr);

You can now call br.readLine() repeatedly to read the resource line by line until it returns null (EOF). You do need to know the character encoding: either take it from the charset parameter of the HTTP response's Content-Type header or, if you already know it, specify it directly.
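As a minimal sketch of the wrapping described above, the following reads every line currently available at a URL and stops at EOF. The class name UrlLineReader and the method readLines are made up for illustration, and UTF-8 in main is only an assumption about the file's encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UrlLineReader {

    // Reads every line currently available at the URL, then returns at EOF.
    public static List<String> readLines(URL url, Charset charset) throws IOException {
        URLConnection connection = url.openConnection();
        List<String> lines = new ArrayList<>();
        // InputStream -> InputStreamReader (decodes bytes) -> BufferedReader (splits lines)
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java UrlLineReader <url>");
            return;
        }
        // UTF-8 is an assumption; use the encoding the server actually reports.
        for (String line : readLines(new URL(args[0]), StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

Note that this reads the resource once; it does not follow later appends.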

jarnbjo
This will read the file once, and that's it. As far as I understand, the OP is looking for the Java equivalent of "tail -f".
hatchetman82
@hatchetman82: You are probably right.
jarnbjo
A: 

To expose this as an InputStream, you would need to make multiple HTTP requests in a loop and keep track of how much you have read so far, so that the consumer of the stream sees each byte exactly once.

pseudocode:

int read_bytes = 0;
while (should_be_reading) {
  # make http request

  # read or scan to read_bytes
  # emit any new bytes
  # update read_bytes

}
Kevin
This is exactly what I've done - I was just wondering if this means that I am really downloading the entire file every time I make the http request.
Warlax
It would request the entire file each time. If the webserver supports HTTP/1.1 range headers, this could be made more efficient by telling the server to only send the content after a certain byte offset. http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
Kevin
A: 

If you keep track of the total number of bytes read, you can use an HTTP Range header to tell the server to start serving the file from a given position. This functionality is mainly used to resume downloads, but it should be applicable here.

I realize this doesn't give you an input stream, but I think it is a more robust solution.
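A rough sketch of the Range idea, assuming the file is served over HTTP: the class RangeTail and its methods are hypothetical names, and readAllBytes needs Java 9 or later. A 206 response means the server honoured the range; a 200 means it ignored the header and sent the whole file, so the bytes already seen are dropped locally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;

public class RangeTail {

    // Builds an open-ended byte range, e.g. "bytes=1024-" for offset 1024.
    public static String rangeFrom(long offset) {
        return "bytes=" + offset + "-";
    }

    // Picks the unseen bytes out of a response body, depending on the status code.
    public static byte[] newBytes(int status, byte[] body, long offset) {
        if (status == HttpURLConnection.HTTP_PARTIAL) {
            return body; // 206: the server sent only the requested range
        }
        if (status == HttpURLConnection.HTTP_OK) {
            // 200: Range was ignored and body is the whole file; drop what we already have.
            int from = (int) Math.min(offset, body.length);
            return Arrays.copyOfRange(body, from, body.length);
        }
        return new byte[0]; // e.g. 416: nothing beyond the offset yet
    }

    // Fetches whatever the server has beyond the given byte offset.
    public static byte[] fetchFrom(URL url, long offset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeFrom(offset));
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK
                && status != HttpURLConnection.HTTP_PARTIAL) {
            conn.disconnect();
            return new byte[0];
        }
        try (InputStream in = conn.getInputStream()) {
            return newBytes(status, in.readAllBytes(), offset);
        }
    }
}
```

The caller would add the length of each returned array to its running offset and call fetchFrom again after a pause.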

mikerobi
Is the Range header supported across all web servers and configurations? (I ask because I know that sometimes you can't resume downloads.) Maybe he'll still need to fall back to reading the whole file over and over for robustness.
hatchetman82
No, it isn't, but Warlax only needs it to be supported on one server. While range support isn't guaranteed, I think it is the norm; it is pretty rare these days to be unable to resume a download. The most likely scenario where you can't resume is when the content is being served by a dynamic application rather than from a static file.
mikerobi
A: 

Following the comments above, I think I've solved this problem. I am just not sure whether this means I am downloading the entire file every time:

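                long charsRead = 0;

                while (keepRunning)
                {
                    URL url = new URL(finalUrlString);
                    URLConnection connection = url.openConnection();
                    // try-with-resources closes the reader even if process() throws
                    try (BufferedReader reader = new BufferedReader(
                            new InputStreamReader(connection.getInputStream())))
                    {
                        // discard the characters handled on earlier passes
                        reader.skip(charsRead);
                        String line;
                        // drain every new line, not just the first one
                        while ((line = reader.readLine()) != null)
                        {
                            // +1 assumes a single '\n' terminator; "\r\n" files need +2
                            charsRead += line.length() + 1;
                            process(line);
                        }
                    }
                    Thread.sleep(1000); // pause between polls; the interval is arbitrary
                }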

This piece of code runs inside its own thread. I am using the process method to fill up a vector of objects generated by parsing each line.

A different piece of code, on a different thread, looks at this vector - reads the objects - and empties it.

Of course, this thread and the other one are synchronized around that vector instance.
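For comparison, a hedged sketch of the same hand-off using a BlockingQueue from java.util.concurrent instead of a manually synchronized vector. This is a swapped-in technique, not the code described above, and the class LineHandoff and its method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class LineHandoff {

    // Shared between the polling thread (producer) and the reading thread (consumer).
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Called by the polling thread for each new line; the queue is unbounded, so this does not block.
    public void publish(String line) {
        queue.add(line);
    }

    // Called by the consumer: waits for at least one line, then drains the rest.
    public List<String> takeAll() throws InterruptedException {
        List<String> batch = new ArrayList<>();
        batch.add(queue.take()); // blocks until a line is available
        queue.drainTo(batch);    // grabs whatever else has arrived, without blocking
        return batch;
    }
}
```

The queue does its own locking, so neither thread needs an explicit synchronized block around it.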

Warlax