views:

341

answers:

3

Reading the javadoc on FileDesciptor's .sync() method, it is apparent that sync() is primarily concerned with committing any modified buffers back to the underlying storage. I.e., making sure that anything that your program has output will actually make it to the disk (or socket or what-have-you, but my question pertains mainly to disks).

But what about the other direction, what about INPUT? Suppose my program has some parts of a java.io.RandomAccessFile buffered in memory, and I want to READ those parts of the file, but perhaps some other process has modified those parts of the file since the last time my program read those blocks?

This is akin to marking a variable as 'volatile' in a C program; something else may have changed the 'real version' of something you merely have a convenient copy of.

I.e., how can you be certain that what your java program reads is at least reasonably up-to-date?

(Clearly the definition of 'up to date' matters. Purely as an example, suppose that the other process, the one that writes to the file, does so on the order of maybe once per second, and suppose that the reading process reads maybe once per minute. In a situation like this, performance isn't a big deal, it's just a matter of making sure that what the reader reads is consistent with what the write writes, to within say, a second.)

A: 

At the moment where the file is read into the internal buffer, the contents are up-to-date to the contents on the disk.

If you want to be sure to have the latest contents on your next access, you also have to go to the disk again, skipping all internal buffers and caches. If you really want to be sure, that all such layers are skipped, you'll have to reopen the file from scratch and seek to the according position you want to access.

Of course, your performance will go down the tubes if you access the disk on every possible access of the data. Don't think of 3-5 fold or so but orders of magnitudes.

Kosi2801
+1  A: 

Before re-reading your file, it is usually a good idea to check the last modified timestamp of the file with File.lastModified(). If this timestamp is not newer than the last time you read the file, you don't need to bother with more disk I/O to re-read the blocks you are interested in. One thing to keep in mind though, is that the last modifed timestamp may not always be updated immediately when the contents are updated if you are using a network filesystem. If you are dealing with a local process updating the file and another local process running your code reading the file, you most likely won't run into this issue.

One method I've had success with in the past was to have a separate thread poll the file for the last modified timestamp on certain intervals, say 5 seconds. If the file changed, re-process the file and send an event to registered listeners. In my case, 5 seconds was more than soon enough to get updates.

Gary
A: 

If another program you control is the only one writing to the file, then it's probably best to have 2 threads in the same Java process coordinate. The easiest solution is to create a java.util.concurrrent.atomic.AtomicBoolean. The writer thread calls set(true) on the AtomicBoolean and the reader calls getAndSet(false). If getAndSet() returns true, then you know the reader needs to re-read the data. If it's an issue, you could synchronize on some object to prevent the writer from writing while the reader is reading.

You said "process" in the question, so maybe you are concerned about any other process on the system changing the data. In this case, I think you best bet is to just reopen and reread the data. The performance impact of this should be negligible if you really are only reading once per minute.

AngerClown