I have two independent threads F1 and F2 (to be precise, two instances of java.util.concurrent.FutureTask) that are running in parallel.

F1 does some processing and then writes the result to an XML file. It repeats these steps until it has nothing left to do (many XML files are created). F2 looks in the F1 output directory, takes one file, parses it, and runs some processing on it.

This works pretty well, except that F2 sometimes gets truncated XML data from the file, that is, an incomplete document in which some XML nodes are missing. The problem is not always reproducible, and the truncated files are not always the same ones. Because of that, I suspect that F2 is trying to read a file while F1 is still writing it to disk, which would explain why I only sometimes get this kind of error.

My question: is there some mechanism that locks the file F1 is currently writing (even for reading) until it has been completely written to disk, so that F2 cannot read it until it is unlocked? Any other way of solving my issue is welcome too!

F1 is writing the file this way:

try {
    file = new File("some-file.xml");
    FileUtils.writeStringToFile(file, xmlDataAsString);
} catch (IOException ioe) {
    LOGGER.error("Error occurred while storing the XML in a file.", ioe);
}

F2 is reading the file this way:

private File getNextFileToMap() {
    File path = getPath(); // Returns the directory where F1 stores the results...
    File[] files = path.listFiles(new FilenameFilter() {
        public boolean accept(File file, String name) {
            return name.toLowerCase().endsWith(".xml");
        }
    });
    // listFiles() returns null if the path is not a directory or an I/O error occurs
    if (files != null && files.length > 0) {
        return files[0];
    }
    return null;
}

// Somewhere in my main method of F2
...
f = getNextFileToMap();
Node xmlNode = null;
try {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(f);
    if (doc != null) {
        xmlNode = doc.getDocumentElement();
    }
} catch (Exception e) {
    LOGGER.error("Error while getting the XML from the file " + f.getAbsolutePath(), e);
}
+6  A: 

Since you're already filtering for .xml files in F2, have F1 output to a .temp file, then rename it to .xml as a final step. That way, F2 will ignore the file F1 is making until F1 is completely done with it.
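A minimal sketch of what F1's write step could look like with this approach, assuming Java 7 or later for java.nio.file (on older versions File.renameTo plays the same role). The class name TempThenRename and the baseName parameter are made up for illustration; they are not from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TempThenRename {
    // Writes the XML under a ".temp" name, then renames it to ".xml" once it is complete.
    // baseName is a hypothetical parameter, e.g. "result-1".
    static Path writeXml(Path dir, String baseName, String xml) throws IOException {
        Path temp = dir.resolve(baseName + ".temp");
        Path target = dir.resolve(baseName + ".xml");

        // F2's filter ignores this file because it does not end with ".xml"
        Files.write(temp, xml.getBytes(StandardCharsets.UTF_8));

        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the file system
        // cannot perform the rename atomically
        return Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The temp file is created in the same directory as the final file on purpose, since a rename across file systems is generally not atomic.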

Welbog
Bah, beaten by 6 seconds... I'll leave my answer for the moment due to the reference to FileLock though.
Jon Skeet
@Jon Skeet: I was going to edit to mention locks, too. We had the same ideas, just in a different order.
Welbog
Why was I searching for a complex solution when there is something as simple as that? Thanks for the idea!
romaintaz
+4  A: 

Have you looked at the java.nio.channels.FileLock API?

A simpler solution may well be to write to a different filename (e.g. foo.tmp) and then rename it (e.g. to foo.xml) when it's ready - the rename is atomic on most operating systems (within a directory), so when the other process sees the XML file it should be complete. This is likely to be a lot simpler than locking.
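For comparison, here is a rough sketch of the FileLock route, assuming Java 7 or later. The class and method names (LockedWriter, writeLocked, isLocked) are invented for this example. Two caveats from the FileLock documentation apply: locks are held on behalf of the whole JVM, so they are not meant for coordinating threads inside one JVM (a second attempt from the same JVM fails with OverlappingFileLockException instead of blocking), and whether a lock actually stops a non-cooperating reader is system-dependent.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LockedWriter {
    // Writes xml to target while holding an exclusive lock on the whole file.
    static void writeLocked(Path target, String xml) throws IOException {
        try (FileChannel ch = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) {
            ch.truncate(0); // discard any previous content
            ByteBuffer buf = ByteBuffer.wrap(xml.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        } // the lock is released first, then the channel is closed
    }

    // Returns true if a lock on the file is currently held, either by
    // another process or elsewhere in this same JVM.
    static boolean isLocked(Path target) throws IOException {
        try (FileChannel ch = FileChannel.open(target, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock();
            if (lock == null) {
                return true; // held by another process
            }
            lock.release();
            return false;
        } catch (OverlappingFileLockException e) {
            return true; // held by another channel in this JVM
        }
    }
}
```

A reader would have to call something like isLocked (or take the lock itself) before parsing, which is part of why the rename approach tends to be less work.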

Jon Skeet
+1  A: 

Use the synchronized keyword on a common object. A File object pointing to the file would be the best choice in this case:

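 import java.io.File;
 import java.util.HashMap;
 import java.util.Map;

 class MyFileFactory {
     private static final Map<String, File> files = new HashMap<String, File>();

     // synchronized so that both threads always get the same File instance for a given name
     public static synchronized File get(String name) {
         if (!files.containsKey(name)) {
             files.put(name, new File(name));
         }
         return files.get(name);
     }
 }

 // Another class
 File f = MyFileFactory.get("some-file.xml");
 synchronized (f) {
     // read or write here
 }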
soulmerge