Hi all,

I have to do batch processing to automate a business process. I need to poll a directory at a regular interval to detect new files and process them. While old files are being processed, new files can come in. For now, I use the Quartz scheduler and thread synchronization to ensure that only one thread can process files at a time.

Here is part of the code:

application-context.xml

<bean id="methodInvokingJob"
  class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean">
  <property name="targetObject" ref="documentProcessor" />
  <property name="targetMethod" value="processDocuments" />
</bean>

DocumentProcessor
.....

public void processDocuments() { 
  LOG.info(Thread.currentThread().getName() + " attempt to run.");
  if (!processing) {
     synchronized (this) {
        try {
           processing = true;
           LOG.info(Thread.currentThread().getName() + " is processing");
           List<String> xmlDocuments = documentManager.getFileNamesFromFolder(incomingFolderPath);
           // loop over the files and process the unlocked ones.
           for (String xmlDocument : xmlDocuments) {
              processDocument(xmlDocument);
           }
        }
        finally {
           processing = false;
        }
     }
  }
}
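One thing worth noting about the code above: `processing` is read before entering the `synchronized` block, so the check and the update are not one atomic step. Two triggers can both pass `if (!processing)`, and the second then waits on the lock and runs again afterwards; unless the field (whose declaration is elided) is `volatile`, a thread may also not see another thread's update. A minimal sketch of the same "skip this run if one is already in progress" idea using `AtomicBoolean.compareAndSet`; the constructor parameters are hypothetical stand-ins for `documentManager.getFileNamesFromFolder(...)` and `processDocument(...)`, and the boolean return value exists only to make the behavior observable.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch of a "skip if already running" guard for a scheduled job. */
class DocumentProcessor {
    private final AtomicBoolean processing = new AtomicBoolean(false);
    // Hypothetical stand-in for documentManager.getFileNamesFromFolder(incomingFolderPath)
    private final Supplier<List<String>> listFiles;
    // Hypothetical stand-in for processDocument(xmlDocument)
    private final Consumer<String> processDocument;

    DocumentProcessor(Supplier<List<String>> listFiles, Consumer<String> processDocument) {
        this.listFiles = listFiles;
        this.processDocument = processDocument;
    }

    /** Returns true if this call did the work, false if another run was already in progress. */
    public boolean processDocuments() {
        // Atomically flip false -> true; only one caller can win.
        if (!processing.compareAndSet(false, true)) {
            return false; // another trigger is still working, so skip this one
        }
        try {
            for (String xmlDocument : listFiles.get()) {
                processDocument.accept(xmlDocument);
            }
            return true;
        } finally {
            processing.set(false); // always release the flag, even if processing throws
        }
    }
}
```

If the goal is only to stop Quartz from overlapping runs, `MethodInvokingJobDetailFactoryBean` also has a `concurrent` property; setting it to `false` should keep a new run from starting while the previous one is still executing, rather than skipping it in code.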

With the current code, I have to prevent other threads from processing files while one thread is processing. Is that a good idea, or should we support multi-threaded processing? In that case, how can I know which files are being processed and which files have just arrived? Any ideas are really appreciated.

A: 

I'd do the following:

  • One thread that gets your filenames and adds them to a synchronized queue.

  • Multiple threads to do the actual reading: get an item from the synced queue and process it.
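This design maps fairly directly onto `java.util.concurrent`: a `BlockingQueue` plays the role of the synchronized queue, and a fixed thread pool provides the workers. A minimal sketch, assuming the file names are already known (in the real application the producer loop would be the polling thread) and using a hypothetical sentinel string to tell each worker to stop:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Producer/consumer sketch: one producer enqueues file names, N workers take and process them. */
class FilePipeline {
    // Hypothetical sentinel that tells a worker to exit; not a valid file name.
    private static final String POISON = "\u0000STOP";

    /** Runs `workers` consumer threads over the given names and returns the names they processed. */
    static Set<String> run(List<String> fileNames, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Set<String> processed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    // take() blocks until an item is available; each item goes to exactly one worker.
                    for (String name = queue.take(); !POISON.equals(name); name = queue.take()) {
                        processed.add(name); // stand-in for real document processing
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer side: in the real application this is the thread that lists the folder.
        for (String name : fileNames) {
            queue.put(name);
        }
        for (int i = 0; i < workers; i++) {
            queue.put(POISON); // one stop signal per worker
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processed;
    }
}
```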

To check if a file is used you can simply try to rename/move it.
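A sketch of the rename/move check using `Files.move` with `StandardCopyOption.ATOMIC_MOVE`, assuming a hypothetical `processing` folder on the same file system as the incoming folder. The worker whose move succeeds owns the file; a worker that finds the source already gone gets a `NoSuchFileException` and skips it. Whether a move fails while the writer still has the file open depends on the operating system and on how the writer opened it; on Linux, for example, renaming an open file succeeds, so there this mainly prevents two workers from claiming the same file. It also gives a simple answer to "which files are being processed": whatever is currently in the processing folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch of "claim a file by moving it": only one worker's move can succeed. */
class FileClaimer {
    /**
     * Tries to move `file` into `processingDir` (a hypothetical folder name).
     * Returns the new path if this caller won, or null if another worker already moved it.
     */
    static Path tryClaim(Path file, Path processingDir) throws IOException {
        Path target = processingDir.resolve(file.getFileName());
        try {
            // ATOMIC_MOVE performs a single rename; it throws
            // AtomicMoveNotSupportedException if the folders are on different file systems.
            return Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (NoSuchFileException e) {
            return null; // the source is gone: someone else claimed it first
        }
    }
}
```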

Carra
A:

I would build it with these parts:

  1. Castle Transactions with TxF
  2. FileSystemWatcher (Java version)
  3. TransactionScope (no Java version unless you hack it a lot)
  4. A lock-free queue (there is a paper discussing the performance of Java vs. .NET; you might be able to get the Java source from its authors), or Java's lock-based queues

Such that:

When a new file arrives, the file system watcher detects it (remember to set the correct flags, handle the error condition, set EnableRaisingEvents to true, and watch out for duplicate events) and puts the file path in the queue.
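In Java 7 and later (which may postdate this question), the standard-library counterpart of .NET's `FileSystemWatcher` is `java.nio.file.WatchService`. A minimal sketch of one polling step, assuming the folder was registered for `ENTRY_CREATE` and using a hypothetical `seen` set to drop duplicate notifications. An `OVERFLOW` event means events were lost, so a real implementation would rescan the folder at that point.

```java
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Sketch: watch a folder for new files and enqueue each path once. */
class FolderWatcher {
    /**
     * Waits up to timeoutSeconds for events in `dir` and puts each newly created path
     * on `queue`, skipping paths already in `seen` (duplicate notifications).
     * Returns false once the watch key is no longer valid (e.g. the folder was deleted).
     */
    static boolean pollOnce(WatchService watcher, Path dir, BlockingQueue<Path> queue,
                            Set<Path> seen, long timeoutSeconds) throws InterruptedException {
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) {
            return true; // timed out; nothing new this round
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events were lost; a real application should rescan the folder here
            }
            Path created = dir.resolve((Path) event.context()); // context is relative to dir
            if (seen.add(created)) {
                queue.put(created); // hand the path to the worker threads
            }
        }
        return key.reset(); // without reset() the key stops delivering events
    }
}
```

Register once with `dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE)` and call `pollOnce` in a loop until it returns false. `ENTRY_CREATE` can fire before the writer has finished, so a dequeued file may still be incomplete, and the `seen` set as written grows without bound.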

You have one application thread and n worker threads. If this is the only application on the machine, the workers spin-wait on the queue with TryDequeue; otherwise they block on a monitor (has_items) until items are available.
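In Java terms, the two waiting strategies look roughly like this: `ConcurrentLinkedQueue` is a non-blocking (lock-free) queue, so a worker has to spin on `poll()`, while a `BlockingQueue` lets the worker sleep inside `take()`. A minimal sketch; `Thread.onSpinWait()` requires Java 9 or later, and returning `null` on interrupt is an assumption made here so a spinning worker can be shut down.

```java
import java.util.Queue;
import java.util.concurrent.BlockingQueue;

/** Two ways a worker can wait for work: spin on a lock-free queue, or block on a lock-based one. */
class WorkerWaits {
    /** Spin-waits: lowest latency, but keeps a CPU core busy while the queue is empty. */
    static String spinTake(Queue<String> lockFree) {
        String item;
        while ((item = lockFree.poll()) == null) {
            if (Thread.currentThread().isInterrupted()) {
                return null; // let the caller shut the worker down
            }
            Thread.onSpinWait(); // hint to the runtime that this is a busy-wait loop
        }
        return item;
    }

    /** Blocks: the thread sleeps until an item arrives, using no CPU while idle. */
    static String blockingTake(BlockingQueue<String> blocking) throws InterruptedException {
        return blocking.take();
    }
}
```

Spinning only pays off when the workers effectively have cores to themselves, which is the "only app" case; otherwise blocking avoids burning CPU while the queue is empty.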

When a worker thread gets a path through the dequeue operation, it starts working on it, and no other thread can work on that path. If duplicate output is possible (depending on your setup), you can use a file transaction when writing the output file. If the Commit operation fails, you know another thread has already written the output file, and the worker goes back to polling the queue.
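Java has no built-in equivalent of TxF. A rough substitute for "the commit fails if another thread already wrote the output" is to open the output file with `StandardOpenOption.CREATE_NEW`, which checks for existence and creates the file in one atomic step and typically fails with `FileAlreadyExistsException` if the file is already there. This is a swapped-in technique, not a transaction: a crash halfway through the write leaves a partial file behind. A minimal sketch:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: only the first worker to create the output file gets to write it. */
class OutputWriter {
    /**
     * Writes `content` to `output` only if the file does not exist yet.
     * Returns true if this caller wrote it, false if another worker got there first.
     */
    static boolean writeOnce(Path output, String content) throws IOException {
        // CREATE_NEW: create the file, or fail if it already exists (atomic check-and-create).
        try (OutputStream out = Files.newOutputStream(output,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // output already produced elsewhere; go back to polling the queue
        }
    }
}
```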

Henrik
Thanks for the info, Henrik. All the libraries you mentioned are completely new to me, but I definitely need to ensure transactional behavior, so I will look into TxF. For now, I intend to use Spring Batch + Spring Integration. I have only read through the introduction and some demos, but they convinced me that it's the right solution to pursue.
Hieu Lam