I have this scenario. The OS is UNIX. There are a bunch of files on server A. They need to be FTPed to server B, parsed (they are in XML format), and the values retrieved from specific tags stored in a DB. The current Perl implementation of the parser processes the files sequentially. Can Java's multi-threading be used here so that the fetching can be made faster?
The fetching and the processing are distinct steps.
For the fetching step, fetching more than one file at a time gives you increased speed only if the files are small, i.e., when per-file connection overhead dominates the transfer time. Otherwise you are limited by the bandwidth of the link between the two machines, which a single transfer can already saturate.
For the processing step: if the files are independent of one another, yes, you will see a speed-up by processing several in parallel (unless the server is itself an older single-core, non-hyperthreaded machine). A minimal sketch of that follows.
Neither of these changes requires switching to Java; that is a separate concern.
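For illustration, here is a minimal Java sketch of the processing step alone, with a fixed-size pool so that independent files are parsed in parallel. `parseAndStore` is a hypothetical stand-in for whatever tag extraction and DB insert logic you already have in Perl:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelParse {
    // Hypothetical stand-in for the existing tag extraction + DB insert logic.
    static void parseAndStore(File xmlFile) {
        // ... parse the specific tags, write the values to the DB ...
    }

    public static void main(String[] args) throws InterruptedException {
        List<File> files = Arrays.asList(new File("a.xml"), new File("b.xml")); // already fetched
        // One worker per core: parsing is CPU-bound once the files are local.
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (File f : files) {
            pool.submit(() -> parseAndStore(f));
        }
        pool.shutdown();                          // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for the work to finish
    }
}
```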
Yes, it will increase performance significantly. You have two I/O operations (the FTP transfer and the database access), so some threads can use processor cores for parsing while others are blocked waiting on I/O.
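As a sketch of that overlap (the helper names are illustrative, not a specific library API): a downloader thread blocks on the network while a parser thread keeps a core busy, with files handed over through a queue:

```java
import java.io.File;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DownloadParseOverlap {
    // Poison pill signalling the parser that no more files are coming.
    private static final File DONE = new File("");

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<File> queue = new ArrayBlockingQueue<>(16);

        Thread downloader = new Thread(() -> {
            try {
                for (String name : new String[] {"a.xml", "b.xml"}) {
                    File local = fetchViaFtp(name); // blocks on the network
                    queue.put(local);               // hand the file to the parser
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread parser = new Thread(() -> {
            try {
                for (File f = queue.take(); f != DONE; f = queue.take()) {
                    parseAndStore(f); // uses the CPU while the downloader waits on I/O
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        downloader.start();
        parser.start();
        downloader.join();
        parser.join();
    }

    // Hypothetical helpers; a concrete FTP download appears further down.
    static File fetchViaFtp(String name) { return new File(name); }
    static void parseAndStore(File f) { }
}
```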
I had to do almost the same task in Java. The High Level Concurrency Objects helped me a lot; in particular, I used a ThreadPoolExecutor obtained through Executors.newFixedThreadPool(concurrentCount).
Each file was downloaded over its own FTP connection. The advantage is that the connection setup and file-request round trips overlap with the transfers: while one thread waits on that handshaking, another thread is already using your bandwidth.
For FTP-related tasks I used org.apache.commons.net.ftp.FTPClient.
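A minimal download with commons-net looks roughly like this (host, credentials, and paths are placeholders):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

public class FtpFetch {
    public static void fetch(String remotePath, String localPath) throws IOException {
        FTPClient ftp = new FTPClient();
        try {
            ftp.connect("serverA.example.com");    // placeholder host
            ftp.login("user", "password");         // placeholder credentials
            ftp.enterLocalPassiveMode();           // friendlier to firewalls
            ftp.setFileType(FTP.BINARY_FILE_TYPE); // don't mangle the XML
            try (OutputStream out = new FileOutputStream(localPath)) {
                if (!ftp.retrieveFile(remotePath, out)) {
                    throw new IOException("Transfer failed: " + remotePath);
                }
            }
            ftp.logout();
        } finally {
            if (ftp.isConnected()) {
                ftp.disconnect();
            }
        }
    }
}
```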
Edit: you can start the XML processing as soon as each download completes by using the Future.get() method.
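Putting it together, a rough sketch of the download pool (paths and pool size are illustrative, and FtpFetch.fetch is the helper sketched above): each task returns its local file through a Future, and get() blocks only until that particular download is done, so parsing begins as each file arrives while the other downloads continue:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FtpPipeline {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        int concurrentCount = 4; // tune to your bandwidth and the server's connection limit
        ExecutorService pool = Executors.newFixedThreadPool(concurrentCount);

        List<Future<File>> downloads = new ArrayList<>();
        for (String name : new String[] {"a.xml", "b.xml", "c.xml"}) {
            // Each task opens its own FTP connection, as described above.
            downloads.add(pool.submit(() -> {
                String local = "/tmp/" + name;
                FtpFetch.fetch("/outgoing/" + name, local); // placeholder paths
                return new File(local);
            }));
        }

        for (Future<File> download : downloads) {
            File xml = download.get(); // blocks until that download completes
            parseAndStore(xml);        // parse while the remaining downloads continue
        }
        pool.shutdown();
    }

    static void parseAndStore(File xml) { /* tag extraction + DB insert */ }
}
```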