views:

96

answers:

2

Hi Folks,

this is some kind of long post, so thanks in advance for reading. My app is supposed to process a lot of sound files, let's say 4000+. My first approach was to load a certain amount (let's say 200 MB) of sound data, process it, write it, and then "null" the data to let the GC free it. But given that the data is loaded over the intranet, this does not seem to be the best way (file access is slow): calculations should start as soon as the first file is loaded. To achieve this, I changed the concept to a sort of "producer/consumer" (I think). Here are my classes so far:

Reader/Producer

import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ReaderThread extends Thread {

    List<Long> files;
    ConcurrentLinkedQueue<Long> loaded = new ConcurrentLinkedQueue<Long>();
    // volatile, so that the flag set by run() is visible to callers of getNextId()
    volatile boolean finished = false;

    public ReaderThread(List<Long> soundFiles) {
        this.files = soundFiles;
    }

    @Override
    public void run() {
        for (Long id : files) {
            if (FileLoader.load(id)) {
                loaded.add(id);
            }
        }
        finished = true;
    }

    public Long getNextId() {
        // Busy-waits until an id is available or the reader is done.
        while (loaded.isEmpty()) {
            if (finished) {
                return null;
            }
        }
        return loaded.poll();
    }
}

This is the writer/(Not consumer)

import java.util.concurrent.ConcurrentLinkedQueue;

public class WriterThread extends Thread {

    ConcurrentLinkedQueue<Long> loaded = new ConcurrentLinkedQueue<Long>();
    String directory;
    // volatile, so that the flag set by halt() is visible inside run()
    volatile boolean abort = false;

    public WriterThread(String directory) {
        this.directory = directory;
    }

    @Override
    public void run() {
        // Busy-waits; after halt() it drains the remaining ids before exiting.
        while (!(abort && loaded.isEmpty())) {
            Long id = loaded.poll();
            if (id != null) {
                FileWriter.write(id, directory);
                FileManager.unload(id);
            }
        }
    }

    public void submit(Long id) {
        loaded.add(id);
    }

    public void halt() {
        abort = true;
    }
}

This is the part where all things get together:

// Forgive me the "t" and "w". ;-)
t = new ReaderThread(soundSystem, soundfilesToProcess);
w = new WriterThread(soundSystem, outputDirectory);

t.start();
w.start();

long start = System.currentTimeMillis();
while (!abort) {
    Long id = t.getNextId();
    if (id != null) {
        SoundFile soundFile = soundSystem.getSoundfile(id);
        ProcessorChain pc = new ProcessorChain(soundFile, getProcessingChain(), w);
        Future<List<ProcessorResult>> result = es.submit(pc);
        results.add(result);
    } else {
        break;
    }
}

for (Future<List<ProcessorResult>> result : results) {
    try {
        List<ProcessorResult> tempResults = result.get();
        processResults(tempResults);
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
}

w.halt();

"ProcessorChain" is a runnable. es.submit -> "es" is a CachedThreadPool.

What I need to know in the first place is whether this approach is "good", or if it is more like "nonsense". It seems to work quite well, but I have a little problem with the writer thread: in some cases not all files get written. The writer thread's submit method is called by the ProcessorChain when it has finished its work. The second thing is thread safety. Did I miss something?

+3  A: 

I believe it will be (a lot) simpler if each thread reads, processes and then writes a whole sound file (one thread per file).

You can use Java thread pools and let the operating system/Java VM parallelize the read/process/write across multiple files to gain efficiency. I may be wrong, but from what you described, a simpler solution would be enough, and then you can measure your bottleneck if further improvements are needed.
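A minimal sketch of this "one task per file" idea, assuming placeholder methods (`readFile`, `process`, `writeFile`) in place of the asker's actual `FileLoader`/`FileWriter` API:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PerFilePipeline {

    static final AtomicInteger written = new AtomicInteger();

    // Stand-ins for the real intranet read, DSP processing, and disk write.
    static byte[] readFile(long id) { return new byte[]{(byte) id}; }
    static byte[] process(byte[] data) { return data; }
    static void writeFile(long id, byte[] data) { written.incrementAndGet(); }

    public static void main(String[] args) throws Exception {
        List<Long> ids = List.of(1L, 2L, 3L, 4L);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (long id : ids) {
            // Each task handles one file end-to-end: read, process, write.
            pool.submit(() -> writeFile(id, process(readFile(id))));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("written=" + written.get());
    }
}
```

With a bounded pool, several files are in flight at once, so one task's slow intranet read overlaps with another task's processing or writing; no hand-rolled reader/writer threads are needed.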

Hbas
@Hbas A CachedThreadPool is what I use right now. You're correct that it would be simpler, but I thought performing reading/computing/writing in parallel would be the better way.
InsertNickHere
@InsertNickHere The reading/computing/writing will happen in parallel... In practice, not every thread will be doing the same thing at the same time with the different files.
Hbas
@Hbas What I meant is that, as a general idea, your answer suggests parallel processing of the whole "read/compute/write" sequence. But I want the read, compute, and write stages themselves to run in parallel, not be stuck together. Anyway, you are right that things will not happen at the same time.
InsertNickHere
@Hbas I have changed the code to follow your idea and it seems to give me good performance.
InsertNickHere
+1  A: 

I think the approach is ok in general (one thread for reading input, one for writing output and one or more for processing).

A couple of suggestions:

1 - you probably want to use semaphores instead of having your threads spinning in a constant loop. For example, with a semaphore your write thread would just block until a file was actually available to write. Currently it will spin, potentially wasting 1/3 of your cpu cycles when there's nothing to write.

2 - you probably want to explicitly create worker threads instead of doing the work on the main thread. That way you can have multiple threads doing processing at the same time. That may already be what ProcessorChain is doing, but it's not clear from the snippet.
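Suggestion 1 can also be done with a `BlockingQueue`, which blocks the writer instead of spinning; a sketch with a "poison pill" id to signal shutdown (the `writes` counter stands in for the real `FileWriter.write` call):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockingWriter {

    static final long POISON = -1L;   // sentinel id meaning "no more work"
    static final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
    static int writes = 0;

    public static void main(String[] args) throws Exception {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    long id = queue.take();   // blocks; no CPU burned while idle
                    if (id == POISON) {
                        return;               // clean shutdown, queue already drained
                    }
                    writes++;                 // stand-in for FileWriter.write(id, dir)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        for (long id = 1; id <= 3; id++) {
            queue.put(id);                    // producers just enqueue ids
        }
        queue.put(POISON);                    // replaces the abort flag
        writer.join();
        System.out.println("writes=" + writes);
    }
}
```

Because `take()` blocks until an element arrives, this removes both the busy-wait and the separate `abort` flag; the poison pill guarantees the queue is fully drained before the writer exits.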

patros
@patros Thank you for your answer, but I have taken the other approach. It is really less work.
InsertNickHere