views: 34
answers: 2
Hi,

I have a compressed file on disk that is partitioned into blocks. I read a block from disk, decompress it into memory, and then read the data.

Is it possible to create a producer/consumer setup: one thread that retrieves compressed blocks from disk and puts them in a queue, and another thread that decompresses the blocks and reads the data?

Will the performance be better?

Thanks!

A: 

Yes, it's possible to set it up that way. Whether you would see a performance improvement is wildly dependent on the machine, the exact nature of what you're doing with the decompressed data, etc. If it's not too much trouble, and your dataset is substantial, I'd suggest doing it and measuring to see if it's faster. If nothing else, it's similar to the work you'd need to do to leverage some sort of map-reduce framework.
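A minimal sketch of that setup, using Python's standard `threading` and `queue` modules (the block framing here — a 4-byte big-endian length prefix before each zlib-compressed block — is an assumption for illustration; your on-disk format will differ):

```python
import queue
import threading
import zlib

# Bounded queue: the reader thread can't run arbitrarily far ahead of
# the decompressor, which caps memory use.
BLOCK_QUEUE = queue.Queue(maxsize=8)
SENTINEL = None  # end-of-stream marker


def producer(path):
    """Read length-prefixed compressed blocks from disk and enqueue them."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # end of file
            size = int.from_bytes(header, "big")
            BLOCK_QUEUE.put(f.read(size))
    BLOCK_QUEUE.put(SENTINEL)  # tell the consumer we're done


def consumer(handle_data):
    """Dequeue blocks, decompress them, and hand the data to a callback."""
    while True:
        block = BLOCK_QUEUE.get()
        if block is SENTINEL:
            break
        handle_data(zlib.decompress(block))
```

Note that in CPython, `zlib.decompress` releases the GIL, so disk reads and decompression can genuinely overlap even with plain threads.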

Hank Gay
Map-reduce is for computer clusters. In my case, I have only one machine. How can I use it? Thanks
Database Designer
Although Map/Reduce is popular because it allows for easy horizontal scaling using a cluster, it's perfectly possible to use it in single-node configurations. Check out this [article on single-node Hadoop](http://hadoop.apache.org/common/docs/current/quickstart.html).
Hank Gay
+1  A: 

I suspect that the thread that decompresses the data would spend most of its time waiting for the thread that reads the compressed blocks from disk.

I'd be surprised if the CPU-bound decompression took longer than the IO-bound reading of the blocks from disk.
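Rather than guessing which stage dominates, it's easy to measure. A hedged sketch (assuming, for illustration, that the file is a single zlib stream read in fixed-size chunks; `time_stages` is a hypothetical helper, not part of any library):

```python
import time
import zlib


def time_stages(path, chunk_size=1 << 20):
    """Roughly compare wall time spent reading vs. decompressing.

    Returns (read_seconds, decompress_seconds). Assumes `path` holds a
    single zlib stream; a streaming decompressor handles chunks that
    split the stream at arbitrary byte boundaries.
    """
    read_t = dec_t = 0.0
    d = zlib.decompressobj()
    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(chunk_size)
            read_t += time.perf_counter() - t0
            if not chunk:
                break
            t0 = time.perf_counter()
            d.decompress(chunk)
            dec_t += time.perf_counter() - t0
    return read_t, dec_t
```

If `decompress_seconds` clearly exceeds `read_seconds`, the pipeline is CPU-bound and a second thread stands to help; if reading dominates, the decompressor will mostly sit idle.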

Gilbert Le Blanc
This all depends on the disks and the compression you use; e.g., decompressing gzipped files is decidedly CPU-bound on our servers.
nos
That is the point. If I use heavy compression, the decompression thread probably won't be waiting on the I/O thread, and then I will see a performance gain.
Database Designer