views: 405
answers: 6

Hi!

I already googled for in-memory compression and found quite a few libraries that offer this functionality. zlib seems to be widely used - but it also seems to be quite old. I'm asking here whether there are newer, better alternatives.

The data I want to compress in-memory are memory pools of a few megabytes (2-16 MB) each. Every block contains data of two different structs as well as some arrays of pointers. Inside the blocks there's no particular order to the structs and the arrays; they are simply allocated one after another as the application creates such elements.

What compression lib would you suggest for this? Compression and decompression performance (both) are more important than compression ratio.

Also - for compression reasons - would it be better to have separate pools for the two different structs and for the arrays, so that each block to be compressed contains only one kind of data?

This is the first time I intend to use in-memory compression, and I know my question may be too general to give a good answer - but every hint is welcome!

Thanks!

+5  A: 

zlib is good. Proven, performant, and understood by many. It's what I'd use by default in a new system like the one you describe. Its age should be seen as one of its greatest assets.

John Zwinck
A: 

I'm not aware of anything newer or better than zlib... it works fine, despite its age. zlib's deflateInit() has an argument that lets you trade off compression speed against compressed size, so you can experiment with that to find the setting that works best for your application.
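
For illustration, here is a minimal sketch using zlib's one-shot compress2() call, which accepts the same level parameter (Z_BEST_SPEED up to Z_BEST_COMPRESSION) as the deflateInit()/deflate() streaming API; the helper name compress_block is made up for the example:

    #include <vector>
    #include <zlib.h>

    // Compress one memory block. The level trades speed against size:
    // Z_BEST_SPEED (1) is fastest, Z_BEST_COMPRESSION (9) is smallest.
    std::vector<unsigned char> compress_block(const unsigned char* data,
                                              uLong size,
                                              int level = Z_BEST_SPEED)
    {
        uLongf out_len = compressBound(size);   // worst-case output size
        std::vector<unsigned char> out(out_len);
        if (compress2(out.data(), &out_len, data, size, level) != Z_OK)
            out.clear();                        // empty vector == failure
        else
            out.resize(out_len);                // shrink to actual size
        return out;
    }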

There are probably C++ wrapper APIs that call the zlib C API for you, if you want something "prettier"... or if there aren't, it's easy enough to write your own.

Jeremy Friesner
In some (many?) applications, the compression-level knob of zlib is not all that useful: it can make compression take quite a bit longer, yet may not reduce the output size as much as simply switching to a different algorithm (like bzip2, which compresses more tightly than zlib at its maximum setting, though at a large cost in speed). Still, it's good to point it out.
John Zwinck
+3  A: 

For something more modern than zlib, libbzip2 might be worth a look. It provides an interface similar to zlib's, for compatibility. In many cases it offers better compression, but at a performance cost.
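
For reference, a minimal sketch of libbzip2's one-shot BZ2_bzBuffToBuffCompress() call; the helper name and the parameter choices are just for illustration:

    #include <vector>
    #include <bzlib.h>

    // Compress one memory block with bzip2. blockSize100k runs from
    // 1 (fastest, least compression) to 9 (slowest, best compression);
    // worst-case output per the bzip2 docs is input + 1% + 600 bytes.
    std::vector<char> bzip2_compress_block(const char* data, unsigned int size)
    {
        unsigned int out_len = size + size / 100 + 600;
        std::vector<char> out(out_len);
        int rc = BZ2_bzBuffToBuffCompress(out.data(), &out_len,
                                          const_cast<char*>(data), size,
                                          /*blockSize100k=*/1,
                                          /*verbosity=*/0,
                                          /*workFactor=*/0);  // 0 == default
        if (rc != BZ_OK)
            out.clear();                        // empty vector == failure
        else
            out.resize(out_len);
        return out;
    }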

For something faster than zlib (but which doesn't compress as well), there's LZO.

bzip2 is not appropriate where high speed is a requirement.
John Zwinck
(As I said - better compression, but at a performance cost.)
+1  A: 

It makes no sense to do this on a modern operating system with a virtual memory manager. You'd create a blob of bytes that isn't useful for anything, taking up virtual address space for no good reason. The memory manager won't keep it in RAM for long anyway: it will notice that the pages occupied by the blob are not being accessed and swap them out to the paging file.

In addition, you'll have to translate the data if it contains pointers. The odds that you'll be able to decompress the data at the exact same virtual address, so that the pointers are still valid, are very close to zero. After all, you did this to free up address space, so the hole previously used by the data will be occupied by something else. This translation will probably not be trivial, and it will take a lot of additional memory.
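
One common way to sidestep this relocation problem is to store pool-relative offsets instead of raw pointers (which, per the asker's comment below, the question already does). A hypothetical sketch:

    #include <cstdint>

    // Hypothetical element stored inside a relocatable pool: links to
    // other elements are byte offsets from the pool base, not raw
    // pointers, so the block stays valid wherever it is decompressed.
    struct Node {
        std::uint32_t next;     // offset of the next Node; 0 means "null"
        int           payload;
    };

    inline Node* resolve(char* pool_base, std::uint32_t offset)
    {
        return offset ? reinterpret_cast<Node*>(pool_base + offset)
                      : nullptr;
    }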

If you are doing this to avoid running out of memory, look at operating-system support for memory-mapped files, and consider switching to 64-bit code.

Hans Passant
The pointers are address offsets into the memory pools. I didn't really understand the issue with the paging. I need compression because I have a realtime system that creates and reuses a huge amount of data.
Mat
+1  A: 

If compression/decompression speed is important to you, you should take a look at LZO:

http://www.oberhumer.com/opensource/lzo/

Compared to zlib, the code is smaller and easier to use as well.
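
A minimal sketch of LZO's fastest variant, LZO1X-1, following the usage and worst-case output bound given in the library's documentation; the helper name is made up:

    #include <vector>
    #include <lzo/lzo1x.h>

    // Compress one memory block with LZO1X-1, the fastest LZO variant.
    std::vector<unsigned char> lzo_compress_block(const unsigned char* data,
                                                  lzo_uint size)
    {
        static const bool ok = (lzo_init() == LZO_E_OK);    // one-time init
        std::vector<unsigned char> out(size + size / 16 + 64 + 3); // worst case
        std::vector<unsigned char> wkmem(LZO1X_1_MEM_COMPRESS);    // scratch

        lzo_uint out_len = 0;
        if (!ok || lzo1x_1_compress(data, size, out.data(), &out_len,
                                    wkmem.data()) != LZO_E_OK)
            out.clear();                        // empty vector == failure
        else
            out.resize(out_len);
        return out;
    }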

Nils Pipenbrinck
A: 

For compression, the data matters a lot. Compressing arbitrary binary data in memory is a complete waste of time: it will slow your performance immensely and will probably end up making your memory usage higher.

If you really need much more memory, you should look at controlling the memory yourself with VirtualAlloc or sbrk. That way you can make use of all the physical memory in the machine, not just the 2-4 GB of address space a 32-bit process gets.
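
For the Windows half of that suggestion, a minimal sketch of reserving a large region up front with VirtualAlloc and committing pages only as they are needed; the helper names are made up:

    #ifdef _WIN32
    #include <windows.h>

    // Reserve address space for a big pool without consuming physical
    // memory yet; commit pages individually as the pool actually grows.
    void* reserve_pool(SIZE_T bytes)
    {
        return VirtualAlloc(nullptr, bytes, MEM_RESERVE, PAGE_NOACCESS);
    }

    bool commit_range(void* base, SIZE_T offset, SIZE_T bytes)
    {
        return VirtualAlloc(static_cast<char*>(base) + offset, bytes,
                            MEM_COMMIT, PAGE_READWRITE) != nullptr;
    }
    #endif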

Charles Eli Cheese