Say the app is like Digg, where users post a web link and add tags. Then many documents will have the tag "shopping", repeated over and over in the value part of the key/value pair.

Will MongoDB automatically remember that word as something shorter, such as "s1", to reduce the size of the database, or can it be configured to do so? And what if it is the key part that repeats a lot instead (which almost always happens, since each "document" has the same property names)?

A:

No, MongoDB will not do that, but you can easily do it yourself in your application.
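A minimal sketch of doing it yourself, assuming the application keeps its own alias tables. The names `FIELD_ALIASES`, `TAG_ALIASES`, `shorten` and `expand` are hypothetical and not part of MongoDB or any driver: fields are renamed and common tag values replaced before saving, and the mapping is reversed after loading.

```python
# Hypothetical alias tables maintained by the application, not by MongoDB.
FIELD_ALIASES = {"url": "u", "tags": "t", "PhoneNumber": "p1"}
TAG_ALIASES = {"shopping": "s1", "news": "n1"}

# Reverse maps used when reading documents back.
FIELD_NAMES = {short: long for long, short in FIELD_ALIASES.items()}
TAG_NAMES = {short: long for long, short in TAG_ALIASES.items()}


def shorten(doc: dict) -> dict:
    """Rename known fields and replace known tag values before saving."""
    out = {}
    for key, value in doc.items():
        if key == "tags":
            # Tags without an alias are stored unchanged.
            value = [TAG_ALIASES.get(tag, tag) for tag in value]
        out[FIELD_ALIASES.get(key, key)] = value
    return out


def expand(doc: dict) -> dict:
    """Undo shorten() after loading a document."""
    out = {}
    for key, value in doc.items():
        name = FIELD_NAMES.get(key, key)
        if name == "tags":
            value = [TAG_NAMES.get(tag, tag) for tag in value]
        out[name] = value
    return out


post = {"url": "http://example.com", "tags": ["shopping", "deals"]}
stored = shorten(post)  # {"u": "http://example.com", "t": ["s1", "deals"]}
assert expand(stored) == post
```

Queries have to use the short forms too, e.g. filtering on `{"t": "s1"}` rather than `{"tags": "shopping"}`. The tables must also stay stable over time, and a raw tag that happens to equal an alias such as "s1" would be misread on the way back, so a real scheme needs some way to tell aliases from plain values.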

One reason for not wanting to do this at the server is that (according to the mailing list) it makes sharding more difficult.

One reason to want to do this (or to use other techniques, such as compressing the documents on disk) is that the space savings also reduce the memory used by cached objects and indexes, which means you could get better cache hit rates for the same amount of RAM.
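To get a feel for how much of this repetition compression can remove, here is an illustration using Python's standard `zlib` module on JSON text. It is only a stand-in under stated assumptions: MongoDB stores BSON, not JSON, this is not how any MongoDB storage layer works, and the 1,000 documents are made up.

```python
import json
import zlib

# 1,000 made-up documents that all repeat the same field names and tag value.
docs = [{"PhoneNumber": f"555-{i:04d}", "tags": ["shopping"]} for i in range(1000)]

# Serialize as newline-separated JSON (a stand-in for BSON) and compress the whole batch.
raw = "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")
compressed = zlib.compress(raw, 6)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

Compressing the batch works well because the repeated field names and tag values appear across many documents. Compressing each small document on its own typically gains far less, since there is little repetition inside a single document.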

Thilo
Does that mean that if there is a table with 1 million records, each with a field named "PhoneNumber", those 11 bytes will be repeated 1 million times? If a hard drive had its own internal compression that was invisible to the outside world, it would be fine for it to compress the word, say by using "p1" to stand for "PhoneNumber". So could MongoDB also have a layer underneath that does something like that?
動靜能量
Yes, the 11 bytes will be repeated 1 million times. If you think that is a problem, you can shorten the keys yourself. Some people have tried that and achieved about a 15% space reduction (which they decided was not worthwhile), but of course this will vary widely depending on your data. At the moment MongoDB does not have a compression layer; some form of compression is planned, but not scheduled yet (according to the JIRA page I linked).
Thilo
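To see where those bytes go, here is a sketch of the BSON size arithmetic for flat documents whose values are all strings, following the layout in the BSON specification: an int32 total length, the elements, and a trailing NUL byte, where each string element is a type byte, the key as a NUL-terminated string, then an int32 length, the UTF-8 bytes and a NUL. The helper names are hypothetical, and real documents also carry an `_id` field plus storage-engine overhead, which this sketch ignores.

```python
def bson_string_element_size(key: str, value: str) -> int:
    """Bytes used by one BSON string element (type 0x02)."""
    key_bytes = len(key.encode("utf-8")) + 1          # cstring: key + NUL
    value_bytes = 4 + len(value.encode("utf-8")) + 1  # int32 length + data + NUL
    return 1 + key_bytes + value_bytes                # +1 for the type byte


def bson_document_size(doc: dict) -> int:
    """Size of a flat BSON document whose values are all strings."""
    # int32 total length + all elements + trailing NUL byte
    return 4 + sum(bson_string_element_size(k, v) for k, v in doc.items()) + 1


long_key = bson_document_size({"PhoneNumber": "555-0100"})
short_key = bson_document_size({"p1": "555-0100"})
print(long_key, short_key, long_key - short_key)
```

By this arithmetic `{"PhoneNumber": "555-0100"}` takes 31 bytes and `{"p1": "555-0100"}` takes 22, so the shorter key saves 9 bytes per document, roughly 9 MB per million documents. Each key costs one byte more than its length because of the NUL terminator.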
Keep in mind that 11 bytes repeated 1 million times is 11 MB, not exactly an overwhelming amount of data (for most applications).
kristina
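The arithmetic behind that estimate, counting just the 11 characters of the key (BSON also stores a NUL terminator after each key, so strictly it is 12 bytes per document, about 12 MB):

$$
11\ \text{bytes} \times 10^{6} = 1.1 \times 10^{7}\ \text{bytes} \approx 11\ \text{MB} \approx 10.5\ \text{MiB}
$$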