I've been playing around with the samus MongoDB driver, particularly its benchmark tests. From the output, it appears that document size can have a drastic effect on how long operations on those collections take.

[screenshot of the benchmark output]

Is there any documentation that recommends what balance to strive for, or some more "real" numbers on what document size does to query times? Is this poor performance more a result of the driver and its serialization overhead? Has anyone else noticed this?

+1  A: 

I cannot find a link right now, but the on-disk format of the database is such that it should not matter whether a document is large or small. For access via an index there is certainly no difference; for a table scan, uninteresting documents (or uninteresting parts of documents) can be skipped quickly, because BSON prefixes every document and embedded element with its byte length. If anything, the overhead of the BSON format affects tiny documents more than large ones.

So I would assume that the performance drop you see is largely due to the serialization cost of loading those documents (of course it takes more time to write a large document to disk than a small one, but it should take about the same time as multiple small documents of the same aggregate size).

In your benchmark, can you normalize the numbers to be based on the same amount of data (in bytes, not in document count)?
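
For example, a rough mongo shell sketch of that normalization (the collection names and the helper are hypothetical, not part of the benchmark):

```
// Hypothetical helper: report full-scan throughput in MB/s, so collections
// with different document sizes can be compared on equal terms.
function scanThroughputMB(collName) {
    var coll = db.getCollection(collName);
    var totalBytes = coll.stats().size;       // total data size in bytes
    var start = new Date();
    coll.find().forEach(function (doc) {});   // force a full collection scan
    var millis = new Date() - start;
    return {
        collection: collName,
        mbPerSecond: (totalBytes / (1024 * 1024)) / (millis / 1000)
    };
}

// If the two numbers are close, per-document size is not the problem.
scanThroughputMB("smallDocs");
scanThroughputMB("largeDocs");
```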

Thilo
It's just a bad benchmark; no index is created.
TTT
+1  A: 

You can turn on profiling with `db.setProfilingLevel(2)` and query `db.system.profile` for details on the executed queries.

Although this may distort the test results a little, it will give you insight into the query times on the server, eliminating any influence the driver or network may have on the results. If these query times show the same pattern as your test, then the document size does influence query times. If query times are roughly the same regardless of document size, then it's serialization overhead you're looking at.
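
For example, a minimal shell session (collection name and query are placeholders):

```
// Record every operation on the current database.
db.setProfilingLevel(2);

// Run the query you want to measure.
db.mycollection.find({ name: "test" }).toArray();

// Inspect the recorded operations; 'millis' is the server-side execution time.
db.system.profile.find().sort({ ts: -1 }).limit(5);
```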

Niels van der Rest
It's just a bad benchmark; no index is created.
TTT
@TTT: Theoretically, if there *were* indexes, the index would be queried. The documents themselves wouldn't be scanned, eliminating any influence the document size could have. For testing **ad hoc queries**, where the document size could have more impact on performance, the lack of an index is a good thing :)
Niels van der Rest
I believe that even for non-indexed queries, individual document size should make no difference (while total data size of course does). In fact, if anything, scanning 1000 documents that add up to 1 MB should be slower than scanning 1 document of 1 MB.
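A quick way to test this in the shell (hypothetical collection names; each collection holds roughly 1 MB in total):

```
// ~1 KB payload string
var kb = new Array(1025).join("x");

// 1000 small documents vs. 1 large document of the same aggregate size
for (var i = 0; i < 1000; i++) {
    db.manySmall.insert({ payload: kb });
}
db.oneLarge.insert({ payload: new Array(1001).join(kb) });

// Time a full scan of each collection (no indexes involved).
var t1 = new Date();
db.manySmall.find().forEach(function (d) {});
print("manySmall: " + (new Date() - t1) + " ms");

var t2 = new Date();
db.oneLarge.find().forEach(function (d) {});
print("oneLarge: " + (new Date() - t2) + " ms");
```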
Thilo
+1  A: 

But is it a good benchmark? I don't think so. Read http://stackoverflow.com/questions/2460063/2465039#2465039.

I think the exception that occurs when the index should have been created is still swallowed: FindOne() on the medium collection returns 363 with and without the "creation" of the index.
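
One way to check from the mongo shell whether the index actually exists and is used, regardless of what the driver reports (collection name and query are placeholders):

```
// List the indexes the server actually has on the collection.
db.mycollection.getIndexes();

// explain() shows whether a query uses an index: in the old shell output,
// "cursor" reads "BtreeCursor ..." for an indexed query and "BasicCursor"
// for a full scan.
db.mycollection.find({ name: "test" }).explain();
```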

TTT
Well, it should be equally bad for small and big documents (given the same total data size).
Thilo
In fact, since having no index moves more (albeit unnecessary) work onto the server, it would reduce the impact of the driver-side serialization overhead.
Thilo
Thanks for linking to the other post. Looks like that benchmark is bad. I'll write my own eventually.
Ty
-1 The question is not about query times and indexes, but about query times and **document size**. Try reading the question as *Will document size influence query times when querying non-indexed fields?*, instead of fixating on the error in the benchmark test. I have run the benchmark *with* proper indexes and it still shows a performance hit for large documents. This is probably serialization overhead. My answer tells you how to know for sure.
Niels van der Rest
You run a benchmark because you want to know whether a certain system is fast enough for your needs, and without indexes (when you think they are there) you get skewed results. With indexes in place, the server has more time for other work.
TTT