tags:
views: 104
answers: 1

I have around 2 million strings of varying lengths that I need to compress and store in MongoDB GridFS as files.

The strings are currently stored in a TEXT column of an MS SQL table. I wrote a sample app that reads each row, compresses the string, and stores it as a GridFS file.

There is one reader thread and a pool of 50 worker threads storing the results. It works, but it is very slow (100 records per second on average).

Is there any way to speed up the import into GridFS?

I'm using MongoDB 1.6 on Windows with the MongoCSharp driver, in C# on .NET.
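For reference, the pipeline looks roughly like the sketch below (a stripped-down illustration, not my actual code): one reader streams rows out of the SQL Server table into a bounded queue, and 50 workers compress each value with GZip and hand it to GridFS. The connection string, the table and column names ("Documents", "Id", "Body"), and the `writeToGridFs` callback are placeholders; the actual GridFS write depends on the driver version, so it is left as a delegate. It assumes .NET 4 for `BlockingCollection` and `Task`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

static class GridFsImport
{
    // writeToGridFs stands in for the driver-specific GridFS write
    // (filename + compressed bytes); everything else is plain .NET.
    public static void Run(string sqlConnectionString, Action<string, byte[]> writeToGridFs)
    {
        // Bounded queue so the single reader cannot run arbitrarily far ahead of the writers.
        var queue = new BlockingCollection<Tuple<string, string>>(boundedCapacity: 1000);

        // 50 consumers: compress each TEXT value and store it as a GridFS file.
        var workers = new Task[50];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = Task.Factory.StartNew(() =>
            {
                foreach (var item in queue.GetConsumingEnumerable())
                    writeToGridFs(item.Item1 + ".gz", Compress(item.Item2));
            }, TaskCreationOptions.LongRunning);
        }

        // Single reader: stream rows out of the MS SQL table.
        // "Documents", "Id" and "Body" are placeholder table/column names.
        using (var conn = new SqlConnection(sqlConnectionString))
        using (var cmd = new SqlCommand("SELECT Id, Body FROM Documents", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    queue.Add(Tuple.Create(reader.GetInt32(0).ToString(), reader.GetString(1)));
            }
        }

        queue.CompleteAdding();          // let the workers drain the queue and exit
        Task.WaitAll(workers);
    }

    // GZip-compress a string; GridFS only ever sees the resulting byte[].
    static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            using (var writer = new StreamWriter(gzip))
                writer.Write(text);
            return output.ToArray();     // valid even after the MemoryStream is closed
        }
    }
}
```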

+2  A: 

I think I found the issue inside the MongoDB CSharp driver by profiling it while running a very simple app that puts 1000 strings into 1000 GridFS files.

It turns out that 97% of the time is spent checking whether a file with the same filename already exists in the collection. I added an index on the filename field and it's now blazing fast!
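For anyone hitting the same thing: the fix amounts to indexing `filename` on the `fs.files` metadata collection (assuming the default "fs" GridFS root). From the mongo shell that is just `db.fs.files.ensureIndex({ filename: 1 })`. A rough C# sketch is below; the exact index-creation call varies between versions of the MongoCSharp driver, so treat the `MetaData.CreateIndex` signature, the namespace, and the database name as assumptions and check what your driver version actually exposes.

```csharp
using MongoDB;  // MongoCSharp driver; the namespace may differ by driver version

class CreateFilenameIndex
{
    static void Main()
    {
        var mongo = new Mongo();        // defaults to localhost:27017
        mongo.Connect();

        var db = mongo["test"];         // placeholder database name
        var files = db["fs.files"];     // GridFS metadata collection (default "fs" root)

        // Ascending, non-unique index on "filename" (assumed CreateIndex signature);
        // equivalent to db.fs.files.ensureIndex({ filename: 1 }) in the shell.
        var keys = new Document();
        keys["filename"] = 1;
        files.MetaData.CreateIndex(keys, false);

        mongo.Disconnect();
    }
}
```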

The question for me is: if the driver needs to keep filenames unique and performs that check anyway, why doesn't it create a unique index on the field when one is missing? What's the reason behind that?

Khash
That is weird. File names do not have to be unique in GridFS, since there is already the _id primary key, right?
Thilo
I am not sure about the GridFS spec, but profiling the sample app with the MongoDbCSharp library certainly shows that it checks whether the file Exists *and* throws an exception if it does not.
Khash