views: 114 | answers: 1

I am looking at implementing a CouchDB server to provide ad-hoc searching of some metadata that we store for an internal business operation.

We store a number of "attributes" like size, source, submit date, and URL for the "jobs" in our internal process.

This is all well and good in our relational database, but our users would like to build lists of similar jobs by providing "search criteria" similar to doing a google search. So the user could say "show me all jobs which are greater than XXX and submitted after YYY" and get back a list of descriptions and URLs.
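Criteria like "size greater than XXX and submitted after YYY" map naturally onto a CouchDB view with a composite key. Below is a minimal sketch of such a map function; the attribute names (`size`, `submitted`, `url`, `source`) are assumptions for illustration, not the actual schema. A tiny stand-in for CouchDB's `emit()` is included so the logic can be exercised outside the server:

```javascript
// Sketch of a CouchDB map function indexing jobs by [size, submit date],
// so range queries in key order can serve "greater than / after" criteria.
// Field names are hypothetical.
function mapJob(doc) {
  if (doc.size !== undefined && doc.submitted !== undefined) {
    emit([doc.size, doc.submitted], { url: doc.url, source: doc.source });
  }
}

// Minimal local stand-in for CouchDB's emit(), purely for trying this out.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

mapJob({ size: 2048, submitted: "2009-11-01",
         url: "http://internal/jobs/1", source: "batch" });
```

Note that a single view key can only cover range queries in key-prefix order, so truly ad-hoc combinations of attributes generally mean one view per query shape.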

This sounds perfect for Couch, and from what I have researched it looks like it will work well.

My question is how well it will scale with appropriate hardware. We have between 150 and 200 million such documents, with between 11 and 30 attributes per document. The metadata is at most a few kilobytes per document.

I'm initially looking at a quad-core server (VM) serving this up for testing, but it needs to scale to support between 100 and 250 simultaneous users.

I know I can do this with most db servers, but I am looking for something that provides the ad-hoc querying aspect (REST or HTTP is fine; we have our own search tools).

Has anyone had experience setting up Couch and using it for production loads at this level?

+2  A: 

Concurrent connections aren't a problem; Erlang and CouchDB are built for concurrent performance.

Are you thinking you'll have to generate new map functions dynamically? It kind of sounds like it.

Whenever you add a new view map function you're going to hit a big bottleneck in the initial view generation.

If you use Erlang views they build much faster than JavaScript views because they skip the JSON serialization step; this can significantly speed up view generation.
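For reference, a native Erlang view function might look like the sketch below (a hypothetical example, not from the original post; it assumes a `size` field, and requires the native Erlang query server to be enabled in CouchDB's `local.ini` under `[native_query_servers]`):

```erlang
%% Hypothetical native Erlang map function emitting the "size" field.
%% Emit/2 is bound by CouchDB's native query server process.
fun({Doc}) ->
    case couch_util:get_value(<<"size">>, Doc) of
        undefined -> ok;
        Size -> Emit(Size, null)
    end
end.
```

Because the document is handed to the function as an Erlang term rather than being round-tripped through JSON, the per-document overhead during view generation is much lower.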

Once the view is generated it will be quite fast even with the size you're talking about.
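Since the questioner mentioned querying over REST/HTTP, here is a sketch of how a generated view would be queried: CouchDB views are read with a GET against `/db/_design/ddoc/_view/name`, with key parameters JSON-encoded in the query string. The database, design document, and view names below are assumptions for illustration:

```javascript
// Build a CouchDB view query URL, e.g. "all jobs submitted after 2009-06-01".
// CouchDB expects startkey/endkey values to be JSON-encoded.
function viewUrl(server, db, ddoc, view, opts) {
  var params = Object.keys(opts).map(function (k) {
    return k + "=" + encodeURIComponent(JSON.stringify(opts[k]));
  });
  return server + "/" + db + "/_design/" + ddoc + "/_view/" + view +
         (params.length ? "?" + params.join("&") : "");
}

var url = viewUrl("http://localhost:5984", "jobs", "search", "by_submit_date",
                  { startkey: "2009-06-01", descending: false });
```

Once the index exists, such a request is essentially a B-tree range read, which is why query latency stays low even at the document counts discussed above.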

mikeal
Awesome. Thanks, this is what I was hoping to hear.
GrayWizardx