I've been messing about with the Ordnance Survey Code-Point Open dataset of UK postcodes/co-ordinates. Since Couch.io were offering a free hosted CouchDB instance I thought I'd put my geo data into one of those, learning a bit about CouchDB in the process.

The idea was that since CouchDB is supposed to be good at handling large datasets (the postcode data is around 1.7 million records) and works natively with REST/JSON, it would tie in nicely with client-side jQuery for use with Google Maps applications.

My initial aim was simply to be able to make an AJAX call with a postcode as a parameter, getting back a single JSON object with lat/lon properties, which I could consume in my script (showing a marker for that postcode).
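For illustration, that lookup might be sketched roughly as follows. This assumes each postcode is stored as one document whose `_id` is the postcode with spaces removed, carrying `lat` and `lon` fields. The base URL, database name, and function names are hypothetical and not taken from my actual setup.

```javascript
// Hypothetical CouchDB location and database name (assumptions for this sketch).
const BASE_URL = "http://localhost:5984";
const DB_NAME = "postcodes";

// Normalise user input such as "sw1a 1aa" to the assumed _id form "SW1A1AA".
function postcodeToDocId(postcode) {
  return postcode.replace(/\s+/g, "").toUpperCase();
}

// CouchDB serves a single document at GET /{db}/{docid}.
function postcodeUrl(postcode) {
  return BASE_URL + "/" + DB_NAME + "/" +
    encodeURIComponent(postcodeToDocId(postcode));
}

// Fetch the document and return only the coordinates.
// fetchImpl is injectable so the function can be tried without a live server.
async function lookupPostcode(postcode, fetchImpl = fetch) {
  const response = await fetchImpl(postcodeUrl(postcode));
  if (!response.ok) {
    throw new Error("Lookup failed with HTTP status " + response.status);
  }
  const doc = await response.json();
  return { lat: doc.lat, lon: doc.lon };
}
```

Keying the document on the postcode itself means the lookup is a plain document fetch and needs no view at all.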

I've done this successfully, but coming from a relational DB background it was much trickier than I thought; as I read more about CouchDB and play with it a bit I get the impression it wouldn't really be the right tool for this job, were I actually using it for a real-world project.

Am I right in thinking that dynamic queries are a bit of a weakness for CouchDB? Is it more aimed at returning large views which don't change all that often, from large datasets? What might be some examples of 'good' and 'bad' uses of CouchDB, in terms of playing to its strengths?

+5  A: 

Hi, Mark.

I am the main Couchio hosting guy. Glad you are enjoying CouchDB.

My feeling is that relational databases are flat-out better at constantly changing, one-off queries over large data sets, although churning through all that data still takes forever. Neither SQL nor NoSQL is a silver bullet there. Broadly speaking, though, the NoSQL databases are better if you already know what questions you will ask. In other words, it's not a question of how much the data changes, but how much the queries change.

That is the theory. For your specific project, is CouchDB a good fit? My feeling is that there is nothing wrong with building many, many indexes on the basic data set. The benefit of index-only queries is that they run very fast. CouchDB in particular only needs to re-index new data, even for queries such as averages or XOR checksums.
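To make the averages point concrete, here is a rough sketch of a map/reduce pair and a simulation of folding new data into an already computed result. It is a simplified model, not CouchDB's actual internals: the `area` and `lat` fields are hypothetical, and `emit` is passed in as a parameter so the code runs outside CouchDB (inside a real view, the map function takes only `doc` and `emit` is a global).

```javascript
// Map: runs once per document and emits a partial sum and count per area.
function mapDoc(doc, emit) {
  if (doc.area && typeof doc.lat === "number") {
    emit(doc.area, { sum: doc.lat, count: 1 });
  }
}

// Reduce: adds up sums and counts. The output has the same shape as the
// emitted values, so the same logic also serves the rereduce step.
function reduceStats(keys, values, rereduce) {
  return values.reduce(
    (acc, v) => ({ sum: acc.sum + v.sum, count: acc.count + v.count }),
    { sum: 0, count: 0 }
  );
}

// Helper for the simulation: collect the rows the map function emits.
function emitted(docs) {
  const rows = [];
  docs.forEach((doc) => mapDoc(doc, (key, value) => rows.push({ key, value })));
  return rows;
}

// Existing data, reduced once and kept as a cached partial result.
const oldRows = emitted([{ area: "SW", lat: 51.0 }, { area: "SW", lat: 52.0 }]);
const cached = reduceStats(null, oldRows.map((r) => r.value), false);

// New data arrives: only the new document is mapped and reduced.
const newRows = emitted([{ area: "SW", lat: 53.0 }]);
const fresh = reduceStats(null, newRows.map((r) => r.value), false);

// Rereduce merges the cached result with the new one; the old rows are not revisited.
const total = reduceStats(null, [cached, fresh], true);
const average = total.sum / total.count; // 52
```

The reduce keeps a sum and a count rather than the average itself, because partial averages cannot be merged correctly, whereas sums and counts can. The final division is left to the client.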

So, even if you have a hundred different types of queries you might perform, if you already know what those queries are, hey, just write them down. However, if you will never stop making brand-new queries, CouchDB will have a hard time keeping up.

jhs
Thanks for that, it backs up what I thought (even though I perhaps didn't word it very clearly).
Mark B
Let's say you have a logger for a resource. The resource is accessed by many users every second. The logger needs to compute the current number of users accessing the resource and the average time the resource is in use across all users. Currently MySQL can't keep up with the number of writes. Would CouchDB be a better solution?
cory