CouchDB and MongoDB really search over each document with JavaScript?

views:

103

answers:

+1 Q:

From what I understand about these two "Not only SQL" databases, they search over each record and pass it to a JavaScript function you write which calculates which results are to be returned by looking at each one.

Is that actually how it works? Sounds worse than using a plain RDBMS without any indexed keys.

I built my schemas so they don't require join operations which leaves me with simple searches on indexed int columns. In other words, the columns are in RAM and a quick value check through them (WHERE user_id IN (12,43,5,2) or revision = 4) gives the database a simple list of IDs which it uses to find the actual rows in the massive data collection.

So I'm trying to imagine how in the world looking through every single row in the database could be considered acceptable (if indeed this is how it works). Perhaps someone can correct me because I know I must be missing something.

+1 A:

I think CouchDB stores the docs in a B-tree according to the "index" (view) and just walks this tree, so it's not searching every document.

see http://guide.couchdb.org/draft/btree.html

Øyvind Skaar 2010-10-20 19:01:57

+1 A:

You should study them up a bit more. It's not "worse" than an RDBMS, it's different. In fact, given certain domains/functions, the "NoSQL" paradigm works out to be much quicker than traditional and, in some opinions, outdated RDBMS implementations. Think Google's Bigtable platform and you get what MongoDB, Riak, CouchDB, Cassandra (Facebook) and many, many others are trying to accomplish. The primary difference is that most of these NoSQL solutions focus on key/value stores (some call these "document" databases) and have limited to no concept of relationships (in the primary/foreign key respect) and joins. Join operations on tables can be very expensive. Also, let's not forget the object-relational impedance mismatch issue... You don't need an ORM to access MongoDB. It can actually store your code object (or document) as it is in memory. Can you imagine the savings in lines of code and complexity!? db4o is another lightweight solution that does this.

I don't know what you mean when you say "Not only SQL" database? It's a NoSQL paradigm, wherein no SQL is used to query the underlying data store of the system. NoSQL also means not an RDBMS, which SQL is generally built on top of. Although, MongoDB does have an SQL-like query syntax (LINQ) that can be used from .NET when retrieving data, through a library called NoRM.

I will say I've only really worked with Riak and MongoDB... I'm by no means familiar with Cassandra or CouchDB past a reading level and feature set comprehension. I prefer to use MongoDB over them all. Riak was nice too but not for what I needed. You should download a few of these NoSQL solutions and you will get the concept. Check out db4o, MongoDB and Riak as I've found them to be the easiest with more support for .NET based languages. It will just make sense for certain applications. All in all, the NoSQL or document database or OODBMS ... whatever you want to call it, is very appealing and gaining a lot of momentum.

I also forgot about your JavaScript question... MongoDB has JavaScript "bindings" that enable it to be used as one method of searching for data. Riak handles data via a JSON format. MongoDB uses BSON I believe, and I can't remember what the others use. In any case, the point is that instead of using SQL (structured query language) to "ask" the database for information, some of these (MongoDB being one) use JavaScript and/or RESTful syntax to ask the NoSQL system for data. I believe CouchDB and Riak can be queried over HTTP too, which makes them very accessible. Not to mention, that's pretty frickin cool.

Do your research.... download them, they are all free and OSS.

db4o: http://www.db4o.com/ (Java & .NET versions)

MongoDB: mongodb.org/

Riak: http://www.basho.com/Riak.html

NoRM: http://thechangelog.com/post/436955815/norm-bringing-mongodb-to-net-linq-and-mono

bbqchickenrobot 2010-10-20 19:03:50

+2 A:

In terms of CouchDB, the map function can be JavaScript, but it can also be Erlang (or another language altogether, if you pull in a third-party view server).

Additionally, views are calculated incrementally. In other words, the map function is run on all the documents in the database when the view is first built, but further updates to the database only affect the related portions of the view.

The contents of a view are, in some ways, similar to an indexed field in an RDBMS. The output is a set of key/value pairs that can be searched very quickly, as they are stored as b-trees, which some RDBMSs use to store their indexes.
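To make the incremental part concrete, here is a toy model in Python. It is only a sketch of the idea, not CouchDB's implementation: the `View` class, `map_by_user` and the document fields are all made up for illustration, and a sorted list with binary search stands in for the B-tree.

```python
import bisect

def map_by_user(doc):
    """Hypothetical map function: emit one (key, value) row per document."""
    yield doc["user_id"], doc["revision"]

class View:
    """Toy stand-in for a view index: rows are kept sorted by key."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows = []       # sorted list of (key, doc_id, value)
        self.emitted = {}    # doc_id -> rows that document produced
        self.map_calls = 0   # how many times the map function has run

    def update(self, doc):
        """Re-run the map function for one new or changed document only."""
        for row in self.emitted.pop(doc["_id"], []):
            self.rows.remove(row)  # linear in this toy; a real B-tree avoids that
        new_rows = []
        for key, value in self.map_fn(doc):
            row = (key, doc["_id"], value)
            bisect.insort(self.rows, row)
            new_rows.append(row)
        self.map_calls += 1
        self.emitted[doc["_id"]] = new_rows

    def lookup(self, key):
        """Binary-search the sorted rows; no map function runs at query time."""
        start = bisect.bisect_left(self.rows, (key,))
        hits = []
        for k, doc_id, value in self.rows[start:]:
            if k != key:
                break
            hits.append((doc_id, value))
        return hits

view = View(map_by_user)
docs = [
    {"_id": "a", "user_id": 12, "revision": 1},
    {"_id": "b", "user_id": 43, "revision": 4},
    {"_id": "c", "user_id": 12, "revision": 2},
]
for doc in docs:  # initial build: the map function runs once per document
    view.update(doc)

# One document changes, so only that document is mapped again.
view.update({"_id": "c", "user_id": 5, "revision": 3})

print(view.lookup(12))  # [('a', 1)]
print(view.map_calls)   # 4 (three for the build, one for the update)
```

The point is that a query only touches the sorted rows, and a write only re-maps the document that changed.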

Dominic Barnes 2010-10-20 19:29:13

+2 A:

@Xeoncross

I built my schemas so they don't require join operations which leaves me with simple searches on indexed int columns. In other words, the columns are in RAM and a quick value check through them (WHERE user_id IN (12,43,5,2) or revision = 4)

Well then, you'll love MongoDB. MongoDB supports indexes, so you can index user_id and revision and this query will be able to return relatively quickly.

However, please note that many NoSQL DBs only support key lookups and don't necessarily support "secondary indexes", so you have to do your homework on this one.

So I'm trying to imagine how in the world looking through every single row in the database could be considered acceptable (if indeed this is how it works).

Well, if you run a query in an SQL-based database and you don't have an index, that database will perform a table scan (i.e. looking through every row).
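You can see this for yourself with SQLite from Python's standard library. This is just a sketch: the `docs` table, its columns and the `docs_user_id` index name are invented for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN output as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the description

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "revision INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO docs (user_id, revision, body) VALUES (?, ?, ?)",
    [(n % 50, n % 7, "text") for n in range(1000)],
)

sql = "SELECT body FROM docs WHERE user_id IN (12, 43, 5, 2)"

# No index yet: the planner has to read every row of the table.
plan_without_index = query_plan(conn, sql)

# With an index on user_id, the planner can search the index instead.
conn.execute("CREATE INDEX docs_user_id ON docs (user_id)")
plan_with_index = query_plan(conn, sql)

print(plan_without_index)
print(plan_with_index)
```

The first plan should describe a scan of `docs` and the second should mention `docs_user_id`, which is the same difference you get in MongoDB between an unindexed and an indexed field.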

They search over each record and pass it to a JavaScript function you write which calculates which results are to be returned by looking at each one.

So in practice most NoSQL databases support this. But please never use it for real-time queries. This option is primarily for performing map-reduce operations that are used to summarize data.
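As a rough idea of what such a summarizing job looks like, here is a minimal map-reduce sketch in plain Python. It is not any database's API: `map_fn`, `reduce_fn`, `map_reduce` and the sample documents are hypothetical.

```python
from collections import defaultdict

def map_fn(doc):
    """Hypothetical map step: emit (user_id, 1) for every document."""
    yield doc["user_id"], 1

def reduce_fn(key, values):
    """Hypothetical reduce step: total the emitted values for one key."""
    return sum(values)

def map_reduce(docs, map_fn, reduce_fn):
    """Visit every document once, so the cost grows with the collection size."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

docs = [
    {"_id": 1, "user_id": 12, "revision": 4},
    {"_id": 2, "user_id": 43, "revision": 1},
    {"_id": 3, "user_id": 12, "revision": 2},
    {"_id": 4, "user_id": 5, "revision": 4},
]

counts = map_reduce(docs, map_fn, reduce_fn)
print(counts)  # {12: 2, 43: 1, 5: 1}
```

Because the loop touches every document, this is something you run in the background and store the result of, not something you run on each page load.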

Here's maybe a different take on NoSQL. SQL is really good at relational operations; however, relational operations don't scale very well. Many of the NoSQL databases are focused on key-value / document-oriented concepts instead.

SQL works on the premise that you want normalized, non-repeated data and that you want to grab that data in big sets. NoSQL works on the premise that you want fast queries for certain "chunks" of data, but that you're willing to wait for data dependent on "big sets" (running map-reduces in the background).

It's a big trade-off, but it makes a lot of sense for modern web apps. Most of the time is spent loading one page (blog post, wiki entry, SO question) and most of the data is really tied to or "hanging off" that element. So the concept of grabbing everything you need with one horizontally-scalable query is really useful.

It's not the solution for everything, but it is a really good option for lots of use cases.

Gates VP 2010-10-21 14:29:23

Thank you for the complete answer. Since MongoDB supports secondary indexes there is no longer a problem. I was looking at CouchDB and it only seemed to support primary keys which made it look rather useless. MongoDB seems to be the way to go as it has most of the features of a normal SQL database minus joins and transactions (which I don't use either).

Xeoncross 2010-10-21 15:26:51

Yes, I prefer MongoDB over the rest as well! I set it up and was coding with it in under 10 minutes (including reading a sample tutorial).

bbqchickenrobot 2010-10-21 18:35:20
