Hi guys!
I am looking to store pictures (<5MB each) in a NoSQL database and link them to articles in a different bucket. What kind of speed does Riak's link walking feature offer? Is it like an RDBMS join at all?
Think one-way relationships and as fast as querying normally. Not as slow as MapReduce.
From: http://seancribbs.com/tech/2010/02/06/why-riak-should-power-your-next-rails-app/
The first way that Riak deals with this is with link-walking. Every datum stored in Riak can have one-way relationships to other data via the Link HTTP header. In the canonical example, you know the key of a band that you have stored in the “artists” bucket (Riak buckets are like database tables or S3 buckets). If that artist is linked to its albums, which are in turn linked to the tracks on the albums, you can find all of the tracks produced in a single request. As I’ll describe in the next section, this is much less painful than a JOIN in SQL because each item is operated on independently, rather than a table at a time. Here’s what that query would look like:
GET /raw/artists/TheBeatles/albums,_,_/tracks,_,1
“/raw” is the top of the URL namespace, “artists” is the bucket, “TheBeatles” is the source object key. What follows are match specifications for which links to follow, in the form of bucket,tag,keep triples, where underscores match anything. The third parameter, “keep” says to return results from that step, meaning that you can retrieve results from any step you want, in any combination. I don’t know about you, but to me that feels more natural than this:
SELECT tracks.*
FROM tracks
INNER JOIN albums ON tracks.album_id = albums.id
INNER JOIN artists ON albums.artist_id = artists.id
WHERE artists.name = "The Beatles"
The caveat of links is that they are inherently unidirectional, but this can be overcome with little difficulty in your application. Without referential integrity constraints in your SQL database (which ActiveRecord has made painful in the past), you have no solid guarantee that your DELETE or UPDATE won’t cause a row to become orphaned, anyway. We’re kind of spoiled because ActiveRecord handles the linkage of associations automatically.
The place where the link-walking feature really shines is in self-referential and deep transitive relationships (think has_many :through writ large). Since you don’t have to create a virtual table via a JOIN and alias different versions of the same table, you can easily do things like social network graphs (friends-of-friends-of-friends), and data structures like trees and lists.
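To make the bucket,tag,keep triples from the quoted post a bit more concrete, here is a rough Python sketch that assembles a link-walking path from a list of steps. The function name link_walk_path and its parameters are made up for illustration and are not part of any Riak client. It also assumes that an underscore in the keep slot means "leave it to the server's default", and that the "/raw" prefix from the post applies to your setup.

```python
from urllib.parse import quote


def link_walk_path(bucket, key, steps, prefix="/raw"):
    """Build a link-walking path from (bucket, tag, keep) steps.

    Hypothetical helper, not a Riak client API.
    None for a step's bucket or tag becomes "_" (match anything).
    keep is True, False, or None (assumed to mean "server default").
    """
    def part(value):
        # Encode each piece on its own so the separating commas stay literal.
        return "_" if value is None else quote(str(value), safe="")

    segments = [prefix.rstrip("/"), quote(bucket, safe=""), quote(key, safe="")]
    for step_bucket, tag, keep in steps:
        keep_flag = "_" if keep is None else ("1" if keep else "0")
        segments.append(",".join([part(step_bucket), part(tag), keep_flag]))
    return "/".join(segments)


# Artist -> albums (any tag) -> tracks (any tag, keep the results)
path = link_walk_path(
    "artists",
    "TheBeatles",
    [("albums", None, None), ("tracks", None, True)],
)
print(path)  # /raw/artists/TheBeatles/albums,_,_/tracks,_,1
```

For the original question, the same shape would apply to something like link_walk_path("articles", "some-article-key", [("pictures", None, True)]), where the bucket and key names are placeholders.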
Links are not at all similar to JOINs (which involve a Cartesian product), but they can be used for similar purposes in some senses. They are very similar to links in an HTML document.
With link-walking you either start with a single key, or you create a map-reduce job that starts with multiple keys. (Link-walking/traversal is actually a special case of map-reduce.) Those values are fetched, their links filtered against your specification (bucket, tag) and then the matched links are passed along to the next phase (or back to the client). Of course, all of this is done in parallel (unlike a JOIN) with high data-locality.
Also, map-reduce isn't slow by itself; you just don't have a sophisticated query planner to do the hard work for you. You have to think about how you will query your data and organize it around that as necessary.
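The fetch, filter, and pass-along steps described above can be sketched as a toy simulation in Python. This is not how Riak implements it: STORE is a hypothetical in-memory dict standing in for the cluster, walk is an invented name, and everything runs sequentially in one process rather than in parallel across nodes. It only illustrates how each phase filters links by bucket and tag and hands the matches to the next phase.

```python
# Hypothetical in-memory stand-in for a Riak cluster:
# (bucket, key) -> {"value": ..., "links": [(bucket, key, tag), ...]}
STORE = {
    ("artists", "TheBeatles"): {
        "value": "The Beatles",
        "links": [("albums", "AbbeyRoad", "album")],
    },
    ("albums", "AbbeyRoad"): {
        "value": "Abbey Road",
        "links": [
            ("tracks", "ComeTogether", "track"),
            ("tracks", "Something", "track"),
        ],
    },
    ("tracks", "ComeTogether"): {"value": "Come Together", "links": []},
    ("tracks", "Something"): {"value": "Something", "links": []},
}


def walk(start, steps):
    """Follow links from the start keys through each (bucket, tag, keep) step.

    None for bucket or tag matches anything. Returns one list of values
    per step whose keep flag is true.
    """
    current = list(start)
    kept = []
    for bucket, tag, keep in steps:
        matched = []
        for ref in current:
            obj = STORE.get(ref)
            if obj is None:
                continue  # dangling link: nothing enforces referential integrity
            for l_bucket, l_key, l_tag in obj["links"]:
                if bucket in (None, l_bucket) and tag in (None, l_tag):
                    target = (l_bucket, l_key)
                    if target not in matched:
                        matched.append(target)
        if keep:
            kept.append([STORE[ref]["value"] for ref in matched if ref in STORE])
        current = matched  # matched links feed the next phase
    return kept


result = walk(
    [("artists", "TheBeatles")],
    [("albums", None, False), ("tracks", None, True)],
)
print(result)  # [['Come Together', 'Something']]
```

Note that the walk only ever moves forward along stored links. Going from a track back to its album would need a link stored on the track itself, which is the one-way caveat mentioned earlier in the thread.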