+1  A: 

Just to get to 500 million, a triple store has to do all of that and more. I have spent several years working on a triple store implementation, and I can tell you that breaking 1 billion triples is not as simple as it may seem.

The problem is that many RDF queries are second- or third-order (and higher orders are far from unheard of). This means that you are querying not only a set of entities, but simultaneously the data about that set of entities, the data about the entities' schemas, and the data describing the schema language used to describe those schemas.

All of this happens without any of the constraints that allow a relational database to make assumptions about the shape of this data/metadata/metametadata/etc.
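To make the "higher-order" point concrete, here is a minimal sketch in Python over a toy in-memory triple set. The triples, the `objects` helper, and the `describe` function are all made up for illustration; a real store would answer this through its indexes and a query language rather than a linear scan.

```python
# Toy in-memory triple set; every name here is hypothetical.
TRIPLES = {
    ("alice", "rdf:type", "Person"),
    ("bob", "rdf:type", "Person"),
    ("Person", "rdfs:subClassOf", "Agent"),
    ("Person", "rdf:type", "rdfs:Class"),
    ("Agent", "rdf:type", "rdfs:Class"),
    ("rdfs:Class", "rdf:type", "rdfs:Class"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is stored."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def describe(entity):
    """Join the entity to its classes, their superclasses, and the superclasses' types."""
    rows = set()
    for cls in objects(entity, "rdf:type"):          # data about the entity
        for sup in objects(cls, "rdfs:subClassOf"):  # data about the entity's schema
            for meta in objects(sup, "rdf:type"):    # data about the schema language
                rows.add((entity, cls, sup, meta))
    return rows


print(describe("alice"))
```

Note that the data, the schema, and the schema language all live in the same set of triples, so nothing tells the store in advance which of the three levels a given lookup will land in.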

There are ways to get beyond 500 million, but they are far from trivial, and the low-hanging fruit (i.e., the approaches you have mentioned) was required just to get to where we are now.

That being said, the flexibility provided by an RDF store, combined with the denotational semantics available via its interpretation in Description Logics, makes it all worthwhile.

Recurse
Hi Recurse, I have to admit I don't quite understand why the method I suggested might work for 500 million triples but then make it really hard to break 1 billion. The way I see it, 500 million takes 29 lookups and 1 billion takes 30. Have I got it all wrong? Please note that I don't expect a full answer, as it's obviously not a trivial question, but if you know of any research papers etc. that deal with it and can point me in their direction, that would be much appreciated.
Ankur
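For reference, the 29 and 30 in the comment above can be checked with a short sketch, assuming the index behaves like a balanced binary search over the triples (the function name `binary_lookup_depth` is hypothetical):

```python
def binary_lookup_depth(n_triples):
    """Comparisons needed to binary-search n_triples sorted entries, i.e. ceil(log2(n))."""
    if n_triples < 1:
        raise ValueError("n_triples must be positive")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) without floating-point error.
    return (n_triples - 1).bit_length()


print(binary_lookup_depth(500_000_000))    # 29
print(binary_lookup_depth(1_000_000_000))  # 30
```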
That's because you are thinking in terms of lookups; lookups are largely irrelevant to the performance of the store at the scalability limit. What kills you is seeking to disk, and that becomes critical when you start routinely seeing cache misses while traversing the index. Moving from 29 to 30 index levels seems trivial until you consider that it can mean moving from 1 to 2, or from 2 to 3, seeks. Combine this with the deep joins common in RDF queries, and continuing to scale, while not insurmountable, becomes far from easy.
Recurse
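The seek argument can be illustrated with a rough back-of-the-envelope model. This is only a sketch under loud assumptions: a fixed number of the top index levels stays in memory, every level below that costs one disk seek, and a seek takes 10 ms. None of these figures comes from the thread, and the function names are hypothetical.

```python
def seeks_per_lookup(index_levels, cached_levels):
    """Index levels that miss the cache, each assumed to cost one disk seek."""
    return max(0, index_levels - cached_levels)


def query_seek_time_ms(index_levels, cached_levels, lookups_per_query, seek_ms=10.0):
    """Total seek time for a query that performs lookups_per_query index lookups."""
    return seeks_per_lookup(index_levels, cached_levels) * lookups_per_query * seek_ms


# Assumed: 28 levels fit in memory and a deep join performs 1000 index lookups.
for levels in (29, 30):
    print(levels, seeks_per_lookup(levels, 28), query_seek_time_ms(levels, 28, 1000))
```

Under these assumptions, one extra index level doubles the seek time of the query (from 10 to 20 seconds), even though the lookup count only went from 29 to 30.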