Is there any way to select a subset from a large set based on a property or predicate in less than O(n) time?
For a simple example, say I have a large set of authors. Each author has a one-to-many relationship with a set of books, and a one-to-one relationship with a city of birth.
Is there a way to efficiently do a query like "get all books by authors who were born in Chicago"? The only way I can think of is to first select all authors from the city (fast with a good index), then iterate through them and accumulate all their books (O(n) where n is the number of authors from Chicago).
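To make the two-step approach concrete, here is a minimal sketch using in-memory hash indexes. The author names, book titles, and the helper `books_by_birth_city` are all made up for illustration; a real store would keep these indexes on disk.

```python
from collections import defaultdict

# Hypothetical sample data: (author, birth city) and (author, book title).
born_in = [("Ann", "Chicago"), ("Bob", "Boston"), ("Cy", "Chicago")]
wrote = [("Ann", "Book A1"), ("Ann", "Book A2"), ("Bob", "Book B1"), ("Cy", "Book C1")]

# Index 1: city -> authors born there.
authors_by_city = defaultdict(set)
for author, city in born_in:
    authors_by_city[city].add(author)

# Index 2: author -> books written.
books_by_author = defaultdict(set)
for author, book in wrote:
    books_by_author[author].add(book)

def books_by_birth_city(city):
    """Step 1: one index lookup for the authors. Step 2: one lookup per author."""
    books = set()
    for author in authors_by_city.get(city, ()):
        books |= books_by_author.get(author, set())
    return books

chicago_books = books_by_birth_city("Chicago")
```

The loop runs once per matching author, which is the O(n) step I am asking about.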
I know databases do something like this in certain joins, and Endeca claims to be able to do this "fast" using what they call "Record Relationship Navigation", but I haven't been able to find anything about the actual algorithms used or even their computational complexity.
I'm not particularly concerned with the exact data structure... I'd be jazzed to learn about how to do this in an RDBMS, or a key/value repository, or just about anything.
Also, what about third or fourth degree requests of this nature? (Get me all the books written by authors living in cities with immigrant populations greater than 10,000...) Is there a generalized n-degree algorithm, and what are its performance characteristics?
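The only generalization I can come up with myself is to chain the lookups, pushing a "frontier" of keys through one index per hop. The sketch below assumes made-up cities, authors, and books, and a hypothetical helper `follow`; it is the naive approach, not a known-good algorithm.

```python
def follow(frontier, *hops):
    """Push a set of keys through a chain of one-to-many maps, one hop at a time."""
    for hop in hops:
        next_frontier = set()
        for key in frontier:
            next_frontier |= hop.get(key, set())
        frontier = next_frontier
    return frontier

# Hypothetical data for the three-hop example.
immigrants_by_city = {"Alpha": 25_000, "Beta": 4_000, "Gamma": 12_000}
authors_by_city = {"Alpha": {"Ann"}, "Beta": {"Bob"}, "Gamma": {"Cy", "Dee"}}
books_by_author = {"Ann": {"A1"}, "Bob": {"B1"}, "Cy": {"C1", "C2"}, "Dee": set()}

# The population filter is a plain scan here; a sorted index could narrow it.
big_cities = {city for city, n in immigrants_by_city.items() if n > 10_000}
books = follow(big_cities, authors_by_city, books_by_author)
```

With this scheme the work at each hop is proportional to the size of the intermediate set, so the total cost is roughly the sum of the frontier sizes across all hops.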
Edit:
I am probably just really dense, but I don't see how the inverted index suggestion helps. For example, say I had the following data:
DATA
1. Milton England
2. Shakespeare England
3. Twain USA
4. Milton Paradise Lost
5. Shakespeare Hamlet
6. Shakespeare Othello
7. Twain Tom Sawyer
8. Twain Huck Finn
INDEX
"Milton" (1, 4)
"Shakespeare" (2, 5, 6)
"Twain" (3, 7, 8)
"Paradise Lost" (4)
"Hamlet" (5)
"Othello" (6)
"Tom Sawyer" (7)
"Huck Finn" (8)
"England" (1, 2)
"USA" (3)
Say I did my query on "books by authors from England". Very quickly, in O(1) time via a hashtable, I could get my list of authors from England: (1, 2). But then, for the next step, in order to retrieve the books, I'd have to do ANOTHER O(1) lookup for EACH member of the set {1, 2}: 1 -> {4}, 2 -> {5, 6}, and then take the union of the results: {4, 5, 6}.
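Here is that walk-through as a small sketch over the records listed above. The dictionary layout and the function `books_by_authors_from` are my own illustration, and the function assumes the term passed in is a country, so every record it matches is an author record.

```python
from collections import defaultdict

# Records 1-3 are (author, country); records 4-8 are (author, book).
records = {
    1: ("Milton", "England"),
    2: ("Shakespeare", "England"),
    3: ("Twain", "USA"),
    4: ("Milton", "Paradise Lost"),
    5: ("Shakespeare", "Hamlet"),
    6: ("Shakespeare", "Othello"),
    7: ("Twain", "Tom Sawyer"),
    8: ("Twain", "Huck Finn"),
}

# Inverted index: term -> ids of the records containing it.
index = defaultdict(set)
for rec_id, fields in records.items():
    for term in fields:
        index[term].add(rec_id)

def books_by_authors_from(country):
    """One lookup for the country, then one more lookup per matching author."""
    book_ids = set()
    for author_id in index.get(country, set()):
        name = records[author_id][0]
        # Every record naming this author, minus the author record itself.
        book_ids |= index[name] - {author_id}
    return book_ids

england_book_ids = books_by_authors_from("England")
```

Each individual lookup is O(1), but the number of lookups still grows with the number of matching authors.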
Or am I missing something? Perhaps you meant I should explicitly store an index entry linking Book to Country. That works for very small data sets. But for a large data set, the number of indexes required to match any possible combination of queries would make the index grow exponentially.
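For comparison, this is what I mean by an explicit Book-to-Country index: a minimal sketch that precomputes the join once, using the same sample data. The dictionary names are hypothetical.

```python
from collections import defaultdict

author_country = {"Milton": "England", "Shakespeare": "England", "Twain": "USA"}
author_books = {
    "Milton": ["Paradise Lost"],
    "Shakespeare": ["Hamlet", "Othello"],
    "Twain": ["Tom Sawyer", "Huck Finn"],
}

# Precompute country -> books, so the query needs no per-author step.
books_by_country = defaultdict(set)
for author, books in author_books.items():
    books_by_country[author_country[author]].update(books)

england_books = books_by_country["England"]
```

The query becomes a single lookup, but this index only serves this one path (country to books). Every other combination of relationships would need its own precomputed index, which is the growth I'm worried about.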