Top
k
problem - searching BEST k
(3 or 1000) elements in DB
There is fundamental problem with relational DB, that to find top k
elems, there is a need to process ALL rows in table. Which make it useless on big data.
I'm making application (for university research, not really my invention, I'm implementing and trying to improve original idea) that allows you to effectively find top k
elements by visiting only 3-5% of stored data. Which make it really fast.
There are even user preferences, so on some domain, you can specify function that specify best value for user and aggregation function that specify most significant attributes.
For example DB of cars: attributes:(price, mileage, age of car, ccm, fuel/mile, type of car...) and user values for example 10*price + 5*fuel/mile + 4*mileage + age of car, (s)he doesn't care about type of car and other. - this is aggregation specification
Then for each attribute (price, mileage, ...), there can be totally different "value-function" that specifies best value for user. So for example (price: lower, the better, then value go down, up to $50k, where value is 0 (user don't want car more expensive than 50k). Mileage: other function based on his/hers criteria, ans so on...
You can see that there is quite freedom to specify your preferences and acording to it, best k
elements in DB will be found quickly.
I've spent many sleepless night thinking about real-life usability. Who can benefit from that query db? But I failed to whomp up anything and sticking to only academic write-only stance. :-( I hope there can be some real usage for that, but I don't see any....
.... do YOU have any idea how to use that in real-life, real problem, etc...
I'd love to hear from You.