views:

19

answers:

1

Consumer profiles with analytical scores [ConsumerID, 1..n demographic variables, 1..n analytical scores, e.g. "likely to churn", "likely to buy an item worth more than $100", etc.] must be fast to query if they are to be used for customizing web sites, consumer communications and so on.

Well. If you have:

  1. Large number of consumers
  2. Large profiles with a huge set of variables (as profiles describing human behaviour are likely to be)

...you are in trouble. If the query really goes to a physical relational database, and a physical disk has to start spinning somewhere to return an individual profile or a set of profiles, the profile user (a web site customizing a page, a recommendation engine making a recommendation, etc.) has died of boredom before getting any observable results.

One possibility is to keep the profiles in memory, which would of course improve performance hugely. What are the most proven solutions for fast-response, scalable consumer profile storage? Is there a shootout of these somewhere?
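To make the in-memory idea concrete, here is a minimal sketch of what I mean, assuming a single process and a plain dictionary keyed by ConsumerID. The names `ConsumerProfile` and `InMemoryProfileStore` are made up for illustration and do not refer to any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConsumerProfile:
    # Hypothetical layout: one ID, n demographic variables, n analytical scores.
    consumer_id: int
    demographics: Dict[str, str] = field(default_factory=dict)
    scores: Dict[str, float] = field(default_factory=dict)


class InMemoryProfileStore:
    """Keeps every profile in a dict, so a lookup never touches disk."""

    def __init__(self) -> None:
        self._profiles: Dict[int, ConsumerProfile] = {}

    def put(self, profile: ConsumerProfile) -> None:
        self._profiles[profile.consumer_id] = profile

    def get(self, consumer_id: int) -> Optional[ConsumerProfile]:
        # Returns None when the consumer is unknown.
        return self._profiles.get(consumer_id)


store = InMemoryProfileStore()
store.put(ConsumerProfile(42, {"region": "north"}, {"likely_to_churn": 0.83}))
profile = store.get(42)
```

This obviously only works while all profiles fit in the memory of one machine, which is exactly the scalability part of the question.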

A: 

Folks associated with Apache Mahout are working on this problem at large scale, commonly using HDFS for scaled storage.

bmargulies
Thanks for the comment. However, does "working on this problem" == *proven solution* for fast-response, scalable consumer profile storage? Just asking.
Hubbard
Some of the Mahout contributors are responsible for giant, deployed solutions.
bmargulies
A meager attempt to improve the odds of hitting the queried profile(s) is of course some type of "reel": you constantly take a random sample off the disk, hold that part in memory, and send the query to the DB only for the profiles you don't have in memory. But this seems to me a workaround, not a solution.
Hubbard
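For what it's worth, the "hold a part in memory, go to the DB only for the rest" idea can be sketched as a read-through cache. Note that this sketch swaps the random sampling for least-recently-used eviction, which is a different policy from the "reel" described above. `ReadThroughCache` and `fake_db_lookup` are hypothetical names; the latter only stands in for the slow database query.

```python
from collections import OrderedDict
from typing import Callable, Dict


class ReadThroughCache:
    """Serves profiles from memory and falls back to the DB on a miss."""

    def __init__(self, load_from_db: Callable[[int], Dict], capacity: int) -> None:
        self._load = load_from_db
        self._capacity = capacity
        self._items: "OrderedDict[int, Dict]" = OrderedDict()
        self.db_hits = 0  # counts how often the slow path was taken

    def get(self, consumer_id: int) -> Dict:
        if consumer_id in self._items:
            # Cache hit: mark the profile as most recently used.
            self._items.move_to_end(consumer_id)
            return self._items[consumer_id]
        # Cache miss: query the database and remember the result.
        self.db_hits += 1
        profile = self._load(consumer_id)
        self._items[consumer_id] = profile
        if len(self._items) > self._capacity:
            # Evict the least recently used profile.
            self._items.popitem(last=False)
        return profile


def fake_db_lookup(consumer_id: int) -> Dict:
    # Placeholder for the real (slow) relational query.
    return {"consumer_id": consumer_id, "likely_to_churn": 0.5}


cache = ReadThroughCache(fake_db_lookup, capacity=2)
cache.get(1)  # miss, loaded from the DB
cache.get(1)  # hit, served from memory
cache.get(2)  # miss
cache.get(3)  # miss, evicts profile 1
```

The hit rate depends entirely on how skewed the access pattern is, so this is still a workaround in the sense described above: profiles that are not in memory pay the full DB cost.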