I am building a recommendation system for my application and will probably use Apache Mahout. I have to collect a large dataset, and it will be collected over a period of time. Which is less expensive: collecting it in some sort of log file, or collecting it in a database and exporting it when I need it?

A:

Mahout's recommender code can read directly from a database or a file -- if the data is reasonably formatted. It won't read general log files; they need to be translated into simple CSV or TSV. But it can read just about any table that contains users/items/preferences.
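To illustrate the translation step, here is a minimal sketch that turns one log line into a `userID,itemID,preference` CSV row. The log format (`user=... item=... rating=...`), the class name `LogToCsv`, and the method `toCsvLine` are all made up for the example; adapt the pattern to whatever your application actually logs. It assumes numeric user and item IDs.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts user/item/preference from a made-up log format
// such as "2010-06-01T12:00:00Z user=42 item=1001 rating=4.5".
final class LogToCsv {
    // Assumed log layout; not something Mahout defines.
    private static final Pattern EVENT = Pattern.compile(
            "user=(\\d+)\\s+item=(\\d+)\\s+rating=(\\d+(?:\\.\\d+)?)");

    // Returns "userID,itemID,preference", or empty if the line is not a rating event.
    static Optional<String> toCsvLine(String logLine) {
        Matcher m = EVENT.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + "," + m.group(2) + "," + m.group(3));
    }

    public static void main(String[] args) {
        String line = "2010-06-01T12:00:00Z user=42 item=1001 rating=4.5";
        // Prints the CSV row if the line matched the assumed format.
        toCsvLine(line).ifPresent(System.out::println);
    }
}
```

Lines that do not match (page views, errors, and so on) are simply skipped, which is usually what you want when only the preference events matter.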

If you're already putting your data into a database table, I'd say leave it there and don't duplicate it or export it needlessly. You will probably want to have Mahout suck all that into memory, if possible.

If you're not already storing this data, and want to choose a simple and efficient representation, then I'd suggest you extract the user/item/preference information and store it in simple CSV files, compressed with gzip. These can be used easily with Mahout too and will be simpler and more compact than full log files or a database.
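As a rough sketch of the gzip-compressed CSV approach, the following uses only the JDK's `java.util.zip` classes to write preference rows to a `.csv.gz` file and read them back. The class name `GzipPrefs`, the file name, and the sample rows are illustrative assumptions, not anything Mahout requires.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for storing "userID,itemID,preference" rows as gzip-compressed CSV.
final class GzipPrefs {
    // Writes each row on its own line into a gzip-compressed file (overwrites the file).
    static void write(Path file, List<String> rows) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8))) {
            for (String row : rows) {
                out.write(row);
                out.write('\n');
            }
        }
    }

    // Reads all rows back, decompressing on the fly.
    static List<String> read(Path file) throws IOException {
        List<String> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("prefs", ".csv.gz"); // illustrative location
        write(file, Arrays.asList("42,1001,4.5", "42,1002,3.0", "7,1001,5.0"));
        for (String row : read(file)) {
            System.out.println(row);
        }
        Files.delete(file);
    }
}
```

Since the data arrives over time, one simple option is to write a new compressed file per day or per batch rather than rewriting a single large file.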

Sean Owen