I have a database full of reviews of various products. My task is to perform various calculations and "create" another "database/XML export" with aggregated data. I am thinking of writing command-line programs in Python to do that. But I know someone has done this before, and there is probably some open source Python solution or similar that gives far more interesting "aggregated data" than I could possibly think of.

The problem is that I don't know much about this area beyond basic data manipulation from the command line, nor do I know what terms to use to even search for this. I am not really looking for scientific/visualization stuff (not that I mind if the tool provides it), just something simple to start with, so that I can gradually see and develop what I need.

My only requirement is that the "end aggregated data" either be in a database or be exported as an XML file; no proprietary stuff. It needs to be a bit more robust than my Python scripts, as I have to deal with "lots" of data across 4 machines.

Any hints on where I should start my research?

Thanks.

+1  A: 

What kind of analysis are you trying to do?

If you're analyzing text, take a look at the Natural Language Toolkit (NLTK).

If you want to index and search the data, take a look at the Whoosh search engine.

Please provide some more detail on what kind of analysis you're looking to do.

lost-theory
In general terms, I have a date/time, text (i.e. the review) and comments/replies to that review (so it's a bit like threaded comments). In some cases I have URLs in the reviews, plus other user-related data such as the user's points. I definitely want some sort of NLP to analyze the texts. To start with, I would also like extracted/calculated values such as the number of reviews in the "computers" category and how far apart in time the comments are. I hope that gives you a bit more information. I will have a look at the tools you mentioned above. Thanks.
wailer
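As a rough starting point, the two aggregates mentioned in the comment above (reviews per category and the time between comments) can be sketched with the Python standard library alone, and the result written out as XML. Everything specific here is an assumption made for illustration: the `reviews` sample rows, the field names `category` and `comments`, the timestamp format, and the XML element names. Real rows would come from the reviews database.

```python
from collections import Counter
from datetime import datetime
import xml.etree.ElementTree as ET

# Hypothetical sample rows; in practice these would be read from the database.
reviews = [
    {"id": 1, "category": "computers",
     "comments": ["2010-01-01 10:00", "2010-01-01 10:30", "2010-01-01 12:00"]},
    {"id": 2, "category": "computers", "comments": ["2010-01-02 09:00"]},
    {"id": 3, "category": "phones",
     "comments": ["2010-01-03 08:00", "2010-01-03 08:20"]},
]

def reviews_per_category(rows):
    """Count how many reviews fall into each category."""
    return Counter(row["category"] for row in rows)

def mean_comment_gap_minutes(rows):
    """Average time in minutes between consecutive comments on the same review."""
    gaps = []
    for row in rows:
        times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M") for t in row["comments"])
        gaps.extend((b - a).total_seconds() / 60 for a, b in zip(times, times[1:]))
    return sum(gaps) / len(gaps) if gaps else None

def to_xml(rows):
    """Serialize the aggregates as an XML string (element names are made up)."""
    root = ET.Element("aggregates")
    for category, count in sorted(reviews_per_category(rows).items()):
        ET.SubElement(root, "category", name=category, reviews=str(count))
    gap = mean_comment_gap_minutes(rows)
    if gap is not None:
        ET.SubElement(root, "mean_comment_gap_minutes").text = f"{gap:.1f}"
    return ET.tostring(root, encoding="unicode")

xml_export = to_xml(reviews)
print(xml_export)
```

Keeping each aggregate in its own small function makes it easy to add more of them later, or to swap the XML output for inserts into a second database.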
+1  A: 

Looks like you are looking for a data integration solution.
One suggestion is the open source Kettle project, part of the Pentaho suite.
For Python, a quick search yielded PyDI and SnapLogic.

Amro
Pentaho sounds interesting. I guess it costs a lot, though!
wailer
Absolutely not, there is an open source community edition (without support): http://community.pentaho.com/
Amro