Hi guys,

I am starting a project on opinion mining (data mining -> web mining -> opinion mining) to determine the semantic orientation of the words contained in web pages. We will use a crawler to fetch the pages that contain opinions. My question is: what type of database (object-oriented, relational, hierarchical, etc.) is best suited to this type of project? I know this is a specific question and I'm not expecting everybody to respond, but it would help to hear from someone who has already done this.

Regards!

A: 

If you need something large-scale and responsive, you would probably need to go for Google's BigTable or something of that nature. At the prototype level, I am sure you can use a traditional relational database, but at a certain point you'd hit a performance wall. See Brewer's CAP theorem.

eed3si9n
Yes, if you are looking at such huge systems and that much data to analyze, then you are certainly trying to do something that relational (row-based) databases are not good at. In fact, Facebook also has a column-oriented database called Cassandra (http://incubator.apache.org/cassandra/) for this kind of scenario, and unlike Google's BigTable it is open source.
Aayush Puri
I doubt such a system will have requirements hard enough to warrant a NoSQL approach.
Vinko Vrsalovic
A: 

In my experience, a relational database can serve your purpose pretty well in this kind of scenario. You need to be extra careful with the web content itself: decide whether you want to store it in a database at all, or whether something as simple as the file system will do. BLOBs especially require extra care, and they increase your maintenance work.
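To illustrate that split, here is a minimal sketch, assuming Python with its built-in `sqlite3` module stands in for whatever relational database you pick. The raw page bytes go to the file system, and the database keeps only the path, a checksum, and the per-word orientation scores. The table names, column names, and helper functions (`open_store`, `save_page`, `save_scores`, `load_page`) are all made up for the example and are not from any particular library.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_store(db_path):
    """Open the database and create the (hypothetical) tables if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               url TEXT PRIMARY KEY,
               content_path TEXT NOT NULL,  -- where the raw page lives on disk
               sha256 TEXT NOT NULL         -- checksum of the raw page
           )"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS word_scores (
               url TEXT NOT NULL REFERENCES pages(url),
               word TEXT NOT NULL,
               orientation REAL NOT NULL,   -- e.g. negative .. positive
               PRIMARY KEY (url, word)
           )"""
    )
    return conn


def save_page(conn, content_dir, url, html):
    """Write the raw bytes to a file and record only metadata in the DB."""
    digest = hashlib.sha256(html).hexdigest()
    path = Path(content_dir) / f"{digest}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(html)
    conn.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
        (url, str(path), digest),
    )
    conn.commit()
    return path


def save_scores(conn, url, scores):
    """Store a {word: orientation} mapping for one page."""
    conn.executemany(
        "INSERT OR REPLACE INTO word_scores VALUES (?, ?, ?)",
        [(url, word, score) for word, score in scores.items()],
    )
    conn.commit()


def load_page(conn, url):
    """Return the raw bytes for a crawled URL, or None if it is unknown."""
    row = conn.execute(
        "SELECT content_path FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return Path(row[0]).read_bytes() if row else None
```

Naming the files by content hash means identical pages crawled under different URLs share one file, and the database stays small because it never holds a BLOB.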

Also, given the nature of the project, you will certainly be using a lot of existing components, many of which already support a relational DB as a data store or are easy to extend to use one.

Aayush Puri