My question is the following:

I'm about to start developing a Rails application that will access a lot of RSS feeds or crawl sites for data (mostly news). It will be something like Google News but with a different approach, so I'll store a lot of news (or news summaries), classify them into different categories, and use ranking and recommendation techniques.

  • So, should I go with MySQL?

  • Is it worthwhile to use IBM DB2 pureXML to store the documents? Also, Ruby search implementations (Ferret, Ultrasphinx and others) are not needed if I choose DB2. Is that right?

  • What are the advantages of PostgreSQL in this case?

  • Does it make sense to use CouchDB in this scenario?

I'd like to choose the best option without over-complicating the solution, so I discarded the idea of using two different storage solutions (one for the news documents and another for the rest of the data). I'm also considering only "free" options, so I didn't look at Oracle or MS SQL.

Thanks in advance.

A: 

pureXML is heavier than plain SQL, so you pay more for each round trip between the web server and the DB. If you plan to have lots of users, I'd avoid it. You're better off letting your web server cache the requests, so you avoid generating the XML (RSS) every time, if that is what you are thinking about.

I'd go with MySQL because it's really good at serving and it's totally free. PostgreSQL is too, but I haven't used it, so I can't say.

CouchDB could make sense, but not if you plan on doing OLAP (Offline Analysis) of your data; a normal RDBMS will be better at that.
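The caching idea above can be sketched roughly as follows. This is a minimal in-memory illustration, not a Rails API: `FeedCache`, its `ttl` and `clock` parameters, and the `"top-news"` key are all hypothetical names, and a real deployment would more likely rely on HTTP or page caching in front of the app.

```ruby
# Hypothetical in-memory cache for rendered feed XML (illustrative only).
class FeedCache
  Entry = Struct.new(:body, :expires_at)

  # ttl:   seconds an entry stays fresh
  # clock: callable returning the current time in seconds (injectable for tests)
  def initialize(ttl: 300, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  # Returns the cached body for key, or runs the block to build and store it.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    return entry.body if entry && entry.expires_at > now

    body = yield
    @entries[key] = Entry.new(body, now + @ttl)
    body
  end
end

renders = 0
cache = FeedCache.new(ttl: 60)

# Stand-in for the expensive step: querying the DB and building the RSS XML.
build = lambda do
  renders += 1
  "<rss><channel><title>Top news</title></channel></rss>"
end

first  = cache.fetch("top-news", &build)
second = cache.fetch("top-news", &build) # served from the cache, no rebuild
```

The second `fetch` never reaches the database or the XML builder, which is the saving being described here.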

Robert Gould
Last time I checked, OLAP is an acronym for Online Analytical Processing
Noah Goodrich
I got the acronym wrong for sure, but what I meant in practice is that you analyze your data offline (as in warehousing, not on live servers); I didn't mean that O.L.A.P. literally stands for Offline Analysis. But thanks for the comment and for explaining the acronym.
Robert Gould
But it doesn't mean that you have to analyze data offline. In fact, there are any number of applications that perform both OLTP and OLAP functions and thus need databases or at least tables within the same database that are optimized for both uses.
Noah Goodrich
Just learned something new, thanks!
Robert Gould
+1  A: 

MySQL is probably one of the best options out there: light, easy to install and maintain, multiplatform and free. On top of that, there are some good free client tools.

Something to think about: because of the nature of your system, you will probably have some tables that grow very large very quickly, so you might want to think about performance.

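MySQL supports table partitioning (horizontal partitioning, i.e. splitting a table by rows), but only from version 5.1; it does not offer built-in vertical partitioning. Keep that in mind.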
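As a rough sketch of what that could look like for a news table, the snippet below builds MySQL `PARTITION BY RANGE` DDL split by publication year. The `articles` table, its columns, the chosen years, and the `partition_clause` helper are all assumptions for illustration; in a Rails app the resulting string could be passed to `execute` inside a migration.

```ruby
# Hypothetical helper: builds a MySQL RANGE partition clause, one partition per year,
# plus a catch-all partition for anything newer.
def partition_clause(column, years)
  parts = years.map { |y| "  PARTITION p#{y} VALUES LESS THAN (#{y + 1})" }
  parts << "  PARTITION pmax VALUES LESS THAN MAXVALUE"
  "PARTITION BY RANGE (YEAR(#{column})) (\n#{parts.join(",\n")}\n)"
end

# Table and column names are placeholders. MySQL requires the partitioning
# column to be part of every unique key, hence the composite primary key.
ddl = <<~SQL
  CREATE TABLE articles (
    id BIGINT NOT NULL AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    published_on DATE NOT NULL,
    PRIMARY KEY (id, published_on)
  )
  #{partition_clause("published_on", [2008, 2009])};
SQL

puts ddl
```

Queries that filter on `published_on` can then skip partitions that cannot contain matching rows, which is where the benefit for fast-growing tables would come from.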

Cheers,

Jacobo.

Jacobo
+2  A: 

Admitting up front that I generally don't like MySQL, I will say that there has been some writing on this topic in favor of Postgres:

http://oldmoe.blogspot.com/2008/08/101-reasons-why-postgresql-is-better.html

This is always my choice when I need a pure relational database. I don't know whether a document database would be more appropriate for your application without knowing more about it. It does sound like it's something you should at least investigate.

Dustin
A: 

It sounds to me like the application you will build can easily become a large-scale web app. I would suggest PostgreSQL, since it is known for its reliability.

You can check out the following link, where Bob Ippolito from MochiMedia explains why they ditched MySQL for PostgreSQL. Although the posts are more than 3 years old, the recent issues with MySQL 5.1 suggest that they are still relevant.

http://bob.pythonmac.org/archives/category/sql/mysql/

Cygwin98
A: 

MySQL is good in production. I haven't used PostgreSQL with Rails, but it's a good solution as well.

In the dev and test environments I'd start out with SQLite (the Rails default), and perhaps migrate the test environment to your target DB as you move closer to completion.
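To make that concrete, here is a small sketch of the per-environment layout that `config/database.yml` uses, parsed with Ruby's standard YAML library so the structure is easy to inspect. The database names, username, and host are placeholders, and the production adapter name is an assumption that depends on the Rails version and the installed gem (`mysql2` in current Rails).

```ruby
require "yaml"

# Hypothetical database.yml contents; all values below are placeholders.
database_yml = <<~YAML
  development:
    adapter: sqlite3
    database: db/development.sqlite3
  test:
    adapter: sqlite3
    database: db/test.sqlite3
  production:
    adapter: mysql2
    database: news_production
    username: news
    host: localhost
YAML

config = YAML.safe_load(database_yml)

# Map each environment to the adapter it would use.
adapters = config.transform_values { |settings| settings["adapter"] }
adapters.each { |env, adapter| puts "#{env}: #{adapter}" }
```

Switching the test environment over to the target DB later would then only mean changing the `test` entry, not the application code.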

Jamal Hansen