
I have an Oracle database (roughly 1.2 billion records) with a web application sitting on top of it that generates queries (it generates SQL code and returns counts). Basically, you build SQL queries graphically through an AJAX UI... and it performs pretty well.

This is roughly a 400 GB database. I've been looking at Hadoop and thinking about using it instead of Oracle (having my app generate Hive query code), but it seems to me like it's overkill... isn't Hadoop targeted more towards tens of terabytes to petabyte scale datasets? Is it suitable in place of a relational database (like Oracle) for the task I'm doing?
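To make "generates SQL code and returns counts" concrete, here is a minimal sketch of such a query generator, assuming the UI sends a list of (column, operator, value) filters. The function `build_count_query`, the table `records`, its columns, and the filter format are all hypothetical, and `sqlite3` stands in for Oracle only so the example runs anywhere. Column names and operators are whitelisted because identifiers cannot be passed as bind parameters; values are bound with placeholders. A plain `SELECT COUNT(*) ... WHERE ...` is also valid HiveQL, though how parameters are bound differs by driver.

```python
import sqlite3
from typing import Any, Sequence

# Hypothetical whitelists: identifiers and operators cannot be bound as
# parameters, so only known-safe ones are allowed into the SQL text.
ALLOWED_COLUMNS = {"state", "year", "category"}
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def build_count_query(
    table: str, filters: Sequence[tuple[str, str, Any]]
) -> tuple[str, list[Any]]:
    """Turn UI filter triples into a parameterized COUNT(*) query.

    `table` is assumed to be a trusted constant chosen by the application.
    """
    clauses: list[str] = []
    params: list[Any] = []
    for column, op, value in filters:
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported filter: {column} {op}")
        clauses.append(f"{column} {op} ?")  # value is bound, never inlined
        params.append(value)
    sql = f"SELECT COUNT(*) FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (state TEXT, year INTEGER, category TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [("CA", 2009, "a"), ("CA", 2010, "b"), ("NY", 2010, "a")],
    )
    sql, params = build_count_query("records", [("state", "=", "CA"), ("year", ">=", 2010)])
    print(sql, params)
    print(conn.execute(sql, params).fetchone()[0])  # 1
```

Keeping generation separate from execution is what would let the same UI target a different backend later; only the connection and parameter-binding style would need to change.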

A: 

isn't Hadoop targeted more towards tens of terabytes to petabyte scale datasets?

Maybe. But it's suitable for a wide variety of problems. It's also suitable for very small datasets where Hadoop's "functional" style of programming helps.
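The "functional" style here is the MapReduce model: a map function turns each input record into key/value pairs, the framework groups the pairs by key, and a reduce function folds each group into a result. The sketch below is a single-process Python illustration of that shape, counting records per category. It is not Hadoop's API (real Hadoop jobs are typically written in Java or run through Hadoop Streaming), and the `id,state,category` line format and function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (category, 1) for one CSV line of the form 'id,state,category'."""
    _id, _state, category = line.strip().split(",")
    yield category, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Fold all values for one key into a single count."""
    return key, reduce(lambda acc, v: acc + v, values, 0)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Chain map -> shuffle -> reduce over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job(["1,CA,a", "2,CA,b", "3,NY,a"]))  # {'a': 2, 'b': 1}
```

Because each map call sees one record in isolation and each reduce call sees one key, a framework can spread the work across many machines. On a single database server, the same result is just a `GROUP BY category` query.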

SQL is not the perfect query language. It's just widely-adopted.

Is it suitable in place of a relational database (like Oracle) for the task I'm doing?

Without more details about your requirements, it's almost impossible to tell. However, if you're doing transactional work with lots of inserts, updates and deletes, then an SQL RDBMS is probably necessary.

If you're not doing complex transactions, and you're mostly doing bulk loads and bulk queries, then the database is getting in your way. The file system will be faster, and often simpler.

S.Lott
A:

Basically, if something isn't broken, don't try to fix it. From what I've read on Wikipedia, Hadoop definitely is overkill here; beyond that, you're saying the application "runs pretty nice performance-wise."

armonge
A:

It's hard to say without more details. However, in my experience, if all your data is already in a SQL database, then your SQL engine probably has more optimizations than a simple MapReduce job has.

Without knowing exactly what you want to crunch and the state of the data, and unless you are hitting some major edge case in your environment, you would probably have more trouble setting up and using Hadoop in your case, and it probably wouldn't end up being any faster.

If all your data is in Oracle, it's probably all parsed, indexed, and hopefully somewhat regular. If the crunching happens entirely within that domain (and you are not trying to work with something uncommon like massive BLOBs or other weird situations), most of the time it's better to let your database engine handle it.
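One way to see the "parsed and indexed" advantage is to ask the database how it plans a count query. Oracle has its own `EXPLAIN PLAN` facility; the sketch below instead uses Python's built-in `sqlite3`, purely as a stand-in that runs anywhere, with a hypothetical `records` table and index name. Once the filtered column is indexed, the planner can answer the count by searching the index rather than scanning every row, which is roughly what a simple MapReduce job over raw files has to do on every run.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


def main() -> tuple[str, str]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, state TEXT, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO records (state, category) VALUES (?, ?)",
        [("CA" if i % 3 else "NY", "a" if i % 2 else "b") for i in range(10_000)],
    )
    query = "SELECT COUNT(*) FROM records WHERE state = 'CA'"
    before = plan(conn, query)  # no index on state yet: full table scan
    conn.execute("CREATE INDEX idx_records_state ON records (state)")
    after = plan(conn, query)  # planner can now search the index instead
    return before, after


if __name__ == "__main__":
    before, after = main()
    print("without index:", before)
    print("with index:   ", after)
```

The exact plan wording varies between SQLite versions, but the first plan reports a scan and the second a search using `idx_records_state`.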

Moral of the story:

Hadoop is really awesome but it's not magic and doesn't make regular old SQL faster!

Zac Bowling