I've been using MySQL (with InnoDB, on Amazon RDS) because it's sort of the universal default, but it's been ridiculously under-performing, and tweaking it only delays the inevitable.

The data is mostly relatively short blobs (<1 kB each) of information about 100Ms of URLs. There is (or should be; MySQL can't seem to handle it) a very high volume of inserts / updates / retrievals but few complex queries - not that complex queries wouldn't be useful, but MySQL is so slow that it's far faster to get the data out, process it locally, and cache the results somewhere.

I can keep tweaking mysql and throwing more hardware at it, but it seems increasingly futile.

So what are the options? SQL / the relational model / etc. are optional - anything will do as long as it's fast, networked, and language-independent.
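For concreteness, the access pattern is roughly this kind of keyed upsert/lookup - a minimal sketch, assuming a single url → blob table; the table, column, and connection names below are made up:

```python
# Rough sketch of the current access pattern (hypothetical table/column names).
import pymysql

conn = pymysql.connect(host="my-rds-host", user="app", password="...",
                       database="crawl", autocommit=True)

def put(url, blob):
    # One short blob per URL; inserts and updates dominate the workload.
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO url_data (url, data) VALUES (%s, %s) "
            "ON DUPLICATE KEY UPDATE data = VALUES(data)",
            (url, blob))

def get(url):
    with conn.cursor() as cur:
        cur.execute("SELECT data FROM url_data WHERE url = %s", (url,))
        row = cur.fetchone()
        return row[0] if row else None
```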

+1  A: 

Well "Fast, networked and language-independent" + "few complex queries" brings to mind the various NoSQL solutions. To name a few:

And if those aren't fast enough, there is always the wickedly fast Redis, which is my personal favorite at the moment. :) It is not a full database per se, but it's good enough for most scenarios.
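For example, storing and fetching per-URL blobs in Redis is only a couple of calls. A minimal sketch using redis-py - the host and key layout are assumptions:

```python
# Minimal Redis sketch using redis-py; host and key layout are assumptions.
import redis

r = redis.Redis(host="localhost", port=6379)

def put(url, blob):
    # One key per URL; SET overwrites, so it doubles as an update.
    r.set("url:" + url, blob)

def get(url):
    return r.get("url:" + url)

# Batching writes through a pipeline cuts round-trips for bulk loads.
with r.pipeline() as pipe:
    for url, blob in [("http://example.com/a", b"..."), ("http://example.com/b", b"...")]:
        pipe.set("url:" + url, blob)
    pipe.execute()
```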

I am sure other people can list more NoSQL databases...
and there is always http://nosql-database.org/ .

Generally speaking, databases in this category are better and faster for your scenario because they have relaxed constraints, which makes frequent inserts/updates/retrievals easier and faster. But that requires you to think harder about your data model, and it is generally not possible to run SQL-style complex queries directly -- instead you pre-compute more data or use a more denormalized design to make up for the lack of complex queries (see the sketch below).

But since complex queries are a minor concern in your case, I think NoSQL solutions are ideal for you.
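To illustrate what "pre-computed data instead of complex queries" can look like, here is a hedged sketch that maintains a per-domain index by hand while writing each URL blob (redis-py again; the key layout is invented):

```python
# Sketch of a hand-maintained, denormalized index in Redis.
# Instead of "SELECT ... WHERE domain = ?", keep a set of URLs per domain
# and update it on every write. Key names are assumptions.
from urllib.parse import urlparse
import redis

r = redis.Redis(host="localhost", port=6379)

def put(url, blob):
    domain = urlparse(url).netloc
    with r.pipeline() as pipe:
        pipe.set("url:" + url, blob)          # the blob itself
        pipe.sadd("domain:" + domain, url)    # pre-computed "query result"
        pipe.execute()

def urls_for_domain(domain):
    # What would have been a complex query is now a single set lookup.
    return r.smembers("domain:" + domain)
```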

chakrit
Which NoSQL solutions handle 100GBs of data well?
taw
Well, you can try Apache Cassandra... From the Cassandra website: "Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, ..."
chakrit
@taw Regarding your comment on `ALTER TABLE`, I want to add that some of these solutions, such as CouchDB, are schema-less... which means that one document can look different from another -- there is no enforced schema -- which may help in your case (see the tiny sketch below).
chakrit
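A tiny sketch of the schema-less point, assuming a local CouchDB reached over its plain HTTP API with requests (the database name and fields are made up):

```python
# Two documents in the same CouchDB database with completely different fields;
# nothing like ALTER TABLE is ever needed. Database name and fields are invented.
import requests

BASE = "http://localhost:5984"

requests.put(f"{BASE}/urls")  # create the database (returns 412 if it already exists)

requests.put(f"{BASE}/urls/doc-1",
             json={"url": "http://example.com/a", "status": 200})
requests.put(f"{BASE}/urls/doc-2",
             json={"url": "http://example.com/b", "title": "Example", "tags": ["news"]})
```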
NoSQL databases might be more suitable for you, but you will still have issues if you do not make an effort to understand the performance profile of your overall application. These databases bring a different set of issues that you need to be aware of. They are far from a slot-in solution and need just as much DBA support (if not more).
BrianLy
The OP does care about his choice of database and is working with a very large dataset. That alone should tell you that the OP can and will try to understand the performance profile of the application before he puts it to use. **NOTHING** is a slot-in; there is no need to state such general assumptions.
chakrit
+1  A: 

Have you done any sort of end-to-end profiling of your application and MySQL database? To provide better advice, it would also help to understand what improvements you have already tried and what your database structure looks like. You haven't given much information on how your MySQL database is configured either, and it offers a lot of options for tuning.

You should pick up a copy of *High Performance MySQL*, if you haven't already, to learn more about the product.

There is no point in doing anything until you know what your problem is. NoSQL solutions can offer performance benefits but you have provided little evidence that MySQL is incapable of servicing your needs.
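For instance, turning on the slow query log and running `EXPLAIN` on a representative query is a cheap first step. A sketch with pymysql - connection details and the query are placeholders:

```python
# Quick-and-dirty profiling sketch: enable the slow query log and EXPLAIN one
# representative query. Connection details and the query are placeholders.
import pymysql

conn = pymysql.connect(host="my-rds-host", user="admin", password="...",
                       database="crawl")

with conn.cursor() as cur:
    # Log anything slower than one second (needs sufficient privileges;
    # on RDS this is normally set via the parameter group instead).
    cur.execute("SET GLOBAL slow_query_log = 'ON'")
    cur.execute("SET GLOBAL long_query_time = 1")

    # See whether the hot query actually uses an index.
    cur.execute("EXPLAIN SELECT data FROM url_data WHERE url = %s",
                ("http://example.com/a",))
    for row in cur.fetchall():
        print(row)
```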

BrianLy
While I'd love to do any kind of end-to-end profiling, a simple `ALTER TABLE ... ADD KEY` on a 100 GB+ table takes literally days, so I'm extremely limited in what kind of testing I can do. I see no hope of it ever scaling to multi-TB sizes.
taw
You need to do some more reading. This will help you profile MySQL: http://serverfault.com/questions/3120/how-do-i-profile-mysql
BrianLy
+1  A: 

From the information you've given about your application's data and workload, it is almost impossible to determine whether the problem really is MySQL itself or something else. You seem to assume that you can throw any workload at a relational engine and it should handle it. The suggestions made by other commenters about analyzing the performance more carefully are therefore valid in my opinion. Without more data (transactions per second, etc.), any further analysis of other suitable engines is also futile.
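Even a crude measurement of write throughput would sharpen the discussion. Something along these lines, as a sketch - the table name, columns, and connection details are placeholders:

```python
# Crude insert-throughput measurement; table name and connection details are
# placeholders. Run against a copy of the real schema, not production.
import time
import pymysql

conn = pymysql.connect(host="my-rds-host", user="app", password="...",
                       database="crawl", autocommit=True)

N = 10000
rows = [("http://example.com/%d" % i, b"x" * 500) for i in range(N)]

start = time.time()
with conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO url_data (url, data) VALUES (%s, %s) "
        "ON DUPLICATE KEY UPDATE data = VALUES(data)",
        rows)
elapsed = time.time() - start
print("%.0f rows/sec" % (N / elapsed))
```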

Kyberias
+1  A: 

I'm not sure I agree with the advice to jump ship on traditional databases. It might not be the most efficient tool, but it is FAR more widely understood and used, and I strongly doubt you have a problem that can't be handled by an efficiently set up relational database. Obvious answers are Oracle, SQL Server, etc., but it might just be that your database structure isn't right. I don't know much about MySQL, but I do know it's used in some pretty big projects (eBay being a noteworthy one).
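As one example of structure mattering: with ~100M rows, a compact fixed-width primary key (e.g. a hash of the URL) instead of a long VARCHAR key can make InnoDB's index far cheaper to maintain. A hedged sketch with invented table and column names:

```python
# Sketch of a more index-friendly layout: a fixed-width hash as the primary key
# instead of the full URL. Table/column names are invented.
import hashlib
import pymysql

conn = pymysql.connect(host="my-rds-host", user="app", password="...",
                       database="crawl", autocommit=True)

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS url_data (
            url_hash BINARY(16) NOT NULL PRIMARY KEY,  -- md5(url), fixed width
            url      VARBINARY(2048) NOT NULL,
            data     BLOB NOT NULL
        ) ENGINE=InnoDB
    """)

def put(url, blob):
    h = hashlib.md5(url.encode()).digest()
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO url_data (url_hash, url, data) VALUES (%s, %s, %s) "
            "ON DUPLICATE KEY UPDATE data = VALUES(data)",
            (h, url.encode(), blob))
```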

John
Can you consider finding a MySQL expert you can pay to spend a couple of days reviewing your DB?
John