views: 138

answers: 3

Back-Story

On a current project, I am using MySQL and SQLite in combination. I'm currently giving each user their own SQLite database to get around my provider's 1GB MySQL DB limit. It's worked out alright and the performance is good, but I know for a fact that continual maintenance of these flat-file databases will be a nightmare in the future.

SQLite is surprisingly powerful and supports some pretty complex SQL queries. However, I'm looking at MongoDB to hop on board with a little NoSQL for my users' bulk data. Each user could generate 60,000 rows or more, and with a continually growing number of users, I have to worry about future performance.

-

Complexity

My worry with MongoDB and the other NoSQL databases is that they seem more limited in the query operations they support. That's no big deal if you only need straightforward bulk queries, but I've got to do some more complex filtering (unions, case-insensitive matching, groupings, the occasional join, etc.).

My example query attempts to select a list of tracks by artist. The main problem is that the artist names may not match exactly. For example, some people tag as "A Day to Remember" and some tag as "A Day To Remember". With a case-sensitive query, records that are "different" but really the same thing come back as separate groups. Typically I trim and LOWER() the fields to group them together properly.
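The trim-and-lowercase step can be sketched as a small PHP helper (the function name is my own, not from the original code; `strtolower()` only folds ASCII letters, so `mb_strtolower()` would be needed for non-ASCII tags):

```php
<?php
// Hypothetical helper: normalize an artist tag so that
// "A Day to Remember" and " A Day To Remember " compare equal.
function normalize_artist($artist) {
    // Strip stray whitespace, then fold case (ASCII only;
    // use mb_strtolower($s, 'UTF-8') for Unicode tags).
    return strtolower(trim($artist));
}

var_dump(normalize_artist('A Day to Remember') === normalize_artist(' A Day To Remember '));
```

Grouping on the normalized value rather than the raw tag is what collapses the near-duplicate artists into one.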

-

Performance

I created two fresh, new databases on my local machine. One for MongoDB, and one for MySQL. I'm talking to them with PHP since that's what my end result will have to use. Each database only has about 9,000 records in it, so it's not terribly large at this point.

I ran a few tests on my machine and came up with disappointing results for MongoDB. Let's consider these three queries...

#1 - MongoDB: ~14ms, incorrect results

// Exact match is case-sensitive, so "A Day To Remember" rows are missed
$query = array('artist' => 'A Day to Remember');
$cursor = $collection->find($query);
foreach ($cursor as $row) {
    echo $row['artist'] . ' - ' . $row['album'] . ' - #'. $row['track'] . ' ' . $row['title'] . "\r\n";
}

#2 - MongoDB: ~170ms, correct results

// $where runs JavaScript against every document and cannot use an index
$query = array('$where' => "this.artist.toLowerCase() == 'a day to remember'");
$cursor = $collection->find($query);
foreach ($cursor as $row) {
    echo $row['artist'] . ' - ' . $row['album'] . ' - #'. $row['track'] . ' ' . $row['title'] . "\r\n";
}

#3 - MySQL: ~18ms, correct results

// lower(artist) in the WHERE clause prevents use of an index on artist
$sql = "select artist, album, track, title from radio_files where lower(artist) = 'a day to remember'";
$stmt = $mysqldb->prepare($sql);
$stmt->execute();
while($row = $stmt->fetch(PDO::FETCH_ASSOC))
{
    echo $row['artist'] . ' - ' . $row['album'] . ' - #'. $row['track'] . ' ' . $row['title'] . "\r\n";
}

-

Discussion

Maybe I'm simply not querying correctly in #2, but just look at how the JavaScript query engine kills it. There aren't even very many records for it to deal with: just under 9,000 in the entire database.

My main question is this: what is going to be more reliable and performant in the end and still suit my needs? As my project's userbase grows, I'm looking to leave my limited server and get something dedicated anyway. With my own MySQL installation I should be able to maintain large MyISAM tables with little relational data and proper indexing.
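For the MySQL route, one common workaround (my sketch, not from the question) is to store a pre-lowercased copy of the name and index that, since wrapping the column in `lower()` in the WHERE clause prevents MySQL from using an index on `artist`. Note also that MySQL's default `_ci` collations already compare case-insensitively, which may make the `LOWER()` unnecessary depending on the column's collation.

```sql
-- Hypothetical schema tweak (column and index names are mine):
-- keep a lowercased copy of the artist so equality lookups can use
-- a plain index (MySQL has no function-based indexes on MyISAM).
ALTER TABLE radio_files ADD COLUMN artist_lower VARCHAR(255);
UPDATE radio_files SET artist_lower = LOWER(TRIM(artist));
CREATE INDEX idx_artist_lower ON radio_files (artist_lower);

-- The lookup then becomes an indexed equality match:
-- SELECT artist, album, track, title
--   FROM radio_files
--  WHERE artist_lower = 'a day to remember';
```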

But with millions of records in the database, what happens to MySQL's performance? Thoughts, comments, and general discussion about this are encouraged. Thanks!

+2  A: 

The various NoSQL solutions differ from one another far more than traditional SQL databases do, but MongoDB is actually one of the most feature-rich of the lot, especially when it comes to query complexity.

However, you shouldn't blindly go for a NoSQL solution just because you are expecting 60,000 rows per user. MySQL and the other popular relational DBMSes can handle billions of rows without problems.

Relational databases come with plenty of important features (ACID guarantees and complex queries, for example), and if you need these features, you might as well use an SQL database. NoSQL usually trades away some of those features (or all of them) for ease of horizontal scalability. If you expect to be able to manage your system's scalability with a relational DBMS, then I would seriously consider sticking with SQL.

I'm currently giving each user their own SQLite database to get around my provider's 1GB MySQL DB limit.

You could also consider switching providers. A host that applies such limits will probably limit you in some other way eventually.

Daniel Vassallo
+1 for suggesting switching providers. If he's at the point where each user can generate 60k rows, then it's probably time to upgrade to a real hosting provider.
mellowsoon
I agree that NoSQL is in no way a silver bullet. I was mostly concerned that having too many records would eventually lead to a slow database. Your point about the billions of rows is well taken, and I think I'll be sticking with MySQL.
jocull
My provider is GoDaddy, and yes it does have a lot of limits. I'm bootstrapping with what I have for now, because it fits my budget and I can make it work. It's 1200% cheaper than a dedicated host. When my business grows getting improved hosting is one of the first things I'm going for.
jocull
@jocull: it may be cheaper than a dedicated host, but it's not significantly cheaper than a virtual server which you can get for as low as $5 per month (plus you get almost all the benefits of having a dedicated server). Get improved hosting first, it's worth it...
ircmaxell
Can you send me a link to where you can get hosting that cheap? I've never seen anything like it, and I'm curious how reliable it would be.
jocull
+4  A: 

You have to store a value twice if you want to do a case-insensitive search on it in MongoDB: once as-is for display, and once lowercased for indexing and searching.

MongoDB has a rich query language (compared to other NoSQL systems), and you can index any field or combination of fields. I do, however, find map-reduce to be slow; as long as you can solve your problem without map-reduce, you are fine.
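The store-it-twice approach might look like this in PHP (the field name `artist_lower` and the sample values are my invention; the arrays are only built here, not sent to a server):

```php
<?php
// Hypothetical document shape: keep the original tag for display
// and a lowercased copy purely for indexed, case-insensitive lookup.
$doc = array(
    'artist'       => 'A Day To Remember',
    'artist_lower' => strtolower(trim('A Day To Remember')),
    'album'        => 'Example Album',
    'track'        => 1,
    'title'        => 'Example Title',
);

// An equality query against the lowercased field can use a normal
// index (created once, e.g. with the driver's ensureIndex() call).
$query = array('artist_lower' => 'a day to remember');

var_dump($doc['artist_lower'] === $query['artist_lower']);
```

The duplicated field costs some storage, but it turns a per-document JavaScript `$where` scan into a plain indexed equality match.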

TTT
Is this really the preferred method? Seems like it would eat up a lot of extra data over time.
jocull
I too do this. Instead of having to perform multiple queries (because there's no JOINs remember), I opted to go w/ storing my data again. Case in point, when saving author name for blog comments, along w/ the _id of the user, I also store the username. This saves an extra trip to the DB if I had to loop through the records and then pull out the user name from the users collection.
luckytaxi
@jocull, but it is the same story in SQLite. SQLite can't upper-case Unicode strings, so I store values twice.
TTT
Odd? I've been using the lower(x) function in SQLite with pretty good success. It may not work correctly for Unicode, but the docs say you can load an extension for that. http://www.sqlite.org/lang_corefunc.html
jocull
@jocull, I didn't know about this extension. Maybe I will test it, I can test it for example with the Turkish dotless i.
TTT
+2  A: 

Try with a regular expression:

// Anchored (^...$) and case-insensitive (/i); preg_quote() escapes regex metacharacters
$regex = new MongoRegex('/^' . preg_quote('a day to remember') . '$/i');
$query = array('artist' => $regex);
$cursor = $collection->find($query);
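As a sanity check (mine, not part of the answer), the same anchored, case-insensitive pattern can be exercised with PHP's own regex engine to see why it matches differently-cased tags while rejecting partial matches:

```php
<?php
// Build the same anchored pattern the answer sends to MongoDB
// ('/' passed to preg_quote() so a delimiter in the name is escaped too).
$pattern = '/^' . preg_quote('a day to remember', '/') . '$/i';

var_dump(preg_match($pattern, 'A Day To Remember') === 1); // case-insensitive hit
var_dump(preg_match($pattern, 'A Day to Remember!') === 0); // anchors reject extras
```

One caveat: in MongoDB, only case-sensitive prefix regexes can use an index efficiently, so a `/i` regex still scans the collection. On ~9,000 documents that is cheap, but on larger collections the stored-lowercase-field approach from the other answer scales better.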
Maxence
Wow, good call! This one actually only takes 7ms to run. It's the fastest one thus far!
jocull