Back-Story
On a current project I'm using MySQL and SQLite together. I'm currently giving each user their own SQLite database to get around my provider's 1GB MySQL database limit. It has worked out all right and the performance is good, but I know for a fact that continual maintenance of these flat-file databases will be a nightmare in the future.
SQLite is surprisingly powerful and supports some pretty complex SQL queries. However, I'm looking at MongoDB to hop on board with a little NoSQL for my users' bulk data. Each user could generate 60,000 rows or more, and with a continually growing number of users, I have to worry about performance down the road.
-
Complexity
My worry with MongoDB and the other NoSQL databases is that they seem more limited in the kinds of query operations they support. That's no big deal if you only need straightforward, simple bulk queries; however, I have to do some more complex filtering and grouping (unions, case-insensitive matching, groupings, the occasional join, etc.).
My example query tries to select a list of tracks by artist. The main problem is that the artist names don't always match exactly. For example, some people tag tracks as "A Day to Remember" and some tag them as "A Day To Remember". With a case-sensitive query, records that are "different" but really the same thing come back as separate results. Typically I TRIM() and LOWER() the field to group them together properly.
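From what I've read, the usual MongoDB answer is to do that normalization at write time rather than at query time: store a pre-lowered copy of the field alongside the original and index it. Here's a minimal sketch using the same legacy PHP driver as my tests below; artist_lower is a field name I made up, not anything built into MongoDB:
// Store a normalized copy of the field at write time
// ("artist_lower" is my own made-up field name, nothing built in)
$collection->insert(array(
    'artist'       => 'A Day To Remember',
    'artist_lower' => strtolower(trim('A Day To Remember')),
    'album'        => 'Some Album',
    'track'        => 1,
    'title'        => 'Some Title',
));

// Index the normalized field so equality lookups can use it
$collection->ensureIndex(array('artist_lower' => 1));

// The lookup becomes a plain indexed equality match
$cursor = $collection->find(array('artist_lower' => 'a day to remember'));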
-
Performance
I created two fresh databases on my local machine, one for MongoDB and one for MySQL. I'm talking to them with PHP, since that's what my end result will have to use. Each database has only about 9,000 records in it, so it's not terribly large at this point.
I ran a few tests on my machine and came up with disappointing results for MongoDB. Let's consider these three queries...
#1 - MongoDB: ~14ms, incorrect results
// Exact match: fast, but case-sensitive, so it misses rows tagged "A Day To Remember"
$query = array('artist' => 'A Day to Remember');
$cursor = $collection->find($query);
foreach ($cursor as $row) {
    echo $row['artist'] . ' - ' . $row['album'] . ' - #' . $row['track'] . ' ' . $row['title'] . "\r\n";
}
#2 - MongoDB: ~170ms, correct results
// $where evaluates server-side JavaScript against every document, so no index is used
$query = array('$where' => "this.artist.toLowerCase() == 'a day to remember'");
$cursor = $collection->find($query);
foreach ($cursor as $row) {
    echo $row['artist'] . ' - ' . $row['album'] . ' - #' . $row['track'] . ' ' . $row['title'] . "\r\n";
}
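For what it's worth, an anchored case-insensitive regex also returns the correct rows without invoking the JavaScript engine. This is only a sketch, though: as far as I can tell, a /i regex still can't use an index, so it scans the collection too, just more cheaply than $where:
// Case-insensitive regex match: no JavaScript engine involved,
// but a /i regex still can't use an index, so it's still a collection scan
$query = array('artist' => new MongoRegex('/^a day to remember$/i'));
$cursor = $collection->find($query);
foreach ($cursor as $row) {
    echo $row['artist'] . ' - ' . $row['album'] . ' - #' . $row['track'] . ' ' . $row['title'] . "\r\n";
}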
#3 - MySQL: ~18ms, correct results
$sql = "select artist, album, track, title from radio_files where lower(artist) = 'a day to remember'";
$stmt = $mysqldb->prepare($sql);
$stmt->execute();
while($row = $stmt->fetch(PDO::FETCH_ASSOC))
{
echo $row['artist'] . ' - ' . $row['album'] . ' - #'. $row['track'] . ' ' . $row['title'] . "\r\n";
}
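Worth noting on the MySQL side: wrapping the column in lower() prevents any index on artist from being used, so #3 is a full table scan as well. If I control the schema, there seem to be two ways around that: rely on a case-insensitive (_ci) collation, where a plain artist = '...' comparison already ignores case and can use an index, or keep a pre-lowered copy of the column, mirroring the MongoDB idea above. A sketch of the latter, where artist_lower is again my own made-up column name:
// One-time schema change: add, backfill, and index a normalized column
$mysqldb->exec("alter table radio_files add column artist_lower varchar(255)");
$mysqldb->exec("update radio_files set artist_lower = lower(trim(artist))");
$mysqldb->exec("create index idx_artist_lower on radio_files (artist_lower)");

// The lookup becomes a plain indexed equality match
$stmt = $mysqldb->prepare("select artist, album, track, title from radio_files where artist_lower = ?");
$stmt->execute(array('a day to remember'));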
-
Discussion
Maybe I'm simply not writing query #2 correctly, but look at how the JavaScript query engine kills it. There aren't even very many records for it to deal with: just under 9,000 in the whole database.
My main question is this: what is going to be more reliable and performant in the end, and still suit my needs? As my project's userbase grows, I'm looking to leave my limited server and get something dedicated anyway. With my own MySQL installation I should be able to maintain my own large MyISAM tables with little relational data and proper indexing.
But with millions of records in the database, what happens to MySQL's performance? Thoughts, comments, and general discussion about this are encouraged. Thanks!