views:

176

answers:

8

A friend and I were debating about whether he should use MySQL or a flatfile database for his website's backend. I told him to go with MySQL because it was structured, held records well, and was consistent. He on the other hand said that he would rather go for speed. Reading files is a lot quicker than connecting to MySQL and it made me wonder whether he was right. For example, why not just create a folder for each table, like so: users/ groups/ posts/, within the folders have the files named by ID (1, 2, 3) and then for the data use a format like so: username: John\npassword: e2fc714c4727ee9395f324cd2e7f331f\nemail: [email protected]?

In other words, what are the advantages of MySQL over flatfiles?

+9  A: 

In other words, what are the advantages of MySQL over flatfiles?

MySQL offers indexes and joins (for execution performance), transactions (for data integrity) and SQL (for development performance).

It your project involves just a 3-line self-sufficient text file, you don't need MySQL.

Quassnoi
+2  A: 

What please is a "flatfile database"? A flat file is a flat file - nam it like this. SAying it is a flat file database makes you think it magically has some of the features of a database - which flat files per definition do not have.

what are the advantages of MySQL over flatfiles?

Skip MySQL here - the main question you ask is "why use a database at all".

I suggest you look into perfomance (sewarch operations - indices are there for a reason) and look up the term "ACID conditions" to get an even vague idea what a database actually DOES.

Flat files do not give you any guarantee, and decades of developers have prooven all the problems they have over and over again.

TomTom
+7  A: 

Reading files is a lot quicker than connecting to MySQL and it made me wonder whether he was right.

Hobcobbles. A database like mySQL stores its data in files as well, but features tons of optimizations, most obviously its indexing capabilities, allowing for huge performance increases compared to reading (or writing) a big flat file.

Flat files may be faster in certain very limited cases, but a database engine uses the experience of generations of developers working on making data access faster, and more reliable. Just think about race conditions and locking when two instances of your script try to write data into the database, for example.

If the amount of data used exceeds a few lines in a CSV file - or doesn't happen to be easily manageable in files like for example the pages of a Wiki - go with a database. It adds a layer of complication, but saves you a lot of headache.

Just think about doing a SELECT * FROM posts WHERE MONTH(post_date) = "2010-03-10" on a flat file quickly and what is necessary to write from scratch to achieve that.

Pekka
+1  A: 

There is also the matter of security. If you don't properly protect the flat files, they can be much more easily exposed. Especially if you are storing user information, there is no barrier to entry around the flat files.

Assuming your website or application grows vertically, flat files also do not scale, because the larger the flat files get the longer they take to read.

And lastly, using flat files when it's already so easy to use databases is quite simply a hack. It's not doing things the "right way" in that EVERYONE ELSE uses databases, so I would argue the opposite: Why use flat files over MySQL? Is someone else coming into to maintain your application after the fact going to understand or agree with your decision to use flat files?

jathanism
Just because everyone does something doesn't make it the right thing. The speed flatfiles provides can be quite large. Also, flat files themselves don't get larger because you have a separate file for each "row" of a MySQL database. As for security, it's a problem in MySQL too, you just have to learn how to prevent exploitation.
John
That is possibly true when it comes to following your friends off of the nearest bridge, but in the computing world I don't think that argument holds much weight. Computing is a scientific world, and one that has matured through doing things largely in the same way (even if slightly varied) that everyone else has been doing them. Paradigms don't become paradigms overnight.
jathanism
+1  A: 

We need a bit more of context.

If your friend is reading complete pages (Stored ad "blobs" in the DB) then yes, using MySql is not much of help. If he has granular data (including, I don't know, blog posts, newsitems, images with metadata, order details) then unless the site is very skimpy and very static, a file-based approach will soon become too limited.

You proposed solution has two big drawbacks:

Using folders/filenames is the same as having just one index on each table (in this case, the filename) so searching for any other criteria will take ages. Not to mention the fact that having lots of files in a single directory will start taxing the OS.

On top of that, security-by-filename is a bit of a security risk, even if you use the hashed pwd as part of the URL.

I did some filesystem-based medium-size applications in the past (due to mismanaged requirements we couldn't use a DB) and this is fun, but really very limiting as soon you go over a few hundreds files. And even with small numbers, you have to start pulling tricks from the start to have any hope to keep the thing working.

p.marino
You could always have files be called `id=1`, and then have `username=John` with contents `link to id=1`, and interpret the link as reading the other file.
John
Yes, and you could also rewrite the filesystem if you don't like how it works. A DBMS offers tried and tested features. If you don't need them, you can do with a filesystem. But if you are forced to reinvent a large number of wheels every time soimeone points out a problem with your design, maybe you better accept the fact that you really needed a DB?
p.marino
A: 

Also, without storing all the user informaiton inside the Posts/ folder, how do you get all the posts written by John Doe (for example)? In SQL it's just a joined select statement. With flat files, you either have to store the information inside the actual post file, or write the code to perform the join & search operations on your own.

AllenG
A: 

Just an example: consider that you have 1,000,000 customers, with address info and you need to search and set of customers that live in NY. If you stored each customer in separate file, than you would need to read all 1,000,000 files and see if a customer belongs to the state. If you stored all records in one huge file - you would need to read whole file and iterate to find all customers from NY.

In both cases you loose.

In case of RDBMS like MySql - you would use so called "set" operation or SELECT statement, with addition of indexes, the engine would probably only read 10/20% more data than needed to find all customers from NY.

Hope this helps

IMHO
A: 

Data redundancy and a lack of atomicity are big problems in flat file databases which manifest exponentially the more data it is required to hold and introduce latency in queries and other problems such as update/ delete/ insertion anomalies.

The Relational Data Model with Normalisation helps to negate these problems, by ensuring atomicity and that each record is uniquely identifiable (First Normal Form), that each field in a table is functionally dependant on the primary key (Second Normal Form) and that non-key fields don't share transitive dependencies on other fields in the table (Third Normal Form).

The Relational Data Model is by no means the only way of doing it, perhaps not even the best, but it certainly attempts to address the problems of query latency and anomalies inherent in flat files.

Yandawl