views: 760
answers: 8

Let's assume the same environment: PHP5 working with MySQL5 and CSV files, with MySQL on the same host as the scripts.

Will MySQL always be faster than retrieving/searching/changing/adding/deleting records in a CSV file? Or is there some amount of data below which PHP+CSV performance is better than using a database server?

+4  A: 

CSV won't let you create indexes for fast searching.

If you always need all data from a single table (like for application settings), CSV is faster, otherwise not.

I don't even consider SQL queries, transactions, data manipulation or concurrent access here, as CSV is certainly not for these things.
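A minimal sketch of that read-everything case, for illustration; the settings.csv name and its key,value layout are assumptions:

```php
<?php
// Load a whole CSV of key/value settings in one sequential pass.
// "settings.csv" and its key,value layout are assumed for illustration.
$settings = array();
$fh = fopen('settings.csv', 'r');
if ($fh !== false) {
    while (($row = fgetcsv($fh)) !== false) {
        if (count($row) >= 2) {
            $settings[$row[0]] = $row[1];
        }
    }
    fclose($fh);
}
// No connection, no query parsing: just one read of a small file.
```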

Quassnoi
There are probably some other edge cases where CSV is faster... but this is the only one I could think of too. :)
jheriko
A: 

Databases are for storing and retrieving data. If you need anything more than plain line/entry addition or bulk listing, why not go the database route? Otherwise you'd basically have to code the functionality (including deletion, sorting, etc.) yourself.
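Even a plain sort, which is a one-line ORDER BY in SQL, has to be hand-rolled for a CSV. A sketch of that; the users.csv name and the sort column are assumptions:

```php
<?php
// Sorting a CSV by one column: load everything, then sort it yourself.
// In MySQL this is just "SELECT ... ORDER BY name".
// "users.csv" and column index 1 are assumed for illustration.
function cmp_second_column($a, $b) {
    return strcmp($a[1], $b[1]);
}

$rows = array();
$fh = fopen('users.csv', 'r');
while (($row = fgetcsv($fh)) !== false) {
    $rows[] = $row;
}
fclose($fh);
usort($rows, 'cmp_second_column');
```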

tehvan
A: 

CSV is an incredibly brittle format and requires your app to do all the formatting and calculations. If you need to update a specific record in a CSV, you will have to first read the entire file, find the entry in memory that needs to change, then write the whole file out again. This gets very slow very quickly. CSV is only useful for write-once, read-once type apps.
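A sketch of that read-modify-rewrite cycle in PHP; the file name and the id-in-the-first-column convention are assumptions:

```php
<?php
// Updating one record in a CSV: read the whole file, change the row
// in memory, then write everything back out.
// "data.csv" and matching on column 0 are assumed for illustration.
$rows = array();
$fh = fopen('data.csv', 'r');
while (($row = fgetcsv($fh)) !== false) {
    if ($row[0] === '42') {      // the record we want to change
        $row[1] = 'new value';
    }
    $rows[] = $row;
}
fclose($fh);

// Rewrite the entire file even though only one row changed.
$fh = fopen('data.csv', 'w');
foreach ($rows as $row) {
    fputcsv($fh, $row);
}
fclose($fh);
```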

Ronny Vindenes
A: 

Yeah, I was considering only base operations: no indexes, autoincrements, keys, specified field types etc. I just wanted to know which is faster for base operations on a table: CSV or the db.

There is probably also a point at which using CSV even for the base stuff becomes slower than the db (large file size, long records), and I was wondering roughly where that point is (100k records? 1M records? file size, since PHP has to fopen() the whole thing?).

bat
bat, keep in mind this is a question-and-answer site, and as such you shouldn't reply to your own question (unless you have an answer). To reply to a specific post, add a comment.
ryeguy
What do you call "base operations"? Is updating a record a base operation? If updating a record changes its size (as when replacing "foo" with "foobar"), CSV will be slower, as there is a lot of work in moving all of the file's data around. MySQL handles all this for you efficiently.
Quassnoi
+1  A: 

No, MySQL will probably be slower for inserting (appending to a CSV is very fast) and table-scan (non-index based) searches.

Updating or deleting from a CSV is nontrivial - I leave that as an exercise for the reader.

If you use a CSV, you need to be really careful to handle multiple threads / processes correctly, otherwise you'll get bad data or corrupt your file.
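A minimal sketch of guarding an append with flock(); note the lock is advisory, so it only helps if every writer does the same (the file name and row contents are placeholders):

```php
<?php
// Appending to a shared CSV safely: take an exclusive lock first,
// otherwise two PHP processes can interleave partial lines.
$fh = fopen('data.csv', 'a');
if ($fh !== false) {
    if (flock($fh, LOCK_EX)) {   // advisory: all writers must cooperate
        fputcsv($fh, array('43', 'another value'));
        fflush($fh);             // flush before releasing the lock
        flock($fh, LOCK_UN);
    }
    fclose($fh);
}
```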

However, MySQL has other advantages too. Care to work out how you'd do ALTER TABLE on a CSV? (See the sketch below.)

Using a CSV is a very bad idea if you ever need UPDATEs, DELETEs, ALTER TABLE or to access the file from more than one process at once.
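To make the ALTER TABLE point concrete, a sketch of the contrast; the connection details, table name, and file name are assumptions:

```php
<?php
// MySQL: adding a column is one statement; the server does the rewrite.
// Credentials and the "items" table are assumed for illustration.
$db = new mysqli('localhost', 'user', 'pass', 'test');
$db->query("ALTER TABLE items ADD COLUMN notes VARCHAR(255) DEFAULT ''");

// CSV equivalent: append an empty field to every single line by hand.
$lines = array_map('rtrim', file('items.csv'));
$fh = fopen('items.csv', 'w');
foreach ($lines as $line) {
    fwrite($fh, $line . ",\n"); // empty value for the new column
}
fclose($fh);
```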

MarkR
A: 

As a person coming from the data industry, I've dealt with exactly this situation.

Generally speaking, MySQL will be faster.

However, you don't state the type of application that you are developing. Are you developing a data warehouse application that is mainly used for searching and retrieval of records? How many fields are typically present in your records? How many records are typically present in your data files? Do these files have any relational properties to each other, i.e. do you have a file of customers and a file of customer orders? How much time do you have to develop a system?

The answer will depend on the answers to the questions listed previously. However, you can generally use the following as guidelines:

If you are building a data warehouse application with records exceeding one million, you may want to consider ditching both and moving to a Column Oriented Database.

CSV will probably be faster for smaller data sets. However, rolling your own insert routines in CSV could be painful and you lose the advantages of database indexing.

My general recommendation would be to just use MySQL; as I said previously, in most cases it will be faster.

JP
A: 

From a pure performance standpoint, it completely depends on the operation you're doing, as @MarkR says. Appending to a flat file is very fast. As is reading in the entire file (for a non-indexed search or other purposes).

The only way to know for sure what will work better for your use cases on your platform is to do actual profiling. I can guarantee you that doing a full table scan on a million row database will be slower than grep on a million line CSV file. But that's probably not a realistic example of your usage. The "breakpoints" will vary wildly depending on your particular mix of retrieve, indexed search, non-indexed search, update, append.
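A bare-bones profiling harness along these lines; the file, table, column, and connection details are placeholders to swap for your own:

```php
<?php
// Time a full scan of a CSV for a value...
$start = microtime(true);
$hits = 0;
$fh = fopen('test.csv', 'r');
while (($row = fgetcsv($fh)) !== false) {
    if ($row[1] === 'needle') {
        $hits++;
    }
}
fclose($fh);
printf("CSV scan:   %.4f s (%d hits)\n", microtime(true) - $start, $hits);

// ...and the equivalent non-indexed MySQL query.
$db = new mysqli('localhost', 'user', 'pass', 'test');
$start = microtime(true);
$db->query("SELECT COUNT(*) FROM test WHERE col1 = 'needle'");
printf("MySQL scan: %.4f s\n", microtime(true) - $start);
```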

To me, this isn't a performance issue. Your data sounds record-oriented, and MySQL is vastly superior (in general terms) for dealing with that kind of data. If your use cases are even a little bit complicated by the time your data gets large, dealing with a 100k line CSV file is going to be horrific compared to a 100k record db table, even if the performance is marginally better (which is by no means guaranteed).

Adam Bellaire
A: 

It depends on the use. For configuration or language files, for example, CSV might do better. Anyway, if you're using PHP5, you have a third option: SQLite, which comes embedded in PHP. It gives you the ease of use of regular files, but the robustness of an RDBMS.
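A minimal sketch of that third option through PDO (available from PHP 5.1); the file name and schema are assumptions:

```php
<?php
// SQLite through PDO: a single file on disk, but with real SQL,
// indexes, and transactions. "app.sqlite" is a placeholder name.
$db = new PDO('sqlite:app.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS settings (name TEXT PRIMARY KEY, value TEXT)');

$stmt = $db->prepare('INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)');
$stmt->execute(array('theme', 'dark'));

foreach ($db->query('SELECT name, value FROM settings') as $row) {
    echo $row['name'], ' = ', $row['value'], "\n";
}
```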

vartec