views: 700

answers: 7
I do data mining, and my work involves loading and unloading database dump files larger than 1GB into MySQL. I am wondering whether there is any other free database engine that handles huge databases better than MySQL. Is PostgreSQL better in terms of performance?

I only use basic SQL commands, so speed is the only factor in choosing a database.

+4  A: 

It is unlikely that substituting a different database engine will provide a huge increase in performance. The slowdown you mention is more likely to be related to your schema design and data access patterns. Maybe you could provide some more information about that? For example, is the data stored as a time series? Are records written once sequentially, or inserted/updated/deleted arbitrarily?

Adamski
A: 

HDF is the storage choice of NASA's Earth Observing System, for instance. It's not exactly a database in the traditional sense, and it has its own quirks, but in terms of pure performance it's hard to beat.

Joonas Pulakka
+1  A: 

I am using PostgreSQL in my current project and also have to dump/restore databases fairly often. It takes less than 20 minutes to restore a 400MB compressed dump. You may give it a try, although some server configuration parameters need to be tweaked to match your hardware configuration. These parameters include, but are not limited to, the following (a short example follows the list):

  • shared_buffers
  • work_mem
  • temp_buffers
  • maintenance_work_mem
  • commit_delay
  • effective_cache_size
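
For example, a few of these can be raised for just the session that runs the restore. The values below are a rough sketch rather than recommendations and have to be matched to your hardware; shared_buffers in particular can only be changed in postgresql.conf and needs a server restart.

    -- Session-level settings for a large restore; the values are illustrative guesses.
    SET maintenance_work_mem = '512MB';  -- speeds up index rebuilds after the load
    SET work_mem = '128MB';              -- more memory per sort/hash operation
    SET temp_buffers = '64MB';           -- only matters if temporary tables are involved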
akrasikov
Neither temp_buffers nor commit_delay is effective for most people. The feature commit_delay is involved in doesn't really work, and the default sizing of temp_buffers is good enough unless you rely heavily on temporary data. The best guide to server parameter tuning for PostgreSQL right now is at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
Greg Smith
+2  A: 

As long as you drop indexes before inserting huge amounts of data, there should not be much difference between the two.
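
A rough sketch of that pattern, in MySQL syntax; the table, index, column, and file names are hypothetical:

    -- Drop secondary indexes, bulk-load, then rebuild the indexes once at the end.
    DROP INDEX idx_event_time ON measurements;

    LOAD DATA INFILE '/tmp/dump.csv'
    INTO TABLE measurements
    FIELDS TERMINATED BY ',';

    CREATE INDEX idx_event_time ON measurements (event_time);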

M. Utku ALTINKAYA
+1  A: 

If your data mining tool supports it, consider working from flat file sources; this should save most of your import/export operations (a short sketch follows the list below). It does have some caveats, though:

  • You may need to get proficient with a scripting language like Perl or Python to do the data munging (assuming you're not already familiar with one).

  • You may need to expand the memory on your machine, or move to a 64-bit platform if you need more address space than 32-bit allows.

  • Your data mining tool might not support working from flat data files in this manner, in which case you're buggered.
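
One way to get such a flat-file source is to export the data from MySQL once and then mine the resulting file directly; the table, columns, and output path below are hypothetical:

    -- Export once to CSV, then work on the file with your scripting language
    -- of choice instead of re-importing dumps for every run.
    SELECT user_id, event_time, value
    INTO OUTFILE '/tmp/measurements.csv'
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
      LINES TERMINATED BY '\n'
    FROM measurements;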

Modern disks - even SATA ones - will pull 100MB/sec or so off the disk in sequential reads, so something could inhale a 1GB file in roughly ten seconds.

Alternatively, you could try getting SSDs on your machine and see if that improves the performance of your DBMS.

ConcernedOfTunbridgeWells
Or just upgrade your memory to 8GB, create a sufficiently large tmpfs RAM disk, and do all the random accesses associated with the operation in a random access friendly medium? That's probably the cheapest way to proceed, seeing that 2GB RAM sticks go for €20.
mikaelhg
A: 

Your question is too ambiguous to answer usefully. "Performance" means many different things to different people. I can comment on how MySQL and PostgreSQL compare in a few areas that might be important, but without more information it's hard to say which of these actually matter to you. I've written up a bunch more background information on this topic at Why PostgreSQL Instead of MySQL: Comparing Reliability and Speed. Which is faster certainly depends on what you're doing.

Is the problem that loading data into the database is too slow? That's one area PostgreSQL doesn't do particularly well at; the COPY command in Postgres is not a particularly fast bulk-loading mechanism.
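
For reference, a bulk load with COPY looks roughly like this; the table name and file path are hypothetical, and the WITH (FORMAT csv) spelling assumes PostgreSQL 9.0 or newer:

    -- Server-side bulk load of a CSV dump file.
    COPY measurements FROM '/var/lib/postgresql/import/measurements.csv' WITH (FORMAT csv);

    -- From a client machine, psql's \copy streams the file instead:
    -- \copy measurements FROM 'measurements.csv' WITH (FORMAT csv)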

Is the problem that queries run too slowly? If so, how complicated are they? On complicated queries, the PostgreSQL optimizer can do a better job than the one in MySQL, particularly if there are many table joins involved. Small, simple queries tend to run faster in MySQL because it isn't doing as much thinking about how to execute the query before beginning; smarter execution costs a bit of overhead.
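
If you want to see what the planner is doing with a multi-join query, EXPLAIN is the tool to reach for; the query below is a made-up example, and EXPLAIN ANALYZE as shown is PostgreSQL's form:

    -- Show the chosen plan and actual runtimes for a three-table join.
    EXPLAIN ANALYZE
    SELECT m.user_id, count(*)
    FROM measurements m
    JOIN users u  ON u.id = m.user_id
    JOIN events e ON e.id = m.event_id
    GROUP BY m.user_id;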

How many clients are involved? MySQL can do a good job with a small number of clients; at higher client counts, the locking mechanism in PostgreSQL might do a better job.

Do you care about transactional integrity? If not, it's easier to turn more of those features off in MySQL, which gives it a significant speed advantage compared to PostgreSQL.
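
As a rough illustration of what turning those features off can look like in a MySQL session (which switches are worth using depends on your storage engine, and all of them trade safety for speed):

    -- Relax integrity checks for the duration of a bulk load, then restore them.
    SET foreign_key_checks = 0;
    SET unique_checks = 0;
    SET autocommit = 0;
    -- ... run the LOAD DATA / bulk INSERT statements here ...
    COMMIT;
    SET unique_checks = 1;
    SET foreign_key_checks = 1;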

Greg Smith
+1  A: 

If you are doing data mining, perhaps you could use a document-oriented database. These can be faster than relational databases if you do not need SQL.

MongoDB and CouchDB are both good options. I prefer MongoDB because I don't know Java, and I found MongoDB easier to get up and running than CouchDB.

Here are some articles on the topic:

Joseph Turian