I'm building a Ruby on Rails 2.3.5 app. By default, Ruby on Rails doesn't provide foreign key constraints, so I have to add them manually. I was wondering if introducing foreign keys reduces query performance on the database side enough to make it not worth doing. Performance is my first priority in this case, as I can check for data consistency in code. What is your recommendation in general? Do you recommend using foreign keys? And how do you suggest I measure this?
Generally speaking, more keys (foreign or otherwise) will reduce INSERT/UPDATE performance and increase SELECT performance, because each key is backed by an index that has to be maintained on writes but can speed up lookups.
The added benefit of data integrity is almost always worth the small performance decrease that comes with adding foreign keys. What good is a fast app if the data in it is junk (missing parts, etc.)?
Found a similar question here: http://stackoverflow.com/questions/507179/does-foreign-key-improve-query-performance
Assuming:
- You are already using a storage engine that supports FKs (e.g. InnoDB)
- You already have indexes on the columns involved
Then I would guess that you'll get better performance by having MySQL enforce integrity. Enforcing referential integrity is, after all, something that database engines are optimized to do. Writing your own code to manage integrity in Ruby is going to be slow in comparison.
If you need to move from MyISAM to InnoDB to get the FK functionality, you need to consider the tradeoffs in performance between the two engines.
If you don't already have indexes, you need to decide if you want them. Generally speaking, if you're doing more reads than writes, you want (need, even) the indexes.
Stacking an FK on top of stuff that is currently indexed should cause less of an overall performance hit than implementing those kinds of checks in your application code.
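As a rough sketch of what adding the constraint "manually" could look like: the helper below only builds the MySQL statements, and a Rails 2.3 migration would pass the resulting string to `execute`. The helper names, the `fk_<table>_<column>` naming scheme, and the `comments`/`posts` tables are all hypothetical, chosen for illustration.

```ruby
# Hypothetical helpers: they only build SQL strings, nothing is executed here.
# Table and column names are illustrative and not taken from the question.

# Builds the MySQL statement that adds a foreign key on child_table.column
# pointing at parent_table.parent_column.
def add_foreign_key_sql(child_table, column, parent_table, parent_column = "id")
  constraint = "fk_#{child_table}_#{column}"
  "ALTER TABLE #{child_table} " \
    "ADD CONSTRAINT #{constraint} " \
    "FOREIGN KEY (#{column}) REFERENCES #{parent_table} (#{parent_column})"
end

# Builds the matching statement that removes the constraint again.
def drop_foreign_key_sql(child_table, column)
  "ALTER TABLE #{child_table} DROP FOREIGN KEY fk_#{child_table}_#{column}"
end

# Inside a migration the strings would be handed to `execute`, for example:
#   def self.up
#     execute add_foreign_key_sql("comments", "post_id", "posts")
#   end
#   def self.down
#     execute drop_foreign_key_sql("comments", "post_id")
#   end
puts add_foreign_key_sql("comments", "post_id", "posts")
```

Giving the constraint an explicit name makes it straightforward to drop it again, which is handy when comparing performance with and without the FK.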
You should define foreign keys. In general (though I do not know the specifics of MySQL), there is no effect on queries, and when there is an optimizer, like the cost-based optimizer in Oracle, foreign keys may even have a positive effect, since the optimizer can rely on the foreign key information to choose better access plans. As for inserts and updates, there may be an impact, but the benefits that you get (referential integrity and data consistency) far outweigh the performance cost. Of course, you can design a system that performs badly, but the main reason will not be that you added foreign keys. And the cost of maintaining integrity checks in your own code when you decide to use some other language, or when the business rules change slightly, or when a new programmer joins your team, etc., is far higher than the performance impact. My recommendation, then, is yes, go and define the foreign keys. Your end product will be more robust.
Two points:
1. Are you sure that checking integrity at the application level would be better in terms of performance?
2. Run your own test. Testing whether FKs have a positive or negative influence on performance should be almost trivial.
It is a good idea to use foreign keys because they ensure data consistency (you do not want orphan rows and other inconsistent data problems).
But at the same time, adding a foreign key does introduce some performance hit. Assuming you are using InnoDB as the storage engine, it uses a clustered index for primary keys, where the row data is essentially stored along with the PK. Accessing data through a secondary index requires a pass over the secondary index tree (whose nodes contain the PK) and then a second pass over the clustered index to actually fetch the data. So any DML on the parent table that involves the FK in question will require two passes over the index in the child table. Of course, the size of the performance hit depends on the amount of data, your disk performance, and your memory constraints (how much of the data and indexes is cached). So it is best to measure it with your target system in mind. I would say the best way to measure it is with your real target data, or at least some representative data for your system. Then run some benchmarks with and without the FK constraints. Write client-side scripts that generate the same load in both cases.
Though, if you are manually checking FK constraints, I would recommend that you leave it up to MySQL and let MySQL handle it.
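The "same load in both cases" idea could be sketched with Ruby's standard `Benchmark` module. This is a minimal harness under stated assumptions: `build_workload` and `time_workload` are hypothetical names, the fixed seed is what makes the load repeatable, and the block passed to `time_workload` stands in for whatever code actually runs one INSERT against your database (here it is stubbed with an array append so the script runs on its own).

```ruby
require "benchmark"

# Builds a repeatable workload: the same seed always yields the same list of
# [parent_id, child_id] pairs, so both schema variants receive identical load.
def build_workload(size, parent_count, seed = 42)
  rng = Random.new(seed)
  Array.new(size) { |i| [rng.rand(parent_count) + 1, i + 1] }
end

# Replays the workload `runs` times and returns the elapsed seconds per run.
# The block given to this method performs one write; in a real test it would
# issue an INSERT through your database connection (an assumption, not shown).
def time_workload(workload, runs = 3)
  Array.new(runs) do
    Benchmark.realtime do
      workload.each { |parent_id, child_id| yield parent_id, child_id }
    end
  end
end

workload = build_workload(1_000, 50)

# Stub "database": replace the block body with the real INSERT, then run the
# script once against the schema with FKs and once against the schema without.
sink = []
timings = time_workload(workload) { |parent_id, child_id| sink << [parent_id, child_id] }
puts "best of #{timings.size} runs: #{timings.min} s"
```

Taking the best (or median) of several runs helps smooth out cache warm-up effects, which matter here because the cost depends on how much of the data and indexes is already in memory.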