views:

127

answers:

4

I have an application with a lot of database relationships that depend on each other to successfully operate the application. The hinge in the application is a model called the Schedule, but the schedule will pull Blocks, an Employee, a JobTitle, and an Assignment (in addition to that, every Block will pull an assignment from the database along with it as well) to assemble an employees schedule throughout the day.

When I built the app, I put a lot of emphasis on validations that would ensure that all of the pieces had to be in place before everything was saved to the database. This has worked out fantastically so far, and the app has been live and pounded on for almost 6 months, serving approximately 150,000 requests a month with no hiccups or errors. Until last week.

Last week, while someone was altering a schedule, it looks like the database erred, and a Schedule was saved to the database with its Assignment missing. Because the association is called in every view, whenever this schedule was called from the database, the application would throw an NoMethod error for calling on nil.

When designing an application in the way that I state, do you guard against a possible failure on the part of the database/validations? And if so how do you programatically defend against it? Do you check every relationship to make sure that it is not nil before sending it to the view?

I know this question is awash in generality, and if I can be more specific in what I mean, please let me know in the comments.

+4  A: 

I would recommend adding database-enforced foreign key constraints and wrapping important groups of operations into transactions.

If there is a foreign-key between Schedule and Assignment somewhere, a database-enforced foreign key constraint would have prevented the errant insert. Additionally, if you wrap the particular action in a transaction, you can be sure that either the entire stream of inserts/updates/deletes happens or fails, reverting to a clean state.

John Douthat
IIRC, in most DBs, FK constraints can be applied to columns that are nullable, in which case it would not protect the OP from the problem. Perhaps there's something specific about Ruby/Rails/the particular DB that would prevent this... my answer assumed that FK constraints already exist.
rmeador
I am not using database enforced FK constraints, only Rails validations to ensure the child objects are created and exist. Thanks for the info! I will look into FK constraints! I already wrap my multiple database operations up in a transaction, but my database has to support transactions to actually receive this benefit, isn't that correct?
BushyMark
This is why the DBMS should enforce the constraints - in case there's a bug in one of the indefinite number of applications has a glitch in it. It is not completely clear whether the schedule has a foreign key column that would reference the assignment table; if it did, though, the insertion would have failed and the problem could not occur. It is because there is one DBMS and probably many applications sharing the data that the validation should be done by the DBMS, because the apps all use the DBMS, and you therefore get maximal consistency.
Jonathan Leffler
You are correct Jonathan, the Schedule table does contain the foreign key for the assignment table. Is it redundant to have model validations in addition to FK constraints? Will the Rails error reporting work with DBMS FK constraints, or do I continue using Rails validations in my models to double up on my protection? PS, thanks once again for all the info!
BushyMark
I'd say it's the database that's providing the redundant checks (which is OK - we all need our backup resources) - but you shouldn't have database exceptions popping up to the UI. And the database is a lot harder to unit test when the action is taking place in the business layer.
le dorfier
Always wrap multiple, related inserts in a transaction - doing this would have prevented the issue you're fixing!
Jarrod Dixon
A: 

If it's something that must be true in order for the app to function, that's really what assert()s are for. I've barely ever used Ruby, but I imagine it must have that concept. Use them to enforce preconditions in various places throughout your code. That combined with sanitizing and validating your external (user) inputs should be enough to protect you. I think if something goes wrong after that amount of checking, your app is righteously allowed to crash (in a controlled manner, of course).

I doubt the problem you're experiencing is a bug in your database. More likely there's some edge case in your validations that you've overlooked.

rmeador
+5  A: 

In addition to your validations, and adding some database constraints as mentioned in other answers, you might also run a background job that periodically sweeps the database looking for orphans.

When it finds one, it cleans it up (if possible), or deletes it, or just marks it inactive and sends you email so you can look at it later. Depending on the amount and nature of your data, once a minute, once an hour, once a day...

That way, if bad data does get in despite whatever safeguards you have in place, you'll know about it sooner rather than later.

Sarah Mei
We do that here on Stack Overflow - as we denormalize a lot of data for query performance, we have a few background tasks that will ensure the data's validity.
Jarrod Dixon
This is a fantastic idea as well, I am already using backgroundrb to sweep the database to remove data . . . adding another worker to ensure I don't have orphaned objects would be a great check against this! Thanks!
BushyMark
It's a good idea to have consistency-checking routines. But it should be for the purpose of fixing the problem, more than cleaning up the debris. If you keep seeing the same bad data, things are not under control yet.
le dorfier
+1  A: 

I'll argue the non-conventional wisdom on this. The constraints you describe don't belong in the database, they belong in your OO code. And it's not true that "the database erred", it's unquestionably true that the application is what inserted improperly validated data.

When you start expecting the database to carry the burden of these checks, you're putting business rules into the schema. At a minimum, this makes it a lot harder to write unit tests (which is where you should probably have caught this in the first place; but now is your chance to add another test.)

Ideally, you should be able to replace the RDBMS with some other generic data store and still have all the functional logic properly active and unchanged in the appropriate other places. The UI shouldn't be talking to the DAL much less dealing with database exceptions directly.

You can add the additional database constraints if you want, but it should be strictly as a backup. As you can see, handling database structural errors gracefully (especially if the UI is involved) is a lot harder.

le dorfier
In general I agree with you, but it becomes an issue with things like validates_uniqueness_of. It can't actually guarantee uniqueness because of multiple simultaneous processes.
Sarah Mei
It's not a problem with simultaneous processes per se, but perhaps multiple Ruby processes (I don't use ActiveRecords so I can't say.) But I have implemented pretty much the scheduling app you describe, and I did need to design it to allow for simultaneous users, and it didn't require any heroic database programming. I just hate to see an upvoted popular answer implying that it's inevitable that the database must deal inevitably with it because it can't be handled in the proper abstraction layers. Unfortunately, though, that's the conventional wisdom.
le dorfier
Since Rails is single-threaded, a conventional Rails installation may have hundreds of separate Ruby processes hitting the database. So in between when validates_uniqueness_of checks for uniqueness and when the record is saved, an identical record may be saved by another process. Right now, if you actually want to guarantee uniqueness in Rails (and you might not - that's a whole separate post!), a database constraint is sadly your only choice.
Sarah Mei
I'll take these one at a time if you find it useful. 1) You say "Do you check every relationship to make sure that it is not nil before sending it to the view?" You have a couple options here. First, you are selecting a Schedule that thinks it has an Assignment, but finds what? the Assignments key value? null. Then why are you pulling it? Your query should not select Schedules without an Assignment, if the context requires there be one. 2) What exactly is the "uniqueness" check you refer to? I don't think it's described in the question.
le dorfier
3) Can there be Assignments without Schedules, just as there are Schedules without Assignments? If not, then the Assignment record should hold the key to the Schedule, not vice versa. To see if a Schedule is Assigned, you should join the tables. (I'd probably do it this way anyway.)
le dorfier