Hello stackoverflow,

I'm currently building a Python ORM that gets all of its information from an RDBMS via introspection (I would go with XRecord if I were happy with it in other respects). That is, the end user only says which tables/views to look at, and the ORM does everything else automatically (if it makes you actually write something and you're not looking for weird things and dangerous adventures, it's a bug).

The major part of that is detecting relationships, provided that the database has all relevant constraints in place and you have no naming conventions at all. I want this ORM to work with a database made by any crazy DBA who has his own views on what the columns and tables should be named. And I'm stuck at many-to-many relationships.

First, there can be compound keys. Then, there can be MTM relationships with three or more tables. Then, an MTM intermediary table might have its own data apart from keys: some data common to all tables it ties together.

What I want is a method to programmatically detect that a table X is an intermediary table tying tables A and B, and that any non-key data it has must belong to both A and B (and if I change a common attribute from within A, it should affect the same attribute in B). Are there common algorithms to do that? Or at least to make guesses which are right in 80% of the cases (provided the DBA is sane)?

A: 

Theoretically, any table with multiple foreign keys is in essence a many-to-many relation, which makes your question trivial. I suspect that what you need is a heuristic for when to use MTM patterns (rather than standard classes) in the object model. In that case, examine the limitations of the patterns you chose.

For example, you can model a simple MTM relationship (two tables, no attributes) by having lists as attributes on both types of objects. However, lists will not be enough if you have additional data on the relationship itself. So only invoke this pattern for tables with two columns, both with foreign keys.
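
A minimal sketch of that last rule, assuming SQLite and its `PRAGMA table_info` / `PRAGMA foreign_key_list` introspection (other databases expose the same information through their own catalogs). The helper name `is_simple_link_table` and the sample schema are made up for illustration:

```python
import sqlite3

def is_simple_link_table(conn, table):
    """Return True if `table` has exactly two columns and both are foreign keys."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fk_columns = {row[3] for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
    return len(columns) == 2 and set(columns) == fk_columns

# Hypothetical schema: one plain link table, one link table with extra data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author_book (
        author_id INTEGER REFERENCES author(id),
        book_id INTEGER REFERENCES book(id)
    );
    CREATE TABLE review (
        author_id INTEGER REFERENCES author(id),
        book_id INTEGER REFERENCES book(id),
        stars INTEGER
    );
""")

simple = [t for t in ("author", "author_book", "review") if is_simple_link_table(conn, t)]
```

Here only `author_book` qualifies for the list-on-both-sides pattern; `review` carries an attribute of its own and would still need a separate class.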

Gintautas Miliauskas
You're right about using MTM pattern in the object model, but... Two columns == no compound keys, no attributes == no "common data" bit. Most advanced DBAs will stone me, because I aim at an ORM which doesn't get in their way! :-)
Yaroslav Fedevych
What complicates things is that there are cases when such an intermediary table might have its own primary key, or it comes out that the same related table can be reached in several distinct ways.
Yaroslav Fedevych
Indeed there are many problematic cases, but as I said, they can only be systematically uncovered by examining your MTM patterns at the object model level. You will never run into problems if you don't use any and just model every table using a separate class, but I assume you want some smarter techniques. Start with an analysis of the techniques rather than the relational model.
Gintautas Miliauskas
I want to know if there are some common techniques and have pointers to them. I'm aware there's no end all be all solution, there are bits scattered over, it's just damn hard to google those bits up (filtering out all Hibernate and RoR related cruft which is of no help most of the time). So maybe some fellow stackoverflowers might have their bookmarks handy just in case.
Yaroslav Fedevych
Then I'd suggest posting a different question: "What are common object-oriented approaches of modeling many-to-many relationships by an ORM?"
Gintautas Miliauskas
I doubt that deserves its own question. Wouldn't it be better to edit the summary, mentioning the old question wording in the body?
Yaroslav Fedevych
+1  A: 

If you have to ask, you shouldn't be doing this. I'm not saying that to be cruel, but Python already has several excellent ORMs that are well-tested and widely used. For example, SQLAlchemy supports the autoload=True option when defining tables, which makes it read the table definition - including all the stuff you're asking about - directly from the database. Why re-invent the wheel when someone else has already done 99.9% of the work?

My answer is to pick a Python ORM (such as SQLAlchemy) and add any "missing" functionality to that instead of starting from scratch. If it turns out to be a good idea, release your changes back to the main project so that everyone else can benefit from them. If it doesn't work out like you hoped, at least you'll already be using a common ORM that many other programmers can help you with.

Just Some Guy
SQLAlchemy, however, does not play well with Twisted, which is what I need, and it is complicated enough that I'm confident the missing bits won't be added anytime soon. I'm against the wheels, it's just that they're pushed on you when you expect them least.
Yaroslav Fedevych
I wasn't aware of that. Fair enough. But one thing you might still consider is that while SQLAlchemy's autoload stuff works well, almost everyone disables it and explicitly declares their classes manually. As it turns out, fetching and parsing that information every time you connect to the database takes ages and is a real performance hit. Unless you have a very small number of tables, or you can write your ORM to cache its results, you should be aware of the (potential) slowdown.
Just Some Guy
Contrary to your typical web application, a Twisted server is initialized once, and the introspection happens only upon its startup. And then it is up for, hopefully, weeks. It is foolish to optimize startup if its time is negligible in comparison to the total estimated runtime. Besides, not having to update models manually on every DBA's whim (except when the tests actually start failing) is priceless in terms of time spent syncing.
Yaroslav Fedevych
A: 

So far, I see only one technique covering more than two tables in a relation. A table X is assumed to be related to table Y if and only if X is linked to Y no more than one table away. That is:

"Zero tables away" means X contains the foreign key to Y. No big deal, that's how we detect many-to-ones.

"One table away" means there is a table Z which itself has a foreign key referencing table X (these are easy to find), and a foreign key referencing table Y.

This greatly reduces the set of traits to look for (we don't have to care whether the intermediary table has any other attributes), and it covers any number of tables tied together in an MTM relation.
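
The rule above can be sketched roughly as follows, assuming SQLite and its `PRAGMA foreign_key_list` introspection. The function names `foreign_key_targets` and `related_tables` and the sample schema are hypothetical, not part of any existing ORM:

```python
import sqlite3

def foreign_key_targets(conn):
    """Map each table name to the set of tables its foreign keys reference."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    # foreign_key_list rows: (id, seq, table, from, to, ...); index 2 is the target table
    return {
        t: {row[2] for row in conn.execute(f'PRAGMA foreign_key_list("{t}")')}
        for t in tables
    }

def related_tables(targets, x):
    """Return (direct, via): tables zero tables away from x, and tables one table away.

    `via` maps each related table Y to the set of intermediary tables Z
    that hold a foreign key to both x and Y.
    """
    direct = set(targets[x])            # "zero tables away": x references Y itself
    via = {}
    for z, refs in targets.items():
        if z != x and x in refs:        # Z references x ...
            for y in refs - {x}:        # ... and also references Y
                via.setdefault(y, set()).add(z)
    return direct, via

# Hypothetical schema: one intermediary table tying three tables together,
# with an extra attribute (grade) that the rule deliberately ignores.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY);
    CREATE TABLE course (id INTEGER PRIMARY KEY);
    CREATE TABLE room (id INTEGER PRIMARY KEY);
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        course_id INTEGER REFERENCES course(id),
        room_id INTEGER REFERENCES room(id),
        grade TEXT
    );
""")

targets = foreign_key_targets(conn)
direct, via = related_tables(targets, "student")
```

For `student`, `direct` is empty and `via` maps both `course` and `room` to `enrollment`. This sketch collapses several foreign keys to the same table into one, so an intermediary table that references the same table twice (a self-referential link) would need extra handling.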

If there are some interesting links or other methods, I'm willing to hear them.

Yaroslav Fedevych