I'm interested in some of the design behind Rails ActiveRecord, Doctrine for PHP (and similar ORMs).

  • How does an ORM manage to accomplish features like chained accessors and how deep are they typically expected to work?
  • How does an ORM construct queries internally?
  • How does an ORM manage the queries while sustaining the arbitrary nature of all that is expected of it?

Obviously this is an academic question, but all natures of answers are welcome!

(My language of choice is OO PHP5.3!)

A: 

Chained accessors aren't really a big deal: you return $this from the setter method. Boom, done, works at as many levels as you like.

chaos
Do you mean "getter" method? So you just store the state of whatever the method requested and bring back the original object to receive further tweaks? Say in PHP, what would be a good way to know that you're at the end of a chain, and thus should be returning the results instead of "this"?
Omega
Nope, I mean setter methods. Accessor chaining on *read* accessors, rather than *write* accessors, is nonsensical to me.
chaos
Interesting, I may simply be using the term wrong. Can you explain how come?
Omega
Well, we're talking about like how in jQuery you can do `$(obj).hide('fast').show('slow').cook('thoroughly')`, right? Those are all 'write-type' operations; we don't need to use their return value for anything but returning the base object, allowing us to do the chaining. You wouldn't do `$(obj).attr('src').hide()`, because that doesn't make any sense; `attr('src')` is returning a queried value that you're using, so chaining something off of it that operates on `$(obj)` isn't something that would ever make any sense.
chaos
Your reasoning is correct, although I'm talking strictly in terms of retrieving data. Not performing any modifications. My operations would be more like: $car->owners->addresses (array of Address instances) or $car->owners[1]->address (instance of Address) or $car->owners (array of Owner instances).
Omega
Oh, okay. I don't even identify that as chaining; it's just about having `$car->owners` be something you can do those operations on. You couldn't do exactly what you write in PHP; the first operation depends on `$car->owners` being an object and the latter two operations depend on `$car->owners` being an array. If it were me, I would make `$car->owners` a collection class and then do `$car->owners->addresses()`, `$car->owners->address(1)`, and `$car->owners->asArray()`.
chaos
+3  A: 

Chained method calls are orthogonal to the ORM question; they're used all over the place in OOP. A chainable method simply returns a reference to the current object, so another method can be called directly on the return value. In PHP:

class A {
 public function b() {
  // ...
  return $this;
 }

 public function c($param) {
  // ...
  return $this;
 }
}

$foo = new A();
$foo->b()->c('one');
// chaining is equivalent to
// $foo = $foo->b();
// $foo = $foo->c('one');

As for how queries are constructed, the ORM first needs to know the table structure, and there are a few ways to get it. In ActiveRecord-like ORMs there's code that examines the database's metadata. Most databases have some kind of SQL or SQL-like command to view this metadata (MySQL's DESCRIBE statement, Oracle's USER_TAB_COLUMNS view, etc.).

Some ORMs have you describe your database tables in a neutral language such as YAML. Others might infer a database structure from the way you've created your object models (I want to say Django does this, but it's been a while since I looked at it). Finally there's a hybrid approach, where either of the previous two techniques is used, but a separate tool is provided to automatically generate the YAML/etc. or class files.

Once the names and data types of a table's columns are known, it's pretty easy to programmatically write a SQL query that returns all the rows, or a specific set of rows that meet certain criteria.
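
To make that concrete, here is a minimal sketch of such a generator. The function name `buildSelect` and the `cars` table are made up for illustration; the `$columns` array stands in for whatever the ORM learned from the database metadata or a schema file. It only handles equality conditions joined with AND, which is roughly where the simple cases end.

```php
<?php
// Hypothetical sketch: builds a parameterized SELECT from known column names.
// $columns stands in for metadata read from the database or a schema file.
function buildSelect($table, array $columns, array $criteria = array())
{
    $where  = array();
    $params = array();
    foreach ($criteria as $column => $value) {
        // Only accept columns that the metadata says exist.
        if (!in_array($column, $columns, true)) {
            throw new InvalidArgumentException("Unknown column: $column");
        }
        $where[]  = "$column = ?";
        $params[] = $value;
    }

    $sql = 'SELECT ' . implode(', ', $columns) . ' FROM ' . $table;
    if ($where) {
        $sql .= ' WHERE ' . implode(' AND ', $where);
    }
    // Return the SQL and the bound values separately.
    return array($sql, $params);
}

list($sql, $params) = buildSelect(
    'cars',
    array('id', 'color', 'year'),
    array('color' => 'red', 'year' => 2009)
);
```

The returned `$sql` and `$params` are meant to be handed to something like `PDO::prepare()` and `PDOStatement::execute()`, so values never get concatenated into the query string.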

As for your last question,

How does an ORM manage the queries while sustaining the arbitrary nature of all that is expected of it?

I'd argue the answer is "not very well". Once you move beyond the one-table, one-object metaphor, each ORM has a different approach and philosophy as to how SQL queries should be used to model objects. In the abstract, though, it's just as simple as adding new methods that construct queries based on the assumptions of the ORM (e.g. Zend_Db_Table's "findManyToManyRowset" method).

Alan Storm
Thanks! That's a really great answer. So truthfully, an ORM derives its complexity from the features it offers on top of a rather simplistic query generator?
Omega
That's one way of putting it. I'd be careful with the term "ORM" though. There are a lot of people who don't consider things like ActiveRecord a true ORM, but rather a tool for implementing ORM. More here: http://kore-nordmann.de/blog/why_active_record_sucks.html
Alan Storm
That's definitely true. I tend to call mine just an OO data access wrapper or base OO data class. Thanks for the info :) I may have to make a more specific question about how the queries themselves are generated. Determining which columns to fetch is easy. But I'm sure there's some complex process to generate the WHERE clause and other conditions...
Omega
+1  A: 

I created a presentation on the topic of building a PHP DataMapper that might be interesting to you. It was recorded on video at the Oklahoma City Coworking Collaborative when I presented it there for the PHP user group:

Video: http://blip.tv/file/2249586/

Presentation Slides: http://www.slideshare.net/vlucas/building-data-mapper-php5-presentation

The presentation was basically the early concept of phpDataMapper, though a lot has changed since.

Hope they help you understand the inner workings of ORMs a bit better.

Vance Lucas
+2  A: 

How does an ORM manage to accomplish features like chained accessors and how deep are they typically expected to work?

Nobody seems to have answered this. I can quickly describe how Doctrine does this in PHP.

In Doctrine, none of the fields which you see on an object model are actually defined for that class. So in your example, $car->owners, there is no actual field called 'owners' defined in $car's class.

Instead, the ORM uses magic methods like __get and __set. So when you use an expression like $car->color, internally PHP calls Doctrine_Record#__get('color').

At this point the ORM is free to satisfy this in any way necessary. There are a lot of possible designs here. It can store these values in an array called $_values, for example, and then return $this->_values['color']. Doctrine in particular tracks not only the values for each record, but also its status relative to the persistence in the database.
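
Here is a rough sketch of that idea. The `Record` class, its state constants, and the `state()` method are invented for this example and are not Doctrine's actual implementation; the only real pieces are PHP's `__get` and `__set` magic methods.

```php
<?php
// Hypothetical base class; not Doctrine's actual implementation.
class Record
{
    const STATE_CLEAN = 'clean';
    const STATE_DIRTY = 'dirty';

    protected $_values = array();
    protected $_state  = self::STATE_CLEAN;

    public function __construct(array $values = array())
    {
        $this->_values = $values;
    }

    // PHP calls this when an undefined or inaccessible property is read.
    public function __get($name)
    {
        if (!array_key_exists($name, $this->_values)) {
            throw new OutOfBoundsException("Unknown field: $name");
        }
        return $this->_values[$name];
    }

    // PHP calls this when an undefined or inaccessible property is written.
    public function __set($name, $value)
    {
        $this->_values[$name] = $value;
        $this->_state = self::STATE_DIRTY; // differs from the database now
    }

    public function state()
    {
        return $this->_state;
    }
}

$car = new Record(array('color' => 'red'));
$color = $car->color;   // goes through __get('color')
$car->color = 'blue';   // goes through __set('color', 'blue')
```

After the assignment the record knows it is dirty, which is the kind of bookkeeping a save() method would use to decide whether an UPDATE is needed.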

One example of this that is not intuitive is with Doctrine's relations. When you get a reference to $car, it has a relationship to the People table that is called 'owners'. So the data for $car->owners is actually stored in a separate table from the data for $car itself. So the ORM has two choices:

  1. Each time you load a $car, the ORM automatically joins all related tables and populates that information into the object. Now when you do $car->owners, that data is already there. This method is slow, however, because objects may have many relationships, and those relationships may have relationships themselves. So you'd be adding a lot of joins and not necessarily even using that information.
  2. Each time you load a $car, the ORM notices which fields are loaded from the Car table and it populates them, but any fields which come from related tables are not loaded. Instead, some metadata is attached to those fields to mark them as being 'not loaded, but available'. Now when you write the expression $car->owners, the ORM sees that the 'owners' relationship has not been loaded, and it issues a separate query to get that information, add it into the object, and then return that data. This all happens transparently without you needing to realize it.
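
Option 2 can be sketched in a few lines. Everything here is hypothetical (the `LazyRecord` class, the loader closures, the `$queryLog` array); a real ORM would build and run a SQL query where the closure returns a hard-coded array.

```php
<?php
// Hypothetical sketch of option 2 (lazy loading); not Doctrine's real code.
class LazyRecord
{
    public $queryLog = array();   // records each simulated query
    private $_values;             // fields that are already loaded
    private $_loaders;            // relation name => callable that fetches it

    public function __construct(array $values, array $loaders)
    {
        $this->_values  = $values;
        $this->_loaders = $loaders;
    }

    public function __get($name)
    {
        // "Not loaded, but available": fetch once, then keep the result.
        if (!array_key_exists($name, $this->_values) && isset($this->_loaders[$name])) {
            $this->queryLog[] = "load $name";
            $this->_values[$name] = call_user_func($this->_loaders[$name]);
        }
        if (!array_key_exists($name, $this->_values)) {
            throw new OutOfBoundsException("Unknown field: $name");
        }
        return $this->_values[$name];
    }
}

$car = new LazyRecord(
    array('color' => 'red'),
    array('owners' => function () {
        // Stands in for a query against the related table.
        return array('Alice', 'Bob');
    })
);

$car->color;    // no query: the column was loaded with the car
$car->owners;   // first access runs the loader
$car->owners;   // second access is served from the stored value
```

With one relation the log ends up with a single entry; with five relations accessed on the same object it would have five, which is the cost discussed next.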

Of course, Doctrine uses #2, since #1 becomes unwieldy for any real production site with moderate complexity. But it also has side-effects. If you are using several relations on $car, then Doctrine will load each one separately, as you access it. So you end up running 5-6 queries when maybe only 1 was required.

Doctrine allows you to optimize this situation by using Doctrine Query Language. You tell DQL that you want to load a car object, but also join it to its owners, manufacturer, titles, liens, etc. and it will load all of that data into objects.
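
DQL itself needs Doctrine and a database, so as a library-free illustration here is only the last step: turning the flat rows of one joined query into nested data. The function `hydrateCars`, the column aliases, and the sample rows are all made up, and this is a simplification of what a real hydrator does.

```php
<?php
// Hypothetical hydration step. The rows mimic what a single query such as
//   SELECT c.id AS car_id, c.color, o.id AS owner_id, o.name
//   FROM cars c LEFT JOIN people o ON o.car_id = c.id
// might return (table and column names are assumptions).
function hydrateCars(array $rows)
{
    $cars = array();
    foreach ($rows as $row) {
        $carId = $row['car_id'];
        // The car's columns repeat on every joined row; create it once.
        if (!isset($cars[$carId])) {
            $cars[$carId] = array(
                'id'     => $carId,
                'color'  => $row['color'],
                'owners' => array(),
            );
        }
        // A LEFT JOIN yields NULL owner columns for a car with no owners.
        if ($row['owner_id'] !== null) {
            $cars[$carId]['owners'][] = array(
                'id'   => $row['owner_id'],
                'name' => $row['name'],
            );
        }
    }
    return array_values($cars);
}

$rows = array(
    array('car_id' => 1, 'color' => 'red',  'owner_id' => 10,   'name' => 'Alice'),
    array('car_id' => 1, 'color' => 'red',  'owner_id' => 11,   'name' => 'Bob'),
    array('car_id' => 2, 'color' => 'blue', 'owner_id' => null, 'name' => null),
);
$cars = hydrateCars($rows);
```

One query comes in, two cars with their owners come out, and no further queries are needed when the owners are accessed.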

Whew! Long response. Basically, though, you've gotten at the heart of "What is the purpose of an ORM?" and "Why should we use one?" The ORM allows us to continue thinking in object mode at most times, but the abstraction is not perfect and the leaks in the abstraction tend to come out as performance penalties.

mehaase