Bill Karwin has a blog post called "Why Should You Use An ORM?" which is being discussed on Reddit and I was confused about a couple of points.

In it he says in the comments section:

OODBMS and ORM works only on objects that we've instantiated in the application layer. I.e. there's no way to do a query like this:

UPDATE Bugs SET status = 'CLOSED' WHERE status = 'OPEN';

To do this in an ORM or an OODBMS, you'd have to fetch all bugs that match the criteria and instantiate objects for them. Then you could set the attribute and save the objects back to the database one by one. This is expensive and certainly requires more code than the equivalent SQL operation shown above.

This illustrates an advantage of a language like SQL that treats sets as a first-class data type. The OO paradigm cannot substitute for the relational paradigm in all cases. There are some ordinary operations that SQL can do much better.
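
To make the comparison in the quote concrete, here is a minimal sketch of both approaches, assuming SQLite through Python's standard `sqlite3` module. The `Bug` class and the helper functions `make_db`, `close_open_bugs_orm_style`, and `close_open_bugs_set_based` are hypothetical names invented for illustration; they are not taken from any real ORM.

```python
import sqlite3


def make_db():
    """Create an in-memory Bugs table with three OPEN bugs and one CLOSED bug."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Bugs (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO Bugs (status) VALUES (?)",
        [("OPEN",), ("OPEN",), ("CLOSED",), ("OPEN",)],
    )
    return conn


class Bug:
    """Hypothetical active-record style object: one instance per row."""

    def __init__(self, conn, bug_id, status):
        self.conn = conn
        self.id = bug_id
        self.status = status

    def save(self):
        # Each save writes exactly one row back to the database.
        self.conn.execute(
            "UPDATE Bugs SET status = ? WHERE id = ?", (self.status, self.id)
        )


def close_open_bugs_orm_style(conn):
    """Fetch matching rows, instantiate objects, change them, save one by one."""
    rows = conn.execute(
        "SELECT id, status FROM Bugs WHERE status = 'OPEN'"
    ).fetchall()
    statements = 1  # the SELECT
    for bug_id, status in rows:
        bug = Bug(conn, bug_id, status)
        bug.status = "CLOSED"
        bug.save()
        statements += 1  # one UPDATE per object
    return statements


def close_open_bugs_set_based(conn):
    """Let the database apply the change to the whole set in one statement."""
    conn.execute("UPDATE Bugs SET status = 'CLOSED' WHERE status = 'OPEN'")
    return 1
```

Both functions leave the table in the same state; the difference is that the object-style version issues one statement per matching row plus the initial fetch, while the set-based version issues one statement regardless of how many rows match.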

I bolded the part where he says you have to instantiate objects for these bugs when you use an ORM because that's the part I'm confused about.

My question is why do you have to? Okay, object-oriented is one thing and relational is another. But is it really true that they are so different that there is no way to represent an object so that it can be understood by the relational database? For example, I'm thinking about how you can serialize an object and then it gets written into a file-storable format. Couldn't you use a format like that to transfer the object to a relational database?

A: 

ORMs map the state of objects to an equivalent state in the database. So, if you want to change the state of something in the database using an ORM, the only mechanism available to you is to first manipulate the objects that represent the database rows, and then save their state.

I'm not sure what you mean by:

I'm thinking about how you can serialize an object and then it gets written into a file-storable format. Couldn't you use a format like that to transfer the object to a relational database?

Do you mean serializing an object into a structure that you could in principle store in a flat file (e.g. an XML format), and then storing that data in the database? If so, yes, you could. The challenge comes when you want to search for that information. Say you wanted to find all "closed" bugs: you would have to read every single bug, deserialize it, and examine its status to see whether it should be included in the list.
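
A rough sketch of that search problem, assuming SQLite through Python's `sqlite3` module and using JSON instead of XML to keep it short. The table names `BugBlobs` and `Bugs`, the sample data, and both helper functions are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One table stores each bug as an opaque serialized blob ...
conn.execute("CREATE TABLE BugBlobs (id INTEGER PRIMARY KEY, body TEXT)")
# ... the other maps each attribute to its own column.
conn.execute("CREATE TABLE Bugs (id INTEGER PRIMARY KEY, title TEXT, status TEXT)")

sample_bugs = [
    {"title": "Crash on login", "status": "closed"},
    {"title": "Typo in footer", "status": "open"},
    {"title": "Slow search", "status": "closed"},
]
for bug in sample_bugs:
    conn.execute("INSERT INTO BugBlobs (body) VALUES (?)", (json.dumps(bug),))
    conn.execute(
        "INSERT INTO Bugs (title, status) VALUES (?, ?)",
        (bug["title"], bug["status"]),
    )


def closed_bugs_from_blobs(conn):
    """Read every blob, deserialize it, and filter in application code."""
    closed = []
    for (body,) in conn.execute("SELECT body FROM BugBlobs ORDER BY id"):
        bug = json.loads(body)
        if bug["status"] == "closed":
            closed.append(bug["title"])
    return closed


def closed_bugs_from_columns(conn):
    """Let the database do the filtering on a real column."""
    rows = conn.execute(
        "SELECT title FROM Bugs WHERE status = 'closed' ORDER BY id"
    )
    return [title for (title,) in rows]
```

Both functions return the same titles, but the blob version has to load and parse every stored bug, while the column version lets the database filter (and, with an index on `status`, avoid scanning all rows).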

Eric J.
But is it really true that there's no way except with an O/R mapper to translate between the object state and the relational state? I just don't get why that is.
peanutz
I guess you could say that's a key part of the definition of an O/R mapper. Something that can translate between object state and relational state is "mapping" one state to the other, making it by definition an O/R mapper.
Eric J.
Right, I'm just trying to understand what it is about these two formats that is so incompatible that it needs a translator in between. Just can't picture it. That's why I mentioned the serialization example - it seems like there should be a way to translate more directly between the two.
peanutz
Serialization is another way to map the data to long-term storage. For some applications it's perfectly fine to write an object's XML representation to a file and be done with it. If you use files, though, you don't have a good way to find individual objects based on criteria (walk through in your head how you would find all "closed" bugs if you were using XML files), and you have no good way to create relations between the individual objects/XML files. Those are things that relational DBs are good at, which is why it often makes sense to persist objects to them instead of to XML.
Eric J.
A: 

The fundamental purpose of an ORM is to convert data from one representation to another. The tone of your quote is that SQL is better suited for batch work, which is true, since the ORM would have to convert the tables of relational data to object graphs and then back to tables.

A (very loose) analogy is having a vat of pulp that you want to dye red. If the vat represents the SQL database, you'd just dump the dye in and give it a stir. Using an ORM would be like converting the pulp into paper, dyeing the individual sheets, then re-pulping the (now colored) paper to put it back in the vat.

STW
Thanks, I think I see the analogy. I just don't see why it's true. I'm just not getting why the two formats - set + object - are so incompatible that you need to make this dramatic move of loading relational data into objects before operating on them.
peanutz
Because in the OO paradigm, we can call methods *only* on object instances (or static methods on classes, but I'd claim a static method for `update()` is not much of an ORM).
Bill Karwin
A: 

is [there] no way to represent an object so that it can be understood by the relational database?

You've missed the point of my statements. I didn't mean that one couldn't store an object in a relational database. I meant that the OO paradigm assumes you have an instance of that object in application space. That is, you can call methods and access properties of an object:

$bug->status = 'CLOSED';
$bug->save();

But in any ORM I've seen*, you can't operate on an object instance without first fetching it from the database. Nor can you operate on whole sets of rows at a time, as you can with SQL.

It would be interesting to see an ORM package that had an object type mapping to a set of data. Then when you change an attribute, it applies to all rows in that set. I haven't seen any ORM attempt to do this.

It would be very challenging, because of concurrency issues. Does the set include rows that were in that set when you instantiated the object, or when you execute the change, or when you save the changes? If it supports all these permutations as options, then it starts to get so complex to use that one might rightly think that it represents no actual improvement over using SQL directly.
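
As a thought experiment only, here is a minimal sketch of what such a set-mapped object might look like, assuming SQLite through Python's `sqlite3` module. `BugSet`, `ALLOWED_COLUMNS`, and `rows_changed` are hypothetical names, not part of any existing ORM. The sketch picks just one of the permutations above: the set's membership is decided at the moment the change executes, with no snapshot and no locking.

```python
import sqlite3

# Whitelist of column names, because they are interpolated into the SQL text.
ALLOWED_COLUMNS = {"status", "title"}


class BugSet:
    """Hypothetical set-mapped object: stands for all Bugs rows matching the criteria."""

    def __init__(self, conn, **criteria):
        unknown = set(criteria) - ALLOWED_COLUMNS
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        # Bypass our own __setattr__ when storing internal state.
        object.__setattr__(self, "_conn", conn)
        object.__setattr__(self, "_criteria", dict(criteria))
        object.__setattr__(self, "rows_changed", 0)

    def __setattr__(self, column, value):
        """Assigning an attribute updates that column on every row in the set."""
        if column not in ALLOWED_COLUMNS:
            raise AttributeError(f"no such column: {column}")
        sql = f"UPDATE Bugs SET {column} = ?"
        if self._criteria:
            sql += " WHERE " + " AND ".join(f"{c} = ?" for c in self._criteria)
        # Membership is evaluated here, when the change executes.
        cur = self._conn.execute(sql, (value, *self._criteria.values()))
        object.__setattr__(self, "rows_changed", cur.rowcount)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bugs (id INTEGER PRIMARY KEY, title TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO Bugs (title, status) VALUES (?, ?)",
    [("a", "OPEN"), ("b", "OPEN"), ("c", "CLOSED")],
)

open_bugs = BugSet(conn, status="OPEN")
open_bugs.status = "CLOSED"  # one UPDATE statement, no rows fetched
```

Note that after the assignment, `open_bugs` still describes "rows where status is OPEN", which is now an empty set. Even this toy version has to take a position on what the set means over time, and it covers only equality filters on a single table.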

Re your comment: It's not that sets and objects are incompatible. A set can be an object (Java even has classes for Collection and Set). But the OO paradigm assumes operations apply to one object instance, whereas relational operators always apply to sets (a set of one row is still a set). And in reality, ORM packages that exist today make the same assumption, that one can change only one instance of a row at a time, and you must have fetched that row before you can change it.

It's possible in theory to expand the capabilities of an ORM to work on sets -- but AFAIK no one has tried to do this. My claim is that an ORM that could do all of what relational operators can do would be much worse to use than SQL.

* I am excluding SQL-like pseudolanguages such as HQL. They happen to be part of an ORM package (Hibernate, in the case of HQL), but the pseudolanguage itself doesn't qualify as an ORM.

Bill Karwin
Thanks for responding, Bill. I'm not familiar with HQL. Maybe that's a case of a language that straddles the divide between the object space and the relational space. But I guess my question is more about why the formats are so incompatible i.e. set vs object. It's like you have a key but it doesn't fit the lock. I guess I just don't understand why it doesn't fit to such an extent. That's why I didn't ask on your blog post - I think my question is really more rudimentary than you're giving it credit for.
peanutz
"It would be interesting to see an ORM package that had an object type mapping to a set of data. Then when you change an attribute, it applies to all rows in that set" -> Exactly! You go on to explain why this would be difficult because of concurrency but wouldn't optimistic locking on a per-object basis be a simple solution?
peanutz
Re optimistic locking, okay so if one row in the set you're changing is already locked and rejects the operation, does that block the change for all other rows in the set? Or is it configurable? Configurable per application, per transaction, per row? The point is, it's not a simple problem to solve *generally*.
Bill Karwin
LinqToSql allows you to do exactly this (though you would probably claim it's not a true ORM) - http://weblogs.asp.net/scottgu/archive/2007/05/19/using-linq-to-sql-part-1.aspx
Dan Diplo
@Bill - What I don't know is whether applying optimistic locking to sets would be any more or less difficult than applying it to objects. That's really where the question was coming from. Perhaps it comes down to your statement: "But the OO paradigm assumes operations apply to one object instance, whereas relational operators always apply to sets (a set of one row is still a set)". I don't know enough about ORMs *in general* to know whether that's true, but if it is, then that would be a major limitation.
peanutz
The issue of optimistic locking is tangential. My point is that in any ORM, you have to fetch a row into the ORM before you can apply a change to it. In SQL that's not true.
Bill Karwin
@Peanutz - I wasn't actually referring to locking, but to the fact that LinqToSql will allow you to efficiently update a property on a collection of objects. You could write something like "from b in BugsCollection where b.status = 'closed' set b.status = 'open'" and this would update the underlying database using just one SQL statement.
Dan Diplo
@Bill - But since, as you mentioned, there are features in Java like Collection and Set that let you operate on a "set of objects", in this case objects that represent rows in the database, that would seem to suggest that if you can (a) find a format that allows you to persist from object to set; (b) operate on more than one object at a time using something like a Collection, and (c) use something like optimistic locking to avoid concurrency issues, then you are close to being able to replicate your relational features without leaving the object-oriented world.
peanutz
Okay, it sounds like the LinqToSql pseudolanguage can achieve this because it describes a set using a syntax very similar to SQL. One of these days I'd like to try LinqToSql and study exactly what SQL is generated internally. I can watch the query logs on the RDBMS side to be sure.
Bill Karwin