views:

666

answers:

13

When programming in PHP I always try to create meaningful 'models' (classes) that correspond to tables in the database. I often encounter the following problem:

Assuming that I've created a database with two tables: authors and blogs, which both have a corresponding model in my application.

Let's say I want to print all the blogs along with information about the author, I'd have to do something like this:

<?php
foreach ($app->getBlogs() as $blog) {
  echo "<h1>" . $blog->title . "</h1>";
  echo "Written by" . $blog->getAuthor()->name . "</p>";
  // ... et cetera
}
?>

The problem is that the application will now fire 1 SQL query to get all the blog items, and [number of blog items] queries to get the information for every author. Having used straightforward SQL I could have retrieved this information using a simple query:

SELECT * FROM blogs
JOIN authors ON authors.id = blogs.author

What's the best way of dealing with such issues: developing an object-oriented application without executing too many useless SQL queries.

+1  A: 

Whatever solution you use, ORM or not, should be capable if issuing a single select in this case, and also it should be capable of selecting only the necessary columns. Then from that join it should be capable of populating the authors objects with corresponding lists of blogs per author. Having to issue multiple SQL is wasteful.

Otávio Décio
A: 

This is the classic ORM problem. Many many schools of thought. Not sure of php specifics, but there are several strategies to resolve this 'impedence mismatch.' Google orm.

n8wrl
+2  A: 

Things like this are exactly what creating your own data layer should solve for you. In your model for your blogs, there should be a fuction like getBlogList() that will return the blog titles and author's name all in one query.

TheCodeMonk
Some sort of solution I've thought of myself but the point is: I don't know in advance if the author-object is ever needed, $blog->getAuthor() might never be called but I would have retrieved the information nonetheless with the provided solution.
thijs
What you're asking is impossible, then. You're basically asking for code that somehow knows in advance what you're going to use. You have to hardcode it yourself; you have to either choose lazy loading, which is efficient if you want only a few Authors but not all, or batch loading, as described above, which is efficient if you need many Authors. You can't have both.
ryeguy
The ORM tool we use here at Inntec would have generated classes for (a) pulling down a single Author and another for (b) pulling down a collection of Authors. So I could easily say "get me all authors that have the letter 'm' in the first name" or almost anything else. However, both (a) and (b) pull down the entire Author, so all fields go across the wire. If you wanted just one field for a special case, or if you needed an unusual join, you would have to write that yourself. Which is fine. I use the gen'd code 90% of the time.
Brian MacKay
ryeguy, it's not impossible. If you use something like LinqToSql you can get this data just using Linq to create the queries everywhere, or as you need the data in a special way, you can create a function in a layer on top of the LinqToSql classes that would create the query in a more efficient manor. If he is describing a situation where, at runtime, he has no idea what queries will be needed, then I would like to know what he is doing that is creating that kind of situation, because that sounds like a bad design.
TheCodeMonk
A: 

one way to do this is to create a view qith your join in it and map view results to another class which contains data for blog and author.

Mladen Prajdic
+1  A: 

Propel is an example of a PHP ORM which can cope with this. I'm sure Doctrine must be able to do so, though I've never looked at it.

Why reinvent the wheel?

Colin Fine
+2  A: 

IMO, I think you should just write another class that will encapsulate what you have. Does it always make sense for a blog to have an author? Does every author have a blog? Can an author have multiple blogs? Think about these issues, then design a class that will encapsulate this. Remember, typical database schemas are not OO... they are relational. Yes, they are close, but there are subtle differences.

So if an author can have multiple blogs, you can have a key to multivalued class of some sort (with the key based on author id) and you can initialize or load this class with one SQL call. Just some things to think about.

Polaris878
+3  A: 

Unless you know for a fact that the inefficient sql operations will have no real impact, such as the number of redundant iterations or the rows affected will always be small (e.g. iterating an operation over the number of children in a family, which, short of very rare cases like the Duggars, can be relied on to be less than 10), I have always favored efficiency of the relational query over beauty of the OO code.

Although ugly OO code can make maintenance a pain, inefficient data access can bring a system to its knees, usually when you're on vacation or trying to sleep. And most of the time, you can find a good compromise that makes the most efficient SQL operations have a reasonably "objecty" interface. It may cost you a bit more time when it comes to refactoring or adding features if your object model isn't beautiful, but it is costing your customers time every single time they push that button (or money in terms of bigger hardware to run the app - never a good method of optimization), and the man hours spent using the app should far outstrip the man hours put into developing it (one would hope).

As far as your concerns about whether an interface will be needed (forcing you to figure out all possible consumption patterns), I have dealt with this by doing all my data modification via stored procedures, but allowing data access to go straight against the tables & views by giving all users select privileges only. This is a semi-controversial position, as many people would like to lock out all data access operations from downstream consumers in the name of ensuring that all sql being run is to their standards. But new ways of looking at the data are always coming up, and if you have to add a new stored proc, update your core class libraries and update your client code every time someone wants to implement a new feature, deployment and qualification can grow to be a real burden - far more than having to deal with an object model that doesn't fit a religious ideal. And it's a lot easier to implement a code inspection process that verifies that new select statements written by downstream consumers are kosher.

Steve Broberg
A: 

Honestly, just create a method on your Blog class called getBlogsWithAuthors() and then run

SELECT  *
FROM    blogs
JOIN    authors 
        ON authors.id = blogs.author

I know it may seem like a pain to write stuff like this for each model class, but there is really no other way. You could make it a bit more dynamic, however:

//this is a method of a model class. 
//Assume $this->table is the table name of the model (ie, Blog)
public function getWith($joinTable, $pivot1, $pivot2)
{
    $sql="SELECT    *
            FROM    {$this->table}
            JOIN    $joinTable 
                    ON $pivot1 = $pivot2";

    return executeQuery($sql);    
}

$blog=new Blog();
$result=$blog->getWith('authors', 'authors.id', 'blogs.author');
[play with results here]
ryeguy
To be honest, this solutions seems a little hacky to me. The getWith() method isn't very descriptive and does not conform the encapsulation advocated by object-oriented design because other objects have to know the inner workings of the Blog class (column names, join candidates).
thijs
Well the idea was that this could be in your base model class and could be adapted without any hardcoding to any model. If you want something more elegant, why are you coding it manually? Go for propel or doctrine. You're just reinventing the wheel otherwise.
ryeguy
I think RyeGuy is right on this one. ;) IF his solution is hackish, it's because it's too minimalistic. I don't like passing around database elements as text. But what's the next step? Making a seperate class for each entity with more tooling (enums, etc) for columns. But that's time consuming and brittle (think about database changes, etc), so what's next? Code generation solves a ton. And if there's code generation, gee whiz we could do so much more. And that is how my ORM tool was born. It literally started in ASP Classic, using MS Access to generate classes. :)
Brian MacKay
+2  A: 

I'm a huge ORM advocate, and here's my weigh-in:

It's okay to trade an inperceptible amount of application performance for a ton of developer performance. Servers are extremely powerful these days and that extra iron gives us some new flexibility.

That said, if you do something silly that obliterates the user experience by bringing the server to its knees, that's no longer okay. If you had a million authors in your example, pulling them all down along with all of their fields and iterating through them would be unwise. If you had only 20 authors, then it's no big deal.

In the case of huge datasets and expensive batch operations, even as an ORM guy, I have to optimize and write special sprocs or SQL statements just for that case. And I have to be careful not to write my code in such a way that I hammer the database would it would be better to use a caching pattern where I pull down a large data set and then work off of that.

This is a big on-going debate, but to me it's just a matter of understanding that you can't solve every problem with a single tool.

Brian MacKay
Thanks for your comment. Will there ever be a trade-off between performance and clean design? Isn't there a way to write both reusable and clean object-oriented code and execute efficient queries selecting the right data?
thijs
thijs: I think you can have your cake and eat it too. If you have to write a one-off sql statement, just do your best to encapsulate that data access code in a class somewhere using a pattern that is consistent throughout your application. It doesn't have to be hideous. ;) I've got projects where there's a class called 'sprocs' that contains code for getting into all my stored procedures. So I can do something like: myData = Sprocs.AuthorNames, and it's very consistent. Many ORM tools auto-generate that kind of stuff for you, which is nice.
Brian MacKay
How do you know you have 20 authors when the users can add more at any time? Do you put CHECK constraints on the number of rows in the table or what? This kind of capacity planning is a disaster waiting to happen.
wqw
Well of course you have to use common sense. That nothing to do with the ORM vs. Other Solutions debate. To answer your question, if I didn't know for sure there there will never be thousands of Authors, in this case I would maybe choose a different way to visualize the data or implement paging. My ORM does that. I can say Authors.Select.AuthorActive(True); Authors.ConfigurePaging(PageIndex, PageSize); Authors.Load;
Brian MacKay
I would also add that in the types of line-of-business apps that we create as a consulting firm, I usually have a good idea of what the table sizes are expected to look like and how much scale the customer is paying us to support. They usually are not willing to pay us to support extreme levels of scale, and there's an understanding that if something unexpected happens where the parameters change by 1000%, they will be paying us more, and that's what we call a high quality problem. The IT world does not tend to work this way, so that's a common place where we get our wires crossed.
Brian MacKay
+1  A: 

I used some other bindings.. Example:

<?php
$blogs = $app->getBlogs();
$blogs->getAuthor();
foreach ($blogs as $blog) {
  echo "<h1>" . $blog->title . "</h1>";
  echo "Written by" . $blog->getAuthor()->name . "</p>";
  // ... et cetera
}
?>

-> getAuthor() call on $blog queries the DB only once, and using special object for array, the getAuthor() call is called on each (but, is somehow optimized to run only as one query).

Yossarian
A: 

You could always use memcached as an intermediate layer. Each query would be purely RAM-based, which means you can run as many as you want.

Joeri Sebrechts
+1  A: 

You've already answered the question:

Having used straightforward SQL I could have retrieved this information using a simple query

You have a choice between SQL that fetches only blog posts and SQL that fetches blog posts and authors. Likewise you have a choice between some PHP code that fetches just blog posts or PHP code that fetches blog posts and authors. You have to make a choice about your PHP code, just as you have to make a choice about your SQL.

There are plenty of examples above which demonstrate how this would work in practice. The recommendation to use Doctrine or Propel is also a good one.

Rob Knight
A: 

Consider Command/Query Separation as described by Greg Young and Martin Fowler. Your query model can have Blog and Author de-normalized into a single table optimized for retrieving DTOs for your presentation layer.

Greg Young has a great presentation on CQS on InfoQ.

pnschofield