So I'm implementing the repository pattern in an application and came across two "issues" in my understanding of the pattern:

  1. Querying - I've read responses that IQueryable should not be used when using repositories. However, it's obvious that you'd want to so that you are not returning a complete List of objects each time you call a method. Should it be implemented? If I have an IEnumerable method called List, what's the general "best practice" for an IQueryable? What parameters should/shouldn't it have?

  2. Scalar values - What's the best way (using the Repository pattern) to return a single, scalar value without having to return the entire record? From a performance standpoint, wouldn't it be more efficient to return just a single scalar value over an entire row?

+1  A: 

Regarding 1: As far as I can see, the problem is not the IQueryable being returned from a repository as such. The point of a repository is that it should look like an object that contains all your data, so you can ask the repository for the data. If more than one object needs the same data, the job of the repository is to cache it, so that two clients of the repository get the same instances: if one client changes a property, the other will see the change, because both point to the same instance.

If the repository were itself the LINQ provider, that would fit right in. But mostly people just let the LINQ to SQL provider's IQueryable pass straight through, which in effect bypasses the responsibility of the repository. So the repository isn't a repository at all, at least according to my understanding and usage of the pattern.
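The caching responsibility described above could be sketched roughly like this. It is only an illustration: the Customer class, the CachingCustomerRepository name and the load delegate (standing in for the real data access) are all assumptions, not something from the question.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical repository that hands out one shared instance per id.
public class CachingCustomerRepository
{
    private readonly Dictionary<int, Customer> _cache = new Dictionary<int, Customer>();
    private readonly Func<int, Customer> _load; // stands in for the real data access

    public CachingCustomerRepository(Func<int, Customer> load)
    {
        _load = load;
    }

    public Customer GetById(int id)
    {
        Customer customer;
        if (!_cache.TryGetValue(id, out customer))
        {
            customer = _load(id);   // only go to the store on a cache miss
            _cache[id] = customer;  // later callers receive this same instance
        }
        return customer;
    }
}
```

Two clients calling GetById with the same id get the same reference, so a property change made by one is visible to the other.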

Regarding 2: Naturally, it is more efficient to return a single value from the database than the entire record. But with the repository pattern you wouldn't be returning records at all; you would be returning business objects. So the application logic should not concern itself with fields, but with domain objects.

But how much more efficient is it to return a single value than a complete domain object? You will probably not be able to measure the difference if your database schema is reasonably well designed.

It is a lot more important to have clean, easy-to-understand code than to make microscopic performance optimizations up front.

Pete
+10  A: 

Strictly speaking, a Repository offers collection semantics for getting/putting domain objects. It provides an abstraction around your materialization implementation (ORM, hand-rolled, mock) so that consumers of the domain objects are decoupled from those details. In practice, a Repository usually abstracts access to entities, i.e., domain objects with identity, and usually a persistent life-cycle (in the DDD flavor, a Repository provides access to Aggregate Roots).

A minimal interface for a repository is as follows:

void Add(T entity);
void Remove(T entity);
T GetById(object id);
IEnumerable<T> Find(Specification spec);

Although you'll see naming differences and the addition of Save/SaveOrUpdate semantics, the above is the 'pure' idea. You get the ICollection Add/Remove members plus some finders. If you don't use IQueryable, you'll also see finder methods on the repository like:

FindCustomersHavingOrders();
FindCustomersHavingPremiumStatus();
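For comparison, here is a rough sketch of how the Find(Specification) member could replace a named finder such as FindCustomersHavingOrders. The generic Specification<T> shape, the HasOrders class and the in-memory repository are assumptions made for the example; a real implementation would translate the specification into a query against the ORM.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public int OrderCount { get; set; }
}

// Assumed shape of a specification: a named, reusable predicate.
public abstract class Specification<T>
{
    public abstract bool IsSatisfiedBy(T candidate);
}

// Hypothetical specification replacing FindCustomersHavingOrders().
public class HasOrders : Specification<Customer>
{
    public override bool IsSatisfiedBy(Customer c) { return c.OrderCount > 0; }
}

// In-memory stand-in for an ORM-backed repository.
public class InMemoryRepository<T>
{
    private readonly List<T> _items = new List<T>();

    public void Add(T entity) { _items.Add(entity); }
    public void Remove(T entity) { _items.Remove(entity); }

    public IEnumerable<T> Find(Specification<T> spec)
    {
        // ToList() materializes the matches before they leave the repository.
        return _items.Where(spec.IsSatisfiedBy).ToList();
    }
}
```

The repository interface stays the same size as query requirements grow; each new query becomes a new specification class instead of a new finder method.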

There are two related problems with using IQueryable in this context. The first is the potential to leak implementation details to the client in the form of the domain object's relationships, i.e., violations of the Law of Demeter. The second is that the repository acquires finding responsibilities that might not belong to the domain object repository proper, e.g., finding projections that are less about the requested domain object than the related data.

Additionally, using IQueryable 'breaks' the pattern: A Repository with IQueryable may or may not provide access to 'domain objects'. IQueryable gives the client a lot of options about what will be materialized when the query is finally executed. This is the main thrust of the debate about using IQueryable.
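To make both objections concrete, here is a small sketch, assuming hypothetical Customer and Order classes and using AsQueryable() on a list as a stand-in for an ORM's query provider. The client reaches through Customer into its Orders and ends up with plain strings, so the repository never hands back a domain object.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public decimal Total { get; set; }
}

public class Customer
{
    public string Name { get; set; }
    public List<Order> Orders { get; set; } = new List<Order>();
}

public class QueryableCustomerRepository
{
    private readonly List<Customer> _store;

    public QueryableCustomerRepository(List<Customer> store) { _store = store; }

    // AsQueryable() stands in for an ORM's IQueryable provider (assumption).
    public IQueryable<Customer> Find() { return _store.AsQueryable(); }
}

public static class ReportClient
{
    // The client, not the repository, decides what gets materialized:
    // it navigates Customer -> Orders and projects to a list of names.
    public static List<string> NamesOfBigSpenders(QueryableCustomerRepository repo)
    {
        return repo.Find()
            .Where(c => c.Orders.Sum(o => o.Total) > 100m)
            .Select(c => c.Name)
            .ToList();
    }
}
```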

Regarding scalar values, you shouldn't be using a repository to return scalar values. If you need a scalar, you would typically get this from the entity itself. If this sounds inefficient, it is, but you might not notice, depending on your load characteristics/requirements. In cases where you need alternate views of a domain object, because of performance reasons or because you need to merge data from many domain objects, you have two options.

1) Use the entity's repository to find the specified entities and project/map to a flattened view.

2) Create a finder interface dedicated to returning a new domain type that encapsulates the flattened view you need. This wouldn't be a Repository because there would be no Collection semantics, but it might use existing repositories under the covers.
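A minimal sketch of the second option, which uses the first internally: a dedicated finder that returns a flattened view and relies on the entity's repository under the covers. All type names here (CustomerSummary, ICustomerSummaryFinder and so on) are made up for the illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; }
    public List<decimal> OrderTotals { get; set; } = new List<decimal>();
}

// Flattened view type; it has no identity and no collection semantics.
public class CustomerSummary
{
    public string Name { get; set; }
    public decimal TotalSpent { get; set; }
}

public interface ICustomerRepository
{
    IEnumerable<Customer> FindAll();
}

public interface ICustomerSummaryFinder
{
    IEnumerable<CustomerSummary> FindSummaries();
}

// In-memory repository so the sketch runs without a database.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _items;

    public InMemoryCustomerRepository(List<Customer> items) { _items = items; }

    public IEnumerable<Customer> FindAll() { return _items; }
}

public class CustomerSummaryFinder : ICustomerSummaryFinder
{
    private readonly ICustomerRepository _customers;

    public CustomerSummaryFinder(ICustomerRepository customers) { _customers = customers; }

    public IEnumerable<CustomerSummary> FindSummaries()
    {
        // Option 1 in miniature: load the entities, then project to the flat view.
        return _customers.FindAll()
            .Select(c => new CustomerSummary { Name = c.Name, TotalSpent = c.OrderTotals.Sum() })
            .ToList();
    }
}
```

If loading whole entities turns out to be too slow, the finder implementation can be swapped for one that queries the database directly without touching its callers.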

One thing to consider if you use a 'pure' Repository to access persisted entities is that you compromise some of the benefits of an ORM. In a 'pure' implementation, the client can't provide context for how the domain object will be used, so you can't tell the repository: 'hey, I'm just going to change the customer.Name property, so don't bother getting those eager-loaded references.' On the flip side, the question is whether a client should know about that stuff. It's a double-edged sword.

As far as using IQueryable, most people seem to be comfortable with 'breaking' the pattern to get the benefits of dynamic query composition, especially for client responsibilities like paging/sorting. In which case, you might have:

void Add(T entity);
void Remove(T entity);
T GetById(object id);
IQueryable<T> Find();

and you can then do away with all those custom Finder methods, which really clutter the Repository as your query requirements grow.
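As a rough illustration of paging and sorting as client responsibilities, here is a sketch built on an IQueryable-returning Find(). The in-memory repository, the Customer class and the PageByName helper are assumptions, and GetById is left out to keep it short.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface IRepository<T>
{
    void Add(T entity);
    void Remove(T entity);
    IQueryable<T> Find();
}

// In-memory stand-in; an ORM-backed version would return its own IQueryable.
public class InMemoryRepository<T> : IRepository<T>
{
    private readonly List<T> _items = new List<T>();

    public void Add(T entity) { _items.Add(entity); }
    public void Remove(T entity) { _items.Remove(entity); }
    public IQueryable<T> Find() { return _items.AsQueryable(); }
}

public static class CustomerPaging
{
    // The client composes sorting and paging; nothing runs until ToList().
    public static List<Customer> PageByName(IRepository<Customer> repo, int pageIndex, int pageSize)
    {
        return repo.Find()
            .OrderBy(c => c.Name)
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

With a real query provider the OrderBy/Skip/Take calls would typically be translated into the database query, so only the requested page is loaded.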

lordinateur
I came across jbogard's post on this topic today: http://www.lostechies.com/blogs/jimmy_bogard/archive/2009/09/02/ddd-repository-implementation-patterns.aspx
lordinateur
+1: "...most people seem to be comfortable with 'breaking' the pattern to get the benefits of dynamic query composition..."
Jim G.
+3  A: 

In response to @lordinateur: I don't really like the de facto way to specify a repository interface.

The interface in your solution requires every repository implementation to provide at least Add, Remove, GetById, etc. Now consider a scenario where it doesn't make sense to save through a particular repository: you still have to implement the remaining methods by throwing NotImplementedException or something similar.

I prefer to split my repository interface declarations like so:

interface ICanAdd<T>
{
    T Add(T entity);
}

interface ICanRemove<T>
{
    bool Remove(T entity);
}

interface ICanGetById<T>
{
    T Get(int id);
}

A particular repository interface for a SomeClass entity might thus look like the following:

interface ISomeRepository
    : ICanAdd<SomeClass>,
      ICanRemove<SomeClass>
{
    // Add and Remove are inherited from ICanAdd<SomeClass> and ICanRemove<SomeClass>;
    // redeclaring them here would only hide the inherited members.
}
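Here is a small sketch of what the split buys you, assuming a hypothetical AuditEntry entity that may be added but never removed. The repository implements only the capability it supports, and the Importer helper (also made up) depends only on ICanAdd<T>.

```csharp
using System.Collections.Generic;

public interface ICanAdd<T>
{
    T Add(T entity);
}

public interface ICanRemove<T>
{
    bool Remove(T entity);
}

// Hypothetical entity that can be created but not deleted.
public class AuditEntry
{
    public string Message { get; set; }
}

// Implements only ICanAdd, so there is no Remove to stub out with an exception.
public class AuditEntryRepository : ICanAdd<AuditEntry>
{
    private readonly List<AuditEntry> _entries = new List<AuditEntry>();

    public int Count { get { return _entries.Count; } }

    public AuditEntry Add(AuditEntry entity)
    {
        _entries.Add(entity);
        return entity;
    }
}

public static class Importer
{
    // Accepts anything that can add T; it never sees Remove or Get.
    public static int AddAll<T>(ICanAdd<T> target, IEnumerable<T> entities)
    {
        int added = 0;
        foreach (var entity in entities)
        {
            target.Add(entity);
            added++;
        }
        return added;
    }
}
```

Passing an AuditEntryRepository where an ICanRemove<AuditEntry> is expected fails at compile time rather than with a NotImplementedException at run time.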

Let's take a step back and take a look at why I think this is a better practice than implementing all CRUD methods in one generic interface.

Some objects have different requirements than others. A customer object may not be deleted, a PurchaseOrder cannot be updated, and a ShoppingCart object can only be created. When one is using the generic IRepository interface this obviously causes problems in implementation.

Those implementing the anti-pattern often will implement their full interface then will throw exceptions for the methods that they don’t support. Aside from disagreeing with numerous OO principles this breaks their hope of being able to use their IRepository abstraction effectively unless they also start putting methods on it for whether or not given objects are supported and further implement them.

A common workaround to this issue is to move to more granular interfaces such as ICanDelete, ICanUpdate, ICanCreate etc etc. This while working around many of the problems that have sprung up in terms of OO principles also greatly reduces the amount of code reuse that is being seen as most of the time one will not be able to use the Repository concrete instance any more.

None of us like writing the same code over and over. However a repository contract as is an architectural seam is the wrong place to widen the contract to make it more generic.

These excerpts have been shamelessly taken from this post, where you can also read more discussion in the comments.

Stephan
What would a repository that you don't 'Save' through look like? If you drop the Add/Remove part, you just have a Finder interface. If all you want to do is find stuff, you don't need a repository but rather some kind of ICustomerFinder implementation. That said, your Repository could implement a Finder: interface ICustomerRepository : IRepository<Customer>, ICustomerFinder. Just to clarify, you don't really 'Save' through the repository interface; that would be the responsibility of some kind of Unit of Work implementation. Such a session is best managed outside of the Repository.
lordinateur
Edited the original answer because 600 chars was not enough ;)
Stephan
Greg's point about composition is a good one, i.e., that you can implement a domain object repository with an internal generic repository. That's kind of how it works in practice when using an interface like ICustomerRepository around an ORM like NHibernate (which substitutes for a generic repository). His main point is not to use generic contracts at the seams of your application. That's pretty reasonable; if your contract is ICustomerRepository and not IRepository, you avoid the problems he describes. I don't love all the finders even though they express intent. What about query objects?
lordinateur