I'm working with Doctrine2 for the first time, but I think this question is generic enough to not be dependent on a specific ORM.

Should the entities in a Data Mapper pattern be aware - and use - the Mapper?

I have a few specific examples, but they all seem to boil down to the same general question.

If I'm dealing with data from an external source - for example a User has many Messages - and the external source simply provides the latest few entities (like an RSS feed), how can $user->addMessage($message) check for duplicates unless it is either aware of the Mapper or 'searches' through the whole collection (which seems inefficient)?

Of course a Controller or Transaction Script could check for duplicates before adding the message to the user - but that doesn't seem quite right, and would lead to code duplication.

If I have a large collection - again a User with many Messages - how can the User entity provide limiting and pagination for the collection without actually proxying a Mapper call?

Again, the Controller or Transaction Script or whatever is using the Entity could use the Mapper directly to retrieve a collection of the User's Messages limited by count, date range, or other factors - but that too would lead to code duplication.

Is the answer using Repositories and making the Entity aware of them? (At least for Doctrine2, and whatever analogous concept is used by other ORMs.) At that point the Entity is still relatively decoupled from the Mapper.

+1  A: 

IMO, an Entity should be oblivious of where it came from, who created it and how to populate its related Entities. In the ORM I use (my own) I am able to define joins between two tables and limit the results by specifying (in C#):

SearchCriteria sc = new SearchCriteria();
sc.AddSort("Message.CREATED_DATE","DESC");
sc.MaxRows = 10;
results = Mapper.Read(sc, new User(new Message()));

That will result in a join which is limited to 10 items, ordered by the creation date of the message. The Message items will be added to each User. If I write:

results = Mapper.Read(sc, new Message(new User()));

the join is reversed.

So, it is possible to make Entities completely unaware of the mapper.
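
For the PHP side of the question, the same idea can be sketched as follows: sorting and limiting live in a repository, and the entity never sees it. Everything here is an assumption made for illustration (the class names, `findLatestForUser`, and the in-memory array that stands in for a real mapper query); it is not Doctrine2's API.

```php
<?php
// Hypothetical names throughout; this is not Doctrine2's API.
class Message
{
    public $userId;
    public $createdDate;
    public $body;

    public function __construct($userId, $createdDate, $body)
    {
        $this->userId = $userId;
        $this->createdDate = $createdDate;
        $this->body = $body;
    }
}

// In-memory stand-in for a mapper-backed repository. A real one would
// translate the same arguments into ORDER BY / LIMIT / OFFSET in a query.
class InMemoryMessageRepository
{
    private $messages;

    public function __construct(array $messages)
    {
        $this->messages = $messages;
    }

    // Newest messages first for one user, restricted to one page.
    public function findLatestForUser($userId, $limit, $offset = 0)
    {
        $own = array_values(array_filter($this->messages, function ($m) use ($userId) {
            return $m->userId === $userId;
        }));
        // ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        usort($own, function ($a, $b) {
            return strcmp($b->createdDate, $a->createdDate);
        });
        return array_slice($own, $offset, $limit);
    }
}

$repository = new InMemoryMessageRepository(array(
    new Message(1, '2010-01-01', 'first'),
    new Message(1, '2010-01-03', 'third'),
    new Message(2, '2010-01-02', 'other user'),
    new Message(1, '2010-01-02', 'second'),
));
$page = $repository->findLatestForUser(1, 2);
```

The calling code asks the repository for a page of a user's messages; the `Message` and `User` objects themselves stay plain data holders.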

Otávio Décio
Yeah, I tend to want the Entity decoupled from, well, anything. Your method is roughly what I meant by, "...Controller or Transaction Script or whatever is using the Entity could use the Mapper directly to retrieve a collection of the User's Messages limited by..." What about the issue of checking if a `Message` already exists? Just do it outside the Entity?
Tim Lytle
I'll assume you are consuming messages from a 3rd party service (an RSS feed, as you mention). In this case you can search for the last message received; if it is not found, you add it. Keep doing this with the other messages until you find one that exists - that means you have already caught up with them (unless I am not understanding your requirements).
Otávio Décio
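A minimal PHP sketch of that catch-up loop, under some loud assumptions: each feed item carries a stable unique identifier (called `guid` here), the feed is ordered newest first, and `InMemoryMessageStore` is a hypothetical stand-in for whatever the mapper persists to. The loop lives outside the entities, as suggested above.

```php
<?php
// Hypothetical store; stands in for whatever the mapper persists to.
class InMemoryMessageStore
{
    private $byGuid = array();

    public function exists($guid)
    {
        return isset($this->byGuid[$guid]);
    }

    public function save($guid, $body)
    {
        $this->byGuid[$guid] = $body;
    }
}

// $feedItems is assumed to be ordered newest first.
// Returns the number of newly stored messages.
function importNewMessages(array $feedItems, InMemoryMessageStore $store)
{
    $added = 0;
    foreach ($feedItems as $item) {
        if ($store->exists($item['guid'])) {
            break; // everything older was imported on an earlier run
        }
        $store->save($item['guid'], $item['body']);
        $added++;
    }
    return $added;
}

$store = new InMemoryMessageStore();
$store->save('a', 'oldest');
$store->save('b', 'older');

$feed = array(
    array('guid' => 'd', 'body' => 'newest'),
    array('guid' => 'c', 'body' => 'newer'),
    array('guid' => 'b', 'body' => 'older'),
    array('guid' => 'a', 'body' => 'oldest'),
);
$added = importNewMessages($feed, $store);
```

Stopping at the first known item keeps the number of lookups proportional to the number of new messages, but it is only safe if the source never inserts items out of order.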
@Otávio That would work, but it would happen outside the Entity - correct? While I want to keep my Entity decoupled, it also seems as though the code for adding a `Message` should be in the Entity (to avoid code duplication).
Tim Lytle
@Tim - I have absolutely *no* business related code in my Entities. Adding a new message in my model would require creating a new Message object, setting its UserId foreign key to that of the user I am working with, assigning the other information, and then calling Mapper.Write(Message). Again, completely outside the Entity.
Otávio Décio
+1  A: 

No.

Here's why: trust. You cannot trust data to act for the benefit of the system. You can only trust the system to act on data. This is a fundamental of programming logic.

Let's say something nasty slipped into the data and it was intended for XSS. If a data chunk is performing actions or if it's evaluated, then the XSS code gets blended into things and it will open a security hole.

Let not the left hand know what the right hand doeth! (mostly because you don't want to know)

Geekster
So what is the best way to handle the generic situations I describe? Just keep it in the controller/transaction script/whatever *uses* the entity?
Tim Lytle
Push it out into a new control class. It's always better to think of coding like you are building a bunch of tools. You would never fasten a hammer to someone's arm -- you'd give them the opportunity to pick one up and use it though. Same goes with your data. You gotta tighten down the code so that it can perform certain valid actions depending on the data, but it must be constrained by your rules. When you let data control the system, you give the data a chance to damage the system.
Geekster
I guess I consider the Entity (or Model) as something that both represents and validates the data.
Tim Lytle
Data models cannot self-validate. This is against philosophical law. See Putnam's Brain in Vat thought experiment: http://en.wikipedia.org/wiki/Brain_in_a_vat What this means is that data will assume it is flawless, but if it has been ethically compromised, it will not know it has been compromised (by design). No system can self-evaluate. That's like suggesting a student can grade his own essay. Only systems that exist outside of the scope of the data can evaluate the data. Hopefully this is proof enough for you but I am willing to continue debating if it helps. Also, data must be portable.
Geekster
I thought about this some more, and I have something else to add. Whenever there is data that moderates a system's function, such as a settings file, you keep the defaults on hand in case a problem occurs, but you treat the settings as validated if they work. If a user can change them, then you handle them with care. It's not the same as data, and you don't handle it the same way. These are options, carefully weighed out, and they modify the system. You still don't eval them. You would never eval a settings file. But loading full parts of a system from a user is dangerous. Don't! :)
Geekster
Obviously I'm not talking about validating data *with* data (and certainly not evaling anything), just that the object that contains the data (the Entity/Model in a Data Mapper Pattern) ensures the data is valid whenever it's changed or set. Of course that could also be done in a second layer, and I do see the point of that.
Tim Lytle
Don't ensure it is valid when it is set. Ensure it is valid BEFORE it is set... and I agree you need to buffer a second layer there to be certain you don't lose their data during purification, but also so that they can't try and XSS you.
Geekster
+1  A: 

Rule #1: Keep your domain model simple and straightforward.

First, don't prematurely optimize something because you think it may be inefficient. Build your domain so that the objects and syntax flow correctly. Keep the interfaces clean: $user->addMessage($message) is clean, precise and unambiguous. Under the hood you can use any number of patterns/techniques to ensure that integrity is maintained (caching, lookups, etc). You can use Services to orchestrate (complex) object dependencies. That is probably overkill for this case, but here is a basic sample/idea.

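class User
{
  private $messages = array();

  public function addMessage(Message $message)
  {
     // One solution: loop through all messages first and throw an error if it already exists
     $this->messages[] = $message;
  }
  public function getMessages()
  {
     return $this->messages;
  }
  // Simple in-memory check (same object instance) over the loaded collection
  public function hasMessage(Message $message)
  {
     return in_array($message, $this->messages, true);
  }
}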
class MessageService
{
  public function addUserMessage(User $user, Message $message)
  {
     // Ensure unique message for user
     // One solution is loop through $user->getMessages() here and make sure unique
     // This is more or less the only path to adding a message, so ensure its integrity here before proceeding 
     // There could also be ACL checks placed here as well
     // You could also create functions that provide checks to determine whether certain criteria are met/unmet before proceeding
     if ($this->doesUserHaveMessage($user,$message)) {
       throw new Exception('User already has this message');
     }
     $user->addMessage($message);
  }
  // Note, this may not be the correct place for this function to "live"
  public function doesUserHaveMessage(User $user, Message $message)
  {
     // Do a database lookup here; until then, defer to the in-memory check
     return $user->hasMessage($message);
  }
}
class MessageRepository
{
  public function find(/* criteria */)
  {
     // Use caching here
     return $message;
  }
}

class MessageFactory
{
   public function createMessage($data)
   {
     //
     $message = new Message();
     // setters
     return $message;
   }
}

// Application code
$user = $userRepository->find(/* lookup criteria */);
$message = $messageFactory->createMessage(/* data */);
// Could wrap in try/catch
$messageService->addUserMessage($user,$message);

I've been working with Doctrine2 as well. Your domain entity objects are just that: objects. They should not have any idea of where they came from; the domain model just manages them and passes them around to the various functions that manage and manipulate them.

Looking back over, I'm not sure that I completely answered your question. However, I don't think that the entities themselves should have any access to the mappers. Create Services/Repositories/Whatever to operate on the objects and utilize the appropriate techniques in those functions...

Don't overengineer it from the onset either. Keep your domain focused on its goal and refactor when performance is actually an issue.
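
One way to keep that refactoring cheap is to hide the duplicate check behind a small interface, so the loop-based version can later be swapped for a query-backed one without touching the service or the entities. This is only a sketch: `DuplicateCheck`, `LoopDuplicateCheck` and the `guid` property are hypothetical names introduced for illustration, and the entity classes are stripped down to the minimum.

```php
<?php
// All names are hypothetical and only illustrate the shape of the design.
class Message
{
    public $guid;
    public function __construct($guid) { $this->guid = $guid; }
}

class User
{
    private $messages = array();
    public function addMessage(Message $message) { $this->messages[] = $message; }
    public function getMessages() { return $this->messages; }
}

interface DuplicateCheck
{
    public function userHasMessage(User $user, Message $message);
}

// Trivial first version: scan the already loaded collection.
// A later version could implement the same interface with a query.
class LoopDuplicateCheck implements DuplicateCheck
{
    public function userHasMessage(User $user, Message $message)
    {
        foreach ($user->getMessages() as $existing) {
            if ($existing->guid === $message->guid) {
                return true;
            }
        }
        return false;
    }
}

class MessageService
{
    private $check;

    public function __construct(DuplicateCheck $check)
    {
        $this->check = $check;
    }

    public function addUserMessage(User $user, Message $message)
    {
        if ($this->check->userHasMessage($user, $message)) {
            throw new Exception('User already has this message');
        }
        $user->addMessage($message);
    }
}

$service = new MessageService(new LoopDuplicateCheck());
$user = new User();
$service->addUserMessage($user, new Message('m1'));
try {
    $service->addUserMessage($user, new Message('m1'));
    $rejected = false;
} catch (Exception $e) {
    $rejected = true; // the second, duplicate message is refused
}
```

The entities know nothing about the check; only the service does, and only through the interface.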

jsuggs
Yeah, I'm with you on design patterns, and looping through all the messages is the simple solution - but that means (assuming a large quantity of messages) you're loading all the records from storage when a simple query would give the same result.
Tim Lytle
Yeah, which is why I mentioned possibly using the doesUserHaveMessage function (or something similar). That could be where the query is done. It still keeps your entities clean and focused, but allows you to keep your domain consistent. In addition, you can start with a simple/trivial implementation (loop through the array) and refactor later when performance requires it. So long as your interface stays the same, no other code has to change.
jsuggs