I'm dealing with a database that contains inconsistent data, such as values with leading and trailing white space.

In general I see a lot of developers practice defensive coding by trimming almost all strings that come from the database and may have been entered by a user at some point. In my opinion it is better to do such formatting before data is persisted, so that it is done only once and the data is then in a consistent and reliable state. Unfortunately that is not the case here, which leads me to the next best solution: using a Trim method.

If I trim all data as part of my data access layer, then I don't have to concern myself with defensive trimming within the business objects of my domain layer. If I instead put the trimming responsibility in my business objects, such as in the set accessors of my C# properties, I should get the same net result. However, the trim will then operate on all values assigned to my business objects' properties, not just the ones that come from the inconsistent database.
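To make the two options concrete, here is a minimal sketch in Python, standing in for C# since the post itself contains no code. The names `map_row` and `Customer` are hypothetical. The property setter plays the role of a C# set accessor that calls `Trim()`.

```python
from typing import Any, Dict


def map_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Option 1 (data access layer): trim every string column as it is read."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


class Customer:
    """Option 2 (domain layer): a hypothetical business object whose setter
    trims every value assigned to it, whatever its source."""

    def __init__(self, name: str) -> None:
        self.name = name  # routed through the property setter below

    @property
    def name(self) -> str:
        return self._name

    @name.setter
    def name(self, value: str) -> None:
        # Rough equivalent of a C# set accessor doing: _name = value.Trim();
        self._name = value.strip()
```

This shows the trade-off the question describes. `map_row` only touches values read from the database. The setter also trims values assigned from the UI or anywhere else.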

I guess a somewhat philosophical question that may determine the answer is: "Should the domain layer be responsible for defensive/coercive formatting of data?" Would it make sense for the set accessor of a PhoneNumber property on a business object to accept an unformatted or formatted string and then attempt to format it as required? Or should this responsibility be pushed to the presentation and data access layers, leaving the domain layer stricter about the type of data it will accept? I think this may be the more fundamental question.
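A coercive PhoneNumber setter might look like the following sketch. `Contact` is a hypothetical class, and the 10-digit North American format is an assumption made purely for illustration. The setter strips formatting characters but still rejects input it cannot repair.

```python
import re


class Contact:
    """Hypothetical business object with a coercive phone number setter.
    Accepts "(555) 555-5555" or "5555555555" and stores digits only.
    Assumes 10-digit numbers purely for illustration."""

    def __init__(self, phone_number: str) -> None:
        self.phone_number = phone_number

    @property
    def phone_number(self) -> str:
        return self._phone_number

    @phone_number.setter
    def phone_number(self, value: str) -> None:
        digits = re.sub(r"[^0-9]", "", value)  # drop spaces, parentheses, dashes
        if len(digits) != 10:
            # Coercion has limits: values it cannot repair are still rejected.
            raise ValueError(f"not a 10-digit phone number: {value!r}")
        self._phone_number = digits
```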

Update: Below are a few links that I thought I should share about the topic.

Information service patterns, Part 3: Data cleansing pattern

LINQ to SQL - Format a string before saving?

How to trim values using Linq to Sql?

A: 

it is better to do such formatting before data is persisted

Absolutely.

Should the domain layer be responsible for defensive/coercive formatting of data?

With the data as it is currently stored, you won't find a right place to introduce the trim, because the consistency of your storage is already broken.

You could try the self-healing approach: read the data and trim it somewhere before displaying it in a dialog. As soon as the user saves the dialog, the data in the database gets "fixed".
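The self-healing idea can be sketched with a dictionary standing in for the database table. The functions `load_for_edit` and `save_from_dialog` are hypothetical names for the read and save paths.

```python
from typing import Dict

# Hypothetical in-memory stand-in for a database table: id -> street address.
db: Dict[int, str] = {1: "  12 Main St  "}


def load_for_edit(record_id: int) -> str:
    # Trim on the way to the dialog; the stored value is left untouched.
    return db[record_id].strip()


def save_from_dialog(record_id: int, value: str) -> None:
    # The dialog was populated with trimmed text, so saving it back
    # overwrites the dirty stored value: the record has "healed".
    db[record_id] = value
```

The obvious limitation is that only records a user actually opens and saves get repaired. Everything else stays dirty.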

As for new input, I'm leaning toward the opinion that trimming is a cleaning operation that belongs neither to the domain layer nor to the data layer. User input should get "cleaned" somewhere close to the UI layer, before you actually start working with that data.

Developer Art
My thinking is along the same lines. What I was realizing, though, is that by putting the formatting code in the domain model I could take care of two formatting issues at once: data that comes from the UI, and data that comes from the database. This is essentially the self-healing method you described, but without requiring it to be done in the UI. Then again, as is often the case, just because something makes sense for one reason doesn't mean it is the right choice.
jpierson
I should also mention that I'm using LINQ to SQL. In my case it would be fairly trivial to place some code in the data access layer that runs Trim() on any value in question. The best thing is that this would format data both as it is being persisted and as it is being pulled out of the database. Obviously, doing both is overkill in the long term. For now, though, since I have to live with the badly formatted data, I think it is a reasonable solution. Alternatively, it is just as easy to place the code in my business objects. Which of these is more correct? Are there other suitable alternatives?
jpierson
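Here is a sketch of trimming in the data access layer on both the read and the write path. It uses Python's built-in `sqlite3` module as a stand-in for LINQ to SQL. The `CustomerRepository` class and the `customer` table are hypothetical, not LINQ to SQL APIs.

```python
import sqlite3
from typing import Optional


class CustomerRepository:
    """Hypothetical data access class; sqlite3 stands in for LINQ to SQL.
    Values are trimmed both when read and when written."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    @staticmethod
    def _clean(value: Optional[str]) -> Optional[str]:
        return value.strip() if value is not None else None

    def get_name(self, customer_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        # Trim on the way out, so dirty legacy rows look clean to callers.
        return self._clean(row[0]) if row else None

    def save_name(self, customer_id: int, name: str) -> None:
        # Trim on the way in, so new writes are stored clean.
        self.conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?",
            (self._clean(name), customer_id),
        )
        self.conn.commit()
```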
Like I tried to explain in my answer, trimming is not a concern of the application domain but rather of the application itself (that which services the domain). So don't trim in the domain/business objects. Trim in the application/service layer or, if you really have to, in the UI layer. But again, the UI is for presentation, and while trimming might seem like an appropriate UI responsibility, it's not. If you don't have an application or service layer, then I think the UI should be the place.
cottsak
A:

I'd suggest "cleaning" the data in the Application layer. The reason you want to do it there (yes, higher in the stack, as Dev Art suggested) is that your Domain Model should 'model' the domain as closely as possible. What if at some point all the data is 'clean'? Then you might want to remove the helper method that does the 'cleaning', and it is easier to remove it from a place higher in the application stack.

Use a classy extension method that uses reflection (and don't start telling me reflection is slow until you know how it works), or something similar, to dig into all 'string' properties (for example) of your domain object graph. Here is an example that uses this technique to adjust DateTime values to a fixed offset. Note how it will "offset" all DateTime values, even deep in collections or other custom types. In your case, the offsetting will be your trimming. This is certainly easier than adding .Trim()s all over the show, and it can be decoupled fairly easily.
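The linked example is not reproduced here, but the technique can be sketched as a recursive walk over an object graph. In this Python version, `vars()` takes the place of .NET reflection. `trim_strings` is a hypothetical helper that handles plain objects, lists and dicts. To keep the sketch short, it leaves tuples and `__slots__` classes unchanged.

```python
from typing import Any, Optional, Set


def trim_strings(obj: Any, _seen: Optional[Set[int]] = None) -> Any:
    """Strip leading/trailing whitespace from every str reachable from obj.

    Walks attributes of plain objects (via vars(), the stand-in for .NET
    reflection), list items and dict values. Tuples, classes and __slots__
    objects are returned unchanged to keep the sketch short.
    """
    if isinstance(obj, str):
        return obj.strip()
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        return obj  # already visited: avoids infinite loops on cycles
    _seen.add(id(obj))
    if isinstance(obj, list):
        for i, item in enumerate(obj):
            obj[i] = trim_strings(item, _seen)
    elif isinstance(obj, dict):
        for key, value in list(obj.items()):
            obj[key] = trim_strings(value, _seen)
    elif hasattr(obj, "__dict__") and not isinstance(obj, type):
        for name, value in list(vars(obj).items()):
            setattr(obj, name, trim_strings(value, _seen))
    return obj
```

Because the cleaning lives in one helper called from the application layer, removing it once the data is clean means deleting a single call.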

Remember, the bad data is a cross-cutting concern to your Domain and so should not tie directly to it (think AOP).

cottsak
A:

The data must be cleaned before being persisted. Now that it is persisted, you have unclean data that likely needs to be cleaned in the database. Consider looking up a customer by name: can I find "John" "Doe" if what I stored was " John ", " Doe "?

Cleaning the data as close to the UI as possible allows you to have much simpler code. The defensive code can change from cleanup code to assertions (i.e. assert string == trimmed(string)). To get to this point you will need to clean up the database as well as the UI code.
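Here is a minimal sketch of that shift, with hypothetical names `clean` and `Customer`. Cleaning happens once at the boundary, and the domain object only asserts the invariant. Like `Debug.Assert` in .NET, Python's `assert` is removed in optimized runs (`python -O`). So it documents an invariant rather than validating user input.

```python
def clean(value: str) -> str:
    """Boundary (UI-side) cleanup: the only place that trims."""
    return value.strip()


class Customer:
    """Hypothetical domain object: instead of trimming defensively, it
    asserts that the boundary has already cleaned its input."""

    def __init__(self, name: str) -> None:
        assert name == name.strip(), f"untrimmed value reached the domain: {name!r}"
        self.name = name
```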

BillThor
The issue is that our database is the common link between two separate applications, only one of which I have control over. So even if I concentrate on correcting the problems, bad data will still sneak in. Database-centric cleansing operations will not be adequate either, because badly formatted data may be inserted at any moment. The other application, which I do not control, is littered with Trim and null/empty string checks throughout the code, and this is something I want to avoid. The question is how best to work around badly formatted data that I have little control over.
jpierson
I agree very strongly that the defensive code can change to assertions, especially within the domain layer. I think that is why I started considering a change to some of the existing validation code in our business objects, so that it coerces values instead of rejecting them. I wondered whether this subtle change would be bad practice, or would introduce UI-related concerns into the wrong layer.
jpierson
You may need to build triggers or code to trim or reject data on the way in. As you don't have control over the application, you need to deal with the garbage it is feeding you. I prefer placing code like this as close to the boundary as possible so you don't have to litter the code around like the application you are dealing with.
BillThor
I agree 100% about placing the code as close as possible to boundaries. I've considered triggers too, I think this would be a reasonable solution coupled with some type of one-time cleansing script that is run during new software updates.
jpierson
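Here is a rough sketch of a trigger plus a one-time cleansing script. It uses SQLite through Python's `sqlite3` as a stand-in for the real database. The table, trigger and function names are hypothetical, and trigger syntax differs between database engines. The test also reproduces the " John " / " Doe " lookup problem from BillThor's answer.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )


def cleanse_existing(conn: sqlite3.Connection) -> None:
    """One-time cleansing script, e.g. run as part of a software update.
    SQLite's trim(X) removes spaces only, not tabs or newlines."""
    conn.execute(
        "UPDATE customer SET first_name = trim(first_name), last_name = trim(last_name)"
    )


def install_trim_trigger(conn: sqlite3.Connection) -> None:
    """Trim rows on the way in, whichever application inserts them."""
    conn.executescript(
        """
        CREATE TRIGGER customer_trim_insert AFTER INSERT ON customer
        BEGIN
            UPDATE customer
            SET first_name = trim(NEW.first_name), last_name = trim(NEW.last_name)
            WHERE id = NEW.id;
        END;
        """
    )


def find_customer(conn: sqlite3.Connection, first: str, last: str) -> list:
    return conn.execute(
        "SELECT id FROM customer WHERE first_name = ? AND last_name = ?",
        (first, last),
    ).fetchall()
```

This trigger only covers inserts. A similar trigger would be needed for rows the other application updates.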
A: 

Here's a question that's sort of related to this discussion:

I'm building an n-tiered .NET app. Suppose I have a business rule that states something like "A Street Address shall not have leading or trailing spaces". What is a "best practice" for implementing this rule? Should I allow the user to enter leading or trailing spaces and automatically remove them in the domain layer? Or should I throw an exception telling the user to remove the leading or trailing spaces and try again?

I'm leaning towards the latter, but wanted to hear what others think.
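The two options can be expressed as a policy flag on a hypothetical `StreetAddress` value object. This sketch only compares the two behaviours; it does not recommend either one.

```python
class StreetAddress:
    """Hypothetical value object enforcing "A Street Address shall not have
    leading or trailing spaces". strict=True rejects; strict=False coerces."""

    def __init__(self, value: str, strict: bool = True) -> None:
        if value != value.strip():
            if strict:
                # Option 2: reject and let the caller report it to the user.
                raise ValueError("street address has leading or trailing spaces")
            # Option 1: silently remove the spaces in the domain layer.
            value = value.strip()
        self.value = value
```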

Thanks,

rjd

Robert D'Alimonte
In situations like this I normally see people write code in the presentation layer to coerce the data into the format expected by their business objects. The business objects then reject any bad values that make it through, either by throwing an exception or through the IDataErrorInfo interface. Think of data like a phone number, where the user enters "(555) 555-5555" but the domain object expects the unformatted value "5555555555". In that case I would prefer to leave the formatting out of the business object.
jpierson
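The split described in that comment puts coercion in the presentation layer and strict validation in the domain. It might be sketched as follows. `normalize_phone_input` and `Contact` are hypothetical names, and the 10-digit rule is an illustrative assumption.

```python
import re


def normalize_phone_input(raw: str) -> str:
    """Presentation layer: coerce what the user typed into the format the
    domain expects by keeping ASCII digits only."""
    return re.sub(r"[^0-9]", "", raw)


class Contact:
    """Domain layer: strict, accepts only 10 unformatted digits."""

    def __init__(self, phone_number: str) -> None:
        if not re.fullmatch(r"[0-9]{10}", phone_number):
            raise ValueError(f"expected 10 digits, got {phone_number!r}")
        self.phone_number = phone_number
```

Here the domain object never guesses. Anything the UI fails to normalize is rejected rather than silently reformatted.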