I am doing some reading, and came across advice to avoid an internalStore if my application does not need to massage the data before it is sent to SQL. What is a data massage?

+8  A: 

Manipulate, process, alter, recalculate. In short, if you are just moving the data in raw, then there is no need to use an internalStore, but if you're doing anything to it prior to storage, then you might want an internalStore.

Adam Davis
Data purity should never be assumed of course. :)
EBGreen
No, one should never implicitly trust program input of any sort. Simple checks, however, might not be considered massaging as you aren't touching the data - merely peeking at it.
Adam Davis
That is true. My experience with massaging has almost always been to clean up the data already in a data store that was entered from another system that I have no control over.
EBGreen
Please wait... reticulating splines...
Dustin Fineout
@Dustin: lol...
Adam Davis
+3  A: 

Clean up, normalization, filtering, ... Just changing the data somehow from the original input into a form that is better suited to your use.

tvanfosson
+5  A: 

Sometimes the whole process of moving data is referred to as "ETL" meaning "Extract, Transform, Load". Massaging the data is the "transform" step, but it implies ad-hoc fixes that you have to do to smooth out problems that you have encountered (like a massage does to your muscles) rather than transformations between well-known formats.

Things that you might do to "massage" data include:

  • Change formats from what the source system emits to what the target system expects, e.g. change date format from d/m/y to m/d/y.
  • Replace missing values with defaults, e.g. supply "0" when a quantity is not given.
  • Filter out records that are not needed in the target system.
  • Check validity of records, and ignore or report on rows that would cause an error if you tried to insert them.
  • Normalise data to remove variations that should be the same, e.g. replace upper case with lower case, replace "01" with "1".
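
As a rough illustration of the list above, here is a minimal Python sketch that applies each step to one record. The field names (`name`, `quantity`, `date`, `status`) and the rule that `status == "test"` marks an unneeded record are assumptions made up for the example, not anything from a particular system.

```python
from datetime import datetime


def massage(record):
    """Return a cleaned copy of a record, or None if it should be skipped.

    The field names and the "test" status rule are hypothetical.
    """
    # Filter: drop records the target system does not need.
    if record.get("status") == "test":
        return None

    # Default: supply "0" when a quantity is not given.
    quantity = record.get("quantity") or "0"

    # Validity check plus normalisation: "01" becomes "1";
    # anything that is not a whole number is rejected.
    try:
        quantity = str(int(quantity))
    except ValueError:
        return None

    # Format change: d/m/y from the source becomes m/d/y for the target.
    try:
        parsed = datetime.strptime(record["date"], "%d/%m/%Y")
    except (KeyError, ValueError):
        return None  # missing or impossible date; a real job might report it

    return {
        # Normalise case and stray whitespace.
        "name": (record.get("name") or "").strip().lower(),
        "quantity": quantity,
        "date": parsed.strftime("%m/%d/%Y"),
    }
```

Returning `None` for rejected rows keeps the sketch short; in practice you would probably collect those rows somewhere so they can be reported rather than silently dropped.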
Anthony
A: 

And finally there is the less savory practice of massaging the data by throwing out data (or adjusting the numbers) when they don't give you the answer you want. Unfortunately, people doing statistical analysis often massage the data to get rid of those pesky outliers which disprove their theory. Because of this practice, referring to data cleaning as massaging the data is inappropriate. Cleaning the data to make it something that can go into your system (getting rid of meaningless dates like 02/30/2009 because someone else stored them in varchar instead of as dates, separating first and last names into separate fields, fixing all uppercase data, adding default values for fields that require data when the supplied data isn't given, etc.) is one thing - massaging the data implies a practice of adjusting the data inappropriately.
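
For the legitimate cleaning cases mentioned above, a small Python sketch might look like the following. The helper names `clean_date` and `split_name` are made up for illustration, and the sketch assumes dates arrive as m/d/y text and names as a single "FIRST LAST" string.

```python
from datetime import datetime


def clean_date(text):
    """Return a date for a valid m/d/y string, or None for a meaningless one."""
    try:
        return datetime.strptime(text.strip(), "%m/%d/%Y").date()
    except ValueError:
        # e.g. "02/30/2009", which a varchar column will happily store
        return None


def split_name(full_name):
    """Split on the first space and fix all-uppercase input.

    Naive by design: the last name may come back empty, and title-casing
    gets names like "MCDONALD" wrong.
    """
    first, _, last = full_name.strip().partition(" ")
    return first.title(), last.strip().title()
```

Note that neither helper changes what the data means; they only make it fit the target columns, which is the distinction being drawn here.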

HLGEM