views:

39

answers:

2

Does anyone have recommendations for testing data quality? Imagine you have staged data in one format and executed a process to transfer it to a more normalized, relational format in another database. I would like to create a suite of tests that can be executed to verify data quality. Does anyone have recommendations as far as tools and techniques? This is targeting SQL Server.

Best regards.

+1  A: 

Won't the "more normalized, relational format" destination database be a test in and of itself? The NOT NULL, FK, UNIQUE, and CHECK constraints, etc. will catch a lot of problems on their own.
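As a minimal sketch of this idea (all table and column names here are invented for illustration), a destination table whose constraints reject bad rows might look like:

```sql
-- Illustrative destination table: the constraints themselves reject bad data.
CREATE TABLE dbo.Customer (
    CustomerID   int          NOT NULL PRIMARY KEY,
    Email        varchar(255) NOT NULL UNIQUE,        -- no missing or duplicate emails
    CountryCode  char(2)      NOT NULL
        REFERENCES dbo.Country (CountryCode),         -- FK: only known countries allowed
    BirthDate    date         NULL
        CHECK (BirthDate <= GETDATE())                -- no future birth dates
);
```

Any INSERT that violates one of these fails immediately, so a plain load into the target schema doubles as a validation pass.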

I'd suggest writing queries to find missing or problem data, and that is entirely dependent on your tables and business logic.
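For example, two common checks of that kind are a row-count reconciliation and an orphaned-row query. This is a hypothetical sketch; the staging and target table names are assumptions, not anything from the question:

```sql
-- Row-count reconciliation between source and destination.
SELECT (SELECT COUNT(*) FROM staging.Customer) AS StagedRows,
       (SELECT COUNT(*) FROM dbo.Customer)     AS LoadedRows;

-- Staged rows that never made it into the target.
SELECT s.CustomerID
FROM   staging.Customer AS s
LEFT JOIN dbo.Customer  AS t
       ON t.CustomerID = s.CustomerID
WHERE  t.CustomerID IS NULL;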

EDIT after OP comment

I've done several large conversions over the years. I just write several stored procedures, each importing a different group of related tables, and slog through the data in them. I usually output a bunch of info for each error into log tables and keep running totals for each kind of data error (nonconforming data that got rejected). I would then work with the customer on how to handle the rejected data, possibly hard-coding logic to make the data fit the new tables, etc. I would not worry about testing the data; just build a process that imports it and reports the problems. If the number of rejects is small and acceptable, you are done. If not, keep tweaking your import process until only an acceptable number of rejects occur.
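The pattern described above can be sketched roughly as follows. This is a hedged illustration, not the answerer's actual code; every object name (`dbo.ImportErrorLog`, `dbo.ImportCustomers`, the staging tables) is invented:

```sql
-- Log table for rejects, with one row per data error.
CREATE TABLE dbo.ImportErrorLog (
    ErrorLogID  int IDENTITY(1,1) PRIMARY KEY,
    SourceTable sysname      NOT NULL,
    SourceKey   varchar(100) NULL,
    ErrorKind   varchar(50)  NOT NULL,   -- e.g. 'MISSING_FK'
    LoggedAt    datetime2    NOT NULL DEFAULT SYSDATETIME()
);
GO
-- One procedure per group of related tables.
CREATE PROCEDURE dbo.ImportCustomers
AS
BEGIN
    SET NOCOUNT ON;

    -- Log nonconforming rows instead of failing the whole load.
    INSERT INTO dbo.ImportErrorLog (SourceTable, SourceKey, ErrorKind)
    SELECT 'staging.Customer', CAST(s.CustomerID AS varchar(100)), 'MISSING_FK'
    FROM   staging.Customer AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.Country AS c
                       WHERE c.CountryCode = s.CountryCode);

    -- Load only the rows that conform.
    INSERT INTO dbo.Customer (CustomerID, Email, CountryCode, BirthDate)
    SELECT s.CustomerID, s.Email, s.CountryCode, s.BirthDate
    FROM   staging.Customer AS s
    WHERE  EXISTS (SELECT 1 FROM dbo.Country AS c
                   WHERE c.CountryCode = s.CountryCode);

    -- Running totals per kind of error, for review with the customer.
    SELECT ErrorKind, COUNT(*) AS RejectCount
    FROM   dbo.ImportErrorLog
    GROUP  BY ErrorKind;
END;
```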

KM
That's my first inclination as well, but I'm wondering if there are formalized tools for data quality testing, analogous to what NUnit is for code testing.
Nissan Fan
+1  A: 

Use DBCC commands. For example:

DBCC CHECKCONSTRAINTS

checks the integrity of a specified constraint, or of all constraints on a specified table, in the current database.

DBCC CHECKDB - checks the logical and physical integrity of all the objects in the database, including verifying that tables and their indexes are correctly linked.
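Typical invocations look like this (the table and database names are placeholders, not anything from the question):

```sql
DBCC CHECKCONSTRAINTS ('dbo.Customer');        -- one table's constraints
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS;    -- all constraints, including disabled ones
DBCC CHECKDB ('MyDatabase') WITH NO_INFOMSGS;  -- full integrity check, errors only
```

Note that CHECKCONSTRAINTS only reports rows that violate enabled or disabled constraints; it is most useful after a bulk load performed with constraint checking turned off.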

see this link.

masoud ramezani