SQL Server 2005 Question:

I'm working on a data conversion project where I'm taking 80k+ rows and moving them from one table to another. When I run the T-SQL, it fails with various type-conversion errors. Is there a way to find out which row caused the error?

=====================

UPDATE:

I'm performing an INSERT INTO TABLE1 (...) SELECT ... FROM TABLE2. TABLE2 is just a bunch of varchar fields, whereas TABLE1 has the correct types.

This script will be put into a sproc and executed from an SSIS package. The SSIS package first imports 5 large flat files into TABLE2.

Here is a sample error message: "The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value."

There are many date fields. In TABLE2, there are data values like '02/05/1075' for Birthdate. I want to examine each row that is causing the error, so I can report to the department responsible for the bad data so they can correct it.

A: 

If you are working with cursors, yes, and it is trivial. If you are not working with cursors, I don't think so, because set-based SQL statements are atomic: the whole statement succeeds or fails as a unit.

eKek0
+2  A: 

I assume you are doing the insert with a single INSERT INTO ... SELECT.

Instead, try doing it with a cursor: use exception handling to catch the error and log everything you need, such as the row it failed on.
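A minimal sketch of that approach (the column names Id and Birthdate are placeholders, not from the question):

```sql
DECLARE @Id INT, @Birthdate VARCHAR(20);

DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT Id, Birthdate FROM TABLE2;

OPEN cur;
FETCH NEXT FROM cur INTO @Id, @Birthdate;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        INSERT INTO TABLE1 (Id, Birthdate)
        VALUES (@Id, CONVERT(DATETIME, @Birthdate));
    END TRY
    BEGIN CATCH
        -- Log the failing row instead of aborting the whole load.
        PRINT 'Row ' + CAST(@Id AS VARCHAR(10)) + ': ' + ERROR_MESSAGE();
    END CATCH;

    FETCH NEXT FROM cur INTO @Id, @Birthdate;
END

CLOSE cur;
DEALLOCATE cur;
```

Each bad row lands in the log output with its key and error message, while the good rows are still inserted.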

van
Agreed; 80k rows is not that many really, so a cursor will get there, and if there is more than one error it will help. Don't forget you can use TRY/CATCH in SQL 2005, so you can store the rows that failed and carry on with those that worked.
u07ch
Good point, u07ch: insert all rows that did not throw an error, then you can just LEFT JOIN ... WHERE right.X IS NULL to find those that were not inserted, all in one statement. If many rows fail, this would be the best solution, rather than fixing 1K erroneous rows out of 80K one at a time. But most probably it is enough to find one or two distinct causes; the rest should be the same and easy to fix.
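For example (assuming TABLE2 has a key column Id that is carried into TABLE1):

```sql
-- Rows of TABLE2 that never made it into TABLE1
SELECT t2.*
FROM TABLE2 t2
LEFT JOIN TABLE1 t1 ON t1.Id = t2.Id
WHERE t1.Id IS NULL;
```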
van
A: 

If you are looping, add PRINT statements in the loop.

If you are using set-based operations, add a restrictive WHERE condition and run it. Keep running it, each time making the condition more restrictive, until you can find the offending row in the data. If you can run it for blocks of N rows, then just select out those rows and look at them.

Add CASE statements to catch the problems (converting a bad value to NULL or whatever) and put a value in a new FlagColumn telling you the type of problem:

CASE WHEN ISNUMERIC(x) != 1 THEN NULL ELSE x END AS x
,CASE WHEN ISNUMERIC(x) != 1 THEN 'not numeric' ELSE NULL END AS FlagColumn

Then select out the newly converted data where FlagColumn IS NOT NULL.
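Put together, a staging query along these lines (the column names Salary and Birthdate are illustrative) converts what it can and flags what it cannot:

```sql
SELECT
    CASE WHEN ISNUMERIC(Salary) != 1 THEN NULL
         ELSE CAST(Salary AS DECIMAL(12, 2)) END AS Salary,
    CASE WHEN ISDATE(Birthdate) != 1 THEN NULL
         ELSE CAST(Birthdate AS DATETIME) END AS Birthdate,
    CASE WHEN ISNUMERIC(Salary) != 1 THEN 'Salary not numeric'
         WHEN ISDATE(Birthdate) != 1 THEN 'Birthdate not a valid date'
         ELSE NULL END AS FlagColumn
FROM TABLE2;
```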

You could also try SELECT statements using the ISNUMERIC() or ISDATE() functions on the various columns of the source data.

EDIT

There are many date fields. In TABLE2, there are data values like '02/05/1075' for Birthdate. I want to examine each row that is causing the error, so I can report to the department responsible for the bad data so they can correct it.

Use this to return all rows with bad dates (ISDATE() returns 0 for anything that won't convert to datetime, including out-of-range years like 1075):

SELECT * FROM YourTable WHERE ISDATE(YourDateColumn) != 1
KM
+3  A: 

This is not the way to do it with SSIS. You should have the data flow from your source, to your destination, with whatever transformations you need in the middle. You'll be able to get error details, and in fact, error rows by using the error output of the destination.

I often send the error output of a destination to another destination - a text file, or a table set up to permit everything, including data that would not have been valid in the real destination.


Actually, if you do this the standard way in SSIS, then data type mismatches should be detected at design time.

John Saunders
+1  A: 

What I do is split the rowset in half with a WHERE clause:

INSERT MyTable (id, datecol) SELECT id, datecol FROM OtherTable WHERE id BETWEEN 0 AND 40000

and then keep changing the values in the BETWEEN part of the WHERE clause. I've done this by hand many times, but it occurs to me that you could automate the splitting with a little .NET code in a loop, trapping exceptions and then narrowing it down to the row throwing the exception, little by little.
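The bisection could also stay entirely in T-SQL; a rough sketch under the assumption of a numeric id column, using a rolled-back transaction as a dry run and TRY/CATCH to detect which half fails:

```sql
DECLARE @lo INT = 0, @hi INT = 80000, @mid INT;

WHILE @hi - @lo > 1
BEGIN
    SET @mid = (@lo + @hi) / 2;
    BEGIN TRY
        -- Dry-run the conversion on the lower half, then undo it.
        BEGIN TRAN;
        INSERT INTO MyTable (id, datecol)
        SELECT id, datecol FROM OtherTable
        WHERE id BETWEEN @lo AND @mid;
        ROLLBACK TRAN;
        -- Lower half converted cleanly; a bad row is in the upper half.
        SET @lo = @mid;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRAN;
        -- Conversion failed in the lower half; narrow to it.
        SET @hi = @mid;
    END CATCH;
END

-- The offending row is in this final narrow range.
SELECT * FROM OtherTable WHERE id BETWEEN @lo AND @hi;
```

Note this finds one bad row per run; with many bad rows, the flag-column or ISDATE() approaches in the other answers scale better.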

Chris McCall
Incorrect syntax, and a poor way to solve the problem. It will work (once the syntax error is corrected), but it is the least efficient option and really needs to be done by hand rather than run in a package. Bad dates are easily identifiable using the ISDATE() function.
HLGEM
Fixed the bad SQL.
Chris McCall
A: 

John Saunders has the right idea; there are better ways to do this kind of processing using SSIS. However, learning SSIS and redoing your package to completely change the process may not be an option at this time, so I offer this advice. You appear to be having trouble with incorrect dates. So first run a query to identify the records which are bad and insert them into an exceptions table. Then do your insert with only the records that are left. Something like:

 INSERT exceptiontable (field1, field2)
 SELECT field1, field2 FROM table2 WHERE ISDATE(field2) = 0

 INSERT table1 (field1, field2)
 SELECT field1, field2 FROM table2 WHERE ISDATE(field2) = 1

Then of course you can send the contents of the exception table to the people who provide the bad data.

HLGEM
He did say he was already using SSIS, and Source -> Dest -> Error is not very hard...
John Saunders
I agree, that is how I would do it, but SSIS is not easy to learn to use properly, and he may be under time pressure. I know, coming from years of doing DTS packages, that method would never have occurred to me if I hadn't had formal SSIS training. Clearly he is using T-SQL scripts, not the Data Flow, so he may be completely unaware of how to use it. It isn't easy the first time you do it.
HLGEM
You know, it never occurred to me that he could be using SSIS and _not_ be using a Data Flow.
John Saunders
Well, if your DTS packages are all based on Exec SQL tasks, then when you convert them, they won't be using Data Flow. And when you convert hundreds of them, you don't go in and fix them to do that unless you are making a major change. And if you don't really know what Data Flow does, you might look at your converted DTS packages and think that is the best way to set up a new package. I imagine many of the people who started with SSIS use it very differently from many of the people who started with DTS.
HLGEM
Guys, I am using data flows to move txt files into tables. There are over a hundred fields, and more than one date field was causing the errors. I chose to import all the data into a table of simple varchar columns (allowing everything) and then correct the data in T-SQL, where I felt I had more power and flexibility. I was not looking to handle/fix these errors in the process, but to find the bad data and fix it in the originating system. For example, some employees had a birthdate in the year 1080.
DeveloperMCT