SQL Server 2005. I want a script to run in the near future when we are ready to deploy the new app. We have a lot of old data that must be moved to new tables in the new app.

One such set of data is sampling hours the techs have entered, which must be converted to seconds in the new app.

Sounds easy enough, but brace yourself... the old app had no validation. The new column will be bigint, but the old column was varchar. Check out the different kinds of data I've encountered for hours:

24
24:00
22:57
24 HR
24hrs
24 hours
22.3 Hrs
24 hr's
n/a
3.09
19
86394
86400 Sec
24:00 valid:19:07
24 hrs / valid=13:44
15:8 (valid=15:07)

OK, so take a deep breath, it's actually not too bad. I've done most of the hard work already (identifying the various patterns the users have been using). This is what I have so far:

I created a function to handle the repetitive parsing of sampling durations in HH, SS, or HH:MM format.

CREATE FUNCTION HoursToSecond(@input nvarchar(5))
RETURNS bigint
AS
BEGIN
    DECLARE @return bigint
    SELECT @return = CASE
        WHEN ISNUMERIC(@input)=1
        THEN CASE
                WHEN CAST(@input AS decimal(9,3))<100
                THEN CAST(@input as decimal(9,3))*3600 --convert hrs to secs
                ELSE CAST(@input as bigint) --already in seconds
            END
        WHEN CHARINDEX(':',@input)>0
        THEN CAST(left(@input,CHARINDEX(':',@input)-1) as int)*3600 +  
             CAST(SUBSTRING(@input,CHARINDEX(':',@input)+1,2) as int)*60
        ELSE NULL
    END
    RETURN @return
END
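
A quick spot-check of the function against a few of the values listed above. This is only a sketch: the figures in the comments are worked out by hand from the function's rules (anything under 100 is treated as hours, anything else as seconds), not copied from a real run.

```sql
-- Spot-check dbo.HoursToSecond (defined above) on values from the sample list.
SELECT dbo.HoursToSecond('24')    AS plain_hours,    -- 24 * 3600 = 86400
       dbo.HoursToSecond('3.09')  AS decimal_hours,  -- 3.09 * 3600 = 11124
       dbo.HoursToSecond('22:57') AS hh_mm,          -- 22*3600 + 57*60 = 82620
       dbo.HoursToSecond('86394') AS seconds_as_is,  -- not under 100, so kept as 86394
       dbo.HoursToSecond('n/a')   AS unparseable;    -- no pattern matches, so NULL
```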

Then I switch based on the patterns I see in the data.

INSERT INTO NewDatasheets (sample_time)
    SELECT 
        CASE
           WHEN ISNUMERIC(samplingtime)=1 THEN dbo.HoursToSecond(samplingtime)
           WHEN samplingtime LIKE '% %' THEN dbo.HoursToSecond(LEFT(samplingtime, CHARINDEX(' ',samplingtime)-1))
           WHEN samplingtime LIKE '%h%' THEN dbo.HoursToSecond(LEFT(samplingtime, CHARINDEX('h',samplingtime)-1))
           WHEN samplingtime LIKE '%:%' THEN dbo.HoursToSecond(samplingtime)
           ELSE NULL
        END
    FROM OldDatasheets

Ugly script job. Yes. And I didn't even try to parse the hours after "valid". But it'll do 90% of the work. And I can query for the edge cases and clean those up by hand... but I want to avoid any manual work.
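
For the edge-case query, something along these lines might do as a rough sketch. It reuses the function and the CASE switch above; @old and @review are throwaway names I made up, with @old standing in for OldDatasheets and seeded with a handful of the sample values. It lists each distinct value that either could not be parsed or carries a "valid" duration that was ignored.

```sql
-- @old is a stand-in for OldDatasheets, seeded with a few values from the list above.
DECLARE @old TABLE (samplingtime varchar(50));
INSERT INTO @old (samplingtime)
SELECT '24' UNION ALL SELECT 'n/a' UNION ALL SELECT 'n/a' UNION ALL
SELECT '24hrs' UNION ALL SELECT '24 hrs / valid=13:44';

-- @review collects every distinct value that needs a manual look.
DECLARE @review TABLE (samplingtime varchar(50), row_count int);

INSERT INTO @review (samplingtime, row_count)
SELECT p.samplingtime, COUNT(*)
FROM (
    SELECT samplingtime,
           CASE  -- same switch as the INSERT above
              WHEN ISNUMERIC(samplingtime)=1 THEN dbo.HoursToSecond(samplingtime)
              WHEN samplingtime LIKE '% %' THEN dbo.HoursToSecond(LEFT(samplingtime, CHARINDEX(' ',samplingtime)-1))
              WHEN samplingtime LIKE '%h%' THEN dbo.HoursToSecond(LEFT(samplingtime, CHARINDEX('h',samplingtime)-1))
              WHEN samplingtime LIKE '%:%' THEN dbo.HoursToSecond(samplingtime)
              ELSE NULL
           END AS sample_time
    FROM @old
) AS p
WHERE p.sample_time IS NULL            -- nothing could be parsed
   OR p.samplingtime LIKE '%valid%'    -- a second duration was ignored
GROUP BY p.samplingtime;

-- Most frequent offenders first.
SELECT samplingtime, row_count FROM @review ORDER BY row_count DESC;
```

With the five seed rows, this should list 'n/a' (two rows) and '24 hrs / valid=13:44' (one row). Against the real table, swap @old for OldDatasheets.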

I was wondering if anyone has a better solution, perhaps with fewer lines of code or one that avoids creating a function.

+1  A: 

Because this sounds like a one-time job rather than an ongoing one, it doesn't have to be an elegant piece of code, or frankly even pick up all the data. It won't have to be maintained, so you can pretty much write anything that works, and there's no point in spending longer coding for the odd rarities than it would take to update them individually, such as:

UPDATE Table SET NewValue=86400 WHERE OldValue='86400 Sec';
UPDATE Table SET NewValue=86400 WHERE OldValue='24 hrs / valid=13:44';

This way you at least update all identical cases in one go, and it's far less work than trying to parse the real oddities. Write what you can to handle the bulk of the cases (which it looks like you have) and just go through the rest manually as above. If you spot any patterns in the remainder that you can script, then by all means script them, but sometimes a manual job is the right answer.
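
As an example of one such pattern (a sketch only, not something run against your real data): every parseable value in your sample list starts with a run of digits, dots, and colons, so cutting the string at the first other character leaves a token in HH, HH.H, SS, or HH:MM form, and no function is needed. The version below assumes that observation holds for the real data and that the database is at compatibility level 90 (needed for CROSS APPLY on SQL Server 2005). The names @samples, @parsed, token, and needs_review are made up for illustration, with @samples standing in for OldDatasheets. It keeps your rule that numbers under 100 are hours.

```sql
-- Sample values copied from the question; @samples stands in for OldDatasheets.
DECLARE @samples TABLE (samplingtime varchar(50));
INSERT INTO @samples (samplingtime)
SELECT '24' UNION ALL SELECT '24:00' UNION ALL SELECT '22:57' UNION ALL
SELECT '24 HR' UNION ALL SELECT '24hrs' UNION ALL SELECT '22.3 Hrs' UNION ALL
SELECT '24 hr''s' UNION ALL SELECT 'n/a' UNION ALL SELECT '3.09' UNION ALL
SELECT '86400 Sec' UNION ALL SELECT '24 hrs / valid=13:44' UNION ALL
SELECT '15:8 (valid=15:07)';

-- @parsed holds the raw value, its leading token, and the converted seconds.
DECLARE @parsed TABLE (samplingtime varchar(50), token varchar(50), seconds bigint);

INSERT INTO @parsed (samplingtime, token, seconds)
SELECT s.samplingtime,
       t.token,
       CASE
           WHEN t.token NOT LIKE '%[0-9]%' THEN NULL       -- no digits, e.g. 'n/a'
           WHEN t.token LIKE '%[.:]%[.:]%' THEN NULL       -- more than one separator
           WHEN t.token LIKE '%:%'                         -- HH:MM
           THEN CAST(LEFT(t.token, CHARINDEX(':', t.token) - 1) AS bigint) * 3600
              + CAST(SUBSTRING(t.token, CHARINDEX(':', t.token) + 1, 2) AS bigint) * 60
           WHEN CAST(t.token AS decimal(18,3)) < 100       -- under 100: hours
           THEN CAST(CAST(t.token AS decimal(18,3)) * 3600 AS bigint)
           ELSE CAST(CAST(t.token AS decimal(18,3)) AS bigint)  -- already seconds
       END
FROM @samples AS s
CROSS APPLY (
    -- Leading run of digits, dots, and colons; the appended space guarantees a match.
    SELECT LEFT(LTRIM(s.samplingtime),
                PATINDEX('%[^0-9.:]%', LTRIM(s.samplingtime) + ' ') - 1) AS token
) AS t;

-- needs_review marks rows still worth a manual look.
SELECT samplingtime, token, seconds,
       CASE WHEN seconds IS NULL OR samplingtime LIKE '%valid%'
            THEN 1 ELSE 0 END AS needs_review
FROM @parsed;
```

Whatever comes back with needs_review = 1 is what's left for the one-off UPDATE statements.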

eftpotrm