views:

55

answers:

2

I have ran into this problem a few times and I am wondering what other people are doing.

When I am creating a database, sometimes I have to import data into a table on a regular basis, let's say daily. What I normally do is delete all of the records and reimport every record from the external data source.

Many times I will have to store some more data that relates to the imported records, but doesn't come from the original import source. Typically this "extra data" comes from user input. So, I will create another table with a primary key matching the key of the table that gets the imported data, and store this additional data in the new table. If that doesn't make sense here is an example:

In an old legacy system, we store employee data. But, I need to use this data in a web application that can't connect to this old legacy system. So, I create a database with a table that matches the schema of the data that I have in the old system and I import each record into this table every day. When I do the import, I drop every record and import every record.

But in my new system, employees can save a bio. So in another table, I store this and their Id.

It would be simpler to have only one table, but I can't do that because I would drop data that doesn't exist elsewhere when I do the import.

Another bad thing is that, because I am deleting all of these records for the import, I can't define foreign key constraints with related data.

I hate designing databases this way, because I know there is a better way. Wouldn't it be nice if I could do updates when I import the data, instead of deleting and importing it all?

I am using Sql server 2008 but am interested to hear of strategies that can work with any RDBMS.

+1  A: 

Well, when you do the import, import into a temp table, and then update the records in the production table (update in the generic sense of the word: delete what's deleted, add what's new, modify what's changed).

You might also want to check out the new MERGE SQL command in 2008, it just might turn out to be very helpful for this case.

Bandi-T
This is exactly what I was looking for. Thanks.
Ronnie Overby
A: 

Here is a SQL Server 2008 merge statement that I came up with to help me with my current situation:

MERGE INTO dbo.Sections as S        -- Target
USING dbo.SectionsStaging as SS     -- Source
ON S.Id = SS.Id                     -- Join
WHEN MATCHED THEN                   -- Record exists in both tables
    UPDATE SET
        TermCode = SS.TermCode,
        CourseTitle = SS.CourseTitle,
        CoursePrefix = SS.CoursePrefix,
        CourseNumber = SS.CourseNumber,
        SectionNumber = SS.SectionNumber,
        Capacity = SS.Capacity,
        Campus = SS.Campus,
        FacultyFirstName = SS.FacultyFirstName,
        FacultyLastName = SS.FacultyLastName,
        [Status] = SS.[Status],
        Enrollment = SS.Enrollment
WHEN NOT MATCHED THEN               -- Record exists only in source table
    INSERT ([Id],[TermCode],[CourseTitle],[CoursePrefix],[CourseNumber],[SectionNumber],[Capacity],[Campus],[FacultyFirstName],[FacultyLastName],[Status],[Enrollment])
    VALUES (SS.[Id],SS.[TermCode],SS.[CourseTitle],SS.[CoursePrefix],SS.[CourseNumber],SS.[SectionNumber],SS.[Capacity],SS.[Campus],SS.[FacultyFirstName],SS.[FacultyLastName],SS.[Status],SS.[Enrollment])
WHEN NOT MATCHED BY SOURCE THEN     -- Record exists only in target table
    DELETE;

Good stuff!

Ronnie Overby