I need to import data from an Excel spreadsheet into SQL Server, but the data isn't in a relational/normalized format so the import wizard isn't going to cut it (as far as I know).

The data is in this format:

Category    SubCategory     Name        Description

Category#1  SubCategory#1   Product#1   Description#1
Category#1  SubCategory#1   Product#2   Description#2
Category#1  SubCategory#2   Product#3   Description#3
Category#1  SubCategory#2   Product#4   Description#4
Category#2  SubCategory#3   Product#5   Description#5

(apologies I'm lacking the inventiveness to come up with 'real' data at this time in the morning...)

Each row contains a unique product, but the category structure is duplicated. I want to import this data into three tables:

Category
SubCategory
Product

(I know SubCategory should really be contained within Category, DB was not my design)

I need a way to import unique rows based on the Category and then SubCategory columns, and then when importing the other columns into Product, obtain a reference to the SubCategory based on name.

Short of scripting this, is there any way to do it using the import wizard or some other tool?

+1  A: 

I had a similar problem a while ago, and did not find any easy way to do this using an import wizard. The way I resolved the import (as this was a one-off task, and not something that was going to hang around) was to write a simple VBA macro in Excel that called a stored proc once per row, passing that row's cells as the parameters.

For each parameter (column), the stored proc would insert the value only if it did not already exist, and then grab the ID to use as the foreign key on the next parameter insert.

For example:


    DECLARE @CategoryID INT
    DECLARE @SubCategoryID INT

    -- Check that the Category exists
    IF NOT EXISTS (SELECT * FROM tblCategories WHERE CategoryName = @pCategoryName)
    BEGIN

        -- Your insert statement here, then grab the ID

        SET @CategoryID = SCOPE_IDENTITY()

    END
    ELSE
    BEGIN

        -- Set the category ID here

    END
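
To show the whole look-up-or-insert chain end to end, here is a minimal sketch in Python, with an in-memory SQLite database standing in for SQL Server. The schema, the column names (`ID`, `Name`, `CategoryID`, `SubCategoryID`) and the helpers `get_or_create` and `import_rows` are assumptions for illustration only; your real tables will differ, and in T-SQL the same steps would live inside the stored proc.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the real database.
SCHEMA = """
CREATE TABLE Category    (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE SubCategory (ID INTEGER PRIMARY KEY, Name TEXT, CategoryID INTEGER);
CREATE TABLE Product     (ID INTEGER PRIMARY KEY, Name TEXT, Description TEXT,
                          SubCategoryID INTEGER);
"""

# The sample rows from the question.
SAMPLE_ROWS = [
    ("Category#1", "SubCategory#1", "Product#1", "Description#1"),
    ("Category#1", "SubCategory#1", "Product#2", "Description#2"),
    ("Category#1", "SubCategory#2", "Product#3", "Description#3"),
    ("Category#1", "SubCategory#2", "Product#4", "Description#4"),
    ("Category#2", "SubCategory#3", "Product#5", "Description#5"),
]


def get_or_create(conn, table, name, parent_col=None, parent_id=None):
    """Return the ID of `name` in `table`, inserting the row first if it is missing."""
    cols, params = ["Name"], [name]
    if parent_col is not None:
        cols.append(parent_col)
        params.append(parent_id)
    where = " AND ".join(f"{c} = ?" for c in cols)
    row = conn.execute(f"SELECT ID FROM {table} WHERE {where}", params).fetchone()
    if row is not None:
        return row[0]
    marks = ", ".join("?" for _ in cols)
    cur = conn.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})", params
    )
    return cur.lastrowid  # SQLite's counterpart to SCOPE_IDENTITY()


def import_rows(conn, rows):
    """Insert one spreadsheet row at a time, reusing parent rows that already exist."""
    for category, subcategory, name, description in rows:
        category_id = get_or_create(conn, "Category", category)
        subcategory_id = get_or_create(
            conn, "SubCategory", subcategory, "CategoryID", category_id
        )
        conn.execute(
            "INSERT INTO Product (Name, Description, SubCategoryID) VALUES (?, ?, ?)",
            (name, description, subcategory_id),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
import_rows(conn, SAMPLE_ROWS)
for table in ("Category", "SubCategory", "Product"):
    print(table, conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0])
```

Table and column names are interpolated into the SQL here only because they are fixed strings in the sketch; the spreadsheet values themselves always go through `?` parameters.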

The VBA macro had code similar to:


Private Sub CommandButton1_Click()

    Dim cnt As ADODB.Connection
    Dim wbBook As Workbook
    Dim wsSheet As Worksheet
    Dim intActiveRow As Long

    ' Get our connection (CreateConnection is a helper that is not shown here)
    Set cnt = CreateConnection()

    ' Read the input sheet
    Set wbBook = ActiveWorkbook
    Set wsSheet = wbBook.Worksheets(1)

    ' Ignore the header row
    intActiveRow = 2

    ' Process every row into the database, stopping at the first empty cell in column A
    Do While (wsSheet.Cells(intActiveRow, 1).Value <> "")

        ' Execute the stored procedure; GenerateScript would create your SQL
        cnt.Execute (GenerateScript(wsSheet, intActiveRow))

        ' Move on to the next row
        intActiveRow = intActiveRow + 1

    Loop

    ' Cleaning up
    cnt.Close
    Set cnt = Nothing
    Set wbBook = Nothing
    Set wsSheet = Nothing

End Sub

Chris
I was afraid this was going to be the case, but thanks for putting me in the right direction. Unfortunately this isn't likely to be a one-off task and I may have slightly different data sets to work with :(
roryf
+1  A: 

You might want to investigate SSIS (SQL Server Integration Services), the successor to DTS (Data Transformation Services).
In SSIS you can use Excel as a data source and specify filters and transformations on the data before loading it into the appropriate SQL Server tables. It might take a little research, but it is a pretty robust tool, and it also supports a Script task if you need to do something that isn't available out of the box.
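
Whatever tool does the loading, the transformation itself comes down to three set-based steps: distinct categories, distinct subcategories, then products joined back to their subcategory by name. A rough sketch of those steps follows, in Python with SQLite standing in for SQL Server. The `Staging` table (the flat sheet loaded as-is), the other table and column names, and the `normalize` helper are all assumptions for illustration, and the statements assume the target tables start out empty.

```python
import sqlite3

# Hypothetical schema: Staging holds the flat spreadsheet rows unchanged.
SCHEMA = """
CREATE TABLE Staging     (Category TEXT, SubCategory TEXT, Name TEXT, Description TEXT);
CREATE TABLE Category    (CategoryID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE SubCategory (SubCategoryID INTEGER PRIMARY KEY, Name TEXT,
                          CategoryID INTEGER);
CREATE TABLE Product     (ProductID INTEGER PRIMARY KEY, Name TEXT, Description TEXT,
                          SubCategoryID INTEGER);
"""

# Step 1: unique categories. Step 2: unique subcategories with their parent.
# Step 3: every product, resolving its subcategory reference by name.
NORMALIZE = """
INSERT INTO Category (Name)
    SELECT DISTINCT Category FROM Staging;

INSERT INTO SubCategory (Name, CategoryID)
    SELECT DISTINCT s.SubCategory, c.CategoryID
    FROM Staging s
    JOIN Category c ON c.Name = s.Category;

INSERT INTO Product (Name, Description, SubCategoryID)
    SELECT s.Name, s.Description, sc.SubCategoryID
    FROM Staging s
    JOIN Category c     ON c.Name = s.Category
    JOIN SubCategory sc ON sc.Name = s.SubCategory
                       AND sc.CategoryID = c.CategoryID;
"""

# The sample rows from the question.
SAMPLE_ROWS = [
    ("Category#1", "SubCategory#1", "Product#1", "Description#1"),
    ("Category#1", "SubCategory#1", "Product#2", "Description#2"),
    ("Category#1", "SubCategory#2", "Product#3", "Description#3"),
    ("Category#1", "SubCategory#2", "Product#4", "Description#4"),
    ("Category#2", "SubCategory#3", "Product#5", "Description#5"),
]


def normalize(rows):
    """Load flat rows into Staging, then split them into the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Staging VALUES (?, ?, ?, ?)", rows)
    conn.executescript(NORMALIZE)
    return conn


conn = normalize(SAMPLE_ROWS)
query = """
SELECT p.Name, sc.Name, c.Name
FROM Product p
JOIN SubCategory sc ON sc.SubCategoryID = p.SubCategoryID
JOIN Category c     ON c.CategoryID = sc.CategoryID
ORDER BY p.ProductID
"""
for product, subcategory, category in conn.execute(query):
    print(product, subcategory, category)
```

The three `INSERT ... SELECT` statements use only standard SQL, so they should port to T-SQL with little change once the sheet has been loaded into a staging table; a repeatable import would also need a `NOT EXISTS` check so that existing rows are not inserted twice.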

CertifiedCrazy
A: 

A good piece of software that was developed specifically for this type of work is Relational Excel. There's a trial edition, but it can be used past the trial period; it just shows nag screens every once in a while. www.relationalexcel.com

Laszlo