views:

1010

answers:

7

OK, here is my problem, without being too specific for reasons of being fired for putting company practices on the internet.

There are spreadsheets made. These are uploaded to the database. I need to filter out duplicates from uploading. The only way to do this is by making sure that for each and every entry that two fields aren't the same as an entry already in the database. As just becasue one field is the same does not mean its a duplicate. There are two specific fields lets call them FLDA and FLDB that both must match up to an entry in the database already. I can filter by one field already. I'm thinking this has to be a subquery but I'm not sure how to apply it. This is hard to decribe. Just ask if your not sure what I mean.

+1  A: 

I had a similar problem. My solution was to:

  1. import into a staging-table.
  2. delete the duplicates
  3. copy what's left over into the live table

It's a little BFI, but it just plain works.

CodeSlave
A: 

How are you loading them into the database? Is this with your own code to read the Excel files? You can read the Excel files using ADO/ADO.NET with the right connection string. Then you could read the data using a query that would weed out the dupes.

Ryan Farley
A: 

Building on CodeSlave's answer, Access provides a Find Duplicates Query wizard that can help you easily build the query to weed out the duplicates.

Another approach would be to set up an identity on FLDA and FLDB. This will prevent duplicate entries from even getting written to the table. Of course you will also need to catch the exception that is thrown when the insert operation fails.

Jason Z
+1  A: 

Would a query suit? For example:

INSERT INTO ToUpdate ( Field1, Field2 )
SELECT e.H1, e.H2
FROM (SELECT * FROM [Sheet1$] IN '' [Excel 8.0;HDR=YES;IMEX=1;database=C:\Docs\LTD.xls]) As e
LEFT JOIN ToUpdate ON (e.H2 = ToUpdate.Field2) AND (e.H1 = ToUpdate.Field1)
WHERE ToUpdate.Field1 Is Null AND ToUpdate.Field2 Is Null
Remou
A: 

Is there a field FLDC that would be different for identifying duplicates? I presume there must be, as otherwise having (FLDA,FLDB) as a unique or primary key would solve your problem immediately.

Assuming there is such a field, you could try something like this:

SELECT T1.FLDA, T1.FLDB, T1.FLDC
FROM Table1 T1, Table1 T2
WHERE T1.FLDA = T2.FLDA
  AND T1.FLDB = T2.FLDB
  AND T1.FLDC <> T2.FLDC

The downside here is that both the original and the duplicate will be returned by something like this. If you only want to see the duplicates, you will probably have to figure out a way to identify an 'original' row and add another WHERE clause or two for this.

If you can get a query that gives you just the duplicate rows and not the originals, it should be pretty easy to change it to a DELETE query.

kaybenleroll
A: 

To avoid duplicates on imports:

1 - If there's not already a primary key on the table, put one on FLDA and FLDB (together). If there's already a primary key that's not FLDA and FLDB (together), place an index on the table on these two fields, unique yes, ignore nulls no.

2 - You can import from the spreadsheet to the table with the wizard or with a query. If you do it with the import spreadsheet wizard, you'll see this message before the import starts:

"DB name was unable to append all the data to the table.

"The contents of fields in 0 records were deleted and (xx) records were lost due to key violations. (These lost records were duplicates, so no real loss there.) ... Do you want to proceed anyway?"

Click yes to import the rows from the spreadsheet. No duplicates will be imported.

Or, to use a query for the import, paste this into a new query in sql view (menu: Insert > Query > Design View, Close button; menu: View > SQL View.)

INSERT INTO tblInput
  SELECT XLS.* 
  FROM tblInput AS T RIGHT JOIN
    [Excel 8.0;IMEX=1;HDR=Yes;DATABASE=c:\data.xls;].[Sheet1$] AS XLS 
    ON T.FLDA = XLS.FLDA AND T.FLDB = XLS.FLDB
  WHERE ISNULL(T.FLDA) AND ISNULL(T.FLDB);

Change the path, c:\data.xls, to your path, Sheet1$ to your sheetname, tblInput to your table name, and FLDA and FLDB to your column names. If the spreadsheet doesn't have headers (column names) change HDR=Yes to HDR=No

Chris OC
A: 

Hi all.

I did this by using a delete quey and then using Select from Table 1 Group by X Having Y, Z, A. And then I put run query button for front end users.

Cheers for all your help.