I have an Access database with two tables that are related by PK/FK. Unfortunately, the tables have allowed duplicate/redundant records, which has made the database a bit screwy. I am trying to figure out a SQL statement that will fix the problem.

To better explain the problem and goal, I have created example tables to use as reference:

[image: example Student and TestScore tables]

You'll notice there are two tables, a Student table and a TestScore table, where StudentID is the PK/FK.

The Student table contains duplicate records for students John, Sally, Tommy, and Suzy. In other words, the Johns with StudentIDs 1 and 5 are the same person, the Sallys with StudentIDs 2 and 6 are the same person, and so on.

The TestScore table relates test scores with a student.

Ignoring how/why the Student table allowed duplicates, the goal I'm trying to accomplish is to update the TestScore table so that every StudentID that has been disabled is replaced with the corresponding enabled StudentID. So, all rows with StudentID = 1 (John) will be updated to 5, all rows with StudentID = 2 (Sally) will be updated to 6, and so on. Here's the resultant TestScore table that I'm shooting for (notice there is no longer any reference to the disabled StudentIDs 1-4):

[image: desired TestScore table]

Can you think of a query (compatible with MS Access's JET engine) that can accomplish this goal? Or maybe you can offer some tips/perspectives that will point me in the right direction.

Thanks.

+1  A: 

The most common technique to identify duplicates in a table is to group by the fields that represent duplicate records:

ID  FIRST_NAME  LAST_NAME
1   Brian   Smith
3   George  Smith
25  Brian   Smith

In this case we want to remove one of the Brian Smith records or, in your case, update the ID field so they both have the value 25 or 1 (it is completely arbitrary which one you use).

SELECT   MIN (id) AS id, first_name, last_name
    FROM example
GROUP BY first_name, last_name

Using min on ID will return:

ID  FIRST_NAME  LAST_NAME
1   Brian   Smith
3   George  Smith

If you use max instead, you would get:

ID  FIRST_NAME  LAST_NAME
25  Brian   Smith
3   George  Smith

I usually use this technique to delete the duplicates, not update them:

DELETE FROM example
      WHERE ID NOT IN (SELECT   MAX (ID)
                           FROM example
                       GROUP BY first_name, last_name)
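
To make the grouping technique above easy to try, here is a minimal sketch that runs the same two queries against an in-memory SQLite database from Python. SQLite is only a stand-in here, not Access/Jet, and the variable names (`keep_min`, `remaining`) are my own; the table and rows are the three-row example from this answer.

```python
import sqlite3

# In-memory SQLite stand-in for the answer's `example` table (not Access/Jet).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?)",
    [(1, "Brian", "Smith"), (3, "George", "Smith"), (25, "Brian", "Smith")],
)

# One row per (first_name, last_name) group, reporting the lowest id in it.
keep_min = conn.execute(
    "SELECT MIN(id), first_name, last_name FROM example "
    "GROUP BY first_name, last_name ORDER BY 1"
).fetchall()

# Delete every row whose id is not the highest id of its group.
conn.execute(
    "DELETE FROM example WHERE id NOT IN "
    "(SELECT MAX(id) FROM example GROUP BY first_name, last_name)"
)
remaining = conn.execute(
    "SELECT id, first_name, last_name FROM example ORDER BY id"
).fetchall()

print(keep_min)   # rows the MIN variant would keep
print(remaining)  # rows left after the MAX-based delete
```

Note that the delete only touches the `example` table itself; any rows in other tables that still point at a deleted id would have to be handled separately.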
Brian
Thanks, Brian. That's a cool method for deleting duplicates. However, although I'd be fine with deleting duplicates from my sample Student table, it is mandatory that I save (Update) the existing records in the TestScore table. Referring back to the sample TestScore table, you'll notice that there are records for John (ID=1) and John (ID=5). The problem is, John ID1 and ID5 are the same person. So, I want to update all the ID=1 to ID=5. I do not want to lose track of the history for all of John's (and the other students') test scores.
Jed
+1  A: 

A reliable way to do this in Access is through a series of queries and a temporary table.

First, I would create the following Make Table query, which builds a table mapping each bad (disabled) StudentID to the correct (enabled) StudentID.

Select S1.StudentId As NewStudentId, S2.StudentId As OldStudentId 
Into zzStudentMap
From Student As S1
    Inner Join Student As S2
        On S2.Name = S1.Name
Where S1.Disabled = False
    And S2.StudentId <> S1.StudentId
    And S2.Disabled = True

Next, you would use that temporary table to update the TestScore table with the correct StudentID.

Update TestScore
    Inner Join zzStudentMap
        On zzStudentMap.OldStudentId = TestScore.StudentId
Set TestScore.StudentId = zzStudentMap.NewStudentId
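
The two steps above can be tried end to end with a small sketch in Python against an in-memory SQLite database. Several things here are assumptions rather than part of the answer: SQLite stands in for Access/Jet, so the Make Table query becomes `CREATE TABLE ... AS SELECT` and the joined update becomes a correlated subquery; `Disabled` is stored as 0/1; and the `ScoreId` and `Score` columns and all sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stand-in (not Access/Jet); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT, Disabled INTEGER);
CREATE TABLE TestScore (ScoreId INTEGER PRIMARY KEY, StudentId INTEGER, Score INTEGER);
INSERT INTO Student VALUES (1,'John',1),(2,'Sally',1),(5,'John',0),(6,'Sally',0);
INSERT INTO TestScore VALUES (1,1,90),(2,5,85),(3,2,70),(4,6,95);
""")

# Step 1: build the old -> new mapping (SQLite equivalent of the Make Table query).
conn.execute("""
CREATE TABLE zzStudentMap AS
SELECT S1.StudentId AS NewStudentId, S2.StudentId AS OldStudentId
FROM Student AS S1
INNER JOIN Student AS S2 ON S2.Name = S1.Name
WHERE S1.Disabled = 0
  AND S2.StudentId <> S1.StudentId
  AND S2.Disabled = 1
""")
mapping = conn.execute(
    "SELECT NewStudentId, OldStudentId FROM zzStudentMap ORDER BY OldStudentId"
).fetchall()

# Step 2: repoint scores from disabled ids to enabled ids via the mapping.
conn.execute("""
UPDATE TestScore
SET StudentId = (SELECT NewStudentId FROM zzStudentMap
                 WHERE OldStudentId = TestScore.StudentId)
WHERE StudentId IN (SELECT OldStudentId FROM zzStudentMap)
""")

# Check: no score should reference a disabled student any more.
orphans = conn.execute("""
SELECT COUNT(*) FROM TestScore AS t
INNER JOIN Student AS s ON s.StudentId = t.StudentId
WHERE s.Disabled = 1
""").fetchone()[0]
scores = conn.execute(
    "SELECT ScoreId, StudentId, Score FROM TestScore ORDER BY ScoreId"
).fetchall()

print(mapping)
print(orphans)
print(scores)
```

The mapping relies on each name matching exactly one enabled record. If two enabled students shared a name, a disabled ID would map to more than one new ID, so it is worth eyeballing zzStudentMap before running the update.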
Thomas
I didn't think to use a temp table. Thanks, Thomas.
Jed