views:

111

answers:

2

Hi friends,

I have a huge access mdb file which contains a single table with 20-30 columns and over 50000 rows and i have some thing like this

columns:

id desc name phone email fax ab bc zxy sd country state zip .....
1  a     ab  12     fff   12  w 2  3   2    d     sd     233
2  d     ab  12     fff   12  s 2  3   1    d     sd     233

here I have some column values related to addresses repeating is there a way to normalize the above table so that we can remove duplicates or repeating data.

Thanks in advance.

+2  A: 

Here's a quick answer. You just need to move your address fields to a new table (remove dups) and add a FK back to your primary table.

Table 1 (People or whatever)

id desc name phone email fax  ab  bc  zxy  sd   address_id 
1  a     ab  12     fff   12   w  2   3    2    1
2  d     ab  12     fff   12   s  2   3    1    2
3  d     ab  12     fff   12   s  2   3    1    2
4  d     ab  12     fff   12   s  2   3    1    1

Table 2 (Address)

address_id country state  zip .....
1          d       sd     233
2          e       ac     123
Jim W
this is exactly what i did but the peoblem is how to insert address id when adding new rows into Table 1?
SweetGangster
You can build a form in Access to let the user pick an existing address or enter a new one. You just need to define the one-to-many relationship then build a form.
Jim W
First, you add the address in Table 2 (or find it if it's there) then pull the address_id and insert it in Table 1 with the rest of your data.
Lance Roberts
@Lance i wanted something similar to that but here is the problem i have raw data coming in the form of single table and i need to refine and send it to two tables i can add address in table 2 but i m not undertanding how would you insert the address_id in table 1
SweetGangster
That's more of a programming question than db design question. Can you reuse ids from each load or do all of the raw data loads eventually get loaded into the same master table?
Jim W
i have the single master data in access tablealready with huge data and now i m normalizing and using sql server to store it efficiently. now i created two tables in sql server.i included adress_id column which doesnt contain any rows in the first table and filled out the second table perfectly but i still have to fill in the adress_id column of the first table.??
SweetGangster
You need leave in the address fields in the first table and then do an update to it while joining to the second table on the address fields to get the address_id values loaded to the first table. If you're not familiar with SQL or comfortable with all of the data migration requirements, then you could just leave the data in the flatten/single table format. Normalization is good to save disk space, but I'm betting you have gigs of free space that's not be used. A flattened/warehouse style solution is going to be faster to retrieve and probably easier for you to manage.
Jim W
+1  A: 

Jim W has a good start, but to normalize even further, make your redundant address elements into separate tables as well.

Create the tables for which address data is repeated (Country, State, etc.) Once you have your data tables, you'll want to add columns such as StateID, CountryID, etc. to the Address table.

You now have options for fixing the existing data. You can be quick and dirty and use Update statements to set all the newly created ID fields to point to the right data table.

UPDATE Addresses SET StateID=1 WHERE STATE='AL'

You can do this fairly quickly as a batch .sql file, but I'd recommend a more programmatic solution that rolls through the Address table and tries to match the current 'State' to an entry in the new States table. If found, the StateID on the Address table is updated with the id from the corresponding row in States.

You can then delete the old State field from the address table, as it is now normalized nice and neatly into a separate States table.

This process can be repeated for all redundant data elements. However, IMO db normalization can be taken too far. For example, if you have a commonly used query that, after normalization, requires 10 joins to accomplish, you may see a performance reduction. This doesn't appear to be the case here, as I think you're on the right track.

From a comment above: @Lance i wanted something similar to that but here is the problem i have raw data coming in the form of single table and i need to refine and send it to two tables i can add address in table 2 but i m not undertanding how would you insert the address_id in table 1

You can retrieve the newly created ID from the address table using @@IDENTITY, and update the address_ID with this value.

IniTech
You can retrieve the newly created ID from the address table using @@IDENTITY, and update the address_ID with this value. can you explain this a bit clearly?
SweetGangster
He's talking about setting (address table) address_id to an IDENTITY data type (it's like autonumber in access). Then when you insert the address into that table using SQL INSERT command, you can get this autonumber and update the corresponding address_id field in your first table
Jim W
i did declare the adress_id column as identity in second table but i still couldnt undertand how to update the first table adress_id based on second table column values i thought of using sql cursors to compare two table columns and if it matches then update adres_id in first table but there seems to be something wrong with my cursor code i was using nested cursors
SweetGangster
After inserting the address record, you can perform 'SELECT @@IDENTITY from Addresses' and it will return the last id added to that table. Use this result to update the original table's Address_id field.
IniTech
Thanks for the help guys
SweetGangster