views:

46

answers:

4

I've decided to rewrite a database I have that is poorly normalized. I've created the new database but now need to move data from the old database into the new one. I'm not sure exactly how to accomplish though. For example, in the old database I have a webDorms table that looks like this:

id (PK, int)
room_type (varchar) 
description (char) 
available (varchar) 
max_occupancy (varchar) 
current_occupancy (varchar) 
dorm_room (varchar) 
dorm_building (varchar) 
building_code (varchar) 

In the new database I split this out into two tables Buildings and Housing. Housing looks like this:

id (PK, int) 
building (FK, int) 
room (nvarchar)
current_occupancy (int) 
max_occupancy (int) 
is_available (bit) 
gender (nvarchar) 
room_type (nvarchar)

Buildings looks like this:

id (PK, int) 
building_code (nvarchar) 
building_name (nvarchar)

I've manually repopulated the Buildings table (it was around twenty rows) but the Housing table (which is most similar to the previous webDorms table) contains around three hundred rows - and I'd rather not rekey all that data.

Any suggestions on the best way to accomplish this import?

+2  A: 

If I'm missing something obvious in your question, please let me know with a comment, but what's wrong with using INSERT queries to copy the data over instead of retyping the data?

JohnFx
Can you give me a sample of how I might do this? The biggest dilemma is that in some of the old columns data is strings, but in the new table it needs to be integers (IDs). I need to perform that conversion during the import.
davemackey
Use the CAST or CONVERT keywords in your query to get the data into the correct formats. For example CAST(StringValueField as INT)
JohnFx
Tbe values aren't a strict conversion from string to integer. Rather, it is that another table now exists which has ID's that correlate to each building.
davemackey
The trick is to add a column to your original table and populate it with the new Building ID's then you should be able to transfer the housing data in the same INSERT INTO query.
JohnFx
Duhh. :) Sometimes I feel pretty stupid when the answer is so simple!
davemackey
@davemackey - We all have those days. Easy for the mind to wander on a Friday. =)
JohnFx
+1  A: 

This is one reason why SQL has SELECT DISTINCT -- normalization.

Two SELECT DISTINCT queries from the source table should create the resulting two normalized tables.

You have two approaches.

INSERT INTO new_table_1(...) AS SELECT DISTINCT a, b, c FROM old table
INSERT INTO new_table_2(...) AS SELECT DISTINCT d, e, f FROM old table

Or do two separate queries to create two separate extract files (in CSV format). Then load those two files.

S.Lott
This looks good...What about converting the string data to integer (ID) data?
davemackey
@davemackey: String to integer. Do you have a SQL reference? Check that. Data conversions are almost always automatic.
S.Lott
+1  A: 

Hi Dave,

you would want to do something like this:

SELECT CAST(current_occupancy AS int) AS current_occupancy,
       CAST(max_occupancy AS int) AS max_occupancy,
       CAST(available AS bit) AS available

You need to take a look at books online and specifically CAST and CONVERT.

You mentioned normalisation as the reason for the re-design of the table, but you have not yet normalised all the possibilities. Room type and gender are candidates here, also the data types for max and current occupancy are rather large, unless you are wanting to have rooms that allow 2 billion occupants. :0)

WilliamD
Yeah, I thought about normalizing room type and gender - but that would create more work than I can justify at this juncture for very little or no practical benefit.
davemackey
+1  A: 

It would look similar to this:

INSERT INTO ['database'].dbo.[housing] (id, building, current_occupancy, max_occupancy, room_type, is_available)
  SELECT id, [building].id, cast(current_occupancy as int), cast(max_occupancy as int), room_type, CASE available WHEN 'Y' THEN 1 ELSE 0 END)
  FROM ['database'].dbo.[webDorms]
  INNER JOIN ['database'].dbo.[building] ON [building].building_code = [webDorms].building_code

[Room] and [Gender] would need to be added manually, or you can populate them with defaults during the insert.

This is of course after you've populated your [building] table, which could have been done by:

INSERT INTO ['database'].dbo.[building] (building_code)
  SELECT DISTINCT building_code 
  FROM ['database'].dbo.[webDorms] 

Then populating your [building_name] manually (instead of keying in all the data).

rickp
Thanks, this is very helpful. Though apparently you can't cast Y/N to bit. :( But I'll add another column to the old table and do the conversion manually...
davemackey
You could wrap it in a case, and set the result to 1 or 0 respectfully (to avoid having to do it manually). I modified my post to reflect this.
rickp
Thanks Rick. I ended up doing it by adding another column - but it wasn't difficult - I'll keep the case method in mind for future reference.
davemackey