views:

768

answers:

20

I know that that is not a question... erm anyway HERE is the question.

I have inherited a database that has 1(one) table in that looks much like this. Its aim is to record what species are found in the various (200 odd) countries.

ID 
Species
Afghanistan
Albania
Algeria
American Samoa
Andorra
Angola
....
Western Sahara
Yemen
Zambia
Zimbabwe

A sample of the data would be something like this

id Species Afghanistan Albania American Samoa
1  SP1         null     null        null
2  SP2          1         1         null
3  SP3         null      null         1

It seems to me this is a typical many to many situation and I want 3 tables. Species, Country, and SpeciesFoundInCountry

The link table (SpeciesFoundInCountry) would have foreign keys in both the species and Country tables.

(It is hard to draw the diagram!)

Species
SpeciesID  SpeciesName

Country
CountryID CountryName

SpeciesFoundInCountry
CountryID SpeciesID

Is there a magic way I can generate an insert statement that will get the CountryID from the new Country table based on the column name and the SpeciesID where there is a 1 in the original mega table?

I can do it for one Country (this is a select to show what I want out)

SELECT Species.ID, Country.CountryID
FROM Country, Species
WHERE (((Species.Afghanistan)=1)) AND (((Country.Country)="Afghanistan"));

(the mega table is called species)

But using this strategy I would need to do the query for each column in the original table.

Is there a way of doing this in sql?

I guess I can OR a load of my where clauses together and write a script to make the sql, seems inelegant though!

Any thoughts (or clarification required)?

Cheers for reading all that :D

+8  A: 

Why do you want to do it in SQL? Just write a little script that does the conversion.

Corporal Touchy
A: 

You're probably going to want to create replacement tables in place. The script sort of depends on the scripting language you have available to you, but you should be able to create the country ID table simply by listing the columns of the table you have now. Once you've done that, you can do some string substitutions to go through all of the unique country names and insert into the speciesFoundInCountry table where the given country column is not null.

nsayer
+5  A: 

I would use a script to generate all the individual queries, since this is a one-off import process.

Some programs such as Excel are good at mixing different dimensions of data (comparing column names to data inside rows) but relational databases rarely are.

However, you might find that some systems (such as Microsoft Access, surprisingly) have convenient tools which you can use to normalise the data. Personally I'd find it quicker to write the script but your relative skills with Access and scripting might be different to mine.

Leigh Caldwell
+3  A: 

When I run into these I write a script to do the conversion rather than trying to do it in SQL. It is typically much faster and easier for me. Pick any language you are comfortable with.

StubbornMule
A: 

You could probably get clever and query the system tables for the column names, and then build a dynamic query string to execute, but honestly that will probably be uglier than a quick script to generate the SQL statements for you.

Hopefully you don't have too much dynamic SQL code that accesses the old tables buried in your codebase. That could be the really hard part.

Eric Z Beard
+1  A: 

In SQL Server this will generate your custom select you demonstrate. You can extrapolate to an insert

select 
  'SELECT Species.ID, Country.CountryID FROM Country, Species WHERE (((Species.' + 
 c.name + 
 ')=1)) AND (((Country.Country)="' +
 c.name + 
 '"))'
from syscolumns c
inner join sysobjects o
on o.id = c.id
where o.name = 'old_table_name'
Tom Ritter
A: 

As with the others I would most likely just do it as a one time quick fix in whatever manner works for you.

With these types of conversions, they are one off items, quick fixes, and the code doesn't have to be elegant, it just has to work. For these types of things I have done it many ways.

Mitchel Sellers
A: 

If this is SQL Server, you can use the sys.columns table to find all of the columns of the original table. Then you can use dynamic SQL and the pivot command to do what you want. Look those up online for syntax.

eulerfx
A: 

I would definitely agree with your suggestion of writing a small script to produce your SQL with a query for every column.

In fact your script could have already been finished in the time you've spent thinking about this magical query (that you would use only one time and then throw away, so what's the use in making it all magicy and perfect)

+2  A: 

If this was SQL Server, you'd use the Unpivot commands, but looking at the tag you assigned it's for access - am I right?

Although there is a pivoting command in access, there is no reverse statement.

Looks like it can be done with a complex join. Check this interesting article for a lowdown on how to unpivot in a select command.

digiguru
+1  A: 

I would make it a three step process with a slight temporary modification to your SpeciesFoundInCountry table. I would add a column to that table to store the Country name. Then the steps would be as follows.

1) Create/Run a script that walks columns in the source table and creates a record in SpeciesFoundInCountry for each column that has a true value. This record would contain the country name. 2) Run a SQL statement that updates the SpeciesFoundInCountry.CountryID field by joining to the Country table on Country Name. 3) Cleanup the SpeciesFoundInCountry table by removing the CountryName column.

Here is a little MS Access VB/VBA pseudo code to give you the gist

Public Sub CreateRelationshipRecords()

Dim rstSource as DAO.Recordset Dim rstDestination as DAO.Recordset Dim fld as DAO.Field dim strSQL as String Dim lngSpeciesID as Long

strSQL = "SELECT * FROM [ORIGINALTABLE]" Set rstSource = CurrentDB.OpenRecordset(strSQL) set rstDestination = CurrentDB.OpenRecordset("SpeciesFoundInCountry")

rstSource.MoveFirst

' Step through each record in the original table Do Until rstSource.EOF lngSpeciesID = rstSource.ID ' Now step through the fields(columns). If the field ' value is one (1), then create a relationship record ' using the field name as the Country Name For Each fld in rstSource.Fields If fld.Value = 1 then with rstDestination .AddNew .Fields("CountryID").Value = Null .Fields("CountryName").Value = fld.Name .Fields("SpeciesID").Value = lngSpeciesID .Update End With End IF Next fld
rstSource.MoveNext Loop

' Clean up rstSource.Close Set rstSource = nothing ....

End Sub

After this you could run a simple SQL statement to update the CountryID values in the SpeciesFoundInCountry table.

UPDATE SpeciesFoundInCountry INNER JOIN Country ON SpeciesFoundInCountry.CountryName = Country.CountryName SET SpeciesFoundInCountry.CountryID = Country.CountryID;

Finally, all you have to do is cleanup the SpeciesFoundInCountry table by removing the CountryName column.

****SIDE NOTE: I have found it usefull to have country tables that also include the ISO abbreviations (country codes). Occassionally they are used as Foreign Keys in other tables so that a join to the Country table does not have to be included in queries.

For more info: http://en.wikipedia.org/wiki/Iso_country_codes

Thanks.If you go back and edit your post then select the code bits and click on the code button (zeros and ones!) it will indent your code with four spaces, which gives you the groovy code colouring etc.
Aidan
A: 

Sorry, but the bloody posting parser removed the whitespace and formatting on my post. It makes it a log harder to read.

A: 

@stomp:

Above the box where you type the answer, there are several buttons. The one that is 101010 is a code sample. You select all your text that is code, and then click that button. Then it doesn't get messed with much.

cout>>"I don't know C"
cout>>"Hello World"
CindyH
+1  A: 

I would use a Union query, very roughly:

Dim db As Database
Dim tdf As TableDef

Set db = CurrentDb

Set tdf = db.TableDefs("SO")

strSQL = "SELECT ID, Species, """ & tdf.Fields(2).Name _
    & """ AS Country, [" & tdf.Fields(2).Name & "] AS CountryValue FROM SO "

For i = 3 To tdf.Fields.Count - 1
    strSQL = strSQL & vbCrLf & "UNION SELECT ID, Species, """ & tdf.Fields(i).Name _
    & """ AS Country, [" & tdf.Fields(i).Name & "] AS CountryValue FROM SO "
Next

db.CreateQueryDef "UnionSO", strSQL

You would then have a view that could be appended to your new design.

Remou
+1  A: 

When I read the title 'bad BAD database design', I was curious to find out how bad it is. You didn't disappoint me :)

As others mentioned, a script would be the easiest way. This can be accomplished by writing about 15 lines of code in PHP.

SELECT * FROM ugly_table;
while(row)
foreach(row as field => value)
if(value == 1)
SELECT country_id from country_table WHERE country_name = field;

if(field == 'Species')
SELECT species_id from species_table WHERE species_name = value;

INSERT INTO better_table (...)

Obviously this is pseudo code and will not work as it is. You can also populate the countries and species table on the fly by adding insert statements here.

Gaurav
I wept real tears when I saw it! What further glories have I to discover that is the real question :)
Aidan
A: 

Sorry, I've done very little Access programming but I can offer some guidance which should help.

First lets walk through the problem. It is assumed that you will typically need to generate multiple rows in SpeciesFoundInCountry for every row in the original table. In other words species tend to be in more then one country. This is actually easy to do with a Cartesian product, a join with no join criteria.

To do a Cartesian product you will need to create the Country table. The table should have the country_id from 1 to N (N being the number of unique countries, 200 or so) and country name. To make life easy just use the numbers 1 to N in column order. That would make Afghanistan 1 and Albania 2 ... Zimbabwe N. You should be able to use the system tables to do this.

Next create a table or view from the original table which contains the species and a sting with a 0 or 1 for each country. You will need to convert the null, not null to a text 0 or 1 and concatenate all of the values into a single string. A description of the table and a text editor with regular expressions should make this easy. Experiment first with a single column and once that's working edit the create view/insert with all of the columns.

Next join the two tables together with no join criteria. This will give you a record for every species in every country, you're almost there.

Now all you have to do is filter out the records which are not valid, they will have a zero in the corresponding location in the string. Since the country table's country_code column has the substring location all you need to do is filter out the records where it's 0.

where substring(new_column,country_code) = '1'

You will still need to create the species table and join to that

where a.species_name = b.species_name

a and b are table aliases.

Hope this help

+1  A: 

OBTW,

If you have queries that already run against the old table you will need to create a view which replicates the old tables using the new tables. You will need to do a group by to denormalize the tables.

Tell your users that the old table/view will not be supported in the future and all new queries or updates to older queries will have to use the new tables.

+1  A: 

If I ever have to create a truckload of similar SQL statements and execute all of them, I often find Excel is very handy. Take your original query. If you have a country list in column A and your SQL statement in column B, formated as text (in quotes) with cell references inserted where the country appears in the sql

e.g. ="INSERT INTO new_table SELECT ... (species." & A1 & ")= ... ));"

then just copy the formula down to create 200 different SQL statements, copy/paste the column to your editor and hit F5. You can of course do this with as many variables as you want.

Glenn M
A: 

This is (hopefully) a one-off exercise, so an inelegant solution might not be as bad as it sounds.

The problem (as, I'm sure you're only too aware!) is that at some point in your query you've got to list all those columns. :( The question is, what is the most elegant way to do this? Below is my attempt. It looks unwieldy because there are so many columns, but it might be what you're after, or at least it might point you in the right direction.

Possible SQL Solution:

CREATE TABLE Country
(id    int, 
 name  varchar(50)) 

INSERT Country
      SELECT 1, 'Afghanistan'
UNION SELECT , 'Albania', 
UNION SELECT , 'Algeria' ,
UNION SELECT , 'American Samoa' ,
UNION SELECT , 'Andorra' ,
UNION SELECT , 'Angola' ,
...
UNION SELECT , 'Western Sahara', 
UNION SELECT , 'Yemen', 
UNION SELECT , 'Zambia', 
UNION SELECT , 'Zimbabwe', 



CREATE TABLE #tmp
(key        varchar(255),  -- needs to be as long as the number of countries you have  
 country_id int) 

INSERT #tmp 
SELECT '1________ ... _',     -- '1' followed by underscores to make the length = number of countries 
       'Afghanistan' 
UNION SELECT '_1_______ ... ___', 'Albania'
UNION SELECT '__1______ ... ___', 'Algeria'
...
UNION SELECT '________ ... _1_', 'Zambia'
UNION SELECT '________ ... __1', 'Zimbabwe'

CREATE TABLE new_table
(country_id int, 
species_id int) 

INSERT new_table
SELECT species.id, country_id
FROM   species s , 
       #tmp    t
WHERE  isnull( s.Afghanistan, ' ' ) + isnull( s.Albania, ' ' ) + ... + isnull( s.Zambia, ' ' ) + isnull( s.Zimbabwe, ' ' ) like t.key

This is (hopefully) a one-off exercise, so an inelegant solution might not be as bad as it sounds.

However, you could try this:

CREATE TABLE Country
(id    int, 
 name  varchar(50)) 

INSERT Country
      SELECT 1, 'Afghanistan'
UNION SELECT , 'Albania', 
UNION SELECT , 'Algeria' ,
UNION SELECT , 'American Samoa' ,
UNION SELECT , 'Andorra' ,
UNION SELECT , 'Angola' ,
...
UNION SELECT , 'Western Sahara', 
UNION SELECT , 'Yemen', 
UNION SELECT , 'Zambia', 
UNION SELECT , 'Zimbabwe', 



CREATE TABLE #tmp
(key        varchar(255),  
 country_id int) 
/* "key" field needs to be as long as the number of countries you have */  


INSERT #tmp 
SELECT '1________ ... _', 'Afghanistan' 
/* '1' followed by underscores to make the length = number of countries */

UNION SELECT '_1_______ ... ___', 'Albania'
UNION SELECT '__1______ ... ___', 'Algeria'
...
UNION SELECT '________ ... _1_', 'Zambia'
UNION SELECT '________ ... __1', 'Zimbabwe'

CREATE TABLE new_table
(country_id int, 
species_id int) 

INSERT new_table
SELECT species.id, country_id
FROM   species s , 
       #tmp    t
WHERE  isnull( s.Afghanistan, ' ' ) +  
       isnull( s.Albania, ' ' ) +  
       ... +  
       isnull( s.Zambia, ' ' ) +  
       isnull( s.Zimbabwe, ' ' ) like t.key

My Suggestion

Personally, I would not do this. I would do a quick and dirty solution like the one to which you allude, except that I would hard-code the country ids (because you're only going to do this once, right? And you can do it right after you create the country table, so you know what all the IDs are):

INSERT new_table SELECT Species.ID, 1 FROM Species WHERE Species.Afghanistan = 1 
INSERT new_table SELECT Species.ID, 2 FROM Species WHERE Species.Albania= 1 
...
INSERT new_table SELECT Species.ID, 999 FROM Species WHERE Species.Zambia= 1 
INSERT new_table SELECT Species.ID, 1000 FROM Species WHERE Species.Zimbabwe= 1
AJ
A: 

When I've been faced with similar problems, I've found it convenient to generate a script that generates SQL scripts. Here's the sample you gave, abstracted to use %PAR1% in place of Afghanistan.

SELECT Species.ID, Country.CountryID
FROM Country, Species
WHERE (((Species.%PAR1%)=1)) AND (((Country.Country)="%PAR1%"))
UNION

Also the key word union has been added as a way to combine all the selects.

Next, you need a list of countries, generated from your existing data:

Afghanistan Albania . , .

Next you need a script that can iterate through the country list, and for each iteration, produce an output that substitutes Afghanistan for %PAR1% on the first iteration, Albania for the second iteration and so on. The algorithm is just like mail-merge in a word processor. It's a little work to write this script. But, once you have it, you can use it in dozens of one-off projects like this one.

Finally, you need to manually change the last "UNION" back to a semicolon.

If you can get Access to perform this giant union, you can get the data you want in the form you want, and insert it into your new table.

Walter Mitty