views: 25

answers: 1
I'm looking for an example of how to split out comma-delimited data stored in a field of one table and fill a second table with the individual elements, in order to create a one-to-many relational schema. This is probably really simple, but let me give an example:

I'll start with everything in one table, WIDGET, which has a "states" field listing the states that have that widget:

Table: WIDGET

===============================
| id | unit | states          |
===============================
|1   | abc  | AL,AK,CA        |
-------------------------------
|2   | lmn  | VA,NC,SC,GA,FL  |
-------------------------------
|3   | xyz  | KY              |
===============================

Now, what I'd like to create via code is a second table, joined to WIDGET, called *WIDGET_ST*, with widget id, widget state id, and widget state name fields. For example:

Table: WIDGET_ST

==============================
| w_id | w_st_id | w_st_name |
------------------------------
|1     | 1       | AL        |
|1     | 2       | AK        |
|1     | 3       | CA        |
|2     | 1       | VA        |
|2     | 2       | NC        |
|2     | 3       | SC        |
|2     | 4       | GA        |
|2     | 5       | FL        |
|3     | 1       | KY        |
==============================
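
If it helps, this is roughly the shape of the second table I'm picturing (the column types are just guesses, and since I'm working in PHP I'd probably run it through PDO):

    // Rough sketch of the second table I have in mind; types are guesses.
    $pdo->exec("
        CREATE TABLE WIDGET_ST (
            w_id      INT NOT NULL,       -- the id from WIDGET
            w_st_id   INT NOT NULL,       -- state number within that widget
            w_st_name CHAR(2) NOT NULL,   -- two-letter state code
            PRIMARY KEY (w_id, w_st_id)
        )
    ");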

I am learning C# and PHP, so responses in either language would be great.

Thanks.

+3  A: 

I wrote some scripts to import the Stack Overflow data dump into an SQL database. I split the tags list to populate a many-to-many table as you describe. I use a technique similar to the following:
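
(The $pdoStmt below is assumed to come from a PDO query against WIDGET, something like this sketch; adjust the DSN and credentials for your setup:)

    // Assumed setup: connect with PDO and select the rows to split.
    $pdo = new PDO("mysql:host=localhost;dbname=widgets", "user", "password");
    $pdoStmt = $pdo->query("SELECT id, states FROM WIDGET");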

  1. Open the output CSV files, then read each row from WIDGET

    $stateid = array();   // map of state code => assigned id
    $i = 0;               // next state id to assign
    $stfile = fopen("states.csv", "w+");
    $mmfile = fopen("manytomany.csv", "w+");

    while ($row = $pdoStmt->fetch()) {
    
  2. Use explode() to split the states on a comma

        $states = explode(",", $row["states"]);
    
  3. Loop over the elements, writing to the CSV files; close the files once every row has been processed

        foreach ($states as $st) {
            // Assign an id the first time we see a state, and record it in states.csv
            if (!array_key_exists($st, $stateid)) {
                $stateid[$st] = ++$i;
                fprintf($stfile, "%d,%s\n", $i, $st);
            }
            // One line per widget/state pair for the many-to-many table
            fprintf($mmfile, "%s,%s\n", $row["id"], $stateid[$st]);
        }
    }

    fclose($stfile);
    fclose($mmfile);
    
  4. When you're done, load the CSV files into the database. You can do this in the mysql client:

    mysql> LOAD DATA INFILE 'states.csv' INTO TABLE STATES FIELDS TERMINATED BY ',';
    mysql> LOAD DATA INFILE 'manytomany.csv' INTO TABLE WIDGET_ST FIELDS TERMINATED BY ',';
    

It may seem like a lot of work, but the LOAD DATA command runs 20x faster than inserting one row at a time, so it's worthwhile if your data set is large.
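
For comparison, the row-at-a-time approach would look roughly like this (a sketch; the STATES and WIDGET_ST column names here are my guesses):

    // Sketch of the slower alternative: INSERT each pair as it is split,
    // instead of writing CSV files and bulk-loading them afterward.
    $insState = $pdo->prepare("INSERT INTO STATES (id, name) VALUES (?, ?)");
    $insPair  = $pdo->prepare("INSERT INTO WIDGET_ST (w_id, w_st_id) VALUES (?, ?)");
    $stateid = array();
    $i = 0;
    while ($row = $pdoStmt->fetch()) {
        foreach (explode(",", $row["states"]) as $st) {
            if (!array_key_exists($st, $stateid)) {
                $stateid[$st] = ++$i;
                $insState->execute(array($i, $st));
            }
            $insPair->execute(array($row["id"], $stateid[$st]));
        }
    }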


Re your comment:

Right, I also have data in a database already. It turns out that the solution I show above, dumping to CSV files and re-importing in normalized format, is many times faster than doing INSERT statements inside the loop that splits the data.

Each brand of database has its own tool for importing bulk data. See my answer to "Optimizing big import in PHP" for a list of bulk import solutions per database.

You should use the tools provided by each database. Trying to remain cross-platform only makes your code a jack of all trades, master of none. Besides, in 90% of cases where people bend over backwards to make their code database-independent, it turns out they never use more than one database. And you can't achieve complete database independence anyway.

Bill Karwin
I'll give you a point, but I'm also looking for something that's more "cross-platform", since my DBMS might be SQL Server 2008, Postgres, etc. I also already have the original table in a database, and I'm trying to update it by adding more relations as needed. Thanks all the same.
Yaaqov