I have a table that looks like this:

ProductId, Color
"1", "red, blue, green"
"2", null
"3", "purple, green"

And I want to expand it to this:

ProductId, Color
1, red
1, blue
1, green
2, null
3, purple
3, green

What's the easiest way to accomplish this? Is it possible without a loop in a stored procedure?

+2  A: 

Take a look at this function. I've done similar tricks to split and transpose data in Oracle. Loop over the data, inserting the decoded values into a temp table. The convenient thing is that MS SQL Server will let you do this on the fly, while Oracle requires an explicit temp table.

MS SQL Split Function
Better Split Function

Edit by author: This worked great. Final code looked like this (after creating the split function):

select p.productid, colortable.items as color
from product p 
 cross apply split(p.color, ',') as colortable
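
The split function itself is not shown in this answer. A minimal sketch of what one could look like follows, assuming a loop-based multi-statement function named dbo.split that returns a single column called items (to match the query above). The parameter names, the trimming of spaces, and the 8000-character limit are assumptions for illustration, not the linked implementations.

```sql
-- Hypothetical split function: returns one row per delimited item.
CREATE FUNCTION dbo.split (@list varchar(8000), @delimiter char(1))
RETURNS @result TABLE (items varchar(8000))
AS
BEGIN
    DECLARE @pos int;

    -- A NULL list skips the loop entirely, so no rows are returned.
    WHILE @list IS NOT NULL
    BEGIN
        SET @pos = CHARINDEX(@delimiter, @list);

        IF @pos = 0
        BEGIN
            -- No delimiter left: the remainder is the last item.
            INSERT INTO @result (items) VALUES (LTRIM(RTRIM(@list)));
            SET @list = NULL;
        END
        ELSE
        BEGIN
            -- Emit the text before the delimiter, then keep the rest.
            INSERT INTO @result (items) VALUES (LTRIM(RTRIM(LEFT(@list, @pos - 1))));
            SET @list = SUBSTRING(@list, @pos + 1, 8000);
        END
    END

    RETURN;
END
GO

-- OUTER APPLY keeps products whose color is NULL (color comes back as NULL);
-- CROSS APPLY would drop them because the function returns no rows for NULL.
select p.productid, colortable.items as color
from product p
 outer apply dbo.split(p.color, ',') as colortable
```

With CROSS APPLY, product 2 from the question would be missing from the result, so OUTER APPLY is the closer match to the requested output.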
chilltemp
those loops will be slow, I'd bet money that my query would smoke it any day...
KM
A: 

Fix your database if at all possible. Comma-delimited lists in database cells indicate a flawed schema 99% of the time or more.

Joel Coehoorn
I agree, but that doesn't really help.
TheSoftwareJedi
The requested result set is a better design. Just put "INTO YourNewTableName" between the select list and the FROM of the query, and a new table will be created with the colors split.
KM
How do you propose the poster fix his database without the answer to this question? -1
brian
At the time of the post it wasn't clear that he was fixing the db rather than building on it.
Joel Coehoorn
A: 

I would create a CLR table-valued function for this:

http://msdn.microsoft.com/en-us/library/ms254508(VS.80).aspx

The reason for this is that CLR code is going to be much better at parsing apart the strings (computational work) and can pass that information back as a set, which is what SQL Server is really good at (set management).

The CLR function would return a series of records based on the parsed values (and the input id value).

You would then use a CROSS APPLY on each element in your table.

casperOne
IMHO that is overkill for something so trivial.
James
@James: It's really simple, MUCH simpler than the hoops you have to jump through in order to parse strings in T-SQL, and it's always going to be faster at parsing. The cross apply is the natural choice here once you have ANY function that parses the lines apart.
casperOne
Depending on the number of rows to process and the length of the CSV colors, a CLR will not scale. It will work fine in this example of 3 rows, but if you have to run this query all day, every day, it will be slow. A pure SQL query like mine will be much faster.
KM
@mike: That's absolutely not true and I challenge you to show the tests to prove it. The CLR scales just fine, given that SQL Server manages everything the CLR requires (memory, threads) so it can't get too greedy. Additionally, the CLR is always going to be better at procedural code like this.
casperOne
@mike: I'd be willing to show you my own tests of CLR code vs T-SQL code when parsing strings. I've seen anywhere from a 30%-200% increase in speed when processing strings under 10000 characters, with all methods topping out at 100000 chars or so.
casperOne
did you split with a loop or a Numbers table? The CLR may have an advantage for long strings. In this example of color names, I'd doubt that the strings are 10,000 or 100,000 characters long. Do a test where you use a Numbers table to split 10,000 rows of 255 long strings?
KM
+1  A: 

You can try this out; it doesn't require any additional functions:

declare @t table (col1 varchar(10), col2 varchar(200))
insert @t
          select '1', 'red,blue,green'
union all select '2', NULL
union all select '3', 'green,purple'


select col1, left(d, charindex(',', d + ',')-1) as e from (
    select *, substring(col2, number, 200) as d from @t col1 left join
        (select distinct number from master.dbo.spt_values where number between 1 and 200) col2
        on substring(',' + col2, number, 1) = ',') t
SomeMiscGuy
GREAT ANSWER, this is a much better query than my first try. See my answer for a modified version of this. Your use of table alias values which are the same as column names was confusing, and the use of a system table for numbers forces you to use an extra derived table. Other than that, GREAT JOB!
KM
A: 

based on your tables:

create table test_table
(
     ProductId  int
    ,Color      varchar(100)
)

insert into test_table values (1, 'red, blue, green')
insert into test_table values (2, null)
insert into test_table values (3, 'purple, green')

create a new table like this:

CREATE TABLE Numbers
(
    Number  int   not null primary key
)

that has rows containing values 1 to 8000 or so.
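
One way to fill the Numbers table is sketched below. It assumes SQL Server 2005 or later and uses sys.all_objects cross-joined with itself purely as a convenient source of enough rows; any other row source would do.

```sql
-- Populate Numbers with the values 1 through 8000.
-- sys.all_objects is only used here as a row generator (an assumption,
-- not part of the original answer).
INSERT INTO Numbers (Number)
SELECT TOP (8000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a
    CROSS JOIN sys.all_objects b;
```

The table only needs to reach the length of the longest Color value, so 8000 rows is more than enough for a varchar(100) column.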

this will return what you want:

EDIT
here is a much better query, slightly modified from the great answer from @Christopher Klein:

I added the "LTRIM()" so that spaces in the color list are handled properly, as in "red, blue, green". His solution requires no spaces: "red,blue,green". Also, I prefer to use my own Numbers table rather than master.dbo.spt_values, which allows the removal of one derived table too.

SELECT
    ProductId, LEFT(PartialColor, CHARINDEX(',', PartialColor + ',')-1) as SplitColor
    FROM (SELECT 
              t.ProductId, LTRIM(SUBSTRING(t.Color, n.Number, 200)) AS PartialColor
              FROM test_table             t
                  LEFT OUTER JOIN Numbers n ON n.Number<=LEN(t.Color) AND SUBSTRING(',' + t.Color, n.Number, 1) = ','
         ) t

EDIT END

SELECT
    ProductId, Color --,number
    FROM (SELECT
              ProductId
                  ,CASE
                       WHEN LEN(List2)>0 THEN LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(',', List2, number+1)-number - 1)))
                       ELSE NULL
                   END AS Color
                  ,Number
              FROM (
                       SELECT ProductId,',' + Color + ',' AS List2
                           FROM test_table
                   ) AS dt
                  LEFT OUTER JOIN Numbers n ON (n.Number < LEN(dt.List2)) OR (n.Number=1 AND dt.List2 IS NULL)
              WHERE SUBSTRING(List2, number, 1) = ',' OR List2 IS NULL
         ) dt2
    ORDER BY ProductId, Number, Color

here is my result set:

ProductId   Color
----------- --------------
1           red
1           blue
1           green
2           NULL
3           purple
3           green

(6 row(s) affected)

which is the same order you want...
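
If the goal is to keep the split rows as a normalized table, the INTO approach mentioned in the comments can be sketched like this, using the shorter query from the EDIT above. The table name ProductColor is a hypothetical name chosen for illustration.

```sql
-- Create a new table (hypothetical name ProductColor) holding one row
-- per product/color pair. Products with a NULL color list get one row
-- with a NULL Color.
SELECT
    ProductId, LEFT(PartialColor, CHARINDEX(',', PartialColor + ',')-1) AS Color
    INTO ProductColor
    FROM (SELECT
              t.ProductId, LTRIM(SUBSTRING(t.Color, n.Number, 200)) AS PartialColor
              FROM test_table             t
                  LEFT OUTER JOIN Numbers n ON n.Number<=LEN(t.Color) AND SUBSTRING(',' + t.Color, n.Number, 1) = ','
         ) t
```

SELECT ... INTO fails if ProductColor already exists, so this is meant as a one-time conversion step.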

KM
A: 

Could you post the same thing for Oracle? I would like to do this in Oracle 10 and cannot find anything.