I'm looking for a way to fix, or abstract away, a comma-separated values (CSV) list stored in a database field, so that I can reconstruct a usable relationship, properly join the two tables below, and query them with C# LINQ's .Join method.

Below is a sample showing the Person table, whose CsvArticleIds field holds a CSV value representing a one-to-many association with Article records.

TABLE [dbo].[Person]

Id Name       CsvArticleIds
-- ---------- -------------
1  Joe        "15,22"
5  Ed         "22"
10 Arnie      "8,15,22"

^^^(Of course a link table should have been created; nonetheless the relationship with articles is trapped inside that list of CSV values.)

TABLE [dbo].[Article]

Id Title
-- ----------
8  Beginning C#
15 A Historic look at Programming in the 90s
22 Gardening in January

Additional Info

  • The fix can be at any level: C#/.NET or SQL Server.
  • Something easy, because I will be repeating the solution for many other CSV columns in other tables.
  • Elegant is nice too.
  • Efficiency is not a concern; this is part of a one-time data migration task and can take as long as it needs to run.
+1  A: 

I would fix this at the table level using SQL. I'd create a new table with the person Id and an article Id in it. After populating this new table, I'd drop the Person.CsvArticleIds column. You will then have a normalized table structure to store articles for people.

You'll need to split that CsvArticleIds string. There are many ways to split a string in SQL Server; this article covers the pros and cons of just about every method:

"Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog

You need to create a split function first. Here is how such a function can be used:

SELECT
    *
    FROM YourTable                               y
    INNER JOIN dbo.yourSplitFunction(@Parameter) s ON y.ID=s.Value

I prefer the Numbers-table approach to splitting a string in T-SQL; the article linked above explains the pros and cons of the alternatives.

For the Numbers-table method to work, you need to do this one-time setup, which creates a Numbers table containing the integers 1 through 10,000:

SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO Numbers
    FROM sys.objects s1
    CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)

Once the Numbers table is set up, create this function:

CREATE FUNCTION [dbo].[FN_ListToTable]
(
     @SplitOn  char(1)      --REQUIRED, the character to split the @List string on
    ,@List     varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN 
(

    ----------------
    --SINGLE QUERY-- --this will not return empty rows
    ----------------
    SELECT
        ListValue
        FROM (SELECT
                  LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
                  FROM (
                           SELECT @SplitOn + @List + @SplitOn AS List2
                       ) AS dt
                      INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
                  WHERE SUBSTRING(List2, number, 1) = @SplitOn
             ) dt2
        WHERE ListValue IS NOT NULL AND ListValue!=''

);
GO 
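To see the mechanics of the Numbers-table split outside T-SQL, here is a minimal Python sketch of the same idea (the function name `list_to_table` is my own illustrative stand-in for `FN_ListToTable`): wrap the list in delimiters, treat every character position as a Numbers row, and wherever a delimiter occurs, cut out the substring that follows it.

```python
def list_to_table(split_on, lst):
    """Rough Python analogue of dbo.FN_ListToTable (illustrative only)."""
    list2 = split_on + lst + split_on          # @SplitOn + @List + @SplitOn
    values = []
    for i in range(len(list2) - 1):            # Numbers n WHERE n < LEN(List2)
        if list2[i] == split_on:               # SUBSTRING(List2, number, 1) = @SplitOn
            j = list2.index(split_on, i + 1)   # CHARINDEX(@SplitOn, List2, number+1)
            value = list2[i + 1:j].strip()     # LTRIM(RTRIM(SUBSTRING(...)))
            if value:                          # skip empty rows
                values.append(value)
    return values

print(list_to_table(',', '1,2,3,,,4,5,6777,,,'))
# ['1', '2', '3', '4', '5', '6777'] -- the same six rows as the SQL demo below
```

This also makes clear why the Numbers table only has to cover the *length* of the longest list, not the largest value inside it: `number` indexes character positions, not Ids.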

You can now easily split a CSV string into a table and join on it:

SELECT * FROM dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,')

OUTPUT:

ListValue
-----------------------
1
2
3
4
5
6777

(6 row(s) affected)

To make what you need work, use CROSS APPLY:

DECLARE @YourTable table (Id int, Name varchar(10), CsvArticleIds varchar(500))
INSERT @YourTable VALUES (1  ,'Joe'        ,'15,22')
INSERT @YourTable VALUES (5  ,'Ed'         ,'22')
INSERT @YourTable VALUES (10 ,'Arnie'      ,'8,15,22')

DECLARE @YourTableNormalized table (Id int, ArticleId int)

    INSERT INTO @YourTableNormalized
        (Id, ArticleId)
    SELECT 
        y.Id, st.ListValue
        FROM @YourTable y 
            CROSS APPLY  dbo.FN_ListToTable(',',y.CsvArticleIds) AS st
        ORDER BY st.ListValue

SELECT * FROM @YourTableNormalized ORDER BY Id,ArticleId

OUTPUT:

Id          ArticleId
----------- -----------
1           15
1           22
5           22
10          8
10          15
10          22

(6 row(s) affected)
KM
Impressive solution. Why 10,000 rows in Numbers rather than some other amount? Might I overrun this range under some circumstance?
John K
a Numbers table can be used for multiple things: [Why should I consider using an auxiliary numbers table?](http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html) Not all the code in that article is the most optimal, but it gives some ideas as to what you can do with a Numbers table. I keep one on all of my databases, it is just a tool to keep in your toolbox for whenever you need it, just like the split function.
KM
As for 10,000, it is just a round number that I typically use when populating my Numbers tables. You will need Numbers rows up to the length of the longest CsvArticleIds value.
KM
Thank you for clarifying. About the 10,000: I'm reading it as meaning the Numbers table must cover the length of the entire CSV string, rather than the maximum Id value stored inside it. For example, the CSV value `"1,50,100"` would need 8 Numbers rows to cover the length of the whole string, not 100 rows to accommodate the largest embedded value.
John K
You are correct: only 8 Numbers rows are needed for the string `1,50,100`. However, if each CSV string has a `"` double quote at the beginning and end, you'll need to strip those out before splitting; use this line of code: `CROSS APPLY dbo.FN_ListToTable(',',SUBSTRING(y.CsvArticleIds,2,LEN(y.CsvArticleIds)-2)) AS st`
KM
+1  A: 

Transform the Person table into something more useful first, for example:

var newpersons =
    data.Persons.Select(p => new
        {
          Id = p.Id,
          Name = p.Name,
          // strip the surrounding quotes, split on commas, and parse
          // so the values match Article.Id (int)
          ArticleIds = p.CsvArticleIds
                        .Substring(1, p.CsvArticleIds.Length - 2)
                        .Split(',')
                        .Select(int.Parse)
                        .ToList()
        });

Now you can join against the person's ArticleIds collection.

If holding the entire transformed Person table in memory isn't feasible, use the same .Select to transform batches of records, pulling Person objects out of the DB, say, 100 at a time, using Skip() and Take().
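As a language-neutral illustration, the same transform-then-join idea can be sketched in Python against the sample data from the question (the in-memory structures and variable names here are my own, not part of the answer):

```python
# Sample data from the question: Person rows carry a quoted CSV of article Ids.
persons = [
    (1, 'Joe', '"15,22"'),
    (5, 'Ed', '"22"'),
    (10, 'Arnie', '"8,15,22"'),
]
articles = {8: 'Beginning C#',
            15: 'A Historic look at Programming in the 90s',
            22: 'Gardening in January'}

# Strip surrounding quotes and split, mirroring
# CsvArticleIds.Substring(1, Length - 2).Split(',') in the C# above.
newpersons = [
    {'Id': pid, 'Name': name,
     'ArticleIds': [int(s) for s in csv[1:-1].split(',')]}
    for pid, name, csv in persons
]

# Join each person's ArticleIds against the Article table.
joined = [(p['Name'], articles[aid])
          for p in newpersons
          for aid in p['ArticleIds']]
```

The `joined` list ends up with one (person, article title) pair per CSV entry, which is exactly the normalized relationship the question is after.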

Mike Jacobs