views:

133

answers:

2

Say that the raw text of CSV exports and an associated timestamps are stored in a database, where one record is equivalent to one export.

Does anyone have a way to execute a query on the CSV file stored in that field without creating a second connection to the database or exporting the data to a file and then reopening it using the csv text driver?

Assume that:

1) you can't write out a physical file onto the server in the solution

2) you can't a second connection to the server w/ OPENROWSET (servers, usernames & passwords change)

3) that it must be a 100% SQL solution - must be able to be run as an SP

4) that you only need to work with one record at time - the solution doesn't need to account for selecting from multiple csv files stored in the DB.

A: 

You could set up a series of user-defined functions which could parse through the column. It would likely be slow and wouldn't be robust at all.

As an example though (with no real error checking, etc. and only minimally tested):

IF OBJECT_ID('dbo.Test_CSV_Search') IS NOT NULL
    DROP TABLE dbo.Test_CSV_Search
GO
CREATE TABLE dbo.Test_CSV_Search
(
    my_id   INT IDENTITY    NOT NULL,
    txt     VARCHAR(MAX)    NOT NULL,
    CONSTRAINT PK_Test_CSV_Search PRIMARY KEY CLUSTERED (my_id)
)
GO
INSERT INTO dbo.Test_CSV_Search (txt) VALUES ('11, 12, 13, 14,15,16
21,22, 23,24, 25,26
31,22,33,34,35,36')
GO
IF OBJECT_ID('dbo.Get_CSV_Row') IS NOT NULL
    DROP FUNCTION dbo.Get_CSV_Row
GO
CREATE FUNCTION dbo.Get_CSV_Row
(@my_id INT, @col_num SMALLINT, @search_value VARCHAR(100))
RETURNS @results TABLE (row_num INT, row_txt VARCHAR(MAX))
AS
BEGIN
    DECLARE
        @csv_txt    VARCHAR(MAX),
        @full_row   VARCHAR(MAX),
        @start_pos  INT,
        @end_pos    INT,
        @col_txt    VARCHAR(100),
        @cur_col    SMALLINT,
        @line_start INT,
        @line_end   INT,
        @row_num    INT

    SELECT @csv_txt = txt + CHAR(10) FROM dbo.Test_CSV_Search WHERE my_id = @my_id

    SELECT
        @line_start = 1,
        @cur_col = 1,
        @start_pos = 1,
        @row_num = 1

    WHILE (CHARINDEX(CHAR(10), @csv_txt, @line_start) > 0)
    BEGIN
        SELECT
            @line_end = CHARINDEX(CHAR(10), @csv_txt, @line_start),
            @end_pos = CHARINDEX(',', @csv_txt, @start_pos)

        WHILE (@cur_col < @col_num)
        BEGIN
            SET @start_pos = @end_pos + 1
            SET @end_pos = CHARINDEX(',', @csv_txt, @start_pos)
            SET @cur_col = @cur_col + 1
        END

        IF (RTRIM(LTRIM(SUBSTRING(@csv_txt, @start_pos, @end_pos - @start_pos))) = @search_value)
        BEGIN
            INSERT INTO @results (row_num, row_txt) VALUES (@row_num, RTRIM(LTRIM(SUBSTRING(@csv_txt, @line_start, @line_end - @line_start))))
        END

        SELECT
            @line_start = @line_end + 1,
            @start_pos = @line_end + 1,
            @cur_col = 1,
            @row_num = @row_num + 1
    END

    RETURN
END
GO

SELECT * FROM dbo.Get_CSV_Row(1, 1, '11')
Tom H.
A: 

My solution would be to create a UDF that will parse the CSV data into a table variable. Then, in the SP, retrieve the CSV, pass it to the UDF, then run the query against the table variable.

First, create a UDF to return a table from the CSV value (uses CHAR(13) to determine new lines, may need to be altered to work with your data):

CREATE FUNCTION [dbo].[fnParseCSV] (@InputString NVARCHAR(MAX), @Delimiter NCHAR(1) = ',')  
RETURNS @tbl TABLE (ID int, Val NVARCHAR(64)) AS 
BEGIN
    declare @singleLine nvarchar(max)
    declare @id int
    declare @val varchar(64)

    WHILE LEN(@InputString) > 0 BEGIN
        IF CHARINDEX(char(13), @InputString) > 0 BEGIN
            SELECT  @singleLine = SUBSTRING(@InputString, 1, CHARINDEX(char(13), @InputString) - 1)
            IF CHARINDEX(@Delimiter, @singleline) > 0 BEGIN
                SELECT  @id = convert(int, SUBSTRING(@singleline, 1, CHARINDEX(@Delimiter, @singleline) - 1))
                SELECT @val = RIGHT(@singleline, LEN(@singleline) - CHARINDEX(@Delimiter, @singleline) )
                INSERT INTO @tbl (id, val) values (@id, @val)
            END

            SELECT @InputString = RIGHT(@InputString, LEN(@InputString) - CHARINDEX(char(13), @InputString) )
        END
        ELSE 
        BEGIN
            IF CHARINDEX(@Delimiter, @inputString) > 0 
            BEGIN
                SELECT  @id = convert(int, SUBSTRING(@inputString, 1, CHARINDEX(@Delimiter, @inputString) - 1))
                SELECT @val = RIGHT(@inputString, LEN(@inputString) - CHARINDEX(@Delimiter, @inputString) )
                INSERT INTO @tbl (id, val) values (@id, @val)
            END
            set @inputString = ''
        END
    END
    RETURN
END

Then run the query against that output:

select * from dbo.fnParseCsv('123,val1' + char(13) + '456,val2' + CHAR(13) + '789,val3', ',')
Paul Kearney - pk