ansaurus

Question

Answer 1

A:

Since this is to run repeatedly, I would ask (strongly suggest) they provide me a proper file where the issueID and valid appear on every line. This is much easier to process. You need to know for sure what the values for these fields are to properly import to your system.

Based on the comments: IN SQL Server you can would build a function to split the data based on the charindex for commas. If you search Google for fn_split, you will find a sample of this. Not sure how you would do this in Access but it would probably be an interative process where you look for the last comma and move everything past it to a holding table and then get rid of the command, then do again until there are no more commas. It iseasiest to do imports like this to staging tables where you can manipulate the data the way you need it and then put the final result into your real tables.

HLGEM 2009-05-27 14:10:36

I'm not sure I was clear: what I'm trying to get is the list of IssueIDs for each DocID, then update the link table with that data. I could do this iteratively:SELECT IssueID FROM Sheet WHERE References LIKE "*DocID1*"SELECT IssueID FROM Sheet WHERE References LIKE "*DocID2*"etc.I'm just trying to get my SQL skills to the next level, and my instinct tells me that iterating through the table with VBA is bad form.

JeffP 2009-05-27 14:49:24

Your problem is the file is not in a format you can use easily. If the issueid is blank how do you know what the issue id really is? I would not accept this file in this format personally.To figure this out you will have toiteratively figure out each issue id assumming that the next one above it is the correct one. You need to import it with some kind of rowid column, so you can look for it. Another issue may be that records do not necessarily get imported in the order in the file. That is why the file should have something to tie the issue id to the references other than the order.

HLGEM 2009-05-27 15:03:16

Ah, now I understand your question. There is an IssueID in every row, the problem is that the cell in the references column may contain multiple DocIDs

JeffP 2009-05-27 15:10:15

Answer 2

A:

My first choice would be to put together a quick application in C# or VB.Net to handle this.

If that wasn't viable, I'd have an "Import" table which took everything as is. Then I would use a cursor to iterate the records in the table. Inside the cursor I'd keep track of the IssueId and Val1 and parse the References column to create my child records. This part I'd package into a stored procedure.

Chris Lively 2009-05-27 14:41:02

Answer 3

A:

I suggest that you research SQL Server Integration Services (SSIS). This tool was created to do this kind of data import/export as quickly as possible with as little code.

Read about it. Do some hands on labs to see if any examples are close to what your trying to do.

http://en.wikipedia.org/wiki/SQL_Server_Integration_Services http://www.microsoft.com/downloads/details.aspx?familyid=b1145e7a-a4e3-4d14-b1e7-d1d823b6a447&displaylang=en

opadilla 2009-05-27 14:50:59

My options are Access or Oracle, so a MSSQL Server solution will not be helpful.

JeffP 2009-05-27 15:13:04

Is a Visual Studio solution helpful? If so...You don't need to read from or write to SQL Server to use SSIS. You do need to to install an express edition (free) to have the Business Intelligence Projects available to you in Visual Studio.Refer to the "Creating a Package" section on this page:<BEGIN_LINK>http://www.accelebrate.com/sql_training/ssis_tutorial.htm<END_LINK>You can cannot to Oracle and MySQL with an SSIS package.

opadilla 2009-05-28 19:37:57

Answer 4

A:

Do you mean (typed, not tested):

Dim rs As DAO.Recordset
Dim rsIn As DAO.Recordset ''Or ADO if you link directly to Excel

Set rs=CurrentDB.OpenRecordset( _
   "SELECT * FROM DocLinks dl INNER JOIN Docs d ON dl.DocKey=d.DocKey")

Do While Not rsIn.EOF

   astrDocs=Split(rsIn!References, vbCrLf)

   For Each strDoc In astrDocs
      rs.FindFirst "DocID='" & strDoc & "'"

      If rs.NoMatch Then 
         strSQL="INSERT INTO DocLinks (DocID, IssueID) " _
           & "VALUES ('" strDoc & "'," & rsIn!IssueID  & ")"
         CurrentDB.Execute strSQL, dbFailOnError
      End If
   Next

   rsIn.MoveNext
Loop

EDIT re COMMENTS

If the DocIDs are of a fixed length, you could consider something on these lines:

SELECT Sequence.Seq
       , ImportTable.IssueID
       , Mid(Replace([References],"<cr>",""),[seq],6) AS Docs
FROM Sequence, ImportTable
WHERE ([seq]+5) Mod 6=0) 
AND   Mid(Replace([References],"<cr>",""),[seq],6))<>"" 
AND   Mid(Replace([References],"<cr>",""),[seq],6)) 
      Not In (SELECT DocID FROM Docs)

You will need a sequence table with integers from 1 to at least max length of Reference.

Remou 2009-05-27 16:06:50

That's the basic idea, although:a) I was hoping to solve this with SQL syntax. I feel like there should be some nested query that could do what I want, but I can't quite get my head around it.b) Ultimately I want to update, not create the link table. I could probably use your code to create a temp table that would be used in an update query.Thanks for the input. Best option yet.

JeffP 2009-05-27 16:26:38

The SQL Line starting "INSERT INTO DocLinks ..." is intended to update the table, it is an append query. There is no form of nested query in Access that will work for you, as far as I know.

Remou 2009-05-27 16:39:03

I have thought about this and have outlined a further suggestion.

Remou 2009-05-28 09:07:29

Answer 5

A:

This can easily be done in SQL. I have written a TVF (table-valued function) specifically for line-splitting text that demonstrates how:

    ALTER function [dbo].[fnSplit3]( 
                @parameter varchar(Max)                -- the string to split
                , @Seperator Varchar(64)        -- the string to use as a seperator
        ) 
        RETURNS @Items TABLE(
                ID INT                                                -- the element number
                , item VARCHAR(8000)                -- the split-out string element
                , OffSet int                                -- the original offest
                --( not entirley accurate if LEN(@Seperator) > 1 because of the Replace() )
        ) 
AS
BEGIN 
/*
"Monster" Split in SQL Server 2005 
 From Jeff Moden, 2008/05/22

BYoung, 2008/06/18: Modified to be a Table-Valued Function
                    And to handle CL/LF or LF-only line breaks

Test: (scripts all procs & views in master)
    Select Lines.Item
     From Master.sys.syscomments C
      CROSS APPLY dbo.fnSplit3(C.text, char(13)+char(10)) Lines
     Order by C.ID, Lines.ID

Test2: (scripts all triggers in your database)
    Select Lines.Item
     From sys.sql_modules M
      Join sys.objects O on O.object_id = M.object_id
      CROSS APPLY dbo.fnSplit3(M.definition, char(13)+char(10)) Lines
     Where O.Type = 'TR' 
     Order by O.create_date, Lines.ID
*/
Declare @Sep char(1)
Set @Sep = char(10)        --our seperator character (convenient, doesnt affect performance)
--NOTE: we make the @Sep character LF so that we will automatically
-- parse out rogue LF-only line breaks.

--===== Add start and end seprators to the Parameter so we can handle
        -- all the elements the same way
        --  Also change the seperator expressions to our seperator
        -- character to keep all offsets = 1
SET @Parameter = @Sep+ Replace(@Parameter,@Seperator,@Sep) +@Sep
-- This reduces run-time about 10%

;WITH cteTally AS
(--==== Create a Tally CTE from 1 to whatever the length
        -- of the parameter is
 SELECT TOP (LEN(@Parameter))
        ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
  FROM Master.sys.system_Columns t1
   CROSS JOIN Master.sys.system_Columns t2
)
INSERT into @Items
        SELECT ROW_NUMBER() OVER (ORDER BY N) AS Number,
                SUBSTRING(@Parameter, N+1, CHARINDEX(@Sep, @Parameter, N+1)-N-1) AS Value
                , N+1
         FROM cteTally
         WHERE N < LEN(@Parameter)
          AND SUBSTRING(@Parameter, N, 1) = @Sep --Notice how we find the seperator

        Return 
END

In order to use this with your current table & data do this:

SELECT Issues.IssueID, Lines.Item as Reference
 From Issues
  Cross Apply dbo.fnSplit3(Issues.Reference, char(13)) Lines
 Order By IssueID, Reference

RBarryYoung 2009-05-28 16:54:38

Gah - that seems a bit like overkill here. See #6 above for my pure SQL solution. Now I'm only stuck on how to implement this in Access.

JeffP 2009-05-28 18:04:29

Answer 6

A:

I'm having problems coming up with a set-based SQL solution here. I've done this kind of thing before, I had to refresh my memory somewhat, but I'm running into a problem. I think it's an issue (feature/bug?) with the engine but I could be doing something daft. Perhaps someone intimate with Jet/ACE and who can read VBA can take a look at the code at the end of this answer and hopefully take this forward...?

The basic approach is to use a Sequence table of integers with the MID() expression to parse the data column (which I've renamed to MyReferences because REFERENCES is a SQL keyword).

Here's some MS Access VBA to recreate the test tables/data using SQL DDL/DML. Notice the first SELECT query returns sub-strings and star- and end delimiters; obviously, we're looking for the row(s) where both delimiters are the delimiting character, CHR(13) in this case. The second SELECT query merely adds search conditions for the desired delimiters but errors with 'Invalid procedure call'; this happens when the MID() expression is called using invalid parameter values e.g.

SELECT MID('A', 0, 0)

I guess what is happening is the optimizer is not using the subquery as a 'shortcut' and instead is evaluating the MID() expression before the search conditions from Sequence table. If so it's a bit dumb and I can't think of a way of forcing the order of evaluation.

So, is is my or the engine at fault here?

Sub main()

  Dim sql As String

  sql = _
      "DROP TABLE ImportTable;"

  On Error Resume Next  ' Table may not exist
  CurrentProject.Connection.Execute sql
  On Error GoTo 0

  sql = _
      "DROP TABLE Sequence;"

  On Error Resume Next  ' Table may not exist
  CurrentProject.Connection.Execute sql
  On Error GoTo 0

  sql = _
      "CREATE TABLE ImportTable ( " & _
      "IssueID INTEGER NOT NULL UNIQUE, MyReferences VARCHAR(90) NOT NULL);"

  CurrentProject.Connection.Execute sql

  sql = _
      "INSERT INTO ImportTable VALUES (1234, 'DocID1' & Chr(13) & 'DocID22' & Chr(13) & 'DocID3');"

  CurrentProject.Connection.Execute sql

  sql = _
      "CREATE TABLE Sequence (seq INTEGER NOT NULL UNIQUE);"

  CurrentProject.Connection.Execute sql

  sql = _
      "INSERT INTO Sequence VALUES (-1);"

  CurrentProject.Connection.Execute sql

 sql = _
        "INSERT INTO [Sequence] (seq) SELECT Units.nbr + Tens.nbr" & _
        " FROM ( SELECT" & _
        " nbr FROM ( SELECT 0 AS nbr FROM [Sequence] UNION" & _
        " ALL SELECT 1 FROM [Sequence] UNION ALL SELECT 2 FROM" & _
        " [Sequence] UNION ALL SELECT 3 FROM [Sequence] UNION" & _
        " ALL SELECT 4 FROM [Sequence] UNION ALL SELECT 5 FROM" & _
        " [Sequence] UNION ALL SELECT 6 FROM [Sequence] UNION" & _
        " ALL SELECT 7 FROM [Sequence] UNION ALL SELECT 8 FROM" & _
        " [Sequence] UNION ALL SELECT 9 FROM [Sequence] ) AS" & _
        " Digits ) AS Units, ( SELECT nbr * 10 AS nbr FROM" & _
        " ( SELECT 0 AS nbr FROM [Sequence] UNION ALL SELECT" & _
        " 1 FROM [Sequence] UNION ALL SELECT 2 FROM [Sequence]" & _
        " UNION ALL SELECT 3 FROM [Sequence] UNION ALL SELECT" & _
        " 4 FROM [Sequence] UNION ALL SELECT 5 FROM [Sequence]" & _
        " UNION ALL SELECT 6 FROM [Sequence] UNION ALL SELECT" & _
        " 7 FROM [Sequence] UNION ALL SELECT 8 FROM [Sequence]" & _
        " UNION ALL SELECT 9 FROM [Sequence] ) AS Digits )" & _
        " AS Tens;"

  CurrentProject.Connection.Execute sql

  sql = _
      "SELECT DT1.IssueID, DT1.parsed_text, DT1.delimiter_1, DT1.delimiter_2 " & _
      "FROM ( " & _
      "SELECT I1.IssueID, MID(I1.MyReferences, S1.seq, S2.seq - S1.seq - LEN(CHR(13))) AS parsed_text, " & _
      " MID(CHR(13) & I1.MyReferences & CHR(13), S1.seq, LEN(CHR(13))) AS delimiter_1, " & _
      " MID(CHR(13) & I1.MyReferences & CHR(13), S2.seq, LEN(CHR(13))) AS delimiter_2 " & _
      "FROM ImportTable AS I1, Sequence AS S1, Sequence AS S2 " & _
      "WHERE S1.seq < S2.seq " & _
      "AND S2.seq - S1.seq - LEN(CHR(13)) > 0 " & _
      "AND S1.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
      "AND S2.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
      ") AS DT1;"

  Dim rs As ADODB.Recordset
  Set rs = CurrentProject.Connection.Execute(sql)

  MsgBox rs.GetString

  sql = _
      "SELECT DT1.IssueID, DT1.parsed_text, DT1.delimiter_1, DT1.delimiter_2 " & _
      "FROM ( " & _
      "SELECT I1.IssueID, MID(I1.MyReferences, S1.seq, S2.seq - S1.seq - LEN(CHR(13))) AS parsed_text, " & _
      " MID(CHR(13) & I1.MyReferences & CHR(13), S1.seq, LEN(CHR(13))) AS delimiter_1, " & _
      " MID(CHR(13) & I1.MyReferences & CHR(13), S2.seq, LEN(CHR(13))) AS delimiter_2 " & _
      "FROM ImportTable AS I1, Sequence AS S1, Sequence AS S2 " & _
      "WHERE S1.seq < S2.seq " & _
      "AND S2.seq - S1.seq - LEN(CHR(13)) > 0 " & _
      "AND S1.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
      "AND S2.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
      ") AS DT1 " & _
      "WHERE DT1.delimiter_1 = CHR(13) " & _
      "AND DT1.delimiter_2 = CHR(13);"

  Set rs = CurrentProject.Connection.Execute(sql)

  MsgBox rs.GetString

End Sub

FWIW here's a PROCEDURE I wrote years ago for parsing a delimited list into a table. It seems to work OK for values up to 255 characters; any more and you get a very nasty ACE/Jet engine error. Again, I don't see what the problem is other than the engine can't cope! Anyhow, my point is that this works (for small values) and I can't figure out why I can't adapt it to the problem at hand:

CREATE PROCEDURE ListToTable
(
   delimted_text MEMO,
   delimiter VARCHAR(4) = ','
)
AS
SELECT MID(I1.input_string, S1.seq, MIN(S2.seq) - S1.seq - LEN(delimiter)) AS param
  FROM
      (
       SELECT DISTINCT delimted_text AS input_string
         FROM Sequence AS S3
        WHERE S3.seq BETWEEN 1 AND LEN(delimted_text)
      ) AS I1, Sequence AS S1, Sequence AS S2
 WHERE MID(delimiter & I1.input_string & delimiter, S1.seq, LEN(delimiter)) = delimiter
       AND MID(delimiter & I1.input_string & delimiter, S2.seq, LEN(delimiter)) = delimiter
       AND S1.seq < S2.seq
       AND S1.seq BETWEEN 1 AND LEN(delimiter) + LEN(delimted_text) + LEN(delimiter)
       AND S2.seq BETWEEN 1 AND LEN(delimiter) + LEN(delimted_text) + LEN(delimiter)
 GROUP 
    BY I1.input_string, S1.seq
HAVING LEN(MID(I1.input_string, S1.seq, MAX(S2.seq) - S1.seq - LEN(delimiter))) > 0;

onedaywhen 2009-06-02 13:40:15

Answer 7

A:

I think using the word "parse" in the title has confused the crap out of everyone. The bug in Access was that a correlated query performed on a query (instead of a table) causes a hang. So instead, I created a temporary table that ads the References column (with the multi-line text) to the Issues table so I have access to the other fields. The final query creates the link table described above, along with the DocID and IssueID for reference:

SELECT Q1.IssueID, Q1.IssueKey, D1.DocKey, D1.DocID
FROM Issues AS Q1, Documents AS D1
WHERE Q1.IssueID in 
  (SELECT  Q2.IssueID FROM Issues AS Q2 WHERE Q2.References LIKE ("*" & D1.DocID & "*"));

The inner select pulls the list of issues which has a given document in the references column. The outer select performs this for each document, resulting in the aggregate list.

JeffP 2009-06-02 17:49:34

Rather than "confused the crap", perhaps everyone was giving you the benefit of the doubt :) You see, your column named 'References' violates first normal form (1NF) by containing non-scalar data. Because normalization is A Good Thing, we naturally assumed you wanted to scub your data.

onedaywhen 2009-06-03 07:06:46

...because working with multi-valued data (NFNF) in SQL is a pain. Sure, ACE and Access2007 introduced multi-valued data types but no new expressions for working with multi-valued data. We gave you the benefit of the doubt by thinking you wanted to fix your data by normalizing it :)

onedaywhen 2009-06-03 07:23:58

I am indeed trying to normalize the data, but I don't think I made the assumptions clear enough. See my comment below.

JeffP 2009-06-03 13:02:31

Further: What surprised me about the other answers is that they all seemed to iterate over the DB and parse the field, rather than searching it. Hence my conclusion that my use of the word "parse" in the title misdirected...

JeffP 2009-06-03 13:10:33

"I am indeed trying to normalize the data" -- well you don't need you references..LIKE..DocID predicate, then! Instead, your first task IMO is to change the NFNF column 'References' into a scalar column. In RM theory this violation of 1NF is referred to as 'repeating groups'. See: http://en.wikipedia.org/wiki/1NF#Example_3:_Repeating_groups_within_columns.

onedaywhen 2009-06-04 07:55:55

You are missing the use case I described at the beginning of the question: the References column is user input coming from an excel spreadsheet. I do not have the option to force this data to be entered directly in the database. This spreadsheet will be updated periodically and I need to perform some complex queries (which can't be done in excel) based on that data.This whole effort is a prototype to justify changes to the real back-end (Oracle). Once the changes are approved, I'll be able to eliminate the use of excel for live data.

JeffP 2009-06-04 12:45:33

Answer 8

+1 A:

This has been chosen as the answer:

Q2.References LIKE ("*" & D1.DocID & "*"));

However, I don't think this is safe.

Consider if one of the value for the column named 'References' contained this data:

DocID1<cr>DocID999<cr>DocID3

and a value DocID = 9 existed in the other table.

The problem here is that

"DocID1<cr>DocID999<cr>DocID3" LIKE "*" & "DocID9" & "*"

will evaluate to TRUE, which is probably undesirable.

To address this problem, I think the values in the search/join condition should be made safe by surrounding the values using the delimiter character e.g.

(CHR(13) & Q2.References & CHR(13)) LIKE ("*" & CHR(13) & D1.DocID & CHR(13) & "*"));

onedaywhen 2009-06-03 07:03:09

Assumption I didn't make clear: all of the DocIDs are actually unique 5 digit numbers formatted as text, so the case you describe is not possible. This is a good point to consider in a general solution, and I'd vote this up as a "good to know" if I had any rep :)

JeffP 2009-06-03 13:07:41

ansaurus

tags:

views:

answers:

SQL for parsing multi-line data?

related questions