views:

26

answers:

1

I have this xml document that is provided as a data feed (right off the bat I can not modify the source of the data feed) and i import it into excel with the xml import. there is no schema that comes with this xml so i get a table that ends up having a whole bunch of duplicates for an identifier, because of the unique values spread throughout the spreadsheet.

XML in XLS

Col1(IDnum)   Col2(name)   Col3(Type)   Col4(Category)   Col(etc)
=================================================================
0011           Item          01                           6B
0011           Item                        xxj9           7B
0011           Item                        xxj9
0011           Item          02
0011           Item          01            xxj9           6B
0012          etc

I need to delete all rows where columnA string/number matches while concatenating all potential values from Col3, Col4 & Col5 together so it looks like this

Col1(IDnum)   Col2(name)   Col3(Type)   Col4(Category)   Col(etc)
=================================================================
0011           Item          01, 02          xxj9          6B, 7B

what visual basic method would allow me to accomplish this?

thanks

+1  A: 

Perhaps:

Dim cn As Object
Dim rs As Object
Dim strFile As String
Dim strCon As String
Dim strSQL As String
Dim s As String
Dim i As Integer, j As Integer

''This is not the best way to refer to the workbook
''you want, but it is very conveient for notes
''It is probably best to use the name of the workbook.

strFile = ActiveWorkbook.FullName

''Note that if HDR=No, F1,F2 etc are used for column names,
''if HDR=Yes, the names in the first row of the range
''can be used.
''This is the Jet 4 connection string, you can get more
''here : http://www.connectionstrings.com/excel

strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFile _
    & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"

''Late binding, so no reference is needed

Set cn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")


cn.Open strCon

''Name and Type are reserved words:
''http://office.microsoft.com/en-us/access/hp010322491033.aspx,
''so square brackets are required. Brackets would also be
''needed if the column name had a space in it.
''Name is just about the worst thing to call something that will
''be used with VBA or SQL :)

''This selects the fields (columns) by name and groups on all
''fields. If some fields (columns) contain unimportant data or
''data that does not need to be concatenated, you can use, say
''First ... SELECT IDNum First(Category)
''You can also simply ORDER BY IdNum, rather than GROUP BY
''it may be a little slower.

strSQL = "SELECT IDNum, [Name], [Type], Category, Etc " _
       & "FROM [Sheet1$] " _
       & "WHERE IDNum Is Not Null " _
       & "GROUP BY IDNum, [Name], [Type], Category, Etc"

''It is best to use column names, but at a pinch and for a
''once-off you coukd use:

''strSQL = "SELECT * " _
''       & "FROM [Sheet1$] " _
''       & "WHERE IDNum Is Not Null " _
''       & "ORDER BY IDNum"

''Open the recordset for more processing
''Cursor Type: 3, adOpenStatic
''Lock Type: 3, adLockOptimistic
''Not everything can be done with every cirsor type and
''lock type. See http://www.w3schools.com/ado/met_rs_open.asp

rs.Open strSQL, cn, 3, 3


''Pick a suitable empty worksheet for the results

With Worksheets("Sheet2")

    ''Fill headers into the first row of the worksheet

    For i = 0 To rs.Fields.Count - 1
        .Cells(1, i + 1) = rs.Fields(i).Name
Next

    ''Counter for Fields/Columns in Recordset and worksheet
    ''Row one is used with tiles, so ...
    i = 1

    ''Working with the recordset ...

    Do While Not rs.EOF

        ''While there are records ...
        If rs.EOF Then Exit Do

        ''Store the IDNum to a string (if it is a long,
        ''change the type) ...

        s = rs!IDNum

        ''(Counter)
        i = i + 1

        ''(First cell in sheet)
        .Cells(i, 1) = rs!IDNum

        ''Working with the saved IDNum string ...
        Do While s = rs!IDNum

           ''And the fields (columns) in the Recorset ...
           ''(-1 because we already have IDNum, which is zero)

           For j = 1 To rs.Fields.Count - 1

                ''If the cell (row=i, column=j) that contains Recordset field (column)
                ''Does not already have this data (Instr) and the value in the recordset
                ''is not Null, ZLS, Space filled then ...

                If InStr(.Cells(i, j + 1), rs(j)) = 0 _
                   And Trim(rs(j) & vbNullString) <> vbNullString Then

                     ''If the cell already has a value ...
                     If Not IsEmpty(.Cells(i, j + 1)) Then
                       ''add a comma delimiter
                       .Cells(i, j + 1) = .Cells(i, j + 1) & ", "
                     End If

                    ''Add the value from the recordset to the cell (concatenate).
                    .Cells(i, j + 1) = .Cells(i, j + 1) & rs(j)
                End If
            Next

        ''Keep going for this IDNum
        rs.MoveNext

        ''But stop if at the end of the recordset.
        If rs.EOF Then Exit Do
    Loop
    ''Keep going for this recordset
Loop

''Finished with the sheet
End With

''Tidy up
rs.Close
Set rs = Nothing
cn.Close
Set cn = Nothing
Remou
Remou that is awesome. Would you please consider breaking this down and explaining it section by section??? please! i want to understand it and not just repeat it, and quite honestly i don't. either way, thanks as always!
Justin
also the actual thing i am trying to apply this to has about 21 different column/fields, and I know to add them to the sql string, but what about the output. so each letter above declared (i,j,s) represent the "extra columns" from teh example....or the place holders you apply those columns to?
Justin
I have added some comments.
Remou
thanks very much Remou!
Justin
so when i am trying this revision out, i am getting a run time error that states "object does not support this property or method". though it is not pointing to a specific portion of code?? os there an additional reference i need to set that i missed?
Justin
nevermind man...typo on my part....debugged stepped through and found it...THANKS REMOU!! this is exactly what i needed!
Justin