views:

616

answers:

4

I have a row of data as follows:

            header1      header2      header3      header4      header5
row key     datavalue1   datavalue2   datavalue3   datavalue4   datavalue5....

so basically, I have a denormalized data set where the datavalues may or may not be empty on a row-by-row basis. I need to normalize them.

ie

12345678    NULL         10           3            NULL         14

would become:

12345678   header2   10
12345678   header3   3
12345678   header5   14

I could do this by using a paste special transform, but I have thousands of rows and I'd need to make sure that I get the right row key for each. furthermore, each row has a bunch of descriptives associated with it that I need copied over with each datavalue.

What is the easiest way to convert each row of columns such that I have multiple rows of a single column with all non-empty datavalues plus the associated datavalue reference? I need to be able to pivot the dataset.

A: 

I would create a VBA macro that loops through each row and output the data to another page. This would let you create your pivot table in the new page once the data has been outputed.

Not sure how familiar you are with VBA, but this could pretty easily be done by loading the data into an array (or collection of objects if you really want to do it correctly) and writing it back out.

Here is a link to a good VBA document.

http://social.msdn.microsoft.com/Forums/en/isvvba/thread/d712dbdd-c876-4fe2-86d2-7d6323b4262c

Edit

Please note this is not meant to be a fully working solution but really a generic framework to help you in the right direction.

As a generic example that does a lot of what you would need to do (not the best way, but probably the easiest for a beginer), something like this should get you started, although it is hard to say without seeing more of your worksheet.

Sub RowsToColumns ()
  Application.ScreenUpdating = False
  Dim srcWrkSheet As Worksheet
  Dim destWrkSheet As Worksheet
  Dim excelData as pExcelData
  Dim srcRowNumber As Long
  Dim srcRolNumber As Long
  Dim destRowNumber As Long
  Dim destColNumber As Long

  SET srcWrkSheet = Sheets("YourSourceWorkSheetName")
  SET destWrkSheet = Sheets("YourDestinationWorkSheetName")

  srcRowNumber = 1
  srcColNumber = 1
  destRowNumber = 1
  destColNumber = 1

  'Loop until blank row is encountered in column 1
  Do
    destWrkSheet.Cells(destRowNumber ,1).Value = "Header 1 " & srcWrkSheet.Cells(srcRowNumber,srcColNumber )
    destWrkSheet.Cells(destRowNumber ,1).Value = "Header 2 " & srcWrkSheet.Cells(srcRowNumber ,srcColNumber)

    srcRowNumber = srcRowNumber + 1
    srcColNumber = srcColNumber + 1
    destRowNumber = destRowNumber  + 1
  Loop Until srcWrkSheet .Cells(rowNumber, 1).value = ""

End Sub
Irwin M. Fletcher
ugh. I was really hoping to avoid VBA. I have zero familiarity with VBA. Will read the link though.
jhc
If you turn on the record macro function and perform the task manualy once and post the code, I will try to trim it up real quick or at least point you in the right direction.
Irwin M. Fletcher
"I was really hoping to avoid VBA." - I felt that way for a long time, but the truth is, learning VBA is wonderful for doing excactly what you want to do. Excel functions usually take a lot of hacking to get just right.
Ben McCormack
@Irwin, I think your code has lots of issues: you need to initialize row numbers and column numbers to be 1. You really should use nested loops for both rows and columns. You need a separate counter for the rows on your destination worksheet, else you'll only write one value per key. Good idea for VBA use, though...just make sure it's good VBA use :)
Ben McCormack
@bmccormack, you are absolutely correct. I wrote that code in the editor as a framework to work off of, but I should have used more care. I am correcting it now.
Irwin M. Fletcher
@bmccormak, I would have likely done this myself by populating a class or at a minimum a custom type. By running this, it should provide (now) some of what VBA can do.
Irwin M. Fletcher
My thanks to everyone who replied. I have a lot of information here to process and I can't guarantee I won't be back with more questions! Thanks again!
jhc
+1  A: 

If you have five "header" columns, enter these formulas

H1: =OFFSET($A$1,INT((ROW()-1)/5)+1,0)
I1: =OFFSET($A$1,0,IF(MOD(ROW(),5)=0,5,MOD(ROW(),5)))
J1: =INDEX($A$1:$F$9,MATCH(H1,$A$1:$A$9,FALSE),MATCH(I1,$A$1:$F$1,FALSE))

Copy H1:J?? and paste special values over the top. Sort on column J and delete anything that's a zero. If you have legitmate zeros in the data, then you first need to replace blank cells with some unique string that you can then delete later.

If you have more columns, then replace the '5' in all the above formulas with whatever number you have.

Dick Kusleika
Thank you. Your formulas are the simplest solution posted (and one I vaguely understand!) so I'm planning to start with this approach before digging into the depths of VBA.
jhc
THANK YOU!!!!!! These formulas work wonderfully. I had to make some mods and figure out exactly what was going on, but they saved me from having to learn VBA on the fly!
jhc
Is there a way to modify the formula in J1 to reference multiple columns in the MATCH? I am finding instances where the same person can have multiple leaves over different periods of time. I'd like to match on 2 columns so that I can be sure to pull the right data; otherwise, I get double-counting of the first leave type.
jhc
A: 

Let's look at a possible solution in VBA. I think this will really help. Here are a few things you should know about my code.

  • You'll need to put this code in a code module in VBA (the same place where Macros go)
  • Look at what I named the sheets: Original and Normalized. You'll either want to change your sheet names or the code
  • I'm checking for values with a string field of NULL. If the cell is empty, you'll want to check for If IsEmpty(rngCurrent.Value) Then instead.

'

Sub NormalizeSheet()
Dim wsOriginal As Worksheet
Dim wsNormalized As Worksheet
Dim strKey As String
Dim clnHeader As Collection
Dim lngColumnCounter As Long
Dim lngRowCounterOriginal As Long
Dim lngRowCounterNormalized As Long
Dim rngCurrent As Range
Dim varColumn As Variant

Set wsOriginal = ThisWorkbook.Worksheets("Original")     'This is the name of your original worksheet'
Set wsNormalized = ThisWorkbook.Worksheets("Normalized") 'This is the name of the new worksheet'
Set clnHeader = New Collection

wsNormalized.Cells.ClearContents        'This deletes the contents of the destination worksheet'

lngColumnCounter = 2
lngRowCounterOriginal = 1
Set rngCurrent = wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter)

' We'll loop through just the headers to get a collection of header names'
Do Until IsEmpty(rngCurrent.Value)
    clnHeader.Add rngCurrent.Value, CStr(lngColumnCounter)
    lngColumnCounter = lngColumnCounter + 1
    Set rngCurrent = wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter)
Loop

'Here we'll reset our Row Counter and loop through the entire data set'
lngRowCounterOriginal = 2
lngRowCounterNormalized = 1
lngColumnCounter = 1

Do While Not IsEmpty(wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter))

    Set rngCurrent = wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter)
    strKey = rngCurrent.Value ' Get the key value from the current cell'
    lngColumnCounter = 2

    'This next loop parses the denormalized values for each row'
    Do While Not IsEmpty(wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter))
        Set rngCurrent = wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter)

        'We're going to check to see if the current value'
        'is equal to NULL. If it is, we won't add it to'
        'the Normalized Table.'
        If rngCurrent.Value = "NULL" Then
            'Skip it'
        Else
            'Add this item to the normalized sheet'
            wsNormalized.Range("A" & lngRowCounterNormalized).Value = strKey
            wsNormalized.Range("B" & lngRowCounterNormalized).Value = clnHeader(CStr(lngColumnCounter))
            wsNormalized.Range("C" & lngRowCounterNormalized).Value = rngCurrent.Value
            lngRowCounterNormalized = lngRowCounterNormalized + 1
        End If

        lngColumnCounter = lngColumnCounter + 1
    Loop
    lngRowCounterOriginal = lngRowCounterOriginal + 1
    lngColumnCounter = 1    'We reset the column counter here because we're on a new row'
Loop



End Sub
Ben McCormack
A: 

Seems to me that part of what you are trying to do is to "de-pivot" a pivot table. I've found this tip to be a tremendous help when I've had to do similar tasks: http://spreadsheetpage.com/index.php/tip/creating_a_database_table_from_a_summary_table/

Note that in Excel 2007, you can get to the old Excel 2003 pivot table wizard using the keystrokes Alt+D, P .

f106dart