views:

196

answers:

1

I have a file that looks like this:
a,1
b,2
c,3
a,4
b,5
c,6
(...repeat 1,000s of lines)

How can I transpose it into this?
a,b,c
1,2,3
4,5,6

Thanks

+4  A: 

Here's a brute-force one-liner from hell that will do it:

PS> Get-Content foo.txt | 
      Foreach -Begin {$names=@();$values=@();$hdr=$false;$OFS=',';
                      function output { if (!$hdr) {"$names"; $global:hdr=$true}
                                        "$values";
                                        $global:names=@();$global:values=@()}} 
              -Process {$n,$v = $_ -split ',';
                        if ($names -contains $n) {output};
                        $names+=$n; $values+=$v } 
              -End {output}
a,b,c
1,2,3
4,5,6

It's not what I'd call elegant but should get you by. This should copy/paste correctly as-is. However if you reformat it to what is shown above you will need put back-ticks after the last curly on both the Begin and Process scriptblocks. This script requires PowerShell 2.0 as it relies on the new -split operator.

This approach makes heavy use of the Foreach-Object cmdlet. Normally when you use Foreach-Object (alias is Foreach) in the pipeline you specify just one scriptblock like so:

Get-Process | Foreach {$_.HandleCount}

That prints out the handle count for each process. This usage of Foreach-Object uses the -Process scriptblock implicitly which means it executes once for each object it receives from the pipeline. Now what if we want to total up all the handles for each process? Ignore the fact that you could just use Measure-Object HandleCount -Sum to do this, I'll show you how Foreach-Object can do this. As you see in the original solution to this problem, Foreach can take both a Begin scriptblock that is executed once for the first object in the pipeline and a End scripblock that executes when there are no more objects in the pipeline. Here's how you can total the handle count using Foreach-Object:

gps | Foreach -Begin {$sum=0} -Process {$sum += $_.HandleCount } -End {$sum}

Relating this back to the problem solution, in the Begin scriptblock I initialize some variables to hold the array of names and values as well as a bool ($hdr) that tells me whether or not the header has been output (we only want to output it once). The next mildly mind blowing thing is that I also declare a function (output) in the Begin scriptblock that I call from both the Process and End scriptblocks to output the current set of data stored in $names and $values.

The only other trick is that the Process scriptblock uses the -contains operator to see if the current line's field name has already been seen before. If so, then output the current names and values and reset those arrays to empty. Otherwise just stash the name and value in the appropriate arrays so they can be saved later.

BTW the reason the output function needs to use the global: specifier on the variables is that PowerShell performs a "copy-on-write" approach when a nested scope modifies a variable defined outside its scope. However when we really want that modification to occur at the higher scope, we have to tell PowerShell that by using a modifier like global: or script:.

Keith Hill
Are most of those `\`` even necessary? From what I see there many of them appear in blocks and those shouldn't need that you tell PowerShell that it continues in the next line (as this can be inferred from the not-yet-finished block).
Joey
If you take out the backticks then copy/paste into the PowerShell console doesn't behave correctly.
Keith Hill
Hmm, copy/paste from SO doesn't behave the same between the preview window and the final post. I've just been testing by using the preview window which is obviously inadequate. In the preview window the newlines make it to the console via copy/paste but they don't in the final post. What a PITA.
Keith Hill
This works perfectly thank you. Can you add a little explanation about what's it's doing?
fenster
+1 for good explaination.
Bratch