I have a tab-delimited file. I would like a PowerShell script that counts the number of tabs in each line. I came up with this:

${C:\tabfile.txt} |% {$_} | Select-String \t | Measure-Object | fl count

It yields 3, which is the number of lines in the file.

Any pointers on what I'm doing wrong? I would like it to print a single number for each line in the file.

+3  A: 

There are a couple of issues with your code, and they all revolve around grouping, array management, and nested loops.

gc test.txt | % { ($_ | select-string `t -all).matches | measure | select count }
  • After reading the text file into lines, you need to wrap the rest of the pipeline into a scriptblock. Otherwise downstream cmdlets cannot distinguish which elements came from the "current" line. The PS pipeline is all about dealing with objects one by one -- there's no concept of nested arrays or iterator state or anything else -- blind enumeration.
  • You need to specify -AllMatches, otherwise select-string will stop as soon as it finds the first match on each line. You then need to get the Matches property from its nominal resultset to get the "inner resultset" of this intra-line matching.
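
A minimal sketch of the difference between the two points above. The `$lines` array is hypothetical sample data standing in for the file's contents, and the variable names are only for illustration:

```powershell
# Hypothetical sample data: three lines containing 2, 1 and 3 tabs.
$lines = "a`tb`tc", "d`te", "f`tg`th`ti"

# Measuring outside the block: Measure-Object runs once for the whole stream,
# so it counts the lines that contain a tab, not the tabs themselves.
$matchingLines = ($lines | Select-String "`t" | Measure-Object).Count

# Measuring inside the block: one count per input line.
# -AllMatches makes Select-String record every tab on the line, not just the first.
$perLine = $lines | ForEach-Object {
    (($_ | Select-String "`t" -AllMatches).Matches | Measure-Object).Count
}

$matchingLines   # 3
$perLine         # 2, 1, 3
```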
Richard Berg
+1. Nice explanation why his original code didn't work. Probably helps more than just providing a solution :-) (also I'd have done it but I never used Select-String :-))
Joey
I still get confused about when to use the scriptblock. I also noticed that both solutions use gc instead of consuming the file with ${file.txt}. Is that just a matter of style?
JasonHorner
When using `${...}` you have to put the complete absolute path between the braces, while `Get-Content` allows using relative paths. As for me, I don't have any user files lying around in `C:\` so it'd always be something like `${C:\Users\me\...}` which is cumbersome (ok, I created a `Home:` drive but still, I don't like absolute paths :-). Also `Get-Content` gives you an exception when it can't find something, which sometimes comes in handy in debugging weird failures :-)
Joey
@Jason - my solution doesn't have any more scriptblocks than your attempt, or Johannes' solution for that matter. Our fixes simply move the closing } to the end. When you close the block before piping, Measure runs once over the entire stream instead of once per line.
Richard Berg
FYI you don't need to use measure to get the count. The Matches array on MatchInfo has both a Count and Length property.
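
A short sketch of that variant, assuming every line contains at least one tab (a line without tabs produces no MatchInfo, so it would need separate handling). The sample lines are hypothetical:

```powershell
# Hypothetical sample data: two lines containing 2 and 1 tabs.
$lines = "a`tb`tc", "d`te"

# Read Count straight off the Matches array instead of piping it to Measure-Object.
$counts = $lines | ForEach-Object {
    ($_ | Select-String "`t" -AllMatches).Matches.Count
}

$counts   # 2, 1
```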
Keith Hill
+5  A: 

First attempt, not very sophisticated:

gc .\tabfile.txt | % { ($_ -split "`t").Count - 1 }

This uses the fact that splitting a string at tab characters yields an array with one more item than there are tabs in the line.
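
To make the off-by-one concrete, here is a small sketch on a single hypothetical line (the variable names are illustrative only):

```powershell
# Hypothetical line with two tabs separating three fields.
$line = "a`tb`tc"

# Splitting at each tab yields the fields between the tabs.
$fields = $line -split "`t"

$fields.Count       # 3 fields
$fields.Count - 1   # 2 tabs
```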

Another approach, avoiding splitting the lines:

gc .\tabfile.txt | % { ([char[]] $_ -eq "`t").Count }

Strings can be cast to char[] (there is also the ToCharArray() method). Comparison operators work differently on collections: they return all matching items instead of a boolean. So the comparison returns an array containing all the tabs from the original line, and I just need its number of items.
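
The same steps, spelled out on one hypothetical line as a sketch. The `@(...)` wrapper is an addition here to make sure the result is always treated as an array:

```powershell
# Hypothetical line with two tabs.
$line = "a`tb`tc"

# Break the string into its individual characters.
$chars = [char[]] $line

# With a collection on the left, -eq acts as a filter and returns the matching items.
$tabs = @($chars -eq "`t")

$chars.Count   # 5 characters in total
$tabs.Count    # 2 of them are tabs
```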

Joey
Casting to char[] and then using -eq to automatically unroll the array is clever. +1
Richard Berg
Trying code golf with Powershell has its merits :-)
Joey
This is probably the "better" answer, but the other post answered my original question better. Thanks for your help.
JasonHorner
+2  A: 

Another option, which gives the total number of tabs in the whole file rather than one count per line:

$content = Get-Content file.txt | Out-String
[regex]::matches($content,"\t").count
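
For comparison, a sketch of how the same `[regex]::Matches` call could be applied per line instead. The `$lines` array is hypothetical sample data, joined with newlines to stand in for the file read as a single string:

```powershell
# Hypothetical sample data: lines containing 2, 1 and 0 tabs.
$lines = "a`tb`tc", "d`te", "f"

# Whole content as a single string: one total for everything.
$total = [regex]::Matches(($lines -join "`n"), "\t").Count

# One call per line: one count per line, including 0 for lines without tabs.
$perLine = $lines | ForEach-Object { [regex]::Matches($_, "\t").Count }

$total     # 3
$perLine   # 2, 1, 0
```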
Shay Levy
+3  A: 

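And yet another option if you are running V2. Note that lines without any tabs produce no output, because Select-String only emits lines that match.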

select-string \t c:\tabfile.txt -All | 
    %{"$($_.matches.count) tabs on $($_.LineNumber)"}
Keith Hill