My google-fu's failing me again. The information is (probably) out there, but I can't find it. I know UNIX like the back of my hand and use cygwin etc. However, with the increased availability of PowerShell on servers, and (on production servers at least) the difficulty of getting cygwin in place, I'm attempting to pick up PowerShell. If nothing else, it's another weapon in my arsenal.

Essentially, I'm looking for the Powershell equivalent of the awk command:

awk '$9 == "503" { print $0 }' < access_log

For those that don't know awk, this compares field 9 of each input line against "503" and executes the block on a match (this is an apache access log, so it returns all lines from access_log where the HTTP status code returned is 503). Awk handles splitting each line into fields based on whitespace automagically; $0 is the entire line (unadulterated), with individual fields going into $1, $2, ... [etc].

I know I can use split like this:

cat access_log | %{ $_.split() }

which splits the incoming lines into an array, but I can't work out from here how to use select-object or where-object to select (and output) whole lines based on a given field.

The alternative is select-string but I can't seem to see any way to pass in an expression along the lines of %{ $_.split()[8] -eq "503" }. (I note powershell is zero-based, hence looking at field 8).

I'm not sure if I'm missing something obvious here, and I've not found the right google-fu to give me the info (so I wouldn't be surprised if this is a dupe somewhere).

Cheers for any help :-)

+1  A: 

Yeah, where-object (alias ?) is better in this case:

cat access_log | ?{($_ -split '\s+',0,'regexmatch')[8] -eq 503} 

Note that the .NET split method will create empty string entries for consecutive spaces so I'm using the -split operator in PowerShell 2.0 to avoid this.
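To illustrate the difference, here is a small sketch on a made-up string (the sample value and the variable names are assumptions for illustration, not taken from a real access_log):

```powershell
# Hypothetical sample line with two spaces between the first two fields.
$line = 'a  b c'

# .NET String.Split() with no arguments splits on every whitespace character,
# so the doubled space yields an empty string between 'a' and 'b'.
$dotnet = $line.Split()

# The -split operator (PowerShell 2.0+) takes a regex, so '\s+' treats a run
# of whitespace as a single delimiter.
$op = $line -split '\s+'

$dotnet.Count   # 4 entries: 'a', '', 'b', 'c'
$op.Count       # 3 entries: 'a', 'b', 'c'
```

With the plain Split() call, every extra space shifts the later fields up by one index, so [8] may no longer be the status code. Note that '\s+' still produces an empty first element if the line starts with whitespace, so a Trim() beforehand may be worthwhile.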

My regex is weak in this area but I imagine there is a way to get the 9th field using a regex (more easily than the brain dead approach below - anybody??):

Updated regex pattern per Johannes' comment:

cat access_log | Select-String '^\s*(?:\w+\s+){8}503'
Keith Hill
`^\s*(?:\w+\s+){8}503`
Joey
Ta for pointing out that split() won't normalise consecutive whitespace. Unfortunately I'm only on PS 1... this might give me a reason to upgrade to 2 though.
Chris J
On PowerShell 1, IIRC you can use another overload of split e.g. `"a b`tc".Split(" `t", [StringSplitOptions]::RemoveEmptyEntries)`
Keith Hill
PowerShell likes being verbose, doesn't it :-) It doesn't quite match the simple elegance of awk, but it does the job without having to install extra stuff. Ta for the hints on the overload. Hopefully all will come in useful :-)
Chris J
In the `[StringSplitOptions]::RemoveEmptyEntries` case, it is .NET that is being verbose since that is a call into the underlying .NET Framework. In theory, you could provide just `'RemoveEmptyEntries'` for that parameter to System.String.Split but the last time I tried that the method binder got confused and couldn't find the right overload of the Split method.
Keith Hill
+1  A: 

Found the answer, although I'm still happy to see if there are alternative ways to do this [so I'll leave this unanswered for a couple of days to see if someone else has alternative methods]. The method I've found is:

cat access_log | where-object { $_.split()[8] -eq "503" }

which can be abbreviated to:

cat access_log | where { $_.split()[8] -eq 503 }

So it was a case of getting things in the right order. I was along the right lines originally, but sticking too many pipes in the way.
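For what it's worth, here is a rough sketch of the same filter that should also cope with doubled spaces without needing the -split operator. The two sample lines are invented stand-ins for access_log, and the explicit [char[]] cast is my assumption to steer the call onto the Split(char[], StringSplitOptions) overload:

```powershell
# Hypothetical sample lines; the second has a doubled space before the date field.
$lines = @(
    '1.2.3.4 - - [01/Jan/2010:00:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '1.2.3.4 - -  [01/Jan/2010:00:00:01 +0000] "GET /x HTTP/1.1" 503 0'
)

# Separators: space and tab, cast explicitly to char[] (assumption, see above).
$sep = [char[]]" `t"

# Keep whole lines whose 9th whitespace-separated field (index 8) is 503.
# RemoveEmptyEntries drops the empty strings that consecutive spaces would create.
$hits = $lines | Where-Object {
    $_.Split($sep, [StringSplitOptions]::RemoveEmptyEntries)[8] -eq '503'
}

$hits
```

Against a real file, the $lines array would be replaced by the output of cat access_log. With a plain split() the second sample line would be missed, since the doubled space pushes the status code to index 9.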

Chris J
You could use shorter aliases: `gc` instead of `cat` and `?` instead of `where` – but generally you should only use aliases for interactive use, not within scripts that might be deployed on other computers.
Joey
A: 

As I understand from the code you have posted, you are looking to find lines whose 9th fields are '503' and then write out the 1st field of those lines? If so:

Get-Content -Path "access_log" | ForEach-Object {
    if ($_ -match '(?<Field0>\d+)\s(?:\d+\s){7}503')
    {
        Write-Host $Matches["Field0"]
    }
}

EDIT:

An example using Select-String (better than my previous one):

Select-String -Path "access_log" -Pattern '(?<Field0>\d+)\s(?:\d+\s){7}503' | ForEach-Object {
    Write-Host $_.Matches[0].Groups["Field0"]
}
George Howarth
Yikes, I think regex is seriously overkill here.
Joey
But thanks for posting the regex pattern. I figured there was a way to specify repeated groups - should've realized it would be the same way you do it for things like \d{3}. :-)
Keith Hill
@Keith: I would've given you the regular expression if I had enough rep to leave comments :(
George Howarth