I have a text file containing lines of data. I can use the following PowerShell command to extract the lines I'm interested in:

select-string -path *.txt -pattern "subject=([A-Z\.]+),"

Some example data would be:

blah blah subject=THIS.IS.TEST.DATA, blah blah blah

What I want is to be able to extract just the actual contents of the subject (i.e. the "THIS.IS.TEST.DATA" string). I tried this:

select-string -path *.txt -pattern "subject=([A-Z\.]+)," | %{ $_.Matches[0] }

But the "Matches" property is always null. What am I doing wrong?

+1  A: 

See these notes on Regular expressions in PowerShell

John D. Cook
+3  A: 

I don't know why your version doesn't work. It should work. Here is an uglier version that works.

$p = "subject=([A-Z\.]+),"
select-string -path *.txt -pattern $p | % {$_ -match $p > $null; $matches[1]}

Edit: explanation for d4nt:

-match is a regular expression matching operator:

>"foobar" -match "oo.ar"
True

The > $null just suppresses the True being written to the output. (Try removing it.) There is a cmdlet that does the same thing whose name I don't recall at the moment.

$matches is a magic (automatic) variable that holds the result of the last successful -match operation. Index 0 is the whole match and index 1 is the first capture group.
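
To make the $matches behaviour concrete, here is a minimal sketch that runs against an in-memory string rather than a file. The $line value is a hypothetical sample modeled on the example data in the question, and the variable names $whole and $subject are only for illustration.

```powershell
# Hypothetical sample line, modeled on the example data in the question.
$line = 'blah blah subject=THIS.IS.TEST.DATA, blah blah blah'

# -match returns True/False and, on success, fills the automatic $matches variable.
if ($line -match 'subject=([A-Z\.]+),') {
    $whole   = $matches[0]   # the entire matched text, including "subject=" and the comma
    $subject = $matches[1]   # only the text captured by the first pair of parentheses
    $subject
}
```

Using -match inside an if also avoids the need for "> $null", since the True/False result is consumed by the condition instead of being written to the output.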

dangph
Thanks, that works, but could you explain what you're doing? Particularly the "$_ -match $p > $null" bit.
d4nt
The cmdlet dangph is thinking of is "Out-Null". But you can also cast the whole line to [void]: [void]($_ -match $p)
JasonMArcher
+2  A: 

The problem with your code is that select-string does not pass down the actual Regex object. Instead it passes an object of a different class, MatchInfo, which does not carry the actual regex match information.

If you only want to run the regex once, you will have to roll your own function, which isn't too difficult.

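function Select-Match() {
  param ($pattern = $(throw "Need a pattern"),
         $filePath = $(throw "Need a file path") )
  # Read the file line by line and run the regex once per line.
  foreach ( $cur in (gc $filePath)) {
    if ( $cur -match $pattern ) {
      # $matches[0] is the whole match; use $matches[1] for just the first capture group.
      write-output $matches[0];
    }
  }
}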

gci *.txt | %{ Select-Match "subject=([A-Z\.]+)," $_.FullName }
JaredPar
But why does the MatchInfo.Matches property not work? http://msdn.microsoft.com/en-us/library/microsoft.powershell.commands.matchinfo.matches(VS.85).aspx
dangph
@dangph, I believe that's a bug in the docs. You can verify this by running "gci a *.txt | gm". The resulting type has no Matches property.
JaredPar
JaredPar, that didn't work for me, but I believe you are right. Try this: "gm -inputobject (new-object Microsoft.PowerShell.Commands.MatchInfo)".
dangph
I would guess that they just haven't implemented the Matches property yet. After all, I would expect a class called "MatchInfo" to actually contain info about, uh, matches :))
dangph
@dangph, I would expect that as well, but there is clearly a gap between documentation and implementation here.
JaredPar
Yes, that does appear to be the case.
dangph
I just checked in CTP3: the Matches property is implemented for v2.
JasonMArcher
+2  A: 

Yet another option:

gci *.txt | foreach { [regex]::match($_,'(?<=subject=)([^,]+)').value }
Shay Levy
+2  A: 

Having learnt a lot from all the other answers, I was able to get what I wanted using the following line:

gci *.txt | gc | %{ [regex]::matches($_, "subject=([A-Z\.]+),") } | %{ $_.Groups[1].Value }

This felt nice because it runs the regex only once per line, and since I was entering it at the command prompt, it was good not to need multiple lines of code.

d4nt
Glad you found a solution. I just checked in v2 and the Matches property works from Select-String. So in the future this will be less painful for you. :)
JasonMArcher
A: 

Another variation, matching 7 digits in a string:

echo "123456789 hello test" | % {$_ -match "\d{7}" > $null; $matches[0]}

returns: 1234567

Jeffrey Knight
A: 

In PowerShell V2 CTP3, the Matches property is implemented. So the following will work:

select-string -path *.txt -pattern "subject=([A-Z\.]+)," | %{ $_.Matches[0].Groups[1].Value }
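
A quick way to try this without any files, assuming PowerShell v2 or later, is to pipe a string straight into Select-String. This is only a sketch: the sample string is hypothetical and modeled on the question's data, and $subject is an illustrative variable name.

```powershell
# Hypothetical sample line, modeled on the example data in the question.
$line = 'blah blah subject=THIS.IS.TEST.DATA, blah blah blah'

# Select-String emits MatchInfo objects; Matches[0] is the first regex match on the line,
# and Groups[1] is the first capture group within that match.
$subject = $line | Select-String -Pattern 'subject=([A-Z\.]+),' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$subject
```

Groups[0] would give the whole match (including "subject=" and the trailing comma), which is why the capture group index 1 is used here.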
Philippe