I have a list of PDF files (from daily processing), some with date stamps in various formats, some without.

Example:

$f = @("testLtr06-09-02.pdf", "otherletter.pdf","WelcomeLtr043009.pdf")

I am trying to remove the datestamp by stripping out dashes, then replacing any consecutive group of numbers (4 or more, I may change this to 6) with the string "DATESTAMP".

So far I have this:

$d =  $f | foreach {$_ -replace "-", ""} | foreach { $_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"}
echo $d

The output:

testLtrDATESTAMP.pdf
DATESTAMPoDATESTAMPtDATESTAMPhDATESTAMPeDATESTAMPrDATESTAMPlDATESTAMPeDATESTAMPtDATESTAMPtDATESTAMPeDATESTAMPrDATESTAMP.DATESTAMPpDATESTAMPdDATESTAMPfDATESTAMP
WelcomeLtrDATESTAMP.pdf

It works fine if the file has a datestamp, but otherwise the -replace seems to go wrong and inserts DATESTAMP after every character. Is there a way to fix this? I tried to change it to a foreach loop but I couldn't figure out how to get true/false from the regex.

Thanks in advance.

+2  A: 
$_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"

This means: in $_, replace every match of ([regex]::Matches($_ , "\d{4,}")) with "DATESTAMP".

For a filename with no timestamp (that is, no run of at least 4 consecutive digits) there is no match, so the expression ends up as "" (an empty string) when -replace uses it as the pattern.

Thus every empty string gets replaced with DATESTAMP, and an empty string "" sits at the start of the string and after every character.

That's why you get this long string with every character surrounded by DATESTAMP.
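
The empty-pattern behaviour is easy to reproduce on its own. A minimal sketch, using a made-up sample string ("abc" is not from the question):

```powershell
# An empty pattern matches at every position: before each character and at the end.
$result = "abc" -replace "", "X"
$result   # XaXbXcX

# [regex]::Matches returns an empty collection when nothing matches,
# which is what ends up being used as the (empty) pattern above.
$noMatch = [regex]::Matches("otherletter.pdf", "\d{4,}")
$noMatch.Count   # 0
```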


To check whether a \d{4,} match exists in your string at all, you should be able to use

[regex]::IsMatch($_, "\d{4,}")
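
IsMatch returns a plain boolean, so it can drive an if or a Where-Object filter. A rough sketch using the file names from the question ($flags and $stamped are illustrative variable names):

```powershell
$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")

# Strip dashes first, as in the question, then test each name.
$flags = $f | ForEach-Object { [regex]::IsMatch(($_ -replace "-", ""), "\d{4,}") }
$flags   # True, False, True

# Keep only the names that contain a run of 4 or more digits.
$stamped = $f | Where-Object { [regex]::IsMatch(($_ -replace "-", ""), "\d{4,}") }
```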


I'm no PowerShell user, but this line alone should do the job. I'm not sure whether the if can be used in a pipeline, or whether the assignment and the echo $d are needed.

$f | foreach-object {$_ -replace "-", ""} | foreach-object {if ($_ -match "\d{4,}") { $_ -replace "\d{4,}", "DATESTAMP"} else { $_ }}
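
Since -replace returns its input unchanged when the pattern does not match, the if/else should not be strictly necessary. A shorter sketch of the same idea (a variation, not the pipeline above), relying on -replace working element by element on an array:

```powershell
$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")

# -replace is applied to each array element; chained operators run left to right.
$d = $f -replace "-", "" -replace "\d{4,}", "DATESTAMP"
$d
# testLtrDATESTAMP.pdf
# otherletter.pdf
# WelcomeLtrDATESTAMP.pdf
```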
jitter
Thanks! I think that was what I was looking for in the loop version. Is there a way to do something akin to a case or if clause in order to use IsMatch in the single statement without switching to a foreach ($item in $f)?
wtjones
You can also use the -match operator instead of [regex]::matches. Should be a little shorter and more readable.
Joey
Updated my answer to what I think should work. But be warned, I'm no PowerShell user and the if-else statement in the pipeline might not be allowed.
jitter
You can do an if/else in a pipeline if it is part of a script block used by foreach-object. I'll edit your pipeline to show the correct syntax. But Shay Levy's answer really takes advantage of the regular expression support native in PowerShell.
Steven Murawski
That was basically what I was looking for. I was actually very close but I think I messed up the else { $_ } part. Wish I could accept two answers since Shay's is very nice as well. Thanks everyone!
wtjones
+4  A: 

You can simply do:

PS > $f -replace "(\d{2}-){2}\d{2}|\d{4,}","DATESTAMP"
testLtrDATESTAMP.pdf
otherletter.pdf
WelcomeLtrDATESTAMP.pdf
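
For anyone unpicking the pattern, it can be read as two alternatives joined with |. A sketch that builds the same regex from named pieces ($dashed, $run and $pattern are illustrative names, not part of the original one-liner):

```powershell
$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")

# Alternative 1: two digits and a dash, twice, then two more digits (e.g. 06-09-02).
$dashed = '(\d{2}-){2}\d{2}'
# Alternative 2: any run of four or more digits (e.g. 043009).
$run = '\d{4,}'

# Join the alternatives with | so either form gets replaced.
$pattern = "$dashed|$run"
$result = $f -replace $pattern, "DATESTAMP"
$result
```

Because the dashed form is matched directly, there is no need to strip the dashes in a separate step first.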
Shay Levy
This works perfectly as well. I need to brush up on my regex to make some sense out of it, however :) Thanks! I'll bump this as soon as I get the required rep points.
wtjones