I am running a sed substitution on a line to extract a particular floating-point value, but only the part to the right of the decimal point appears to be matched.

Text Line:

63.544: [GC 63.544: [DefNew: 575K->63K(576K), 0.0017902 secs]63.546: [Tenured: 1416K->1065K(1536K), 0.0492621 secs] 1922K->1065K(2112K), 0.0513331 secs]

If I issue s/^.*\([0-9]*\.[0-9]*\): \[Tenured:.*$/\1/, my output is:

.546

I want the 63.546 out of the line. Why is the very first [0-9]* not matching?

+2  A: 

My feeling is that the .* at the beginning is greedy: since [0-9]* is allowed to match zero digits, the .* absorbs everything up to the dot, including the 63. I could be wrong, though.
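To see the greedy behaviour for yourself, here is a minimal sketch using the line from the question. The function names are made up for illustration. The first function is the original expression; the second requires at least one digit before the dot, which still is not enough, because the .* only gives back a single digit.

```shell
# Sample line copied from the question.
line='63.544: [GC 63.544: [DefNew: 575K->63K(576K), 0.0017902 secs]63.546: [Tenured: 1416K->1065K(1536K), 0.0492621 secs] 1922K->1065K(2112K), 0.0513331 secs]'

# Original expression: the leading .* is greedy and [0-9]* may match
# zero digits, so .* swallows the "63" and only ".546" is captured.
greedy_zero_or_more() {
  printf '%s\n' "$1" | sed 's/^.*\([0-9]*\.[0-9]*\): \[Tenured:.*$/\1/'
}

# Requiring one digit before the dot: .* now has to leave exactly one
# digit for the group, so the capture is "3.546", still not "63.546".
greedy_one_or_more() {
  printf '%s\n' "$1" | sed 's/^.*\([0-9][0-9]*\.[0-9]*\): \[Tenured:.*$/\1/'
}

greedy_zero_or_more "$line"   # prints .546
greedy_one_or_more "$line"    # prints 3.546
```

In both cases the .* takes as much as it can while still letting the rest of the pattern match, so the group only gets the minimum it needs.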

Don't use sed. I gave up on this. perl is a better choice (I was starting to play with it) but the solution with awk beats me. Go for that, unless you really love sed for some particular reason...

Stefano Borini
I was thinking that too, but if so, wouldn't it also have failed to match the right side of the float? Hmm... thanks for any help :)
Xepoch
+1 for suggesting Perl (much nicer way to do regex stuff) and also your answer is right -- the ".*" is greedy and swallows everything up to the decimal point.
AAT
A: 

As Stefano pointed out, the pattern is performing a greedy match at the beginning of your text input.

If you can use perl, this command works to match your line on standard input:

perl -e '<STDIN> =~ m/^.*?([\d]+\.[\d]+):\s+\[Ten/ && print "$1\n";'
jheddings
That's the problem. I think that sed does not support non-greedy quantifiers.
Stefano Borini
Good call... I just removed that solution from above.
jheddings
Yes, sed does not support said constructs.
Xepoch
@Xepoch did the `[^\d]*` help you?
jheddings
Just tried both of your updated examples; they all produce the same output as I have above, without the left side.
Xepoch
Do you really, really need sed? :)
Stefano Borini
Yes and no. I have a sed script file that parses nested GC logs, which was written, ironically, because of the trouble of doing so in awk. Perl has been like speaking a clicking language to me for as long as I can remember. The syntax, and maintaining OPC (other people's code), confuses me. That said, it may be time I dust off the old Camel Book.
Xepoch
Just added a Perl example if it will help get you started.
jheddings
+1  A: 

Use awk instead of sed. Why bother creating a complex regex?

$ more file
63.544: [GC 63.544: [DefNew: 575K->63K(576K), 0.0017902 secs]63.546: [Tenured: 1416K->1065K(1536K), 0.0492621 secs] 1922K->1065K(2112K), 0.0513331 secs]

$ awk -vRS="]" -F":" '$1+0==$1{print $1}' file
63.544
63.546
ghostdog74
yes please do that :)
Stefano Borini
I LOVE awk. However this sed snippet is from a long list of actions in a much larger sed script that acts as a poor man's parser generator, a few things that sed can do that leaves awk wanting.
Xepoch
"a few things that sed can do that leaves awk wanting" -- have you heard of the amazing awk assembler?
ghostdog74
+1  A: 

Also match the ] before the number you want:

s/^.*]\([0-9]*\.[0-9]*\): \[Tenured:.*$/\1/

Per comment below, here is a more generic approach, matching a non-digit first:

s/^.*[^0-9]\([0-9]*\.[0-9]*\): \[Tenured:.*$/\1/
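A quick way to check the non-digit version against the line from the question is sketched below. The function name extract_tenured_ts is made up for illustration, and the -n / p combination is my own addition so that lines without a Tenured entry print nothing instead of being echoed unchanged.

```shell
# Sample line copied from the question.
line='63.544: [GC 63.544: [DefNew: 575K->63K(576K), 0.0017902 secs]63.546: [Tenured: 1416K->1065K(1536K), 0.0492621 secs] 1922K->1065K(2112K), 0.0513331 secs]'

# Hypothetical helper: print the timestamp in front of "[Tenured:".
# The [^0-9] stops the greedy .* from eating the leading digits,
# because the character just before the group must be a non-digit.
extract_tenured_ts() {
  printf '%s\n' "$1" | sed -n 's/^.*[^0-9]\([0-9]*\.[0-9]*\): \[Tenured:.*$/\1/p'
}

extract_tenured_ts "$line"   # prints 63.546
```

One caveat: [^0-9] has to match an actual character, so this will not match if the Tenured timestamp is the very first thing on the line.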
Jeremy Stein
I need to keep that context-free though, there are other JVM GC options that may/will put other data there. I may be able to select all combinations empirically.
Xepoch
OK, then how about matching any non-digit first? See my edit.
Jeremy Stein