ansaurus

Question

Answer 1

+2 A:

My feeling is that the .* at the beginning of your pattern is greedy, so it absorbs everything up to the decimal point, leading digits included, but I could be wrong.

Don't use sed; I gave up on it. Perl is a better choice (I was starting to play with it), but the awk solution beats mine. Go for that, unless you really need sed for some particular reason...

Stefano Borini 2009-11-06 15:26:22

I was thinking that too, but then I'd expect it not to have matched the right side of the float either. Hmmm... thanks for any help :)

Xepoch 2009-11-06 15:28:03

+1 for suggesting Perl (much nicer way to do regex stuff) and also your answer is right -- the ".*" is greedy and swallows everything up to the decimal point.

AAT 2009-11-06 16:18:36
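The greedy behaviour described in this thread can be reproduced with a small sketch. The asker's original command is not shown here, so the pattern below is an assumption: a leading .* followed by a capture group built from [0-9]* parts. The variable names line and greedy are illustrative only.

```shell
# Sample GC log line, taken from the awk answer in this thread.
line='63.544: [GC 63.544: [DefNew: 575K->63K(576K), 0.0017902 secs]63.546: [Tenured: 1416K->1065K(1536K), 0.0492621 secs] 1922K->1065K(2112K), 0.0513331 secs]'

# Assumed pattern (not the asker's verbatim command): the leading .* is
# greedy, and [0-9]* is allowed to match zero digits.
greedy=$(printf '%s\n' "$line" | sed 's/^.*\([0-9]*\.[0-9]*\): \[Tenured:.*$/\1/')

printf '%s\n' "$greedy"   # prints .546
```

Because [0-9]* may match nothing, the .* is free to swallow the 63, so the capture starts at the decimal point and the left side of the float is lost.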

Answer 2

A:

As Stefano pointed out, the pattern is performing a greedy match at the beginning of your text input.

If you can use perl, this command works to match your line on standard input:

perl -e '<STDIN> =~ m/^.*?([\d]+\.[\d]+):\s+\[Ten/ && print "$1\n";'

jheddings 2009-11-06 15:34:30

That's the problem: I think sed does not support non-greedy quantifiers.

Stefano Borini 2009-11-06 15:37:15

Good call... I just removed that solution from above.

jheddings 2009-11-06 15:39:55

Yes, sed does not support said constructs.

Xepoch 2009-11-06 15:40:07

@Xepoch did the `[^\d]*` help you?

jheddings 2009-11-06 15:41:21

Just tried both of your updated examples; they both produce the same output I have above, without the left side of the float.

Xepoch 2009-11-06 15:41:27

Do you really, really need sed? :)

Stefano Borini 2009-11-06 15:42:55

Yes and no. I have a sed script file that parses nested GC logs, and ironically it was written because of the trouble of doing that in awk. For as long as I can remember, Perl has been like a clicking language to me: the syntax confuses me, and so does maintaining OPC (other people's code). That said, it may be time I dust off the old Camel Book.

Xepoch 2009-11-06 16:01:05

Just added a Perl example if it will help get you started.

jheddings 2009-11-06 16:09:09
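Since sed has no non-greedy quantifier, one possible workaround that stays inside sed is to avoid needing one: first cut the line off at ": [Tenured:", then delete everything up to the last character that cannot be part of a float. This is only a sketch under the assumption that the wanted value is the timestamp directly in front of "[Tenured:"; the function name tenured_ts_two_step is hypothetical.

```shell
# Hypothetical helper: prints the float that immediately precedes ": [Tenured:".
# Lines without a Tenured section produce no output (-n plus explicit p).
tenured_ts_two_step() {
    sed -n '/: \[Tenured:/{
s/: \[Tenured:.*$//
s/^.*[^0-9.]//
p
}' "$@"
}

line='63.544: [GC 63.544: [DefNew: 575K->63K(576K), 0.0017902 secs]63.546: [Tenured: 1416K->1065K(1536K), 0.0492621 secs] 1922K->1065K(2112K), 0.0513331 secs]'

printf '%s\n' "$line" | tenured_ts_two_step   # prints 63.546
```

Here greediness works in our favour: after the first substitution the float sits at the end of the line, so a greedy .* followed by [^0-9.] stops at the last non-numeric character, whatever it happens to be.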

Answer 3

+1 A:

Use awk instead of sed. Why bother creating a complex regex?

$ more file
63.544: [GC 63.544: [DefNew: 575K->63K(576K), 0.0017902 secs]63.546: [Tenured: 1416K->1065K(1536K), 0.0492621 secs] 1922K->1065K(2112K), 0.0513331 secs]

$ awk -vRS="]" -F":" '$1+0==$1{print $1}' file
63.544
63.546

ghostdog74 2009-11-06 15:46:01

yes please do that :)

Stefano Borini 2009-11-06 15:48:22

I LOVE awk. However, this sed snippet is one of a long list of actions in a much larger sed script that acts as a poor man's parser generator, one of the few things sed can do that leave awk wanting.

Xepoch 2009-11-06 15:52:49

"a few things that sed can do that leaves awk wanting" -- have you heard of the amazing awk assembler?

ghostdog74 2009-11-06 15:56:32
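The awk one-liner in this answer prints every timestamp in the line (63.544 and 63.546). If only the one in front of "[Tenured:" is wanted, which is what the sed patterns elsewhere in this thread aim for, a variant along these lines might do. It keeps the same record and field separators and only changes the condition; the function name tenured_ts_awk is hypothetical.

```shell
# Hypothetical helper. RS="]" splits the line into records at each "]",
# and -F":" splits each record into fields at each ":".
# A record such as "63.546: [Tenured: 1416K->..." then has
# $1 = "63.546" and $2 = " [Tenured".
tenured_ts_awk() {
    awk -v RS="]" -F":" '$2 ~ /^ \[Tenured/ { print $1 }' "$@"
}

line='63.544: [GC 63.544: [DefNew: 575K->63K(576K), 0.0017902 secs]63.546: [Tenured: 1416K->1065K(1536K), 0.0492621 secs] 1922K->1065K(2112K), 0.0513331 secs]'

printf '%s\n' "$line" | tenured_ts_awk   # prints 63.546
```

This relies on a "]" appearing directly before the Tenured timestamp, as it does in the sample line; other GC options may produce a different layout.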

Answer 4

+1 A:

Also match the ] before the number you want:

s/^.*]\([0-9]*\.[0-9]*\): \[Tenured:.*$/\1/

Per comment below, here is a more generic approach, matching a non-digit first:

s/^.*[^0-9]\([0-9]*\.[0-9]*\): \[Tenured:.*$/\1/

Jeremy Stein 2009-11-06 16:26:33

I need to keep that context-free, though; other JVM GC options may (and will) put other data there. I may be able to enumerate all the combinations empirically.

Xepoch 2009-11-06 17:46:58

OK, then how about matching any non-digit first? See my edit.

Jeremy Stein 2009-11-06 18:33:23
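For use on a whole log file, the generic substitution from this answer could be wrapped roughly as follows. Two details are additions rather than part of the answer: -n with the p flag, so lines without a Tenured section are dropped instead of echoed unchanged, and [0-9][0-9]* in place of [0-9]*, so at least one digit is required on each side of the decimal point. The function name tenured_ts is hypothetical.

```shell
# Hypothetical helper: prints the timestamp in front of ": [Tenured:".
# The single [^0-9] stops the greedy .* from eating the leading digits.
tenured_ts() {
    sed -n 's/^.*[^0-9]\([0-9][0-9]*\.[0-9][0-9]*\): \[Tenured:.*$/\1/p' "$@"
}

line='63.544: [GC 63.544: [DefNew: 575K->63K(576K), 0.0017902 secs]63.546: [Tenured: 1416K->1065K(1536K), 0.0492621 secs] 1922K->1065K(2112K), 0.0513331 secs]'

printf '%s\n' "$line" | tenured_ts   # prints 63.546
```

One limitation of this sketch: a timestamp at the very start of the line has no preceding non-digit, so such a line would not match.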

Extract float from text line with sed?
