I have a log file containing statistics from different servers. I am extracting the statistics from this log file using regexes only. I am trying to capture the CPU usage of the running process. For SunOS, I have the following output:

process,10050,user1,218,59,0,1271M,1260M,sleep,58.9H,0.02%,java

Here the CPU % is the 11th field if we split on commas (,). This field has a % sign, which is unique, so I can use the following regex to get this value:

regex => q/^process,(?:.*?),((?:\d+)\.(?:\d+))%,java$/,
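A minimal sketch of how a rule like this might be applied to the SunOS line, assuming the `regex =>` entry lives in a hash (`%sunos_rule` is a hypothetical name, not from the original code):

```perl
use strict;
use warnings;

# Hypothetical rule table in the same "regex => q/.../" style as above.
my %sunos_rule = (
    regex => q/^process,(?:.*?),((?:\d+)\.(?:\d+))%,java$/,
);

my $line = 'process,10050,user1,218,59,0,1271M,1260M,sleep,58.9H,0.02%,java';

# A plain string can be interpolated as a pattern; the capture is the number before the % sign.
my $pattern = $sunos_rule{regex};
my ($cpu) = $line =~ /$pattern/;

print defined $cpu ? "CPU: $cpu\n" : "no match\n";   # prints "CPU: 0.02"
```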

For the Linux system, I have the following output:

process,26190,user1,20,0,1236m,43m,6436,S,0.0,1.1,0:00.00,java,

Here the CPU usage is in the 10th column, but without a % sign, and I can't see anything unique about this field.

What regex pattern should I use to get this value?

+3  A: 

If the line is already comma-separated, you can just use split on the string and pick the correct field.

e.g.

my @fields = split(/,/, $input);
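A small sketch of picking the Linux CPU column after the split, using the sample line from the question:

```perl
use strict;
use warnings;

my $input = 'process,26190,user1,20,0,1236m,43m,6436,S,0.0,1.1,0:00.00,java,';

# split drops trailing empty fields, so the final comma does not add an extra element.
my @fields = split(/,/, $input);

# Indexes are 0-based: the 10th column (CPU on Linux) is $fields[9].
my $cpu = $fields[9];
print "fields: ", scalar(@fields), ", CPU: $cpu\n";   # fields: 13, CPU: 0.0
```

If empty trailing fields ever matter, `split(/,/, $input, -1)` keeps them.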
Brian Rasmussen
Thanks, but I have to define a regex to get the required value, as I have done for the SunOS process. If I use split, I have to change all my code :(
Space
It looks to me like you're using the regex to pick out fields from the line. Can't you just change that to use split instead? Do you use the regex all over the place?
Brian Rasmussen
I am getting the other values using regexes only, and I am done with most of them, so I don't want to change all the code for this one. I have regexes all over the place.
Space
In your programming career you'll find yourself changing most of the code quite a bit. Bite the bullet. Don't ever get attached to any code.
brian d foy
Agree with @brian. If your code already uses a regex, it will be more complex than what is being suggested. Why make it more complex?
justintime
When down voting please leave a comment. Thanks.
Brian Rasmussen
A: 

You have a data structure with a distinct delimiter, so don't use a regex; just split the line and get your item by index (or slicing). It's easier.

$output="process,10050,user1,218,59,0,1271M,1260M,sleep,58.9H,0.02%,java";
@s = split /,/,$output;
print "$s[10]\n";

For Linux, just get $s[9].
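One way to cover both systems with the same split is to keep the CPU index per OS and use a list slice. This is a sketch; the `%cpu_index` table and the OS keys are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical mapping from OS name to the 0-based index of the CPU field.
my %cpu_index = (sunos => 10, linux => 9);

my %sample = (
    sunos => 'process,10050,user1,218,59,0,1271M,1260M,sleep,58.9H,0.02%,java',
    linux => 'process,26190,user1,20,0,1236m,43m,6436,S,0.0,1.1,0:00.00,java,',
);

my %cpu;
for my $os (sort keys %sample) {
    # List slice: split the line and keep only the CPU column for this OS.
    $cpu{$os} = (split /,/, $sample{$os})[ $cpu_index{$os} ];
    print "$os: $cpu{$os}\n";
}
```

The SunOS value still carries its % sign (`0.02%`); it can be stripped afterwards.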

ghostdog74
A: 

I know nothing about Linux, so just ignore the code if it looks too naive :)

 /^process.*(?<=[A-Z],)((?:\d+)\.(?:\d+)).*java,?$/;
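A quick check of this lookbehind pattern against both sample lines. Note the `,?` before `$`: it is assumed here to allow the trailing comma on the Linux line, which a bare `java$` would reject.

```perl
use strict;
use warnings;

# Lookbehind version; ",?" lets the optional trailing comma after "java" through.
my $re = qr/^process.*(?<=[A-Z],)((?:\d+)\.(?:\d+)).*java,?$/;

my @lines = (
    'process,10050,user1,218,59,0,1271M,1260M,sleep,58.9H,0.02%,java',
    'process,26190,user1,20,0,1236m,43m,6436,S,0.0,1.1,0:00.00,java,',
);

my @cpu;
for my $line (@lines) {
    # The greedy ".*" backtracks to the last number preceded by an uppercase letter and a comma.
    push @cpu, $line =~ $re ? $1 : 'no match';
}
print "$_\n" for @cpu;   # 0.02, then 0.0
```

This relies on the field before CPU ending in an uppercase letter (`58.9H` on SunOS, the `S` state on Linux), so it is more fragile than counting fields.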
Mike
Why the downvote? Can someone explain? You can just ignore the code if you think it looks too naive, or you can offer your better version, but please explain :)
Mike
+1  A: 

Use Text::CSV_XS to work with comma-separated values. If you're asking the question, you shouldn't be trying to handle it yourself. The module is extremely optimized and you won't be able to do a better job on your own.

Once you extract the right position, you can strip off the % sign if it's there.
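A minimal sketch of the % stripping step, applied to the two CPU values from the question; it removes a single trailing % only when present:

```perl
use strict;
use warnings;

# CPU values as they come out of the CPU column on SunOS and Linux.
my @raw = ('0.02%', '0.0');

# Non-destructive substitution (/r, Perl 5.14+): drop a trailing % if there is one.
my @cpu = map { s/%\z//r } @raw;

print "@cpu\n";   # 0.02 0.0
```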

brian d foy
What advantage does this module have over plain `split /,/`?
Nathan Fellman
It's extremely fast, which is why I said "extremely optimized". Read its docs. It also handles CSV correctly, which split doesn't.
brian d foy
@brian, why do people always recommend the use of this or that module? Isn't that overkill? Doesn't it look irrelevant to the question? For the OP's question, if splitting is allowed (though the OP specified he wanted a regex pattern), a one-liner like `(split /,/, $input)[10]` is enough.
Mike
People always recommend that module because it's correct and very fast. That people keep recommending it should be a big, flashing sign. :)
brian d foy
@brian, I remember you once said something like we shouldn't pay too much attention to micro-optimization. Although the module you're recommending may be very fast, isn't that also some kind of micro-optimization that we shouldn't pay too much attention to?
Mike
@brian, btw, I really like your co-authored book Learning Perl (5th edition) and now I've started reading Intermediate Perl. They're very interesting books :)
Mike
You shouldn't pay attention to micro-optimization when you're the one who has to do it and you haven't handled the big problems yet. When someone else has already done it, you should use it.
brian d foy
Also, to emphasize brian's other point, "It also handles CSV correctly, which split doesn't". That's probably even a bigger benefit than optimization.
DVK
A: 

Do you need to recognize the line as well, or only extract the value, i.e. do you expect some unrelated lines? If not, the answer is `/^(?:[^,]+,){9}([^,]+)/`; this will extract the tenth field from a comma-separated list.
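Applied to the Linux sample line, the field-counting pattern might look like this (a sketch; it assumes no field before the CPU column is empty, since `[^,]+` needs at least one character):

```perl
use strict;
use warnings;

my $line = 'process,26190,user1,20,0,1236m,43m,6436,S,0.0,1.1,0:00.00,java,';

# Skip nine "field," groups from the start of the line, then capture the tenth field.
my ($cpu) = $line =~ /^(?:[^,]+,){9}([^,]+)/;
print "$cpu\n";   # 0.0
```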

Giant Hare
Thanks, gianthare: I have to recognize the line as well, which I am doing with ^process and java$.
Space
Then change it to `/^process,(?:[^,]+,){8}([^,]+)(?:[^,]+,)*java$/`
Giant Hare
Hi gianthare: I have tried this, but I am still not getting the required output. Any suggestions, please?
Space
I forgot a comma after 'java', and a comma after the CPU field: `/^process,(?:[^,]+,){8}([^,]+),(?:[^,]+,)*java,$/`. It should work now.
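A quick way to check the anchored pattern: it captures the CPU field from the Linux line and, as a side effect of requiring `java,` at the end, rejects the SunOS line (which has no trailing comma):

```perl
use strict;
use warnings;

# Anchored on "process," at the start and "java," at the end; capture the 10th field.
my $re = qr/^process,(?:[^,]+,){8}([^,]+),(?:[^,]+,)*java,$/;

my @lines = (
    'process,26190,user1,20,0,1236m,43m,6436,S,0.0,1.1,0:00.00,java,',
    'process,10050,user1,218,59,0,1271M,1260M,sleep,58.9H,0.02%,java',
);

my @cpu;
for my $line (@lines) {
    push @cpu, $line =~ $re ? $1 : 'no match';
}
print "$_\n" for @cpu;   # 0.0, then "no match"
```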
Giant Hare