ansaurus

Question

How to extract a text part by regexp in linux shell?

Answer 1

A:

I'd suggest perl. (\d+\.\d+\.\d+\.\d+) should probably do the trick.

EDIT: Just to make it more like a complete program, you could do something like the following (not tested):

#!/usr/bin/perl -w
use strict;

while (<>) {
    if (/(\d+\.\d+\.\d+\.\d+)/) {
        print "$1\n";
    }
}

This handles one IP per line. If a line can contain more than one IP, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.

PolyThinker 2009-01-09 13:08:26

I do not know perl well... Could you write a full command?

softly.lt 2009-01-09 13:11:34

You could always write that on one line and run it with perl -ne, but IMHO a small script like this is easier to handle, especially if you want to keep it and modify it later for other patterns.

PolyThinker 2009-01-09 13:24:16

Answer 2

+4 A:

You could use grep to pull them out.

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt
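To see what -o does, here is a small sketch that feeds the same pattern some made-up input (the wrapper name extract_ips and the sample lines are only for illustration):

```shell
# Hypothetical wrapper around the command above; reads the given files, or stdin if none.
extract_ips() {
    grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "$@"
}

# The first line contains two dotted quads, the second contains none.
printf '%s\n' 'eth0 inet 192.168.1.10 mask 255.255.255.0' 'lo up' | extract_ips
```

Each match is printed on its own line, so the first input line yields two output lines and the second yields nothing.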

brien 2009-01-09 13:11:27

This won't extract the interesting part of the lines, just the whole lines that have ip addresses.

Avi 2009-01-09 13:13:04

You need to add the -o option. Then it works.

Ben 2009-01-09 13:14:16

Thanks, Ben, I edited to correct it.

brien 2009-01-09 13:15:01

Thanks, -o was what I needed... I overlooked it in the manual.

softly.lt 2009-01-09 13:18:34

The right tool for the job. Sure, lots of unix tools can do this, but grep is clearly built for it. Gotta love unix!

PEZ 2009-01-09 13:19:57

Answer 3

+2 A:

You can use sed. But if you know perl, that might be easier, and more useful to know in the long run:

perl -ne '/(\d+\.\d+\.\d+\.\d+)/ && print "$1\n"' < file

Avi 2009-01-09 13:14:00

Answer 4

A:

You could use awk, as well. Something like ...

awk '{ for (i = 1; i <= NF; i++) if ($i ~ /regexp/) print $i }' file

This may need cleaning up; it's just a quick and dirty response to show basically how to do it with awk.
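For a concrete pattern in place of /regexp/, something like the following sketch might do (awk_ips is a made-up name; [0-9]+ is used instead of {1,3} on the assumption that not every awk supports interval expressions):

```shell
# Hypothetical helper: print whitespace-separated fields that consist of a dotted quad.
# The ^ and $ anchors require the whole field to match, not just part of it.
awk_ips() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print $i }' "$@"
}

# Only the standalone field matches; "addr=10.1.2.4" is one field and is skipped.
printf '%s\n' 'host 10.1.2.3 port 80' 'addr=10.1.2.4' | awk_ips
```

Because awk splits on whitespace, an address glued to other text is missed. That is one reason the grep -o approach tends to be simpler for this job.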

2009-01-09 13:28:33

Still, the one with grep looks the most elegant and easy.

softly.lt 2009-01-09 13:33:25

Oh, agreed. Just thought it would be useful to show a variety of methods in case someone wanted to know, specifically, how to do it with awk.

2009-01-09 14:03:03

Answer 5

A:

I usually start with grep, to get the regexp right.

# [multiple failed attempts here]
grep    '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'                 file  # good?
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file  # good enough

Then I'd try and convert it to sed to filter out the rest of the line. (After reading this thread, you and I aren't going to do that anymore: we're going to use grep -o instead)

sed -ne 's/.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*/\1/p' file  # FAIL

That's when I usually get annoyed with sed for not using the same regexes as everyone else. So I move to perl.

$ perl -nle '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ and print $&'

Perl's good to know in any case. If you've got a teeny bit of CPAN installed, you can even make it more reliable at little cost:

$ perl -MRegexp::Common=net -nE '/$RE{net}{IPv4}/ and say $&' file(s)
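For what it's worth, the sed attempt above fails for two reasons: in sed's default (basic) regex syntax the interval has to be written \{1,3\}, and the leading greedy .* swallows the first digits of the address. A possible workaround, assuming a sed that accepts -E for extended regexes (GNU and BSD sed do), is sketched below; sed_last_ip is a made-up name:

```shell
# Hypothetical helper: print the last dotted quad on each line that has one.
# Group 1 forces the address to start at the beginning of the line or right after
# a non-digit, so the greedy prefix cannot eat its leading digits.
sed_last_ip() {
    sed -nE 's/(^|.*[^0-9])([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\2/p' "$@"
}

printf '%s\n' 'addr 192.168.1.10 up' '10.0.0.1 down' 'nothing here' | sed_last_ip
```

This yields at most one address per line, which is another point in favour of grep -o.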

JB 2009-01-09 13:35:48

Answer 6

+5 A:

Most of the examples here will match on 999.999.999.999 which is not technically a valid IP address.

The following will match on only valid IP addresses (including network and broadcast addresses).

grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' file.txt

Omit the -o if you want to see the entire line that matched.
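One caveat: with -o alone the pattern can still match inside a longer run of digits, so 256.1.1.1 would produce 56.1.1.1. A possible refinement, sketched here with made-up names (OCTET, valid_ips) and assuming a grep that supports -w, is to require whole-word matches:

```shell
# One octet in the range 0-255, same alternation as in the command above.
OCTET='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Hypothetical helper: -w rejects matches that have a digit, letter or underscore
# directly before or after them.
valid_ips() {
    grep -E -o -w "$OCTET\.$OCTET\.$OCTET\.$OCTET" "$@"
}

# Only the first line contains a valid address.
printf '%s\n' 'ok 192.168.0.1' 'bad 256.1.1.1' 'bad 999.999.999.999' 'edge 10.0.0.256' | valid_ips
```

With the sample input only 192.168.0.1 should be printed.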

sjbotha 2009-01-09 13:46:53
