How to extract a text part by regexp in a Linux shell? Let's say I have a file in which every line contains an IP address, but at a different position. What is the simplest way to extract those IP addresses using common Unix command-line tools?

A: 

I'd suggest perl. (\d+\.\d+\.\d+\.\d+) should probably do the trick.

EDIT: Just to make it more like a complete program, you could do something like the following (not tested):

#!/usr/bin/perl -w
use strict;

while (<>) {
    if (/(\d+\.\d+\.\d+\.\d+)/) {
        print "$1\n";
    }
}

This handles one IP per line. If a line can contain more than one IP, you need the /g option and a loop over the matches (while instead of if). man perlretut gives a more detailed tutorial on regular expressions.

PolyThinker
I do not know perl well... Could you write a full command?
softly.lt
You could always write that in one line and use perl -n or perl -e to execute it but IMHO a small script like this is easier to handle, especially if you want to keep it and modify it later for other patterns.
PolyThinker
+4  A: 

You could use grep to pull them out.

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt
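
To see what -o changes, here is a small sketch on made-up input. It uses the extended-regex form (-E) with a repeated group, which should be equivalent to the pattern above; note that -o is supported by GNU and BSD grep but is not required by POSIX.

```shell
# Hypothetical sample input; two addresses share the first line.
sample='gw 10.0.0.1 via 192.168.1.254
no address on this line
host=172.16.5.9'

# Without -o, grep prints each matching line in full.
printf '%s\n' "$sample" | grep -E '([0-9]{1,3}\.){3}[0-9]{1,3}'

# With -o, it prints only the matched text, one match per output line.
printf '%s\n' "$sample" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
```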
brien
This won't extract the interesting part of the lines, just the whole lines that have ip addresses.
Avi
You need to add the -o option. Then it works.
Ben
Thanks, Ben, I edited to correct it.
brien
Thanks, -o was what I needed... I overlooked this in the manual.
softly.lt
The right tool for the job. Sure lots of unix tools can do this job, but grep is clearly built for it. Gotta love unix!
PEZ
+2  A: 

You can use sed. But if you know perl, that might be easier, and more useful to know in the long run:

perl -ne '/(\d+\.\d+\.\d+\.\d+)/ && print "$1\n"' < file
Avi
A: 

You could use awk, as well. Something like ...

awk '{ for (i = 1; i <= NF; i++) if ($i ~ /regexp/) print $i }' file

This may need cleaning; it's just a quick and dirty response to show basically how to do it with awk.
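
The loop above prints the whole whitespace-separated field that matches, so an address glued to other text (host=10.0.0.1, for instance) comes out with that text attached. A minimal sketch that prints only the matched part, using awk's match() with RSTART and RLENGTH. The helper name is made up, and the pattern uses + instead of {1,3} on the assumption that some older awk implementations lack interval expressions.

```shell
# extract_ips_awk is a hypothetical helper name.
extract_ips_awk() {
    awk '{
        line = $0
        # match() sets RSTART and RLENGTH to the position and length of
        # the leftmost match; loop to pick up every match on the line.
        while (match(line, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

# Made-up sample input.
printf '%s\n' 'src=10.0.0.1, dst=10.0.0.2' 'nothing to see' | extract_ips_awk
```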

Still, the one with grep looks the most elegant and easy.
softly.lt
Oh, agreed. Just thought it would be useful to show a variety of methods in case someone wanted to know, specifically, how to do it with awk.
A: 

I usually start with grep, to get the regexp right.

# [multiple failed attempts here]
grep    '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'                 file  # good?
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file  # good enough

Then I'd try and convert it to sed to filter out the rest of the line. (After reading this thread, you and I aren't going to do that anymore: we're going to use grep -o instead)

sed -ne 's/.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*/\1/p' file  # FAIL
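
For what it's worth, that attempt has two problems: in sed's default basic regex syntax the braces need backslashes (\{1,3\}) or the -E switch, and the leading .* is greedy, so it swallows the first digits of the address. A sketch that deals with both, assuming a sed that supports -E (GNU and BSD sed do). The helper name is made up, and it still prints at most one address per line.

```shell
# sed_extract_ip is a hypothetical helper name.
sed_extract_ip() {
    # -E selects extended regexes, so {1,3} works unescaped.
    # (^|.*[^0-9]) forces the text before the address to end at a
    # non-digit, so the greedy .* cannot eat the first octet's digits.
    sed -nE 's/(^|.*[^0-9])([0-9]{1,3}(\.[0-9]{1,3}){3}).*/\2/p' "$@"
}

# Made-up sample input.
printf '%s\n' 'src 192.168.1.5 end' '10.0.0.1 first' 'no address' | sed_extract_ip
```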

That's when I usually get annoyed with sed for not using the same regexes as everyone else. So I move to perl.

$ perl -nle '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ and print $&'

Perl's good to know in any case. If you've got a teeny bit of CPAN installed, you can even make it more reliable at little cost:

$ perl -MRegexp::Common=net -nE '/$RE{net}{IPv4}/ and say $&' file(s)
JB
+5  A: 

Most of the examples here will match on 999.999.999.999 which is not technically a valid IP address.

The following will match on only valid IP addresses (including network and broadcast addresses).

grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' file.txt

Omit the -o if you want to see the entire line that matched.
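
One caveat with an unanchored pattern like this: on input such as 1.1.1.256, grep -o prints the substring 1.1.1.25 instead of rejecting the address. Here is a sketch of an alternative, which is a different technique from the regex above: pull out loose candidates first, then check each octet numerically with awk. The helper name is made up, and leading zeros (010 read as 10) are accepted.

```shell
# valid_ipv4_only is a hypothetical helper name.
valid_ipv4_only() {
    # Step 1: extract anything shaped like four dot-separated numbers.
    # Step 2: split each candidate on "." and keep it only if every
    # octet is at most 255 (an awk pattern with no action prints the line).
    grep -oE '[0-9]+(\.[0-9]+){3}' "$@" |
        awk -F. '$1 <= 255 && $2 <= 255 && $3 <= 255 && $4 <= 255'
}

# Made-up sample input.
printf '%s\n' 'ok 10.0.0.1' 'bad 1.1.1.256' 'bad 999.999.999.999' \
    'edge 255.255.255.255 0.0.0.0' | valid_ipv4_only
```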

sjbotha