views:

181

answers:

4

I want to use grep to find all of the headers in a corpus. I want to match everything up to the ':' and ignore everything after it. Does anyone know how to do that? (Could I get a complete line of code?)

+3  A: 

Use sed or awk.

A sed example:

sed -e '/^[^:]*$/d' -e 's/^\([^:]*\):.*/\1/' filename
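An awk version of the same idea might look like the sketch below. The sample input is made up for illustration, and this assumes a header is simply any line that contains a colon.

```shell
# Hypothetical sample input: two header lines, a blank line, then a body line.
sample=$(printf 'From: alice@example.com\nDate: Mon, 12:30\n\nbody text without a colon\n')

# -F: makes the colon the field separator. NF > 1 is true only for lines
# that contain at least one colon, and $1 is the text before the first one.
headers=$(printf '%s\n' "$sample" | awk -F: 'NF > 1 { print $1 }')

printf '%s\n' "$headers"
```

Because awk splits on every colon, $1 stops at the first one, so a value such as "Mon, 12:30" does not leak into the output.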
Jeremy Stein
+1  A: 

If all you want to do is display the first portion of the matched line, then you can say:

grep your_pattern | cut -d: -f 1

But if you do not want to match against the data after the colon, you need a different approach. Many tools can do this: sed, awk, perl, python, etc. For instance, the Perl code would look something like this:

perl -nle '($s) = split /:/; print $s if $s =~ /your_pattern/'

or the longer script version:

#!/usr/bin/perl

use strict;
use warnings;

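while (my $line = <>) {
    chomp $line;
    # List context is required: in scalar context split returns the field count.
    my ($substring) = split /:/, $line;
    next unless defined $substring;    # skip empty lines
    if ($substring =~ /your_pattern/) {
        print "$substring\n";
    }
}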
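
To make the difference between the two behaviours concrete, here is a small shell sketch. The sample data and the pattern "Subject" are made up. It also suggests that reversing the pipeline (cut first, then grep) may be enough in simple cases, assuming you only care about the text before the first colon.

```shell
# Hypothetical sample: "Subject" appears before the colon on the first line
# and only after the colon on the second.
sample=$(printf 'Subject: hello\nX-Note: see Subject line\n')

# Match anywhere on the line, then keep the part before the first colon.
match_whole_line=$(printf '%s\n' "$sample" | grep 'Subject' | cut -d: -f1)

# Keep the part before the first colon first, then match only against that.
match_header_only=$(printf '%s\n' "$sample" | cut -d: -f1 | grep 'Subject')

printf 'grep then cut:\n%s\n' "$match_whole_line"
printf 'cut then grep:\n%s\n' "$match_header_only"
```

The first pipeline reports both lines, because grep sees the text after the colon. The second reports only the line whose header name matches.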
Chas. Owens
+1  A: 

(I'm not sure I fully understand your question)

You can use 'grep' and 'cut' together. One solution (albeit far from perfect) would be:

$ grep ':' file | cut -d ':' -f 1

atrent
A: 

sed -n '/^$/q;/:/{s/:.*/:/;p;}'

This quits at the first blank line, so it stops once all the headers have been processed.
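
A quick way to check the early exit, using made-up sample input in which the body also contains a colon:

```shell
# Hypothetical sample: two headers, a blank line, then a body line with a colon.
sample=$(printf 'From: alice@example.com\nSubject: hi\n\nbody: not a header\n')

# /^$/q quits at the first empty line; before that, any line containing
# a colon is cut down to "name:" and printed.
headers=$(printf '%s\n' "$sample" | sed -n '/^$/q;/:/{s/:.*/:/;p;}')

printf '%s\n' "$headers"
```

The "body:" line never shows up, because sed has already quit at the blank line above it.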

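Edit: a slightly improved version, which only accepts lines that start with a header name containing no whitespace or colon:

sed -n '/^$/q;/^[^[:space:]:]\{1,\}:/{s/:.*/:/;p;}'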
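
As a sketch of what the stricter pattern buys you, the made-up input below has a folded continuation line that starts with a tab and contains a colon. The command is written in basic-regex syntax, with the interval braces escaped and a POSIX [:space:] class standing in for the space and tab.

```shell
# Hypothetical sample: a header, its tab-indented continuation line,
# another header, a blank line, then a body line.
sample=$(printf 'Received: from a\n\tby b: id 1\nSubject: hi\n\nbody: x\n')

# Only lines that begin with one or more non-space, non-colon characters
# followed by a colon are treated as headers.
headers=$(printf '%s\n' "$sample" | sed -n '/^$/q;/^[^[:space:]:]\{1,\}:/{s/:.*/:/;p;}')

printf '%s\n' "$headers"
```

The continuation line is skipped because it starts with whitespace. The simpler /:/ test would have printed it as well.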

mitchnull