I want to use grep to find all of the headers in a corpus. I want to match everything up to the : and ignore everything after it. Does anyone know how to do that? (Could I get a complete line of code?)
+3
A:
Use sed or awk.
A sed example:
sed -e '/^[^:]*$/d' -e 's/\([^:]*\):.*/\1/' filename
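To see why the pattern should stop at the first colon, here is a small sketch run on made-up input (the sample lines and the `header_names` function name are illustrative assumptions, not from the question). It uses `s/:.*//`, which cuts at the first colon; a greedy `\(.*\):` would instead cut at the last colon and leave part of a time value such as `13:41:27` in the output.

```shell
# Hypothetical sample input: two header lines and one line without a colon.
sample='From: alice@example.com
Date: Thu, 2 Jul 2009 13:41:27
no colon on this line'

# Delete lines that contain no colon, then strip everything
# from the first colon to the end of the line.
header_names() {
  sed -e '/^[^:]*$/d' -e 's/:.*//'
}

printf '%s\n' "$sample" | header_names
# prints:
# From
# Date
```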
Jeremy Stein
2009-07-02 13:41:27
+1
A:
If all you want to do is display the first portion of the matched line then you can say
grep your_pattern | cut -d: -f 1
but if you want to not match against data after the colon, you need a different tool. There are many tools available: sed, awk, perl, python, etc. For instance, the Perl code would look something like this:
perl -nle '($s) = split /:/; print $s if $s =~ /your_pattern/'
or the longer script version:
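#!/usr/bin/perl
use strict;
use warnings;
while (my $line = <>) {
    chomp $line;
    # List context is required: in scalar context split returns a field count.
    my ($substring) = split /:/, $line;
    next unless defined $substring;
    if ($substring =~ /your_pattern/) {
        print "$substring\n";
    }
}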
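For comparison, roughly the same filter can be sketched in awk, one of the other tools mentioned above. This is a minimal sketch, not a drop-in replacement: `your_pattern` is the same placeholder as in the Perl code, `match_before_colon` is a hypothetical wrapper name, and lines without any colon are skipped here via `NF > 1`, which the Perl version does not do.

```shell
match_before_colon() {
  # -F: splits each line on colons, so $1 is the text before the first colon.
  # NF > 1 keeps only lines that actually contain a colon.
  awk -F: 'NF > 1 && $1 ~ /your_pattern/ { print $1 }' "$@"
}

# Made-up input: only the first line has the pattern before the colon.
printf '%s\n' 'your_pattern-id: 1' 'other: your_pattern' | match_before_colon
# prints: your_pattern-id
```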
Chas. Owens
2009-07-02 13:50:01
+1
A:
(I'm not sure I fully understand your question.)
You can combine 'grep' and 'cut'. One solution (albeit far from perfect) would be:
$ grep ':' file | cut -f 1 -d ':'
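If your grep supports the `-o` option (GNU and BSD grep do, but it is not part of POSIX), the `cut` step can be dropped, because `-o` prints only the part of each line that matched. A minimal sketch, with made-up sample lines and a hypothetical function name `header_prefixes`; note that the colon itself is kept in the output.

```shell
header_prefixes() {
  # ^[^:]*: matches from the start of the line up to and
  # including the first colon; -o prints only that match.
  grep -o '^[^:]*:' "$@"
}

printf '%s\n' 'To: bob@example.com' 'plain text' 'Date: Thu, 2 Jul 2009' | header_prefixes
# prints:
# To:
# Date:
```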
atrent
2009-07-02 13:51:06
A:
sed -n '/^$/q;/:/{s/:.*/:/;p;}'
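This stops at the first blank line, that is, once all the headers have been processed.
Edit: a slightly improved version, which only accepts header names that contain no spaces or tabs:
sed -n '/^$/q;/^[^ :\t]\{1,\}:/{s/:.*/:/;p;}'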
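To make the behavior concrete, here is a sketch of the same idea run against a made-up message (the sample text and the `print_header_names` name are assumptions for illustration). It spells the character class as `[^[:blank:]:]`, the POSIX way to exclude spaces, tabs, and colons, and writes the sed script across several lines for readability.

```shell
print_header_names() {
  sed -n '
    # Quit at the first empty line: the header block is over.
    /^$/q
    # A header name is one or more non-blank, non-colon characters
    # at the start of the line, followed by a colon.
    /^[^[:blank:]:]\{1,\}:/{
      s/:.*/:/
      p
    }
  ' "$@"
}

# Two headers, a folded continuation line, a blank line, then the body.
printf '%s\n' 'Subject: hello' 'X-Mailer: test' ' folded: continuation' '' 'Body: not a header' |
  print_header_names
# prints:
# Subject:
# X-Mailer:
```

The continuation line and the body line both contain colons, but neither is printed: the first starts with a space, and the second comes after the blank line where sed has already quit.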
mitchnull
2009-07-02 14:11:48