I don't know how to do this in grep. I checked the man page for egrep(1)
and it can't match with a newline in the middle either.
I like the solution @Laurence Gonsalves suggested, of using tr(1)
to wipe out the newlines. But as he noted, it will be a pain to print the matching lines if you do it that way.
If you want to match despite a newline and then print the matching line(s), I can't think of a way to do it with grep, but it would be not too hard in any of Python, AWK, Perl, or Ruby.
Here's a Python script that solves the problem. I decided that, for lines that only match when joined to the previous line, I would print a -->
arrow before the second line of the match. Lines that match outright are always printed without the arrow.
This is written assuming that /usr/bin/python is Python 2.x. You can trivially change the script to work under Python 3.x if desired.
#!/usr/bin/python
import re
import sys
s_pat = "export\s+to\s+excel"
pat = re.compile(s_pat)
def print_ete(fname):
try:
f = open(fname, "rt")
except IOError:
sys.stderr.write('print_ete: unable to open file "%s"\n' % fname)
sys.exit(2)
prev_line = ""
i_last = -10
for i, line in enumerate(f):
# is ete within current line?
if pat.search(line):
print "%s:%d: %s" % (fname, i+1, line.strip())
i_last = i
else:
# construct extended line that included previous
# note newline is stripped
s = prev_line.strip("\n") + " " + line
# is ete within extended line?
if pat.search(s):
# matched ete in extended so want both lines printed
# did we print prev line?
if not i_last == (i - 1):
# no so print it now
print "%s:%d: %s" % (fname, i, prev_line.strip())
# print cur line with special marker
print "--> %s:%d: %s" % (fname, i+1, line.strip())
i_last = i
# make sure we don't match ete twice
prev_line = re.sub(pat, "", line)
try:
if sys.argv[1] in ("-h", "--help"):
raise IndexError # print help
except IndexError:
sys.stderr.write("print_ete <filename>\n")
sys.stderr.write('grep-like tool to print lines matching "%s"\n' %
"export to excel")
sys.exit(1)
print_ete(sys.argv[1])
EDIT: added comments.
I went to some trouble to make it print the correct line number on each line, using a format similar to what you would get with grep -Hn
.
It could be much shorter and simpler if you don't need line numbers, and you don't mind reading in the whole file at once into memory:
#!/usr/bin/python
import re
import sys
# This pattern not compiled with re.MULTILINE on purpose.
# We *want* the \s pattern to match a newline here so it can
# match across multiple lines.
# Note the match group that gathers text around ete pattern uses a character
# class that matches anything but "\n", to grab text around ete.
s_pat = "([^\n]*export\s+to\s+excel[^\n]*)"
pat = re.compile(s_pat)
def print_ete(fname):
try:
text = open(fname, "rt").read()
except IOError:
sys.stderr.write('print_ete: unable to open file "%s"\n' % fname)
sys.exit(2)
for s_match in re.findall(pat, text):
print s_match
try:
if sys.argv[1] in ("-h", "--help"):
raise IndexError # print help
except IndexError:
sys.stderr.write("print_ete <filename>\n")
sys.stderr.write('grep-like tool to print lines matching "%s"\n' %
"export to excel")
sys.exit(1)
print_ete(sys.argv[1])