Let's say a directory has two files. Here are their contents:

File1.txt

tagstart random string tagend

tagstart random string tagend

File2.txt

tagstart random string tagend

tagstart random string tagend

I want to grep the directory and extract the lines that match the following pattern:

tagstart <any string> tagend

I also want to redirect the output to another file. Basically, the grep command should produce an output file like this:

out.txt

tagstart random string tagend

tagstart random string tagend

tagstart random string tagend

tagstart random string tagend
A: 
grep 'tagstart random string tagend' file1.txt file2.txt > out.txt
jim mcnamara
Jim, my question was not clear previously. I edited the question. The pattern I am looking for is any string that can occur between the tags
rakeshr
+1  A: 

file1.txt:

# This is the file nr.1
tagstart 123 tagend
tagstart abc tagend
kill tagstart def tagend kenny

file2.txt:

# This is the file nr.2
tagstart 123 tagend
tagstart abc tagend
kill tagstart xxx tagend kenny

This command will extract the tags and their enclosed strings:

 cat file1.txt file2.txt | grep -o -P "tagstart(.*?)tagend" > output.txt

output.txt:

tagstart 123 tagend
tagstart abc tagend
tagstart def tagend
tagstart 123 tagend
tagstart abc tagend
tagstart xxx tagend
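
Since the question asks about a whole directory, and a line may hold more than one tag pair, here is a minimal sketch of a directory-wide variant. It assumes GNU grep built with PCRE support (-P), because POSIX extended regexes (-E) have no non-greedy quantifier. The directory, file names, and the extract_tags helper are made up for illustration.

```shell
# Hypothetical demo directory; the file names and contents are made up.
demo_dir=$(mktemp -d)
printf 'noise tagstart a tagend mid tagstart b tagend tail\n' > "$demo_dir/file1.txt"
printf '# comment\ntagstart c tagend\n' > "$demo_dir/file2.txt"

# -r: recurse into the directory, -h: drop the file name prefixes,
# -o: print only the matched text, -P: Perl regex, so .*? is non-greedy.
# Assumes GNU grep with PCRE support.
extract_tags() {
    grep -r -h -o -P 'tagstart.*?tagend' "$1"
}

# Write the result outside the searched directory so grep does not read it back.
# The order in which grep -r visits files is not guaranteed, hence the sort.
extract_tags "$demo_dir" | sort > "$demo_dir.out"
cat "$demo_dir.out"
```

With a greedy match, the first line of file1.txt would come out as one long string running from the first tagstart to the last tagend.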

Extra cookie for your pleasure:

This command does something similar, but prints only the sorted unique records together with their occurrence counts (for statistics purposes):

 cat file1.txt file2.txt | grep -o -P "tagstart(.*?)tagend" | sort | uniq -c | \
 awk '{print $2" "$3" "$4" : "$1}' > output.txt

output.txt:

tagstart 123 tagend : 2
tagstart abc tagend : 2
tagstart def tagend : 1
tagstart xxx tagend : 1
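
The awk step above prints exactly three fields, so it only suits enclosed strings that are a single word. A rough sketch of a variant that moves the count to the end of the line whatever the string contains (count_tags is a hypothetical helper name, and the input lines are made up):

```shell
# Reads already-extracted tag strings on stdin, one per line.
# uniq -c prefixes each distinct line with its count; sed moves that
# count to the end without assuming a fixed number of words.
count_tags() {
    sort | uniq -c | sed -E 's/^ *([0-9]+) (.*)$/\2 : \1/'
}

printf '%s\n' \
    'tagstart two words tagend' \
    'tagstart abc tagend' \
    'tagstart two words tagend' | count_tags
```

Sorting after extraction, not before, keeps identical tag strings next to each other, which uniq -c needs in order to count them correctly.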
Andrejs Cainikovs
the log file may have other unwanted information too. Let me be more specific. I am looking for a regex that will extract any string between the tags
rakeshr
I've updated my answer.
Andrejs Cainikovs
Thanks Andrejs. One slight twist to it: what if file1.txt has this: "some nonsense before start tagstart xyz tagend nonsense after end"? I still want to extract only 'tagstart xyz tagend'.
rakeshr
I've updated my answer once more. Yes, it will work :-)
Andrejs Cainikovs
thanks. Perfect
rakeshr
A: 

Regexes are rarely a good way to parse XML. Have you thought about situations like tagstart one tagstart two tagend one tagend?

tagstart one tagstart two tagend one tagend
or
tagstart one tagstart two tagend
or
tagstart two tagend
or
tagstart two tagend one tagend
all satisfy your criteria. Which of these do you want?
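
To make the difference concrete, here is a small sketch of what three different patterns pick out of that line. It assumes GNU grep with PCRE support (-P); the variable names are only for illustration.

```shell
line='tagstart one tagstart two tagend one tagend'

# Greedy: from the first tagstart to the last tagend on the line.
greedy=$(printf '%s\n' "$line" | grep -o -P 'tagstart.*tagend')

# Lazy: from the first tagstart to the nearest following tagend.
lazy=$(printf '%s\n' "$line" | grep -o -P 'tagstart.*?tagend')

# Innermost: the negative lookahead forbids another tagstart between the tags.
inner=$(printf '%s\n' "$line" | grep -o -P 'tagstart(?:(?!tagstart).)*?tagend')

printf 'greedy: %s\nlazy:   %s\ninner:  %s\n' "$greedy" "$lazy" "$inner"
```

The greedy pattern returns the whole line, the lazy one returns "tagstart one tagstart two tagend", and the lookahead version returns "tagstart two tagend". None of them extracts "tagstart two tagend one tagend" on its own.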

Larry Wang
thanks kaestur. My log files don't have such tag structures. I am looking for a regex that will extract any string between the tags
rakeshr