I have the following data:

a b c d FROM:<uniquepattern1>
e f g h TO:<uniquepattern2>
i j k l FROM:<uniquepattern1>
m n o p TO:<uniquepattern3>
q r s t FROM:<uniquepattern4>
u v w x TO:<uniquepattern5>

I would like a regex that finds the contents of TO: on the line following each FROM:<uniquepattern1>, so the results would be uniquepattern2 and uniquepattern3.

I am hopeless with regex, so I would appreciate any pointers on how to write this (lookahead, perhaps?) and on any relevant differences between regex implementations (e.g. the C# .NET Regex class versus grep versus Perl).

Thank you.

+2  A: 

Try:

/FROM:<uniquepattern1>.*\r?\n.*?TO:<(.*?)>/

This works by first matching the FROM anchor and then using a dot wildcard. By default the dot does not match a newline, so .* consumes only the rest of that line. After the line break (\r?\n), a non-greedy dot wildcard consumes up to the next TO, and the group captures what is between the angle brackets.
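To see the pattern in action, here is a minimal sketch in Python (chosen only for illustration; the question did not mention it). It assumes the sample data from the question is held in a string named data, which is a made-up name for this example.

```python
import re

# Sample data copied from the question; "data" is a hypothetical variable name.
data = """a b c d FROM:<uniquepattern1>
e f g h TO:<uniquepattern2>
i j k l FROM:<uniquepattern1>
m n o p TO:<uniquepattern3>
q r s t FROM:<uniquepattern4>
u v w x TO:<uniquepattern5>
"""

# Same pattern as above: match the FROM anchor, consume the rest of the line,
# cross one line break, then lazily skip ahead to TO:< and capture up to ">".
pattern = re.compile(r"FROM:<uniquepattern1>.*\r?\n.*?TO:<(.*?)>")

# findall returns the contents of the single capture group for each match.
matches = pattern.findall(data)
print(matches)  # ['uniquepattern2', 'uniquepattern3']
```

Because the dot does not cross line breaks here, the match can only span the FROM line and the line directly after it.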

cletus
Thank you for the response.
taspeotis
+1  A: 

Your file-parsing requirement is simple, so there is no need for a regular expression. Open the file for reading and go through it line by line; when a line contains FROM:<uniquepattern1>, get the next line and print it. Furthermore, the value on each TO line is separated from the rest of the line only by ":", so you can use that as the field delimiter.

E.g. with awk:

$ awk -F":" '/FROM:<uniquepattern1>/{getline;print $2}' file
<uniquepattern2>
<uniquepattern3>

The same goes for other languages/tools.
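As one illustration of the same line-by-line idea in another language, here is a rough Python sketch. The function name to_values_after and its marker parameter are invented for this example, and it assumes, as in the sample data, that the TO line comes directly after the FROM line.

```python
def to_values_after(lines, marker="FROM:<uniquepattern1>"):
    """Return the text after the first ':' on each line that follows a marker line."""
    results = []
    it = iter(lines)
    for line in it:
        if marker in line:
            # Pull the next line from the same iterator, like awk's getline.
            nxt = next(it, None)
            if nxt is None:
                break  # marker was on the last line; nothing follows it
            # Split on the first ':' only, mirroring awk -F":" and $2.
            _, _, value = nxt.rstrip("\r\n").partition(":")
            results.append(value)
    return results


sample = [
    "a b c d FROM:<uniquepattern1>",
    "e f g h TO:<uniquepattern2>",
    "i j k l FROM:<uniquepattern1>",
    "m n o p TO:<uniquepattern3>",
    "q r s t FROM:<uniquepattern4>",
    "u v w x TO:<uniquepattern5>",
]
result = to_values_after(sample)
print(result)  # ['<uniquepattern2>', '<uniquepattern3>']
```

For a real file, the open file object can be passed in place of the list, since it also yields one line at a time.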

ghostdog74
Thank you for the response. Cletus' answer is more what I want, but I upvoted your answer for introducing me to awk. I haven't used it and will research it for future scenarios.
taspeotis