Hi, I have a file that is just one line (one HUGE line) to parse. I want to extract the value that appears between "Undefined error code" and " id" on this line. The catch is that this pattern appears multiple times on the same line, with a different value each time. The following code only gives me the last instance.

cat bad_events_P2J3.xml | sed -n 's/.*Undefined error code (\(.*\))\" id.*/\1\n/p'

How can I get all instances of this?

+1  A: 
$ cat file
text1 text2 Undefined error code text3 text4 id text5 text6 Undefined error code txt7 txt8 id
$ awk -vRS="id" '{gsub(/.*Undefined error code/,"")}1' file
 text3 text4
 txt7 txt8
ghostdog74
Wonderful. I couldn't find how to do this anywhere. It seems like something you could easily do with simple sed, but it's a lot harder when you tackle it. This code was perfect. Thanks.
amadain
+1  A: 

You were on the right track:

sed -n 's/.*Undefined error code\(.*\)id.*/\1/p' bad_events_P2J3.xml

Note that cat is unnecessary and, unless you need an extra newline, sed will provide one for you.

I missed the fact that this appears multiple times in your file. This should work in that case:

grep -Po 'Undefined error code.*?id' bad_events_P2J3.xml | sed 's/^Undefined error code//;s/id$//'
Dennis Williamson
So the cat caused the problem? I thought I should be able to do it with sed. I just wasn't seeing the wood for the trees. Thanks.
amadain
@amadain: No, `cat` wasn't the problem. It just wasn't necessary since `sed` accepts a filename as an argument and you're not conCATenating multiple files. The problem was probably the extra set of parentheses. Without seeing a portion of the actual data, it's hard to be sure.
Dennis Williamson
@OP, this works only if you are sure you have one instance of that pair of words. It will only get the last instance if there are more, because the regex match in sed is greedy.
ghostdog74
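To illustrate the greediness point, here is a minimal sketch. The sample line is made up for the demonstration (it is not the asker's real data) and contains two occurrences of the phrase. The leading `.*` consumes as much as it can, so only the text after the last "Undefined error code" survives in the capture group.

```shell
# Hypothetical sample line with two occurrences (not the real file contents).
line='a Undefined error code X1 id b Undefined error code X2 id c'

# The leading .* is greedy: it runs up to the LAST "Undefined error code",
# so the capture group only ever sees the final occurrence.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*Undefined error code\(.*\)id.*/\1/p')

# Brackets make the surrounding spaces in the captured text visible.
printf '[%s]\n' "$greedy"
```

A single `s///` without the `g` flag also performs just one substitution per line, so on a one-line file this form cannot report more than one value.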
Thank you for this. Actually, the double parentheses are needed, as the phrase that appears multiple times is actually "Undefined error code(code_here)" id="code_here", so the number I was matching was in parentheses. The grep -Po was what I actually needed. The first solution had the same problem as mine, i.e. it only displayed one instance: the last one.
amadain
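For the parenthesized format described in the last comment, one possible variant that does not depend on `grep -P` is sketched below. It swaps the lazy quantifier for a negated character class, which is a different technique from the accepted `grep -Po` answer. The sample line, the element name `e`, and the codes 17 and 42 are invented for illustration. It assumes the code itself never contains a closing parenthesis, and that your grep supports `-o` (GNU and BSD grep do).

```shell
# Hypothetical one-line input modeled on the format described in the comment.
line='<e msg="Undefined error code(17)" id="17"/><e msg="Undefined error code(42)" id="42"/>'

# In a basic regex ( and ) are literal. [^)]* cannot run past the first
# closing parenthesis, so each match stays within a single occurrence.
# grep -o prints every match on its own line; sed then strips the wrapper.
codes=$(printf '%s\n' "$line" |
  grep -o 'Undefined error code([^)]*)' |
  sed 's/^Undefined error code(//; s/)$//')

printf '%s\n' "$codes"
```

To run it against the real file, replace the `printf ... "$line" |` stage with the filename as the last argument to grep.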