
I am trying to convert a conversation I downloaded from Wikipedia into XML. I used Special:Export to get the page in XML format, and that works great until I get to the main conversation:

<conversation>
    {{PersonA|Cheese}}
    {{PersonB|I like it too...}}
    {{PersonA|Cheese?}}
</conversation>

That's not the real conversation... Anyway, I'm wondering what's the easiest way to convert a MASSIVE conversation like that into valid XML like this:

<conversation>
    <personA>Cheese</personA>
    <personB>I like it too...</personB>
    <personA>Cheese?</personA>
</conversation>

The real conversation is far too long to convert by hand. I'm guessing regex can help out... somehow. Thanks!

A:

Pattern:

\{\{(.*?)\|(.*?)\}\}

Replace:

<$1>$2</$1>
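If you would rather script this than use an editor, here is a minimal Python sketch of the same search and replace. It uses only the standard `re` module. Python's `re.sub` writes backreferences as `\1` and `\2` instead of `$1` and `$2`. The tag also keeps the name's original case, so you get `<PersonA>` rather than the `<personA>` shown in the question.

```python
import re

# Same pattern as above: group 1 = speaker name, group 2 = spoken text.
PATTERN = re.compile(r"\{\{(.*?)\|(.*?)\}\}")

def convert(wikitext: str) -> str:
    # Python spells the backreferences \1 and \2 where editors often use $1 and $2.
    return PATTERN.sub(r"<\1>\2</\1>", wikitext)

sample = """<conversation>
    {{PersonA|Cheese}}
    {{PersonB|I like it too...}}
    {{PersonA|Cheese?}}
</conversation>"""

print(convert(sample))
```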

The pattern above is a simple solution that fits your sample, but depending on the exact format you may need a more complex expression. For example: what if a name contains a pipe? What if the text contains two closing curly brackets? Can the text span multiple lines?
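Some of those cases can be handled in a script. Below is a slightly stricter Python sketch that makes several assumptions about the real data. Names are assumed never to contain `|`, `{` or `}`, and to be valid XML names with no spaces. The text may contain pipes and may span lines (`re.DOTALL`). The text still cannot contain `}}`. The sketch also escapes `&`, `<` and `>` in the text, which the plain replace does not do and which valid XML requires. It lower-cases the first letter of the name so the tags match the question's `<personA>`. The names `TEMPLATE`, `to_element` and `convert` are purely illustrative.

```python
import re
from xml.sax.saxutils import escape

# Assumed format: the name has no '|', '{' or '}'; the text runs lazily up to the
# first '}}' and may span several lines (re.DOTALL lets '.' match newlines).
TEMPLATE = re.compile(r"\{\{([^|{}]+)\|(.*?)\}\}", re.DOTALL)

def to_element(match: re.Match) -> str:
    name = match.group(1).strip()
    # Lower-case the first letter so PersonA becomes personA, as in the question.
    tag = name[:1].lower() + name[1:]
    # Escape &, < and > so the spoken text cannot break the XML.
    text = escape(match.group(2))
    return f"<{tag}>{text}</{tag}>"

def convert(wikitext: str) -> str:
    # re.sub accepts a function that builds each replacement from the match.
    return TEMPLATE.sub(to_element, wikitext)

sample = "{{PersonA|Fish & chips <3}}\n{{PersonB|Line one\nline two}}"
print(convert(sample))
```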

Max Shawabkeh
Thanks Max, I'll try that out. Will I need a special text editor to do that? Also, how are the $1 and $2 variables defined?
JackD-Laker
You need a text editor that supports regular expressions. I suppose most of them do by now, but you will have to tell the search/replace command to use them. `$1` (often also written `\1`) is a backreference containing whatever was matched by the first set of parentheses; `$2` likewise holds what the second set matched.
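To see what each backreference holds, here is a small Python sketch that runs the same pattern on one line from the sample and prints the groups. Group 0 is the whole match, and groups 1 and 2 correspond to `$1` and `$2`. In a Python replacement string the same references are written `\1` or `\g<1>`.

```python
import re

line = "{{PersonB|I like it too...}}"
match = re.search(r"\{\{(.*?)\|(.*?)\}\}", line)

print(match.group(0))  # {{PersonB|I like it too...}}  (the whole match)
print(match.group(1))  # PersonB  (first parentheses, i.e. $1)
print(match.group(2))  # I like it too...  (second parentheses, i.e. $2)
```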
Tim Pietzcker
Thanks for all the help, guys... Saved me a few hours' work :P
JackD-Laker