How can I replace a "<" and a ">" in the content of an XML file with the matching "&lt;" and "&gt;", given a known set of tags, using a regex?

Example: <abc>fd<jkh</abc><def>e>e</def> should result in: <abc>fd&lt;jkh</abc><def>e&gt;e</def>

It must be done with a regex (no loading the XML into a parser and such).

A: 
s/<(?=[^<>]*<)/&lt;/g
s/(?<=>[^<>]*)>/&gt;/g

In C#,

your_xml_string = new Regex(@"<(?=[^<>]*<)").Replace(your_xml_string, "&lt;");
your_xml_string = new Regex(@"(?<=>[^<>]*)>").Replace(your_xml_string, "&gt;");

Not tested; I don't have C# at hand.

KennyTM
is it in C#? :)
Jack
@Jack: See update.
KennyTM
Very good, but it didn't replace something like <abc>>dfddf</abc>, and it also didn't replace tags that are not predefined, as in <abc>sdfsdf<job>dasf</abc>. There have to be some predefined tags; otherwise the undefined tags never get replaced.
Jack
@Jack: Why can't you write down all the requirements at once in the question? And you can't expect to use one or two simple regexes if you need to detect `<x><y></x>`.
KennyTM
+2  A: 

I think the pattern

<([^>]*<)

will match a < that encounters another < before > (therefore not part of a tag)

...and the pattern

(>[^<]*)>

will match a > that follows another >

var first = Regex.Replace(@"<abc>fd<jkh</abc><def>e>e</def>",@"<([^>]*?<)",@"&lt;$1");
var final = Regex.Replace(first,@"(>[^<]*?)>",@"$1&gt;");

EDIT:

This does work, but you have to pass over the string multiple times, until a pass makes no further changes. I'm sure there's a purer method.

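using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        var next = @"<abc>dffs<<df</abc>";
        string current;
        // Repeat both replacements until a pass changes nothing.
        do
        {
            current = next;
            // Escape a '<' that reaches another '<' before any '>'.
            next = Regex.Replace(current, @"<([^>]*?<)", @"&lt;$1");
            // Escape a '>' that follows another '>' with no '<' in between.
            next = Regex.Replace(next, @"(>[^<]*?)>", @"$1&gt;");
        } while(next != current);
        Console.WriteLine(current);
        Console.ReadKey();
    }
}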
Jay
It didn't work with <abc>dffs<<df</abc>
Jack
@Jack: You didn't give <abc>dffs<<df</abc> as an example. I think you should learn some regex so that you can take Jay's example and expand it. Remember, we are supposed to point you down a path, not walk it for you.
David Basarab
You are right. I will take that into account.
Jack
This is my actual example, which covers all the cases: "<abc>>sdfsdf<<asdada>>asdasd<>asdasd<asdsad>asds<</abc>". Only <abc> and the matching </abc> are predefined.
Jack
This one is very good, but it still didn't replace the <asdada> and the <>!
Jack
You didn't ask for that.
Jay
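
For the requirement that only emerged in the comments (keep a predefined set of tags, escape every other bracket), a different technique than the two patterns above may fit better: put the known tags first in an alternation and let a second alternative catch any leftover "<" or ">". The following is a minimal sketch, not something from the answers; the class TagEscaper and the method EscapeOutsideKnownTags are hypothetical names, and it assumes at least one tag name is supplied and that the known tags carry no attributes.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TagEscaper
{
    // Hypothetical helper: escapes every '<' and '>' that is not part of an
    // opening or closing tag whose name is listed in knownTags.
    // Assumes knownTags is non-empty and the tags have no attributes.
    public static string EscapeOutsideKnownTags(string input, params string[] knownTags)
    {
        // Build "abc|def", escaping each name for use inside a regex.
        string names = string.Join("|", knownTags.Select(t => Regex.Escape(t)));

        // First alternative: a complete known tag. Second: any single bracket.
        // The regex tries the tag alternative first at every position.
        string pattern = @"</?(?:" + names + @")>|[<>]";

        return Regex.Replace(input, pattern, m =>
            m.Value.Length > 1 ? m.Value             // known tag: keep unchanged
            : m.Value == "<" ? "&lt;" : "&gt;");     // stray bracket: escape it
    }
}
```

With "abc" as the only known tag, calling TagEscaper.EscapeOutsideKnownTags on Jack's last example should keep <abc> and </abc> and escape everything in between, in a single pass and without the do/while loop.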