I have a string that is part of an XML document:

a<b>b</b>c<i>d</i>e<b>f</b>g

The problem is that I want to extract the parts of the string that are not inside any tags. So I need to extract "aceg" from this string and leave out the characters "bdf". How can this be done?

Edit: this was part of an XML document; let's assume it is

<div>a<b>b</b>c<i>d</i>e<b>f</b>g</div>

Now it's valid XML :)

+2  A: 

The following regular expression will remove all tags from the string:

Regex.Replace("a<b>b</b>c<i>d</i>e<b>f</b>g", "<[^>]+>", string.Empty);
Stoo
I tried this code and it returns "abcdefg", but I also need to remove the text that is inside the tags, so I need it to return "aceg".
Karim
Actually, what he wants is to replace the tags and their inner text, so: <.*?>.*?<.*?>
Paulo Manuel Santos
Processing XML with regex is a bad idea. If those tags can contain attributes, or anything like comments, such an expression can go badly wrong.
bobince
Sorry, slightly misread the question ;) pauloya or grenade's answer will finish the job.
Stoo
`<obligatory>` Now you have two problems. `</obligatory>`
MatrixFrog
+3  A: 

That string is not valid XML.

However, assuming you had a valid XML string, then you could do something like this:

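using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        string contents = string.Empty;

        // Parse the fragment; it is wrapped in a single root element so it is well-formed.
        XmlDocument document = new XmlDocument();
        document.LoadXml("<outer>a<b>b</b>c<i>d</i>e<b>f</b>g</outer>");

        // Walk only the direct children of the root element.
        foreach (XmlNode child in document.DocumentElement.ChildNodes)
        {
            // Collect the text found inside child elements (<b>, <i>, ...).
            if (child.NodeType == XmlNodeType.Element)
            {
                contents += child.InnerText;
            }
        }

        Console.WriteLine(contents);

        // Keep the console window open until a key is pressed.
        Console.ReadKey();
    }
}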

This will print out the string "bdf".

sgrassie
This one works, but in reverse: it extracts the string "bdf". If you change the condition "child.NodeType == XmlNodeType.Element" to "child.NodeType == XmlNodeType.Text", it returns the needed result.
Karim
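The change described in the comment above could look roughly like the sketch below. It is only an illustration, not sgrassie's original code: the names TextNodeExtractor and DirectText are made up for this example, and it assumes the input is well-formed XML with a single root element. It collects only plain text nodes that are direct children of the root, so CDATA sections and whitespace-only nodes are not included.

```csharp
using System;
using System.Text;
using System.Xml;

static class TextNodeExtractor
{
    // Hypothetical helper: concatenates the text nodes that are
    // direct children of the root element and skips child elements.
    public static string DirectText(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml); // throws XmlException if the input is not well-formed

        var result = new StringBuilder();
        foreach (XmlNode child in document.DocumentElement.ChildNodes)
        {
            // Text nodes are the characters sitting between the child elements.
            if (child.NodeType == XmlNodeType.Text)
            {
                result.Append(child.Value);
            }
        }
        return result.ToString();
    }
}

class Program
{
    static void Main()
    {
        // Prints "aceg" for the string from the question.
        Console.WriteLine(TextNodeExtractor.DirectText("<div>a<b>b</b>c<i>d</i>e<b>f</b>g</div>"));
    }
}
```

Because the loop only looks at direct children of the root, text nested deeper (inside the b and i elements here) is never visited, which is exactly what the question asks for.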
A: 

Following from @Stoo's answer you should be able to omit the tag contents as well with something like this:

Regex.Replace("a<b>b</b>c<i>d</i>e<b>f</b>g", "<[^>]+>[^<]+</[^>]+>", string.Empty);
grenade
This one works, but I will go with the XML solution by sgrassie because it gives me more control over the elements.
Karim
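For comparison, the two regular expressions from this thread can be put side by side in a small sketch. The class and method names (TagStripper, StripTagsOnly, StripElements) are invented for this example; the patterns are the ones posted by Stoo and grenade. As bobince points out, this only holds for simple, flat input like the string in the question. The second pattern assumes every element contains plain text with no nested tags, and it should not be relied on for general XML.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagStripper
{
    // Stoo's pattern: removes the tags themselves but keeps their inner text.
    public static string StripTagsOnly(string input)
    {
        return Regex.Replace(input, "<[^>]+>", string.Empty);
    }

    // grenade's pattern: removes an opening tag, its text content and the
    // following closing tag. Assumes flat elements with no nested tags.
    public static string StripElements(string input)
    {
        return Regex.Replace(input, "<[^>]+>[^<]+</[^>]+>", string.Empty);
    }
}

class Program
{
    static void Main()
    {
        string input = "a<b>b</b>c<i>d</i>e<b>f</b>g";

        Console.WriteLine(TagStripper.StripTagsOnly(input)); // abcdefg
        Console.WriteLine(TagStripper.StripElements(input)); // aceg
    }
}
```

Note that the second pattern does not check that the closing tag matches the opening one, so elements nested inside other elements will not be handled correctly; the XmlDocument approach avoids that class of problem.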