ansaurus

Question

Answer 1

+3 A:

Have you considered using XSLT? It seems like the perfect solution, as you are doing exactly what XSLT is meant for: transforming one XML document into another. The templating system will delve into nested nastiness for you without problems.

Here is a basic example
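As a rough illustration of the idea (a sketch, not the example referred to above): an identity transform copies everything through unchanged, and one extra template matches the unwanted element and emits only its children. The element name RemovalTarget is borrowed from the other answers in this thread; the class name XsltStripper and the method Strip are made up for this sketch, which runs the stylesheet through .NET's XslCompiledTransform.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltStripper
{
    // Identity transform plus one override: RemovalTarget elements emit
    // only their child nodes, so the tag disappears but its content stays.
    // The more specific match wins over the generic node() template.
    private const string Stylesheet = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='@*|node()'>
    <xsl:copy>
      <xsl:apply-templates select='@*|node()'/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match='RemovalTarget'>
    <xsl:apply-templates select='node()'/>
  </xsl:template>
</xsl:stylesheet>";

    // Hypothetical helper: returns inputXml with every RemovalTarget tag removed.
    public static string Strip(string inputXml)
    {
        var transform = new XslCompiledTransform();
        using (var sheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(sheetReader);
        }

        var output = new StringWriter();
        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(inputReader, writer);
        }
        return output.ToString();
    }
}
```

Nested RemovalTarget elements need no special handling here, because apply-templates recurses into the children and the same template fires again.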

Andrew Bullock 2008-11-03 20:45:53

Answer 2

A:

I would recommend XSLT, as Trull suggested, as the best solution.

Alternatively, you might look at using a StringBuilder and regex matching to remove the items.

You could also walk through the document and work with nodes and their parent nodes, effectively moving the content from inside each node up to its parent, but that would be tedious and quite unnecessary given the other potential solutions out there.
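For comparison, the node-walking approach might look roughly like this with XmlDocument. This is only a sketch: the class name DomStripper and the method Strip are invented here, and it assumes the target element is never the document root (moving text or several elements directly under the document would throw).

```csharp
using System.Collections.Generic;
using System.Xml;

public static class DomStripper
{
    // Hypothetical helper: removes every element named targetName from doc,
    // keeping its children in place.
    public static void Strip(XmlDocument doc, string targetName)
    {
        // Snapshot the matches first: the XmlNodeList returned by
        // GetElementsByTagName is live and changes as the tree is edited.
        var targets = new List<XmlNode>();
        foreach (XmlNode node in doc.GetElementsByTagName(targetName))
        {
            targets.Add(node);
        }

        foreach (XmlNode target in targets)
        {
            XmlNode parent = target.ParentNode;

            // InsertBefore moves a node that is already in the tree, so each
            // child ends up just before the target, in its original order.
            while (target.FirstChild != null)
            {
                parent.InsertBefore(target.FirstChild, target);
            }

            // The target is now empty and can be dropped.
            parent.RemoveChild(target);
        }
    }
}
```

Because the children are moved rather than copied, nested targets still point at live nodes when their turn comes, so the processing order does not matter in this variant.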

Mitchel Sellers 2008-11-03 20:55:10

Answer 3

A:

A lightweight solution would be to use XmlReader to go through the input document and XmlWriter to write the output.

Note: XmlReader and XmlWriter are abstract classes; use the derived classes appropriate for your situation.
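A minimal sketch of that streaming approach, under a few assumptions: the class name StreamingStripper and the method Strip are made up, readers and writers come from the XmlReader.Create and XmlWriter.Create factory methods, elements are matched by local name only (namespaces are ignored for matching), the target is never the root element, and the XML declaration and any DOCTYPE are simply dropped.

```csharp
using System.IO;
using System.Xml;

public static class StreamingStripper
{
    // Hypothetical helper: copies inputXml node by node, skipping the start
    // and end tags of elements named targetName but keeping their content.
    public static string Strip(string inputXml, string targetName)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (var reader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.LocalName == targetName) break; // drop the start tag
                        bool isEmpty = reader.IsEmptyElement;
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true);
                        if (isEmpty) writer.WriteEndElement(); // <x/> has no EndElement node
                        break;
                    case XmlNodeType.EndElement:
                        if (reader.LocalName != targetName) writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                }
            }
        }
        return output.ToString();
    }
}
```

Since nothing is held in memory beyond the current node, this should scale to documents that are too large to load into a DOM.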

Sunny 2008-11-03 21:06:03

Answer 4

+2 A:

You'll have to skip the deferred execution with a call to ToList, so that the query is fully evaluated before you start modifying the tree. That probably won't hurt your performance in large documents, as you're just going to be iterating and replacing at a much lower big-O cost than the original search. As @jacob_c pointed out, I should be using element.Nodes() to replace it properly, and as @Panos pointed out, I should reverse the list in order to handle nested replacements accurately.

Also, use XElement.ReplaceWith, which is much faster than your current approach in large documents:

// Reverse the lazy sequence first, then snapshot it with ToList().
// List<T>.Reverse() reverses in place and returns void, so calling it
// after ToList() would leave nothing to assign to elements.
var elements = doc.Descendants("RemovalTarget").Reverse().ToList();

foreach (var element in elements) {
    element.ReplaceWith(element.Nodes());
}

One last point: in reviewing what this MAY be used for, I tend to agree with @Trull that XSLT may be what you're actually looking for if, say, you're removing all <b> tags from a document. Otherwise, enjoy this fairly decent and fairly well-performing LINQ to XML implementation.

sixlettervariables 2008-11-03 21:14:29

.Value won't work if the RemovalTarget element contains child elements

Jacob Carpenter 2008-11-03 21:18:42

Answer 5

A:

Depending on how you manage your XML, you could use a regular expression to remove the tags.

Here's a simple console application that demonstrates the use of a regex:

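    using System.IO;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            // args[0] is the input file, args[1] the output file.
            string content = File.ReadAllText(args[0]);

            // Matches opening, closing, and self-closing tags; the lookahead rejects longer names.
            Regex tag = new Regex(@"</?RemovalTarget(?=[\s/>])[^>]*>");

            string cleanContent = tag.Replace(content, string.Empty);
            File.WriteAllText(args[1], cleanContent);
        }
    }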

This leaves behind the newline characters that surrounded the removed tags, but it shouldn't be too difficult to augment the regular expression to strip those as well.

Philipp Schmid 2008-11-03 22:34:00

Processing XML as string data is very simple if you have control over your source XML and fraught with innumerable complexities if you don't. XML in the wild contains CDATA and comments, and those introduce so many special cases that it's usually best to stick with DOM-based approaches.
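To make the CDATA point concrete, here is a small sketch (the class name CDataPitfall, its members, and the sample string are all invented for illustration). The text inside the CDATA section only looks like a tag; a regex cannot tell the difference, while a parser-based approach can.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class CDataPitfall
{
    // One real RemovalTarget element, plus literal text inside a CDATA
    // section that merely looks like a RemovalTarget element.
    public const string Sample =
        "<root><RemovalTarget>keep</RemovalTarget>" +
        "<![CDATA[<RemovalTarget>literal</RemovalTarget>]]></root>";

    // String-based: also strips the look-alike tags inside the CDATA section,
    // silently changing the document's text content.
    public static string WithRegex()
    {
        return Regex.Replace(Sample, @"</?RemovalTarget(?=[\s/>])[^>]*>", string.Empty);
    }

    // Parser-based (LINQ to XML): only real elements are matched, so the
    // CDATA section is left intact.
    public static string WithLinq()
    {
        XDocument doc = XDocument.Parse(Sample);
        foreach (XElement element in doc.Descendants("RemovalTarget").Reverse().ToList())
        {
            element.ReplaceWith(element.Nodes());
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Comments behave the same way: commented-out markup is just text to a parser, but a regex will happily rewrite it.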

Robert Rossney 2008-11-04 19:53:42


strip out tag occurrences from XML
