I am looking for a clean, elegant and smart solution to remove namespaces from all XML elements. What would a function to do that look like?

Defined interface:

public interface IXMLUtils
{
        string RemoveAllNamespaces(string xmlDocument);
}

Sample XML to remove NS from:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfInserts xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <insert>
    <offer xmlns="http://schema.peters.com/doc_353/1/Types">0174587</offer>
    <type2 xmlns="http://schema.peters.com/doc_353/1/Types">014717</type2>
    <supplier xmlns="http://schema.peters.com/doc_353/1/Types">019172</supplier>
    <id_frame xmlns="http://schema.peters.com/doc_353/1/Types" />
    <type3 xmlns="http://schema.peters.com/doc_353/1/Types">
      <type2 />
      <main>false</main>
    </type3>
    <status xmlns="http://schema.peters.com/doc_353/1/Types">Some state</status>
  </insert>
</ArrayOfInserts>

After we call RemoveAllNamespaces(xmlWithLotOfNs), we should get:

  <?xml version="1.0" encoding="utf-16"?>
    <ArrayOfInserts>
      <insert>
        <offer >0174587</offer>
        <type2 >014717</type2>
        <supplier >019172</supplier>
        <id_frame  />
        <type3 >
          <type2 />
          <main>false</main>
        </type3>
        <status >Some state</status>
      </insert>
    </ArrayOfInserts>

The preferred language for the solution is C# on .NET 3.5 SP1.

+2  A: 

What are you trying to accomplish? I've never seen a case where removing namespaces was "smart".

The namespace+local name is the identity of the element or attribute. Removing the namespace would remove half the identity. You may have the first example I've seen where this is a good idea, but if so, please tell me why it makes sense.

John Saunders
Yes, I understand your concern. Namespaces are meant to be there, but I have a special case. I am working on an integration service that connects a BPM tool with a third-party web service. I have to deserialize the BPM tool's XML type into the web service's type. Both have the same schema (with different namespaces, of course), but as you can see, the BPM tool messes with the namespaces, which makes deserialization to the WS type impossible.
Peter Stegnar
That's what XSLT has been developed for. I recommend an XSL transformation that changes unsuitable input (wrong namespaces or structure) into useful input (right namespaces or structure), instead of simply ripping out namespaces.
Tomalak
Generally I agree with you. I could also have done this with XSLT, and that may be more correct. But I wanted to find a clean solution in code. This question is not about the right way to do XML transformations (which is obviously XSL).
Peter Stegnar
Why do they have the "same schema" but with different namespaces? Why isn't one of them using the schema from the other? Besides, in this case, rather than removing namespaces, I'd consider defining a third namespace and mapping from each into this third schema. The third schema would be the one you depend on.
John Saunders
@Peter Stegnar: XSLT is not some dirty hack. It is the cleanest method of handling XML transformation, I would say that everything else is inferior.
Tomalak
+1  A: 

Search for the string

xmlns=

using String.IndexOf. This is the start point of your delete.

Find the second quote using String.IndexOf. This is the end point of your delete.

Delete the text, using the String.Remove function

String.Remove(StartPoint, EndPoint-StartPoint)

Rinse and repeat.

Robert Harvey
this is a good method. I would suspect dumping the string into a stringbuilder first would be more efficient (operating in-place on the string with Replace)
Jimmy
Actually, this is not a good method at all. What about things like <element foo="xmlns=" />?
Tomalak
This method breaks the XML: the result no longer conforms to the namespaces spec, because name prefixes can't be resolved. Therefore some parsers/processors might reject the file. Methods that convert all qualified names to local names by stripping prefixes actually *do* preserve namespace conformity.
jasso
+1  A: 

Regular Expressions to the rescue!!!

string XMLPattern = "xmlns=\\\".+\\\"";
Regex regXML = new Regex(XMLPattern);

string XMLInput = FancyMethodThatPutsXMLIntoString();
string Results = regXML.Replace(XMLInput, "");

Note: The triple backslashes escape the quotes twice, once for the C# string and once for the regex. Technically the pattern is xmlns=\".+\"

Dillie-O
Hm... What about "<element><![CDATA[ xmlns="asdasd" ]]></element>"? Trying to process XML with text tools like regex is futile and should be avoided like the plague.
Tomalak
And independently of that, your regex is wrong as well. Had you tested it, you would have noticed.
Tomalak
You don't need the extra escaping in your regex; use a verbatim string literal @"..." instead.
Richard
"Regular Expressions to the rescue!!!" ...so now you have two problems.
jasso
Ugh. The war cry of the mediocre. "I don't understand regexs...so now I still have a problem."
annakata
+4  A: 

the obligatory answer using LINQ:

// Requires: using System; using System.Linq; using System.Xml.Linq;
static XElement stripNS(XElement root) {
    return new XElement(
        root.Name.LocalName,
        root.HasElements ? 
            root.Elements().Select(el => stripNS(el)) :
            (object)root.Value
    );
}
static void Main() {
    var xml = XElement.Parse(@"<?xml version=""1.0"" encoding=""utf-16""?>
    <ArrayOfInserts xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"">
      <insert>
        <offer xmlns=""http://schema.peters.com/doc_353/1/Types"">0174587</offer>
        <type2 xmlns=""http://schema.peters.com/doc_353/1/Types"">014717</type2>
        <supplier xmlns=""http://schema.peters.com/doc_353/1/Types"">019172</supplier>
        <id_frame xmlns=""http://schema.peters.com/doc_353/1/Types"" />
        <type3 xmlns=""http://schema.peters.com/doc_353/1/Types"">
          <type2 />
          <main>false</main>
        </type3>
        <status xmlns=""http://schema.peters.com/doc_353/1/Types"">Some state</status>
      </insert>
    </ArrayOfInserts>");
    Console.WriteLine(stripNS(xml));
}
Jimmy
Does that really work? How cool. Recursive?
Robert Harvey
I guess you can show the VB folks you can have an XML literal in C# after all.
Robert Harvey
@Robert, that's not an XML literal. It's a string. There's a big difference!
Dennis Palmer
Jimmy, you are close but not there yet. :) I am writing final solution based on your idea. I will post it there.
Peter Stegnar
you're right :) while you're at it, I offer my own version of the fix.
Jimmy
+5  A: 

The obligatory answer using XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no" encoding="UTF-8"/>

  <xsl:template match="/|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
annakata
+1 for the "obligatory". :-) I still wonder why removing namespaces would be a smart decision. This probably crashes and burns on <element ns:attr="a" attr="b"/>.
Tomalak
Oh sure, but every NS-removing technique will, to a greater or lesser extent. As for validity, I can tell you where I have needed it: importing third-party XML where they can't sort out a valid XSD but insist on namespacing. Practicality rules ultimately.
annakata
@annakata: Namespaces are perfectly usable without XSD, and very useful where nodes come from multiple sources. You just have to remember that nodes are named by namespace+localname always (and the namespace tag (or prefix) is just a local document detail).
Richard
@Richard - this entirely depends on what you have parsing the XML...
annakata
@annakata: The solution is simpler than you think. Stop enabling. Refuse to use any technique which does not understand XML. The only reason we're still forced to use such garbage is because people keep saying, "yes", when they need to say "no" a bit more often. The standards are over 10 years old! Why else do we still have software that doesn't understand XML Namespaces, except that we continue to enable it to exist?
John Saunders
@John - ha, there are those things which should be done, and there are those things which the management deems will be done. All is for the best in the best of all possible worlds.
annakata
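For readers who want to call the stylesheet above from C# and still satisfy the RemoveAllNamespaces(string) interface from the question, here is a minimal sketch. The class name XsltNamespaceStripper is hypothetical and not from the thread; the sketch assumes the standard XslCompiledTransform class from System.Xml.Xsl. Because the result is written to a StringWriter, the XML declaration in the output will report utf-16 regardless of the encoding named in xsl:output.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical wrapper (name is an assumption, not from the thread).
public static class XsltNamespaceStripper
{
    // The stylesheet from the answer above, embedded as a verbatim string.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" indent=""no"" encoding=""UTF-8""/>
  <xsl:template match=""/|comment()|processing-instruction()"">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>
  <xsl:template match=""*"">
    <xsl:element name=""{local-name()}"">
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:element>
  </xsl:template>
  <xsl:template match=""@*"">
    <xsl:attribute name=""{local-name()}"">
      <xsl:value-of select="".""/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>";

    public static string RemoveAllNamespaces(string xmlDocument)
    {
        // Compile the stylesheet from the embedded string.
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        // Run the transformation, honouring the xsl:output settings.
        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xmlDocument)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

In real use the compiled transform would probably be cached rather than rebuilt on every call. As Tomalak points out above, an element carrying two attributes with the same local name in different namespaces is still a problem for this approach.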
+3  A: 

Well, here is the final answer. I have used Jimmy's great idea (which unfortunately is not complete by itself) and completed the recursive function so that it works properly.

Based on interface:

string RemoveAllNamespaces(string xmlDocument);

Here is the final, clean and universal C# solution for removing XML namespaces:

//Implemented based on interface, not part of algorithm
public static string RemoveAllNamespaces(string xmlDocument)
{
    XElement xmlDocumentWithoutNs = RemoveAllNamespaces(XElement.Parse(xmlDocument));

    return xmlDocumentWithoutNs.ToString();
}

//Core recursion function
private static XElement RemoveAllNamespaces(XElement xmlDocument)
{
    if (!xmlDocument.HasElements)
    {
        XElement xElement = new XElement(xmlDocument.Name.LocalName);
        xElement.Value = xmlDocument.Value;
        return xElement;
    }
    return new XElement(xmlDocument.Name.LocalName, xmlDocument.Elements().Select(el => RemoveAllNamespaces(el)));
}

It works for my case, but I have not tested it much, so it may not cover some special cases. Still, it is a good base to start from.

Peter Stegnar
How well does that work with attributes that have namespaces? In fact, your code just ignores attributes entirely.
John Saunders
I realize namespaces may be useful in some applications, but not at all in mine; they were causing a huge annoyance. This solution worked for me.
JYelton
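To address John Saunders' point that the code above ignores attributes, here is a minimal sketch of a variant that keeps them. The names NamespaceStripper and StripNamespaces are hypothetical, not from the thread. It assumes that no element has two attributes sharing a local name in different namespaces (adding the second one would throw), and it also reduces attributes such as xml:lang to their local names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (names are assumptions, not from the thread).
public static class NamespaceStripper
{
    // Rebuilds the tree using local names only, keeping attributes and text.
    public static XElement StripNamespaces(XElement source)
    {
        return new XElement(
            source.Name.LocalName,
            // Keep ordinary attributes, drop xmlns and xmlns:prefix declarations.
            source.Attributes()
                  .Where(a => !a.IsNamespaceDeclaration)
                  .Select(a => new XAttribute(a.Name.LocalName, a.Value)),
            // Recurse into child elements; copy text, comments, etc. unchanged.
            source.Nodes().Select(n =>
            {
                var child = n as XElement;
                return child != null ? StripNamespaces(child) : n;
            }));
    }

    // Matches the interface from the question.
    public static string RemoveAllNamespaces(string xmlDocument)
    {
        return StripNamespaces(XElement.Parse(xmlDocument)).ToString();
    }
}
```

Iterating over Nodes() rather than Elements() means mixed content (text next to child elements) survives, which the HasElements test in the versions above does not handle.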
A: 

The replies by Jimmy and Peter were a great help, but they actually removed all attributes, so I made a slight modification:

Imports System.Runtime.CompilerServices

Friend Module XElementExtensions

    <Extension()> _
    Public Function RemoveAllNamespaces(ByVal element As XElement) As XElement
        If element.HasElements Then
            Dim cleanElement = RemoveAllNamespaces(New XElement(element.Name.LocalName, element.Attributes))
            cleanElement.Add(element.Elements.Select(Function(el) RemoveAllNamespaces(el)))
            Return cleanElement
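        Else
            ' Leaf element: drop its own namespace as well as any xmlns declarations
            element.Name = element.Name.LocalName
            Dim allAttributesExceptNamespaces = element.Attributes.Where(Function(attr) Not attr.IsNamespaceDeclaration)
            element.ReplaceAttributes(allAttributesExceptNamespaces)
            Return element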
        End If

    End Function

End Module
andygjp