I have a huge XML file which contains a lot of comments.

What's the "best way" to strip out all the comments and nicely format the XML from the Linux command line?

+4  A: 

You might want to look at the xmllint tool. It has several options (one of which, --format, pretty-prints the output), but I can't figure out how to remove the comments using this tool.

Also, check out XMLStarlet, a set of command-line tools that can do almost anything you would want to do with XML. Then run:

xml c14n --without-comments # XML file canonicalization w/o comments

EDIT: OP eventually used this line:

xmlstarlet c14n --without-comments old.xml > new.xml
Daren Thomas
xmllint is a cmdline interface to libxml2, a library with bindings for many languages. E.g. I use XML::LibXML in Perl.
reinierpost
I eventually used: xmlstarlet c14n --without-comments old.xml > new.xml
elcuco
A: 

The best way would be to use an XML parser to handle all the obscure corner cases correctly. But if you need something quick and dirty, there are a variety of short solutions using Perl regexes which may be sufficient.
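As a rough illustration of the two approaches, here is a minimal Python sketch (Python rather than Perl, purely for the example). The function names and the script name are hypothetical. The parser version relies on the standard library's ElementTree, whose default parser discards comments; note that it also drops processing instructions, the DOCTYPE and the XML declaration, and may rewrite namespace prefixes, so treat it as a sketch rather than a faithful round-trip. The regex version assumes comments never appear inside CDATA sections or anywhere else they should be preserved.

```python
import re
import sys
import xml.etree.ElementTree as ET


def strip_comments_and_indent(xml_data):
    """Parse XML (str or bytes), drop comments, return indented text."""
    # The default ElementTree parser discards comments while building the tree.
    root = ET.fromstring(xml_data)
    ET.indent(root, space="  ")  # ET.indent requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Quick-and-dirty alternative: a non-greedy match across lines.
# Only safe when "<!--" never occurs inside CDATA or other literal text.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments_regex(xml_text):
    """Remove everything that looks like an XML comment, without parsing."""
    return COMMENT_RE.sub("", xml_text)


# Only runs when exactly one file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as handle:
        print(strip_comments_and_indent(handle.read()))
```

Saved as, say, strip_comments.py, this could be run as: python3 strip_comments.py old.xml > new.xml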

ire_and_curses
Don't use regexes on XML.
reinierpost
@reinierpost: I understand why this answer seems distasteful. But if this is a one-off, and you know your comments are a well-constrained subset of the comment spec, then what's wrong with a regex solution? I agree a parsing tool is preferable (and is the best answer), but I do think this is a valid alternative in some specific situations (e.g. simple testing, or 2 AM crisis callouts on a read-only system), and can be quite convenient.
ire_and_curses
+4  A: 

Run your XML through an identity transform XSLT, with an empty template for comments.

All of the XML content, except for the comments, will be passed through to the output.

To format the output nicely, set indent="yes" on the xsl:output element:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!--Match on Attributes, Elements, text nodes, and Processing Instructions-->
<xsl:template match="@*| * | text() | processing-instruction()">
   <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>

<!--Empty template prevents comments from being copied into the output -->
<xsl:template match="comment()"/>

</xsl:stylesheet>
Mads Hansen