I have a huge XML file that contains a lot of comments.
What's the "best way" to strip out all the comments and nicely format the XML from the Linux command line?
You might want to look at the xmllint tool. It has several options (one of which, --format, does a pretty print), but I can't figure out how to remove the comments using this tool.
Also, check out XMLStarlet, a set of command-line tools to do anything you would want to with XML (depending on how it was packaged, the binary is called xml or xmlstarlet). Then do:
xml c14n --without-comments # XML file canonicalization w/o comments
EDIT: OP eventually used this line:
xmlstarlet c14n --without-comments old.xml > new.xml
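For comparison, a minimal sketch of the same canonicalization step in Python (assumes Python 3.8+; this is not what the OP ran). The standard library's xml.etree.ElementTree.canonicalize implements C14N 2.0 and drops comments unless with_comments=True, so its output may differ in small details from xmlstarlet's. Canonical output is not pretty-printed, from either tool: the whitespace that surrounded each removed comment stays behind.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input.
sample = """<config>
  <!-- this comment should disappear -->
  <item name="a">1</item>
</config>"""

# canonicalize() implements C14N 2.0; with_comments=False (also the default)
# drops comment nodes. For a file, pass from_file="old.xml" instead of xml_data.
canonical = ET.canonicalize(xml_data=sample, with_comments=False)

# Canonical, not pretty-printed: the whitespace around the removed comment
# is still present in the output.
print(canonical)
```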
The best way would be to use an XML parser to handle all the obscure corner cases correctly. But if you need something quick and dirty, there are a variety of short solutions using Perl regexes which may be sufficient.
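As an illustration of the parser route, here is a minimal sketch using only Python's standard library (assumes Python 3.9+ for ET.indent; strip_comments_and_indent is a hypothetical helper name). ElementTree's TreeBuilder discards comments unless insert_comments=True, and ET.indent then replaces the whitespace-only text left behind with consistent indentation. Caveats: processing instructions and any DOCTYPE are dropped too, and namespace prefixes may be rewritten as ns0, ns1, and so on.

```python
import xml.etree.ElementTree as ET


def strip_comments_and_indent(xml_text: str, indent: str = "  ") -> str:
    """Parse xml_text with comments discarded, then re-indent it.

    Hypothetical helper; assumes Python 3.9+ (ET.indent).
    """
    # insert_comments=False is already the default; spelled out for clarity.
    # Processing instructions are dropped as well (insert_pis defaults to False).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=False))
    root = ET.fromstring(xml_text, parser=parser)
    # Replace whitespace-only text/tails (including the gaps where comments
    # used to be) with consistent indentation.
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


sample = """<config>
  <!-- this comment should disappear -->
  <item name="a">1</item>
</config>"""

print(strip_comments_and_indent(sample))
```

To use it from the shell, a small script could call sys.stdout.write(strip_comments_and_indent(sys.stdin.read())) and be run as python3 script.py < old.xml > new.xml. Note that this loads the whole document into memory.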
Run your XML through an identity transform XSLT, with an empty template for comments (for example: xsltproc strip-comments.xsl old.xml > new.xml, with the stylesheet below saved as strip-comments.xsl).
All of the XML content, except for the comments, will be passed through to the output.
To nicely format the output, set indent="yes" on xsl:output:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!--Drop whitespace-only text nodes so indent="yes" can re-indent cleanly;
      otherwise the whitespace around removed comments can leave blank lines-->
  <xsl:strip-space elements="*"/>
  <!--Match on Attributes, Elements, text nodes, and Processing Instructions-->
  <xsl:template match="@* | * | text() | processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!--Empty template prevents comments from being copied into the output -->
  <xsl:template match="comment()"/>
</xsl:stylesheet>
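The identity-transform idea behind this stylesheet can also be mimicked without an XSLT processor. Below is a rough Python analogue using only the standard library (not a replacement for the stylesheet; copy_without_comments is a hypothetical helper name). The tree is parsed with comments and processing instructions kept, then copied node by node, and comment nodes are simply skipped, like the empty comment() template. One subtlety the XSLT handles for free: in ElementTree, text following a comment is stored on the comment's tail, so it has to be moved onto the preceding node before the comment is dropped.

```python
import xml.etree.ElementTree as ET


def copy_without_comments(node: ET.Element) -> ET.Element:
    """Deep-copy node, skipping comment nodes (the 'empty template')."""
    # Elements and processing instructions are copied as-is (tag, attributes,
    # text, tail); this plays the role of the identity template.
    clone = ET.Element(node.tag, dict(node.attrib))
    clone.text = node.text
    clone.tail = node.tail
    for child in node:
        if child.tag is ET.Comment:
            # Drop the comment but keep any text that followed it, attaching
            # it to the previous kept sibling's tail or to the parent's text.
            if child.tail:
                if len(clone):
                    clone[-1].tail = (clone[-1].tail or "") + child.tail
                else:
                    clone.text = (clone.text or "") + child.tail
            continue
        clone.append(copy_without_comments(child))
    return clone


# Keep comments and PIs while parsing so the copy step decides what survives.
parser = ET.XMLParser(
    target=ET.TreeBuilder(insert_comments=True, insert_pis=True)
)
source = ET.fromstring(
    '<doc>a<!-- x -->b<?pi data?><p k="v">t<!-- y --></p><!-- z -->c</doc>',
    parser=parser,
)
result = copy_without_comments(source)
print(ET.tostring(result, encoding="unicode"))
# prints: <doc>ab<?pi data?><p k="v">t</p>c</doc>
```

Calling ET.indent(result) afterwards would roughly correspond to indent="yes", although, like the stylesheet, it only re-indents whitespace-only text and leaves mixed content alone.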