Hi -

I am searching for an XSLT or command-line tool (or C# code that can be made into a command-line tool, etc.) for Windows that will do XML pretty-printing. Specifically, I want one that can put attributes one-to-a-line, something like:

<Node>
   <ChildNode 
      value1='5'
      value2='6'
      value3='happy' />
</Node>

It doesn't have to be EXACTLY like that, but I want to use it for an XML file that has nodes with dozens of attributes and spreading them across multiple lines makes them easier to read, edit, and text-diff.

NOTE: I think my preferred solution is an XSLT sheet I can pass through a C# method, though a Windows command-line tool is good too.

A: 

XML Notepad 2007 can do so manually ... let me see if it can be scripted.

Nope ... but you can launch it like so:

XmlNotepad.exe a.xml

The rest is just clicking the Save button. PowerShell or other tools can automate that.

Hamish Grubijan
@Hamish Grubijan, that'd probably work but automating GUIs is awfully hacky -- there must be an easier way!
Scott Stafford
A: 

Just use this xslt:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="'   '"/>

  <xsl:template name="newline">
    <xsl:text disable-output-escaping="yes">
</xsl:text>
  </xsl:template>

  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>    
    <xsl:value-of select="$indent"/>
    <xsl:copy />
  </xsl:template>

  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>    
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

  <xsl:template match="text()[normalize-space(.)='']"/>

  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>    
    <xsl:value-of select="$indent"/>
      <xsl:choose>
       <xsl:when test="count(child::*) > 0">
        <xsl:copy>
         <xsl:copy-of select="@*"/>
         <xsl:apply-templates select="*|text()">
           <xsl:with-param name="indent" select="concat ($indent, $indent-increment)"/>
         </xsl:apply-templates>
         <xsl:call-template name="newline"/>
         <xsl:value-of select="$indent"/>
        </xsl:copy>
       </xsl:when>       
       <xsl:otherwise>
        <xsl:copy-of select="."/>
       </xsl:otherwise>
     </xsl:choose>
  </xsl:template>    
</xsl:stylesheet>

Or, as another option, here is a perl script: http://software.decisionsoft.com/index.html

glebm
That looks good. To complete the chore, is there a commandline xslt processor / can I use c# to process XML with an XSLT?
Scott Stafford
Both, but saxon command-line xslt processor (http://saxon.sourceforge.net/) should be enough :)
glebm
@glebm: I tried it and it doesn't seem to work. It indents each NODE, but doesn't newline/indent on each ATTRIBUTE. I replaced the indent-increment string with XYXY to make sure it was processing, and it was, but attributes are still in one long line.
Scott Stafford
I'm going to open this up as a separate question, because I'm curious WHY this wasn't working for me.
Scott Stafford
+1  A: 

There is a tool that can split attributes one per line: xmlpp. It's a Perl script, so you'll have to install Perl. Usage:

perl xmlpp.pl -t input.xml

You can also determine the ordering of attributes by creating a file called attributeOrdering.txt and calling perl xmlpp.pl -s -t input.xml. For more options, use perl xmlpp.pl -h.

I hope it doesn't have too many bugs, but it has worked for me so far.

Chris Lercher
@chris_l - thanks. my team is all ASP.NET C# developers so nobody will have perl installed. Getting them to install it is probably possible but not ideal. But I tried it out and it does do what it's supposed to do!
Scott Stafford
A: 

You can implement a simple SAX application that will copy everything as is and indent attributes how you like.

UPD:

SAX stands for Simple API for XML. It is a push model of XML parsing (a classic example of the Builder design pattern). The API is available on most current development platforms (though the native .NET class library lacks one, offering XmlReader instead).
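
To make the "push model" concrete, here is a minimal sketch (Python 3, standard library only). The parser calls the handler's methods as it walks the input, and the handler just records what it was told. The names EventLogger and log_events are made up for this illustration and are not part of any library.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Records each callback the parser pushes to the handler."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs is a read-only mapping of attribute names to values
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


def log_events(xml_bytes):
    """Parse xml_bytes and return the list of recorded events."""
    handler = EventLogger()
    parseString(xml_bytes, handler)
    return handler.events


if __name__ == "__main__":
    for event in log_events(b"<Node><ChildNode value1='5' value2='6' /></Node>"):
        print(event)
```

A pretty-printer built this way only has to decide what to write in each callback, which is why it can control attribute layout freely.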

Here is a rough implementation in Python. It is rather cryptic, but you should get the main idea.

from sys import stdout
from xml.sax import parse
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape

class MyHandler(ContentHandler):

    def __init__(self, file_, encoding):
        self.level = 0
        self.elem_indent = '    '

        # should the next block make a line break
        self._allow_N = False
        # whether the opening tag was closed with > (to allow />)
        self._tag_open = False

        self._file = file_
        self._encoding = encoding

    def _write(self, string_):
        self._file.write(string_.encode(self._encoding))

    def startElement(self, name, attrs):
        if self._tag_open:
            self._write('>')
            self._tag_open = False

        if self._allow_N:
            self._write('\n')
            indent = self.elem_indent * self.level
        else:
            indent = ''
        self._write('%s<%s' % (indent, name))

        # attr indent equals to the element indent plus '  '
        attr_indent = self.elem_indent * self.level + '  '
        for name in attrs.getNames():
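            # write indented attribute one per line; escape() leaves double
            # quotes alone by default, so map them to &quot; explicitly
            self._write('\n%s%s="%s"' % (attr_indent, name, escape(attrs.getValue(name), {'"': '&quot;'})))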

        self._tag_open = True

        self.level += 1
        self._allow_N = True

    def endElement(self, name):
        self.level -= 1
        if self._tag_open:
            self._write(' />')
            self._tag_open = False
            return

        if self._allow_N:
            self._write('\n')
            indent = self.elem_indent * self.level
        else:
            indent = ''
        self._write('%s</%s>' % (indent, name))
        self._allow_N = True

    def characters(self, content):
        if self._tag_open:
            self._write('>')
            self._tag_open = False

        if content.strip():
            self._allow_N = False
            self._write(escape(content))
        else:
            self._allow_N = True


if __name__ == '__main__':
    # parse() returns None, so there is nothing to assign
    parse('test.xsl', MyHandler(stdout, stdout.encoding))
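
For comparison, a shorter non-streaming sketch of the same idea (Python 3), using xml.etree.ElementTree instead of SAX. The function name pretty is hypothetical. It assumes the document has no namespaces, that whitespace around text is insignificant, and that comments and processing instructions can be dropped (ElementTree discards them by default), so treat it as a starting point rather than a general-purpose formatter.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def pretty(elem, level=0, indent="  "):
    """Serialize elem with nested indentation and one attribute per line."""
    pad = indent * level
    lines = [pad + "<" + elem.tag]
    for name, value in elem.attrib.items():
        # quoteattr() adds the surrounding quotes and escapes the value
        lines.append(pad + indent + name + "=" + quoteattr(value))
    text = (elem.text or "").strip()
    children = list(elem)
    if not children and not text:
        # empty element: close it on the last line written so far
        lines[-1] += " />"
        return "\n".join(lines)
    lines[-1] += ">"
    if text:
        lines.append(pad + indent + escape(text))
    for child in children:
        lines.append(pretty(child, level + 1, indent))
        tail = (child.tail or "").strip()
        if tail:
            lines.append(pad + indent + escape(tail))
    lines.append(pad + "</" + elem.tag + ">")
    return "\n".join(lines)


if __name__ == "__main__":
    root = ET.fromstring(
        "<Node><ChildNode value1='5' value2='6' value3='happy' /></Node>")
    print(pretty(root))
```

Because the whole tree is loaded into memory first, this is simpler than the SAX handler but less suitable for very large files.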
newtover
@newtover: Can you provide a little more information? I'm sure that's true, but if I knew how, I wouldn't have asked the question. :)
Scott Stafford
+2  A: 

Try Tidy over on SourceForge. Although it's often used on [X]HTML, I've used it successfully on XML before - just make sure you use the -xml option.

http://tidy.sourceforge.net/docs/tidy_man.html

Tidy reads HTML, XHTML and XML files and writes cleaned up markup. ... For generic XML files, Tidy is limited to correcting basic well-formedness errors and pretty printing.

It has been ported to several platforms and is available as an executable and as a callable library.

Tidy has a heap of options including:

http://tidy.sourceforge.net/docs/quickref.html#indent-attributes

indent-attributes
Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
This option specifies if Tidy should begin each attribute on a new line.

One caveat:

Limited support for XML

XML processors compliant with W3C's XML 1.0 recommendation are very picky about which files they will accept. Tidy can help you to fix errors that cause your XML files to be rejected. Tidy doesn't yet recognize all XML features though, e.g. it doesn't understand CDATA sections or DTD subsets.

But I suspect unless your XML is really advanced, the tool should work fine.
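
If you want to script it, a rough sketch of driving Tidy from Python 3 might look like the following. It assumes a tidy executable is on your PATH. The -xml and -indent flags and the --indent-attributes option come from the Tidy documentation linked above, while the helper names tidy_command and run_tidy are made up for this example.

```python
import shutil
import subprocess


def tidy_command(path, exe="tidy"):
    """Build the argument list for pretty-printing an XML file with Tidy."""
    return [exe, "-xml", "-indent", "--indent-attributes", "yes", path]


def run_tidy(path):
    """Return Tidy's formatted output, or None if tidy is not on PATH."""
    exe = shutil.which("tidy")
    if exe is None:
        return None
    result = subprocess.run(tidy_command(path, exe),
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(tidy_command("input.xml")))
```

The same argument list can of course be typed directly at a command prompt instead of going through Python.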

Bert F
+2  A: 

Here's a small C# sample, which can be used directly by your code, or built into an exe and called at the command line as "myexe from.xml to.xml":

    using System;
    using System.Xml;

    static class Program {
        static void Main(string[] args) {
            // NewLineOnAttributes puts each attribute on its own line
            XmlWriterSettings settings = new XmlWriterSettings {
                NewLineHandling = NewLineHandling.Entitize,
                NewLineOnAttributes = true, Indent = true, IndentChars = "  ",
                NewLineChars = Environment.NewLine
            };
            using (XmlReader reader = XmlReader.Create(args[0]))
            using (XmlWriter writer = XmlWriter.Create(args[1], settings)) {
                // copy the whole document from the reader to the writer
                writer.WriteNode(reader, false);
                writer.Close();
            }
        }
    }

Sample input:

<Node><ChildNode value1='5' value2='6' value3='happy' /></Node>

Sample output (note you can remove the <?xml ... with settings.OmitXmlDeclaration):

<?xml version="1.0" encoding="utf-8"?>
<Node>
  <ChildNode
    value1="5"
    value2="6"
    value3="happy" />
</Node>

Note that if you want a string rather than a file, just write to a StringBuilder instead:

StringBuilder sb = new StringBuilder();
using (XmlReader reader = XmlReader.Create(new StringReader(oldXml)))
using (XmlWriter writer = XmlWriter.Create(sb, settings)) {
    writer.WriteNode(reader, false);
    writer.Close();
}
string newXml = sb.ToString();
Marc Gravell
This was what I needed, thanks. I didn't know the XmlWriterSettings existed, I was working with an XDocument which has a .Save and a very limited set of SaveOptions... once I knew C# had the XmlWriterSettings object and (obviously) NewLineOnAttributes, I was good to go.
Scott Stafford
+1  A: 

Here's a PowerShell script to do it. It takes the following input:

<?xml version="1.0" encoding="utf-8"?>
<Node>
    <ChildNode value1="5" value2="6" value3="happy" />
</Node>

...and produces this as output:

<?xml version="1.0" encoding="utf-8"?>
<Node>
  <ChildNode
    value1="5"
    value2="6"
    value3="happy" />
</Node>

Here you go:

param(
    [string] $inputFile = $(throw "Please enter an input file name"),
    [string] $outputFile = $(throw "Please supply an output file name")
)

$data = [xml](Get-Content $inputFile)

$xws = new-object System.Xml.XmlWriterSettings
$xws.Indent = $true
$xws.IndentChars = "  "
$xws.NewLineOnAttributes = $true

$writer = [Xml.XmlWriter]::Create($outputFile, $xws)
$data.Save($writer)
$writer.Close()   # release the file handle on the output file

Take that script, save it as C:\formatxml.ps1. Then, from a PowerShell prompt type the following:

C:\formatxml.ps1 C:\Path\To\UglyFile.xml C:\Path\To\NeatAndTidyFile.xml

This script is basically just using the .NET framework so you could very easily migrate this into a C# application.

NOTE: If you have not run scripts from PowerShell before, you will have to execute the following command at an elevated PowerShell prompt before you will be able to execute the script:

Set-ExecutionPolicy RemoteSigned

You only have to do this one time though.

I hope that's useful to you.

Damian Powell