I have an XML file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Results>
    <Row>
     <COL1></COL1>
     <COL2>25.00</COL2>
     <COL3>2009-07-06 15:49:34.984</COL3>
     <COL4>00001720</COL4>
    </Row>
    <Row>
     <COL1>RJ</COL1>
     <COL2>26.00</COL2>
     <COL3>2009-07-06 16:04:16.156</COL3>
     <COL4>00001729</COL4>
    </Row>
    <Row>
     <COL1>SD</COL1>
     <COL2>28.00</COL2>
     <COL3>2009-07-06 16:05:04.375</COL3>
     <COL4>00001721</COL4>
    </Row> 
</Results>

I have to convert this XML into a CSV file. I have heard this can be done using XSLT. How can I do this in Java (with or without XSLT)?

+1  A: 

With XSLT, you can use the JAXP interface to the XSLT processor and then use <xsl:text> in your stylesheet to produce text output. For example, this generates a newline:

<xsl:text>&#10;</xsl:text>
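Running a stylesheet through JAXP could look roughly like the sketch below. The class name `XsltToCsv`, the `transform` helper and the inline stylesheet are made up for illustration; only the `javax.xml.transform` calls are standard JDK API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToCsv {
    // Hypothetical minimal stylesheet: one line per Row, child values joined by commas.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/Results\"><xsl:apply-templates select=\"Row\"/></xsl:template>"
      + "<xsl:template match=\"Row\">"
      +   "<xsl:for-each select=\"*\">"
      +     "<xsl:value-of select=\".\"/>"
      +     "<xsl:if test=\"position() != last()\">,</xsl:if>"
      +   "</xsl:for-each>"
      +   "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML and returns the text output.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

For files on disk, `new StreamSource(new File(...))` and `new StreamResult(new File(...))` can be used instead of the string readers and writers. Note that this sketch does no CSV escaping, and the processor may write the newline using the platform line separator.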

Gerco Dries
+1  A: 

Read the XML file in.

Loop through each record and add it to a CSV file.

Derek Organ
Agreed. Using XSLT in this situation is overkill. No need to learn a new language just to output CSV from a format this simple.
Welbog
And how would we do that, Derek?
Rakesh Juyal
I don't think this is desperately helpful if you're not familiar with the available XML apis :-(
Brian Agnew
You forgot escaping (what if there is a comma in the data?).
bortzmeyer
+4  A: 

In pseudo code:

loop through the rows:
    loop through all children of `Row`:
        write out the text
        append a comma
    new line

That quick little loop will write a comma at the end of each line, but I'm sure you can figure out how to remove that.

For actually parsing the XML, I suggest using JDOM. It has a pretty intuitive API.
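As a rough sketch of that loop, here is a version that uses the DOM parser bundled with the JDK rather than JDOM, so it needs no extra library. The class name `DomToCsv` and its `escape` and `convert` helpers are invented for this example. It joins the fields instead of appending a trailing comma, and it quotes fields that contain commas, quotes or line breaks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomToCsv {
    // Quote a field only if it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    static String escape(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    static String convert(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder csv = new StringBuilder();
            NodeList rows = doc.getElementsByTagName("Row");
            for (int i = 0; i < rows.getLength(); i++) {
                List<String> fields = new ArrayList<>();
                NodeList children = rows.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    // Skip the whitespace text nodes between the COL elements.
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        fields.add(escape(child.getTextContent()));
                    }
                }
                // Joining avoids the trailing comma of the naive loop.
                csv.append(String.join(",", fields)).append("\n");
            }
            return csv.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

DOM loads the whole document into memory, which should be fine for small files like the one in the question.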

geowa4
I think the issue is the understanding of how to parse the XML, not so much the writing of the resultant values
Brian Agnew
now with parser suggestion goodness.
geowa4
You forgot escaping (what if there is a comma in the data?).
bortzmeyer
Hey, I can't do everything for him. Good comment, though.
geowa4
+1  A: 

Use the straightforward SAX API via the standard Java JAXP package. This will allow you to write a class that receives events for each XML element your reader encounters.

Briefly:

  1. read your XML in using SAX
  2. record text values via the SAX DefaultHandler characters() method
  3. when you get an end event for a COL element, record this string value
  4. when you get the Row end event, simply write out a comma-separated line of the previously recorded values
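A minimal handler along the lines of the steps above might look like this. The class name `SaxToCsv` and the `convert` helper are made up for the sketch, and it assumes the element names from the question (`Row`, `COL1`, `COL2`, ...) and values that need no CSV escaping.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxToCsv extends DefaultHandler {
    private final StringBuilder csv = new StringBuilder();   // finished CSV lines
    private final StringBuilder text = new StringBuilder();  // text of the current element
    private final List<String> row = new ArrayList<>();      // values of the current Row

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // discard whitespace seen between elements
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("Row")) {
            csv.append(String.join(",", row)).append("\n");
            row.clear();
        } else if (qName.startsWith("COL")) {
            row.add(text.toString());
        }
        text.setLength(0);
    }

    static String convert(String xml) {
        SaxToCsv handler = new SaxToCsv();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.csv.toString();
    }
}
```

Because SAX streams the document, this approach should also cope with files too large to hold in memory.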
Brian Agnew
@Brian: If it is possible, please give the example.
Rakesh Juyal
I'd have a look at the tutorials linked, and implement a simple DefaultHandler. When you run it, you'll see (in a debugger, or via print outs) how the event methods are called, and that should make it clear. Sorry - I can't easily post a sample
Brian Agnew
@Tomalak - did you comment on the wrong answer ?
Brian Agnew
Oops, you are right. Sorry. Deleting now. :-)
Tomalak
+4  A: 

Using XSLT is often a bad idea. Use Apache Commons Digester. It's fairly easy to use. Here's a rough idea:

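Digester digester = new Digester();

// Create a new MyRowHolder for every <Row> element.
digester.addObjectCreate("Results/Row", MyRowHolder.class);
// Call addCol() with the text content of <COL1> (0 = pass the element body as the argument).
digester.addCallMethod("Results/Row/COL1", "addCol", 0);
// Similarly for COL2, etc.
digester.parse("mydata.xml");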

This will create a MyRowHolder instance (a class you provide) for each <Row>. The class would have an addCol() method, which is called for each <COLn> with the contents of that tag.
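The holder class itself could be as small as the sketch below. Only the `addCol()` name comes from the snippet above; the `toCsvLine()` method is an assumption added for illustration, and how the finished rows get collected and written out is left open here.

```java
import java.util.ArrayList;
import java.util.List;

// Digester instantiates this class reflectively, so in a real project it would
// likely need to be a public class in its own file.
class MyRowHolder {
    private final List<String> cols = new ArrayList<>();

    // Called once per <COLn> element with that element's text.
    public void addCol(String value) {
        cols.add(value == null ? "" : value); // guard against a missing value
    }

    // Hypothetical helper: the collected values as one unescaped CSV line.
    public String toCsvLine() {
        return String.join(",", cols);
    }
}
```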

Vinay Sajip
"Using XSLT is often a bad idea" - May I ask why? :)
Tomalak
(a) performance, (b) hard to debug.
Vinay Sajip
Plus, the poster asked how to do it in Java WITHOUT XSLT. So I'm not sure why I got down-voted :-(
Vinay Sajip
Digester is underused. +1 for this
Brian Agnew
i added +1. i think this answer is good
VP
@Vinay Sajip: Agreed, XSLT is a bit harder to debug. However, converting the above to CSV is so trivial that the required XSLT would probably not need much debugging anyway.
Tomalak
@Vinay Sajip: Also, the question was with OR without XSLT, and generally seems to be in favor of XSLT even. ;-)
Tomalak
@Tomalak - You're right, I must've read "with/without" as "without".
Vinay Sajip
+2  A: 

In XSLT 1.0:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="ISO-8859-1" />

  <xsl:template match="/Results">
    <xsl:apply-templates select="Row" />  
  </xsl:template>

  <xsl:template match="Row">
    <xsl:apply-templates select="*" />
    <!-- newline after every row except the last -->
    <xsl:if test="position() != last()">
      <xsl:value-of select="'&#10;'" />
    </xsl:if>
  </xsl:template>

  <xsl:template match="Row/*">
    <xsl:value-of select="." />
    <!-- comma after every value except the last -->
    <xsl:if test="position() != last()">
      <xsl:value-of select="','" />
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

If your COL* values can contain commas, you could wrap the values in double quotes:

  <xsl:template match="Row/*">
    <xsl:value-of select="concat('&quot;', ., '&quot;')" />
    <!-- ... -->

If they can contain both commas and double quotes, things get a bit more complex because of the required escaping. You know your data, so you'll be able to decide how best to format the output. Using a different separator (e.g. TAB or a pipe symbol) is also an option.

Tomalak
P.S.: I'll leave it as an exercise for the reader to find a sample that shows how to use XSLT from within Java. It's not hard. :)
Tomalak