I'm designing a system that receives data from a number of partners as CSV files. The files may differ in the number and ordering of columns. For the most part, I will want to choose a subset of the columns, maybe reorder them, and hand them off to a parser. I would prefer to transform the incoming data into a canonical format first, so that the parser can be as simple as possible.

Ideally, I would like to generate a transformation for each incoming data format using some graphical tool and store the transformation as a document in a database or on disk. On receipt of data, I would apply the correct transformation (never mind how I determine which one is correct) to get an XML document in a canonical format. If the incoming files had contained XML, I would just have created an XSLT document for each format and been on my way.
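To make the idea concrete, here is a minimal sketch of such a stored transformation, written in Python purely for illustration. The JSON mapping layout, the element names, and the function `csv_to_canonical_xml` are all hypothetical, and the sketch assumes every incoming file has a header row naming its columns.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical per-partner mapping, stored as a JSON document on disk or in a
# database. Each entry maps a source column header to a canonical element
# name, listed in the order the canonical format expects.
MAPPING_JSON = """
{
  "record": "customer",
  "columns": [
    {"source": "Surname",  "target": "lastName"},
    {"source": "Forename", "target": "firstName"},
    {"source": "Mail",     "target": "email"}
  ]
}
"""

def csv_to_canonical_xml(csv_text, mapping):
    """Select and reorder CSV columns per `mapping`; return an XML string."""
    root = ET.Element("records")
    # DictReader uses the first line as column names (an assumption here).
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, mapping["record"])
        for column in mapping["columns"]:
            field = ET.SubElement(record, column["target"])
            field.text = row[column["source"]]
    return ET.tostring(root, encoding="unicode")

# A partner file with a different column order and one column we ignore.
partner_csv = "Mail,Forename,Unused,Surname\nann@example.com,Ann,x,Lee\n"
xml_text = csv_to_canonical_xml(partner_csv, json.loads(MAPPING_JSON))
print(xml_text)
```

Supporting a new partner then means writing a new mapping document rather than new code, which is the property I'm after.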

I've used BizTalk's Flat File XSLT Extensions (or whatever they are called) for something similar in the past, but I don't want the hassle of BizTalk (and I can't afford it either) on this project.

Does anyone know if there are alternative technologies and/or XSLT extensions which would enable me to achieve my goal in an elegant way?

I'm developing my app in C# on .NET 3.5 SP1 (thus would prefer technologies supported by .NET).

A: 

I think you need something like this (sorry, it is not a .NET tool, but the code is very simple):

http://csv2xml.sourceforge.net

Dmitry Khalatov
A: 

You can also take a look at Altova's MapForce.

Aaron Fischer
Altova MapForce wants to generate code to go from a flat file to an XML file. It doesn't do it purely with XSLT.
Stimy
A: 

XSLT 2.0 provides new features that make it easier to parse non-XML files.

Andrew Welch posted an XSLT 2.0 example that converts CSV into XML.

Mads Hansen
A: 

IIRC someone has created a "LINQ to CSV" library that might be a starting point to create the intermediate XML (in memory) as input into the transform.

Found it here.
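A minimal sketch of that intermediate step, in Python for illustration only (this is not the LINQ to CSV library's API; the function name `csv_to_generic_xml` and the `rows`/`row`/`field` element names are assumptions). It wraps every cell in a positional element without knowing anything about the layout, so the per-partner transform can do all the selecting and reordering:

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_generic_xml(csv_text):
    """Wrap every CSV cell in a positional <field>, ignoring the layout."""
    root = ET.Element("rows")
    # csv.reader handles quoted cells that contain commas.
    for cells in csv.reader(io.StringIO(csv_text)):
        row = ET.SubElement(root, "row")
        for index, value in enumerate(cells, start=1):
            field = ET.SubElement(row, "field", {"index": str(index)})
            field.text = value
    return root

tree = csv_to_generic_xml('id,name\n7,"Lee, Ann"\n')
generic_xml = ET.tostring(tree, encoding="unicode")
print(generic_xml)
```

A per-partner stylesheet could then address a column with an expression such as `field[@index='2']`.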

Richard
A: 

I have found 2 potential solutions when looking into a similar problem space.

Progress Software has a set of tools and a .NET API which, when used in conjunction with .conv (flat-to-XML converter) files created in their Stylus Studio tool, allow any pre-defined flat file format to be transformed into XML at run time. More info here: http://www.datadirect.com/developer/data-integration/tutorials/converter-sample-code/index.ssp

In addition, there is an XML format called XFLAT which describes flat files in a variety of formats (delimited, fixed width, etc.). There is a Java program which converts flat files for which you've provided an XFLAT description into XML, so that you can continue with a standard XML-to-XML XSLT transformation. More details can be found here: http://www.unidex.com/overview.htm

I have never actually used either of these tools, but found them when researching a similar problem.

Stimy
A: 

You might try LINQ to CSV. There is one offering from Microsoft's Eric White and another from Matt Perdeck. Others are out there...

rasx
A: 

Check out this article on implementing an XmlReader that processes non-XML input. It's not a terrifically difficult task, and once you've got it working you don't need an XSLT-like technology; you can just use XSLT.
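The same adapter idea can be sketched with Python's SAX interfaces, shown here only as an illustration of the pattern and not as the article's .NET code. The class name `CsvSaxReader` is hypothetical, and the sketch assumes the first CSV line holds names that are valid XML element names. The reader consumes CSV but reports it to its content handler as ordinary XML events, so anything downstream sees XML:

```python
import csv
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl, XMLReader

class CsvSaxReader(XMLReader):
    """Reads CSV text but reports it to the content handler as XML events."""

    def parse(self, source):
        # Assumption: `source` is a text file-like object, not an InputSource.
        handler = self.getContentHandler()
        no_attrs = AttributesImpl({})
        rows = csv.reader(source)
        header = next(rows)  # assumed to contain valid XML element names
        handler.startDocument()
        handler.startElement("rows", no_attrs)
        for cells in rows:
            handler.startElement("row", no_attrs)
            for name, value in zip(header, cells):
                handler.startElement(name, no_attrs)
                handler.characters(value)
                handler.endElement(name)
            handler.endElement("row")
        handler.endElement("rows")
        handler.endDocument()

# XMLGenerator serializes the events; any other SAX consumer would work too.
out = io.StringIO()
reader = CsvSaxReader()
reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
reader.parse(io.StringIO("name,city\nAnn,Oslo\n"))
sax_xml = out.getvalue()
print(sax_xml)
```

Because the events are streamed, the whole file never has to be held in memory as a tree.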

Robert Rossney
A: 

This will parse the output from the Linux ip route list command. It's just what I had lying around.

You must wrap the output from the command in an element called 'output' and the stylesheet will take it from there. The real key here is the tokenize function in the XPath 2.0 spec; I don't know how you could do this before that. Also, this doesn't produce a single root element, as that was not what I needed it for. In your case, instead of splitting on spaces, I'd split on ','.

<?xml version="1.0" encoding="UTF-8"?>

<!-- Needs an XSLT 2.0 processor: tokenize() and select on xsl:attribute are 2.0 features -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes" />

<!-- identity template: copy everything that is not handled below -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
</xsl:template>

<xsl:template match="//output">
    <!-- split things up for each new line -->
    <xsl:variable name="line" select="tokenize(.,'\n')"/>
    <xsl:for-each select="$line">
        <!-- split each line into pieces based on space -->
        <xsl:variable name="split" select="tokenize(.,' +')"/>
        <xsl:if test="count($split) &gt; 1">
            <xsl:element name="route">
                <xsl:for-each select="$split">
                    <xsl:choose>
                        <!-- the first token on a line is the route's address -->
                        <xsl:when test="position() = 1">
                            <xsl:attribute name="address" select="."/>
                        </xsl:when>
                        <xsl:otherwise>
                            <!-- remaining tokens are name/value pairs: even positions hold names -->
                            <xsl:variable name="index" select="position()"/>
                            <xsl:variable name="fieldName" select="."/>
                            <xsl:if test="$fieldName and position() mod 2 = 0">
                                <xsl:attribute name="{$fieldName}" select="$split[$index + 1]"/>
                            </xsl:if>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:for-each>
            </xsl:element>
        </xsl:if>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Jeremiah Jahn