I've got two documents - one is a custom XML file format, the other is an RSS feed with a bunch of custom extensions. I want to fill in fields in the XML file with values found in the RSS feed when one element value matches.

This is for an offline process that will be run a few times manually - it doesn't need to perform well, be all that fault tolerant, etc. Manual labor or intervention is fine.

My master XML document looks like this:

 <videos>
  <video>
   <title>First Video</title>
   <code>AAA123</code>
   <id>decaf-decaf-decaf-decaf</id>
   <description>lots of text here...</description>
  </video>
  <video>
   <title>Second Video with no code</title>
   <code></code>
   <id>badab-badab-badab-badab</id>
   <description>lots of text here...</description>
  </video>
 </videos>

The RSS feed is standard RSS with some extra fields on each item:

  <ns:code>AAA123</ns:code>
  <ns:type>Awesome</ns:type>
  <ns:group>Wonderful</ns:group>

I'd like to pull the extra fields from the RSS document into the XML document when a video's <code> value matches an item's <ns:code> value:

 <videos>
  <video>
   <title>First Video</title>
   <code>AAA123</code>
   <id>decaf-decaf-decaf-decaf</id>
   <description>lots of text here...</description>
   <type>Awesome</type>
   <group>Wonderful</group>
  </video>
  <video>
   <title>Second Video with no code</title>
   <code></code>
   <id>badab-badab-badab-badab</id>
   <description>lots of text here...</description>
   <type></type>
   <group></group>
  </video>
 </videos>

I'd most like to use C#, LINQ, or some kind of Excel-fu. I guess if I had to I could deal with XSLT, as long as it doesn't involve me writing much XSLT myself.

I looked at this question, but it didn't seem all that helpful for what I'm trying to do: http://stackoverflow.com/questions/80609/merge-xml-documents

+4  A: 

Sounds like a job for LINQ to XML!

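using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // for the XPathSelectElements extension method

// vidXml and rssXml are strings holding the two documents
var vidDoc = XDocument.Parse(vidXml);
var rssDoc = XDocument.Parse(rssXml);
var videos = vidDoc.XPathSelectElements("/videos/video");
var rssItems = rssDoc.XPathSelectElements("/rss/channel/item");

// pair each video with the RSS item that has the same code
// (assumes every video has a <code> and every item has an <ns:code>)
var matches = videos.Join(
    rssItems,
    video => video.Element(XName.Get("code")).Value,
    rssItem => rssItem.Element(XName.Get("code", "http://test.com")).Value,
    (video, item) => new {video, item});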

foreach (var match in matches)
{
    var children = match.item.Elements()
        .Where(child => child.Name.NamespaceName == "http://test.com" &&
                        child.Name.LocalName != "code");

    foreach (var child in children)
    {
        //remove the namespace
        child.Name = XName.Get(child.Name.LocalName);
        match.video.Add(child);
    }
}

vidDoc.Save(Console.Out);

The above solution assumes that the RSS document looks something like this:

<rss xmlns:ns="http://test.com" version="2.0">
  <channel>
    <item>
      <title>AAA123</title>
      <link>http://test.com/AAA123</link>
      <pubDate>Sun, 26 Jul 2009 23:59:59 -0800</pubDate>
      <ns:code>AAA123</ns:code>
      <ns:type>Awesome</ns:type>
      <ns:group>Wonderful</ns:group>
    </item>
  </channel>
</rss>
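
The desired output in the question also shows empty <type> and <group> elements on videos that have no match, which an inner join does not produce. Here is a minimal sketch of one way to get them, assuming the same http://test.com namespace as above and assuming that type and group are the only fields to copy. The class name, the fields array, and the inline sample documents are illustrative only, not part of the original solution.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class MergeWithPlaceholders
{
    static void Main()
    {
        // Sample inputs modeled on the documents in the question.
        var vidDoc = XDocument.Parse(
            @"<videos>
                <video><title>First Video</title><code>AAA123</code></video>
                <video><title>Second Video with no code</title><code></code></video>
              </videos>");
        var rssDoc = XDocument.Parse(
            @"<rss xmlns:ns='http://test.com' version='2.0'><channel><item>
                <title>AAA123</title>
                <ns:code>AAA123</ns:code>
                <ns:type>Awesome</ns:type>
                <ns:group>Wonderful</ns:group>
              </item></channel></rss>");

        XNamespace ns = "http://test.com";      // assumed namespace URI
        string[] fields = { "type", "group" };  // assumed list of fields to copy

        // Index RSS items by code; items without an ns:code element are skipped.
        var itemsByCode = rssDoc.Descendants("item")
            .Where(item => item.Element(ns + "code") != null)
            .ToLookup(item => item.Element(ns + "code").Value);

        foreach (var video in vidDoc.Root.Elements("video"))
        {
            var code = (string)video.Element("code") ?? "";
            // An empty code never matches, so that video only gets placeholders.
            var item = code.Length == 0 ? null : itemsByCode[code].FirstOrDefault();

            foreach (var field in fields)
            {
                // Copy the RSS value if there is one, otherwise add an empty element.
                var value = item == null ? "" : (string)item.Element(ns + field) ?? "";
                video.Add(new XElement(field, value));
            }
        }

        Console.WriteLine(vidDoc);
    }
}
```

Unlike the Join version, this copies only the text of each field and builds new elements instead of renaming the ones in the RSS document, so the RSS document is left untouched.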
Nathan Baulch
This looks great! I'll test tomorrow morning and mark as answer then.
Jon Galloway
+1  A: 

Add this to an XSLT identity transform (you'll also need to add the namespace declaration for the http://test.com namespace to the transform's top-level element):

<xsl:variable name="rss" select="document('rss.xml')"/>

<xsl:template match="video">
   <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <xsl:apply-templates select="$rss/rss/channel/item[ns:code=current()/code]/ns:*"/>
   </xsl:copy>
</xsl:template>

<!-- this keeps the code element from getting copied -->
<xsl:template match="ns:code"/>

<!-- this will copy all of the content of the ns:* elements, not just their text -->
<xsl:template match="ns:*">
   <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@* | node()"/>
   </xsl:element>
</xsl:template>

If you've already read the RSS into an XmlDocument in your program, you can pass it into the XSLT as a parameter instead of using the document() function to read it.
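
As a rough sketch of the parameter approach, the C# below loads a stylesheet in which `<xsl:variable name="rss" .../>` has been swapped for `<xsl:param name="rss"/>`, then hands the already-loaded RSS document to it through XsltArgumentList. The class name and inline sample documents are made up for illustration, and the stylesheet is a condensed version of the templates above sitting on a standard identity transform, with the video element copied explicitly and only the item's ns:* children selected.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

class MergeWithXsltParam
{
    // Identity transform plus the merge templates; $rss is supplied by the caller.
    const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:ns='http://test.com' exclude-result-prefixes='ns'>
  <xsl:param name='rss'/>
  <xsl:template match='@* | node()'>
    <xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='video'>
    <xsl:copy>
      <xsl:apply-templates select='@* | node()'/>
      <xsl:apply-templates select='$rss/rss/channel/item[ns:code=current()/code]/ns:*'/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match='ns:code'/>
  <xsl:template match='ns:*'>
    <xsl:element name='{local-name()}'>
      <xsl:apply-templates select='@* | node()'/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>";

    static void Main()
    {
        // Stand-ins for documents the program has already loaded.
        var videos = new XmlDocument();
        videos.LoadXml(
            "<videos><video><title>First Video</title><code>AAA123</code></video></videos>");

        var rss = new XmlDocument();
        rss.LoadXml(
            "<rss xmlns:ns='http://test.com' version='2.0'><channel><item>" +
            "<ns:code>AAA123</ns:code><ns:type>Awesome</ns:type>" +
            "<ns:group>Wonderful</ns:group></item></channel></rss>");

        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(reader);
        }

        // An XPathNodeIterator is passed so the parameter arrives as a node-set
        // that the stylesheet can navigate with $rss/rss/channel/item.
        var args = new XsltArgumentList();
        args.AddParam("rss", "", rss.CreateNavigator().Select("/"));

        xslt.Transform(videos, args, Console.Out);
    }
}
```

Passing the parameter this way avoids a second read of rss.xml from disk and keeps the stylesheet independent of where the feed came from.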

Robert Rossney
Thanks, this looks good. I'm going to stick with the C# approach since I'm more comfortable with it and it's easier (for me) to debug.
Jon Galloway