I have an XML file that looks like:

  <results>
    <result>
      <title>Welcome+to+The+JASON+Project%21</title>
      <url>http%3A%2F%2Fwww.jason.org%2F</url>
      <domain />
      <inside_links>
        <inside_link>
          <description>News</description>
          <url>http%3A%2F%2Fwww.jason.org%2FPublic%2FNews%2FNews.aspx</url>
        </inside_link>
        <inside_link>
          <description>register</description>
          <url>http%3A%2F%2Fwww.jason.org%2Fpublic%2Fregistration%2Fregistration.aspx</url>
        </inside_link>
        <inside_link>
          <description>Argonauts</description>
          <url>http%3A%2F%2Fwww.jason.org%2FPublic%2FArgonauts%2FArgonauts.aspx</url>
        </inside_link>
        <inside_link>
          <description>Curriculum</description>
          <url>http%3A%2F%2Fwww.jason.org%2FPublic%2FCurriculum%2FCurriculum.aspx</url>
        </inside_link>
        <inside_link>
          <description>Credits</description>
          <url>http%3A%2F%2Fwww.jason.org%2Fpublic%2FMisc%2FCredits.aspx</url>
        </inside_link>
      </inside_links>
      <inside_keywords>National+Science+Education+Standards, National+Geographic+Society, Physical+Science, Professional+Development, Earth+Science</inside_keywords>
    </result>
  </results>

...And I'm very confused as to how to read it. I simply want to get the Title, Description, and URL into separate strings. Something like:

foreach line in lines
string title = gettitle;
string description = getdescription;
string url = geturl;

...I've read so many tutorials, but none of them seem relevant to what I need to do. Can somebody please help me out with this?

+6  A: 

If you are using .NET 3.5, I'd suggest using LINQ to XML...

XDocument doc = XDocument.Load(filename);
XElement insideLinks = doc.Root.Element("result").Element("inside_links");
foreach (XElement insideLink in insideLinks.Elements())
{
    string description = (string)insideLink.Element("description");
    string url = (string)insideLink.Element("url");
}

This also lets you use the built-in "query" syntax so you could do something like this...

XDocument doc = XDocument.Load(filename);
XElement insideLinks = doc.Root.Element("result").Element("inside_links");
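// Each <inside_link> has <description> and <url> children (no <title>),
// so query the descriptions here.
var allDescriptions = from insideLink in insideLinks.Elements("inside_link")
                      select (string)insideLink.Element("description");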

(edited per comment)
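As a self-contained sketch of the same approach that also picks up the title of each result: the inline XML below is an abbreviated copy of the sample from the question, and the class and variable names are assumptions made for illustration. The values stay URL-encoded here; decoding is a separate step.

```csharp
using System;
using System.Xml.Linq;

class ReadResults
{
    static void Main()
    {
        // Abbreviated copy of the question's sample XML (one inside_link only).
        string xml = @"<results>
  <result>
    <title>Welcome+to+The+JASON+Project%21</title>
    <url>http%3A%2F%2Fwww.jason.org%2F</url>
    <inside_links>
      <inside_link>
        <description>News</description>
        <url>http%3A%2F%2Fwww.jason.org%2FPublic%2FNews%2FNews.aspx</url>
      </inside_link>
    </inside_links>
  </result>
</results>";

        XDocument doc = XDocument.Parse(xml);

        foreach (XElement result in doc.Root.Elements("result"))
        {
            // Casting to string yields null when the element is missing,
            // whereas .Value would throw a NullReferenceException.
            string title = (string)result.Element("title");
            string url = (string)result.Element("url");
            Console.WriteLine("{0} -> {1}", title, url);

            foreach (XElement link in result.Descendants("inside_link"))
            {
                string description = (string)link.Element("description");
                string linkUrl = (string)link.Element("url");
                Console.WriteLine("  {0} -> {1}", description, linkUrl);
            }
        }
    }
}
```

Looping over doc.Root.Elements("result") rather than taking the first result means the same code keeps working if the file contains more than one result.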

Chris Vig
+1 for L2XML. Would suggest casting to string instead of .Value to avoid null issues: (string)insideLink.Element("description")
dahlbyk
Thanks for pointing that out, I didn't know that was possible. (It also led me to a Google search about overloading cast operators, which I ALSO did not know was possible in C# :D)
Chris Vig
Glad to help! Not enough libraries provide smart casts so people don't think to use them, but XElement definitely does it right (string, value and nullable types).
dahlbyk
+2  A: 

Try this:

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("yourfile.xml");
foreach (XmlNode result in xmlDoc.SelectNodes("/results/result"))
{
    string title = result.SelectSingleNode("title").InnerText;
    string url = result.SelectSingleNode("url").InnerText;
    foreach (XmlNode insideLink in result.SelectNodes("inside_links/inside_link"))
    {
        string description = insideLink.SelectSingleNode("description").InnerText;
    }
}
Rubens Farias
Thank you for this, although I keep getting an error when debugging saying that there are multiple root elements in the XML file... do you know what this means?
baeltazor
Your XML isn't well-formed; it must have exactly one root element.
Rubens Farias
Thank you @Rubens :) I'll fix that up.
baeltazor
+5  A: 

To extend the LINQ to XML suggestion, you can use a select clause to create objects to represent the parsed links:

XDocument doc = XDocument.Load(filename);
var links = from link in doc.Descendants("inside_link")
            select new
            {
                Description = (string)link.Element("description"),
                Url = HttpUtility.UrlDecode((string)link.Element("url"))
            };

foreach(var l in links)
    Console.WriteLine("<a href=\"{0}\">{1}</a>", l.Url, l.Description);

In this case, links will be a sequence of objects that have an anonymous type with Description and Url properties, with Url decoded. This foreach would show something like this:

<a href="http://www.jason.org/Public/News/News.aspx">News</a>
<a href="http://www.jason.org/public/registration/registration.aspx">register</a>
...
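If adding a reference to System.Web is inconvenient (for example in a console project), one possible alternative is WebUtility.UrlDecode from System.Net. This is only a sketch and assumes .NET Framework 4.5 or later, which is newer than the .NET 3.5 mentioned elsewhere in this thread; the class name is made up for illustration.

```csharp
using System;
using System.Net;

class DecodeDemo
{
    static void Main()
    {
        // Sample values copied from the question's XML.
        // UrlDecode turns '+' into a space and %XX escapes into characters.
        string title = WebUtility.UrlDecode("Welcome+to+The+JASON+Project%21");
        string url = WebUtility.UrlDecode("http%3A%2F%2Fwww.jason.org%2F");

        Console.WriteLine(title); // Welcome to The JASON Project!
        Console.WriteLine(url);   // http://www.jason.org/
    }
}
```

Note that Uri.UnescapeDataString would decode the %XX escapes but leave the '+' characters in the title untouched, which is why a form-style decoder is used in this sketch.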
dahlbyk
Thank you so much @dahlbyk, but there is an error and I have absolutely no idea what it means (I've never done anything with LINQ or XML before)... Can you please help me figure out what this error means? It says "HttpUtility does not exist in the current context." Please help... +1
baeltazor
HttpUtility lives in System.Web - at the top of your file make sure you have: using System.Web;
dahlbyk
I actually did that, but I still get the same problem...
baeltazor
You need to add a reference, System.Web.dll, to your project.
Chansik Im
Yay! You did it! :D Thanks a lot, Chansik Im :D ...very appreciated. I was a little confused as to why you need to manually add a reference even after typing using System.Web, but a different question I found on S/O answered that for me.
baeltazor