Hi there, I'm familiar with writing and reading my own XML files, e.g. for settings, but now I need to read data from a huge XML file and I can't find my starting point.

<span class="mw-headline" id="Kader_der_Saison_2010.2F11.5B51.5D">
  Kader der Saison 2010/11
  <sup id="cite_ref-50" class="reference">
    <a href="#cite_note-50">[51]</a>
  </sup>
</span>
</h3>
<table class="wikitable" width="550px">
  <tr bgcolor="#DDDDDD">
    <th>Name</th>
    <th>Trikot</th>
    <th>Nationalität</th>
  </tr>
  <tr bgcolor="#EEEEEE">
    <th colspan="3" align="left">Torwart</th>
  </tr>
  <tr bgcolor="#FFFFFF">
    <td>
      <a href="/wiki/Manuel_Almunia" title="Manuel Almunia">Manuel Almunia</a>
    </td>
    <td align="center">1</td>
    <td align="center">
      <span style="display:none" class="sortkey">Spanien !</span>
      <a href="/wiki/Datei:Flag_of_Spain.svg" class="image" title="Spanier">
        <img alt="Spanier" src="http://upload.wikimedia.org/wikipedia/commons/thumb/9/9a/Flag_of_Spain.svg/20px-Flag_of_Spain.svg.png" width="20" height="13" class="thumbborder" />
      </a>
    </td>
  </tr>
  <tr bgcolor="#FFFFFF">
    <td>
      <a href="/wiki/%C5%81ukasz_Fabia%C5%84ski" title="Łukasz Fabiański">Łukasz Fabiański</a>
    </td>
    <td align="center">21</td>
    <td align="center">
      <span style="display:none" class="sortkey">Polen !</span>
      <a href="/wiki/Datei:Flag_of_Poland.svg" class="image" title="Pole">
        <img alt="Pole" src="http://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Flag_of_Poland.svg/20px-Flag_of_Poland.svg.png" width="20" height="13" class="thumbborder" />
      </a>
    </td>
  </tr>

As you can (maybe) see, I'm trying to read the names of all team members listed after "Kader_der_Saison", straight from Wikipedia. I need the title or the text of elements such as

<a href="/wiki/Manuel_Almunia" title="Manuel Almunia">Manuel Almunia</a>

to get the names Manuel Almunia, Łukasz Fabiański, etc.

I've tried a couple of approaches: XmlDocument.GetElementById (and lookup by name), XmlReader.NodeType, XmlReader.MoveToNextAttribute, XmlDocument.SelectNodes(xpath), and even a LINQ query on the document, but I can't get to the position of the names.

Any ideas how to find the "Kader_der_Saison" position and read the text of the <a> links that follow it?

Thanks

A: 

.NET has a really cool class called XmlSerializer, which essentially turns XML into an object (and back). It can be a hassle with deeply nested XML files, because you have to write a class for each kind of node, but I think it's the best thing since sliced bread.
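
As a minimal sketch of what that looks like, assuming a small XML document whose shape is known in advance: the `Settings` class, its properties, and the `SettingsDemo.Parse` helper below are hypothetical names made up for illustration, not something from the question.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class; XmlSerializer maps the <Settings> root element and
// its child elements onto public properties of the same name.
public class Settings
{
    public string UserName { get; set; }
    public int FontSize { get; set; }
}

public static class SettingsDemo
{
    // Deserializes an XML string into a Settings object.
    public static Settings Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Settings));
        using (var reader = new StringReader(xml))
        {
            return (Settings)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Settings s = Parse(
            "<Settings><UserName>Gpx</UserName><FontSize>12</FontSize></Settings>");
        Console.WriteLine(s.UserName + " " + s.FontSize); // prints "Gpx 12"
    }
}
```

This fits documents you control; for an arbitrary page with many unrelated elements, every element you care about would need a matching class.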

codersarepeople
I use this one to serialize my own Settings class, but how should I use it with the whole wiki page?
Gpx
A: 

This looks like HTML, not XML. Assuming that is correct, see this question.

If it really is XML (and someone chose really bad tag names), load it into an XmlDocument or XPathDocument and use XPath navigation to pick out the nodes by name.

I don't use XPathDocuments much, but with XmlDocument your code might look something like:

XmlDocument xDoc = new XmlDocument();
xDoc.Load(yourXml);  // file path, stream, or reader for your XML
var nodes = xDoc.SelectNodes(nodeName);  // nodeName is an XPath expression
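
For the table in the question, a rough sketch could look like the following. It assumes the page has already been converted to well-formed XML (for example by SgmlReader) and that its elements are in no XML namespace. The `SquadReader` class and `ReadNames` method are hypothetical names for illustration. The XPath selects the heading span by its id prefix, moves to the first table after it, and takes the link in the first cell of each row.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SquadReader
{
    // Returns the title attribute of the link in the first cell of every row
    // of the first table that follows the span whose id starts with
    // headingIdPrefix. The prefix is assumed to contain no quote characters.
    public static List<string> ReadNames(string xml, string headingIdPrefix)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        string xpath =
            "//span[starts-with(@id, '" + headingIdPrefix + "')]" +
            "/following::table[1]//tr/td[1]/a";

        var names = new List<string>();
        foreach (XmlElement link in doc.SelectNodes(xpath))
        {
            names.Add(link.GetAttribute("title"));
        }
        return names;
    }

    public static void Main()
    {
        // Cut-down stand-in for the converted Wikipedia page.
        string xml =
            "<body><h3><span id='Kader_der_Saison_2010.2F11.5B51.5D'>Kader</span></h3>" +
            "<table>" +
            "<tr><th>Name</th><th>Trikot</th></tr>" +
            "<tr><td><a href='/wiki/Manuel_Almunia' title='Manuel Almunia'>Manuel Almunia</a></td>" +
            "<td>1</td></tr>" +
            "</table></body>";

        foreach (string name in ReadNames(xml, "Kader_der_Saison"))
        {
            Console.WriteLine(name);
        }
    }
}
```

Header rows contain only th cells, so `td[1]/a` skips them, and the flag links sit in the third cell, so they are skipped as well. If the converted document declares the XHTML default namespace, the unprefixed element names in the XPath will not match anything; in that case an XmlNamespaceManager with a prefix bound to that namespace would be needed.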
AllenG
You're right, it's an HTML page. I parsed it with SgmlReader because I wanted to work with nodes (as I mentioned in my question, I already tried XmlDocument.SelectNodes(xpath)).
Gpx