Hi,

For personal use, I am trying to parse a small HTML page that shows the results of the French soccer championship in a simple grid.

using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

var url = "http://www.lfp.fr/mobile/ligue1/resultat.asp?code_jr_tr=J01";
WebRequest req = WebRequest.Create(url);
using (WebResponse result = req.GetResponse())
using (Stream receiveStream = result.GetResponseStream())
using (StreamReader sr = new StreamReader(receiveStream, Encoding.GetEncoding(0)))
{
    string line;
    // ReadLine returns null at end of stream. Calling sr.Read() in the loop
    // condition would consume the first character of every line.
    while ((line = sr.ReadLine()) != null)
    {
        // Strip tags and non-breaking spaces, then trim surrounding whitespace.
        line = Regex.Replace(line, @"<(.|\n)*?>", " ");
        line = line.Replace("&nbsp;", "");
        line = line.Trim();
    }
}

After that I really don't have a clue: should I process the page line by line or read the whole stream at once, and how do I retrieve only each team's name together with the number that follows it, which would be the score?

In the end I want to put both teams and their scores into a list or XML to use with a phone application.

If anyone has an idea, that would be great. Thanks!

+1  A: 

You could put the stream into an XmlDocument, allowing you to query via something like XPath. Or you could use LINQ to XML with an XDocument.

It's not perfect though, because HTML files aren't always well-formed XML (don't we know it!), but it's a simple solution using stuff already available in the framework.
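
A minimal sketch of the LINQ to XML route, assuming the page is well-formed and that each result row holds four cells (home team, home score, away score, away team). The sample markup, the team names, and the `XmlResultsSketch` and `ParseResults` names are all hypothetical; the real page's layout may well differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlResultsSketch
{
    // Hypothetical, well-formed sample; not the real page's markup.
    public const string SampleXhtml =
        "<table>" +
        "<tr><td>Team A</td><td>2</td><td>1</td><td>Team B</td></tr>" +
        "<tr><td>Team C</td><td>0</td><td>0</td><td>Team D</td></tr>" +
        "</table>";

    // Returns one "Home h-a Away" string per row that has exactly four cells.
    public static List<string> ParseResults(string xhtml)
    {
        // XDocument.Parse throws XmlException when the input is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants("tr")
            .Select(tr => tr.Elements("td").Select(td => td.Value.Trim()).ToList())
            .Where(cells => cells.Count == 4)
            .Select(cells => cells[0] + " " + cells[1] + "-" + cells[2] + " " + cells[3])
            .ToList();
    }
}
```

If the real page is not well-formed, `XDocument.Parse` will throw, which is the limitation noted above.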

Neil Barnwell
This assumes the HTML is well-formed XML, which is a long shot.
Erik Forbes
Ha! I just edited to make a note of that, and when the screen came back - I saw this comment!
Neil Barnwell
Our edits crossed paths like two ships passing in the night... =P
Erik Forbes
+7  A: 

Take a look at Html Agility Pack

adatapost
I was just about to suggest this.
Erik Forbes
+1 sixth Don't Parse HTML With Regex question of the day bonus
bobince
+1 - easy to use and very powerful.
TrueWill
A: 

You'll need an SgmlReader, which provides an XML-like API over any SGML document (which an HTML document really is).

Anton Gogolev
A: 

You could use the Regex.Match method to pull out the team name and score. Examine the HTML to see how each row is built up, then write a pattern that matches one row. This is a common technique in screen scraping.
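
A rough sketch of what that might look like, assuming each row is laid out as four plain `<td>` cells (home team, home score, away score, away team). That layout, the team names, and the `RegexResultsSketch` and `ParseRow` names are hypothetical; the pattern would need adjusting to whatever the real rows contain, and it will break if the markup changes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexResultsSketch
{
    // Hypothetical row layout: home team, home score, away score, away team.
    private static readonly Regex RowPattern = new Regex(
        @"<td>(?<home>[^<]+)</td>\s*<td>(?<homeScore>\d+)</td>\s*" +
        @"<td>(?<awayScore>\d+)</td>\s*<td>(?<away>[^<]+)</td>",
        RegexOptions.IgnoreCase);

    // Returns "Home h-a Away", or null when the row does not match the pattern.
    public static string ParseRow(string rowHtml)
    {
        Match m = RowPattern.Match(rowHtml);
        if (!m.Success) return null;
        return string.Format("{0} {1}-{2} {3}",
            m.Groups["home"].Value.Trim(), m.Groups["homeScore"].Value,
            m.Groups["awayScore"].Value, m.Groups["away"].Value.Trim());
    }
}
```

Named groups keep the extraction readable, and returning null for non-matching rows makes it easy to skip header or spacer rows.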

JoeCh
And smoking is a common technique for relieving stress. It doesn't mean it's a good idea, or that it works in the long term. ;)
TrueWill
Well, smoking is always bad for your health, but I wouldn't say the Match method is always bad in a case like this; I'm not sure of his needs. It's nice to know what all the options are, bad or good, before you make a choice.
JoeCh