I am loading a specific web page in a WebBrowser control. Is there a way to take the following HTML, which appears within that page, save it as a string, and trim it down?

Here's an example:

HTML Snippet:

<div class="alertText">26 friends joined</div>

Trimmed:

26

I'm sorry for the very vague description, but I'm not really sure how to word this. Thank you.

A: 

Do you mean something like this:

string numberOfFriends = null; // stays null when no matching element is found

HtmlElementCollection elems = webBrowser1.Document.GetElementsByTagName( "div" );
foreach( HtmlElement elem in elems )
{
  // The DOM exposes the HTML "class" attribute as "className".
  string className = elem.GetAttribute( "className" );
  if( "alertText".Equals( className ) )
  {
    // Match once and reuse the result; guard against a null InnerText.
    Match match = Regex.Match( elem.InnerText ?? string.Empty, @"(\d+) friends joined" );
    if( match.Success )
    {
      numberOfFriends = match.Groups[ 1 ].Value;
    }
  }
}

I am not entirely sure the regex is completely correct, but the rest should work.

Edit: Changed Groups[ 0 ] to Groups[ 1 ] - IIRC first group is entire match.

Edit 2: Changed elem.GetAttribute( "class" ) to elem.GetAttribute( "className" ) - fixed name of attribute and fixed variable name (class to className).
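
The regex step can be tried on its own, without a WebBrowser control. Below is a minimal console sketch, assuming the element text looks like the sample in the question. The class `FriendCountDemo` and the helper `ExtractCount` are made-up names for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class FriendCountDemo
{
    // Hypothetical helper: returns the number from text such as
    // "26 friends joined", or null when the text does not match.
    static string ExtractCount(string innerText)
    {
        Match m = Regex.Match(innerText ?? string.Empty, @"(\d+) friends joined");
        // Groups[0] is the entire match; Groups[1] is the first capture group.
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(ExtractCount("26 friends joined"));          // prints 26
        Console.WriteLine(ExtractCount("no match here") ?? "(none)");  // prints (none)
    }
}
```

Returning null for a non-match keeps "element not found" distinguishable from a real count; whether that suits the calling code is a design choice.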

Majkel
Doesn't seem to work.
Nate Shoffner
Which part? `class` is a reserved word. I will check the rest when I am at my computer.
Majkel
OK, now it works - attribute name was wrong.
Majkel
+1  A: 

Why not just search the HTML with regex right off the bat, instead of enumerating HtmlElement types?

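// HtmlDocument has no documentElement property; read the markup from Body instead.
string html = webBrowser1.Document.Body.OuterHtml;
string pattern = @"<div class=""alertText"">(\d{1,2}) friends joined</div>";
int friendsJoined = 0; // stays 0 when nothing matches
foreach (Match m in Regex.Matches(html, pattern))
{
    friendsJoined = Convert.ToInt32(m.Groups[1].Value);
}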

If you want the scraping to be less dependent on the exact markup, you could drop the outer bits:

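// HtmlDocument has no documentElement property; read the markup from Body instead.
string html = webBrowser1.Document.Body.OuterHtml;
string pattern = @">(\d{1,2}) friends joined</";
int friendsJoined = 0; // stays 0 when nothing matches
foreach (Match m in Regex.Matches(html, pattern))
{
    friendsJoined = Convert.ToInt32(m.Groups[1].Value);
}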
T. Stone
Doesn't seem to work.
Nate Shoffner
Need a little more detail than that.
T. Stone
There is no `documentElement` property on the `Document` of a `WebBrowser`. You would have to use either `webBrowser1.Document.Body.OuterHtml` or the unmanaged mshtml interface through `webBrowser1.Document.DomDocument`.
Majkel
A: 

I would say that this is a better regex match:

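// HtmlDocument has no documentElement property; read the markup from Body instead.
string html = webBrowser1.Document.Body.OuterHtml;
string pattern = @"(\d+)\sfriends\sjoined";
int friendsJoined = 0; // stays 0 when nothing matches
foreach (Match m in Regex.Matches(html, pattern))
{
    friendsJoined = Convert.ToInt32(m.Groups[1].Value);
}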
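
To show why `\d+` is the safer choice, here is a small console sketch that runs both patterns against an assumed sample with a three-digit count. The markup string and the class name `PatternComparison` are illustrative and not taken from the real page.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternComparison
{
    static void Main()
    {
        // Assumed sample: the markup from the question, but with a three-digit count.
        string html = "<div class=\"alertText\">126 friends joined</div>";

        // At most two digits are allowed directly after '>', so this fails on "126".
        Match narrow = Regex.Match(html, @">(\d{1,2}) friends joined</");

        // Any run of digits is accepted, so the full number is captured.
        Match wide = Regex.Match(html, @"(\d+)\sfriends\sjoined");

        Console.WriteLine(narrow.Success);        // prints False
        Console.WriteLine(wide.Groups[1].Value);  // prints 126
    }
}
```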
Casper Broeren