<div> 

      <a href="http://website/forum/f80/ThreadLink-new/" id="thread_gotonew_565407"><img class="inlineimg" src="http://website/forum/images/buttons/firstnew.gif" alt="Go to first new post" border="0" /></a> 



      [MULTI]
      <a href="http://website/forum/f80/ThreadLink/" id="thread_title_565407" style="font-weight:bold">THREAD TITLE</a> 

     </div>

I know for a fact that the link I am interested in is going to be bold:

font-weight:bold

But the link address (the href) comes before the style attribute. How can I match both the link address:

http://website/forum/f80/ThreadLink/

and the thread title:

THREAD TITLE

EDIT: Internet Explorer HTML code is very different:

  <A style="FONT-WEIGHT: bold" id=thread_title_565714 
      href="http://LinkAddress-565714/">ThreadTitle</A> </DIV>
+2  A: 

Try this:

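<A style="FONT-WEIGHT: bold" id=(?<id>\S+)\s+href="(?<url>.*?)">(?<title>.*?)</A>

So you can use:

// Requires: using System; using System.Text.RegularExpressions;
// "input" is a string holding the HTML to search.
// The id group uses \S+ because a lazy .*? followed by [\s\S]*? would always capture an empty id.
Regex link = new Regex(@"<A style=""FONT-WEIGHT: bold"" id=(?<id>\S+)\s+href=""(?<url>.*?)"">(?<title>.*?)</A>");
foreach (Match match in link.Matches(input))
{
    // Each named group is read back by the name used in the pattern.
    Console.WriteLine(
        "Id={0}, Url={1}, Title={2}",
        match.Groups["id"].Value,
        match.Groups["url"].Value,
        match.Groups["title"].Value);
}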
Rubens Farias
Thanks. Also, if the link had the form linkaddress-id, would it be possible to fit that into the regex match, so that I get an additional group without breaking the other groups? So: full link, title, link id (the digits after the hyphen, as in linkaddress-1234).
Joan Venge
Please see the edited answer; is that it?
Rubens Farias
Thanks Rubens, will check it out now.
Joan Venge
Hi Rubens, it looks like the HTML I get in Chrome (the one in my post) is VERY different from what IE produces, and IE is what the WinForms browser control uses internally. So I added the IE version to the post. I am trying to adapt your pattern to it, but so far it has 0 matches. If you can add it to your post, I would appreciate it, thanks.
Joan Venge
I edited my answer; please check it again.
Rubens Farias
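For the extra link id group asked about in the comments, one possible sketch is to nest a second named group inside the url group, so the full link and the digits after the hyphen are captured in one pass. The group name linkid, the class name LinkIdDemo, and the use of \S+ for the unquoted id attribute are assumptions made for illustration, and the input is the IE markup from the question:

```csharp
using System;
using System.Text.RegularExpressions;

static class LinkIdDemo
{
    // Hypothetical pattern: "linkid" is nested inside "url", so both are captured together.
    // [^""]*? stays inside the href value; -(?<linkid>\d+)/? takes the digits after the hyphen.
    public static readonly Regex Link = new Regex(
        @"<A style=""FONT-WEIGHT: bold"" id=(?<id>\S+)\s+href=""(?<url>[^""]*?-(?<linkid>\d+)/?)"">(?<title>[^<]*)</A>");

    static void Main()
    {
        // IE-style markup from the question, including the line break before href.
        string input =
            "<A style=\"FONT-WEIGHT: bold\" id=thread_title_565714 \n" +
            "      href=\"http://LinkAddress-565714/\">ThreadTitle</A> </DIV>";

        foreach (Match m in Link.Matches(input))
        {
            Console.WriteLine(
                "Url={0}, LinkId={1}, Title={2}",
                m.Groups["url"].Value,
                m.Groups["linkid"].Value,
                m.Groups["title"].Value);
        }
        // Prints: Url=http://LinkAddress-565714/, LinkId=565714, Title=ThreadTitle
    }
}
```

Because the groups are named, adding linkid does not change how id, url, or title are read back.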
+4  A: 
.*<a href="(.*?)".*style="font-weight:bold">(.*?)</a>

Match group 1: URL
Match group 2: Thread title

This will match any bold link. If you want to match a particular one, replace the (.*?) with those values.
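As a rough sketch of how the two numbered groups could be read in C#, assuming the Chrome-style markup from the question with each link on its own line (the class name BoldLinkDemo and the trimmed-down img tag are made up for the example):

```csharp
using System;
using System.Text.RegularExpressions;

static class BoldLinkDemo
{
    // The pattern from this answer, written as a C# verbatim string.
    public static readonly Regex BoldLink = new Regex(
        @".*<a href=""(.*?)"".*style=""font-weight:bold"">(.*?)</a>");

    static void Main()
    {
        // Without RegexOptions.Singleline the dot does not match '\n',
        // so the non-bold link on the first line cannot pair with the bold style on the third.
        string html =
            "<a href=\"http://website/forum/f80/ThreadLink-new/\" id=\"thread_gotonew_565407\"><img class=\"inlineimg\" /></a>\n" +
            "[MULTI]\n" +
            "<a href=\"http://website/forum/f80/ThreadLink/\" id=\"thread_title_565407\" style=\"font-weight:bold\">THREAD TITLE</a>";

        Match m = BoldLink.Match(html);
        if (m.Success)
        {
            // Groups[1] is the URL, Groups[2] is the thread title.
            Console.WriteLine("Url={0}, Title={1}", m.Groups[1].Value, m.Groups[2].Value);
        }
        // Prints: Url=http://website/forum/f80/ThreadLink/, Title=THREAD TITLE
    }
}
```

This relies on the line breaks in the input; if both links sat on one line, the .* in the middle could pair the first href with the bold title.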

skalburgi
Thanks. Also, if the link had the form linkaddress-id, would it be possible to fit that into the regex match, so that I get an additional group without breaking the other groups? So: full link, title, link id (the digits after the hyphen, as in linkaddress-1234).
Joan Venge
I don't think we need the initial .* or either of the question mark symbols.
Joel
Rubens pointed out that the question mark symbols are necessary. Please disregard my comment.
Joel
A: 
<a href="([^"]*)"[^>]*style="[^"]*font-weight:bold[^"]*"[^>]*>([^<]*)</a>

Much the same as the previous answers, except I've replaced their .* with [^"]* etc. In the first group, this prevents it from matching anything beyond the next double-quote symbol. Without this, you could match too much in cases where the input looks like this:

<a href="#dont_match_me">Don't match me</a><br/>
<a href="http://website/forum/f80/ThreadLink/" style="font-weight:bold">THREAD TITLE</a>
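To make the difference concrete, here is a small sketch that runs a dot-based pattern and the negated-class pattern over the same two links. It assumes both links sit on one line with both href values properly quoted (with a line break between them and no RegexOptions.Singleline, the dot-based pattern would not cross over). The names OverMatchDemo and FirstUrl are invented for the example:

```csharp
using System;
using System.Text.RegularExpressions;

static class OverMatchDemo
{
    // Returns capture group 1 (the URL) of the first match, or "" if nothing matches.
    public static string FirstUrl(string pattern, string html)
    {
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        // Both links on a single line (assumption for this demo).
        string html =
            "<a href=\"#dont_match_me\">Don't match me</a><br/>" +
            "<a href=\"http://website/forum/f80/ThreadLink/\" style=\"font-weight:bold\">THREAD TITLE</a>";

        // Dot-based pattern: the .* in the middle may run past the first link's closing tag.
        string dotBased = @"<a href=""(.*?)"".*style=""font-weight:bold"">(.*?)</a>";

        // Negated classes: [^""]* cannot pass a quote and [^>]* cannot leave the opening tag.
        string negated = @"<a href=""([^""]*)""[^>]*style=""[^""]*font-weight:bold[^""]*""[^>]*>([^<]*)</a>";

        Console.WriteLine(FirstUrl(dotBased, html)); // #dont_match_me
        Console.WriteLine(FirstUrl(negated, html));  // http://website/forum/f80/ThreadLink/
    }
}
```

The dot-based pattern pairs the first link's href with the bold link's title, while the negated-class version skips the first link entirely.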
Joel
"(.*?)" is a non-greedy match pattern; it means "find a quote and grab the smallest piece of text before the next quote", so in this case it's the same as your [^"]*
Rubens Farias
Just to clarify, [^"] means any character except a double-quote, whereas . means any character. This means the dot could match past the closing double-quote, so you could end up matching too much.
Joel
Oh, ok. Thanks Rubens.
Joel