Hi. I want to extract the URLs from the href attributes of a web page. To match the URL I am using this regex pattern: "(?(http:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)"

To extract the href from the HTML I used this pattern: @"href=\""(?[^\""#]?(?=[\""#]))(?(?#{2}[^#]?#{2})*)(?#[^""]+)?"""

The problem is that it does not extract the full URL from the href for URLs like "www.seo-sem.com". In the result I only get "www.seo"; everything after the hyphen is truncated. Could you please suggest a better regex pattern for extracting the URL from an href? Thanks in advance.

+4  A: 

Use the HTML Agility Pack to parse your HTML. You can query it using XPath, because it parses the HTML into an XmlDocument-like object.
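As a rough illustration of that approach, here is a minimal sketch that assumes the HtmlAgilityPack NuGet package is referenced. The class name `HrefExtractor`, the method `ExtractHrefs`, and the sample HTML are made up for this example; they are not from the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class HrefExtractor
{
    // Hypothetical helper: returns the raw href value of every <a> element that has one.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return hrefs;
        }

        foreach (HtmlNode anchor in anchors)
        {
            // The second argument is the default used if the attribute is missing.
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }

        return hrefs;
    }

    public static void Main()
    {
        // Made-up sample page, including a hyphenated host like the one in the question.
        const string sampleHtml =
            "<html><body>" +
            "<a href=\"http://www.seo-sem.com/index.html\">SEO</a>" +
            "<a href=\"/about#team\">About</a>" +
            "</body></html>";

        foreach (string href in ExtractHrefs(sampleHtml))
        {
            Console.WriteLine(href);
        }
    }
}
```

Because the parser hands back the whole attribute value, characters such as hyphens in the host name need no special handling. If absolute URLs are needed, each value could then be resolved against the page address with `new Uri(baseUri, href)`.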

See this for reasons not to parse HTML with regular expressions.

Oded
I resolved the hyphen issue by editing the regex. Thanks anyway, you all rock. Keep it up.
jaskirat