+1  A: 

I would advise you to try an HTML parser rather than regular expressions. It's going to be less error-prone for all but the simplest cases, because HTML is not a regular language and so is not a suitable candidate for regular expressions.

Brian Agnew
Brian, you are, of course, right. I've used the HTML Agility Pack parser in the past with great success: http://htmlagilitypack.codeplex.com/
Hristo Deshev
I see. But can the HTML Agility Pack do the same task? I mean, is it as powerful as regex?
Joseph Ghassan
@Joseph: Brace yourself for this news: a full-blown parser is _WAAAYYY_ more powerful than regex. _WAAAAYYYYYY_ more.
polygenelubricants
A: 

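You don't state clearly whether you'll have other (unwanted) <a> tags, but to get all opening <a> tags, you could try a regex like "<a[^>]*>". Note that this also matches other tags whose names start with "a", such as <abbr>; adding a word boundary, as in "<a\b[^>]*>", avoids that.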
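For illustration, here is a minimal sketch of how that pattern could be applied with .NET's Regex class. The sample HTML string is made up for the example, since the question's real markup isn't shown, and the variable names are mine.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input; not taken from the original question.
string html = "<p><a href=\"one.asp\">One</a> and <a class=\"x\" href=\"two.asp\">Two</a></p>";

// <a[^>]*> : "<a", then any run of characters other than ">", then ">".
string[] openingTags = Regex.Matches(html, @"<a[^>]*>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string tag in openingTags)
{
    Console.WriteLine(tag);
}
```

This only collects the opening tags as text; pulling out attribute values reliably is where a parser starts to pay off.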

Hristo Deshev
Works great! Thanks. How can I do the same thing using the HTML Agility Pack? I mean, regex is more powerful, I guess.
Joseph Ghassan
Hristo Deshev
A: 

Regex is not the best tool for the job, but you can in fact use regex to match strings in this pattern:

<a href="News_ViewStory\.asp\?NewsID=\d{4}">

As an @-quoted (verbatim) C# string literal, this is:

@"<a href=""News_ViewStory\.asp\?NewsID=\d{4}"">"

The \d is the shorthand for the digit character class. {4} is exact finite repetition. Thus, \d{4} means "exactly 4 digits".
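As a rough sketch of how the @-quoted literal above might be used, the snippet below compiles it into a Regex and tests it against two sample tags. The sample inputs and variable names are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// In a verbatim (@"...") literal, "" stands for one double quote,
// and backslashes reach the regex engine unchanged.
var storyLink = new Regex(@"<a href=""News_ViewStory\.asp\?NewsID=\d{4}"">");

// Hypothetical sample inputs.
bool fourDigits = storyLink.IsMatch("<a href=\"News_ViewStory.asp?NewsID=1234\">");
bool threeDigits = storyLink.IsMatch("<a href=\"News_ViewStory.asp?NewsID=123\">");

Console.WriteLine(fourDigits);   // True
Console.WriteLine(threeDigits);  // False: \d{4} needs exactly four digits before the closing quote
```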

If you want to allow a different numeric pattern, you may use e.g. \d{2,6}. This allows anywhere between 2 and 6 digits, inclusive. You can also use \d+ to allow at least one digit, with no upper bound.
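To make the difference between the three quantifiers concrete, here is a small sketch. IsWholeMatch is a hypothetical helper written for this example; it anchors the pattern with ^ and $ so the entire input has to consist of the digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: anchors the digit pattern so the whole input must match it.
static bool IsWholeMatch(string input, string digitPattern) =>
    Regex.IsMatch(input, "^" + digitPattern + "$");

Console.WriteLine(IsWholeMatch("1234", @"\d{4}"));      // True:  exactly 4 digits
Console.WriteLine(IsWholeMatch("12345", @"\d{4}"));     // False: 5 digits
Console.WriteLine(IsWholeMatch("12345", @"\d{2,6}"));   // True:  between 2 and 6 digits
Console.WriteLine(IsWholeMatch("1", @"\d{2,6}"));       // False: fewer than 2 digits
Console.WriteLine(IsWholeMatch("1234567890", @"\d+"));  // True:  one or more digits
Console.WriteLine(IsWholeMatch("", @"\d+"));            // False: at least one digit is required
```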

Note that the . and the ? are preceded by a backslash in the above pattern. That's because they are regex metacharacters with special meanings: the dot matches (almost) any character, and the ? is the optional repetition specifier (zero or one of the preceding element). Escaping removes these special meanings, so they match a literal period and question mark.
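Here is a short sketch of what goes wrong without the backslashes, using only the file-name part of the pattern. The test strings are invented for the example. Regex.Escape is the standard .NET method for escaping a literal string so you don't have to do it by hand.

```csharp
using System;
using System.Text.RegularExpressions;

string unescaped = @"News_ViewStory.asp?NewsID=";
string escaped = @"News_ViewStory\.asp\?NewsID=";

// Unescaped: "." matches any character and "p?" makes the "p" optional,
// so this unintended string matches...
Console.WriteLine(Regex.IsMatch("News_ViewStoryXasNewsID=", unescaped));   // True
// ...while the real URL does not, because nothing in the pattern consumes the literal "?".
Console.WriteLine(Regex.IsMatch("News_ViewStory.asp?NewsID=", unescaped)); // False

// Escaped: only the literal text matches.
Console.WriteLine(Regex.IsMatch("News_ViewStory.asp?NewsID=", escaped));   // True
Console.WriteLine(Regex.IsMatch("News_ViewStoryXasNewsID=", escaped));     // False

// Regex.Escape produces the escaped form from the literal text.
string autoEscaped = Regex.Escape("News_ViewStory.asp?NewsID=");
Console.WriteLine(autoEscaped == escaped);                                 // True
```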

Whether or not strings matching these patterns are exactly the HTML tags that you want is an entirely different issue. Parsing HTML with regex is generally not recommended.

polygenelubricants