I would advise you to use an HTML parser rather than regexes. It will be less error-prone for all but the simplest cases, because HTML is not a regular language and so is not a good fit for regular expressions.
You don't say whether there will be other (unwanted) <a> tags, but to match the opening of every <a> tag you could try a regex like "<a[^>]*>".
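A minimal sketch of that suggestion as a small top-level C# program, assuming the HTML is already in a string (the sample markup is made up). It also shows one caveat: "<a[^>]*>" matches any tag that merely starts with "<a", such as <abbr> or <area>. Adding a \b word boundary after the "a" is one possible tweak, not something the original suggestion included.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample markup: two links plus an <abbr> element.
string html = "<p>See <a href=\"a.html\">A</a>, <abbr>HTML</abbr> and <a class=\"x\" href=\"b.html\">B</a>.</p>";

// "<a[^>]*>": the literal "<a", then any run of characters other than '>', then '>'.
// Closing tags like </a> are not matched, because '/' follows the '<'.
MatchCollection loose = Regex.Matches(html, "<a[^>]*>");

// Assumed refinement: \b requires a word boundary right after "a",
// so <abbr> (where "a" is followed by "b") no longer matches.
MatchCollection strict = Regex.Matches(html, @"<a\b[^>]*>");

foreach (Match m in loose) Console.WriteLine("loose:  " + m.Value);
foreach (Match m in strict) Console.WriteLine("strict: " + m.Value);
```

The loose pattern finds three tags (including <abbr>); the stricter one finds only the two <a> openings.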
Regex is not the best tool for the job, but you can in fact use regex to match strings of this form:
<a href="News_ViewStory\.asp\?NewsID=\d{4}">
As an @-quoted (verbatim) C# string literal, in which each " of the pattern is doubled to "", this is:
@"<a href=""News_ViewStory\.asp\?NewsID=\d{4}"">"
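For example, a sketch of using that literal with Regex.Matches, assuming the page source is already in a string (the three sample links are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// The verbatim literal from above: backslashes are kept as-is and "" stands for a single ".
const string pattern = @"<a href=""News_ViewStory\.asp\?NewsID=\d{4}"">";

// Hypothetical page source.
string html =
    "<a href=\"News_ViewStory.asp?NewsID=1234\">Story</a>\n" +
    "<a href=\"News_ViewStory.asp?NewsID=99\">Short ID</a>\n" +
    "<a href=\"About.asp\">About</a>";

MatchCollection matches = Regex.Matches(html, pattern);
foreach (Match m in matches)
    Console.WriteLine(m.Value); // <a href="News_ViewStory.asp?NewsID=1234">
```

Only the first link is reported: the second has a two-digit ID and the third points at a different page.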
The \d is shorthand for the digit character class, and {4} is exact finite repetition. Thus \d{4} means "exactly 4 digits".
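A quick illustration of what "exactly 4 digits" does and does not guarantee (the inputs are hypothetical). On its own, \d{4} will happily match the first four digits of a longer number; it is the surrounding context that rules longer IDs out. In the tag pattern, the literal " right after \d{4} plays that role, much as the ^ and $ anchors do here.

```csharp
using System;
using System.Text.RegularExpressions;

string[] inputs = { "123", "1234", "12345" };

foreach (string s in inputs)
{
    // Unanchored: succeeds if any four consecutive digits appear somewhere in s.
    bool found = Regex.IsMatch(s, @"\d{4}");
    // Anchored with ^...$: succeeds only if the whole of s is four digits.
    bool exact = Regex.IsMatch(s, @"^\d{4}$");
    Console.WriteLine($"{s}: found={found}, exact={exact}");
}
```

Here "12345" is found by the unanchored pattern but rejected by the anchored one.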
If you want to allow a different number of digits, you can use e.g. \d{2,6}, which allows anywhere from 2 to 6 digits, inclusive. You can also use \d+ to allow one or more digits, with no upper bound.
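A sketch comparing the three quantifiers inside the full tag pattern; the IDs being tested are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The same tag pattern with three different quantifiers on the ID.
string[] patterns =
{
    @"<a href=""News_ViewStory\.asp\?NewsID=\d{4}"">",   // exactly 4 digits
    @"<a href=""News_ViewStory\.asp\?NewsID=\d{2,6}"">", // 2 to 6 digits
    @"<a href=""News_ViewStory\.asp\?NewsID=\d+"">",     // 1 or more digits
};

// Hypothetical IDs of various lengths.
string[] ids = { "7", "42", "1234", "1234567" };

foreach (string pattern in patterns)
{
    Console.WriteLine(pattern);
    foreach (string id in ids)
    {
        string tag = $"<a href=\"News_ViewStory.asp?NewsID={id}\">";
        Console.WriteLine($"  NewsID={id}: {Regex.IsMatch(tag, pattern)}");
    }
}
```

With these inputs, \d{4} accepts only 1234, \d{2,6} accepts 42 and 1234, and \d+ accepts all four. Note that in .NET, \d matches any Unicode decimal digit, not only 0 to 9; if you need ASCII digits only, [0-9] is the stricter choice.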
Note that the . and the ? are preceded by a backslash in the above pattern. That's because they are regex metacharacters with special meanings: the . matches any character except a newline, and the ? makes the preceding element optional. Escaping removes these special meanings, so they match a literal period and a literal question mark.
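A sketch of what goes wrong without the backslashes, using just the URL part of the pattern; the "near miss" string is made up for illustration. Regex.Escape is a standard .NET method that inserts these escapes for you when building a pattern from literal text.

```csharp
using System;
using System.Text.RegularExpressions;

const string escaped   = @"News_ViewStory\.asp\?NewsID=\d{4}";
const string unescaped = @"News_ViewStory.asp?NewsID=\d{4}";

string realUrl  = "News_ViewStory.asp?NewsID=1234";
string nearMiss = "News_ViewStoryxasNewsID=1234"; // hypothetical junk string

// Escaped: "." and "?" are literal characters.
Console.WriteLine(Regex.IsMatch(realUrl, escaped));    // True
Console.WriteLine(Regex.IsMatch(nearMiss, escaped));   // False

// Unescaped: "." matches any character and "?" makes the preceding "p" optional,
// so no literal "?" is expected before "NewsID".
Console.WriteLine(Regex.IsMatch(realUrl, unescaped));  // False
Console.WriteLine(Regex.IsMatch(nearMiss, unescaped)); // True

// Regex.Escape adds the backslashes when building a pattern from literal text.
string built = Regex.Escape("News_ViewStory.asp?NewsID=") + @"\d{4}";
Console.WriteLine(built == escaped);                   // True
```

So the unescaped version both misses the real link and accepts a string that is not a valid URL at all.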
Whether strings matching these patterns are exactly the HTML tags you want is an entirely different issue. Parsing HTML with regex is generally not recommended.