I am using the following XPath expression to select all hyperlinks:

//a[@href]

How can I write an expression that selects only the hyperlinks matching this format?

http://abc.com/articles/1

Here, http://abc.com/articles/ is constant and only the article number changes.

+1  A: 

That expression looks like XPath, not a regex. A regex for that particular URL would look like this:

^http://abc\.com/articles/\d+$

You can't pass that regex to the XPath engine, though. Use your XPath query to find the hyperlinks first, then filter the resulting nodes by testing each href attribute against the regex.
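To make the two steps concrete, here is a minimal sketch in Python rather than HtmlAgilityPack, since the idea carries over unchanged: select with XPath, then filter with the regex. The sample markup and the names `SAMPLE`, `ARTICLE_URL`, and `article_links` are made up for illustration, and the standard-library parser used here only accepts well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample markup; a real page would come from your HTML parser.
SAMPLE = """<html><body>
<a href="http://abc.com/articles/1">First</a>
<a href="http://abc.com/articles/42">Second</a>
<a href="http://abc.com/articles/">Index</a>
<a href="http://abc.com/about">About</a>
<a name="top">No href</a>
</body></html>"""

# fullmatch() anchors the pattern at both ends, so ^ and $ are not needed.
ARTICLE_URL = re.compile(r"http://abc\.com/articles/\d+")


def article_links(markup):
    """Step 1: XPath selects every <a> with an href. Step 2: the regex filters them."""
    root = ET.fromstring(markup)
    anchors = root.findall(".//a[@href]")
    return [a.get("href") for a in anchors if ARTICLE_URL.fullmatch(a.get("href"))]


print(article_links(SAMPLE))
```

In C# the shape would be the same: keep the SelectNodes call with the XPath query, loop over the returned nodes, and test each href with the regex.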

Mark
I am trying to use it like this: HtmlNodeCollection hrefs = _doc.DocumentNode.SelectNodes(@"^http://abc.com/articles/\d+$"); but it gives me an error. Any advice?
Veejay
Ack!!! What did I just say? You're mixing different kinds of expressions! You *cannot* put a regular expression in there, you have to use the *xpath query* you came up with and *then* iterate over the nodes and throw away the ones you don't want using the *regular expression*.
Mark
Actually, Pavel's solution is pretty good. It doesn't use a "regular expression" like you asked for, but you don't really need one in this case ;)
Mark
+1  A: 
<a\s.*?href=(?:["'](http://abc\.com/articles/([0-9]+))["']).*?>(.*?)</a>

UPDATE:

If you need the XPath expression, here it is:

//a[starts-with(@href,'http://abc.com/articles/')]

This returns every link whose href attribute starts with 'http://abc.com/articles/'. Note that it accepts anything after that prefix, not only a number. I hope this answers your question.
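If the prefix test alone is too loose for your pages, one option is to also check that whatever follows the prefix is a number. A rough sketch of that check in Python (not HtmlAgilityPack); `PREFIX` and `is_article_link` are names invented for this example:

```python
PREFIX = "http://abc.com/articles/"


def is_article_link(href):
    """Mimic starts-with(@href, PREFIX), then require a purely numeric remainder."""
    if not href.startswith(PREFIX):
        return False
    remainder = href[len(PREFIX):]
    # Restrict to ASCII digits so the check agrees with [0-9]+;
    # an empty remainder is rejected because "".isdigit() is False.
    return remainder.isascii() and remainder.isdigit()


for href in ("http://abc.com/articles/7",
             "http://abc.com/articles/",
             "http://abc.com/articles/7/comments",
             "http://abc.com/about"):
    print(href, is_article_link(href))
```

No regular expression is needed for this; a prefix comparison plus a digit check covers the format in the question.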

Pavel Nikolov
A: 

It's a bit overkill, but this is the regex I use in my apps to find URLs in plain text (match it case-insensitively, since the character classes only list A-Z):

(\b(?:(?:https?|ftp|file)://|www\.|ftp\.)(?:\([-A-Z0-9+&@#/%=~_|\$\?!:,\.]*\)|[-A-Z0-9+&@#/%=~_|\$\?!:,\.])*(?:\([-A-Z0-9+&@#/%=~_|\$\?!:,\.]*\)|[A-Z0-9+&@#/%=~_|\$]))

Brian Roach