Hi there,
I'm looking to scrape rapidshare.com links from websites. I have the following regular expressions to find links:
<a href=\"(http://rapidshare\\.com/files/(\\d+)/(.+)\\.(\\w{3,4}))\"
http://rapidshare\\.com/files/(\\d+)/(.+)\\.(\\w{3,4})
How can I write a regex that excludes a URL embedded in an <a href="..."> attribute and only captures the link text, as in >here</a>?
I also have to bear in mind that not all links are inside anchor tags. Some are just displayed as plain text.
Basically, is there a way to exclude patterns in a regex?
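To make the question concrete, here is a rough sketch of the behaviour I'm after. It is written in Python purely for illustration, and the names PATTERN, find_links and sample are made up for this example. It assumes a negative lookbehind is an acceptable way to do the exclusion, and it swaps the (.+) file-name group for a character class so a match cannot run past a quote or tag. It only skips URLs that come directly after href=" with a double quote, so single quotes or extra whitespace would need more work.

```python
import re

# Hypothetical pattern: same shape as the plain-URL regex, with a negative
# lookbehind so a URL directly preceded by href=" is skipped.
PATTERN = re.compile(
    r'(?<!href=")'                     # not immediately after href="
    r'http://rapidshare\.com/files/'
    r'(\d+)/'                          # numeric file id
    r'([^\s"<>]+)'                     # file name, stops at whitespace, quotes, tags
    r'\.(\w{3,4})'                     # 3 or 4 character extension
)


def find_links(text):
    """Return every rapidshare URL in text that is not an href value."""
    return [m.group(0) for m in PATTERN.finditer(text)]


# Made-up sample input: one link inside an anchor, one in plain text.
sample = (
    '<a href="http://rapidshare.com/files/123/file.rar">'
    'http://rapidshare.com/files/123/file.rar</a> '
    'and plain http://rapidshare.com/files/456/other.zip here'
)

print(find_links(sample))
# ['http://rapidshare.com/files/123/file.rar', 'http://rapidshare.com/files/456/other.zip']
```

The first URL in the output is the anchor's visible text, not its href value, and the second is the plain-text link.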
Thanks.