Hello all,

I am trying to match a pattern so that I can retrieve a string from a website. Here is the string in question:

<a title="Posts by ivek dhwWaVa"
href="http://www.example.com/author/ivek/"
rel="nofollow">ivek</a>

I am trying to match the string "ivek" inside the a tag. I want to do this for each post and relate it to the number of comments.

First, what regex should I use for the above, so I can use it as an example for the rest? I have nothing so far:

$content = file_get_contents('http://www.example.com');
preg_match_all("", $content, $matches);

And how would I relate the comments to the author's name, given that there are many other authors on the website, each with their own set of comments? Do I use the divs to break this up? Each set of info is wrapped in a div like this:

<div id="post-54" class="excerpt">

Thanks all for any help!

+3  A: 

You really shouldn't be looking to Regex to do the job:

brianreavis
It is good (or maybe bad) to see that nothing has changed while I have been away, and that those questions are still of great use.
Chas. Owens
+5  A: 

Please let me be the first to introduce you to the most famous answer on Stack Overflow.

Regular expressions are not suited to parsing HTML. You really need an HTML parser, even for what might appear to be a simple task.

I recommend something like PHP Simple HTML DOM Parser.
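As a rough illustration of the parser approach, here is a minimal sketch that uses PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than PHP Simple HTML DOM Parser, so nothing extra has to be installed. The function name authorsByPost is hypothetical, and the sketch assumes the markup shown in the question: each post sits in a div with class "excerpt", and the author link's title starts with "Posts by". Running the second query relative to each post div is what keeps the author tied to the right post.

```php
<?php
// Sketch only: extract the author name for each post with PHP's built-in
// DOM extension instead of a regex.
// Assumptions (from the question's markup): each post is wrapped in
// <div id="post-NN" class="excerpt">, and the author link has a title
// attribute beginning with "Posts by".
function authorsByPost(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];

    // Every element whose class list contains the token "excerpt".
    $posts = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " excerpt ")]');
    foreach ($posts as $post) {
        // Search only inside this post, so each author stays paired with its post id.
        $link = $xpath->query('.//a[starts-with(@title, "Posts by")]', $post)->item(0);
        if ($link !== null) {
            $result[$post->getAttribute('id')] = trim($link->textContent);
        }
    }
    return $result;
}
```

For the snippet in the question, this should yield an array mapping "post-54" to "ivek". Against the live page the input would be the result of file_get_contents('http://www.example.com'). The comment count could be read inside the same loop with another query relative to $post, but the element that holds it isn't shown in the question, so that selector would have to come from the real page.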

zombat
Grr, I was trying to find that example :P
brianreavis
Ha ha, I always just google "coding horror cthulhu" and get the link from Jeff's post.
zombat
God damn! Thank you very much for that. I probably should have searched for regex + html first! :)
Abs