ansaurus

Question

Answer 1

A:

What about something like:

/<a\b[^>]*>([^<]*)<\/a>/siU

Mark E 2010-01-09 04:22:13

Answer 2

A:

If you must use a regex, use `.*?` instead of `.*`. `*?` is the non-greedy version of `*`; that is, rather than matching as much as possible, it matches as little as possible.
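To make the difference concrete, here is a minimal sketch. The sample string and both patterns are illustrative assumptions, not the asker's actual input or regex.

```php
<?php
// Hypothetical sample input: two anchors on the same line.
$html = '<a class="fetch-me" href="products/1">Find ME!!!</a> '
      . '<a class="fetch-me" href="products/2">Me too!</a>';

// Greedy: .* runs on to the LAST </a>, so both links collapse into one match.
preg_match_all('/<a[^>]*>(.*)<\/a>/s', $html, $greedy);

// Non-greedy: .*? stops at the FIRST </a> after each opening tag.
preg_match_all('/<a[^>]*>(.*?)<\/a>/s', $html, $lazy);

print_r($greedy[1]); // one element that swallows the markup between the links
print_r($lazy[1]);   // two elements: "Find ME!!!" and "Me too!"
```

The non-greedy form fixes the "consuming too much input" symptom, but it still breaks on nested tags or unusual markup, which is why the parser advice below still applies.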

(By the way, don't try matching HTML or XML with regular expressions; that way lies madness. Instead, try using an HTML or XML parser. If you don't have an HTML parser, run it through HTML Tidy and use an XML parser. See meder's answer for how to do this in PHP.)

Brian Campbell 2010-01-09 04:22:42

I would say regex is ok for such a small and specific task (where nothing can really go wrong). But I'm probably going to get killed for saying this.

Joel L 2010-01-09 04:43:36

Clearly, something can go wrong, as he's having trouble getting his regex to work; it's consuming too much input. And even if he fixes that, there will be tags with extra whitespace somewhere that he didn't account for, or attributes in a different order, or any number of other problems. By the time you fix your regex to account for all of those variations, it's far, far easier to just run your input through a real parser, and select your element using the XPath expression `a[@class="fetch-me"]` or CSS query `a.fetch-me` (depending on which your HTML or XML parser library supports).

Brian Campbell 2010-01-09 06:43:25

HTML and XML parsing is a solved problem. The libraries have been written. Why reinvent the wheel badly? Just use the libraries that already exist! http://docs.php.net/manual/en/class.domxpath.php

Brian Campbell 2010-01-09 06:46:44

Answer 3

A:

One way:

$str= <<<A
blah blah
blah
...
<a class="fetch-me" href="products/1">Find ME!!!</a>
<a class="fetch-me" href="products/2">Me too!</a>
blah
blah
<a class="fetch-me"
          href="products/1">Find me, i am at next line!!!</a> blah blah
A;
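// Split on the closing tag so each chunk holds at most one link.
$s = explode("</a>", $str);
foreach ($s as $k) {
    // Only chunks containing an href can hold a link.
    if (strpos($k, "href") !== FALSE) {
        // Strip everything up to and including the end of the opening tag.
        print "--> " . preg_replace("/^.*href=\".*\">|\">.*/sm", "", $k) . "\n";
    }
}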

Output:

$ php test.php
--> Find ME!!!
--> Me too!
--> Find me, i am at next line!!!

Ideally, you should use an actual parser, like everybody else said.

ghostdog74 2010-01-09 04:39:34

Answer 4

+3 A:

<?php

$str = '
<a class="fetch-me" href="products/1">Find ME!!!</a>
...
<a class="fetch-me" href="products/2">Me too!</a>
';

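// Parse the HTML string into a DOM tree.
$doc = new DOMDocument();
$doc->loadHTML($str);
$xp = new DOMXPath($doc);
// Select every <a> whose class attribute is exactly "fetch-me".
$query = $xp->evaluate('//a[@class="fetch-me"]');

if ( $query->length > 0 ) {
    foreach ( $query as $anchor ) {
        // nodeValue is the text content between the tags.
        echo $anchor->nodeValue . '<br>';
    }
}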

You can also use the XPath `contains()` function on `@class` if the element may carry multiple class values. You can always use an abstracted, high-level wrapper for DOM as well.
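A minimal sketch of the multiple-class case, assuming the same `fetch-me` class as above. The sample markup (including the `btn` and `fetch-me-not` classes) is made up for illustration. A bare `contains(@class, "fetch-me")` would also match `fetch-me-not`, so this version pads the class list with spaces to match only the whole token.

```php
<?php
// Hypothetical sample: the first link has two classes, the last only a lookalike.
$html = '<a class="btn fetch-me" href="products/1">Find ME!!!</a>'
      . '<a class="fetch-me" href="products/2">Me too!</a>'
      . '<a class="fetch-me-not" href="products/3">Not me</a>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

// Surround the normalized class list with spaces, then look for " fetch-me ".
$expr = '//a[contains(concat(" ", normalize-space(@class), " "), " fetch-me ")]';

$texts = [];
foreach ($xp->query($expr) as $anchor) {
    $texts[] = $anchor->nodeValue;
}
print_r($texts); // the first two links only
```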

meder 2010-01-09 05:09:41

This is the answer. Ignore my answer (other than the part about not using regexes), and use this. I don't know PHP, so I can't write up an example of how to use their HTML parser and XPath libraries off the top of my head, but in any language, the answer is to use the HTML or XML parser that already exists in your language.

Brian Campbell 2010-01-09 07:00:10

Answer 5

A:

I've tried all of these answers and everyone's probably right. I am going to refactor to use HTML Tidy and a real parser.

Thanks for the suggestions.

Craig Gardner 2010-01-09 06:13:17


Retrieve the text between A tags
