The two methods below serve the same purpose: scan the post content and determine whether at least one img tag has an alt attribute containing the keyword being tested for.

I'm new to XPath and would prefer to use it, depending on how expensive that approach is compared to the regex version.

Method #1 uses preg_match

function image_alt_text_has_keyword($post)
{
    $theKeyword = trim(wpe_getKeyword($post));
    $theContent = $post->post_content;
    $myArrayVar = array();
    // Capture the value of every double-quoted alt attribute on an img tag.
    preg_match_all('/<img\s[^>]*alt="([^"]*)"[^>]*>/siU', $theContent, $myArrayVar);
    foreach ($myArrayVar[1] as $theValue) {
        if (keyword_in_content($theKeyword, $theValue)) {
            return true;
        }
    }
    return false;
}

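function keyword_in_content($theKeyword, $theContent)
{
    // preg_quote() keeps regex metacharacters in the keyword from altering the pattern.
    return preg_match('/\b' . preg_quote($theKeyword, '/') . '\b/i', $theContent) === 1;
}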

Method #2 uses xPath

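function keyword_in_img_alt()
{
    global $post;
    $keyword = trim(strtolower(wpe_getKeyword($post)));
    $dom = new DOMDocument;
    // Post content is rarely a well-formed document, so collect parse warnings instead of emitting them.
    libxml_use_internal_errors(true);
    $dom->loadHTML(strtolower($post->post_content));
    libxml_clear_errors();
    $xPath = new DOMXPath($dom);
    // contains() is a substring test, so unlike Method #1 it ignores word boundaries.
    // The keyword must not contain a double quote, or the expression becomes invalid.
    return $xPath->evaluate('count(//img[contains(@alt, "' . $keyword . '")])') > 0;
}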
+8  A: 

If you are parsing XML, you should use XPath, as it was designed for exactly this purpose. XML / XHTML is not a regular language and cannot be parsed correctly by regular expressions. You may be able to write a regular expression that works some of the time, but there will be special cases where it fails.
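To make one such special case concrete, here is a minimal sketch that runs the pattern from Method #1 and a DOM/XPath query over the same markup. The sample HTML and the variable names are made up for illustration. The alt attribute is single-quoted, which is valid HTML but something the pattern does not anticipate.

```php
<?php
// Hypothetical sample markup: the alt attribute uses single quotes.
$html = "<p><img src='cat.png' alt='a sleepy cat'></p>";

// The pattern from Method #1 only recognises double-quoted alt values,
// so it finds nothing here.
preg_match_all('/<img\s[^>]*alt=\"([^\"]*)\"[^>]*>/siU', $html, $found);
$regexAlts = $found[1];

// The DOM parser resolves the quoting before XPath ever sees the attribute.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$domAlts = array();
foreach ($xpath->query('//img/@alt') as $attr) {
    $domAlts[] = $attr->nodeValue;
}

var_dump($regexAlts); // empty array
var_dump($domAlts);   // one entry: "a sleepy cat"
```

The regex could be extended to cover single quotes, but the next oddity (unquoted values, a ">" inside an attribute, and so on) would need yet another patch, while the parser already handles them.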

Mark Byers
"XPath is used to navigate through elements and attributes in an XML document." From the horse's mouth (W3C).
john mossel
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Mads Hansen
+1 Using regex on XML is like using a screwdriver to cut down a tree. Using XPath on XML is like using a chainsaw to cut the tree down. Both are useful, but neither can replace the other.
delnan
+1 for a good answer.
Dimitre Novatchev
+2  A: 

Using RegEx to select nodes in an XML document is as appropriate as using it to find out whether a given number is prime.

The fact that this is possible doesn't make it even a bit appropriate.

What is more, XPath 2.0 has RegEx support, while RegEx does not have XPath support. Therefore, if both are needed, it is probably best to use XPath 2.0.
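One caveat for the PHP code in the question: as far as I know, DOMXPath is built on libxml2, which implements XPath 1.0 only, so the XPath 2.0 matches() function is not available there. A rough way to get a similar effect is to expose preg_match to the expression through DOMXPath::registerPhpFunctions(). The sketch below uses made-up sample markup and a made-up keyword, and it assumes the keyword contains no double quote, because the pattern is embedded in an XPath string literal.

```php
<?php
// Hypothetical inputs for illustration.
$html = '<p><img src="a.png" alt="Red apple pie"><img src="b.png" alt="pineapple"></p>';
$keyword = 'apple';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Make PHP functions callable from XPath, restricted to preg_match only.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match');

// Whole-word, case-insensitive match, as in Method #1.
// Assumption: $keyword contains no double quote.
$pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';

// Compare the result with 1 explicitly: a bare value inside a predicate
// would not be read as the true/false answer we want.
$query = '//img[php:functionString("preg_match", "' . $pattern . '", @alt) = 1]';
$matches = $xpath->query($query);

echo $matches->length, "\n";
```

Here only the first image is selected, because "pineapple" contains the keyword but not as a whole word, which is the distinction a plain contains() test cannot make.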

Dimitre Novatchev