Hello

I have the following code

        <?php
        $doc = new DOMDocument;
        $doc->loadHTML('<html>
                       <head> 
                        <title>bar , this is an example</title> 
                       </head> 
                       <body> 
                       <h1>latest news</h1>
                       foo <strong>bar</strong> 
                      <i>foobar</i>
                       </body>
                       </html>');


        $xpath = new DOMXPath($doc);
        foreach($xpath->query('//*[contains(child::text(),"bar")]') as $e) {
              echo $e->tagName, "\n";
        }

Prints

       title
       strong
       i

This code finds every HTML element whose text contains "bar", but it also matches words that merely include "bar", such as "foobar". I want to change the query so that it matches only the whole word "bar", without any prefix or suffix.

I think it can be solved by changing the query to look for a "bar" that has no letter immediately before or after it, or that has a space before or after it.

This code comes from a past question here, by VolkerK.

Thanks

+2  A: 

You can use the following XPath Query

$xpath->query("//*[text()='bar']");

or

$xpath->query("//*[.='bar']");

Note that using "//" slows things down, and more so the bigger your XML file is.

null
Thanks, but this does not work. It prints only "strong", whereas it should print "strong" and "title", because the word "bar" is in the title as well.
ahmed
I thought you wanted to match only text that is exactly "bar". Now I see you want to match "bar" or "this bar now", but *not* "this foobar now".
null
A: 

You can use matches and a regex instead of contains and a string, like so

$xpath->query('//*[matches(child::text(),"(^| )bar( |$)")]')
Jory
Thanks, but this gives an error, and I think we should use contains because we are searching for the element that contains the text, not for the text itself.
ahmed
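An error is expected here, because PHP's DOMXPath implements XPath 1.0, which has no matches() function. If a regular expression is still wanted, one possible workaround is to expose PHP's preg_match to the query with DOMXPath::registerPhpFunctions. The following is only a sketch under assumptions: the compact sample document, the `\b` word-boundary pattern and the `$regexMatches` variable are illustrative and not taken from the answers.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<html><head><title>bar , this is an example</title></head>
<body><h1>latest news</h1>foo <strong>bar</strong> <i>foobar</i></body></html>');

$xpath = new DOMXPath($doc);
// Make PHP functions callable from XPath under the php: prefix.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match'); // allow only preg_match

// For each element, test every text-node child against the pattern.
// preg_match returns 1 on a match; the explicit "= 1" keeps XPath from
// treating the number as a position predicate.
$query = '//*[text()[php:functionString("preg_match", "/\bbar\b/", .) = 1]]';

$regexMatches = [];
foreach ($xpath->query($query) as $e) {
    $regexMatches[] = $e->tagName;
}
echo implode("\n", $regexMatches), "\n";
```

Calling back into PHP for every text node is likely slower than a pure XPath expression, so this trades speed for the convenience of a real regex.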
A: 

If you are looking for just "bar" with XPath 1.0, then you'll have to use a combination of functions, because there are no regular expressions in XPath 1.0.

$xpath->query("//*[
                . = 'bar' or
                starts-with(., 'bar ') or 
                contains(., ' bar ') or  
                (' bar' = substring(., string-length(.) - string-length(' bar') + 1))
              ]");

Basically this says: locate strings that equal 'bar', start with 'bar ' (notice the trailing space), contain ' bar ' (notice the spaces before and after) or end with ' bar' (notice the leading space). The spaces keep words such as "foobar" from matching. Note that ends-with is an XPath 2.0 function, so I substituted code that emulates it, taken from a previous Stack Overflow answer.
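The substring() emulation of ends-with can be checked in isolation with DOMXPath::evaluate. This is a small sketch, not part of the original answer: the helper name `endsWithExpr` is hypothetical, and it assumes the strings contain no single quotes.

```php
<?php
// Hypothetical helper: builds the XPath 1.0 ends-with emulation
// for two plain string literals (no single quotes allowed).
function endsWithExpr(string $haystack, string $needle): string {
    return sprintf(
        "substring('%s', string-length('%s') - string-length('%s') + 1) = '%s'",
        $haystack, $haystack, $needle, $needle
    );
}

// An empty document is enough, since the expressions use only literals.
$xpath = new DOMXPath(new DOMDocument);

$endsPlain  = $xpath->evaluate(endsWithExpr('foo bar', 'bar'));  // suffix present
$endsOther  = $xpath->evaluate(endsWithExpr('bar foo', 'bar'));  // suffix absent
$endsFoobar = $xpath->evaluate(endsWithExpr('foobar', 'bar'));   // suffix inside a word
$endsWord   = $xpath->evaluate(endsWithExpr('foobar', ' bar'));  // needle with leading space

var_dump($endsPlain, $endsOther, $endsFoobar, $endsWord);
```

The third call shows that a bare suffix test also accepts "foobar", which is why a whole-word check needs the leading space in the needle, as in the fourth call.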

If contains with ' bar ' is not enough, because the text may be "one bar, over" or "This bar. That bar.", with punctuation right after the 'bar', you could try this contains instead:

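contains(translate(., '.,[]', '    '), ' bar ') or

That translates each of the characters '.,[]' to a space. The replacement string needs one space per character, because translate() deletes any character that has no counterpart in it. So "one bar, over" becomes "one bar  over", and thus matches ' bar ' as expected.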
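The separate start, middle and end tests can also be folded into a single contains() by padding the text with a space on each side. The following is a minimal sketch of that idea against a compact version of the question's document. It rests on assumptions that are not in the original answer: it tests each text() child instead of `.` (so that ancestors such as body are not reported), it uses normalize-space() to collapse line breaks and repeated spaces, and the extra `<em>` element and the `$matches` variable are only there for illustration.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<html><head><title>bar , this is an example</title></head>
<body><h1>latest news</h1>foo <strong>bar</strong> <i>foobar</i> <em>one bar, over</em></body></html>');

$xpath = new DOMXPath($doc);

// 1. translate(): turn each of . , [ ] into a space (one space per character).
// 2. normalize-space(): trim and collapse whitespace runs into single spaces.
// 3. concat(): pad with one space on each side, so a word at the very start
//    or end is also surrounded by spaces.
// 4. contains(): look for the padded word.
$query = "//*[text()[contains("
       . "concat(' ', normalize-space(translate(., '.,[]', '    ')), ' '), "
       . "' bar ')]]";

$matches = [];
foreach ($xpath->query($query) as $e) {
    $matches[] = $e->tagName;
}
echo implode("\n", $matches), "\n";
```

With this sample the loop should report title, strong and em, and skip the `<i>` element that holds "foobar". Other punctuation, such as "!" or "?", would have to be added to the translate() lists by hand.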

null