I have an xml file like this:

<volume name="Early">
<book name="School Years">
<chapter number="1">
<line number="1">Here's the first line with Chicago in it.</line>
<line number="2">Here's a line that talks about Atlanta</line>
<line number="3">Here's a line that says chicagogo </line>
</chapter>
</book>
</volume>

I'm trying to do a simple keyword search using PHP that finds the word and displays the line it was in. I have this working:

$xml = simplexml_load_file($data);
$keyword = $_GET['keyword'];
$kw=$xml->xpath("//line[contains(text(),'$keyword')]");
...snip...

echo $kw[0]." is the first returned item";

However, using this technique, a user must search for 'Chicago' and not 'chicago', or the search will return nothing.

I understand I need to use the translate function but all my trial and error has been in vain.

I've tried:

$upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$lower = "abcdefghijklmnopqrstuvwxyz";
$kw = $xml->xpath("line[contains(text(),'translate('$keyword','$upper','$lower'))]");

but nothing seems to work. Any tips?

+1  A: 

See salathe's answer on how to do it with SimpleXml and translate().

As an alternative (or addition) to XPath's built-in functions, as of PHP 5.3 you can use any PHP function, including user-defined ones, in XPath expressions when using DOM. I am not sure the same is available in SimpleXml.

// create a DOMDocument and load your XML string into it
$dom = new DOMDocument;
$dom->loadXML($xml);

// create a new Xpath and register PHP functions as XPath functions
$xPath = new DOMXPath($dom);
$xPath->registerNamespace("php", "http://php.net/xpath");
$xPath->registerPHPFunctions();

// Setup the query
$keyword = 'chicago';
$q = "//line[php:functionString('stripos', text(), '$keyword')]";
$nodes = $xPath->query($q);

// Iterate the resulting NodeList
foreach($nodes as $node) {
    echo $node->nodeValue, PHP_EOL;
}

This will output

Here's the first line with Chicago in it.
Here's a line that says chicagogo

For more details, see salathe's blog entry and the PHP Manual.
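To illustrate the "including user-defined" part, here is a minimal sketch that registers a custom function instead of calling stripos directly. The function name line_has_keyword is made up for this example, the XML is the sample from the question inlined as a string, and registerPHPFunctions() is given that one name so that only this function is exposed to the query.

```php
<?php
// Hypothetical helper (not part of any library): case-insensitive
// substring test that returns a proper boolean.
function line_has_keyword($text, $keyword)
{
    return stripos($text, $keyword) !== false;
}

// Sample XML from the question, inlined so the sketch is self-contained
$xml = <<<XML
<volume name="Early">
<book name="School Years">
<chapter number="1">
<line number="1">Here's the first line with Chicago in it.</line>
<line number="2">Here's a line that talks about Atlanta</line>
<line number="3">Here's a line that says chicagogo </line>
</chapter>
</book>
</volume>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);

$xPath = new DOMXPath($dom);
$xPath->registerNamespace("php", "http://php.net/xpath");
// Restrict the callable PHP functions to the one the query needs
$xPath->registerPHPFunctions("line_has_keyword");

$keyword = 'chicago';
$q = "//line[php:functionString('line_has_keyword', text(), '$keyword')]";

// Collect the text of every matching <line>
$matches = array();
foreach ($xPath->query($q) as $node) {
    $matches[] = trim($node->nodeValue);
}

echo implode(PHP_EOL, $matches), PHP_EOL;
```

With the sample data this should match lines 1 and 3, the same two lines as the stripos version above.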

Gordon
+1 for hinting at being able to use PHP-land functions within XPath queries (and linking to my blog!). :)
salathe
@salathe out of curiosity: do you know if there is any function that would allow me to use the DOMNodeList like I would use an array in array_map or an iterator in iterator_apply? Short of using `$xpath->query('//book[php:function("callback", author)]');`?
Gordon
"I am not sure the same is available in SimpleXml." - not directly, but there is nothing to stop folks mixing and matching DOM/SimpleXML classes. :)
salathe
@Gordon('s comment) - You could wrap the `DOMNodeList` in an `IteratorIterator`, and use `iterator_apply` on that.
salathe
@salathe I did say that? Wow? When? Being 30+ is a pain. You forget things constantly. :D
Gordon
Thanks for this Gordon, I used salathe's response above but you've given me another approach which I will study closely!
dijon
+2  A: 

Gordon's recommendation to use a PHP function from within XPath will prove more flexible should you choose to use that. However, contrary to his answer, the translate string function is available in XPath 1.0 so that means you can use it; your problem is how.

First, there is the obvious typo that Charles pointed out in his comment to the question. Then there is the logic of how you're trying to match the text values.


In word form, you are currently asking, "does the text contain the lowercase form of the keyword?" This is not really what you want to be asking. Instead, ask, "does the lowercase text contain the lowercase keyword?" Translating (pardon the pun) that back into XPath-land would be:

(Note: truncated alphabets for readability)

//line[contains(translate(text(),'ABC...Z','abc...z'),'chicago')]

The above lowercases the text contained within the line node then checks that it (the lowercased text) contains the keyword chicago.


And now for the obligatory code snippet (though the idea above is what you really need to take home):

$xml    = simplexml_load_file($data);
$search = strtolower($keyword);
$nodes  = $xml->xpath("//line[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '$search')]");

echo 'Got ' . count($nodes) . ' matches!' . PHP_EOL;
foreach ($nodes as $node){
   echo $node . PHP_EOL;
}

Edit after dijon's comment

Inside the foreach, you could access the line number, chapter number and book name like below.

Line number -- this is just an attribute on the <line> element which makes accessing it super-easy. There are two ways, with SimpleXML, of accessing it: $node['number'] or $node->attributes()->number (I prefer the former).

Chapter number -- to get at this, as you rightly said, we need to traverse up the tree. If we were using the DOM classes, we would have a handy $node->parentNode property leading us directly to the <chapter> (since it is the immediate ancestor to our <line>). SimpleXML does not have such a handy property, but we can use a relative XPath query to get it. The parent axis allows us to traverse up the tree.

Since xpath() returns an array we can cheat and use current() to access the first (and only) item in the array returned from it. Then it is just a matter of accessing the number attribute as above.

// In the near future we can use: current(...)['number'] but not yet
$chapter = current($node->xpath('./parent::chapter'))->attributes()->number;

Book name -- the process for this is the same as that of accessing the chapter number. A relative XPath query from the <line> could make use of the ancestor axis like ./ancestor::book (or ./parent::chapter/parent::book). Hopefully you can figure out how to access its name attribute.
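Putting the three pieces together, a minimal sketch might look like the following. It assumes the sample XML from the question (inlined here as a string rather than loaded from a file), a hard-coded search for 'atlanta', and an output format that is only a guess at what was asked for.

```php
<?php
// Sample XML from the question, inlined so the sketch is self-contained
$data = <<<XML
<volume name="Early">
<book name="School Years">
<chapter number="1">
<line number="1">Here's the first line with Chicago in it.</line>
<line number="2">Here's a line that talks about Atlanta</line>
<line number="3">Here's a line that says chicagogo </line>
</chapter>
</book>
</volume>
XML;

$xml    = simplexml_load_string($data);
$search = strtolower('atlanta');
$nodes  = $xml->xpath("//line[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '$search')]");

$results = array();
foreach ($nodes as $node) {
    // xpath() returns an array; current() picks its first element
    $chapter = current($node->xpath('./parent::chapter'));
    $book    = current($node->xpath('./ancestor::book'));

    // Attributes are read with array-style access and cast to strings
    $results[] = sprintf(
        '%s, Chapter %s, Line %s: %s',
        (string) $book['name'],
        (string) $chapter['number'],
        (string) $node['number'],
        trim((string) $node)
    );
}

echo implode(PHP_EOL, $results), PHP_EOL;
// prints: School Years, Chapter 1, Line 2: Here's a line that talks about Atlanta
```

Casting each attribute to a string is optional inside sprintf(), but it makes explicit that SimpleXML hands back element objects rather than plain strings.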

salathe
Thanks for the detailed explanation of how it works, in addition to the code snippet. Exactly what I was looking for! I have been using mainly simpleXML for this project but it's nice to have Gordon's answer below to compare.
dijon
One thing I would LOVE to know :) is, within that foreach clause, how would I also go about listing the line number, chapter number and book name? I believe this is also XPath, based on the current node and navigating up the tree? For instance (from the first XML example), I'd like to search for 'atlanta' and receive: School Years, Chapter 1: Here's a line that talks about Atlanta. Once again, trial and error has been tying me in knots!
dijon
@dijon see my edit
salathe
@salathe - thanks again! Since I'm working with simpleXML I never would have stumbled upon the 'current()' "cheat" as you call it. I knew about the axes but could never figure out how to describe the starting point. I always appreciate getting an explanation along with code. That way I learn something!
dijon