I am performing a search in an XML file, using the following code:

$result = $xml->xpath("//StopPoint[contains(StopName, '$query')]");

Where $query is the search query, and StopName is the name of a bus stop. The problem is, it's case sensitive.

Beyond that, I would also like to be able to search with non-English characters such as ÆØÅæøå, so that Norwegian names are returned.

How is this possible?

+2  A: 

Non-English names should not be a problem. Just add them to your XPath. (XML is defined as using Unicode).

As for case-insensitivity, ...

XPath 1.0 includes the following statement:

Two strings are equal if and only if they consist of the same sequence of UCS characters.

So even using explicit predicates on the local-name will not help.

XPath 2.0 includes functions to map case, e.g. fn:upper-case().


Additional: using XPath's translate function should allow case mapping to be faked in XPath 1.0, but the two mapping strings will need to include every cased code point you and your users will ever need:

"TEST" = translate($inputString, "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
Richard
Thanks. My XML file wasn't unicoded.
rebellion
As I commented below, PHP tells me that the functions lower-case and upper-case can't be found. :/
rebellion
@termserv: XML is *always* unicode. Even if your XML files are not in a Unicode-capable encoding, once in memory this will make no difference.
Richard
@Richard: An up-vote for the answer you took the "translate()" idea from would have been fair.
Tomalak
@Tomalak: I forgot, sorry, but asking for an up-vote pretty much negates it.
Richard
I know. ;-) It's also not that I would desperately need it (in fact, if you had simply credited me without up-voting it would have been okay). Maybe I should have made a smiley right away, as it wasn't meant to be aggressive or anything.
Tomalak
+3  A: 

In XPath 2.0 you can use the lower-case() function, which is Unicode-aware, so it will handle non-ASCII characters fine.

contains(lower-case(StopName), lower-case('$query'))

To get XPath 2.0 you need an XSLT 2.0 processor, for example Saxon. You can access it from PHP via JavaBridge.
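If setting up Saxon is not an option, a different technique (not XPath 2.0) may be worth trying: PHP's DOM extension lets an XPath 1.0 expression call back into PHP, so the lower-casing can be delegated to mb_strtolower(). The following is only a minimal sketch under some assumptions: the dom and mbstring extensions are available, the function name findStopPointsCaseInsensitive and the sample XML are made up for illustration, and apostrophes in the query are simply rejected.

```php
<?php
// Sketch: case-insensitive contains() in XPath 1.0 by calling PHP's
// mb_strtolower() from inside the expression via DOMXPath.
// findStopPointsCaseInsensitive() is a hypothetical helper name.
function findStopPointsCaseInsensitive(DOMDocument $doc, string $query): array
{
    // Assumption for this sketch: refuse apostrophes instead of escaping them.
    if (strpos($query, "'") !== false) {
        throw new InvalidArgumentException('Apostrophes are not supported in this sketch.');
    }

    $xpath = new DOMXPath($doc);
    // The php: prefix must be bound to this namespace before PHP functions can be called.
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    // Only allow the single function we need.
    $xpath->registerPhpFunctions('mb_strtolower');

    // Lower-case the query on the PHP side, the stop name on the XPath side.
    $needle = mb_strtolower($query, 'UTF-8');
    $nodes = $xpath->query(
        "//StopPoint[contains(php:functionString('mb_strtolower', StopName, 'UTF-8'), '$needle')]"
    );

    $names = [];
    foreach ($nodes as $node) {
        $names[] = $node->getElementsByTagName('StopName')->item(0)->textContent;
    }
    return $names;
}

// Hypothetical sample data.
$doc = new DOMDocument();
$doc->loadXML(
    '<Stops>'
    . '<StopPoint><StopName>Ålesund Rutebilstasjon</StopName></StopPoint>'
    . '<StopPoint><StopName>Bergen Busstasjon</StopName></StopPoint>'
    . '</Stops>'
);
$matches = findStopPointsCaseInsensitive($doc, 'ålesund');
```

Because the case mapping is done by mb_strtolower(), there is no list of upper- and lower-case characters to maintain, at the cost of using DOMXPath rather than SimpleXML's xpath() method.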

vartec
This gives me the following errors: "xmlXPathCompOpEval: function lower-case not found" and "Unregistered function".
rebellion
You're probably using XPath 1.0; this function is only available in XPath 2.0.
vartec
I solved it by using translate() to convert all characters to lower case. Thanks for your help :)
rebellion
+2  A: 

In XPath 1.0 (which is, I believe, the best you can get with PHP SimpleXML), you'd have to use the translate() function to produce all-lowercase output from mixed-case input.

For convenience, I would wrap it in a function like this:

function findStopPointByName($xml, $query) {
  $upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"; // add any characters...
  $lower = "abcdefghijklmnopqrstuvwxyzæøå"; // ...that are missing

  $arg_stopname = "translate(StopName, '$upper', '$lower')";
  $arg_query    = "translate('$query', '$upper', '$lower')";

  return $xml->xpath("//StopPoint[contains($arg_stopname, $arg_query)]");
}

As a sanitizing measure I would either completely forbid or escape single quotes in $query, because they will break your XPath string if they are ignored.
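To make the "forbid single quotes" option concrete, here is a rough, self-contained sketch of the translate() approach run against a small document. The name findStopPointByNameSafe and the sample XML are made up for illustration, and returning an empty array for a query containing an apostrophe is just one possible policy.

```php
<?php
// Sketch: translate()-based case-insensitive search with SimpleXML,
// refusing queries that contain a single quote.
// findStopPointByNameSafe() is a hypothetical name.
function findStopPointByNameSafe(SimpleXMLElement $xml, string $query): array
{
    // A single quote would terminate the XPath string literal early.
    if (strpos($query, "'") !== false) {
        return [];
    }

    $upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"; // extend as needed
    $lower = "abcdefghijklmnopqrstuvwxyzæøå"; // must stay aligned with $upper

    $stopName = "translate(StopName, '$upper', '$lower')";
    $needle   = "translate('$query', '$upper', '$lower')";

    $result = $xml->xpath("//StopPoint[contains($stopName, $needle)]");
    // xpath() may return a non-array value on failure; normalise to an array.
    return is_array($result) ? $result : [];
}

// Hypothetical sample data.
$xml = simplexml_load_string(
    '<Stops>'
    . '<StopPoint><StopName>Ørnes Kai</StopName></StopPoint>'
    . '<StopPoint><StopName>Bodø Sentrum</StopName></StopPoint>'
    . '</Stops>'
);
$hits = findStopPointByNameSafe($xml, 'øRNES');
```

Any letter missing from $upper and $lower is compared case-sensitively, so the two strings have to cover every cased character that can appear in the data or in a query.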

Tomalak
A: 

In addition:

$xml->xpath("//StopPoint[contains(StopName, '$query')]");

You will need to strip out any apostrophe characters from $query to avoid breaking your expression.

In XPath 2.0 you can double-up the quote being used in the delimiter to put that quote into a string literal, but in XPath 1.0 it's impossible to include the delimiter in the string.
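If stripping apostrophes is too lossy, a common workaround in XPath 1.0 is to avoid a single literal altogether and assemble the value with concat(), switching quote styles around each apostrophe. The helper below is only a sketch; xpathLiteral is a made-up name, and the sample XML is invented for the example.

```php
<?php
// Sketch: turn an arbitrary PHP string into an XPath 1.0 string expression.
// xpathLiteral() is a hypothetical helper name.
function xpathLiteral(string $value): string
{
    // No apostrophe: a single-quoted literal is enough.
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    // Apostrophes but no double quotes: use a double-quoted literal.
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Both kinds of quote: split on apostrophes and glue the pieces
    // back together with concat(), inserting "'" between them.
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

// Hypothetical sample data containing an apostrophe.
$xml = simplexml_load_string(
    '<Stops><StopPoint><StopName>St. Olav\'s plass</StopName></StopPoint></Stops>'
);
$hits = $xml->xpath('//StopPoint[contains(StopName, ' . xpathLiteral("Olav's") . ')]');
```

The value is never placed inside a literal that uses its own delimiter, so the limitation described above still holds for each individual literal; concat() just sidesteps it.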

bobince