A: 

This isn't really a full answer but...

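The important thing to note is that xml:lang isn't a single attribute name that happens to have a colon in it. The attribute is 'lang' in the 'xml' namespace, which is not quite the same thing. The xml prefix is (in some ways) 'built-in': it is always bound to http://www.w3.org/XML/1998/namespace and never needs to be declared.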
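
To make that concrete, here is a minimal sketch, assuming XML::LibXML is available and using a made-up inline document. It looks the attribute up by namespace URI and local name rather than by the literal string 'xml:lang':

```perl
#!/usr/bin/perl

use strict;
use warnings;
use XML::LibXML;

# The namespace the reserved 'xml' prefix is always bound to.
my $XML_NS = 'http://www.w3.org/XML/1998/namespace';

# Illustrative document; note that there is no xmlns:xml declaration.
my $doc  = XML::LibXML->load_xml(string => "<html xml:lang='en-uk'/>");
my $root = $doc->documentElement;

# Look the attribute up by (namespace URI, local name).
my $attr = $root->getAttributeNodeNS($XML_NS, 'lang');

print "local name: ", $attr->localname, "\n";
print "prefix:     ", $attr->prefix, "\n";
print "namespace:  ", $attr->namespaceURI, "\n";
print "value:      ", $root->getAttributeNS($XML_NS, 'lang'), "\n";
```

If the parser is namespace aware, the local name should come back as plain 'lang', with the prefix and namespace reported separately.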

Secondly, I think you probably mean:

'/html[boolean(string(normalize-space(@xml:lang))) = true()]'

as truth and falsehood are booleans, not strings, in XPath.
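
A quick way to see the difference, as a sketch that assumes XML::LibXML (the empty html document is only there to give the expressions a context node): any non-empty string, including 'false', converts to true, so comparing against a string rarely tests what was intended.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use XML::LibXML;

# Throwaway document; the expressions below do not depend on its content.
my $doc = XML::LibXML->load_xml(string => '<html/>');

for my $expr (
    "boolean('false')",    # non-empty string, so it converts to true
    "boolean('')",         # empty string, so it converts to false
    "true() = 'false'",    # the string is converted to a boolean first
) {
    # string() turns the boolean result into the text 'true' or 'false'.
    my $result = $doc->findvalue("string($expr)");
    print "$expr => $result\n";
}
```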

Now, I've run the following script in Perl, using XML::LibXML, and it works just fine:

#!/usr/bin/perl

use strict;
use warnings;
use XML::LibXML;

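# Parse the test document (the input is shown below).
my $parser = XML::LibXML->new;
my $xml = $parser->parse_file('test.html');

# Select the root html element only if its xml:lang attribute is non-blank.
my ($node) = $xml->findnodes('/html[boolean(string(normalize-space(@xml:lang))) = true()]');

print $node->textContent, "\n";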

using this as my input:

<?xml version='1.0'?>
<html xml:lang='en-uk'>
        <head><title>boo</title></head>
        <body><p>boo</p></body>
</html>

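That prints out the expected output: the text content of the matched html element, i.e. "boo" twice on separate lines (plus the indentation whitespace from the input).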

I wonder if you are using a parser that isn't fully namespace aware. Also, what do you mean by 'works'? Are you trying to find out if an html element has an xml:lang attribute?

If you are, this would probably be a better expression:

'/html[@xml:lang]'
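
Note that the two expressions are not quite equivalent: '/html[@xml:lang]' matches as soon as the attribute exists, while the normalize-space version also requires a non-blank value. A small sketch of where they differ, assuming XML::LibXML and using made-up inline documents:

```perl
#!/usr/bin/perl

use strict;
use warnings;
use XML::LibXML;

# The two expressions discussed above, under illustrative labels.
my %xpath = (
    exists    => '/html[@xml:lang]',
    non_blank => '/html[boolean(string(normalize-space(@xml:lang))) = true()]',
);

my @inputs = (
    "<html xml:lang='en-uk'/>",    # attribute present with a value
    "<html xml:lang=' '/>",        # attribute present but blank
    "<html/>",                     # attribute absent
);

for my $xml (@inputs) {
    my $doc = XML::LibXML->load_xml(string => $xml);
    for my $name (sort keys %xpath) {
        # Count the matching nodes by forcing list context.
        my $count = () = $doc->findnodes($xpath{$name});
        print "$xml  $name: ", ($count ? 'match' : 'no match'), "\n";
    }
}
```

Only the blank-attribute input should separate the two: it has the attribute, so the existence test matches, but its normalized value is empty.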
Nic Gibson
Yep, that's exactly what I mean. I'm using PHP's DOMXPath, so I'm surprised your query works and mine doesn't. Maybe it's a bug in PHP's library?
EddyR