This is starting to piss me off real bad. I have this XML code:

Updated with correct namespaces

<?xml version="1.0" encoding="utf-8"?>

<Infringement xsi:schemaLocation="http://www.movielabs.com/ACNS http://www.movielabs.com/ACNS/ACNS2v1.xsd" xmlns="http://www.movielabs.com/ACNS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Case>
    <ID>...</ID>
    <Status>Open</Status>
  </Case>
  <Complainant>
    <Entity>...</Entity>
    <Contact>...</Contact>
    <Address>...</Address>
    <Phone>...</Phone>
    <Email>...</Email>
  </Complainant>
  <Service_Provider>
    <Entity>...</Entity>
    <Address></Address>
    <Email>...</Email>
  </Service_Provider>
  <Source>
    <TimeStamp>...</TimeStamp>
    <IP_Address>...</IP_Address>
    <Port>...</Port>
    <DNS_Name></DNS_Name>
    <Type>...</Type>
    <UserName></UserName>
    <Number_Files>1</Number_Files>
    <Deja_Vu>No</Deja_Vu>
  </Source>
  <Content>
    <Item>
      <TimeStamp>...</TimeStamp>
      <Title>...</Title>
      <FileName>...</FileName>
      <FileSize>...</FileSize>
      <URL></URL>
    </Item>
  </Content>
</Infringement>

And this PHP code:

<?php 
    $data = urldecode($_POST["xml"]);
    $newXML = simplexml_load_string($data);

    var_dump($newXML->xpath("//ID"));
?>

I've dumped only $newXML and gotten tons of data, but the only XPath I've run that returned anything but an empty array was "*".

Isn't "//ID" supposed to find all ID nodes in the document? Why isn't it working?

Thanks

A: 

I'm not well-versed in PHP's XML API, but I suspect the problem lies in the namespaces. Depending on how that xpath method works, it may be searching for ID elements with an empty namespace. Your ID elements inherit their namespace from the root element.

Simon
I don't even slightly understand - sorry
Codemonkey
Did I misread it or did there used to be an xmlns attribute on the Infringement element?
Simon
There was, yes. Two of them in fact. My stuff works if I replace all "xmlns" with "ns", but is there no way around *changing* the XML?
Codemonkey
@Codemonkey: There used to be **two** xmlns attributes? I see one right now, xmlns:xsi. Is the current question showing the actual attributes of the `<Infringement>` element? This is critical to diagnosing the problem.
LarsH
+2  A: 

Your XML document's root element seems to have a default namespace with the URI "http://www.movielabs.com/ACNS". This means that all unprefixed elements in your document belong to that namespace. The problem is that XPath expressions without a namespace prefix search only for elements that don't belong to any namespace. To search for elements (or attributes...) from a certain namespace, you need to register the namespace URI under some prefix and then use this prefix in your XPath expression.

With PHP's SimpleXML, it is done something like this:

$newXML = simplexml_load_string($data);
$newXML->registerXPathNamespace('prefix', 'http://www.movielabs.com/ACNS');
var_dump($newXML->xpath("//prefix:ID"));

The prefix can be practically any text, but the namespace URI must exactly match the one used in your XML document.
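
To see the difference side by side, here is a minimal self-contained sketch. The XML is a trimmed-down stand-in for the document in the question, the ID value `example-id` is a made-up placeholder, and the prefix `acns` is an arbitrary choice of mine, not something the document requires.

```php
<?php
// Trimmed-down stand-in for the question's document (placeholder ID value).
$data = <<<XML
<Infringement xmlns="http://www.movielabs.com/ACNS">
  <Case>
    <ID>example-id</ID>
    <Status>Open</Status>
  </Case>
</Infringement>
XML;

$newXML = simplexml_load_string($data);

// Unprefixed name test: looks for ID elements in no namespace, so nothing matches.
$unprefixed = $newXML->xpath('//ID');

// Bind an arbitrary prefix to the document's namespace URI, then query with it.
$newXML->registerXPathNamespace('acns', 'http://www.movielabs.com/ACNS');
$prefixed = $newXML->xpath('//acns:ID');

echo count($unprefixed), "\n";    // number of matches without a prefix
echo (string) $prefixed[0], "\n"; // text of the first match with the prefix
```

The first query should come back empty and the second should return the single ID element, which is the behaviour described in the question.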

jasso
+2  A: 

I've dumped only $newXML and gotten tons of data, but the only XPath I've run that returned anything but an empty array was "*".

So what was returned from var_dump($newXML->xpath("*"));? <Infringement>?

If the problem is namespaces, try this:

var_dump($newXML->xpath("//*[local-name() = 'ID']"));

This will match any element in the document whose name is 'ID', regardless of namespace.
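
As a rough illustration of "regardless of namespace", the sketch below runs the same expression over three tiny documents: one in the ACNS namespace, one in a made-up namespace (`urn:example:other` is purely hypothetical), and one with no namespace at all. The ID values are placeholders.

```php
<?php
// Three stand-in documents: ACNS namespace, a hypothetical namespace, and none.
$docs = [
    '<Infringement xmlns="http://www.movielabs.com/ACNS"><Case><ID>first</ID></Case></Infringement>',
    '<Infringement xmlns="urn:example:other"><Case><ID>second</ID></Case></Infringement>',
    '<Infringement><Case><ID>third</ID></Case></Infringement>',
];

$found = [];
foreach ($docs as $xml) {
    $doc = simplexml_load_string($xml);
    // local-name() compares only the part of the name after any prefix,
    // so the element's namespace plays no role in the match.
    foreach ($doc->xpath("//*[local-name() = 'ID']") as $id) {
        $found[] = (string) $id;
    }
}

print_r($found);
```

The trade-off is that this also matches ID elements from namespaces you may not care about, so it fits best when the element name alone is enough to identify what you want.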

My stuff works if i replace all "xmlns" with "ns"

Wait, what? Are you sure you showed us all the xmlns-related attributes in the document?

Update: The question was edited to show that the XML really does have a default namespace declaration. That explains the original problem: your XPath expression selects ID elements that are in no namespace, but the elements in your document are in the movielabs ACNS namespace, thanks to the default namespace declaration.

The declaration xmlns="http://www.movielabs.com/ACNS" on an element means "this element and all descendants that don't have a namespace prefix (like ID) are in the namespace represented by the namespace URI 'http://www.movielabs.com/ACNS'." (Unless an intervening descendant has a different default namespace declaration, which would shadow this one.)
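
A small sketch of that inheritance and shadowing, using PHP's DOM extension to read each element's namespace URI. The `<Extra>` element and the `urn:example:other` URI are invented for the illustration and are not part of the ACNS format.

```php
<?php
// Stand-in document: the second ID sits under an element that redeclares
// the default namespace (Extra and its URI are hypothetical).
$xml = <<<XML
<Infringement xmlns="http://www.movielabs.com/ACNS">
  <Case>
    <ID>inherited</ID>
  </Case>
  <Extra xmlns="urn:example:other">
    <ID>shadowed</ID>
  </Extra>
</Infringement>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Map each ID's text content to the namespace URI the parser resolved for it.
$namespaces = [];
foreach ($dom->getElementsByTagName('ID') as $id) {
    $namespaces[$id->textContent] = $id->namespaceURI;
}

print_r($namespaces);
```

Neither ID carries an xmlns attribute of its own; each one picks up the nearest default namespace declaration among its ancestors.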

So use my local-name() answer above to ignore namespaces, or use jasso's technique to specify the movielabs ACNS namespace and use it as intended.

LarsH
`local-name()` it is then. My script will get tons of XML documents in and I can't be sure they will all have the same default namespace
Codemonkey
@Codemonkey that's a fine solution. If you don't know about their default namespace but they're all in the same namespace (possibly using a namespace prefix), you could still use jasso's method, because the prefix in your script doesn't have to match the prefix in the XML document. Only the namespace URI has to match. Or you can ignore namespaces altogether.
LarsH
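
For what it's worth, here is a minimal sketch of the point about prefixes not needing to match. The document below is a made-up variant that uses the prefix `m` instead of a default namespace, while the script registers `x` for the same URI; both prefixes and the ID value are arbitrary placeholders.

```php
<?php
// Hypothetical variant of the document: same namespace URI, but bound to
// the prefix "m" rather than declared as the default namespace.
$xml = '<m:Infringement xmlns:m="http://www.movielabs.com/ACNS">'
     . '<m:Case><m:ID>prefixed-doc</m:ID></m:Case>'
     . '</m:Infringement>';

$doc = simplexml_load_string($xml);

// The script's prefix ("x") differs from the document's ("m");
// only the namespace URI has to agree.
$doc->registerXPathNamespace('x', 'http://www.movielabs.com/ACNS');
$ids = $doc->xpath('//x:ID');

echo (string) $ids[0], "\n";
```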
A: 

Use this for any namespace:

var_dump($newXML->xpath("//*:ID"));
Dennis Knochenwefel
@Dennis, this works in XPath 2.0 but not in 1.0.
LarsH
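
A quick way to check this on your own setup is sketched below. It assumes SimpleXML's xpath() is backed by an XPath 1.0 engine, as LarsH's comment implies; the `@` only silences the warning I would expect such an engine to raise for the 2.0 syntax. The ID value is a placeholder.

```php
<?php
$doc = simplexml_load_string(
    '<Infringement xmlns="http://www.movielabs.com/ACNS">'
    . '<Case><ID>example-id</ID></Case>'
    . '</Infringement>'
);

// XPath 2.0 wildcard-prefix syntax; an XPath 1.0 engine should reject it.
$wildcard = @$doc->xpath('//*:ID');

// XPath 1.0 way of ignoring the namespace.
$localName = $doc->xpath("//*[local-name() = 'ID']");

// Did the 2.0-style query produce any matches?
var_dump(is_array($wildcard) && count($wildcard) > 0);
// How many matches did the 1.0-style query produce?
var_dump(count($localName));
```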