Hi All,

I need to parse an XML file into an object with PHP. At the moment I have no idea how to do this, so any help is appreciated.

The XML is quite big. I need to parse the part of it that looks like this:

<someNamespace:xmlDocument>
<someNamespace:categories>
 <category name="Patrick" anAttribute="numericValue" anotherAttribute="numericValue">
  <category name="Andrew" anAttribute="numericValue" anotherAttribute="numericValue">
   <category name="Alice" anAttribute="numericValue" anotherAttribute="numericValue">
    <category name="Thomas" anAttribute="numericValue" anotherAttribute="numericValue">
     <category name="Michael" anAttribute="numericValue" anotherAttribute="numericValue"/>
     <category name="Matthew" anAttribute="numericValue" anotherAttribute="numericValue"/>
    </category>
    <category name="Janet" anAttribute="numericValue" anotherAttribute="numericValue">
     <category name="Steven" anAttribute="numericValue" anotherAttribute="numericValue"/>
     <category name="Christopher" anAttribute="numericValue" anotherAttribute="numericValue"/>
    </category>
    <category name="Sue" anAttribute="numericValue" anotherAttribute="numericValue"/>
   </category>
   <category name="Charles" anAttribute="numericValue" anotherAttribute="numericValue">
    <category name="John" anAttribute="numericValue" anotherAttribute="numericValue">
     <category name="Charles" anAttribute="numericValue" anotherAttribute="numericValue"/>
     <category name="Rosamund" anAttribute="numericValue" anotherAttribute="numericValue"/>
     <category name="Stuart" anAttribute="numericValue" anotherAttribute="numericValue"/>
     <category name="Rosamund" anAttribute="numericValue" anotherAttribute="numericValue"/>
    </category>
    <category name="John" anAttribute="numericValue" anotherAttribute="numericValue"/>
   </category>
  </category>
  <category name="Oliver" anAttribute="numericValue" anotherAttribute="numericValue">
   <category name="Jane" anAttribute="numericValue" anotherAttribute="numericValue"/>
   <category name="Lucy" anAttribute="numericValue" anotherAttribute="numericValue">
    <category name="David" anAttribute="numericValue" anotherAttribute="numericValue"/>
    <category name="Robert" anAttribute="numericValue" anotherAttribute="numericValue"/>
    <category name="Hetty" anAttribute="numericValue" anotherAttribute="numericValue">
     <category name="Kenneth" anAttribute="numericValue" anotherAttribute="numericValue"/>
     <category name="Jonathan" anAttribute="numericValue" anotherAttribute="numericValue"/>
    </category>
    <category name="Freddy" anAttribute="numericValue" anotherAttribute="numericValue"/>
    <category name="Virginia" anAttribute="numericValue" anotherAttribute="numericValue"/>
   </category>
  </category>
 </category>
</someNamespace:categories>
</someNamespace:xmlDocument>

Every "name" and "anAttribute" attribute is unique.

What I would like to end up with is a categories object containing many category objects...

Thanks!

+2  A: 

Use simplexml_load_file():

<?php
// The file test.xml contains an XML document with a root element
// and at least an element /[root]/title.

if (file_exists('test.xml')) {
    $xml = simplexml_load_file('test.xml');

    print_r($xml);
} else {
    exit('Failed to open test.xml.');
}
?>
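For the nested categories in the question, a minimal sketch along these lines might work. It assumes the someNamespace prefix is bound to a namespace URI (the `urn:example:some-namespace` value, the shortened sample document, and the `buildCategory()` helper are all made up for illustration) and that the category elements themselves carry no namespace, as in the posted excerpt.

```php
<?php
// Shortened sample document; the namespace URI is an assumption,
// the real file will declare its own.
$source = <<<XML
<someNamespace:xmlDocument xmlns:someNamespace="urn:example:some-namespace">
 <someNamespace:categories>
  <category name="Patrick" anAttribute="1" anotherAttribute="10">
   <category name="Andrew" anAttribute="2" anotherAttribute="20"/>
   <category name="Oliver" anAttribute="3" anotherAttribute="30"/>
  </category>
 </someNamespace:categories>
</someNamespace:xmlDocument>
XML;

// Hypothetical helper: turns one <category> element into a plain object,
// recursing into nested <category> elements.
function buildCategory(SimpleXMLElement $node): stdClass
{
    $category = new stdClass();
    $category->name = (string) $node['name'];
    $category->anAttribute = (string) $node['anAttribute'];
    $category->anotherAttribute = (string) $node['anotherAttribute'];
    $category->children = [];

    // children() without arguments selects the un-prefixed child elements.
    foreach ($node->children() as $child) {
        if ($child->getName() === 'category') {
            $category->children[] = buildCategory($child);
        }
    }

    return $category;
}

$xml = simplexml_load_string($source);

// Prefixed elements are only reachable after selecting their namespace.
$wrapper = $xml->children('urn:example:some-namespace')->categories;

$categories = [];
// Switch back to the un-prefixed elements below the wrapper.
foreach ($wrapper->children() as $node) {
    if ($node->getName() === 'category') {
        $categories[] = buildCategory($node);
    }
}

print_r($categories);
```

The explicit `children()` calls are there because an element reached through a namespace selection keeps looking in that namespace when you use property access on it.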
r3zn1k
Be aware that this returns SimpleXMLElement objects, not plain objects. SimpleXML elements behave differently in some contexts; check the documentation for examples.
Pekka
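To illustrate the point about SimpleXML elements not being plain values, here is a small sketch (the one-element document is invented for the example). Attribute access returns another SimpleXMLElement, so it usually needs an explicit cast before being stored or compared strictly.

```php
<?php
// Invented one-element document, just to show the casting behaviour.
$xml = simplexml_load_string('<category name="Patrick" anAttribute="1"/>');

$rawName = $xml['name'];                        // still a SimpleXMLElement
$isElement = $rawName instanceof SimpleXMLElement;

$name = (string) $xml['name'];                  // plain string
$number = (int) $xml['anAttribute'];            // plain integer

$strictRaw = ($rawName === 'Patrick');          // false: object vs. string
$strictCast = ($name === 'Patrick');            // true after the cast

var_dump($isElement, $name, $number, $strictRaw, $strictCast);
```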
I wonder if this really works, as I know that SimpleXML has a big issue with namespaces.
Peter Lindqvist
SimpleXML works just fine with namespaces, it's just that it doesn't always behave as you think it would.
Josh Davis
Well, unexpected behaviour is not what I call "works just fine", but that's a matter of opinion. I've never gotten round to making it work.
Peter Lindqvist
+1  A: 

Define an extension to DOMDocument

class MyDOMDocument extends DOMDocument
{
    public function toArray(?DOMNode $oDomNode = null)
    {
        // return empty array if dom is blank
        if (is_null($oDomNode) && !$this->hasChildNodes()) {
            return array();
        }
        $oDomNode = (is_null($oDomNode)) ? $this->documentElement : $oDomNode;
        if (!$oDomNode->hasChildNodes()) {
            $mResult = $oDomNode->nodeValue;
        } else {
            $mResult = array();
            foreach ($oDomNode->childNodes as $oChildNode) {
                // how many of these child nodes do we have?
                // this will give us a clue as to what the result structure should be
                $oChildNodeList = $oDomNode->getElementsByTagName($oChildNode->nodeName);
                $iChildCount = 0;
                // getElementsByTagName() also returns deeper descendants with the same tag name,
                // but we are only interested in the number of siblings with that tag name
                foreach ($oChildNodeList as $oNode) {
                    if ($oNode->parentNode->isSameNode($oChildNode->parentNode)) {
                        $iChildCount++;
                    }
                }
                $mValue = $this->toArray($oChildNode);
                $sKey   = ($oChildNode->nodeName[0] == '#') ? 0 : $oChildNode->nodeName;
                $mValue = is_array($mValue) ? $mValue[$oChildNode->nodeName] : $mValue;
                // how many of these child nodes do we have?
                if ($iChildCount > 1) {  // more than 1 child - make numeric array
                    $mResult[$sKey][] = $mValue;
                } else {
                    $mResult[$sKey] = $mValue;
                }
            }
            // if the child is <foo>bar</foo>, the result will be array(bar)
            // make the result just 'bar'
            if (count($mResult) == 1 && isset($mResult[0]) && !is_array($mResult[0])) {
                $mResult = $mResult[0];
            }
        }
        // get our attributes if we have any
        $arAttributes = array();
        if ($oDomNode->hasAttributes()) {
            foreach ($oDomNode->attributes as $sAttrName=>$oAttrNode) {
                // retain namespace prefixes
                $arAttributes["@{$oAttrNode->nodeName}"] = $oAttrNode->nodeValue;
            }
        }
        // check for namespace attribute - Namespaces will not show up in the attributes list
        if ($oDomNode instanceof DOMElement && $oDomNode->getAttribute('xmlns')) {
            $arAttributes["@xmlns"] = $oDomNode->getAttribute('xmlns');
        }
        if (count($arAttributes)) {
            if (!is_array($mResult)) {
                $mResult = (trim($mResult)) ? array($mResult) : array();
            }
            $mResult = array_merge($mResult, $arAttributes);
        }
        $arResult = array($oDomNode->nodeName=>$mResult);
        return $arResult;
    }
}

Use it like this:

$mydom = new MyDOMDocument();
$mydom->load('test.xml');

print_r($mydom->toArray());
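If you want category objects rather than nested arrays, a sketch along the following lines might be a starting point. The `Category` class, the `categoryFromElement()` helper, the namespace URI, and the shortened sample document are all assumptions for illustration; only DOMDocument and DOMXPath are standard.

```php
<?php
// Shortened sample document; the namespace URI is an assumption.
$source = <<<XML
<someNamespace:xmlDocument xmlns:someNamespace="urn:example:some-namespace">
 <someNamespace:categories>
  <category name="Patrick" anAttribute="1" anotherAttribute="10">
   <category name="Andrew" anAttribute="2" anotherAttribute="20"/>
   <category name="Oliver" anAttribute="3" anotherAttribute="30"/>
  </category>
 </someNamespace:categories>
</someNamespace:xmlDocument>
XML;

// Hypothetical value class for one category.
class Category
{
    public $name;
    public $attributes = [];
    public $children = [];
}

// Hypothetical helper: maps a <category> element and its nested
// <category> children onto Category objects.
function categoryFromElement(DOMElement $element): Category
{
    $category = new Category();

    foreach ($element->attributes as $attribute) {
        if ($attribute->nodeName === 'name') {
            $category->name = $attribute->nodeValue;
        } else {
            $category->attributes[$attribute->nodeName] = $attribute->nodeValue;
        }
    }

    // Skip whitespace text nodes; only element children are of interest.
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMElement && $child->localName === 'category') {
            $category->children[] = categoryFromElement($child);
        }
    }

    return $category;
}

$dom = new DOMDocument();
$dom->loadXML($source);

// Bind a prefix of our own choosing to the document's namespace URI.
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('ns', 'urn:example:some-namespace');

$topLevel = [];
foreach ($xpath->query('/ns:xmlDocument/ns:categories/category') as $element) {
    $topLevel[] = categoryFromElement($element);
}

print_r($topLevel);
```

The XPath query only selects the top-level categories; the recursion in the helper handles everything nested below them.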
Peter Lindqvist
Hi Peter, thank you very very much! That helped me to get started. But I'm not done yet, so maybe I have to ask here again.
MagD