Hey friends,

Quick question: I need to transform a standard RSS structure into another XML format.

The RSS file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
        <channel>
            <title>Name des RSS Feed</title>
            <description>Feed Beschreibung</description>
            <language>de</language>
            <link>http://xml-rss.de</link>
            <lastBuildDate>Sat, 1 Jan 2000 00:00:00 GMT</lastBuildDate>
            <item>
                <title>Titel der Nachricht</title>
                <description>Die Nachricht an sich</description>
                <link>http://xml-rss.de/link-zur-nachricht.htm</link>
                <pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
                <guid>01012000-000000</guid>
            </item>
            <item>
                <title>Titel der Nachricht</title>
                <description>Die Nachricht an sich</description>
                <link>http://xml-rss.de/link-zur-nachricht.htm</link>
                <pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
                <guid>01012000-000000</guid>
            </item>
            <item>
                <title>Titel der Nachricht</title>
                <description>Die Nachricht an sich</description>
                <link>http://xml-rss.de/link-zur-nachricht.htm</link>
                <pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
                <guid>01012000-000000</guid>
            </item>
        </channel>
    </rss>

...and I want to extract only the item elements (with their children and attributes) as XML like:

<?xml version="1.0" encoding="ISO-8859-1"?>
<item>
    <title>Titel der Nachricht</title>
    <description>Die Nachricht an sich</description>
    <link>http://xml-rss.de/link-zur-nachricht.htm</link>
    <pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
    <guid>01012000-000000</guid>
</item>
...

It doesn't have to be stored in a file. I just need the output.

Edit: One more thing you need to know: the RSS file can contain any number of items. This is just a sample, so it has to be processed in a loop (while, for, foreach, ...).

I tried different approaches with DOMNode, SimpleXML, XPath, ... but without success.

Thanks chris

A: 

Try:

<?php
$xmlFile = new DOMDocument();          // Instantiate a new DOMDocument
$xmlFile->load("URL TO RSS/XML FILE"); // Load the XML/RSS file

$title = array();
$description = array();
$link = array();
$pubDate = array();
$guid = array();

// Loop over every <item>, however many there are, and read its child elements
foreach ($xmlFile->getElementsByTagName("item") as $item) {
    $title[] = $item->getElementsByTagName("title")->item(0)->nodeValue; // Value of this item's <title>
    $description[] = $item->getElementsByTagName("description")->item(0)->nodeValue;
    $link[] = $item->getElementsByTagName("link")->item(0)->nodeValue;
    $pubDate[] = $item->getElementsByTagName("pubDate")->item(0)->nodeValue;
    $guid[] = $item->getElementsByTagName("guid")->item(0)->nodeValue;
}
?>

Untested but the arrays

$title[] $description[] $link[] $pubDate[] $guid[]

should be populated with all of the data that you need!

EDIT: OK so another approach:

<?php
$xmlString = file_get_contents("URL TO RSS/XML FILE");

// preg_match_all() collects every match. "#" is the delimiter, so the "/" in
// the closing tag needs no escaping; (.*?) captures the element text.
preg_match_all("#<title>(.*?)</title>#s", $xmlString, $titles);
preg_match_all("#<description>(.*?)</description>#s", $xmlString, $descriptions);
preg_match_all("#<link>(.*?)</link>#s", $xmlString, $links);
preg_match_all("#<pubDate>(.*?)</pubDate>#s", $xmlString, $pubDates);
preg_match_all("#<guid>(.*?)</guid>#s", $xmlString, $guids);
?>

In this example the captured values end up in index 1 of each variable (e.g. $titles[1]). Note that the channel's own <title>, <description> and <link> are matched as well.

Chief17
It would be kind of you if you could extend your approach. Thanks.
ChrisBenyamin
Thank you Chief17, but this doesn't seem like a clean solution for this kind of problem. With your code, you have to pick out every single element and build the new XML document from the arrays.
ChrisBenyamin
OK, I have put an edit at the bottom with a totally different approach!
Chief17
+1  A: 

What you ask for is hardly a transformation. You are basically just extracting the <item> elements as they are. Also, the result you give is not valid XML, as it lacks a root node.
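The point about the missing root node can be checked with a small sketch. The two strings below are made-up stand-ins (not the original feed), and the variable names are hypothetical; the idea is only that an XML parser rejects a document with several top-level elements.

```php
<?php
// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical sample data: two <item> elements without and with a wrapper.
$withoutRoot = '<item><title>A</title></item><item><title>B</title></item>';
$withRoot    = '<items>' . $withoutRoot . '</items>';

$doc = new DOMDocument;
$okWithoutRoot = $doc->loadXML($withoutRoot); // several top-level elements

$doc = new DOMDocument;
$okWithRoot = $doc->loadXML($withRoot);       // single <items> root element

var_dump($okWithoutRoot, $okWithRoot);
```

If the bare list of items has to be consumed by another XML parser later, wrapping it in a root element avoids this problem.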

Apart from that, you can simply do it like this:

$dom = new DOMDocument;           // init new DOMDocument
$dom->loadXML($xml);              // load some XML into it

$xpath = new DOMXPath($dom);      // create a new XPath
$nodes = $xpath->query('//item'); // Find all item elements
foreach($nodes as $node) {        // Iterate over found item elements
    echo $dom->saveXML($node);    // output the item node's outer XML
}

The above would echo the <item> nodes. Instead of echoing, you could buffer the output, concatenate it into a string, or collect it in an array and implode it, and then write the result to a file.
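The array-and-implode variant might look like the following. This is only a sketch: the inline feed is a shortened stand-in for the RSS from the question, and the names $chunks and $output are made up for illustration.

```php
<?php
// Shortened stand-in for the RSS feed from the question (assumption).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Name des RSS Feed</title>
    <item><title>Erste Nachricht</title><guid>1</guid></item>
    <item><title>Zweite Nachricht</title><guid>2</guid></item>
  </channel>
</rss>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);

$xpath  = new DOMXPath($dom);
$chunks = array();                       // hypothetical name: one string per <item>
foreach ($xpath->query('//item') as $node) {
    $chunks[] = $dom->saveXML($node);    // serialize the node instead of echoing it
}

$output = implode("\n", $chunks);        // join all items into a single string
echo $output;
```

The string in $output can then be echoed, post-processed, or handed to file_put_contents() as needed.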

If you want to do it properly with DOM (and a root node), the full code would be:

$dom = new DOMDocument;                          // init DOMDocument for RSS
$dom->loadXML($xml);                             // load some XML into it

$items = new DOMDocument;                        // init DOMDocument for new file
$items->preserveWhiteSpace = FALSE;              // dump whitespace
$items->formatOutput = TRUE;                     // make output pretty
$items->loadXML('<items/>');                     // create root node

$xpath = new DOMXPath($dom);                     // create a new XPath
$nodes = $xpath->query('//item');                // Find all item elements
foreach($nodes as $node) {                       // iterate over found item nodes
    $copy = $items->importNode($node, TRUE);     // deep copy of item node
    $items->documentElement->appendChild($copy); // append item nodes
}
echo $items->saveXML();                          // outputs the new document

Instead of saveXML(), you'd use save('filename.xml') to write it to a file.
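A rough sketch of the file variant, assuming a tiny hand-written <items> document and a temporary file as the target (both are placeholders, not part of the original code). It also shows the counterpart for reading: load() takes a path or URL, while loadXML() takes a string.

```php
<?php
// Hand-written stand-in for the <items> document built above (assumption).
$items = new DOMDocument;
$items->loadXML('<items><item><title>Titel der Nachricht</title></item></items>');

// Hypothetical target path; a temp file keeps the sketch free of side effects.
$file  = tempnam(sys_get_temp_dir(), 'items');
$bytes = $items->save($file);    // number of bytes written, or false on failure

$check = new DOMDocument;
$check->load($file);             // load() reads from a path/URL, loadXML() from a string
$count = $check->getElementsByTagName('item')->length;
echo $count;

unlink($file);                   // clean up the temp file
```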

Gordon
Thanks Gordon, looks good, but I get an error message and couldn't find out what the problem is: "Warning: DOMDocument::loadXML() [domdocument.loadxml]: Start tag expected, '<' not found in Entity, line: 1 in /home/chris/http/dev/xmlfeed/index3.php on line 4"
ChrisBenyamin
@Chris I've used the RSS XML you gave for $xml. Remember, loadXML() loads from a string. If you want to load from a URL or file, use load() instead.
Gordon
+1  A: 

A different approach would be to use an XSLT:

$xsl = <<< XSL
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<items>
  <xsl:copy-of select="//item"/>
</items>
</xsl:template>
</xsl:stylesheet>
XSL;

The above stylesheet has just one rule: deep-copy all <item> elements from the source XML into the result and ignore everything else. The nodes are copied into an <items> element, which serves as the root node. To process this, you'd do

$xslDoc = new DOMDocument();           // create Doc for XSLT
$xslDoc->loadXML($xsl);                // load stylesheet into it
$xmlDoc = new DOMDocument();           // create Doc for RSS
$xmlDoc->loadXML($xml);                // load your XML/RSS into it
$proc = new XSLTProcessor();           // init XSLT engine
$proc->importStylesheet($xslDoc);      // load stylesheet into engine
echo $proc->transformToXML($xmlDoc);   // output transformed XML

Instead of outputting, you could just write the return value to file.

Gordon
I will try it tomorrow and give you feedback. I hadn't thought about an XSLT approach. Thanks for this!
ChrisBenyamin
Hey Gordon, where do I have to include (or reference) my RSS file? I'm asking because in the fourth comment of the PHP part you wrote "load your XML/RSS", but the variable $xml is already used for the XSL above. XSL is pretty new to me, so I guess I'm still overthinking it. Edit: Okay, I am blind or still tired. I didn't see that there are two different variables ($xml and $xsl). Let's give it a try ;)
ChrisBenyamin
@Chris you can assign the `$xml` var the same way you assign the `$xsl` with HEREDOC syntax. Or use `->load('filename.xml')`.
Gordon
Hm, are you sure you didn't forget something? Because I don't get any output.
ChrisBenyamin
@Chris worked fine for me. Check out http://pastebin.com/0Axr7tTS
Gordon
Yes, it works. My fault, I had a silly typo in my code. Thanks again.
ChrisBenyamin