Hello, this is my first time here. I got these lines as a response from the server and saved them in a file. They look like XML, right? My task is to read the content of those td tags and put it into another structured file (Excel). The problem is that I don't know how to do that.

At the moment, I think I will strip the first and last lines of the file and then parse the rest as XML. Do you know of other ways? Thanks.

<CallbackContent><![CDATA[
    <table cellspacing="0" border="0" cellpadding="0" width="100%">
        <tr class="rowcolor2">
            <td align="left" style="padding:5px;">22/02/2010</td>                        
            <td align="right" style="padding:5px;">510,02</td>
        </tr>
    </table>     
]]></CallbackContent>

Btw, I'm using PHP.

A: 

You cannot read the table directly with an XML parser, because it is wrapped in a CDATA section, which the parser treats as plain character data, the equivalent of a string literal.
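
A minimal sketch of what this means in practice, using a shortened version of the response from the question (the sample string is an assumption made for brevity). SimpleXML reports no child elements under the root and returns the markup as an ordinary string:

```php
<?php
// Shortened, assumed sample of the server response from the question.
$xml = '<CallbackContent><![CDATA[<table><tr><td>22/02/2010</td></tr></table>]]></CallbackContent>';

$root = simplexml_load_string($xml);

// The XML parser finds no child elements inside the CDATA section...
$childCount = count($root->children());

// ...only character data, which comes back as a plain string.
$text = (string) $root;

echo "child elements: ", $childCount, "\n";
echo "text content:   ", $text, "\n";
```

So the table markup is still there, but it has to be parsed a second time before the td elements become reachable.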

Thanks, that's why I said I would strip the first and last lines. Is it OK to do so?
hoangquan
I don't know, because I cannot see your entire document or how it is used. The parts you wish to remove could have a use or context that I am not aware of from the example you provided.
The entire document has the same structure; it may have a few hundred more <tr> and <td> tags. I just need to get the content of the <td> tags.
hoangquan
A: 

First, read the whole thing using an XML parser so that you can pull out the contents of the CDATA section. Then take that and stuff it through an HTML parser.
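
The two steps could be sketched roughly as follows, assuming PHP's DOM extension is available. The sample input is trimmed from the question, and the variable names are only illustrative:

```php
<?php
// Sample input, trimmed from the question.
$xml = '<CallbackContent><![CDATA[
    <table>
        <tr class="rowcolor2">
            <td align="left">22/02/2010</td>
            <td align="right">510,02</td>
        </tr>
    </table>
]]></CallbackContent>';

// Step 1: the XML parser hands back the CDATA contents as text.
$outer = new DOMDocument();
$outer->loadXML($xml);
$html = $outer->documentElement->textContent;

// Step 2: the HTML parser turns that text into a tree.
// Real-world HTML is often sloppy, so parser warnings are collected
// internally instead of being printed.
libxml_use_internal_errors(true);
$inner = new DOMDocument();
$inner->loadHTML($html);
libxml_clear_errors();

// Collect the text of every <td>, grouped by <tr>.
$xpath = new DOMXPath($inner);
$rows = [];
foreach ($xpath->query('//tr') as $tr) {
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}

print_r($rows);
```

Using the HTML parser for the second step means the table does not have to be well-formed XML.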

Ignacio Vazquez-Abrams
Thanks. Can you tell me how to pull the contents out of CDATA section? Its structure is weird.
hoangquan
You don't. You take it and feed it into the HTML parser.
Ignacio Vazquez-Abrams
+1  A: 

Use an XML parser such as SimpleXML. It will allow you to extract the CDATA safely.

Then if the HTML is XML-compliant (in other words, it's XHTML) you can use SimpleXML to extract data from it. For example:

$xml='<CallbackContent><![CDATA[
    <table cellspacing="0" border="0" cellpadding="0" width="100%">
        <tr class="rowcolor2">
            <td align="left" style="padding:5px;">22/02/2010</td>                        
            <td align="right" style="padding:5px;">510,02</td>
        </tr>
    </table>     
]]></CallbackContent>';

$CallbackContent = simplexml_load_string($xml);
$html = (string) $CallbackContent;

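// if the HTML is well-formed XML (XHTML), parse it directly
$table = simplexml_load_string($html);

// otherwise, parse it as HTML and import the result instead
// (this replaces the $table assigned above)
$dom = new DOMDocument;
$dom->loadHTML($html);
$table = simplexml_import_dom($dom)->body->table;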

foreach ($table->tr as $tr)
{
    echo 'tr class=', $tr['class'], "\n";
    foreach ($tr->td as $td)
    {
        echo 'td align=', $td['align'], ' - value: ', (string) $td, "\n";
    }
}
Josh Davis
You have traversed the HTML DOM, but the asker needs an Excel file. So write the values out as comma-separated values, and send them with a Content-Disposition: attachment header so that they are offered as a download that Excel can open.
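
A rough sketch of that suggestion, not a definitive implementation. The $rows array stands in for values collected from the <td> elements, and both the rowsToCsv helper and the table.csv file name are hypothetical:

```php
<?php
// Hypothetical rows, as they might be collected from the <td> elements.
$rows = [
    ['22/02/2010', '510,02'],
];

// Hypothetical helper: builds CSV text from an array of rows.
function rowsToCsv(array $rows): string
{
    $stream = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // fputcsv quotes fields that contain the delimiter, e.g. "510,02".
        fputcsv($stream, $row, ',', '"', '\\');
    }
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);
    return $csv;
}

$csv = rowsToCsv($rows);

// In a web request, these headers ask the browser to download the file.
// They are skipped on the command line, where headers have no effect.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/csv; charset=utf-8');
    header('Content-Disposition: attachment; filename="table.csv"');
}

echo $csv;
```

Note that the value 510,02 contains a comma, so it has to be quoted in the CSV output; fputcsv takes care of that, which is one reason to prefer it over joining the values by hand.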
coder
@Josh Davis: it works perfectly, big thanks, you saved my day. It looks like I had not studied how XML defines CDATA. @coder: it's OK, I just want to extract the content of the <td> tags.
hoangquan