Hi, sorry for the vague title. I'm extracting some data from a table with the setup shown below, using simple_html_dom. What I want to do is insert the data into a 2D array (?), where the value of the first <td> field is the name of a subscription and the rest is data relevant to that subscription.

<td><a href="/subname/index.jsp">SubName</a></td> <!-- This is the name of the subscription -->
<td>Comment regarding the subscription</td><!-- Comment -->        
<td><strong>0,-</strong></td><!-- Monthly fee -->
<td>0,49</td><!-- Price per minute -->
<td>0,49</td><!-- Price per SMS -->
<td>1,99</td><!-- Price per MMS -->

What I have so far works okay, but it puts all the values into one flat array. I've tried reading up on arrays and tried the different solutions that came to mind, but I just can't wrap my head around it.

What I want is something like:

Array (
    [SubName1] => Array ( [0] => Comment [1] => Monthly fee [2] => Price per minute [3] => Price per SMS [4] => Price per MMS )
    [SubName2] => Array ( .. )
)

This is my code:

function getData($uri) {
    try {
        $html = file_get_html($uri); // Fetch source code
        $data = array();
        foreach ($html->find('td') as $td) { // Fetch all <td> elements

            foreach ($td->find('a') as $a) { // Fetch all <a> elements to remove links
                $data[] = $a->innertext; // This returns the names of the subscriptions
            }
            foreach ($td->find('strong') as $strong) { // Fetch all <strong> elements to remove bold text
                $data[] = $strong->innertext;
            }
            // Skip all <td> elements that contain <strong> or <a>, since we already have them
            if (!preg_match('/<strong>/', $td->innertext) && !preg_match('/<a/', $td->innertext)) {
                $data[] = $td->innertext;
            }
        }

        /* Logic for database insertion goes here */

        unset($data); // Deletes array
        $html->clear(); // Clear to free up memory
        unset($html);
    } catch (Exception $e) {
        echo 'Failed to fetch prices from ' . $uri . '.<br />' . $e->getMessage();
    }
}

Thanks in advance.

A: 

If I understand your problem correctly, this is how you should do it.

First of all, I suggest you grab each row instead of individual cells, and then parse each row independently.

So in this example I assume that your row is wrapped in tr tags:

<tr>
<td><a href="/subname/index.jsp">SubName</a></td> <!-- This is the name of the subscription -->
<td>Comment regarding the subscription</td><!-- Comment -->        
<td><strong>0,-</strong></td><!-- Monthly fee -->
<td>0,49</td><!-- Price per minute -->
<td>0,49</td><!-- Price per SMS -->
<td>1,99</td><!-- Price per MMS -->
</tr>

If there are more cells at the beginning or the end, you will just have to adjust the indexes accordingly. Also, I haven't tested this code, so there might be some errors in it, but the general idea should be OK.

// here we will store the parsed values
$data = array();

// you may want to filter this a bit if you want some rows to be skipped
foreach ($html->find('tr') as $tr) {
    // children() is zero-based, and find() needs an index (0) to return a single element rather than an array
    $name = $tr->children(0)->find('a', 0)->innertext;
    $comment = $tr->children(1)->innertext;
    $monthlyFee = $tr->children(2)->find('strong', 0)->innertext;
    $pricePerMin = $tr->children(3)->innertext;
    $pricePerSms = $tr->children(4)->innertext;
    $pricePerMms = $tr->children(5)->innertext;

    // create a new entry in the $data array, formatted as you wanted it
    $data[$name] = array($comment, $monthlyFee, $pricePerMin, $pricePerSms, $pricePerMms);
}
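
For anyone who wants to try the row-by-row idea without simple_html_dom installed, here is a rough, self-contained sketch that swaps in PHP's bundled DOM extension (DOMDocument) instead. It is not the same API as above: the function name parseSubscriptions, the sample markup, and the "at least 6 cells" filter are assumptions made for illustration only.

```php
<?php
// Sketch only: uses PHP's DOM extension, not simple_html_dom.
// parseSubscriptions is a hypothetical helper name.
function parseSubscriptions(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so sloppy HTML does not spam the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $data = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            if ($node instanceof DOMElement && $node->tagName === 'td') {
                // textContent returns the text without nested tags such as a or strong
                $cells[] = trim($node->textContent);
            }
        }
        if (count($cells) < 6) {
            continue; // assumed filter: skip header rows or rows with missing cells
        }
        $name = array_shift($cells); // first cell is the subscription name
        $data[$name] = $cells;       // remaining cells are that subscription's data
    }
    return $data;
}

// Sample row modelled on the markup in the question.
$sample = '<table><tr>
<td><a href="/subname/index.jsp">SubName</a></td>
<td>Comment regarding the subscription</td>
<td><strong>0,-</strong></td>
<td>0,49</td><td>0,49</td><td>1,99</td>
</tr></table>';

$result = parseSubscriptions($sample);
print_r($result);
```

The result is keyed by subscription name, with the comment and the four prices as a numerically indexed sub-array, which is the shape asked for in the question.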

Important note here: this won't prevent you from overwriting data if a name is not unique, so you must be sure that it really is. An array can hold only one entry per key, so a second row with the same name silently replaces the first.
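
To make the overwrite concrete, here is a small sketch with made-up rows (the name 'Basic' and all values are hypothetical). It also shows one possible workaround, assuming you would rather keep every row for a name than lose one: append each row to a list under that name.

```php
<?php
// Hypothetical rows as they might come out of the table; 'Basic' appears twice.
$rows = [
    ['Basic', 'First comment', '0,-', '0,49', '0,49', '1,99'],
    ['Basic', 'Second comment', '49,-', '0,39', '0,39', '1,49'],
];

$overwritten = [];
$grouped = [];
foreach ($rows as $row) {
    $name = array_shift($row);  // first value is the subscription name
    $overwritten[$name] = $row; // a later row replaces the earlier one
    $grouped[$name][] = $row;   // every row for the name is kept in a list
}

echo $overwritten['Basic'][0], "\n"; // only the second row survives here
echo count($grouped['Basic']), "\n"; // both rows are kept here
```

With the grouped variant, each name maps to a list of rows, so the code that reads $data afterwards has to loop one level deeper.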

RaYell
Thanks, got it working after a few tweaks. Couldn't use children(x)->find to strip a and strong, but using 'plaintext' instead of innertext solved the problem. Thank you :)
Tom
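
For later readers, a tiny illustration of why reading the cell as text sidesteps the nested tags. In simple_html_dom that would presumably be something like $tr->children(0)->plaintext; since that library is not bundled with PHP, the sketch below uses the built-in strip_tags() as a stand-in for the same effect. The cell strings are copied from the question's markup.

```php
<?php
// Inner HTML of the name cell and the monthly fee cell, as in the question.
$nameCell = '<a href="/subname/index.jsp">SubName</a>';
$feeCell  = '<strong>0,-</strong>';

// strip_tags() removes the markup and leaves only the text,
// used here as a stand-in for what a plain-text read of the cell gives you.
$name = strip_tags($nameCell);
$fee  = strip_tags($feeCell);

echo $name, "\n";
echo $fee, "\n";
```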