Hi

I am trying to parse an HTML page using the Simple HTML DOM Parser. The page doesn't use IDs, which makes it harder to refer to elements.

On this page I am trying to get the album name, song title, download link and album image. This is what I have so far, but I can't even get the album names!

    $html = file_get_html('http://music.banadir24.com/singer/aasha_abdoo/247.html');

    $article = $html->find('table td[class=title]', 0);

    foreach($article as $link){

       echo $link;

    }

This outputs: 1tdArrayArrayArray Artist Array

I need to get this sort of output:

Image Path
Duniya Jamiila [URL]
Macaan Badnoo  [URL]
Donimaayee     [URL]
...

Thanks all for any help

Please note: This is legal, as the songs are not bound by copyright and are freely available to download. It's just that I need to download a lot of them and I can't sit there clicking a button all day. Having said that, it's taken me an hour to get this far.

A: 

There are only three TD tags on the page you used in your example that have a class attribute with the value "title".

1. <td height="35" class="title" style="padding-left:7px;"> Artist</td> 
2. <td colspan="3" height="35" class="title" style="padding-left:7px;"><img src="images/b24/dot_next.png" />Desco</td> 
3. <td colspan="3" height="35" class="title" style="padding-left:7px;"><img src="images/b24/dot_next.png" />The Best Of Aasha</td>

The first one always contains the text "Artist" and the other two contain the album titles. The album cells are also the only TD tags with both class="title" and colspan="3", so it should be quite easy to select them using the HTML DOM Parser.
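
As a rough illustration of that selection, here is a minimal sketch. It is not the code from this answer: it uses PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM so that it runs without the library, the `albumTitles` helper is a hypothetical name, and the sample markup simply wraps the three cells quoted above in a made-up table.

```php
<?php
// Sample markup: the three cells quoted above, wrapped in an assumed table.
$sample = <<<HTML
<table>
  <tr><td height="35" class="title" style="padding-left:7px;"> Artist</td></tr>
  <tr><td colspan="3" height="35" class="title" style="padding-left:7px;"><img src="images/b24/dot_next.png" />Desco</td></tr>
  <tr><td colspan="3" height="35" class="title" style="padding-left:7px;"><img src="images/b24/dot_next.png" />The Best Of Aasha</td></tr>
</table>
HTML;

/**
 * Hypothetical helper: returns the text of every TD that has
 * both class="title" and colspan="3".
 */
function albumTitles(string $html): array
{
    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = [];
    // Two predicates: the cell must match on class AND on colspan.
    foreach ($xpath->query('//td[@class="title"][@colspan="3"]') as $cell) {
        $titles[] = trim($cell->textContent);
    }
    return $titles;
}

echo implode(PHP_EOL, albumTitles($sample)), PHP_EOL;
```

The "Artist" cell is skipped because it has no colspan attribute. With Simple HTML DOM itself, a comparable approach would presumably be to call `find('td[class=title]')` and keep only the cells whose `colspan` attribute equals 3, but that variant is untested here.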

Kau-Boy
A: 

Is this the sort of thing you mean?

    $urls = $html->find('table[width=100%] table tr');
    foreach($urls as $url){
        echo $url->children(2);
        echo $url->children(6)->children(0)->href;
        echo '<br>';
    }

Edit

Using Simple HTML DOM.

Following from your comment, here's some updated code with some (hopefully) helpful comments.

    $urls = $html->find('table[width=100%] table tr');
    foreach($urls as $url){
        // Check that we actually have the right number of children; this was what was breaking before
        if ($url->children(6)) {
            /* Without the following check, we get a digg icon and a useless link. You can merge this with the if statement above; I only have it
             * separated so that I can write this comment and it will make more sense when reading it for the first time.
             */
            if ($url->children(2)->children(0)->src == 'images/digg.png' || $url->children(2)->children(0)->href == 'javascript:void(0)') continue;
            // echo out the name of the artist. You can get the text without the link by using $url->children(2)->plaintext
            echo $url->children(2);
            // echo out the link. Obviously you could put this href inside a <a href="code-here">whatever-here</a> tag to make the links clickable.
            echo $url->children(6)->children(0)->href;
            echo '<br>'; // just for readability
        }
    }
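
For anyone who wants to try the same guard without fetching the page, here is a minimal sketch under stated assumptions. It swaps Simple HTML DOM for PHP's built-in DOMDocument and DOMXPath so that it is self-contained. The `songLinks` helper, the row layout (song title in the third cell, download link in the seventh) and the `/download/...` URLs are hypothetical stand-ins, not the real page's markup.

```php
<?php
// Hypothetical markup: one album header row (too few cells) and two song rows.
$sample = <<<HTML
<table>
  <tr><td colspan="3" class="title">Desco</td></tr>
  <tr><td>1</td><td></td><td>Duniya Jamiila</td><td></td><td></td><td></td><td><a href="/download/1.mp3">Download</a></td></tr>
  <tr><td>2</td><td></td><td>Macaan Badnoo</td><td></td><td></td><td></td><td><a href="/download/2.mp3">Download</a></td></tr>
</table>
HTML;

/**
 * Hypothetical helper: returns title/url pairs for rows that have at
 * least seven cells and a link in the seventh one.
 */
function songLinks(string $html): array
{
    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $songs = [];
    foreach ($xpath->query('//tr') as $row) {
        $cells = $xpath->query('./td', $row);
        // Guard: header and spacer rows have fewer than seven cells, so skip them.
        if ($cells->length < 7) {
            continue;
        }
        // First link inside the seventh cell, or null if there is none.
        $link = $xpath->query('.//a[@href]', $cells->item(6))->item(0);
        if ($link === null) {
            continue;
        }
        $songs[] = [
            'title' => trim($cells->item(2)->textContent),
            'url'   => $link->getAttribute('href'),
        ];
    }
    return $songs;
}

foreach (songLinks($sample) as $song) {
    echo $song['title'], ' ', $song['url'], PHP_EOL;
}
```

The point is the same as in the code above: check that a row has the expected cells before reading from them, so album header rows are skipped instead of triggering an error.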
Blair McMillan
That is exactly what I mean, that's so succinct! But how do I get it to go to the next album? For me it seems to just stop and complain about `Call to a member function children() on a non-object` after it outputs the last song name for the first album.
That's because one of the `$url` elements doesn't have any children (or possibly doesn't have 7 children), so you'll need to check that it's actually valid before making the call. Try to work it out (and post your answer if you do, to help others in future); otherwise, if I get some time tomorrow, I'll look into it a bit more.
Blair McMillan