ansaurus

Question

MP3 link Crawler

Answer 1

+1 A:

You can send an HTTP HEAD request to each link and check its MIME type (the Content-Type response header). If it is audio/mpeg, chances are you are fetching an MP3 link.
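As a rough sketch of that check, assuming Python's standard urllib is acceptable: the function names `is_mp3_content_type` and `looks_like_mp3` and the 10-second timeout are illustrative choices, not part of the original answer. Servers can mislabel files, so treat a match as a hint rather than proof.

```python
import urllib.request


def is_mp3_content_type(content_type):
    """Return True if a Content-Type header value denotes MP3 audio."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=..." and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "audio/mpeg"


def looks_like_mp3(url, timeout=10):
    """Send a HEAD request (headers only, no body) and inspect Content-Type."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_mp3_content_type(response.headers.get("Content-Type"))
    except (OSError, ValueError):
        # OSError covers network and HTTP errors; ValueError covers malformed URLs.
        return False
```

A HEAD request keeps the crawler from downloading whole files just to classify a link.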

klez 2009-07-14 16:58:39

Answer 2

A:

Here's something similar to your request (friends at college use it all the time). When you enter QUERY_TEXT, this search generates a Google query of the following format:

QUERY_TEXT intitle:"index.of" "parent directory" "size" "last modified" "description"
[snd] (mp4|mp3|avi)
-inurl:(jsp|php|html|aspx|htm|cf|shtml|lyrics|mp3s|mp3|index)
-gallery
-intitle:"last modified"
-intitle:(intitle|mp3)
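The query format above could be assembled by a small helper like the following. This is only a sketch: it assumes `intitle:` is meant to attach directly to "index.of", and the names `build_query` and `build_search_url`, as well as the use of the usual `https://www.google.com/search?q=` URL form, are my own illustration rather than how the original tool works.

```python
from urllib.parse import urlencode

# Fixed operators copied from the query format above.
QUERY_SUFFIX = (
    'intitle:"index.of" "parent directory" "size" "last modified" "description" '
    "[snd] (mp4|mp3|avi) "
    "-inurl:(jsp|php|html|aspx|htm|cf|shtml|lyrics|mp3s|mp3|index) "
    '-gallery -intitle:"last modified" -intitle:(intitle|mp3)'
)


def build_query(query_text):
    """Prefix the user's search text to the fixed directory-listing operators."""
    return f"{query_text.strip()} {QUERY_SUFFIX}"


def build_search_url(query_text):
    """URL-encode the full query into a search URL (assumed URL form)."""
    return "https://www.google.com/search?" + urlencode({"q": build_query(query_text)})
```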

pianoman 2009-07-14 17:02:40

This won't search for MP3s themselves, only for pages containing directory listings that include MP3 files.

klez 2009-07-14 17:05:19

Yeah, and that's not really crawling either. I want to see if a script can go around and index X number of sites for MP3 files only. Thanks for the answer though :)

2009-07-14 19:04:28

Answer 3

+1 A:

What programming languages do you prefer?

Python:
There is a very promising crawling framework called Scrapy (written in Python), which is structured much like the Django framework. I haven't used it myself yet, but I've been looking at crawlers and Scrapy is the best candidate. IIRC it doesn't work out of the box and requires a small amount of coding, but it's designed around the DRY principle and is very customizable (much as Django doesn't give you a turn-key website right after installation).

There are many different methods of URL redirection, and your crawler needs to be able to follow these redirects or, in the worst case, ignore them so that it doesn't malfunction.

The site that a redirect points to must also be in your site whitelist.
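A minimal sketch of the redirect handling described above, covering only HTTP status-code redirects (meta-refresh and JavaScript redirects are not handled). Everything here is hypothetical: the `fetch` callable, assumed to return a `(status_code, location_header)` pair, the whitelist as a set of host names, and the five-hop limit.

```python
from urllib.parse import urljoin, urlparse

REDIRECT_CODES = {301, 302, 303, 307, 308}


def host_allowed(url, whitelist):
    """Return True if the URL's host name is in the whitelist set."""
    host = urlparse(url).hostname
    return host is not None and host.lower() in whitelist


def resolve_redirects(url, fetch, whitelist, max_hops=5):
    """Follow redirects and return the final URL, or None to ignore the link.

    fetch(url) is a caller-supplied function returning (status, location).
    """
    seen = set()
    for _ in range(max_hops + 1):
        # Ignore redirect loops and any hop that leaves the whitelist.
        if url in seen or not host_allowed(url, whitelist):
            return None
        seen.add(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES:
            return url
        if not location:
            return None
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return None  # Too many hops: give up instead of malfunctioning.
```

Returning None for anything suspicious matches the "ignore them" fallback: the crawler skips the link and moves on.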

Could you perhaps edit your question and add details about your crawler? Is it written from scratch, is it some turn-key solution, etc.?

Hannson 2009-07-23 16:28:19
