Hi, I'd like to extract the info string from an internet radio streamed over HTTP. By info string I mean the short note about the currently played song, band name etc.

Preferably I'd like to do it in python. So far I've tried opening a socket but from there I got a bunch of binary data that I could not parse...

thanks for any hints

+1  A: 

Sounds like you might need some stepping-stone projects before you're ready for this. There's no reason to use a low-level socket library for HTTP. There are great tools, both command-line utilities and Python standard library modules like urllib2 (with its urlopen function), that can handle the low-level TCP and HTTP specifics for you.

Do you know the URL where your data resides? Have you tried something simple on the command line, like using cURL to grab the raw HTML and then some basic tools like grep to hunt down the info you need? I assume here that the metadata is actually available as HTML, as opposed to being in a binary format read directly by the radio streamer (which is presumably in Flash, perhaps?).

Hard to give you any specifics because your question doesn't include any technical details about your data source.

Peter Lyons
Now I realize that I was not really specific. I've used urlopen in Python and implemented a few web crawlers and stuff like that in the past. But the source I'm talking about here is not a regular HTTP website. It is an HTTP live stream, basically a radio you can listen to over the internet. They probably stream MP3s or something like that, divided into chunks over HTTP. The URL is http://82.134.68.82:8666 When you play the stream in, say, the VideoLAN player, it somehow extracts the string metadata, where they write the current song, band name, radio name etc. It is a few hundred chars long. Thanks :)
supo
Now I tried accessing the URL in a web browser, just for the fun of it. And it shows some basic info, along with the song name! That looks like exactly what I need. It might be more generic to parse it from the stream though, since that would probably work with other channels outside SHOUTcast. So any hints on that are still appreciated.
supo
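For parsing the title out of the stream itself, a rough sketch follows. It assumes the server supports the in-band metadata scheme commonly described for SHOUTcast-style streams: the client sends an `Icy-MetaData: 1` request header, the server answers with an `icy-metaint` header giving the number of audio bytes between metadata blocks, and each block is one length byte (in units of 16 bytes) followed by NUL-padded text such as `StreamTitle='...';`. Whether this particular station behaves that way is not confirmed in this thread. The helper names `read_exact` and `read_stream_title` are made up for illustration.

```python
import io
import re


def read_exact(stream, n):
    """Read exactly n bytes from a binary file-like object, or raise EOFError."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended early")
        buf += chunk
    return buf


def read_stream_title(stream, metaint, max_blocks=10):
    """Return the first StreamTitle found within max_blocks metadata blocks, else None.

    Assumes `stream` is positioned at the first audio byte (right after the
    response headers) and that `metaint` came from the icy-metaint header.
    """
    for _ in range(max_blocks):
        read_exact(stream, metaint)               # skip the audio bytes
        length = read_exact(stream, 1)[0] * 16    # length byte counts 16-byte units
        if length == 0:
            continue                              # empty block: nothing new to report
        meta = read_exact(stream, length).rstrip(b"\0").decode("utf-8", "replace")
        match = re.search(r"StreamTitle='(.*?)';", meta)
        if match:
            return match.group(1)
    return None


# Offline demo with a fake stream: 8 audio bytes, an empty block,
# 8 more audio bytes, then a padded metadata block.
block = b"StreamTitle='Some Band - Some Song';"
padded = block + b"\0" * (-len(block) % 16)
fake = io.BytesIO(b"\xff" * 8 + b"\0" + b"\xff" * 8 + bytes([len(padded) // 16]) + padded)
print(read_stream_title(fake, metaint=8))
```

Against a live server the `stream` argument could be something like `sock.makefile("rb")` on a socket whose headers have already been consumed, with `metaint` taken from the `icy-metaint` response header. The text encoding of the metadata is also an assumption here; stations may not use UTF-8.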
What is the HTTP Content-Type header that is returned? That IP doesn't allow me to connect at this time. You might want to try looking at the HTTP headers using the Firefox Live HTTP Headers plugin, or making a telnet connection to that port and typing in a manual HTTP request such as `GET / HTTP/1.0`.
Peter Lyons
It looks like there are 3 addresses that are cycled in this radio. The one that works right now is http://82.134.68.82:8666 When I try telnetting to 82.134.68.82 8666 and typing in the GET request, I can see some textual info (it might be the thing I want), and then what is probably binary data rolls in and it is impossible to turn it off. I tried redirecting the output of telnet to a file using > in Windows but it did not work, so I cannot really tell what text is there at the beginning of the stream. The telnet window just goes out of control after I issue the request. I'll try to use urlopen with it.
supo
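To see the text at the start of the response without the audio flooding the terminal, one option is to do the manual `GET / HTTP/1.0` from Python and stop reading at the blank line that ends the headers. This is only a sketch: `read_headers` and `fetch_headers` are hypothetical helper names, and it assumes the server separates headers from the body with `\r\n\r\n`.

```python
import socket


def read_headers(recv, limit=65536):
    """Call recv(n) until the blank line ending the headers; return the header text.

    `recv` is any callable returning bytes (for example sock.recv). Reading
    stops at `limit` bytes so a server that never sends a blank line cannot
    make this loop forever.
    """
    data = b""
    while b"\r\n\r\n" not in data and len(data) < limit:
        chunk = recv(4096)
        if not chunk:
            break                      # connection closed
        data += chunk
    # Keep only what precedes the blank line; anything after it is audio.
    return data.split(b"\r\n\r\n", 1)[0].decode("latin-1")


def fetch_headers(host, port, path="/", timeout=10):
    """Send a minimal GET request and return only the response headers."""
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        return read_headers(sock.recv)
```

Something like `print(fetch_headers("82.134.68.82", 8666))` would then print just the textual part, provided the server is reachable. The socket is closed as soon as the headers are in, so the binary data never reaches the screen.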
Here's the top of the output. So it's an audio/mpeg stream, but there are icy-notice1, icy-name and icy-genre headers that have some info you could parse.
ICY 200 OK
icy-notice1:<BR>This stream requires <a href="http://www.winamp.com/">Winamp</a><BR>
icy-notice2:SHOUTcast Distributed Network Audio Server/win32 v1.9.2<BR>
icy-name:Gotham Radio - The Dark Side of Metal (Live Chat, Requests,Band Info) Rock Radio
icy-genre:Metal - Goth - Black - Symphonic and Powermetal
icy-url:http://www.gothmetal.net
Content-Type:audio/mpeg
icy-pub:1
icy-br:128
Peter Lyons
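The header dump in the comment above can be turned into a dictionary with a few lines of Python. This is a minimal sketch; `parse_icy_headers` is a made-up name, and the sample text is abbreviated from that dump. Note that the status line is `ICY 200 OK` rather than a standard `HTTP/1.x` line, which stricter HTTP client libraries may refuse to parse, so the sketch works on the raw header text.

```python
def parse_icy_headers(text):
    """Split raw response header text into (status_line, {name: value})."""
    lines = text.splitlines()
    if not lines:
        return "", {}
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only, so values such as URLs stay intact.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return lines[0], fields


sample = (
    "ICY 200 OK\r\n"
    "icy-name:Gotham Radio - The Dark Side of Metal "
    "(Live Chat, Requests,Band Info) Rock Radio\r\n"
    "icy-genre:Metal - Goth - Black - Symphonic and Powermetal\r\n"
    "icy-url:http://www.gothmetal.net\r\n"
    "Content-Type:audio/mpeg\r\n"
    "icy-pub:1\r\n"
    "icy-br:128"
)

status, fields = parse_icy_headers(sample)
print(status)
print(fields["icy-name"])
print(fields["icy-br"])
```

These headers describe the station (name, genre, bitrate) rather than the current song, so they cover only part of what the question asks for.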