I'm having trouble retrieving the YouTube video automatically. Here's the code. The problem is the last part: download = urllib.request.urlopen(download_url).read()

    # Youtube video download script
    # 10n1z3d[at]w[dot]cn

    import urllib.request
    import sys

    print("\n--------------------------")
    print (" Youtube Video Downloader")
    print ("--------------------------\n")

    try:
            video_url = sys.argv[1]
    except:
            video_url = input('[+] Enter video URL: ')

    print("[+] Connecting...")
    try:
            if(video_url.endswith('&feature=related')):
                    video_id = video_url.split('www.youtube.com/watch?v=')[1].split('&feature=related')[0]
            elif(video_url.endswith('&feature=dir')):
                    video_id = video_url.split('www.youtube.com/watch?v=')[1].split('&feature=dir')[0]
            elif(video_url.endswith('&feature=fvst')):
                    video_id = video_url.split('www.youtube.com/watch?v=')[1].split('&feature=fvst')[0]
            elif(video_url.endswith('&feature=channel_page')):
                    video_id = video_url.split('www.youtube.com/watch?v=')[1].split('&feature=channel_page')[0]
            else:
                    video_id = video_url.split('www.youtube.com/watch?v=')[1]
    except:
            print("[-] Invalid URL.")
            exit(1)       
    print("[+] Parsing token...")
    try:
            url = str(urllib.request.urlopen('http://www.youtube.com/get_video_info?&video_id=' + video_id).read())
            token_value = url.split('video_id='+video_id+'&token=')[1].split('&thumbnail_url')[0]

            download_url = "http://www.youtube.com/get_video?video_id=" + video_id + "&t=" + token_value + "&fmt=18"
    except:
            url = str(urllib.request.urlopen('www.youtube.com/watch?v=' + video_id))
            exit(1)

    v_url=str(urllib.request.urlopen('http://'+video_url).read())   
    video_title = v_url.split('"rv.2.title": "')[1].split('", "rv.4.rating"')[0]
    if '&quot;' in video_title:
            video_title = video_title.replace('&quot;','"')
    elif '&amp;' in video_title:
            video_title = video_title.replace('&amp;','&')

    print("[+] Downloading " + '"' + video_title + '"...')
    try:
            print(download_url)
            file = open(video_title + '.mp4', 'wb')
            download = urllib.request.urlopen(download_url).read()
            print(download)
            for line in download:
                    file.write(line)
                    file.close()
    except:
            print("[-] Error downloading. Quitting.")
            exit(1)

    print("\n[+] Done. The video is saved to the current working directory(cwd).\n")

There's an error message: (thanks Wooble)

Traceback (most recent call last):
  File "C:/Python31/MyLib/DrawingBoard/youtube_download-.py", line 52, in <module>
    download = urllib.request.urlopen(download_url).read()
  File "C:\Python31\lib\urllib\request.py", line 119, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python31\lib\urllib\request.py", line 353, in open
    response = meth(req, response)
  File "C:\Python31\lib\urllib\request.py", line 465, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python31\lib\urllib\request.py", line 385, in error
    result = self._call_chain(*args)
  File "C:\Python31\lib\urllib\request.py", line 325, in _call_chain
    result = func(*args)
  File "C:\Python31\lib\urllib\request.py", line 560, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python31\lib\urllib\request.py", line 353, in open
    response = meth(req, response)
  File "C:\Python31\lib\urllib\request.py", line 465, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python31\lib\urllib\request.py", line 391, in error
    return self._call_chain(*args)
  File "C:\Python31\lib\urllib\request.py", line 325, in _call_chain
    result = func(*args)
  File "C:\Python31\lib\urllib\request.py", line 473, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden   
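
The traceback shows the 403 arriving only after a 302 redirect was followed, which the script's bare `except:` blocks would otherwise hide. As a minimal sketch (the `fetch` helper and its `open_fn` parameter are hypothetical names, not part of the script above), catching `HTTPError` specifically makes the status code and the URL that actually failed visible:

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch(url, open_fn=urlopen):
    """Return the response body, or None if the server answers with an HTTP error.

    open_fn is a hypothetical hook so the function can be exercised
    without touching the network; it defaults to urllib's urlopen.
    """
    try:
        with open_fn(url) as response:
            return response.read()
    except HTTPError as err:
        # err.code is the HTTP status (403 in the traceback above) and
        # err.url is the URL that failed, which may differ from the
        # requested one if a redirect was followed first.
        print("[-] HTTP {} {} for {}".format(err.code, err.reason, err.url))
        return None
```

Returning `None` instead of exiting is only one possible design; it lets the caller decide whether to retry or skip.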
+4  A: 

The code in the original question relies on several assumptions about the content of YouTube pages and URLs (expressed in constructs such as "url.split('something=')[1]") which may not always hold. I tested it, and the result might even depend on which related videos show up on the page. You might have tripped on any of those specifics.
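
To illustrate why string splitting is fragile, here is a small sketch (the `extract_video_id` name is made up for this example) that reads the `v` parameter with `urllib.parse` instead. It does not care which other parameters appear or in what order:

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(video_url):
    """Return the value of the 'v' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(video_url).query)
    values = query.get("v")
    return values[0] if values else None
```

For example, `extract_video_id("http://www.youtube.com/watch?v=OEDP6xfAiVw&feature=related")` yields `"OEDP6xfAiVw"` without needing a branch for each possible `feature=` suffix.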

Here's a cleaner version, which uses urllib to parse URLs and query strings, and which successfully downloads a video. For clarity, I've removed some of the try/except blocks, which didn't do much but exit. Incidentally, it deals with Unicode video titles by removing non-ASCII characters from the filename to which the video is saved. It also takes any number of YouTube URLs and downloads them all. Finally, it masks its user-agent as Chrome for Mac (which is what I currently use).

#!/usr/bin/env python3

import sys
import urllib.request
from urllib.request import urlopen, build_opener, install_opener
from urllib.parse import urlparse, parse_qs, unquote

# urlopen() does not consult urllib.request._urlopener, so install a global
# opener that sends the browser-like User-Agent with every urlopen() call.
opener = build_opener()
opener.addheaders = [("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.9 Safari/533.2")]
install_opener(opener)

def youtube_download(video_url):
    video_id = parse_qs(urlparse(video_url).query)['v'][0]

    url_data = urlopen('http://www.youtube.com/get_video_info?&video_id=' + video_id).read()
    url_info = parse_qs(unquote(url_data.decode('utf-8')))
    token_value = url_info['token'][0]

    download_url = "http://www.youtube.com/get_video?video_id={0}&t={1}&fmt=18".format(
        video_id, token_value)

    video_title = url_info['title'][0] if 'title' in url_info else ''
    # Unicode filenames are more trouble than they're worth
    filename = video_title.encode('ascii', 'ignore').decode('ascii').replace("/", "-") + '.mp4'

    print("\t Downloading '{}' to '{}'...".format(video_title, filename))

    try:
        download = urlopen(download_url).read()
        # The with-block closes the file even if the write fails
        with open(filename, 'wb') as f:
            f.write(download)
    except Exception as e:
        print("\t Download failed! {}".format(str(e)))
        print("\t Skipping...")
    else:
        print("\t Done.")

def main():
    print("\n--------------------------")
    print (" Youtube Video Downloader")
    print ("--------------------------\n")

    # sys.argv[1:] never raises; it is simply empty when no URLs are given
    video_urls = sys.argv[1:]
    if not video_urls:
        video_urls = input('Enter (space-separated) video URLs: ').split()

    for u in video_urls:
        youtube_download(u)
    print("\n Done.")

if __name__ == '__main__':
    main()
rbp
Just made a small edit to fix a spurious single-tick that showed up at the video title.
rbp
Thanks for your effort; the code looks much cleaner now. I should have removed the try/except statements myself. However, it still results in an error when I attempt to download a video.
dsaccount1
On the url_info line, if I'm not wrong, there's an additional ')'.
dsaccount1
Maybe YouTube knows that it's a program. If I modify the headers of the request, I could mimic a human user.
dsaccount1
True, there's an additional ')', left over from when I edited the answer. I'm removing it. Do you still get the Forbidden error? It could be that you're trying too often and YouTube is blocking you. I downloaded several different videos with this code. It could also be a specific video. Which one are you trying to download? For instance, I've downloaded http://www.youtube.com/watch?v=OEDP6xfAiVw several times with the above code, without further issues ("[+] Downloading 'Cycling in Amsterdam'..." -> "Cycling in Amsterdam.mp4"). Try downloading this one.
rbp
BTW, I didn't modify the headers at all, and it worked. If you've been trying too often, maybe they've blocked your IP, at least for a while.
rbp
Edited once more, to deal more elegantly with unicode video titles. I've also changed the code to use python3-style string formatting, instead of joining strings with "+"
rbp
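
The Unicode handling mentioned in that edit can be isolated into a small helper. This is only a sketch of the same approach used in the answer's code (the `ascii_filename` name and the `extension` parameter are introduced here for illustration): non-ASCII characters are dropped and path separators are replaced.

```python
def ascii_filename(title, extension=".mp4"):
    """Build a filename from a video title using only ASCII characters."""
    # encode(..., "ignore") silently drops every character outside ASCII
    ascii_title = title.encode("ascii", "ignore").decode("ascii")
    # "/" would be read as a directory separator, so swap it for "-"
    return ascii_title.replace("/", "-") + extension
```

A title made up entirely of non-ASCII characters would collapse to just the extension, so a fallback such as the video ID may be worth adding.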
I really appreciate the amount of effort you put into the script. I'm still receiving the same error; I noticed that your method for retrieving the video is roughly the same as the method in the script I posted.
dsaccount1
I managed to get rid of the 403, I think, by adding a Host param as part of the header.
dsaccount1
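
For reference, per-request headers can be set with `urllib.request.Request`. The sketch below is an assumption about how such a request might be built, not the exact change described in the comment above; `build_request` and `BROWSER_HEADERS` are hypothetical names. urllib normally fills in `Host` from the URL by itself, so only `User-Agent` is set by default and anything else can be passed in explicitly.

```python
from urllib.request import Request

# Assumed default headers for illustration; the User-Agent string is the
# one used in the answer above.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) "
        "AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.9 Safari/533.2"
    ),
}


def build_request(url, extra_headers=None):
    """Return a Request carrying browser-like headers plus any extras."""
    headers = dict(BROWSER_HEADERS)
    headers.update(extra_headers or {})
    return Request(url, headers=headers)
```

The resulting object can be passed straight to `urllib.request.urlopen(build_request(url))`.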
I haven't properly inspected Jake's script yet, but it looks nice (with bonus points for downloading from Vimeo as well). Still, I'm having fun with this downloader, so I've restructured it to make it download an arbitrary number of videos. It also masquerades as a web browser, which might help you with your Forbidden error (but I never get it, so I can't properly test it).
rbp
+2  A: 

I'm going to shamelessly plug my script which automates checking for valid formats, automatically choosing the best quality format for a video, and works on both the flash and html5 variants of YouTube pages (as well as Vimeo).

If you wrote that script then please look at my source code for inspiration and feel free to steal some code. I challenge you to please write something better. Open source thrives on competition!

However, if you copied that script and are just trying to get it working, may I suggest you give my script a try and see if it fares better for you. You can access it both from the command line as a script or even as a module in another python file.

(edit: made wiki. not looking for reputation.)

Jake Wharton
I just wanted to see how the script I posted worked. Thanks for posting yours; I'm going to have a look.
dsaccount1
Hey, is it possible to pull a key:value pair from the HTTP response header you receive from YouTube? get_headers?
dsaccount1
I entered the headers. No errors this time, but the file size is way too small to be the video. Do I need to buffer the download? How do I do that? Also, you have a download_callback procedure; what does that do?
dsaccount1
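
On the buffering question: one common approach is to read the response in fixed-size chunks rather than with a single `.read()`. The following is a sketch under that assumption (`save_stream` and `download` are hypothetical helpers, not part of either script in this thread). It returns the number of bytes written, which makes a suspiciously small download easy to spot.

```python
from urllib.request import urlopen


def save_stream(source, filename, chunk_size=64 * 1024):
    """Copy a file-like object to disk chunk by chunk; return bytes written."""
    total = 0
    with open(filename, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of stream
                break
            out.write(chunk)
            total += len(chunk)
    return total


def download(url, filename):
    """Stream the body of url into filename and return its size in bytes."""
    with urlopen(url) as response:
        return save_stream(response, filename)
```

If the byte count is only a few kilobytes, the saved file is probably an error or HTML page rather than the video, so inspecting its contents is a reasonable next step.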
Also, using urllib, if the page I request is redirected, how do I get urllib to return the redirected address?
dsaccount1
As the documentation at http://docs.python.org/library/urllib2.html#urllib2.urlopen shows, you can use `.info()` to retrieve header information from the returned object. The `download_callback` function is only used for Vimeo, to determine whether the file is an MP4 or MOV. I'm not sure how redirects work, but there is an `HTTPRedirectHandler` class in `urllib2`.
Jake Wharton
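
In Python 3 the `urllib2` functionality lives in `urllib.request`, and the object returned by `urlopen` offers both `.info()` for the headers and `.geturl()` for the URL that was finally retrieved, which differs from the requested one when a redirect was followed. A minimal sketch (the `describe` helper and its dictionary keys are made up for this example):

```python
from urllib.request import urlopen


def describe(url):
    """Return the final URL and a few response headers for url."""
    with urlopen(url) as response:
        headers = response.info()  # header lookup is case-insensitive
        return {
            # geturl() reports the address after any redirects
            "final_url": response.geturl(),
            "content_type": headers.get("Content-Type"),
            "content_length": headers.get("Content-Length"),
        }
```

Checking `content_type` before saving is one way to notice that the server returned an HTML page instead of video data.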
A: 

It looks like YouTube has changed how video files are accessed. Instead of "token" they now use a "signature" variable, and "signature" seems to depend on either cookie-stored data or the IP address of the client (in the case of a cookie-less client such as urllib in Python 2). Here's a hack I've come up with (the URLs are IP-locked):

#!/usr/bin/python

from urlparse import urlparse, parse_qs
from urllib import urlopen, unquote

def yt_url(video_url):
    video_id = parse_qs(urlparse(video_url).query)['v'][0]

    get_vars = parse_qs(unquote(urlopen("http://www.youtube.com/get_video_info?video_id="+video_id).read()))

    url = get_vars["id"][0].split(",")[1].split("|")[1]

    elements = dict()
    elements["itag"] = get_vars["itag"][0]
    elements["sver"] = get_vars["sver"][0]
    elements["expire"] = get_vars["expire"][0]
    elements["signature"] = get_vars["signature"][0]
    elements["factor"] = get_vars["factor"][0]
    elements["id"] = get_vars["id"][0].split(",")[0]
    elements["key"] = get_vars["key"][0]
    elements["burst"] = get_vars["burst"][0]
    elements["sparams"] = get_vars["sparams"][0]
    elements["algorithm"] = get_vars["algorithm"][0]
    elements["ipbits"] = "8"

    for get_var in elements:
        url += "&" + get_var + "=" + elements[get_var]

    return (get_vars["title"][0], url)

if __name__ == '__main__':
    (title, url) = yt_url("http://www.youtube.com/watch?v=4tAr7tuakt0")
    print "Title: %s" % (title,)
    print "Video: %s" % (url,)
Darko Ilic
A: 

Hi guys,

Has anything changed on YouTube? I cannot get the code posted here to work.

Thomas
A: 

Yes, YouTube changed something. The 't' value from the get_video_info page no longer works. Instead, download the video page (e.g. youtube.com/watch?v=ND69q158IZI), search it for '"t":', and use that 't' value. You also have to add another parameter, 'asv=value', when building the download link. Normally this value is 2 (asv=2), but if you find something like 'as3' in the video page, use 'asv=3'. The complete download URL then looks like this:

http://www.youtube.com/get_video?video_id=ND69q158IZI&t=vjVQa1PpcFPrF0oFfKsUt-yQwkRi3Xyo-9fIpqe56P4=&asv=3

Thanks

mahmudul hasan
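
The steps in this answer could be sketched roughly as follows. Everything about the page layout here is an assumption taken from the description above (a `"t": "..."` pair somewhere in the page and the literal text `as3`), and `build_download_url` is a hypothetical helper, so treat it as an illustration rather than a tested downloader:

```python
import re
from urllib.parse import urlencode


def build_download_url(video_id, page_html):
    """Build a get_video URL from the 't' value found in the watch page.

    Returns None when no "t": "..." pair can be found.
    """
    match = re.search(r'"t"\s*:\s*"([^"]+)"', page_html)
    if match is None:
        return None
    # Heuristic from the answer: asv=3 if the page mentions 'as3', else asv=2
    asv = "3" if "as3" in page_html else "2"
    query = urlencode({"video_id": video_id, "t": match.group(1), "asv": asv})
    return "http://www.youtube.com/get_video?" + query
```

Note that `urlencode` percent-encodes the trailing `=` of the token as `%3D`, which is an equivalent spelling of the same query value.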
A: 

It's no different; just download them. This video tutorial may help you out: http://www.asharer.com/free-download-youtube-videos-via-video-player.html

Guggenheim