ansaurus

Question

Answer 1

+1 A:

>>> import re
>>> re.search(r'(?<=v=)[\w-]+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ').group()
'AIiMa2Fe-ZQ'

\w is shorthand for [a-zA-Z0-9_] in Python 2.x; in Python 3 you'll have to use the re.A flag to get the same behaviour. Your video ID quite clearly contains an additional character, namely the hyphen. I've also removed the redundant escape backslashes from the lookbehind.
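A minimal sketch of the flag difference mentioned above, assuming Python 3, where `\w` in a `str` pattern also matches non-ASCII word characters unless `re.A` (`re.ASCII`) is passed. The helper name `find_video_id` and the second input containing a non-ASCII letter are made up purely for illustration.

```python
import re

# Lookbehind for "v=", then one or more word characters or hyphens.
PATTERN = r'(?<=v=)[\w-]+'

def find_video_id(url, flags=0):
    """Return the text following 'v=' or None (hypothetical helper)."""
    match = re.search(PATTERN, url, flags)
    return match.group() if match else None

url = 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ'
print(find_video_id(url))        # AIiMa2Fe-ZQ
print(find_video_id(url, re.A))  # AIiMa2Fe-ZQ

# Made-up input with a non-ASCII letter (\u00e9 is "e" with an acute accent).
odd = 'http://www.youtube.com/watch?v=AIiM\u00e92Fe'
print(find_video_id(odd) == 'AIiM\u00e92Fe')  # True: Unicode \w accepts the letter
print(find_video_id(odd, re.A))               # AIiM (ASCII-only \w stops before it)
```

For the URL in the question both calls give the same result; the flag only matters when the input may contain non-ASCII word characters.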

SilentGhost 2010-04-14 17:31:28

I think the `-ZQ$` is not part of the ID...

drewk 2010-04-14 17:33:07

@drewk: OP quite clearly says that they are

SilentGhost 2010-04-14 17:33:45

My bad -- sorry...

drewk 2010-04-14 17:36:27

Answer 2

+1 A:

/(?:/v/|/watch\?v=|/watch#!v=)([A-Za-z0-9_-]+)/

Explain the RE

There are three alternate YouTube URL formats: /v/[ID], watch?v=, and the new AJAX watch#!v=. This RE captures all three. There is also a new YouTube URL for user pages, of the form /user/[user]?content={complex URI}. That form is not captured by this regex...
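As a rough Python sketch of how this pattern could be used: the surrounding `/` delimiters are dropped because Python's `re` module does not use them, and the names `VIDEO_ID_RE` and `video_id` as well as the user `someuser` are assumptions for illustration.

```python
import re

# The pattern above without the / delimiters; group 1 holds the video ID.
VIDEO_ID_RE = re.compile(r'(?:/v/|/watch\?v=|/watch#!v=)([A-Za-z0-9_-]+)')

def video_id(url):
    """Return the captured video ID, or None if no known format matches."""
    match = VIDEO_ID_RE.search(url)
    return match.group(1) if match else None

urls = [
    'http://www.youtube.com/v/AIiMa2Fe-ZQ',
    'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ',
    'http://www.youtube.com/watch#!v=AIiMa2Fe-ZQ',
    'http://www.youtube.com/user/someuser',  # user page: not covered
]
for url in urls:
    print(video_id(url))
```

The first three URLs should all yield `AIiMa2Fe-ZQ`, while the user page yields `None`.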

drewk 2010-04-14 17:32:10

+1 for youtube format coverage

manifest 2010-04-14 18:42:09

Answer 3

+2 A:

Instead of \w+, use the pattern below. The word character class (\w) doesn't include a dash; it only includes [a-zA-Z_0-9].

[\w-]+
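A small comparison of the two character sets, assuming the original attempt used `\w+` after a `v=` lookbehind (the question text is not shown here, so that part is a guess):

```python
import re

url = 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ'

# \w+ stops at the first character that is not a letter, digit or underscore.
without_dash = re.search(r'(?<=v=)\w+', url).group()

# Adding "-" to the class lets the match run through the hyphen.
with_dash = re.search(r'(?<=v=)[\w-]+', url).group()

print(without_dash)  # AIiMa2Fe
print(with_dash)     # AIiMa2Fe-ZQ
```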

Taylor Leese 2010-04-14 17:32:27

Answer 4

+1 A:

I don't know the pattern for YouTube video IDs, but just include the "-" among the allowed characters, since \w does not match it:

import re
match = re.search(r'(?<=\?v=)[\w-]+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ')
print(match.group(0))

I have edited the above because as it turns out:

>>> re.search("[\w|-]", "|").group(0)
'|'

Inside a character class, "|" does not act as a special character; it matches a literal pipe. My apologies.
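To illustrate why the stray pipe matters, here is a sketch that checks a made-up string (not a real video ID) against both character classes. `re.fullmatch` assumes Python 3.4 or later.

```python
import re

with_pipe = re.compile(r'[\w|-]+')   # "|" is a literal member of the class
without_pipe = re.compile(r'[\w-]+')

bogus = 'AIiMa2Fe|ZQ'  # made-up string containing a pipe

print(with_pipe.fullmatch(bogus) is not None)     # True: the pipe is accepted
print(without_pipe.fullmatch(bogus) is not None)  # False: the pipe is rejected
```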

manifest 2010-04-14 17:32:56

is pipe allowed in a youtube ID? I don't think so.

SilentGhost 2010-04-14 17:57:22

manifest 2010-04-14 18:22:31

@manifest: **youtube video id doesn't contain `|`** (pipe).

SilentGhost 2010-04-14 18:30:18

@SilentGhost Thanks, I had mistakenly believed the "|" (pipe) would act as a special character. I've corrected the answer.

manifest 2010-04-14 18:46:31

Answer 5

+1 A:

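Use the urlparse module (urllib.parse in Python 3) instead of a regex for this kind of thing.

try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2

url = 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ'  # example URL used in the other answers

parsed_url = urlparse(url)
if 'youtube.com' in parsed_url.netloc and parsed_url.path == '/watch':
    # Regular format: the ID is in the query string (?v=...)
    video = parse_qs(parsed_url.query).get('v')

    if video is None:
        # New format: the ID is in the fragment (#!v=...)
        video = parse_qs(parsed_url.fragment.lstrip('!')).get('v')

    if video is not None:
        print(video[0])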

EDIT: Updated for the upcoming new youtube url format.
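The same approach could be wrapped in a reusable function. This is only a sketch, assuming Python 3's `urllib.parse`; the name `extract_video_id` is hypothetical, and the `/v/[ID]` path format mentioned in another answer is not handled.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the 'v' parameter of a YouTube watch URL, or None."""
    parsed = urlparse(url)
    if 'youtube.com' not in parsed.netloc or parsed.path != '/watch':
        return None
    # Try the query string first (?v=...), then the fragment (#!v=...).
    for part in (parsed.query, parsed.fragment.lstrip('!')):
        values = parse_qs(part).get('v')
        if values:
            return values[0]
    return None

print(extract_video_id('http://www.youtube.com/watch?v=AIiMa2Fe-ZQ'))
print(extract_video_id('http://www.youtube.com/watch#!v=AIiMa2Fe-ZQ'))
```

Returning `None` for anything that is not a watch URL leaves it to the caller to decide how to treat other links.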

Ivo Wetzel 2010-04-14 18:09:06

Answer 6

A:

I'd try this:

>>> import re
>>> a = re.compile(r'.*(\-\w+)$')
>>> a.search('http://www.youtube.com/watch?v=AIiMa2Fe-ZQ').group(1)
'-ZQ'

hughdbrown 2010-04-14 20:28:58


Python : Small Regex problem
