So, for example, the YouTube video ID from a YouTube page, a tweet ID from a Twitter page, or a Facebook UID from a Facebook profile...

A: 

You don't need an open source project for that. Lifting the ID from the page is usually a matter of parsing the URL that got you there. In YouTube's case, the "v" query string parameter holds the video ID. The other examples have similar answers.
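As a rough illustration of the URL-parsing approach, here is a minimal Python sketch. The function names `youtube_video_id` and `last_path_segment` are made up for this example, and the tweet URL shape in the usage note (`/<user>/status/<id>`) is an assumption about how such URLs look, not something guaranteed by the site.

```python
from urllib.parse import urlparse, parse_qs


def youtube_video_id(url):
    """Return the value of the "v" query string parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


def last_path_segment(url):
    """Return the last non-empty path segment of the URL, or None.

    Hypothetical helper for sites that put the ID at the end of the path.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else None
```

Calling `youtube_video_id("https://www.youtube.com/watch?v=abc123")` gives back the "v" value, and `last_path_segment("https://twitter.com/someuser/status/12345")` gives back the trailing segment. Each site still needs its own small rule, so this approach scales by adding one function per URL format.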

Scott Stafford
Scott, YouTube alone is easy. What if I want to do that for 100 site types?
David Haddad
@David Haddad: Can you clarify your question then? You want a generic way to extract what exactly from arbitrary web pages? Just the identifying ID? Semantic information?
Scott Stafford
@Scott Stafford It's kind of hard to explain. The main content of a page changes from one page type to another. So if you pass it the link to a tweet page, the main output would be the tweet_id, the twitterer, and the tweet text. If you do the same with a YouTube video link, it would give you the YouTube video ID, title, etc. The output would vary from one site to the next.
David Haddad
@David Haddad: I am pretty sure you're not going to find a prewritten project that knows the specific formats of every popular social networking/Web 2.0 site and can parse them for you.
Scott Stafford
A: 

The oEmbed protocol specifies a way to get structured data about a resource from its URL. embed.ly is a company that provides an API based on that standard.

http://www.oembed.com/ http://embed.ly
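To give a feel for how an oEmbed consumer request might look, here is a small Python sketch. It assumes the provider exposes an endpoint that accepts the `url` and `format` query parameters described in the oEmbed specification. The helper names `build_oembed_url` and `fetch_oembed` are made up for this example, and the YouTube endpoint in the usage note is an assumption to check against the provider's own documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_oembed_url(endpoint, page_url, fmt="json"):
    """Build a consumer request URL: the endpoint plus url/format parameters."""
    return endpoint + "?" + urlencode({"url": page_url, "format": fmt})


def fetch_oembed(endpoint, page_url):
    """Fetch and decode the JSON response (performs a network request)."""
    with urlopen(build_oembed_url(endpoint, page_url)) as response:
        return json.load(response)
```

For example, `fetch_oembed("https://www.youtube.com/oembed", "https://www.youtube.com/watch?v=abc123")` would ask that endpoint for metadata about the video page. Per the spec, the response is expected to carry fields such as `type` and `version`, with optional ones like `title` and `author_name`. You still need to know each provider's endpoint, which is the gap a service like embed.ly aims to fill.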

David Haddad