tags:

views:

129

answers:

2

When you submit a news story to Digg, the title and summary are retrieved automatically. How can I do the same thing?

+5  A: 

Retrieve the HTML and parse it.

The title comes from the <title> tag. The summary can come from either:

  • The first couple of hundred characters of visible text from inside the <body> tag.
  • The description <meta> tag.
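
As a rough illustration of the two summary sources listed above, here is a minimal Python sketch that uses only the standard library. The names `PageSummaryParser`, `extract_title_and_summary`, and `fetch_html`, and the `max_chars=200` cutoff, are made up for this example. Preferring the description <meta> tag and falling back to body text is also an assumption, and "visible text" is only approximated by skipping script, style, noscript, and template content.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageSummaryParser(HTMLParser):
    """Collects the <title> text, the description <meta> tag, and body text."""

    SKIPPED = {"script", "style", "noscript", "template"}  # treated as not visible

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.meta_description = None
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_body and self._skip_depth == 0:
            self.body_parts.append(data)


def extract_title_and_summary(html, max_chars=200):
    """Return (title, summary); the summary falls back to leading body text."""
    parser = PageSummaryParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    if parser.meta_description:
        return title, parser.meta_description
    visible = " ".join(" ".join(parser.body_parts).split())
    return title, visible[:max_chars]


def fetch_html(url, timeout=10):
    """Download a page and decode it, assuming UTF-8 if no charset is declared."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Typical use would be `title, summary = extract_title_and_summary(fetch_html(url))`. Real pages often have malformed markup, so a more forgiving HTML parser may be worth considering.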

If the site provides an RSS feed (which you'll find in the <link rel="alternate" type="application/rss+xml"> tag) use the fielded information from that instead.
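
A hedged sketch of the RSS route, again in standard-library Python: find the feed URL from the <link> tag, then look up the feed item whose link matches the submitted page. `FeedLinkParser`, `find_rss_feeds`, and `find_item_in_rss` are hypothetical names, and the lookup assumes a plain RSS 2.0 feed whose `<item>` elements carry `<link>`, `<title>`, and `<description>` children. Matching items by exact URL is an assumption too; real feeds may use redirects or tracking parameters.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate" type="application/rss+xml">."""

    def __init__(self):
        super().__init__()
        self.feed_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        if "alternate" in rel and mime == "application/rss+xml" and attrs.get("href"):
            self.feed_hrefs.append(attrs["href"])


def find_rss_feeds(html, page_url):
    """Return absolute feed URLs advertised by the page."""
    parser = FeedLinkParser()
    parser.feed(html)
    parser.close()
    return [urljoin(page_url, href) for href in parser.feed_hrefs]


def find_item_in_rss(rss_xml, page_url):
    """Return (title, description) of the item linking to page_url, or None."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        if (item.findtext("link") or "").strip() == page_url:
            title = (item.findtext("title") or "").strip()
            description = (item.findtext("description") or "").strip()
            return title, description
    return None
```

If `find_item_in_rss` returns `None`, the page is not in the feed (or the feed has already rotated past it), and the HTML-based title and summary remain the fallback.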

There is no one right answer to this question. There are probably other strategies possible. But this should get you started.

Asaph
Also check if there is a Meta tag for Description or Summary. Those are rare though.
Michael Stum
Can you be more specific about `visible text`?
@Michael Stum: good point. I added that to my answer.
Asaph
@unknown (google): Getting the visible text of a web page is a whole other question. Please post it as a separate question.
Asaph
I just verified that this is not how Digg does it.
@unknown (google): I don't know what digg does. I'm guessing they use RSS if it's available. If you know what digg does, feel free to post an answer to your own question. I believe you'll earn a badge for doing that if your answer gets upvoted.
Asaph
You can try it out here: http://digg.com/submit/. It obviously doesn't rely on RSS or `meta` tags and so on...
@unknown (google): I clicked on the link and it was broken for me. I'm not a member of digg in any case. What would I see if I clicked on the link and I was a member?
Asaph
You'll go to the page for submitting news. After you specify the URL and click the button, the title and summary are retrieved automatically...
@unknown It depends on the link. For example, enter this question's URL and it will ask the submitter to enter a description, while the title is retrieved. So it would seem they are doing something to get a description, but falling back on the user to enter one if needed.
Jeff Beck
Is the title the same as the contents of the `<title>` tag?
Asaph
Yes, but it is user-editable when they are submitting it. The user's act of submitting and describing is one of the key features of Digg: it gets users to do the work that may be hard for a computer, such as providing a nice, usable description.
Jeff Beck
The title retrieved is not exactly the same as the `title` tag.
In what case? If the page has an RSS feed it may be treated differently.
Jeff Beck
Guys, you can give it a try and you will find it has nothing to do with RSS...
Please provide an example of a URL that has a different title in Digg than the title tag in the HTML.
Jeff Beck
The title is quite similar to the <title> part, though. I'm curious about how the description is generated.
Please give an example URL that populates the description.
Jeff Beck
I just tried this one: http://movies.yahoo.com/news/movies.ap.org/merry-xmas-hollywood-boxoffice-record-falls-ap
It is the description meta tag. See my edited answer.
Jeff Beck
Oh, in this case it's the `meta` tag, but it can still work if the page doesn't have a `meta` tag.
+1  A: 

The title is easy: it is just the title tag of the HTML. The summary is a bit harder. If you are retrieving the page as part of a search or in some other context, you should try to generate the summary based on the position of the search term, or on something else relevant to the context you are showing it in. For example, if you are showing the page because I clicked an "AI" tag, show me some of the page that is about AI.
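
One simple way to approximate a context-based summary is a keyword-in-context snippet: cut a window of text around the first occurrence of the term. The Python sketch below is only an illustration; `snippet_around` and its `width` parameter are hypothetical, and it assumes you already have the page's plain text.

```python
def snippet_around(text, term, width=200):
    """Return about `width` characters of `text` around the first match of `term`.

    Falls back to the start of the text when the term does not occur.
    Matching is a plain case-insensitive substring search.
    """
    normalized = " ".join(text.split())  # collapse runs of whitespace
    index = normalized.lower().find(term.lower())
    if index == -1:
        return normalized[:width]
    # Centre the window on the match, then clamp it to the text bounds.
    start = max(0, index - (width - len(term)) // 2)
    end = min(len(normalized), start + width)
    start = max(0, end - width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(normalized) else ""
    return prefix + normalized[start:end] + suffix
```

A smarter version would cut at word or sentence boundaries and score several candidate windows, but that moves toward real summarization, which is the hard part.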

In the case of Digg, the title and description can be edited by the poster before the submission is pushed out to everyone. But if the page has a description meta tag, it will pre-populate the field. They use the following meta tag: <meta name="description" content="blah blah blah"/>

Jeff Beck
AI = artificial intelligence
I know. I was using an AI tag as an example, since your question is tagged with AI. Since this is about AI, I thought you were asking more about how to generate a good summary than how to retrieve the data. @Asaph has a good answer for getting the data; picking what to show in the description is now the hard part.
Jeff Beck
And what if that page doesn't have a meta description?
It allows the user posting the link to write a description. There may be other ways Digg gets the description, but I would need examples of URLs that don't have a description meta tag and still get an auto-populated Digg description. Then we could figure out where it gets that data. The main source of descriptions is still the users editing and adding them.
Jeff Beck