I have been looking for a way to get the comments from a Blogger blog given only a regular post URL. I know you can get the blogID by scraping the HTML, which is somewhat unpleasant but can be done in a few standard ways. The problem is that I have not found a way to get the comments for a specific post when I have only the post URL and the blogID. As far as I can tell, the postID cannot be reliably scraped from the HTML, and it seems to be required to get the comments for a single post.

Also, the API call that returns the most recent posts for a blogID only helps if the post is among the most recent 10 or 15, so it is probably no use for a slightly older post. Does anyone know of a decent method to do this? I am mostly looking for a Java solution, but if there is a solution in a different language I would gladly port it to Java.

A: 

Just parse the Atom feed's XML. If you have the post URL, you can extract the link element with the rel="service.post" attribute from the post's HTML.

If the Atom document you get does not contain all the comments, it contains a link element with a rel="next" attribute that points to the document containing the rest of the comments.
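As a rough sketch of that paging step, the following uses only the JDK's built-in DOM parser to pull the rel="next" link out of an Atom document that has already been downloaded as a string. The class and method names (`AtomNextLink`, `nextPageUrl`) are made up for illustration, and the code assumes the feed uses the standard Atom namespace; fetching the feed over HTTP is left out.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any Blogger or Abdera API.
class AtomNextLink {
    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Returns the href of the feed-level link with rel="next", if present. */
    static Optional<String> nextPageUrl(String atomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(atomXml)));
            Element feed = doc.getDocumentElement();
            NodeList links = doc.getElementsByTagNameNS(ATOM_NS, "link");
            for (int i = 0; i < links.getLength(); i++) {
                Element link = (Element) links.item(i);
                // Only consider links that are direct children of <feed>,
                // not links that belong to individual entries.
                if (link.getParentNode() == feed && "next".equals(link.getAttribute("rel"))) {
                    return Optional.of(link.getAttribute("href"));
                }
            }
            return Optional.empty();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse Atom document", e);
        }
    }
}
```

To collect every comment you would call this on each page in turn, fetching the returned URL until the result is empty.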

You can use Apache Abdera to process the feed if you do not want to handle the plain XML yourself. Their wiki also has an example of how to consume Atom feeds.

Moritz
The service.post link is the blog's full Atom feed, which is not entirely helpful in this case. Yes, I could follow the rel="next" links, but for heavily posted blogs, a post from a month ago could be over 300 posts back. So, following that trail is not a good solution.
Robert Diana
No, it's not the full feed. If you take the *post's* URL the feed behind the `service.post` link contains only the *comments* to that post.
Moritz
Really? That is not what is documented or what I had seen, but I will definitely recheck it and update this again.
Robert Diana
@Moritz Disappointingly, the sample blogs that I have looked at do not follow the same pattern as your solution, as each of them references the blogID posts feed. The only time I see a comment feed for the post is when there is a tag of the form <link rel="alternate" type="application/atom+xml" href="http://domain/feeds/postID/comments/default" /> .
Robert Diana
A: 

I just wanted to document my findings given that this question seems to be asked often and rarely answered.

Basically, to get the comments for a single Blogger URL you need the postID. If you have the postID, you can go through the Blogger API. If you only have the URL of the post, there seems to be only one somewhat reliable option: looking for the default post comments feed. To find this you need to look for an HTML tag of the form

<link rel="alternate" type="application/atom+xml" href="http://domain/feeds/postID/comments/default" />

In particular, a Java regex that works for this is:

Pattern p = Pattern.compile("http://.*/feeds/[0-9]+/comments/default");
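For what it's worth, here is a slightly fuller sketch of how that pattern might be used against a page's HTML. It is not the exact code I ran: the class and method names are invented for illustration, and the regex is a stricter variant of the one above. It also accepts https, captures the postID in a group, and stops at quotes and whitespace, because the greedy `.*` in the original can run across several URLs when they sit on the same line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class CommentFeedFinder {
    // Variant of the pattern above: http or https, no quotes, whitespace or
    // angle brackets inside the URL, and the numeric postID in group 1.
    private static final Pattern COMMENT_FEED =
            Pattern.compile("https?://[^\"'\\s<>]+/feeds/([0-9]+)/comments/default");

    /** Returns the first per-post comment feed URL found in the HTML, if any. */
    static Optional<String> findCommentFeedUrl(String html) {
        Matcher m = COMMENT_FEED.matcher(html);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    /** Returns the postID embedded in that feed URL, if any. */
    static Optional<String> findPostId(String html) {
        Matcher m = COMMENT_FEED.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample markup in the shape of the link tag described above.
        String html = "<link rel=\"alternate\" type=\"application/atom+xml\" "
                + "href=\"http://example.blogspot.com/feeds/1234567890/comments/default\" />";
        System.out.println(findCommentFeedUrl(html).orElse("no comment feed link found"));
        System.out.println(findPostId(html).orElse("no postID found"));
    }
}
```

An empty result from either method corresponds to the case where the link tag is missing from the page.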

If this link tag does not exist, then the blog likely has a third-party commenting system installed, such as Disqus, Echo or IntenseDebate.

Robert Diana