ansaurus

Question

How can I retrieve the page title of a webpage using Python?

Answer 1

+14 A:

I always use lxml for such tasks. You could use BeautifulSoup as well.

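import lxml.html

url = "http://www.google.com/"  # example URL, as used in the other answers
t = lxml.html.parse(url)  # parse() accepts a filename, URL, or file-like object
print(t.find(".//title").text)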

Peter Hoffmann 2008-09-09 04:49:38

Answer 2

+2 A:

This is probably overkill for such a simple task, but if you plan to do more than that, it's saner to start with these tools (mechanize, BeautifulSoup), because they are much easier to use than the alternatives (urllib to fetch the content, and regexes or some other parser to parse the HTML).

Links: BeautifulSoup, mechanize

#!/usr/bin/env python
#coding:utf-8

from BeautifulSoup import BeautifulSoup
from mechanize import Browser

#This retrieves the webpage content
br = Browser()
res = br.open("https://www.google.com/")
data = res.get_data() 

#This parses the content
soup = BeautifulSoup(data)
title = soup.find('title')

#This outputs the content :)
print title.renderContents()

Vinko Vrsalovic 2008-09-09 04:51:09

Answer 3

+3 A:

The mechanize Browser object has a title() method, so the code from Vinko Vrsalovic's answer can be rewritten as:

from mechanize import Browser
br = Browser()
br.open("http://www.google.com/")
print br.title()

codeape 2008-09-09 05:45:39

Answer 4

+13 A:

@Vinko Vrsalovic

Your example may be simplified.

import urllib
import BeautifulSoup

soup = BeautifulSoup.BeautifulSoup(urllib.urlopen("https://www.google.com"))
print soup.title.string

NOTE:

soup.title finds the first title element anywhere in the HTML document.
title.string assumes the title element has only one child node, and that the child node is a string.
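
For comparison, here is a minimal standard-library sketch (Python 3, no BeautifulSoup) that makes both caveats explicit: it keeps only the first title element and joins its text chunks rather than assuming a single string child. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class FirstTitleParser(HTMLParser):
    """Collect the text of the first <title> element only (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None  # stays None if no <title> is found

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect it instead of
        # assuming a single string.
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()


def first_title(html_text):
    """Return the text of the first <title> in html_text, or None."""
    parser = FirstTitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

Calling first_title() on the decoded page source returns the title text, or None when the document has no title element.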

J.F. Sebastian 2008-09-09 10:32:54

Answer 5

A:

Is there a way to download only the first part of the page, rather than the whole thing, so that the title can be retrieved? It seems wasteful to download a full page if you just want the title string.
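
One possible approach, as a rough sketch in Python 3: read the response in small chunks and stop as soon as a complete title element has arrived. The function name read_title, the chunk size, the byte cap and the UTF-8 decoding are all assumptions for illustration; the demo uses an in-memory stream in place of a real response.

```python
import io
import re

# Matches a complete <title>...</title> element in raw bytes.
TITLE_RE = re.compile(rb"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def read_title(stream, chunk_size=1024, max_bytes=65536):
    """Read from a binary file-like object until a complete <title> is seen.

    chunk_size and max_bytes are illustrative defaults, not tuned values.
    Returns the title text, or None if no title shows up within max_bytes.
    """
    buf = b""
    while len(buf) < max_bytes:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of document
            break
        buf += chunk
        match = TITLE_RE.search(buf)
        if match:
            # Decoding as UTF-8 is an assumption; a real page may declare
            # a different charset.
            return match.group(1).decode("utf-8", errors="replace").strip()
    return None


# Demo with an in-memory stream standing in for a network response.
page = b"<html><head><title>Example</title></head><body>" + b"x" * 100000 + b"</body></html>"
stream = io.BytesIO(page)
print(read_title(stream, chunk_size=64))
print(stream.tell(), "of", len(page), "bytes read")
```

With Python 3's urllib.request.urlopen(url), the returned response object supports read(n), so it could be passed in as stream. How much is actually saved depends on buffering and on how far into the document the title appears.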

2008-11-11 10:09:21

While you're downloading the full HTML of the page, I'm not sure whether any of these libraries download the "full" page, including linked images, script files and style sheets (can someone confirm?). Usually, HTML is not the biggest offender when computing total page size.
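
To illustrate the point: as far as I know, fetching a URL with urllib retrieves only that one HTML document, and images, scripts and style sheets are just URLs in the markup until something requests them separately. A small Python 3 sketch (class and function names are made up for illustration) that lists those references without downloading anything:

```python
from html.parser import HTMLParser


class LinkedResourceParser(HTMLParser):
    """List URLs a browser would fetch separately (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Images and scripts point at their resource via "src".
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        # Style sheets (and icons etc.) are referenced via <link href=...>.
        elif tag == "link" and attrs.get("href"):
            self.resources.append(attrs["href"])


def linked_resources(html_text):
    """Return the referenced resource URLs in document order."""
    parser = LinkedResourceParser()
    parser.feed(html_text)
    parser.close()
    return parser.resources
```

Nothing in this code touches the network: the parser only sees the text of the HTML document that was already downloaded.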

technomalogical 2008-11-12 19:56:48
