I want to catch some tags with BeautifulSoup: some <p> tags, the <title> tag, and some <meta> tags. I want to catch them regardless of their case, though. Some sites write the tag as <META>, and I want to be able to match that too.

I noticed that BeautifulSoup is case-sensitive by default. How do I catch these tags in a case-insensitive way?

A: 

You can use soup.findAll. BeautifulSoup lowercases tag names when it parses the document, so searching for the lowercase name also matches <META>:

from bs4 import BeautifulSoup  # BeautifulSoup 4; the old BeautifulSoup 3 module is Python 2 only

html = '''<html>
<head>
<meta name="description" content="Free Web tutorials on HTML, CSS, XML" />
<META name="keywords" content="HTML, CSS, XML" />
<title>Test</title>
</head>
<body>
</body>
</html>'''

soup = BeautifulSoup(html, 'html.parser')
# find_all is the bs4 name for findAll; 'meta' matches both <meta> and <META>
for x in soup.find_all('meta'):
    print(x)

Result:

<meta name="description" content="Free Web tutorials on HTML, CSS, XML"/>
<meta name="keywords" content="HTML, CSS, XML"/>
Mark Byers
Ah, so BeautifulSoup converts all tags to lowercase? What about attributes? And does that mean I can also use `find` instead of `findAll` and it would still be case-insensitive?
cool-RR
@cool-RR: Yes, find works the same way as findAll. What stays case-sensitive depends on what you match against, though: tag names are normalized, but attribute values and text keep their original case.
Mark Byers
Okay, I just tried it, and it does convert attribute names to lowercase.
cool-RR
A: 

BeautifulSoup standardises the parse tree on input. It converts tags to lower-case. You don't have anything to worry about IMO.

Oli
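
For anyone who wants to see the normalization directly, here is a minimal sketch. It uses the standard library's html.parser rather than BeautifulSoup itself (bs4 can run on top of it when you pass "html.parser"), so treat it as an illustration of the behaviour discussed above, not as BeautifulSoup's own code. MetaCollector and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # By the time this is called, html.parser has lowercased the tag
        # name and the attribute names. Attribute values keep their case.
        if tag == "meta":
            self.metas.append(dict(attrs))


html = """<html><head>
<meta name="description" content="Free Web tutorials" />
<META NAME="Keywords" CONTENT="HTML, CSS, XML" />
<title>Test</title>
</head><body></body></html>"""

collector = MetaCollector()
collector.feed(html)
collector.close()

for attrs in collector.metas:
    print(attrs)

# Values are not normalized, so compare them case-insensitively yourself.
keywords = [m for m in collector.metas
            if (m.get("name") or "").lower() == "keywords"]
print(keywords)
```

Both tags are collected even though the second one is written as <META NAME=...>, and its attributes come back under the lowercase keys name and content. The value "Keywords" keeps its capital K, which is why the last step lowercases it before comparing.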