ansaurus

Question

Getting BeautifulSoup to catch tags in a non-case-sensitive way

Answer 1

A:

You can use soup.findAll. BeautifulSoup lowercases tag names when it parses HTML, so searching for 'meta' also finds <META>:

import BeautifulSoup  # BeautifulSoup 3 on Python 2 (module name BeautifulSoup, print statement below)

html = '''<html>
<head>
<meta name="description" content="Free Web tutorials on HTML, CSS, XML" /> 
<META name="keywords" content="HTML, CSS, XML" /> 
<title>Test</title>
</head>
<body>
</body>
</html>'''

soup = BeautifulSoup.BeautifulSoup(html)
# Tag names are lowercased during parsing, so 'meta' matches both <meta> and <META>
for x in soup.findAll('meta'):
    print x

Result:

<meta name="description" content="Free Web tutorials on HTML, CSS, XML" />
<meta name="keywords" content="HTML, CSS, XML" />
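The same lowercasing can be seen in Python 3's standard-library html.parser, which BeautifulSoup 4 can use as its parser backend. The sketch below is not BeautifulSoup itself; it is a minimal illustration, assuming Python 3, with a hypothetical TagCollector class that records start tags and then filters on the lowercase name 'meta':

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over the tag name already converted to lower case,
        # so <META> and <meta> both arrive here as 'meta'.
        # Self-closing tags (<meta ... />) reach this method too, because the
        # default handle_startendtag calls handle_starttag.
        self.tags.append((tag, dict(attrs)))


html = '''<html>
<head>
<meta name="description" content="Free Web tutorials on HTML, CSS, XML" />
<META name="keywords" content="HTML, CSS, XML" />
<title>Test</title>
</head>
</html>'''

collector = TagCollector()
collector.feed(html)
collector.close()

# Filtering on the lowercase name catches both the <meta> and the <META> tag.
metas = [attrs for tag, attrs in collector.tags if tag == 'meta']
for attrs in metas:
    print(attrs['name'])
```

Both the description and the keywords tags are printed, mirroring the findAll result above.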

Mark Byers 2010-07-28 12:10:03

Ah, so BeautifulSoup converts all tags to lowercase? What about attributes? And does that mean I can also use `find` instead of `findAll` and it would still be case-insensitive?

cool-RR 2010-07-28 12:14:32

@cool-RR: Yes, find works in a similar way to findAll. The case-sensitivity depends on what you are doing, though.

Mark Byers 2010-07-28 12:18:41

Okay, I've tried it now, and it does convert attribute names to lowercase.

cool-RR 2010-07-28 12:36:04
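Only attribute names are normalised; attribute values keep their original case. A minimal sketch using Python 3's standard-library html.parser (not BeautifulSoup; the AttrDump class is hypothetical) shows this, along with one way to match a value case-insensitively by lowercasing it yourself:

```python
from html.parser import HTMLParser


class AttrDump(HTMLParser):
    """Hypothetical helper: collects the (name, value) pairs of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # Attribute names come in lowercased; values are left as written.
        self.attrs.extend(attrs)


parser = AttrDump()
parser.feed('<META NAME="Keywords" CONTENT="HTML, CSS, XML">')
parser.close()
print(parser.attrs)
# [('name', 'Keywords'), ('content', 'HTML, CSS, XML')]

# Values are case-sensitive, so compare them lowercased when case should not matter.
matches = [v for k, v in parser.attrs if k == 'name' and v.lower() == 'keywords']
```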

Answer 2

A:

BeautifulSoup standardises the parse tree on input: it converts tag names to lowercase. So you don't have anything to worry about, IMO.

Oli 2010-07-28 12:14:36
