views:

345

answers:

5

I'm working on code to parse a configuration file written in XML, where the XML tags are mixed case and the case is significant. Beautiful Soup appears to convert XML tags to lowercase by default, and I would like to change this behavior.

I'm not the first to ask a question on this subject [see here]. However, I did not understand the answer given to that question and in BeautifulSoup-3.1.0.1 BeautifulSoup.py does not appear to contain any instances of "encodedName" or "Tag.__str__"

A: 

just use a propper xml parser instead of a lib thats made to deal with broken files

i sugest to just take a look at xml.etree or lxml

Ronny
+5  A: 
import html5lib
from html5lib import treebuilders

f = open("mydocument.html")
parser = html5lib.XMLParser(tree=treebuilders.getTreeBuilder("beautifulsoup"))
document = parser.parse(f)

'document' is now a BeautifulSoup-like tree, but retains the cases of tags. See html5lib for documentation and installation.

TML
A: 

According to Leonard Richardson, creator|maintainer of Beautiful Soup, you can't.

Rob Carr
+1  A: 

It's much better to use lxml. It's much, much faster than BeautifulSoup. It has a compatibility API for BeautifulSoup too if you don't want to learn the lxml API.

Ian Blicking agrees.

There's no reason to use BeautifulSoup anymore, unless you're on Google App Engine or something where anything not purely Python isn't allowed.

It's more suited for XML as well.

Wahnfrieden
A: 

2 TML:

there was no XMLParser exists in html5lib :)

Denis Barmenkov