html5lib

What revision of html5lib is stable?

html5lib notes that its latest release (0.11) is somewhat old. Using the Python portion, I have recursion problems as noted in Issue 70 and Issue 59, but I can't find a recent Mercurial revision that is stable. The latest tip is no good: I got the following error from python setup.py install: byte-compiling build/bdist.linux-x86_64/egg/h...

Adding consistent whitespace to HTML using Python

I just started working on a website that is full of pages with all their HTML on a single line, which is a real pain to read and work with. I'm looking for a tool (preferably a Python library) that will take HTML input and return the same HTML unchanged, except for adding linebreaks and appropriate indentation. (All tags, markup, and c...
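One way to approach the re-indentation described above is a small pass built on the standard library's `html.parser`. This is only a rough sketch, not html5lib's API: the names `Indenter` and `indent_html` are made up for illustration, every tag is placed on its own line (which adds whitespace around inline elements), and whitespace-sensitive elements such as `pre` or `textarea` are not treated specially.

```python
from html.parser import HTMLParser

# HTML void elements: they never take a closing tag, so they must not
# increase the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Indenter(HTMLParser):
    """Hypothetical helper: re-emits markup one tag per line, indented."""

    def __init__(self, indent="  "):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.indent = indent
        self.depth = 0
        self.lines = []
        self.buf = []  # pending text between two tags

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    def _flush(self):
        # Write out any buffered text as a single trimmed line.
        text = "".join(self.buf).strip()
        self.buf = []
        if text:
            self._emit(text)

    def handle_starttag(self, tag, attrs):
        self._flush()
        self._emit(self.get_starttag_text())  # original tag text, unchanged
        if tag not in VOID:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self._flush()
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._flush()
        if tag not in VOID:
            self.depth = max(self.depth - 1, 0)
        self._emit("</%s>" % tag)

    def handle_data(self, data):
        self.buf.append(data)

    def handle_entityref(self, name):
        self.buf.append("&%s;" % name)

    def handle_charref(self, name):
        self.buf.append("&#%s;" % name)

    def handle_comment(self, data):
        self._flush()
        self._emit("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._flush()
        self._emit("<!%s>" % decl)


def indent_html(html, indent="  "):
    parser = Indenter(indent)
    parser.feed(html)
    parser.close()
    parser._flush()  # text after the last tag, if any
    return "\n".join(parser.lines)


print(indent_html("<div><p>Hi &amp; bye</p><br></div>"))
```

Because the start tags are copied from the input verbatim, attribute quoting and order are preserved; only line breaks and leading whitespace are introduced.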

How to install html5lib-0.90 library for Python on Windows?

I'm using Windows and trying to install the html5lib-0.90 library for Python: C:\>python C:\Users\Junior\Downloads\Python\html5lib-0.90\setup.py install Traceback (most recent call last): File "C:\Users\Junior\Downloads\Python\html5lib-0.90\setup.py", line 36, in <module> for name in os.listdir(os.path.join('src','html5lib')) WindowsError: ...

Parse html and find data in the html

Hi all. I am trying to use html5lib to parse an HTML page into something I can query with XPath. html5lib has close to zero documentation and I've spent too much time trying to figure this problem out. The ultimate goal is to pull out the second row of a table: <html> <table> <tr><td>Header</td></tr> <tr><td>Want This</...
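The kind of query described above can be sketched with ElementTree-style path expressions. The example below deliberately uses the standard library's `xml.etree.ElementTree` on a well-formed sample rather than html5lib itself, so it runs without extra dependencies. The sample markup is an assumed completion of the truncated snippet, and `second_row_cells` is a hypothetical helper name. With html5lib, the default tree builder also returns ElementTree-style elements, but as far as I know it places them in the XHTML namespace unless `namespaceHTMLElements=False` is passed, and an HTML5 parser inserts a `tbody` under `table`, which is why the sketch searches for `tr` at any depth.

```python
import xml.etree.ElementTree as ET

# Assumed sample: a well-formed completion of the snippet in the question.
SAMPLE = (
    "<html><table>"
    "<tr><td>Header</td></tr>"
    "<tr><td>Want This</td></tr>"
    "</table></html>"
)


def second_row_cells(markup):
    """Return the cell texts of the second <tr> found in the markup."""
    root = ET.fromstring(markup)
    # ".//tr" matches rows at any depth, so an intermediate <tbody>
    # would not change the result.
    rows = root.findall(".//tr")
    if len(rows) < 2:
        return []
    return ["".join(td.itertext()).strip() for td in rows[1].findall("td")]


print(second_row_cells(SAMPLE))
```

ElementTree supports only a subset of XPath; full XPath expressions would need a tree type such as lxml's.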

Skip sanitization for videos in html5lib

I am using a wmd-editor in django, much like this one in which I am typing. I would like to allow the users to embed videos in it. For that I am using the Markdown video extension here. The problem is that I am also sanitizing user input using html5lib sanitization and it doesn't allow object tags which are required to embed the videos. ...

html5lib/lxml examples for BeautifulSoup users?

I'm trying to wean myself from BeautifulSoup, which I love but which seems to be (aggressively) unsupported. I'm trying to work with html5lib and lxml, but I can't seem to figure out how to use the "find" and "findall" operators. By looking at the docs for html5lib, I came up with this for a test program: import cStringIO f = cStringIO.S...