Hello,

The "twill" documentation page says:


By default, twill will run pages through tidy before processing them. This is on by default because the Python libraries that parse HTML are very bad at dealing with incorrect HTML, and will often return incorrect results on "real world" Web pages. To disable this feature, set config do_run_tidy 0


But where is this tidy program located inside twill? I downloaded "twill 0.9" and looked through the contents of the "twill" folder, but I can't find any file (or module) named "tidy".

+1  A: 

twill uses the command-line version of tidy if it is installed on your system. The method that calls tidy to clean your code is located in utils.py and is named 'run_tidy'. It is called by the command 'tidy_ok', which is defined in commands.py.

If use_tidy is set to true (which it is by default), the _cleanup_html method in ConfigurableParsingFactory calls the run_tidy method.
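To illustrate the idea of shelling out to an external tidy binary, here is a minimal sketch. It is not twill's actual run_tidy code: the function name clean_html, the exe parameter, and the choice of the -q and -ashtml flags are my own assumptions for the example. It also shows why you won't find a "tidy" file inside the twill folder: the program is looked up on your system PATH.

```python
import shutil
import subprocess


def clean_html(html, exe="tidy"):
    """Return html cleaned by an external tidy binary.

    Hypothetical helper, not part of twill. If the executable
    is not found on PATH, the input is returned unchanged.
    """
    path = shutil.which(exe)  # None when the executable is not installed
    if path is None:
        return html

    # -q suppresses non-essential output, -ashtml forces HTML output.
    result = subprocess.run(
        [path, "-q", "-ashtml"],
        input=html,
        capture_output=True,
        text=True,
    )

    # tidy exits with 1 on warnings and 2 on errors, so a non-zero
    # return code alone does not mean nothing was produced. Fall back
    # to the original markup only when tidy printed no output.
    return result.stdout or html
```

If shutil.which("tidy") returns None on your machine, tidy is simply not installed, and a sketch like this just passes the page through untouched.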

Nikolaus Gradwohl
Thank you very much, Nikolaus!!!
brilliant