I am parsing HTML files with Html Agility Pack, but some of the files are badly written and I cannot parse them. To clean up those files I am using HTML Tidy, but it fails to tidy some of them, whereas tidying the same files through Notepad++ works.
I am using the same HTML Tidy library that Notepad++ uses, so I cannot understand where I am making a mistake with HTML Tidy.
EDIT: For some HTML pages, tidying does clean up the page source. This is my code to tidy an HTML page:
    Tidy.Document doc = new Tidy.Document();

    // Load the file into the Tidy document
    int err_code = doc.ParseFile(file);
    if (err_code < 0)
        throw new Exception("Unable to parse file: " + file);

    doc.SetOptValue(TidyATL.TidyOptionId.TidyReplaceColor, "yes");
    doc.SetOptValue(TidyATL.TidyOptionId.TidyWrapSection, "0");
    doc.SetOptValue(TidyATL.TidyOptionId.TidyWrapAsp, "no");
    doc.SetOptValue(TidyATL.TidyOptionId.TidyWrapJste, "no");
    doc.SetOptValue(TidyATL.TidyOptionId.TidyWrapPhp, "no");
    doc.SetOptValue(TidyATL.TidyOptionId.TidyPreTags, "yes");

    // Clean and repair the parsed document
    err_code = doc.CleanAndRepair();
    if (err_code < 0)
        throw new Exception("Unable to clean/repair file: " + file);

    err_code = doc.RunDiagnostics();
    if (err_code < 0)
        throw new Exception("Unable to run diagnostics on file: " + file);

    // Commit tidied file
    doc.SaveFile(file);
One particular file cannot be tidied: for it, err_code comes back as 2 instead of 1. For the other files, which are tidied successfully, the result is 1. I cannot understand why this happens with this file. Most likely I am making a mistake in how I tidy a file that contains a lot of errors.