I am parsing HTML files with Html Agility Pack, but some of the files are badly written and cannot be parsed. To clean those files up first, I am using the HTML Tidy library. However, it fails to tidy some of them, even though the same files are tidied correctly when I run them through Notepad++.

I am using the same HTML Tidy library that Notepad++ uses, so I cannot understand what I am doing wrong with HTML Tidy.

EDIT: For some HTML pages the tidy step works and cleans up the page source. This is the code I use to tidy an HTML page:

        Tidy.Document doc = new Tidy.Document();

        int err_code = doc.ParseFile(file);


        if (err_code < 0)
            throw new Exception("Unable to parse file: " +
            file);


        doc.SetOptValue(TidyATL.TidyOptionId.TidyReplaceColor, "yes");
        doc.SetOptValue(TidyATL.TidyOptionId.TidyWrapSection, "0");
        doc.SetOptValue(TidyATL.TidyOptionId.TidyWrapAsp, "no");
        doc.SetOptValue(TidyATL.TidyOptionId.TidyWrapJste, "no");
        doc.SetOptValue(TidyATL.TidyOptionId.TidyWrapPhp, "no");
        doc.SetOptValue(TidyATL.TidyOptionId.TidyPreTags, "yes");

        // Clean and repair the parsed document
        err_code = doc.CleanAndRepair();

        if (err_code < 0)
            throw new Exception(
               "Unable to clean/repair file: " + file);

        err_code = doc.RunDiagnostics();

        if (err_code < 0)
            throw new Exception(
               "Unable to run diagnostics on file: " + file);

        // Commit tidied file
        doc.SaveFile(file);

One particular file is not tidied. For it, err_code comes back as 2 instead of 1, whereas for the other files, which are tidied successfully, the result is 1. I cannot understand why this happens for this file. Most likely I am making a mistake in how I tidy a file that contains a lot of errors.
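
For reference, the underlying HTML Tidy C library documents its status codes as 0 for success, 1 for warnings, 2 for errors, and a negative value for a severe error. Assuming the wrapper passes those codes through unchanged (which I have not verified), a result of 2 would mean that Tidy reported errors rather than only warnings, not that the call itself failed. Below is a minimal sketch for making the code readable when logging it. `TidyStatus` and `Describe` are hypothetical helper names of my own, not part of the library:

```csharp
using System;

// Hypothetical helper, NOT part of TidyATL or HTML Tidy.
// Assumption: the wrapper returns libtidy's status codes unchanged
// (0 = success, 1 = warnings, 2 = errors, negative = severe error).
public static class TidyStatus
{
    public static string Describe(int code)
    {
        if (code < 0)
            return "severe error";

        switch (code)
        {
            case 0: return "success";
            case 1: return "warnings";
            case 2: return "errors";
            default: return "unknown status " + code;
        }
    }
}
```

With this in place, calling `Console.WriteLine(TidyStatus.Describe(err_code));` after each step would show which call first returns 2 for the problem file.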