How to insert closing HTML tags where they are missing?

For example:

 <tr>
 <td>Index No.</td><td>Name</td>

 <tr>
 <td>1</td><td>Harikrishna</td>

Here two closing tags are missing, namely "</tr>". In a case like this, how can I find where the closing tags are missing and insert the appropriate closing tag, such as "</tr>", at those places?

+2  A: 

This seems like a very tough task if you want to handle all possible cases, because HTML is not a regular language. IMHO you should try to solve the problem at the source, that is, find out how you ended up with invalid HTML in the first place.
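
Because HTML is not a regular language, a tokenizer-based repair tends to be easier to reason about than pattern matching. The following is only a minimal sketch in Python, assuming the standard library's html.parser module and assuming that only flat (non-nested) tables with missing </tr> tags need fixing. The names RowCloser and close_rows_with_parser are made up for this illustration.

```python
from html.parser import HTMLParser


class RowCloser(HTMLParser):
    """Re-emit the input, adding </tr> wherever a row is left open.

    Assumption: tables are not nested. A <tr> inside an inner table
    would be treated as the start of the next row of the outer table.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []          # pieces of the repaired document
        self.row_open = False  # True while inside an unclosed <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            if self.row_open:  # the previous row was never closed
                self.out.append("</tr>")
            self.row_open = True
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "tr":
            self.row_open = False
        elif tag == "table" and self.row_open:
            self.out.append("</tr>")  # last row of the table was open
            self.row_open = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def close_rows_with_parser(html):
    """Return html with a </tr> added for every unclosed <tr>."""
    parser = RowCloser()
    parser.feed(html)
    parser.close()
    if parser.row_open:  # the input ended while a row was still open
        parser.out.append("</tr>")
    return "".join(parser.out)
```

Calling close_rows_with_parser on the fragment from the question should return the same markup with a </tr> before the second <tr> and another at the end. Other kinds of broken markup (unclosed <td>, nested tables, mismatched tags) are outside what this sketch attempts.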

Darin Dimitrov
@Darin Dimitrov, what is IMHO?
Harikrishna
@Harikrishna, In My Humble Opinion.
Darin Dimitrov
@Darin Dimitrov, I am using Html Agility Pack to parse HTML content. If an HTML file has a missing tag, parsing does not work correctly. Can Html Agility Pack help insert a closing tag where one is missing?
Harikrishna
+1  A: 

You might take a look at HTML Tidy and see if it works for what you need.

Amber
@Dav, I want to write code that inserts the closing tag wherever one is missing.
Harikrishna
+1, this is good advice. It is the only hope you have to get info out of a malformed HTML doc. Regenerate the HTML from what you get out of Tidy. It is never going to be 100% reliable.
Hans Passant
@nobugz, is HTML Tidy a tool, or source code with which we can regenerate the HTML source? If it is a tool, what code can we write to insert the closing tags where they are missing?
Harikrishna
+1  A: 

I cannot comment on the above, so I'll note it here. You can also use HTML Tidy to clean HTML fragments. See examples here:
http://www.php.net/manual/en/tidy.examples.basic.php

An alternative to HTML Tidy is to clean your output with regular expressions; I provide an example below. Please note, however, that even though this might be faster in terms of processing, it is neither as universal nor as robust (maintenance-wise) as HTML Tidy.

Code

<?php

$html = "
<table>
<tr class=\"lorem\">
<td>Index No.</td>
<td>Name</td>

<tr>
<td>0</td>
<td>FooBaz</td>

<tr>
<td>1</td>
<td>Harikrishna</td>

<tr class=\"ipsum\">
<td>2</td>
<td>Foo</td>
</tr>

<tr>
<td>3</td>
<td>Bar</td>


</table>
";

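// building blocks for the regex:
// an opening <tr> tag, with or without attributes
$start_cond = "<tr(?:\s[^>]*)?>";
// a row ends where the next row starts or where the table ends
$end_cond = "(?:{$start_cond}|<\/table>)";
// row contents: any characters up to, but not including, the end condition
$row_contents = "(?:(?!{$end_cond}).)*";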

// first remove all </tr> tags
$xhtml = preg_replace( "/<\/tr>/ism", "", $html );

// now re-add </tr> tags where appropriate
$xhtml = preg_replace( "/({$start_cond})({$row_contents})/ism", "$1$2</tr>\n", $xhtml );

// ignore: just for writing comparison output
echo "<h2>Before:</h2>"; show_count( $html );
echo "<h2>After</h2>"; show_count( $xhtml );

function cmp($patt,$html) {
    $count = preg_match_all( "/{$patt}/ism", $html, $matches);
    return htmlentities("\n{$count} x {$patt}");
}
function show_count($html) {
    echo "<pre>"
        . cmp("<tr(\s[^>]*)?>",$html)
        . cmp("<\/tr>",$html)
        . "</pre>";
}
?>

Output


Before:
5 x <tr(\s[^>]*)?>
1 x <\/tr>

After
5 x <tr(\s[^>]*)?>
5 x <\/tr>
MicE
@MicE, do you know how to do this in C#?
Harikrishna
I'm sorry, but I'm afraid I don't. The example above is in PHP; however, the regular expressions and the logic should be mostly the same regardless of the language, provided that the language supports the commonly used PCRE syntax (PCRE = Perl Compatible Regular Expressions).
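
To illustrate how little changes between languages, here is a rough port of the same two-step idea to Python's re module (not C#, and only a sketch). The function name close_rows_with_regex is made up for this illustration, and the same caveats as for the PHP version apply: it assumes flat tables and handles only missing </tr> tags.

```python
import re

# Same building blocks as in the PHP example.
START = r"<tr(?:\s[^>]*)?>"            # an opening <tr> tag, with or without attributes
END = "(?:" + START + "|</table>)"     # the next row or the end of the table
ROW = "(?:(?!" + END + ").)*"          # everything up to, but not including, END

# Equivalent of PHP's /is modifiers: case-insensitive, dot matches newlines.
FLAGS = re.IGNORECASE | re.DOTALL


def close_rows_with_regex(html):
    """Remove every </tr>, then re-add one after each row's contents."""
    stripped = re.sub(r"</tr>", "", html, flags=FLAGS)
    return re.sub("(" + START + ")(" + ROW + ")", r"\1\2</tr>\n",
                  stripped, flags=FLAGS)
```

The pattern strings are identical to the PHP ones apart from the unescaped slash in </table>, since Python patterns are not wrapped in / delimiters. A C# version would presumably follow the same shape with Regex.Replace, but I have not tried it.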
MicE