When we fetch user-entered content (possibly after some editing) from a database or a similar source, we may retrieve a fragment that contains an opening tag but no matching closing tag.

This can break the page's layout.

Is there a client-side or server-side way of fixing this?

+3  A: 

You can use Tidy:

Tidy is a binding for the Tidy HTML clean and repair utility which allows you to not only clean and otherwise manipulate HTML documents, but also traverse the document tree.

or HTML Purifier:

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.

Gordon
Do you know of any JavaScript methods to do the same?
Starx
@Starx Nope, sorry.
Gordon
A: 

I have a solution for PHP:

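<?php
    // Close opened HTML tags.
    // Note: this only appends missing closing tags; it cannot repair bad nesting.
    function closetags ( $html )
    {
        # elements that never take a closing tag
        $voidtags = array ( "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "param", "source", "track", "wbr" );
        # put all opened tags into an array, skipping self-closed ones such as <br />
        preg_match_all ( "#<([a-z][a-z0-9]*)\b[^>]*(?<!/)>#i", $html, $result );
        $openedtags = array_diff ( array_map ( "strtolower", $result[1] ), $voidtags );
        # put all closed tags into an array
        preg_match_all ( "#</([a-z][a-z0-9]*)\s*>#i", $html, $result );
        $closedtags = array_map ( "strtolower", $result[1] );
        $len_opened = count ( $openedtags );
        # all tags are closed
        if ( count ( $closedtags ) == $len_opened )
        {
            return $html;
        }
        # reverse (and reindex) so the innermost tag is closed first
        $openedtags = array_reverse ( $openedtags );
        # close tags
        for ( $i = 0; $i < $len_opened; $i++ )
        {
            $pos = array_search ( $openedtags[$i], $closedtags );
            if ( $pos === false )
            {
                $html .= "</" . $openedtags[$i] . ">";
            }
            else
            {
                unset ( $closedtags[$pos] );
            }
        }
        return $html;
    }
?>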

You can use this function like this:

   <?php echo closetags("your content <p>test test"); ?>
kamal
I like that function. One problem I see is that it can't fix broken nesting (e.g. `"<b>Bold and <i>Italic</b> text"`), which some users are so skilled at doing.
Andrew
Regex ain't the right tool for complicated HTML parsing
Gordon
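
For the broken-nesting case raised in the comments, and since a JavaScript approach was asked about, here is a minimal sketch of a stack-based variant. It is not from any of the answers above: `closeTags` and `VOID_TAGS` are hypothetical names, and it still tokenizes with a regex, so it assumes simple fragments (no `>` inside attribute values, no comments or scripts). A real parser such as Tidy remains the safer choice.

```javascript
// Elements that never take a closing tag.
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'param', 'source', 'track', 'wbr']);

// Hypothetical helper: appends missing closing tags and repairs simple mis-nesting.
function closeTags(html) {
  const tagPattern = /<(\/?)([a-z][a-z0-9]*)\b[^>]*?(\/?)>/gi;
  const open = [];  // stack of currently open tag names
  let out = '';
  let last = 0;
  let m;
  while ((m = tagPattern.exec(html)) !== null) {
    out += html.slice(last, m.index);  // copy the text before this tag
    last = tagPattern.lastIndex;
    const name = m[2].toLowerCase();
    const isClosing = m[1] === '/';
    const selfClosed = m[3] === '/' || VOID_TAGS.has(name);
    if (!isClosing) {
      out += m[0];
      if (!selfClosed) open.push(name);
    } else if (open.includes(name)) {
      // close anything that was opened after the matching tag first
      while (open[open.length - 1] !== name) out += '</' + open.pop() + '>';
      open.pop();
      out += m[0];
    }
    // a closing tag with no matching opener is dropped
  }
  out += html.slice(last);
  // close whatever is still open, innermost first
  while (open.length) out += '</' + open.pop() + '>';
  return out;
}

console.log(closeTags('<b>Bold and <i>Italic</b> text'));
// prints: <b>Bold and <i>Italic</i></b> text
```

Unlike a browser, this sketch does not reopen the `<i>` after the `</b>`, so the trailing text loses its italics; it only guarantees that the tags come out balanced.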
+2  A: 

In addition to server-side tools like Tidy, you can also use the user's browser to do some of the cleanup for you. One of the really great things about innerHTML is that it will apply the same on-the-fly repair to dynamic content as it does to HTML pages. This code works pretty well (with two caveats) and nothing actually gets written to the page:

var divTemp = document.createElement('div');
divTemp.innerHTML = '<p id="myPara">these <i>tags aren\'t <strong> closed';
console.log(divTemp.innerHTML); 

The caveats:

  1. Different browsers return different strings. This isn't so bad, except in the case of IE, which returns capitalized tags and strips the quotes from tag attributes, so the result will not pass validation. The solution here is to do some simple clean-up on the server side. But at least the markup will be properly nested.

  2. I suspect that you may have to put in a delay before reading the innerHTML -- give the browser a chance to digest the string -- or you risk getting back exactly what was put in. I just tried on IE8 and it looks like the string gets parsed immediately, but I'm not so sure on IE6. It would probably be best to read the innerHTML after a delay (or throw it into a setTimeout() to force it to the end of the queue).
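
As a rough illustration of the clean-up mentioned in the first caveat, the sketch below lowercases tag names in the string the browser hands back. The function names `lowercaseTagNames` and `repairWithBrowser` are made up for this example, and re-quoting stripped attribute values is deliberately not attempted here; the same string replacement could just as well be done server-side.

```javascript
// Lowercase tag names in a serialized HTML string; attributes and text are left alone.
function lowercaseTagNames(html) {
  return html.replace(/<(\/?)([a-z][a-z0-9]*)/gi,
    (match, slash, name) => '<' + slash + name.toLowerCase());
}

// Hypothetical wrapper: let the browser repair the markup, then normalize tag case.
// `doc` is assumed to be a browser `document`; nothing is attached to the page.
function repairWithBrowser(html, doc) {
  const divTemp = doc.createElement('div');
  divTemp.innerHTML = html;
  return lowercaseTagNames(divTemp.innerHTML);
}
```

In a page this would be called as `repairWithBrowser('<p>these <i>tags', document)`; the exact string returned depends on the browser's serializer.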

I would recommend you take @Gordon's advice and use Tidy if you have access to it (it takes less work to implement) and failing that, use innerHTML and write your own tidy function in PHP.

And though this isn't part of your question, as this is for a CMS, consider also using the YUI 2 Rich Text Editor for stuff like this. It's fairly easy to implement, somewhat easy to customize, the interface is very familiar to most users, and it spits out perfectly valid code. There are several other off-the-shelf rich text editors out there, but YUI has the best license and is the most powerful I've seen.

Andrew