I am using a rich text box control to post some data on one page, and I am saving the data to my DB table with the HTML markup, e.g.: This is <b>my bold</b> text

I am displaying the first 50 characters of this column on another page. If I save a sentence of more than 50 characters with the bold tag applied, then when I trim it on the other page (to take the first 50 characters) I lose the closing </b> tag, so the bold is applied to the rest of the content on that page.

How can I solve this? How can I check which open tags are not closed? Is there an easy way to do this in PHP? Is there any function that removes all the HTML tags/markup and gives me the sentence as plain text?

+3  A: 

http://php.net/strip_tags

The strip_tags function removes all the HTML tags from a string.
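For the 50-character preview in the question, a minimal sketch (plain_preview is a hypothetical helper name; UTF-8 text and the mbstring extension are assumed) strips the tags before cutting, so no closing tag can be lost. Entities such as &amp; are not handled here.

```php
<?php
// Hypothetical helper: strip the markup first, then cut, so no tag can be
// split and no closing tag can go missing. mb_substr (mbstring extension)
// counts characters rather than bytes; UTF-8 input is assumed.
function plain_preview(string $html, int $limit = 50): string
{
    return mb_substr(strip_tags($html), 0, $limit, 'UTF-8');
}

echo plain_preview('This is <b>my bold</b> text', 12); // This is my b
```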

scragar
+2  A: 

Yes

$textWithoutTags = strip_tags($html);
alex
+1  A: 

I generally use HTML::Truncate for this. Of course, being a Perl module, you won't be able to use it directly in your PHP - but the source code does show a working approach (which is to use an HTML parser).
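A rough PHP sketch of the same parser-based idea, not a port of HTML::Truncate's code: it assumes PHP's bundled DOM extension (libxml2) and mbstring, and a UTF-8 fragment as input. It parses the fragment, keeps the first $limit characters of visible text, and drops every node after that point; because the result is serialized back from the DOM tree, every open tag gets its closing tag. The names truncate_html and truncate_dom_node are hypothetical.

```php
<?php
// Walk $node's children in document order, shortening text and removing
// nodes once the character budget in $remaining is used up.
function truncate_dom_node(DOMNode $node, int &$remaining): void
{
    // Copy the child list first, because children may be removed below.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child); // past the limit: drop the rest
        } elseif ($child instanceof DOMText) {
            $len = mb_strlen($child->data, 'UTF-8');
            if ($len > $remaining) {
                $child->data = mb_substr($child->data, 0, $remaining, 'UTF-8');
                $len = $remaining;
            }
            $remaining -= $len;
        } elseif ($child->hasChildNodes()) {
            truncate_dom_node($child, $remaining);
        }
    }
}

// Hypothetical helper: keep the first $limit characters of visible text
// while preserving (and properly closing) the markup around them.
function truncate_html(string $html, int $limit): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup quietly
    // The meta tag tells libxml the input is UTF-8; the wrapper <div> gives
    // a single container to read the fragment back from.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);
    $remaining = $limit;
    truncate_dom_node($root, $remaining);

    // Serialize the wrapper's children (its "inner HTML").
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo truncate_html('This is <b>my bold</b> text', 10); // This is <b>my</b>
```

Entities are decoded by the parser, so &lt; counts as one character of text and is re-encoded on output; an entity can therefore never be cut in half.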

An alternative approach might be to truncate as you are doing at the moment and then try to fix the result using Tidy.
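A sketch of the truncate-then-repair route, assuming PHP's optional tidy extension (ext/tidy) is installed; truncate_then_tidy is a hypothetical helper name. Note that $limit here counts bytes of the stored markup, so fewer visible characters survive, and substr can split a multibyte UTF-8 character.

```php
<?php
// Hypothetical helper sketching "truncate, then repair with Tidy".
// Requires the optional tidy extension (ext/tidy).
// $limit counts bytes of the stored markup, not visible characters.
function truncate_then_tidy(string $html, int $limit): string
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is not installed.');
    }
    $cut = substr($html, 0, $limit);
    // Remove a tag cut in half at the end (e.g. "<b" or "</"), since Tidy
    // has no way to know what it was meant to be.
    $cut = preg_replace('/<[^>]*$/', '', $cut);

    // show-body-only: return the repaired fragment without <html>/<body>;
    // wrap = 0: do not insert line breaks into long lines.
    $fixed = tidy_repair_string($cut, ['show-body-only' => true, 'wrap' => 0], 'utf8');
    if ($fixed === false) {
        throw new RuntimeException('Tidy could not repair the fragment.');
    }
    return trim($fixed);
}
```

For example, a limit of 17 cuts the sample string This is <b>my bold</b> text down to This is <b>my bol, and Tidy should then append the missing </b>.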

David Dorward
A: 

If you want the HTML tags to remain but be closed properly, see the question "PHP: Truncate HTML, ignoring tags". Otherwise, read on:

strip_tags will remove HTML tags, but not HTML entities (such as &amp;), which can still break if the cut falls inside one (e.g. &amp; cut down to &am).

To handle entities as well, one can use html_entity_decode to decode entities after stripping tags, then truncate, and finally re-encode the entities with htmlspecialchars:

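<?php
$text = "1 &lt; 2\n";
print $text;  // sends "1 &lt; 2", which a browser displays as 1 < 2
// Strip tags, decode entities (both quote styles), cut to 3 characters,
// then re-encode so the excerpt is safe to print as HTML.
print htmlspecialchars(substr(html_entity_decode(strip_tags($text), ENT_QUOTES), 0, 3));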

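(Note the use of ENT_QUOTES, which makes html_entity_decode convert single-quote entities such as &#039; as well as double-quote ones; before PHP 8.1 the default flags left single quotes encoded.)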

Result:

1 < 2
1 <

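Footnote: Before PHP 5.4, html_entity_decode defaulted to ISO-8859-1, so the above only worked for entities that can be decoded to ISO-8859-1. If you need support for international characters, you should already be working with UTF-8 encoded strings; pass "UTF-8" as the charset argument to html_entity_decode (and htmlspecialchars), and use mb_substr instead of substr so that a multibyte character is not cut in half.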
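A minimal sketch of the same strip/decode/truncate/re-encode pipeline for UTF-8 text, assuming the mbstring extension is available; plain_excerpt is a hypothetical helper name. mb_substr is used instead of substr so that a multibyte character such as ø is never split.

```php
<?php
// Hypothetical helper: strip tags, decode entities, cut by characters,
// then re-encode, with UTF-8 passed explicitly at every step.
function plain_excerpt(string $html, int $limit): string
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return htmlspecialchars(mb_substr($text, 0, $limit, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}

echo plain_excerpt('Søren &amp; <b>Løvborg</b>', 7), "\n"; // Søren &amp;
```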

Søren Løvborg