I want to truncate some text (loaded from a database or text file), but it contains HTML, so the tags are counted towards the length and less visible text is returned. This can also leave tags unclosed or cut off partway through (so Tidy may not work properly, and there is still less content). How can I truncate based on the visible text only (probably stopping when a table is reached, as that could cause more complex issues)?
substr("Hello, my <strong>name</strong> is <em>Sam</em>. I&acute;m a web developer.",0,26)."..."
Would result in:
Hello, my <strong>name</st...
What I would want is:
Hello, my <strong>name</strong> is <em>Sam</em>. I&acute;m...
How can I do this?
While my question is about how to do this in PHP, it would also be good to know how to do it in C#. Either is fine, as I should be able to port the method over (unless it is a built-in method).
Also note that I have included an HTML entity, &acute;, which would have to be counted as a single character (rather than 7 characters, as in this example).
strip_tags is a fallback, but I would lose formatting and links, and it would still have the problem with HTML entities.
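For illustration, here is a minimal sketch of that fallback. It assumes the text is UTF-8 and that the mbstring extension is available; the variable names are made up for the example. The first variant shows the entity being cut in half, and the second decodes entities before cutting so the entity counts as one character.

```php
<?php
$html = "Hello, my <strong>name</strong> is <em>Sam</em>. I&acute;m a web developer.";

// Variant 1: strip the tags, then cut by bytes.
// The markup is lost and the entity still counts as 7 characters,
// so the cut lands in the middle of it.
$naive = substr(strip_tags($html), 0, 26) . "...";
echo $naive, "\n"; // Hello, my name is Sam. I&a...

// Variant 2: strip the tags, decode entities, then cut by character.
// The entity now counts as a single character, but the result is
// plain text and would need re-escaping before being output as HTML.
$plain = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
$decoded = mb_substr($plain, 0, 26, 'UTF-8') . "...";
echo $decoded, "\n"; // Hello, my name is Sam. I´m...
```

Neither variant keeps the <strong> and <em> markup, which is why this is only a fallback and not what I am after.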