ansaurus

Question

invalid HTML rendering logic

Answer 1

A:

Not sure what you mean exactly, but maybe the PHP function htmlentities could help you.
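
For reference, a minimal sketch of what blanket entity-escaping does. It uses Python's `html.escape` as a rough stand-in for PHP's `htmlentities` (the choice of Python and the name `escape_all` are assumptions for illustration). Note that it escapes every `<`, including the ones that open real tags:

```python
import html

def escape_all(markup: str) -> str:
    # Escape &, < and > everywhere; quote=False leaves " and ' untouched.
    return html.escape(markup, quote=False)

print(escape_all("<b>x < y</b>"))  # prints: &lt;b&gt;x &lt; y&lt;/b&gt;
```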

aletzo 2010-08-04 17:55:22

No... see my response to @Mike Caron's comment

JoelFan 2010-08-04 17:57:44

Answer 2

+3 A:

Try looking at the source code for Tidy.

HTML before running through Tidy:

<html>

 <head>
  <title>boo</title>
 </head>

 <body>
   x < y
 </body>

</html>

Same HTML after running through Tidy:

<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">

  <title>boo</title>
</head>

<body>
  x &lt; y
</body>
</html>

Notice that x < y was changed to x &lt; y.

UPDATE

Based on your comment, you should probably use Tidy to clean up your HTML. I believe there are Tidy libraries for most common languages. If you are using PHP, there is PHP Tidy.

UPDATE

I noticed that you said you're using C#. You can use Tidy with C# as well. Here's something I found. I don't develop in C# and I haven't tried this out so YMMV:

Fix Up Your HTML with HTML Tidy and .NET

Vivin Paliath 2010-08-04 17:57:26

Answer 3

A:

Rendering of invalid HTML in browsers is horrible guesswork, and you really shouldn't try to emulate it (it will break). However, replacing some occurrences could be done with a regexp:

$data = preg_replace('/(\s)<(\s)/', '$1&lt;$2', $data);
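
A rough Python equivalent of the same substitution, for readers not on PHP (the function name `escape_spaced_lt` is made up for illustration). It only touches a `<` that has whitespace on both sides:

```python
import re

def escape_spaced_lt(data: str) -> str:
    # Replace "<" only when a whitespace character sits on each side of it.
    return re.sub(r'(\s)<(\s)', r'\1&lt;\2', data)

print(escape_spaced_lt("x < y"))      # x &lt; y
print(escape_spaced_lt("x<y"))        # x<y  (no surrounding whitespace, left alone)
print(escape_spaced_lt("a < body>"))  # a &lt; body>
```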

You 2010-08-04 18:00:14

This will change ` < body>` to ` &lt; body>`. Undesirable.

Vivin Paliath 2010-08-04 18:01:24

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Chuck 2010-08-04 18:29:58

@Vivin: It is. It relies to a certain extent on users formatting their input properly, but it's fairly good. @Chuck: We're not actually parsing HTML here, but yeah.

You 2010-08-04 19:43:10

@You I tend to be more paranoid :)

Vivin Paliath 2010-08-04 20:02:03

Answer 4

A:

Edit: I am assuming you're using PHP, since you didn't specify

Use strip_tags:

$content = strip_tags($content, '<b><i>');

This will leave safe tags (as defined by you), and remove everything else.
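
A rough sketch of the same allow-list idea in Python, for readers not on PHP. It is regex-based and only loosely mirrors `strip_tags`; the name `strip_unlisted_tags` and the pattern are assumptions for illustration, not PHP's actual behaviour. A `<` that is not followed by a tag name is left as text:

```python
import re

# Matches an opening or closing tag whose name starts with a letter.
TAG = re.compile(r'</?([A-Za-z][A-Za-z0-9]*)[^<>]*>')

def strip_unlisted_tags(content: str, allowed=frozenset({'b', 'i'})) -> str:
    def keep_or_drop(match):
        # Keep the whole tag if its name is on the allow-list, else drop it.
        return match.group(0) if match.group(1).lower() in allowed else ''
    return TAG.sub(keep_or_drop, content)

print(strip_unlisted_tags('<p><b>x</b> < <i>y</i></p>'))  # <b>x</b> < <i>y</i>
```

This is fine for a quick cleanup, but a regex is not an HTML parser, so treat it as a starting point rather than a sanitizer.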

Mike Caron 2010-08-04 18:02:19

That's … a big assumption

David Dorward 2010-08-04 18:11:37

I'm not using PHP, but I'm using something similar to strip_tags in C#. The problem is that my "strip_tags" thinks that "x < y" contains an unknown (and unterminated) tag called "y" and it "strips" it, leaving just "x"

JoelFan 2010-08-04 18:18:39

@David It's the most common web development language. And, everyone else assumed that too. The onus is on the OP to specify, right?

Mike Caron 2010-08-04 20:02:01

@Joel Ah, in that case, I'd go with someone else's answer. Vivin's is the only one with a C# answer, so... yeah.

Mike Caron 2010-08-04 20:03:23

@David, PHP is the most common language. OP should specify or at least tag his question, otherwise you need to make these assumptions.

You 2010-08-04 20:24:46

Answer 5

A:

The HTML 5 (draft) specification includes a detailed parsing algorithm based on how browsers handle bad markup.
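
One rule from that algorithm is relevant here: in the tokenizer, a `<` only starts markup when the next character is an ASCII letter, `/`, `!` or `?`; otherwise it is emitted as a literal character. A minimal Python sketch of just that one rule, assuming a regex is good enough for the input at hand (it is not a parser, and `escape_stray_lt` is a hypothetical name):

```python
import re

# "<" followed by a letter, "/", "!" or "?" is treated as markup;
# any other "<" (including one at the very end of the input) is text.
STRAY_LT = re.compile(r'<(?![A-Za-z/!?])')

def escape_stray_lt(markup: str) -> str:
    return STRAY_LT.sub('&lt;', markup)

print(escape_stray_lt("<b>x < y</b>"))  # <b>x &lt; y</b>
print(escape_stray_lt("1 <2"))          # 1 &lt;2
```

Input such as `x <y` still looks like a start tag under this rule, which is also how the specification's tokenizer reads it.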

David Dorward 2010-08-04 18:09:17
