I am building an HTML GUI builder, which involves round-tripping HTML pages from the browser to the server and back again.

On the back-end I have an xml parser which expects well-formed tags.

I kick off by writing well-formed HTML, for example:

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15" />
  <link rel="stylesheet" type="text/css" href="/some/path/to/some.css" />
</head>

The browser decides it knows best and turns this into:

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15">
  <link rel="stylesheet" type="text/css" href="/some/path/to/some.css">
</head>

The second plan was to force in separate closing tags:

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"></meta>
  <link rel="stylesheet" type="text/css" href="/some/path/to/some.css"></link>
</head>

That doesn't work either.

The initial plan was just to snip out copies of parts of the document and cycle them back to the server with the new page. It now seems my only option is to go through all the tags by hand (there are more than in this example) and fix them up before I round-trip them.

Am I missing something? How do I get the browser to give me back well-formed HTML?

A: 

The basic problem is that HTML has no concept of well-formedness.

I'm assuming you are reading the HTML with innerHTML (which gets a serialization of the DOM), and are experiencing problems with Internet Explorer.

IE doesn't support XHTML and its internal representation is HTML-based. What is well-formed in XHTML is an error in HTML, so the results are not entirely unexpected.

I suggest preprocessing the data on the server (with tidylib for example) to convert it from HTML to XHTML.
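
To give a feel for the kind of fix-up such preprocessing performs, here is a minimal sketch in JavaScript. It is not tidylib and does far less: it only rewrites unclosed void elements (meta, link, br and so on) into self-closing form. The function name selfCloseVoidTags is made up for this example, and the regex assumes attribute values are quoted and that no void-looking tags sit inside comments or scripts.

```javascript
// Elements that HTML defines as having no content and no end tag.
const VOID_ELEMENTS = [
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "param", "source", "track", "wbr"
];

// Hypothetical helper: turn <meta ...> into <meta ... /> so an XML parser
// accepts it. Assumes quoted attribute values; not a general HTML parser.
function selfCloseVoidTags(html) {
  const pattern = new RegExp(
    "<(" + VOID_ELEMENTS.join("|") + ")\\b" +
    "((?:\"[^\"]*\"|'[^']*'|[^'\">])*)>",
    "gi"
  );
  return html.replace(pattern, function (match, name, attrs) {
    const trimmed = attrs.replace(/\s+$/, "");
    // Leave tags that are already self-closed untouched.
    if (trimmed.endsWith("/")) {
      return match;
    }
    return "<" + name + trimmed + " />";
  });
}
```

Applied to the browser's version of the head shown in the question, this should give back the meta and link tags with a trailing " />". Anything beyond void elements (unquoted attributes, unescaped ampersands, missing end tags) still needs a real tidy pass.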

David Dorward
Firefox 3.5. I am processing it server-side with xmerl in Erlang (which is why I need to send it back well-formed).
Gordon Guthrie
+3  A: 

This is not well-formed HTML; it's XML or XHTML:

<link rel="stylesheet" type="text/css" href="/some/path/to/some.css" />

The confusion is explained here: http://www.cs.tut.fi/~jkorpela/html/empty.html

innerHTML is exactly that - HTML. You may be able to produce XML from the DOM - try here as a start: http://www.devarticles.com/c/a/JavaScript/More-on-JavaScript-and-XML/
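
As a rough sketch of what "producing XML from the DOM" can look like, the function below walks a node and writes out every element with an explicit close, instead of relying on innerHTML. The names toXml and escapeXml are invented for this example, and it only handles element and text nodes (comments and other node types are dropped). Browsers such as Firefox also ship a built-in XMLSerializer with a serializeToString(node) method, which may be all you need.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: serialize a DOM node as well-formed XML.
// Only element (nodeType 1) and text (nodeType 3) nodes are emitted.
function toXml(node) {
  if (node.nodeType === 3) {
    return escapeXml(node.nodeValue);
  }
  if (node.nodeType !== 1) {
    return ""; // comments, processing instructions etc. are skipped
  }
  // HTML documents report upper-case names, so normalize to lower case.
  const name = node.nodeName.toLowerCase();
  let attrs = "";
  for (let i = 0; i < node.attributes.length; i++) {
    const attr = node.attributes[i];
    attrs += " " + attr.name + '="' + escapeXml(attr.value) + '"';
  }
  let children = "";
  for (let i = 0; i < node.childNodes.length; i++) {
    children += toXml(node.childNodes[i]);
  }
  // Empty elements are written in self-closing form.
  if (children === "") {
    return "<" + name + attrs + " />";
  }
  return "<" + name + attrs + ">" + children + "</" + name + ">";
}
```

In the page you would call it on whichever part of the document you want to send back, for example toXml(document.getElementsByTagName("head")[0]).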

Andrew Duffy
This fixed it for me...
Gordon Guthrie