ansaurus

Question

Answer 1

A:

Close the anchor tag if you don't want the bold tag to be part of it.

x = <a href="stackoverflow.com">Something</a>

If you don't close the anchor, most browsers will assume that the rest of the document is contained within this tag.

Also, I'd recommend using <strong> instead of <b>, since <b> carries no semantic meaning.

marcgg 2010-10-07 12:50:29

Answer 2

A:

Modern browsers do a good job of "cleaning up" broken or invalid HTML. There are many situations, though, where what the author intends is not what the browser interprets. Your example is a good one: where should the browser insert the closing </a> tag? The browser has a set of internal rules for deciding this, and in your case they don't give you what you want.

The only way to reliably get a browser to render exactly what you want is to ensure that what you send to the browser is correct. In this case, look at each of your HTML strings independently and add missing end tags where needed.

(Depending on the complexity of the HTML, there are a number of possible approaches. You might get away with checking each string manually, or, if the HTML is more complex, you might need a parser.)
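
As a rough illustration of the parser route, here is a minimal sketch in Python using the standard library's html.parser module. The names OpenTagTracker, close_open_tags and VOID_ELEMENTS are made up for this example, and the sketch assumes each HTML string is a small fragment; it does not reproduce the error-recovery rules a browser applies.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (taken from the HTML specification's
# list of void elements; verify against the current spec before relying on it).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class OpenTagTracker(HTMLParser):
    """Hypothetical helper: records which elements are still open."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        # Void elements such as <br> are complete on their own.
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def close_open_tags(snippet):
    """Return the snippet with end tags appended for anything left open."""
    tracker = OpenTagTracker()
    tracker.feed(snippet)
    tracker.close()
    # Innermost element first, so the appended end tags nest correctly.
    closing = "".join("</%s>" % tag for tag in reversed(tracker.open_tags))
    return snippet + closing


print(close_open_tags('<a href="http://stackoverflow.com">Something'))
```

Running each string through a helper like this before concatenating them keeps an unclosed <a> in one string from swallowing the markup that follows it.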

Richard 2010-10-07 12:59:40

Answer 3

A:

You must find all the tags in the HTML snippets and make sure that they are closed properly.

A simple solution is to use the regexp <[^>]+> to find each tag, together with this pseudocode:

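stack = empty list
for each match of the regexp:
    if match ends with '/>':
        continue    # self-closing tag, nothing to track
    if match starts with '</':
        pop an element from the stack and make sure its name matches the name in the match
    else:
        push the element name on the stack

# whatever is left on the stack was never closed; close the innermost first
while stack is not empty:
    print '</%s>' % stack.pop()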
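
For reference, the steps above could be sketched in Python roughly as follows. The function name missing_end_tags is invented for this example, and raising ValueError on a mismatched end tag is an assumption; the pseudocode only says to check that the names match.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                     # any tag-like token
NAME_RE = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)")  # element name inside a tag


def missing_end_tags(snippet):
    """Return the end tags needed to close everything left open in snippet."""
    stack = []
    for match in TAG_RE.finditer(snippet):
        tag = match.group(0)
        if tag.endswith("/>"):
            continue  # self-closing tag, nothing to track
        name_match = NAME_RE.match(tag)
        if name_match is None:
            continue  # comments, doctype and similar tokens
        name = name_match.group(1).lower()
        if tag.startswith("</"):
            # An end tag must match the most recently opened element.
            if not stack or stack[-1] != name:
                raise ValueError("unexpected end tag: %s" % tag)
            stack.pop()
        else:
            stack.append(name)
    # Close the innermost element first.
    return "".join("</%s>" % name for name in reversed(stack))


print(missing_end_tags('<a href="http://stackoverflow.com">Something'))
```

Like the pseudocode, this sketch treats every tag that does not end in /> as one that needs an end tag, so an input such as '<p>line<br>' yields '</br></p>'.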

Aaron Digulla 2010-10-07 13:00:16

Hmm - that's going to print out a lot of void elements that are perfectly valid html.

Alohci 2010-10-07 13:34:14

@Alohci: What is a "void element"? Do you mean elements without children? Those are handled correctly. If you mean "<p>" elements without an end tag, then those should be closed as well, but that's beyond the scope of my example. Use an HTML tidy tool for that.

Aaron Digulla 2010-10-07 15:10:31

Void elements (http://dev.w3.org/html5/spec/syntax.html#void-elements) are those that have an empty content model, e.g. meta, br, input, etc. Although it's valid to use "/>" to end them in HTML5, the more common practice is to simply end them with ">". It would be a lot of wasted effort to convert them to "/>" just to comply with your algorithm.

Alohci 2010-10-07 15:48:19

@Alohci: "wasted effort"? I think it would make checking the source much easier if you used XHTML syntax.

Aaron Digulla 2010-10-08 08:41:47

@Aaron - Well, personally I like XHTML syntax. But there are problems when serving it as text/html. `<script src="example.js" />` is the best-known example, and it wouldn't get picked up as a problem by your algorithm. In any case the question was about HTML, not XHTML, and you can't just mix and match. For example, in HTML 4.01 Transitional, `<meta name="key1" content="val1"><meta name="key2" content="val2">` is valid, but `<meta name="key1" content="val1"/><meta name="key2" content="val2"/>` will fail validation, and there's no way to fix it other than removing the '/' characters.

Alohci 2010-10-08 09:14:20


Working with invalid HTML tags
