Hello,

I have a set of HTML snippets, and I am wondering how I can wrap each one so that the browser interprets it correctly and independently of the others.

I think I should give an example:

x = <a href="stackoverflow.com">Something

y = <b>Else</b>

I print x, then y, and the browser considers y to be part of the link defined in x. How can I force the browser to interpret x independently of y; that is, how can I wrap x and y?

I don't know if it is relevant, but I work in Python.

Thanks!

+1  A: 

Close the anchor tag if you don't want the bold tag to be part of it.

x = <a href="stackoverflow.com">Something</a>

If you don't close the anchor, most browsers will assume that the rest of the document is contained within this tag.


Also, I'd recommend using <strong> instead of <b>, since <b> carries no semantic meaning.

marcgg
A: 

Modern browsers do a good job of "cleaning up" broken or invalid HTML. Still, there are plenty of situations where the browser's interpretation is not what the author intended. Your example is a good one: where should the browser insert the closing </a> tag? The browser has a set of internal rules for deciding this, and in your case they don't give you what you want.

The only way to reliably get a browser to render exactly what you want is to ensure that what you send to the browser is correct! In your case, look at each HTML string independently and add the missing end tags where needed.

(Depending on the complexity of the HTML, there are a few possible approaches to this. You might be able to get away with manually checking each string, or if the HTML is more complex, you might need to use a parser.)
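
For the parser route, one possible sketch uses Python's standard-library html.parser module to track which elements are still open at the end of each snippet and append the missing end tags. The names OpenTagTracker, close_snippet and VOID_ELEMENTS are made up for this illustration, and ignoring stray end tags is an assumption of the sketch, not what a browser would do:

```python
from html.parser import HTMLParser

# Void elements never take an end tag in HTML, so they are not tracked.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class OpenTagTracker(HTMLParser):
    """Records which elements are still open at the end of a snippet."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close everything up to the innermost matching element;
        # end tags that match nothing are ignored (an assumption).
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def close_snippet(snippet):
    """Return snippet with end tags appended for elements left open."""
    tracker = OpenTagTracker()
    tracker.feed(snippet)
    tracker.close()
    closing = "".join("</%s>" % name for name in reversed(tracker.open_tags))
    return snippet + closing
```

With the strings from the question, close_snippet(x) would append the missing </a>, while y comes back unchanged.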

Richard
A: 

You must find all the tags in the HTML snippets and make sure that they are closed properly.

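A simple solution is to use this regexp: r'<[^>]+>' and a loop along these lines (Python):

import re

def close_open_tags(html):
    stack = []  # names of the elements that are currently open
    for match in re.finditer(r'<[^>]+>', html):
        tag = match.group(0)
        if tag.endswith('/>'):
            continue  # self-closed element, nothing to track
        name = re.match(r'</?\s*([^\s/>]*)', tag).group(1).lower()
        if tag.startswith('</'):
            # Pop the innermost open element; its name must match this end tag
            if not stack or stack.pop() != name:
                raise ValueError('unexpected end tag: %s' % tag)
        else:
            stack.append(name)
    # Close whatever is still open, innermost element first
    return html + ''.join('</%s>' % name for name in reversed(stack))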
Aaron Digulla
Hmm - that's going to print out a lot of void elements that are perfectly valid html.
Alohci
@Alohci: What is a "void element"? Do you mean elements without children? Those are handled correctly. If you mean "<p>" elements without an end tag, then those should be closed as well, but that's beyond the scope of my example. Use an HTML tidy tool for that.
Aaron Digulla
Void elements (http://dev.w3.org/html5/spec/syntax.html#void-elements) are those that have an empty content model. E.g. meta, br, input etc. Although it's valid to use "/>" to end them in HTML5, the more usual usage is to simply end them with ">". It would be a lot of wasted effort to convert them to "/>" just to comply with your algorithm.
Alohci
@Alohci: "wasted effort"? I think it would make checking the source much easier if you used XHTML syntax.
Aaron Digulla
@Aaron - Well, personally I like XHTML syntax. But there are problems when serving it as text/html. `<script src="example.js" />` is the best known and it wouldn't get picked up as a problem by your algorithm. In any case the question was about html, not xhtml, and you can't just mix and match. For example, in HTML4.01 transitional, `<meta name="key1" content="val1"><meta name="key2" content="val2">` is valid, but `<meta name="key1" content="val1"/><meta name="key2" content="val2"/>` will fail validation and there's no way to fix it, other than removing the '/' characters.
Alohci