Why indeed? Wouldn't something like &br; be more appropriate?
A tag and a character entity reference exist for different reasons: character entities are stand-ins for certain characters (sometimes required as escape sequences, for example &amp; for an ampersand &), while tags are there for structure.
The reason the <br> tag exists is that HTML collapses whitespace. There needs to be a way to specify a hard line break, a place that has to have a line break. This is the function of the <br> tag.
There is no single character that has this meaning. U+2028 LINE SEPARATOR has a similar meaning, but even if it were used it would not help, as it is considered to be whitespace and HTML would collapse it.
See the answers from @John Kugelman and @John Hanna for more detail on this aspect.
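The whitespace collapsing described above can be sketched in a few lines of Python. This is only a simplified model, not how any browser is actually implemented: the helper name `collapse_white_space` is hypothetical, and the character set is limited to space, tab, form feed, carriage return and line feed.

```python
import re

# Simplified model of HTML white space: space, tab, form feed, CR and LF.
# Line breaks are included because HTML treats them as white space.
WHITE_SPACE_RUN = re.compile(r"[ \t\f\r\n]+")


def collapse_white_space(text: str) -> str:
    """Hypothetical helper: replace every run of white space with one space."""
    return WHITE_SPACE_RUN.sub(" ", text).strip()


source = "first line\nsecond line\r\n   third line"
print(collapse_white_space(source))
# prints: first line second line third line
```

The newlines typed into the source vanish into ordinary inter-word spaces, which is why a separate marker for a hard line break is needed.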
Not entirely related, there is another reason why a &br; character entity reference does not exist: a line break is defined in such a way that it could consist of more than one character. See the HTML 4 spec:
A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair.
Character entities are single character escapes, so cannot represent this, again in the HTML 4 spec:
A character entity reference is an SGML construct that references a character of the document character set.
You will see that all the defined character entities map to a single character. A line break/new line cannot be cleanly mapped this way, thus an element is required instead of a character entity reference.
This is why a line break cannot be represented by a character entity reference.
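The "one entity, one character" claim can be checked against Python's standard library, whose `html.entities.name2codepoint` table lists the HTML 4 entity names. This is just an illustration of the point above, not part of the spec itself:

```python
import html
from html.entities import name2codepoint

# Every HTML 4 entity name maps to exactly one integer code point,
# i.e. to a single character.
single_char_only = all(isinstance(cp, int) for cp in name2codepoint.values())
print(single_char_only)

# &amp; stands in for the single character "&".
print(html.unescape("&amp;"))

# There is no entity named "br" in the table.
print("br" in name2codepoint)
```

Because no `br` entity is defined, `html.unescape("&br;")` leaves the text unchanged.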
Regardless, it is not needed, as simply pressing the Enter key inserts a line break.
Yes. An HTML entity would be more appropriate, as a break tag cannot contain text and behaves much like a newline.
That's just not the way things are, though. Too late. I can't tell you the number of non-XML-compatible HTML documents I've had to deal with because of unclosed break tags...
Entities are content, tags are structure or layout (very roughly speaking). It seems whoever made the <br> a tag decided that breaking a line has more to do with structure and layout than with content. Not being able to actually "see" a <br>, I'd tend to agree. Oh, and I'm making this up as I go, so feel free to disagree ;)
br elements can be styled, though. How would you style an HTML entity? Because they're elements, they are more flexible.
In HTML all line breaks are treated as white space:
A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.
And white space only separates words, while sequences of white space are collapsed:
For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). […]
Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when producing output inter-word space. […]
This means that line breaks cannot be expressed by plain characters. And although there are certain special characters in Unicode to unambiguously separate lines and paragraphs, they are not specified to do this in HTML too:
Note that although &#x2028; and &#x2029; are defined in [ISO10646] to unambiguously separate lines and paragraphs, respectively, these do not constitute line breaks in HTML […]
That means there is no plain character or sequence of plain characters that marks a line break in HTML. And that's why there is the BR element.
Now if you want to use &br; instead of <br>, you just need to declare the entity br to represent the value <br>:
<!ENTITY br "<br>">
Having this additional entity named br declared, a general-purpose XML or SGML processor will replace every occurrence of the entity reference &br; with the value it represents (<br>). An example document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd" [
<!ENTITY br "<br>">
]>
<HTML>
<HEAD>
<TITLE>My first HTML document</TITLE>
</HEAD>
<BODY>
<P>Hello &br;world!
</BODY>
</HTML>
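To see the entity replacement happen, here is a minimal sketch using Python's `xml.etree.ElementTree` as the general-purpose XML processor. Note the assumptions: the document is rewritten as well-formed XML (lowercase tags, closed elements, and `<br/>` instead of `<br>`), and the DOCTYPE only carries the internal entity declaration. It is not the HTML 4 document above verbatim.

```python
import xml.etree.ElementTree as ET

# XML variant of the example document (assumption: XHTML-style markup,
# because an XML parser requires well-formed input).
DOCUMENT = """<!DOCTYPE html [
<!ENTITY br "<br/>">
]>
<html>
<head><title>My first HTML document</title></head>
<body><p>Hello &br;world!</p></body>
</html>"""

root = ET.fromstring(DOCUMENT)
paragraph = root.find("body/p")

# The parser expands &br; into a real <br/> child element of <p>.
children = [child.tag for child in paragraph]
print(children)
print(repr(paragraph.text), repr(paragraph[0].tail))
```

After parsing, the paragraph contains an actual `br` element between the two pieces of text; the entity reference itself is gone. Web browsers parsing HTML generally do not process such declarations, so this only applies to XML or SGML tooling.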
HTML is a mark-up language: it represents the structure of a document, not how that document should appear visually. Take the <EM> tag as an example. It tells user agents that they should give emphasis to any text that is placed between the opening and closing <EM> tags. However, it does not state how that emphasis should be represented. Yes, most visual web browsers will place the text in italics, but this is only convention. Other browsers, such as monochrome text-only browsers, may display the text in inverse. A screen reader might read the text in a louder voice, or change the pronunciation. A search-engine spider might decide the text is more important than other elements.
The same goes for the <BR> tag: it isn't just another character entity, it actually represents a break in the document structure. A <BR> is not just a replacement for a newline character, but is a "semantic" part of the document and how it is structured. This is similar to the way an <H1> is not just a way of making text bigger and bolder, but is an integral part of the way the document is structured.