Lately I have been seeing an increasing number of design articles encouraging typographic quotes (curly quotes) over straight quotes in web pages.

While I agree that, typographically, correct quotes are more appealing and add a nice touch to a design, I wonder whether they are actually better practice from an encoding standpoint.

I have found that these quotes do not copy and paste across documents as well, do not necessarily scrape as well, and can end up as the annoying missing-character symbol. Never mind when they are used in sample code blocks; I hate that.
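The missing-character symbol described above usually comes from bytes being decoded with a different encoding than the one used to write them. The following is a minimal sketch of that failure, assuming a page saved as UTF-8 and a consumer that guesses Windows-1252 or plain ASCII; the variable names are made up for illustration.

```python
# Hypothetical illustration: curly quotes pushed through mismatched encodings.
text = "\u201cHello\u201d"  # the word Hello wrapped in curly double quotes

utf8_bytes = text.encode("utf-8")

# Correct round trip: decode with the same encoding that was used to encode.
round_trip = utf8_bytes.decode("utf-8")

# Wrong decoder: UTF-8 bytes read as Windows-1252 turn into mojibake, and
# bytes that Windows-1252 does not define become U+FFFD, the replacement
# character that is often rendered as a "missing character" symbol.
mojibake = utf8_bytes.decode("cp1252", errors="replace")

# A pure-ASCII target cannot represent the curly quotes at all.
ascii_lossy = text.encode("ascii", errors="replace").decode("ascii")

# Straight quotes are plain ASCII, so they survive the same steps unchanged.
straight = '"Hello"'
straight_ok = straight.encode("ascii").decode("cp1252")

print(repr(round_trip), repr(mojibake), repr(ascii_lossy), repr(straight_ok))
```

When every step agrees on UTF-8, the curly quotes round-trip cleanly; the trouble only appears when one link in the chain assumes another encoding.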

I am not very knowledgeable about text encodings, so I am wondering whether anyone has advice on this subject.

Is &quot; better or more reliable than &ldquo;?

Edit: This issue mainly applies to content areas, where <q> or <blockquote> is semantically not the best option. Also, things like plurals and such.

A: 

I think it is purely a design issue. For me, the prime objective of a web page should be semantic structure. Machine and human readers of web pages are able to use straight quotes without much difficulty. Using characters outside of the usual range of characters is only asking for trouble. Maybe you could use some CSS to satisfy your designers?

Using blockquote is probably best, as it is semantic. Have a look at http://simonwillison.net/2003/May/21/stylingBlockquotes/

Program.X
However, it also seems to be best practice to encode double straight quotes as &quot; in content. Wouldn't this be seen as outside of the character range as well? And if so, why would machines have a more difficult time with &ldquo;?
About the semantics: this particularly pertains to areas that are not actually quotes (<q> or <blockquote>) but require the characters anyway. Actual quote tags are easy enough to handle with CSS. It's the encodings that I don't understand.
&ldquo; is perfectly readable by machines; the question is whether the developers of the software on those machines have catered for it. As &ldquo; is a textual entity, not a numeric one (eg.  ), some effort has to be taken to cater for it. (Bad or not, I have never seen &ldquo;, let alone coded for it.)
Program.X
Further, any machine reading your content where quotes are important should possibly be using some sort of stack anyway to "match up" your quotes. Straight or not, it is immaterial. I always err towards simplicity, hence keeping within "well known" characters/entities and styling on top of that.
Program.X
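The "stack" idea in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: both function names are hypothetical, only double quotes are handled (single curly quotes double as apostrophes), and real text would need more care.

```python
OPEN_QUOTE = "\u201c"   # left curly double quote
CLOSE_QUOTE = "\u201d"  # right curly double quote


def curly_quotes_balanced(text: str) -> bool:
    """Return True if every left curly quote is closed by a right curly quote."""
    stack = []
    for ch in text:
        if ch == OPEN_QUOTE:
            stack.append(ch)
        elif ch == CLOSE_QUOTE:
            if not stack:
                return False  # a closing quote with nothing open
            stack.pop()
    return not stack  # anything left on the stack was never closed


def straight_quotes_balanced(text: str) -> bool:
    """Straight quotes carry no direction, so only the parity can be checked."""
    return text.count('"') % 2 == 0


print(curly_quotes_balanced("\u201cfine\u201d"))
print(straight_quotes_balanced('say "hello" twice'))
```

One difference this makes visible: curly quotes carry direction, so a reversed pair can be detected, while straight quotes only allow counting.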
Thanks for your thoughts on this, that makes sense.
+11  A: 
Evan
Is there any difference in using “ instead of ”? Also, do any of these entities require a particular charset? Should I be using UTF-8?
Have answered your questions in an edit above.
Evan
Thank you for your answers, they were very insightful. Thanks for taking the time to make all those links too. The joelonsoftware article is just what I need and the htmlhelp link is now a permanent bookmark. big +1
Note that HTML 4.0 isn't the latest recommendation, HTML 4.01 is. And the short answer to "unknown (google)" is: no, character encoding doesn't matter when using character references, they *always* refer to characters in the document character set.
cic
Or as the current standard states: "Character references are a character encoding-independent mechanism for entering any character from the document character set." -- http://www.w3.org/TR/html401/charset.html#entities
cic
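The encoding independence of character references quoted above can be checked with a small sketch. It assumes Python's standard `html.unescape` as a stand-in for a browser's reference handling; `decode_via` and the sample markup are hypothetical names introduced for illustration.

```python
import html

# Named, decimal, and hexadecimal references to the same character.
references = ["&ldquo;", "&#8220;", "&#x201C;"]
decoded = [html.unescape(ref) for ref in references]

# Markup that uses only ASCII characters plus character references.
markup = "<p>&ldquo;Hi&rdquo; and &quot;Hi&quot;</p>"


def decode_via(encoding: str) -> str:
    """Serialize the markup in the given encoding, read it back, resolve references."""
    data = markup.encode(encoding)
    return html.unescape(data.decode(encoding))


results = {enc: decode_via(enc) for enc in ("ascii", "latin-1", "utf-8")}

for enc, value in results.items():
    print(enc, repr(value))
```

Because the markup itself contains only ASCII bytes, the choice of charset does not change which characters the references resolve to.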