I often run across developers who insist on using the XHTML doctype, and when I ask why, the common response is that it's "machine readable". Once the project is underway, though, the markup does not validate.

Now that their markup does not validate, is the machine-readability argument still valid?

I assume that if it does not validate, it can't be processed as XML and can't be queried using XPath.

+1  A: 

Being well-formed XML and valid XHTML are two different issues. But anyway, plain HTML is perfectly machine readable if it is well-formed and valid. The only difference is that there are more, better tools and libraries for working with XML content than SGML content. Certainly I find it easier to generate valid XHTML than valid HTML, but there's no real excuse for generating invalid documents of either type.

Nat
"Being well-formed XML and valid XHTML are two different issues." So invalid XHTML can still be processed using standard XML processing libraries?
Detroitpro
Yes. Most XML parsers let you turn validation off, so you can load the XML even if it is invalid w.r.t. the DTD or schema.
Nat
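
To illustrate the point about well-formedness versus validity, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which checks well-formedness only and never validates against the DTD. The sample markup and the helper names `image_sources` and `is_well_formed` are made up for this example. Note that ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Namespace prefix map used in the XPath-style queries below.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Well-formed XML, but invalid XHTML 1.0 Strict: <img> lacks the required
# alt attribute and <div> is not allowed inside <p>.
INVALID_BUT_WELL_FORMED = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    '<body><p><div><img src="a.png"/></div></p></body>'
    "</html>"
)

# Not well-formed: <br> is never closed, so any XML parser must reject it.
NOT_WELL_FORMED = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Hi<br></p></body></html>"
)


def image_sources(markup):
    """Return the src of every XHTML <img>, without validating the document."""
    root = ET.fromstring(markup)
    return [img.get("src") for img in root.findall(".//x:img", XHTML_NS)]


def is_well_formed(markup):
    """Report whether the markup parses as XML at all."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

Under these assumptions, `image_sources(INVALID_BUT_WELL_FORMED)` still finds the image even though the document would fail DTD validation, while `is_well_formed(NOT_WELL_FORMED)` is false because the parser stops at the first well-formedness error.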
"there's no real excuse for generating invalid documents of either type". You'd think, except that I have the HTML validator extension installed in Firefox, and approximately 0% (+/-1%) of web pages out there are valid. So even if there's no excuse, there must be reasons.
Steve Jessop
The reasons are that most web developers either don't care or are incompetent or both. I don't consider that an excuse.
Nat
Perhaps you should raise it on UserVoice as an issue against StackOverflow, then...
Steve Jessop
Actually, plain HTML is machine readable even if it isn't valid or well formed. Any byte stream is. This is what browsers do. When they implement the HTML 5 parsing algorithm, all the browsers will even handle arbitrary byte streams as HTML consistently.
Alohci
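
As a rough illustration of reading markup that is neither valid nor well-formed, the sketch below uses Python's standard `html.parser`. It is a tolerant tokenizer, not an implementation of the HTML 5 parsing algorithm, so it will not necessarily agree with browsers on every input. The `LinkCollector` class, the `collect_links` helper, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags, ignoring document structure."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Tag soup: unclosed <p> and <a>, misnested <b>/<i>, an unquoted attribute.
TAG_SOUP = (
    "<p>Unclosed paragraph<b><i>misnested</b></i>"
    '<a href=docs.html>docs<a href="/faq">faq'
)


def collect_links(markup):
    """Feed the markup to the tolerant parser and return the links it saw."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links
```

An XML parser would reject `TAG_SOUP` outright, whereas `collect_links(TAG_SOUP)` still returns both links. The trade-off is that you get a stream of tag events rather than a tree you can query.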
A: 

If you screw things up, you screw things up. If they sell you on the advantages of XHTML but then deliver something that is not actually XHTML, processing it will not be an easy job, and how hard it gets depends on how far their output strays from XHTML/HTML. Depending on your environment and your use case, though, you should consider running the markup through HTML Tidy (tidyhtml).

merkuro
A: 

Developers probably won't have to process the finished markup sent to the clients, because they can connect into any stage of the preprocessing. Therefore any error later in the chain is unlikely to get caught, and will be served to the users until there's a visual bug or someone tries to parse / validate at some later stage during / after preprocessing. Validation can catch such bugs preemptively, but have you ever heard of a workplace where "preemptive" is not a buzzword?

l0b0