I'm transforming some XML, which I have no control over, to XHTML. The XML schema defines a <para> tag for paragraphs and <unordered-list> and <ordered-list> for lists.

Frequently in this XML, I find lists nested within paragraphs. So, a straightforward transformation causes <ul>s to get nested within <p>s, which is illegal in XHTML.

I've created a list of ways to deal with it and here are the most obvious:

  1. Just don't worry about it. The browsers will do fine. Who cares. (I don't like this option, but it's an option!)
  2. Write a fancy-pants component to my transform that makes sure all <para> tags get closed before unordered lists start, and re-opened afterward. (I like this option the most, but it's complicated due to multiple levels of nesting, and we may not have the budget for this)
  3. Just transform <para> to <div> and set the margins on the divs so it looks like a paragraph in the browser. This is the easiest solution that emits valid XHTML, but it takes from the semantic value of the markup.

My question is, how much value do I lose if I go with option 3? Does it really matter? What is the actual effect on the user experience? If you can cite references, please do (this is easy to speculate on).

For example, I was thinking it might affect search results from a Google Search Appliance that we are using. If search terms appear in divs, do they carry less weight? Or is there less of an association between them and preceding header tags? How can I find this out?

+1  A: 

First of all, unless you set every CSS property available now plus every one possibly available in the future, then you can't guarantee your <div> will match up, WRT styles, with <p>. (Though I agree you can get close and this is probably good enough, but read on.) I don't know of any visual browsers or other tools that would seriously treat them differently, but this is just as much an artifact, IMHO, of the current widespread loose interpretation on the web, as it is of them being close in meaning.

Is <ul> the right transformation for every <unordered-list> in your source data? If they are always displayed as block-level content instead of 1) an, 2) inline, 3) list; then that's a safe bet. If so, you can break the paragraph into two (and wrap the whole thing in <div> if you like).

Example input:

<para>Yadda yadda: <unordered-list/> And so fin.</para>

Output:

<div>
<p>Yadda yadda:</p>
<ul/>
<p>And so fin.</p>
</div>
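The split shown above can be sketched in a few lines. This is only an illustration under assumptions, not the asker's actual transform: it uses Python's standard `xml.etree.ElementTree`, the helper names `split_para` and `_emit` are made up for this example, and anything inside a `<para>` that is not a list is assumed to be inline content that can stay inside a `<p>`.

```python
import copy
import xml.etree.ElementTree as ET

# Source list tags (from the question) mapped to their XHTML counterparts.
LIST_TAGS = {"unordered-list": "ul", "ordered-list": "ol"}


def _emit(p, out):
    """Trim whitespace at both ends of p and append it to out unless it is empty."""
    if len(p):
        # Leading text sits in p.text, trailing text in the last child's tail.
        p.text = (p.text or "").lstrip() or None
        p[-1].tail = (p[-1].tail or "").rstrip() or None
    else:
        p.text = (p.text or "").strip() or None
    if p.text or len(p):
        out.append(p)


def split_para(para):
    """Split one <para> into <p> runs interleaved with the lists it contained."""
    out = []
    current = ET.Element("p")
    current.text = para.text
    for child in para:
        if child.tag in LIST_TAGS:
            # Close the paragraph collected so far, then emit the list itself.
            _emit(current, out)
            lst = copy.deepcopy(child)
            lst.tag = LIST_TAGS[child.tag]
            lst.tail = None
            out.append(lst)
            # Text that followed the list starts a new paragraph.
            current = ET.Element("p")
            current.text = child.tail
        else:
            # Assumed inline content: keep it (and its tail text) in the paragraph.
            current.append(copy.deepcopy(child))
    _emit(current, out)
    return out


if __name__ == "__main__":
    src = ET.fromstring("<para>Yadda yadda: <unordered-list/> And so fin.</para>")
    for element in split_para(src):
        print(ET.tostring(element, encoding="unicode"))
```

Note that this handles only one level: the contents of each list are copied as-is, so paragraphs nested inside list items would need the same treatment applied recursively, which is where the extra complexity of Option 2 comes from.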
Roger Pate
A: 

The good news is that any of these 3 options would work.

There are many, many people on SO that will tell you "if it works, forget semantics and do it." So Option 1 would probably be a site favorite if everyone here was asked.

Option 2 is my favorite and would be the best semantically. I would definitely do it if time/budget allows.

However, Option 3 is a close second and hopefully this will answer your question: The <div> element and the <p> element are near-identical. In fact, the biggest difference is semantics. In most browsers' default style sheets both are display: block; the main styling difference is that <p> also gets default top and bottom margins.

T Pops
Do you know how I could figure out how user agents and search engines treat the semantic differences between the two, `<div>` and `<p>`?
Chris
Yeah: Ask a question on StackOverflow! :)
Carl Smotricz
Someone beat you to it: <http://stackoverflow.com/questions/907313/is-the-div-tag-ever-an-undesirable-alternative-to-the-p-tag> . A person named Ambrose claimed that search engines understand proper syntax better, but he didn't substantiate that claim.
Carl Smotricz
OK, I've gone on a Google search myself. My conclusion is that there is a kind of "folklore" idea about what search engines will and will not do with your tags, but it looked to me more like superstition than proven knowledge. Consider that search engine companies work hard to keep their algorithms secret to avoid being scammed by SEO'ers.
Carl Smotricz
I DID ask this in my original question: "My question is, how much value do I lose if I go with option 3?". I guess I was hoping for something like "Google does indeed apply a weight coefficient of x on div and y on p, as documented here: (url)". High hopes. But your answer was helpful, so thank you.
Chris
+1  A: 

I've come up against this too.

Personally, I consider it a grave mistake on the part of the standard that a p cannot contain lists. I think it's typographically legal, so it should be legal in what was originally intended to be a markup for text.

I may be flamed for this, but XHTML has crashed and burned in the real world, regardless of whether it was a good idea or not. The often horrible tag soup that is today's HTML markup will continue to survive for a goodly long time, if only because bad markup and lenient browsers will continue to perpetuate each other forever.

Thus, I tend to go with Option 1.

Option 3 is also viable, in my opinion. While I don't have proof, I'm pretty sure no search engine is crazy enough to actually put any trust in most of the formatting tags we apply to our HTML. meta and a tags are obvious exceptions, of course.

Carl Smotricz
Actually, you'd be surprised. I'm pretty sure Google makes some attempt to detect, e.g., invisible elements so that people can't just put a bunch of search terms in an invisible div and boost their page ranking to #1.
Matthew Scharley
I think it is a good point that search engines couldn't put too much value in a `<p>` over a `<div>` because they know they simply can't trust them to be used properly.
Chris
@Matthew Scharley: Good point, I hadn't thought of that. Let's say the engines don't believe us humans but they cleverly defend against being hoodwinked by us.
Carl Smotricz