Hello,

I'm editing books and articles in HTML. These texts were originally printed; I scan them, convert them into an intermediate XML format, and then transform that into HTML with XSLT. Because some of these texts are out of print and available only through the major libraries, I want to publish them so that people can cite them by the page numbers of the original document. For this purpose my intermediate XML format has an element that marks a page break. Right now I'm working on the XML->HTML transformation and I'm wondering how to represent these page breaks in HTML. They should not appear in the final HTML by default (so a simple | doesn't fit), but I plan to wrap these documents with some lightweight JavaScript that will show the markers when needed. I thought about <span>s containing a | that are hidden by default.

Is there a better, possibly 'semantic' way to solve this problem?

A: 

Maybe you can use an XML tag that HTML does not parse or interpret, like <pagebreak/>.

That way the tag is not rendered when the HTML is viewed, but with jQuery or any other JavaScript library you can transform these particular tags, when asked, into a standard element or whatever visual mark you like.

I think this can be a semantic approach...

Impiastro
… but not an HTML approach.
David Dorward
Can I make this validate by using a separate namespace for this one element (like Word does when exporting to HTML)? Can I then still do stuff with that element in JavaScript?
Struce
Yes, not an HTML approach, but maybe an XHTML approach. @Struce: you can use a namespace for your special tags and let JavaScript handle them and transform them into other standard, visible XHTML tags.
Impiastro
+3  A: 

Page breaks are very much a thing of layout, and HTML isn't designed to describe layout, so you aren't going to find anything that is semantic for this within the language.

The best you can hope for is some sort of kludge.

Since a page break can occur in the middle of a paragraph, and <p> elements can contain only inline elements, you can eliminate most of the options from the outset.

The two possibilities that suggest themselves to me are <span> and <a>. The former has no semantics; the latter is designed to be linked to (with a name attribute) or from (with an href attribute), and you could consider a page from an original document something that you might wish to link to.

No matter what element you use, I wouldn't include a marker in it and then hide it with CSS. That sort of presentational flag is something I would consider adding via :before in a stylesheet (combined with a descendant selector for a body class that can be toggled with JS, since you want the toggle).
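
A minimal sketch of that idea follows. It is only an illustration, not a definitive stylesheet: the class names "pagebreak" and "show-pages", the data-page attribute, and the use of id instead of name are all assumptions that do not come from the question.

```css
/* Assumed (hypothetical) markup for one page break:
   <a class="pagebreak" id="page-42" data-page="42"></a> */

/* Default: the anchor generates no content, so nothing is shown. */
a.pagebreak:before {
  content: none;
}

/* When <body> carries the hypothetical class "show-pages", render a
   marker that includes the original page number from data-page. */
body.show-pages a.pagebreak:before {
  content: " | p. " attr(data-page) " | ";
  color: #888;
  font-size: smaller;
}
```

The toggle itself can then be a single line of JavaScript, for example document.body.classList.toggle("show-pages"), and the markers stay out of the document text entirely.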

Alternatively, if you want to take a (very) broad view of the meaning of "HTML" you could consider the l element (from the defunct XHTML 2 drafts) and mark up each line of the original document. Adding a class would indicate where a new page began (and you could use CSS counters and borders to clearly indicate each page and number it, should you so wish). Pity the browser vendors refused to get behind a real semantic markup language and favoured HTML 5 instead.
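
The counter-and-border idea could look roughly like this. It is a sketch under stated assumptions: the class name "newpage" and the counter name "origpage" are made up for illustration, the first page is assumed to be unmarked and numbered 1, and browsers treat l as an unknown element, so it needs an explicit display value.

```css
/* Assumption: every original line is marked up as <l>…</l>, and the
   first line of each new page carries the hypothetical class "newpage". */
body {
  counter-reset: origpage 1;      /* the text starts on page 1 */
}

l {
  display: block;                 /* one source line per rendered line */
}

l.newpage {
  counter-increment: origpage;    /* a new original page begins here */
  border-top: 1px dashed #888;    /* visible page boundary */
}

l.newpage:before {
  content: "[p. " counter(origpage) "] ";
  color: #888;
}
```

If the scanned text starts on a page other than 1, the value in counter-reset would have to be adjusted accordingly.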

David Dorward
I think I have to put something into the span or a because Chrome doesn't seem to like empty ones (it sucks the next elements after an empty span into it until the next span appears [Yes, it's closed!] - that's what I can see with the Developer Tools...). Thanks anyway. I'll go for something with "a" because of the linking... might have a use later.
Struce
+2  A: 

Use a <div class="Page"> for each page, and have a stylesheet containing:

.Page {
   page-break-after: always;
}
dan04
… and when a page break appears mid-paragraph?
David Dorward