
Hi guys.

I'm writing code to generate character-based pagination. I have articles on my site that I want to split into pages based on their length.

The code I have so far works, apart from two issues:

  1. It's splitting pages in the middle of words and HTML tags; I want it to split only after a complete word, tag, or punctuation mark.
  2. In the pagination bar, it's generating the wrong number of pages.


I need help with these two issues. Here's the code:

```php
<?php
$text = file_get_contents($View);
$ArticleLength = strlen($text);
$CharsPerPage = 5000;
$NoOfPages = round((double)$ArticleLength / (double)$CharsPerPage);
$CurrentPage = $this->ReturnNeededObject('pagenumber');
$Page = (isset($CurrentPage) && '' !== $CurrentPage) ? $CurrentPage : '1';
$PageText = substr($text, $CharsPerPage*($Page-1), $CharsPerPage);
echo $PageText;
?> <p> <?php
for ($i = 1; $i < $NoOfPages + 1; $i++)
{
    if ($i == $CurrentPage)
    {
        ?>
        <strong> <?php echo $i; ?> </strong>
        <?php
    }
    else
    {
        ?>
        <a href="<?php echo $i; ?>"><?php echo $i; ?></a>
        <?php
    }
    ?> | <?php
}
?> </p>
```

What am I doing wrong? I think I'm almost there...

A: 
```php
$NoOfPages = round((double)$ArticleLength / (double)$CharsPerPage);
```

That should use ceil() instead of round(). With round(), 4.2 pages rounds down to 4, so only pages 1 to 4 are shown; you need a 5th page to show the last 0.2 of a page.
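A minimal PHP sketch of the corrected calculation, kept in the question's language. pageCount() is a hypothetical helper name, and the 21000-character length is just an illustrative value:

```php
<?php
// Hypothetical helper: number of pages needed for $length characters
// at $perPage characters per page. ceil() rounds any partial page up.
function pageCount(int $length, int $perPage): int
{
    return (int) ceil($length / $perPage);
}

$ArticleLength = 21000; // illustrative: 4.2 pages' worth at 5000 chars/page
$CharsPerPage = 5000;

echo round($ArticleLength / $CharsPerPage), "\n";    // 4 (last 0.2 page lost)
echo pageCount($ArticleLength, $CharsPerPage), "\n"; // 5
```

Besides the 4.2-page case, ceil() also fixes very short articles: round() turns 0.4 of a page into 0 pages, so no pagination links would be printed at all, whereas ceil() gives 1.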

The other part is harder. It's common to use some sort of marker in the file to indicate where the page breaks go, because no matter how clever your code is, it can't judge where a good break is the same way a human can.
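A sketch of the marker approach, assuming the authors insert an HTML comment such as <!-- pagebreak --> wherever a page may end. The marker string and the pageFromMarkers() name are assumptions for illustration, not an existing convention of the asker's site:

```php
<?php
// Hypothetical marker: authors put this HTML comment wherever a page may end.
// Any string that can't occur in normal content would work just as well.
const PAGE_BREAK = '<!-- pagebreak -->';

// Split an article on the marker and return [text of the requested page,
// total page count]. Pages are 1-based; out-of-range requests fall back to 1.
function pageFromMarkers(string $text, int $page): array
{
    $pages = explode(PAGE_BREAK, $text);
    $count = count($pages);
    if ($page < 1 || $page > $count) {
        $page = 1;
    }
    return [$pages[$page - 1], $count];
}

$article = '<p>Intro.</p>' . PAGE_BREAK . '<p>Middle.</p>' . PAGE_BREAK . '<p>End.</p>';
[$pageText, $noOfPages] = pageFromMarkers($article, 2);
echo $pageText, "\n";  // <p>Middle.</p>
echo $noOfPages, "\n"; // 3
```

Because the pages come straight from the markers, the page count is simply the number of pieces, so the round()/ceil() question goes away as well.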

If you insist on doing it automatically, I'd suggest logic that, whenever a page break is created, first works forwards or backwards to the nearest space, which isn't too tricky. Trickier is deciding whether or not you are inside a tag; I think you'll need either some fairly heavy regex or an HTML parsing tool.
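A rough sketch of the forwards/backwards idea in PHP. It treats a space as a safe break only when it is not inside a tag, judged cheaply by checking whether the last < before it has been closed by a >. It assumes no > characters inside attribute values, and it does not close elements that are still open at the break (for example a <b> spanning two pages). The function names are hypothetical:

```php
<?php
// A space is a safe break if it is not inside a tag, i.e. the last '<'
// before it has already been closed by a later '>'.
function isSafeBreak(string $text, int $pos): bool
{
    if ($text[$pos] !== ' ') {
        return false;
    }
    $before = substr($text, 0, $pos);
    $lt = strrpos($before, '<');
    $gt = strrpos($before, '>');
    return !($lt !== false && ($gt === false || $lt > $gt));
}

// Find where the current page should end: walk backwards from $limit to the
// nearest safe space, then forwards if there is none.
function findBreak(string $text, int $limit): int
{
    $len = strlen($text);
    if ($len <= $limit) {
        return $len;
    }
    for ($pos = $limit; $pos > 0; $pos--) {
        if (isSafeBreak($text, $pos)) {
            return $pos;
        }
    }
    for ($pos = $limit + 1; $pos < $len; $pos++) {
        if (isSafeBreak($text, $pos)) {
            return $pos;
        }
    }
    return $len; // no safe space at all: keep the rest on one page
}

// Split the whole text into pages of roughly $limit bytes each.
function splitAtWords(string $text, int $limit): array
{
    $limit = max(1, $limit);
    $pages = [];
    while ($text !== '') {
        $cut = findBreak($text, $limit);
        $page = rtrim(substr($text, 0, $cut));
        if ($page !== '') {
            $pages[] = $page;
        }
        $text = ltrim(substr($text, $cut));
    }
    return $pages;
}

echo implode(' | ', splitAtWords('The quick brown fox jumps', 10)), "\n";
// The quick | brown fox | jumps
```

Searching backwards first keeps pages at or under the limit whenever possible; the forward search only kicks in when the text before the limit has no usable space, such as a long opening tag.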

benlumley
A: 

You're calculating the number of pages wrong: you should be using ceil(), not round(). For example, 4.1 pages' worth of text still needs 5 pages to display.

As for the other issue, you're going to have big problems if there's arbitrary HTML in there. For example, you need to know that it's OK to split between <div>s and <p>s, but not inside a <table> (unless you want to get really fancy)!

To do it properly you should use an HTML library to build a tree of elements and then go from there.
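One way to do that with PHP's built-in DOMDocument, sketched under some assumptions: the article is an HTML fragment whose blocks (paragraphs, divs, tables) sit directly under <body> once parsed, pages are filled by visible-text length rather than raw HTML length, and paginateHtml() is a hypothetical name. Because it only ever breaks between top-level nodes, a <table> is never split:

```php
<?php
// Sketch: split an HTML fragment into pages only between top-level nodes,
// so paragraphs, lists and tables are never cut in half.
function paginateHtml(string $html, int $charsPerPage): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy or HTML5 markup
    // Common workaround so libxml reads the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return [];
    }

    $pages = [];
    $current = '';
    $currentLen = 0;
    foreach ($body->childNodes as $node) {
        // Measure visible text only, so long URLs in attributes don't count.
        $len = strlen($node->textContent);
        if ($currentLen > 0 && $currentLen + $len > $charsPerPage) {
            $pages[] = $current; // close the page before this node
            $current = '';
            $currentLen = 0;
        }
        $current .= $doc->saveHTML($node); // outer HTML of the node
        $currentLen += $len;
    }
    if (trim($current) !== '') {
        $pages[] = $current;
    }
    return $pages;
}
```

The trade-off is granularity: a single huge top-level element still becomes one oversized page. Splitting inside it would mean recursing into its children and re-opening the parent tags on the next page.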

Greg
A: 

Thanks, guys. I put in the ceil() fix for the page count and it worked beautifully.

Hm. I guess the second point is messy to do. I've found some regex online; I'll think it over, write something, and get back to you when I've made some progress.

Thanks again.

A: 

Based on your first statement,

It's splitting pages in the middle of words and HTML tags

it appears that your character count is done after the markup is inserted. That means, for example, that long URLs in links count against the page length you're trying to achieve. However, you didn't say how the articles are created in the first place.
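A quick way to see the effect: compare strlen() on the marked-up text with strlen() on the strip_tags() output. The sample HTML is invented for illustration:

```php
<?php
// Invented sample: one short sentence whose link has a long URL.
$html = '<p>See <a href="https://example.com/a/very/long/path/to/some/article">this article</a>.</p>';

// strlen() on the raw HTML counts the tags and the whole URL.
echo strlen($html), "\n";
// strip_tags() leaves only the visible text: "See this article."
echo strlen(strip_tags($html)), "\n"; // 17
// Caveats: entities such as &amp; survive strip_tags() and count as several
// characters, and strlen() counts bytes, so mb_strlen() suits UTF-8 text better.
```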

I'd suggest looking for a point in the article-creation process where you can examine the raw text. If you treat the actual content (without markup) as a set of paragraphs and estimate each paragraph's vertical length from a typical number of characters per line, you can size pages much more consistently.

I would also consider only breaking between paragraphs, to keep units of thought together on the same page. Speaking as a reader, I really hate going to sites that force me to pause, hit a button or link, and wait for a page reload, all in the middle of a single thought.
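A sketch of the paragraph-only approach with estimated line counts. The numbers (80 characters per line, 60 lines per page, one blank line between paragraphs) and the regex-based splitParagraphs() helper are assumptions for illustration; the regex only copes with simple, non-nested <p> markup:

```php
<?php
// Naive extraction of <p> blocks; assumes paragraphs are not nested
// and use ordinary <p ...> ... </p> markup.
function splitParagraphs(string $html): array
{
    preg_match_all('#<p\b[^>]*>.*?</p>#is', $html, $m);
    return $m[0];
}

// Estimated screen lines for one paragraph, from its visible text length.
function estimateLines(string $paragraphHtml, int $charsPerLine = 80): int
{
    $text = trim(strip_tags($paragraphHtml));
    return max(1, (int) ceil(strlen($text) / $charsPerLine));
}

// Group whole paragraphs into pages of at most about $linesPerPage lines.
function paginateParagraphs(array $paragraphs, int $linesPerPage = 60): array
{
    $pages = [];
    $current = [];
    $used = 0;
    foreach ($paragraphs as $p) {
        $lines = estimateLines($p) + 1; // +1 for the gap after the paragraph
        // Never split a paragraph: start a new page instead. A paragraph that
        // is longer than a page on its own still gets a page to itself.
        if ($current !== [] && $used + $lines > $linesPerPage) {
            $pages[] = $current;
            $current = [];
            $used = 0;
        }
        $current[] = $p;
        $used += $lines;
    }
    if ($current !== []) {
        $pages[] = $current;
    }
    return $pages;
}
```

Each page is an array of paragraph HTML strings, so implode('', $pages[$Page - 1]) would give the text for the requested page and count($pages) the page count. Pages built from whole paragraphs can end up a little shorter than the limit, which is the trade-off argued for above.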

joel.neely