Your table could be something like:
CREATE TABLE ArticleText (
    artId   INTEGER,  -- which article this word belongs to
    wordNum INTEGER,  -- position of the word within the article
    wordId  INTEGER,  -- which word appears at that position
    PRIMARY KEY (artId, wordNum),
    FOREIGN KEY (artId) REFERENCES Articles (artId),
    FOREIGN KEY (wordId) REFERENCES Words (wordId)
)
This may of course be very space-expensive, slow, etc., but you'll need some measurements to determine that (so much depends on your DB engine). BTW, I hope it's clear that the Articles table is simply a table with metadata on articles keyed by artId, and the Words table a table of all distinct words across every article keyed by wordId (the idea being to save some space by reusing already-known words when an article is entered, if that's feasible...). One special word must be the "end of paragraph" marker, easily identifiable as such and distinct from every real word.
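For concreteness, here's a minimal sketch of those two supporting tables; the metadata columns and the '<EOP>' sentinel value are just assumptions of mine, not something your schema has to match (only wordText is assumed by the query below):

CREATE TABLE Articles (
    artId  INTEGER PRIMARY KEY,
    title  VARCHAR(200),   -- whatever article metadata you need
    author VARCHAR(100)
);

CREATE TABLE Words (
    wordId   INTEGER PRIMARY KEY,
    wordText VARCHAR(100) UNIQUE  -- each distinct word stored once
);

-- the special end-of-paragraph marker, e.g. reserved as wordId 0
INSERT INTO Words (wordId, wordText) VALUES (0, '<EOP>');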
If you do structure your data like this, you gain a lot of flexibility in retrieving by page, and the page length can be changed in a snap, even query by query if you wish. To get a page:
SELECT wordText
FROM Articles
JOIN ArticleText USING (artId)
JOIN Words USING (wordId)
WHERE wordNum BETWEEN (@pagenum - 1) * @pagelength
                  AND @pagenum * @pagelength + @extras
  AND artId = @articleid
ORDER BY wordNum
The parameters @pagenum, @pagelength, @extras, and @articleid are to be bound into the prepared query at query time (use whatever syntax your DB and language like, such as :extras, numbered parameters, or whatever). The ORDER BY matters: without it the DB is free to return the words in any order.
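For example, with hypothetical values of 500-word pages and 50 extra look-ahead words, page 2 of article 42 would be (literals substituted in just for illustration):

SELECT wordText
FROM Articles
JOIN ArticleText USING (artId)
JOIN Words USING (wordId)
WHERE wordNum BETWEEN 500 AND 1050   -- (2-1)*500 .. 2*500 + 50
  AND artId = 42
ORDER BY wordNum;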
So we get @extras words beyond the expected end of page, and then on the client side we check those extra words to make sure one of them is the end-paragraph marker - otherwise we do another query (with different BETWEEN values) to get yet more.
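That follow-up query is just the same statement with the window shifted forward. Continuing the hypothetical numbers above: if none of the extra words 1001..1050 is the marker, fetch the next 50-word window and check again:

SELECT wordText
FROM ArticleText
JOIN Words USING (wordId)
WHERE wordNum BETWEEN 1051 AND 1100  -- next @extras-sized window
  AND artId = 42
ORDER BY wordNum;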
Far from ideal, but, given all the issues you've highlighted, worth considering. If you can count on the page length always being e.g. a multiple of 100, you can adopt a slight variation of this based on 100-word chunks (and no Words table, just the text stored directly per row).
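A sketch of that variation, with table and column names that are purely my own invention:

CREATE TABLE ArticleChunks (
    artId     INTEGER,
    chunkNum  INTEGER,        -- 0-based index of each 100-word chunk
    chunkText VARCHAR(2000),  -- the chunk's 100 words as plain text
    PRIMARY KEY (artId, chunkNum),
    FOREIGN KEY (artId) REFERENCES Articles (artId)
);

-- a 500-word page is then just 5 consecutive chunks,
-- e.g. page 2 of article 42 at 5 chunks per page:
SELECT chunkText
FROM ArticleChunks
WHERE artId = 42
  AND chunkNum BETWEEN (2 - 1) * 5 AND 2 * 5 - 1   -- chunks 5..9
ORDER BY chunkNum;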