Your table could be something like:
CREATE TABLE ArticleText (
    artId   INTEGER,  -- which article this word belongs to
    wordNum INTEGER,  -- position of the word within the article
    wordId  INTEGER,  -- which word appears at that position
    PRIMARY KEY (artId, wordNum),
    FOREIGN KEY (artId) REFERENCES Articles (artId),
    FOREIGN KEY (wordId) REFERENCES Words (wordId)
)
This may of course be very space-expensive, slow, etc., but you'll need some measurements to determine that (so much depends on your DB engine). BTW, I hope it's clear that the Articles table is simply a table with metadata on articles keyed by artId, and the Words table a table of all distinct words across every article keyed by wordId (the idea being to save some space by reusing already-known words when an article is entered, if that's feasible...). One special word must be the "end of paragraph" marker, easily identifiable as such and distinct from every real word.
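For concreteness, here's a minimal sketch of those two supporting tables; the metadata columns and the '<EOP>' sentinel value are just assumptions of mine, not something your schema has to match (only wordText is assumed by the query below):

CREATE TABLE Articles (
    artId  INTEGER PRIMARY KEY,
    title  VARCHAR(200),   -- whatever article metadata you need
    author VARCHAR(100)
);

CREATE TABLE Words (
    wordId   INTEGER PRIMARY KEY,
    wordText VARCHAR(100) UNIQUE  -- each distinct word stored once
);

-- the special end-of-paragraph marker, e.g. reserved as wordId 0
INSERT INTO Words (wordId, wordText) VALUES (0, '<EOP>');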
If you do structure your data like this, you gain a lot of flexibility in retrieving by page, and the page length can be changed in a snap, even query by query if you wish. To get a page:
SELECT wordText
FROM Articles
JOIN ArticleText USING (artId)
JOIN Words USING (wordId)
WHERE wordNum BETWEEN (@pagenum - 1) * @pagelength
                  AND @pagenum * @pagelength + @extras
  AND artId = @articleid
ORDER BY wordNum
The parameters @pagenum, @pagelength, @extras, and @articleid are to be bound into the prepared query at query time (use whatever syntax your DB and language like, such as :extras, numbered parameters, or whatever). The ORDER BY matters: without it the DB is free to return the words in any order.
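For example, with hypothetical values of 500-word pages and 50 extra look-ahead words, page 2 of article 42 would be (literals substituted in just for illustration):

SELECT wordText
FROM Articles
JOIN ArticleText USING (artId)
JOIN Words USING (wordId)
WHERE wordNum BETWEEN 500 AND 1050   -- (2-1)*500 .. 2*500 + 50
  AND artId = 42
ORDER BY wordNum;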
So we get @extras words beyond the expected end of page, and then on the client side we check those extra words to make sure one of them is the end-paragraph marker - otherwise we do another query (with different BETWEEN values) to get yet more.
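That follow-up query is just the same statement with the window shifted forward. Continuing the hypothetical numbers above: if none of the extra words 1001..1050 is the marker, fetch the next 50-word window and check again:

SELECT wordText
FROM ArticleText
JOIN Words USING (wordId)
WHERE wordNum BETWEEN 1051 AND 1100  -- next @extras-sized window
  AND artId = 42
ORDER BY wordNum;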
Far from ideal, but, given all the issues you've highlighted, worth considering. If you can count on the page length always being e.g. a multiple of 100, you can adopt a slight variation of this based on 100-word chunks (and no Words table, just the text stored directly per row).
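A sketch of that variation, with table and column names that are purely my own invention:

CREATE TABLE ArticleChunks (
    artId     INTEGER,
    chunkNum  INTEGER,        -- 0-based index of each 100-word chunk
    chunkText VARCHAR(2000),  -- the chunk's 100 words as plain text
    PRIMARY KEY (artId, chunkNum),
    FOREIGN KEY (artId) REFERENCES Articles (artId)
);

-- a 500-word page is then just 5 consecutive chunks,
-- e.g. page 2 of article 42 at 5 chunks per page:
SELECT chunkText
FROM ArticleChunks
WHERE artId = 42
  AND chunkNum BETWEEN (2 - 1) * 5 AND 2 * 5 - 1   -- chunks 5..9
ORDER BY chunkNum;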