views:

449

answers:

10
+1  Q: 

storing long text

what is the best way to store long texts (articles) in a database? it doesnt need to be searchable.

i want to allow ppl to read the first chapter of every book in my bookstore. dumping it into a database field makes it difficult to style paragraphs using css..

EDIT: access database

+3  A: 

If it is sql server 2005 USE VARCHAR(MAX)

EDIT,

It seems he saif access,

so i would go with memo

Up to 63,999 characters. (If the Memo field is manipulated through DAO and only text and numbers [not binary data] will be stored in it, then the size of the Memo field is limited by the size of the database.)

or OLE Object (if you can)

An object (such as a Microsoft Excel spreadsheet, a Microsoft Word document, graphics, sounds, or other binary data) linked (OLE/DDE link: A connection between an OLE object and its OLE server, or between a Dynamic Data Exchange (DDE) source document and a destination document.) to or embedded (embed: To insert a copy of an OLE object from another application. The source of the object, called the OLE server, can be any application that supports object linking and embedding. Changes to an embedded object are not reflected in the original object.) in a Microsoft Access table.

Up to 1 gigabyte (limited by available disk space)

astander
Why varchar instead of text/blob ? Isn't varchar used for usually short strings ?
Clement Herreman
VARCHAR(MAX) supercedes use of deprecated TEXT datatype in SQL Server 2005 onwards.
Russ Cam
@mjv -see http://msdn.microsoft.com/en-us/library/ms187993.aspx
Russ Cam
OLE data types is a terrible recommendation. It ties the use of the data to the app registered on the machine where each OLE field is populated. Incompatibilities cross version can make the data inaccessible. The overhead is also quite horrendous (like 64KBs for a 1-byte Word document). A BLOB field might be one answer.
David-W-Fenton
I aggree fully, it was just a recomendation, and i didnt know that access had the blob field???
astander
A: 

Use a CLOB.

is this CLOB thing a system or something? can u show me an example plz.
Noam Smadja
CLOB = Character Large Object
Russ Cam
yea i red that in wikipedia.. but wt does it stand for fisicly? is a PDF file a CLOB?
Noam Smadja
Since the C in CLOB stands for "character", I would not use a CLOB to store binary data like a PDF. That's what a BLOB is for. You guessed it, B is for "binary" :)
And CLOB is supported by which versions of Access/Jet/ACE?
David-W-Fenton
@David W. Fenton: for the Access Database Engine, CLOB = MEMO, BLOB = OLEOBJECT.
onedaywhen
BLOB <> OLEOBJECT in Jet/ACE. You can store binary data in a BLOB field without an OLE wrapper and read and write to it with the Chunk() commands.
David-W-Fenton
Time, then, to migrate to a more capable SQL ;)
onedaywhen
The fact that it's not an OLEOBJECT is an *advantage*. Anyone who's tried to extract data programatically from OLE fields knows it's a major pain in the ass. The fact that the Jet/ACE BLOB is not an OLE field is clearly an advantage, though it does mean you have to read/write the data in code instead of relying on bound controls. But bound OLE fields are themselves incredibly problematic, so this is yet another advantage, rather than a drawback.
David-W-Fenton
+1  A: 

I don't know which database you're using, but if text doesn't need to be searchable, then you can simply store the HTML formatted text (for instance, value coming from an FCKEditor or components like this). If you need also searchability, then you can store both HTML an plain text in two separated fields.

Fields can be nvarchar(MAX) if you use MS SQL Server 2008 or any equivalent datatype on other databases.

EDIT: Seems you're using Access, so go for Memo data type! If you decide to store HTML, consider to store only a generic markup (div, p) to divide your text, than later apply CSS formatting, wrapping stored text within another div specifing formatting classes for children elements.

m.bagattini
A: 

Put it in a TEXT field, and put it with their <p> so you'll be able to style paragraphs.

As it doesn't need to be searchable, it won't impact your sql performance.

Clement Herreman
A: 

For SQL Server

TEXT / NTEXT for SQL Server 2000

VARCHAR(MAX) / NVARCHAR(MAX) for SQL Server 2005 onwards

Russ Cam
+1  A: 

you have several options:

  • store it as a long single string with no formatting, which will look bland on the screen.

  • store it as a long single string with embedded html and css, which will be a bad choice if you ever want to make your site have a different look/feel.

  • normalize it so you have tables to store books, chapters, paragraphs, etc. you could then format and style the text as you load it into the application.

KM
A: 

I would propose storing the first chapter as pdf file. This is secure and allows for good formatting. Then use a blob, clob, varchar, or text field depending on your product (see the other answers).

Or you could use images and look into something like amazone's "look inside". It would work with the same db techniques.

Alternatively you could use something like markup.

I personally do not like to put html in my database. Even if it is only for output. Too easy to put in some javascript. But maybe I'm just too cautious.

Ralph Rickenbach
i hate using PDF on the internet.. it takes ages to load simple text.. and usually you get stuck with loading freezing every thing else around the browser..
Noam Smadja
It's slow because you're loading it across the Internet. Local access shouldn't be an issue at all. -1 to your comment, if I could vote down comments.
David-W-Fenton
+1  A: 

The main difference between long text (CLOB / TEXT / VARCHAR(MAX)) and long data (BLOB / IMAGE / VARBINARY(MAX)) is that the former is subject to character set conversions while the former is not.

If you need to make character set conversion on the database side, use CLOB and similar.

If you always want to retrieve your data as you atored it, byte-to-byte (as opposed character-to-character), use BLOB and similar.

Quassnoi
Uh, your answer is not entirely Access-specific. Can you clarify that?
David-W-Fenton
It was given before the author mentioned Access. I'm not an Access expert, sorry.
Quassnoi
A: 

I wouldn't store any of the documents in the database, but store the data in files in the file system, and the only thing that's in the database would be a pointer to the data files.

You don't give any details in your question that would suggest any need whatsoever to store the documents in the database itself.

And there are very few circumstances where it's advantageous.

David-W-Fenton
@David W. Fenton: "I wouldn't store any of the documents in the database, but store the data in files in the file system" -- what's wrong with the Attachment data type?
onedaywhen
I don't see the 2007 tag on this question, so, the Attachment data type would not be an option. I don't really know how it works and I'm also wary of new data types in the first release of a major version of the Jet db engine. I just remember all the issues with byte and decimal fields at the introduction of Jet 4 and don't want to commit to something that may never work properly.
David-W-Fenton
access-2007 is a subset of ms-access.
onedaywhen
Please stop editing my posts to read the way you want them to read. Post your own answer with the Attachments option. I am no longer accepting your edits if they change the nature of the answer instead of clarifying it -- I am going to roll back all of them that do so and I will continue the rollback war until you stop. Post your own answers. Get credit for your own ideas instead of trying to hijack someone else's answers with them.
David-W-Fenton
@onedaywhen: "access-2007 is a subset of ms-access". Post an f-ing answer that suggests the Attachment data type if the OP is using A2007. A question that does not specify A2007 should not expect to receive A2007-only answers that are unqualified by the A2007 exception.
David-W-Fenton
@David W. Fenton: why don't you just say, "PLONK", put me in your killfile and be done with it? Oh yeah, I remember now: this isn't a newsgroup, it's a collaboration site.
onedaywhen
...and when I edit your posts your still get the rep for them.
onedaywhen
I don't want to PLONK you because your posts are great. It's just your habit of inappropriate edits that annoys me.
David-W-Fenton
I don't want the reputation from your words. Endorsing the attachment type is not something I can do as I haven't used them. Every time you edit this post, I will roll it back, and I mean EVERY SINGLE DAY.
David-W-Fenton
Last of all, the Attachment type in the ACCDB format was introduced for Sharepoint compatibility, and that outside dependency is something that gives me pause. Will the ACCEDB data type change someday if Sharepoint changes the way it works? I'm not sure I'd want to take that risk.
David-W-Fenton
Also, in regard to the whole reputation thing, my post begins "I wouldn't store any of the documents in the database" -- it's a statement of my practices. Changing that completely alters the content of the answer I posted. PLEASE JUST STOP EDITING MY ANSWERS INAPPROPRIATELY.
David-W-Fenton
@David W. Fenton: All I ask is that you relocate relevant information that you have you conveyed in comments into your main answer. It's perfectly OK that your answer only relates to one version of the Access Database Engine if you make that clear. Even better if you also include you reasons for avoiding ACE Attachments. Don't you agree that adding information about Access2007 enhances your answer? How could that make it 'inappropriate'? I cannot add an answer of my own because I have no experience of storing files using the Access Database Engine.
onedaywhen
I genuinely appreciate the attempt at flattery but I doubt the sincerity :) I wager if you did a google search for "David W. Fenton" +PLONK then I'd turn up in the first page of the results ;)
onedaywhen
The topic of the discussion is determined by the original questioner. The topic is MS-ACCESS. My answer applies to ALL versions of Access. You have raised an option that is available in ONE version of Access. You can make the discussion better by posting an answer of your own that points out that option.
David-W-Fenton
As to the possibility of adding in the reference to the Attachments type, I've never used it so I can't recommend it nor say whether it's better or worse for this purpose. I do know that it was added for Sharepoint compatibility, not because anybody was clamoring for an Attachments type in Access.
David-W-Fenton
You may not believe I'm sincere, but I am -- your posts in this forum are very high quality, and you add a lot to discussions. But your approach to editing other people's answers is, in my opinion, completely wrongheaded -- you often alter the meaning of what was originally posted, and I think that's not a valid edit.
David-W-Fenton
@David W. Fenton: OK, my mind and ears are open. How does adding a disclaimer such as, 'This answer only applies to Jet 4.0' alter the meaning of what you originally posted? How does pointing out the existence of a potentially relevant data type in the current version of the engine do harm to your position? I rather think that you prefer to self-author biased, opinionated pieces than collaborate with others to produce more rounded, inclusive answers and in doing so find yourself at odds with the creators of this site...
onedaywhen
You couldn't give a rat's backside but here it is again: Editing versus Answering? by Joel Spolsky: I'd really like to see people editing answers to make them better rather than providing new answers. Editing definitely runs the risk of muddying the "ownership" of an answer, as multiple people contribute to the "best" answer. Tough. This isn't a site for egoists, who should feel free to go back to the old days of single-author book publishing. It's a place for team players who want to work together to provide great answers. http://meta.stackoverflow.com/questions/15839/editing-versus-answering
onedaywhen
Anyhow, I don't want to upset you or mar your enjoyment of this fantastic site so I've taken Spolsky's advice and posted a 'copy)
onedaywhen
@onedaywhen asks: "How does adding a disclaimer such as, 'This answer only applies to Jet 4.0' alter the meaning of what you originally posted?" Because IT WOULDN'T BE TRUE -- my answer applies to ALL VERSIONS OF JET. That the ACE adds a new data type in no way negates that statement -- and given that I don't know if the new data type is an appropriate answer to the problem (and neither do you, by your own admission), I don't see how a mention of it belongs in the answer I gave.
David-W-Fenton
You're right about my not giving a rat's ass about Spolsky's article. But I see what you are suggesting as completely wrongheaded -- you admit you won't post the suggestion as an answer yourself because you haven't used the Attachments data type. But neither have I, so how would it be appropriate for me to include a mention of it, or for you to edit that mention into my post (where it would look like it's coming from me, not from you)?
David-W-Fenton
Looks like plagiarism to me. You are a royal asshole, no doubt.
David-W-Fenton
If you think my behaviour is contrary to the spirit of this site or its written rules then feel free to call a moderator. As for you: calling people names, using profanity, pimping your business interests, not honouring the 'favour editing over answering' spirit... all adds to the fun as far as I'm concerned ;) I wonder if SO has a key on user name or whether I could create a duplicate 'David W. Fenton' the polite wiki advocate, a non-egoist team-player who is open-minded enough to see merit on both sides of any given issue...?
onedaywhen
Are you accusing me of "pimping [my] business interests"?
David-W-Fenton
Do you think SO is better for having my contributions or not? If so, then I suggest you not follow through on the threat of creating a sock-puppet user to impersonate me. If you do that, I'll leave the site permanently, but not before I finish deleting all my posts. I don't need problem people like you, and if you don't want my contributions, I definitely have plenty of other forums where I can offer my help.
David-W-Fenton
You imply the decision is mine. If you truly have opportunities to contribute to other forums then I'd say go for it, on grounds that you are likely to find a better fit for your philosophy than SO (Usenet perhaps?) I have you down as a wolf rather than a sheep, lonely perhaps but I know you'd do well. I'd appeal to your better nature and encourage you not to delete your SO answers out of spite. But be assured, I'm not about to hound you off the site with a sock puppet. My own interest in SO is waning. It's a nice idea but you have revealed a few of its flaws to me.
onedaywhen
For your informatino, I've been posting on Usenet in Access forums since 1996. As regards to you and SO, I think you don't understand collaborative editing. You have edited things into my answers that contradict what was originally said (regardless of author), or that introduce errors, or that are irrelevant to the thrust of the answer in the first place. You seem to want to change my answers to what you would have answered, no matter how much that completely negates my answer.
David-W-Fenton
It's not so much my authorship that's at stake, but the goal of SO, which is improving the content of the answers. The kinds of edits you applied to my posts degrade the quality of information, not because you remove or alter my golden utterances, but because you change them in ways that do not make them better. It's particularly galling in this case, as you just should have posted your own answer (suitably qualified to indicate your lack of direct experience), instead of attempting to hijack that really didn't have much to do with what you were seeking to introduce into the discussion.
David-W-Fenton
Again, I remain unconvinced. There was a time I edited one of your answers to add ADO code examples to match your DAO and you took umbrage (and we never did get to the bottom of what you mean by 'native'). Then there was the time you congratulated me on a good edit but still asked me not to edit your answers! The only sort of collaboration you seems to think is acceptable is where my edit is clearly demarcated from yours, where each distinct part shows the author's name.
onedaywhen
A: 

The following applies to Jet 4.0 only, being the version of the Access Database Engine in the era Access2000 to Access2003 inclusive:

I wouldn't store any of the documents in the database, but store the data in files in the file system, and the only thing that's in the database would be a pointer to the data files.

You don't give any details in your question that would suggest any need whatsoever to store the documents in the database itself.

And there are very few circumstances where it's advantageous.

If you are using ACE, being the version of the Access Database Engine in the Access2007 era, the Attachment data type would be an option, however I don't really know how it works, I've never used it so I can't recommend it nor say whether it's better or worse for this purpose. I'm also wary of new data types in the first release of a major version of the Access Database Engine. I just remember all the issues with byte and decimal fields at the introduction of Jet 4 and don't want to commit to something that may never work properly. The Attachment type in the ACCDB format was introduced for Sharepoint compatibility, and that outside dependency is something that gives me pause. Will the ACCEDB data type change someday if Sharepoint changes the way it works? I'm not sure I'd want to take that risk.

onedaywhen
Could you at the very least remove the personal pronouns from the text you copied from my answer? As it stands, you're claiming something as your policy that isn't. In fact, there's almost nothing in this post that you didn't lift from my answer or my comments. Yet, you're presenting it as though these are your thoughts. You're a real piece of work.
David-W-Fenton
IMO the ideal community member would write answers in the third person, wiki style, to facilitate editing by other users. Any chance of making this your personal 'policy'? If you edit your answer accordingly, I will re-do the copy+paste job. Alternatively, edit my answer; I am highly unlikely to rollback any edit where someone has removed personal pronouns and the like.
onedaywhen
Well, if I were editing your version of my post, I'd delete the first paragraph, as it is WRONG, and then indicate that all the rest was a quote of a different post of mine with the addition of selected comments. Is that what you want? If so, I'll edit it, but I have a feeling you wouldn't be happy with that -- you seem insistent on having the incorrect 1st paragraph included. It's your post -- you fix it. I've already posted an answer along with voluminous comments that make it quite clear what my position on the issue is. And I believe my answer is correct as it stands.
David-W-Fenton
...and I don't believe the information about Attachment fields belongs in my answer, as it was your suggestion in a comment, and I said what I know about it, which is not much. It's not part of my ANSWER -- it's part of what I didn't suggest as a solution, even though it might be a possible solution. It's just not one I can vouch for.
David-W-Fenton