views:

422

answers:

8

I need to store long strings in a database. the string may be 5 or 6 sentences long. do you think this is a good design strategy. or should I store an id for that string and then create a relationship with another table that contains the location of the file storing the string. could you please give advantages and disadvantages of both.

the strings have been preprocessed and stored in the database. any modification would read the entire string and replace it completely. so you can assume that the string is indivisible.

+3  A: 

Five or six sentences is nothing to a modern DBMS! Store the text directly in the database.

(The other technique you mentioned - storing a ref to another table which itself has a ref to an external file holding the text - would be much more cumbersome to use and have much poorer performance.)

LukeH
+4  A: 

It should be fine to store the string in the database. If you store a file pointer instead, that means you need to do File I/O every time you want to read the string. A few sentences isn't terribly long and you can always use a longtext data field if you need to. Obviously your database will be a little bit larger because you have the text, but that's ok. It is certainly a better alternative than having to store the files.

Jeff Storey
+4  A: 

The only reason I would create a separate table is if those long strings will be the same for many records. Otherwise its just an extra complication that isn't likely to provide any payback.

AnthonyWJones
no these long strings are different. i understand your point, no separate table should be used. but should i store the string in the file system and only hold a pointer to the file in the database or store the string in the database itself. any suggestions based on performance.
iamrohitbanga
@iamrohitbanga: So, *how long* are they, exactly? Anything up to a couple of kilobytes (which is quite a lot for text) is OK to store in the database. Anything above that limit is *still* okay, but you must use the TEXT data types your DBMS provides. The idea of putting them to the file system for performance reasons does make much sense to me.
Tomalak
@iamrohitbanga: These string really need to be of some significant size, we're talking multiples of 10K before it (if ever) becomes viable to store them in the file system with all the complications that brings.
AnthonyWJones
+1  A: 

The answer really depends on the volume of strings you intend to store, and what DB you intend to use to store it. If you aren't storing many strings, you might want to consider storing them in an XML or resource file, and loading that into your application up front. If you have lots of string data though, you'll probably be better off memorywise reading the string as and when you need it, rather than taking the chance of reading a string into memory that you don't end up using.

Pete OHanlon
+1  A: 

The database itself has no real problem with storing long strings. Some restrictions apply (like the 8k record size limit on SQL Server), but even then you could store text of arbitrary length in a database, because all proper ones support BLOB/TEXT data types with virtually no upper limit.

Five to six sentences is not really long. If they belong together and are meant to be retrieved and manipulated as a whole, you can go ahead and store them in a CHAR data type field of appropriate dimensions.

The question whether to separate them and attach an ID to them arises only if your application/data model benefits directly from this approach, i.e. in reality they are separate things. In your case there seems to be no reason to go that way.

Tomalak
A: 

Except in special cases, I would leave the field where it is.

The only other option would be to put the strings into a different table (putting the actual strings in there)... putting them in separate files will kill your performance.

Jon Seigel
+7  A: 

The strings you mention are not at all long.

When you refered to "long" strings, I was thinking about 32kB and above -- some sentences are <1kb -- that is nothing today.

Your trick, storing an Id makes the things slower since you have to make an indirect access.

The only thing I would recommend, when maximum performance is needed, you should select only those columns that you need (omit SELECT *) -- so omit the text column, when not needed, since the transport of the string from server to application costs the most time. It is a good praxis, not to touch columns not needed (specially when they might contain much data).

Juergen
+1  A: 

Everybody has mentioned performance, but nobody has raised the other major reason why storing pointers to OS files is a bad idea: backup and recovery. If everything is in the database then we have a single mechanism for backing up the data and a single mechanism for recovery. Whereas with files on the OS we have two different backup mechanisms, probably at two different granularities, and recovery becomes a synchronisation nightmare.

There are a few cases where this doesn't apply, such as data warehouses, which have very infrequent transactions and so can survive without redo or transaction logs.

APC