I'm storing localized strings in a single data table in MS SQL Server 2008. Most of the strings are short and can be represented with nvarchar(200), while about 10% are much longer and require something like nvarchar(5000). My question is: is there a performance advantage when retrieving the shorter strings if I break this into two tables, like this:

CREATE TABLE ShortTextTable(ID bigint IDENTITY(1,1) NOT NULL, TextValue nvarchar(200))
CREATE TABLE LongTextTable(ID bigint IDENTITY(1,1) NOT NULL, TextValue nvarchar(5000))

Versus:

CREATE TABLE TextTable(ID bigint IDENTITY(1,1) NOT NULL, TextValue nvarchar(5000))

This data will rarely be updated, I'm only really concerned about reads.

+2  A: 

It depends. Could be premature optimization.

With smaller columns you will fit more rows per page, obviously, but your usage patterns may mean that the horizontal partition you are proposing is not very efficient, since reads would be pulling from both new tables. We'd need to see the read usage pattern and how the table is joined to other tables.

Also, it partitions a space that is logically one space, so it will no longer be manageable as one space (e.g. every index has to be added in both places).

You'd really have to see a bottleneck and profile the proposed change before partitioning it like this.

I'm not sure, but it might be possible to literally partition (using SQL Server's partitioned tables feature) the table based on length of the column. Again, whether this would help would need to be profiled.
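If you wanted to experiment with that idea, a minimal sketch might look like the following, using a persisted computed column on the string length as the partitioning key (pfTextLength, psTextLength and TextLen are invented names, and nvarchar(4000) is used because 5000 isn't a valid length, as noted in another answer below):

-- Hypothetical sketch only: partition one table on string length.
CREATE PARTITION FUNCTION pfTextLength (int)
    AS RANGE LEFT FOR VALUES (200);           -- partition 1: <= 200 chars
CREATE PARTITION SCHEME psTextLength
    AS PARTITION pfTextLength ALL TO ([PRIMARY]);

CREATE TABLE TextTable
(
    ID        bigint IDENTITY(1,1) NOT NULL,
    TextValue nvarchar(4000),
    -- ISNULL keeps the computed partitioning column deterministic and non-null
    TextLen   AS ISNULL(LEN(TextValue), 0) PERSISTED
) ON psTextLength (TextLen);

Bear in mind that table partitioning was an Enterprise Edition feature in SQL Server 2008, which alone may rule this out.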

Cade Roux
Good point about the premature optimization. I'll be sticking with the single table design for this reason.
Paul
Another issue is that with 2 tables you will no longer be able to easily create a single foreign-key constraint.
Cade Roux
+2  A: 

No, no real gain. For the interleaving of short and long strings to become a bottleneck, especially with lookups based on an int PK, you'd have to be in a really extreme case.
On the other hand, the mess of working with such a storage schema is very clear and present: you have to decide which table to look in based on the length of a string you haven't retrieved yet! You'll probably end up looking by trial and error (try one table, then the other), which is far more wasteful than any nvarchar storage-structure issue.
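A sketch of what that trial-and-error read looks like against the two tables from the question (@ID is an assumed lookup parameter; note that the separate IDENTITY columns don't even guarantee the ID is unique across the pair):

-- Probe both tables, because the length (and thus the table) is unknown
-- until after the read:
SELECT TextValue FROM ShortTextTable WHERE ID = @ID
UNION ALL
SELECT TextValue FROM LongTextTable WHERE ID = @ID;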

Remus Rusanu
+1  A: 

In SQL 2005 (and, I believe, 2008) you cannot create an nvarchar(5000) column, since such a data type would exceed the page size; nvarchar(max) would work at that point. When specifying a number N for nvarchar, the limit is 4000.
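For illustration, a sketch (the exact error text may vary by version):

-- Fails: nvarchar(N) only allows N up to 4000.
CREATE TABLE TextTable (ID bigint IDENTITY(1,1) NOT NULL, TextValue nvarchar(5000));
-- Error along the lines of: the size (5000) given to the column 'TextValue'
-- exceeds the maximum allowed for any data type (8000).

-- Works: use MAX for anything beyond 4000 characters.
CREATE TABLE TextTable (ID bigint IDENTITY(1,1) NOT NULL, TextValue nvarchar(max));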

I believe that at that point there will be a performance difference between reading a value stored inline on the page versus reading the page to get the 16-byte pointer to the LOB pages and then reading the data from there.

Andrew
I like this answer best, but it's a little vague.
djangofan
Whilst it can be theorised multiple ways, the proof should be checked by instrumenting and measuring the results of multiple tests. It's going to be slower to read items from the LOB space than items stored directly on the page with the record. Whether avoiding that cost is worth the overhead of deciding where to read the data from? Unknown.
Andrew
With nvarchar(max) you deal with LOB storage only if the size of the data in the row exceeds 8,060 bytes; anything smaller than that is stored in-row like a regular nvarchar. There's no difference in physical space used between nvarchar(200) and nvarchar(4000) if you store up to 200 chars.
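A quick way to see that for yourself, using DATALENGTH, which returns the bytes actually used (inline DECLARE initializers assume SQL Server 2008):

-- Same string, same storage: 5 characters = 10 bytes in both declarations.
DECLARE @short nvarchar(200)  = N'hello';
DECLARE @long  nvarchar(4000) = N'hello';
SELECT DATALENGTH(@short) AS short_bytes,   -- 10
       DATALENGTH(@long)  AS long_bytes;    -- 10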
Niikola
+1  A: 

No gain, or even a negative one:

Storage-wise: a variable-length string is stored as the character data plus 2 bytes for the length. So the data takes the same space either way, but you'd have the index and key overhead of the second table.

Processing-wise:

  • deciding which table to add it to (see the sketch below)
  • correcting a typo can push a string across the length boundary, leaving it in the wrong table (ignoring forward ptrs etc.)
  • dealing with key uniqueness across the 2 tables (if they have a common parent)
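A sketch of that routing logic, using the table names from the question (@Text is an assumed parameter):

-- Every write has to route on length; an edit that crosses the 200-char
-- boundary becomes a DELETE from one table plus an INSERT into the other.
IF LEN(@Text) <= 200
    INSERT INTO ShortTextTable (TextValue) VALUES (@Text);
ELSE
    INSERT INTO LongTextTable (TextValue) VALUES (@Text);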

Now, more importantly: I saw you mentioned localisation, but do you actually need nvarchar? See this other SO question: varchar vs nvarchar performance

gbn