I am currently using MS SQL Server 2008 but I'm not sure if it is the best system for this particular task.

I have a single table like so:

PK_ptA PK_ptB DateInserted LookupColA LookupColB ... LookupColF DataCol (ntext)
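(For clarity, here is a sketch of that table as DDL. All column types other than DataCol, which the question states is ntext, are assumptions, and the table name is made up:)

```sql
-- Hypothetical reconstruction of the table described above.
-- Types and the table name are assumed; only DataCol (ntext) is stated.
CREATE TABLE BigTable (
    PK_ptA       INT      NOT NULL,
    PK_ptB       INT      NOT NULL,
    DateInserted DATETIME NOT NULL,
    LookupColA   INT      NOT NULL,
    LookupColB   INT      NOT NULL,
    -- ... LookupColC through LookupColE ...
    LookupColF   INT      NOT NULL,
    DataCol      NTEXT    NULL,
    CONSTRAINT PK_BigTable PRIMARY KEY (PK_ptA, PK_ptB)
);
```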

A common query is

SELECT TOP(1000000) DataCol FROM table 
WHERE LookupColA=x AND LookupColD=y AND LookupColE=z
ORDER BY DateInserted DESC 
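(For reference, this is the kind of composite index that would be expected to serve that query; whether anything like it exists on the real table is not stated in the question:)

```sql
-- Assumed supporting index, matching the WHERE columns and the ORDER BY.
CREATE INDEX IX_BigTable_ADE_Date
ON BigTable (LookupColA, LookupColD, LookupColE, DateInserted DESC);
-- Note: DataCol is ntext, which SQL Server 2008 does not allow as an
-- INCLUDEd column, so fetching it still requires a lookup into the base table.
```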

The table has about a billion rows with 5 million inserted per day.

My main problem with SQL Server is that it isn't easy to shard or spread out the data files. Also, exporting seems to max out at 1000 rows per second (about 1 MB/s), which seems very slow.

Another problem I have is that, with SQL Server, if I want to add a new LookupCol the log file grows enormously, requiring a large amount of rarely used free space to be kept available.

Are there any obvious better solutions for this problem?

+3  A: 

You have a problem, and it is not SQL Server. Let me also set aside, for now, that you seem to have a bad table design.

  • Spreading data files is actually pretty easy. Reorganizing them later is not as easy, but still doable. What are your table, filegroup and file layouts?
  • Exporting 1 MB per second is a joke. Seriously. I have handled 150-million-row files in minutes; that works out to a LOT more than 60,000 rows per minute. Something is freaking out. Temp space? Did you do a performance analysis? What does the hardware look like?
  • Nothing will help with the log usage. Like most professional databases, the log contains all database pages changed during a transaction. Adding a field changes ALL pages.
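(Spreading data files, per the first point above, can be sketched like this; the database name, filegroup name and path are made up:)

```sql
-- Hypothetical example: add a filegroup backed by a file on separate storage,
-- then place new or rebuilt objects (or partitions) on it.
ALTER DATABASE MyDb ADD FILEGROUP FG2;

ALTER DATABASE MyDb ADD FILE
    (NAME = MyDb_Data2, FILENAME = 'E:\Data\MyDb_Data2.ndf', SIZE = 10GB)
TO FILEGROUP FG2;

-- Tables, indexes, or partition schemes can then be created ON FG2,
-- spreading IO across spindles.
```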

You should:

  • Redesign the database (use a view to keep the same old table shape in place if you have to) so that it does not have "LookupColA" etc., but is normalized (a LookupValue, and a LookupTable that is keyed by "column"). This way you get additional fields instantly. This turns into a data-warehouse-like star schema.
  • Do a performance analysis. It looks like you have some problems.
  • Definitely tell us about your hardware ;)

The problem here is definitely NOT SQL Server; it is bad table design AND, possibly, insufficient or badly utilized hardware.

TomTom
Thanks TomTom. Server-wise I'm using a DELL MD3000 with 13 x 15kRPM 300GB drives in RAID 5 and a T710, 2.66GHz X5550 with 16GB RAM, Windows 2003 x64. The whole table's data is stored in the RAID array; I haven't split the filegroup. I don't quite follow your table design suggestion. Could you expand on it a bit please?
Mr. Flibble
The RAID 5 is going to be slow during updates; the writes may kill your read performance. For the table suggestion, see my separate answer ;) If you did not tune your tables, you possibly lose a lot of performance due to non-aligned partitions on top of it (up to 40% of IO performance wasted). I think you are running into an IO issue, which is partially forced on you (as in: bad install, bad RAID layout).
TomTom
A: 

OK, the table design (separate answer). The LookupCols are basically lookup tables.

So....

  • LookupTable
      • pk (int)
      • TableType
      • Value fields

  • ValueTable
      • pk

  • ValueLookupMap table
      • pk of ValueTable entry
      • pk of LookupTable entry

So, basically, if you add a lookup "field", you just create a set of entries in the LookupTable and then add entries in the ValueLookupMap.
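That design can be sketched in T-SQL like this (all names and types are illustrative, not from the answer):

```sql
-- Lookup definitions: one row per (logical "column", value) pair.
CREATE TABLE LookupTable (
    pk        INT IDENTITY PRIMARY KEY,
    TableType VARCHAR(50)  NOT NULL,  -- which logical "column" this value belongs to
    Value     VARCHAR(255) NOT NULL
);

-- The actual data rows.
CREATE TABLE ValueTable (
    pk           BIGINT IDENTITY PRIMARY KEY,
    DateInserted DATETIME NOT NULL,
    DataCol      NVARCHAR(MAX) NULL
);

-- Many-to-many map: which lookup values apply to which data row.
CREATE TABLE ValueLookupMap (
    ValuePk  BIGINT NOT NULL REFERENCES ValueTable (pk),
    LookupPk INT    NOT NULL REFERENCES LookupTable (pk),
    PRIMARY KEY (ValuePk, LookupPk)
);
```

With this shape, adding a new lookup "field" is just INSERTs into LookupTable and ValueLookupMap; no ALTER TABLE on the big table, and no huge log growth.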

TomTom