I am currently using MS SQL Server 2008 but I'm not sure it it is the best system for this particular task.
I have a single table like so:
PK_ptA PK_ptB DateInserted LookupColA LookupColB ... LookupColF DataCol (ntext)
A common query is
SELECT TOP(1000000) DataCol FROM table
WHERE LookupColA=x AND LookupColD=y AND LookupColE=z
ORDER BY DateInserted DESC
The table has about a billion rows with 5 million inserted per day.
My main problem with SQL Server is that it isn't too easy to shard or spread out the datafiles. Also, exporting seems to max out at 1000rows per second (about 1MB/s) which seems very slow.
Another problem I have is, with SQL Server, if I want to add a new LookupCol the log file grows enormously requiring a large amount of rarely used free space on tap.
Are there any obvious better solutions for this problem?