ansaurus

Question

Database that can handle >500 millions rows

Answer 1

+11 A:

MSSQL can handle that many rows just fine. The query time is completely dependent on a lot more factors than just simple row count.

For example, it's going to depend on:

how many joins those queries do
how well your indexes are set up
how much ram is in the machine
speed and number of processors
type and spindle speed of hard drives
size of the row/amount of data returned in the query
Network interface speed / latency

It's very easy to have a small (less than 10,000 rows) table which would take a couple minutes to execute a query against. For example, using lots of joins, functions in the where clause, and zero indexes on a Atom processor with 512MB of total ram. ;)

It takes a bit more work to make sure all of your indexes and foreign key relationships are good, that your queries are optimized to eliminate needless function calls and only return the data you actually need. Also, you'll need fast hardware.

It all boils down to how much money you want to spend, the quality of the dev team, and the size of the data rows you are dealing with.

UPDATE Updating due to changes in the question.

The amount of information here is still not enough to give a real world answer. You are going to just have to test it and adjust your database design and hardware as necessary.

For example, I could very easily have 1 billion rows in a table on a machine with those specs and run a "select top(1) id from tableA (nolock)" query and get an answer in milliseconds. By the same token, you can execute a "select * from tablea" query and it take a while because although the query executed quickly, transferring all of that data across the wire takes awhile.

Point is, you have to test. Which means, setting up the server, creating some of your tables, and populating them. Then you have to go through performance tuning to get your queries and indexes right. As part of the performance tuning you're going to uncover not only how the queries need to be restructured but also exactly what parts of the machine might need to be replaced (ie: disk, more ram, cpu, etc) based on the lock and wait types.

I'd highly recommend you hire (or contract) one or two DBAs to do this for you.

Chris Lively 2010-09-23 13:58:21

would six-inch processors help? :-)

Javier 2010-09-23 14:00:45

@Javier - I thought the processor size was in pounds (lbs) :)

Mark Schultheiss 2010-09-23 14:03:34

@Javier: Ha! I'll update that statement. :)

Chris Lively 2010-09-23 14:04:14

Answer 2

+3 A:

Most databases can handle this, it's about what you are going to do with this data and how you do it. Lots of RAM will help.

I would start with PostgreSQL, it's for free and has no limits on RAM (unlike SQL Server Express) and no potential problems with licences (too many processors, etc.). But it's also my work :)

Frank Heikens 2010-09-23 13:58:59

I think you meant unlike SQL Server Express...

Joe R 2010-09-23 14:05:26

I will try Postgresql and see how it works.

Skarab 2010-09-23 15:00:40

Answer 3

+1 A:

Pretty much every non-stupid database can handle a billion rows today easily. 500 million is doable even on 32 bit systems (albeit 64 bit really helps).

The main problem is:

You need to have enough RAM. How much is enough depends on your queries.
You need to have a good enough disc subsystem. This pretty much means if you want to do large selects, then a single platter for everything is totally out of the question. Many spindles (or a SSD) are needed to handle the IO load.

Both Postgres as well as Mysql can easily handle 500 million rows. On proper hardware.

TomTom 2010-09-23 14:00:04

From my own experience, Mysql has often problems with indexing attributes when a table is very long.

Skarab 2010-09-23 14:04:34

MySQL might get into trouble on it's queryplans, it's pretty limited. But it depends on your type of queries, as usual.

Frank Heikens 2010-09-23 14:04:45

Answer 4

A:

Have you checked out Cassandra? http://cassandra.apache.org/

adamzwakk 2010-09-23 14:00:22

What problem would that solve? Every DB can handle this amount of records.

Frank Heikens 2010-09-23 14:02:05

You get to be a cool kid, like John Quinn.

MkV 2010-09-23 14:17:13

Answer 5

A:

As mentioned pretty much all DB's today can handle this situation - what you want to concentrate on is your disk i/o subsystem. You need to configure a RAID 0 or RAID 0+1 situation throwing as many spindles to the problem as you can. Also, divide up your Log/Temp/Data logical drives for performance.

For example, let say you have 12 drives - in your RAID controller I'd create 3 RAID 0 partitions of 4 drives each. In Windows (let's say) format each group as a logical drive (G,H,I) - now when configuring SQLServer (let's say) assign the tempdb to G, the Log files to H and the data files to I.

bigtang 2010-09-23 16:09:15

Answer 6

A:

I don't have much input on which is the best system to use, but perhaps this tip could help you get some of the speed you're looking for.

If you're going to be doing exact matches of long varchar strings, especially ones that are longer than allowed for an index, you can do a sort of pre-calculated hash:

CREATE TABLE BigStrings (
   BigStringID int identity(1,1) NOT NULL PRIMARY KEY CLUSTERED,
   Value varchar(6000) NOT NULL,
   Chk AS (CHECKSUM(Value))
);
CREATE NONCLUSTERED INDEX IX_BigStrings_Chk ON BigStrings(Chk);

--Load 500 million rows in BigStrings

DECLARE @S varchar(6000);
SET @S = '6000-character-long string here';

-- nasty, slow table scan:
SELECT * FROM BigStrings WHERE Value = @S

-- super fast nonclustered seek followed by very fast clustered index range seek:
SELECT * FROM BigStrings WHERE Value = @S AND Chk = CHECKSUM(@S)

This won't help you if you aren't doing exact matches, but in that case you might look into full-text indexing. This will really change the speed of lookups on a 500-million-row table.

Emtucifor 2010-09-24 00:51:13

Answer 7

+1 A:

michaelok 2010-09-24 01:34:48

ansaurus

tags:

views:

answers:

Database that can handle >500 millions rows

related questions