We know the MS Access database engine is 'throttled' to allow a maximum file size of 2GB (or perhaps internally wired to be limited to fewer than some power-of-two number of 4KB data pages; 2GB / 4KB = 524,288 pages). But what does this mean in practical terms?

To help me measure this, can you tell me the maximum number of rows that can be inserted into a MS Access database engine table?

To satisfy the definition of a table, all rows must be unique; therefore a uniqueness constraint (e.g. PRIMARY KEY, UNIQUE, CHECK or Data Macro) is a requirement.

EDIT: I realize there is a theoretical limit but what I am interested in is the practical (and not necessarily practicable), real life limit.

A: 

It all depends. Theoretically, using a single column with a 4-byte data type, you could store 300 000 rows. But there is probably a lot of overhead in the database even before you do anything. I read somewhere that you could have 1,000,000 rows but again, it all depends..

You can also link databases together. Limiting yourself to only disk space.

Tommy
The math of 4-byte records and 300K rows does not add up at all. I've had databases in long-term production use with over 300K rows in 3 tables (and one table with twice that) and the records were a lot more complicated than a single 4-byte field. The data file was substantially less than 500MBs in size (more like half that).
David-W-Fenton
Yes, you're right. I'm missing some zeroes: (2 gigabytes) / (4 bytes) = 536 870 912
Tommy
A: 

We're not necessarily talking theoretical limits here; we're talking about the real-world limits of the 2GB max file size AND your database schema.

  • Is your db a single table or multiple?
  • How many columns does each table have?
  • What are the datatypes?

The schema is on an even footing with the raw file-size limit in determining how many rows you can have.

We have used Access MDBs to store exports of MS-SQL data for statistical analysis by some of our corporate users. In those cases we've exported our core table structure, typically four tables with 20 to 150 columns, varying from a hundred bytes per row to upwards of 8,000 bytes per row. In these cases, we would bump up against the limit at a few hundred thousand rows of data PER MDB that we would ship to them.
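
As a rough back-of-envelope check, using the row sizes mentioned above and ignoring page slack and index overhead:

(2 GB) / (8,000 bytes per row) ≈ 268,000 rows
(2 GB) / (100 bytes per row) ≈ 21,000,000 rows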

So, I just don't think that this question has an answer in absence of your schema.

David Walker
I don't have a schema. The challenge is for you to come up with a schema that will yield the table with the most rows possible (hint: I'm thinking a sole single-column table).
onedaywhen
Row-level locking also makes a difference, as it forces a single data page for each record.
David-W-Fenton
Hey onedaywhen, you've got me curious. If you're willing to settle for one single-column table, where that column is an autonumber primary key, your database could contain millions of rows before hitting the 2GB file size limit. (The furthest I ever pushed it was 7-9 million rows.) But you emphasized "practical". How practical would that database be?
HansUp
Yes, the answer will be in the millions and will take a long time to achieve. I would be surprised if no one has done this before :)
onedaywhen
It is confusing, I know (see my edits to the question!), but I mean 'practical' meaning 'can be done' and not 'practicable' meaning 'worth doing'. However, I think it is worth knowing the practical limit because I suspect it would give some indication that the practicable limit exceeds the theoretical limit. Any clearer? :)
onedaywhen
A: 

It's been some years since I last worked with Access, but larger database files always used to have more problems and be more prone to corruption than smaller files.

Unless the database file is only being accessed by one person or stored on a robust network, you may find this is a problem before the 2GB database size limit is reached.

Richard
-1 Using another SQL product will not help me find the practical limit of an Access database engine table.
onedaywhen
I didn't realise I was advocating the use of another SQL product in my comment? I was pointing out that an important factor in determining the practical limit (whatever that might mean) of the size of an Access database is a network's ability to both serve a large file and serve this large file to multiple users concurrently.
Richard
A properly-designed front-end application won't be retrieving any more data for a Jet/ACE back end with 1 million records than for one with 1000 records because Jet/ACE retrieves the index pages for the requested data set, then only the data pages storing those records. This will be exactly the same amount of data in either context. It's only if you're doing full table scans (because you're an idiot) that your scenario would be relevant.
David-W-Fenton
A: 

As others have stated it's combination of your schema and the number of indexes.

A friend had about 100,000,000 historical stock prices, daily closing quotes, in an MDB that approached the 2GB limit.

He pulled them down using some code found in a Microsoft Knowledge Base article. I was rather surprised that whatever server he was using didn't cut him off after the first 100K records.

He could view any record in under a second.

Tony Toews
I would say that to get maximum space, you'd also want to create your database in code, so it's pure Jet, without any of the unnecessary Access-specific features.
David-W-Fenton
@Tony Toews: this question is about practical limits. No anecdotes, thanks.
onedaywhen
...but a link to that Microsoft Knowledge base article could get you some rep :)
onedaywhen
+3  A: 

Some comments:

  1. Jet/ACE files are organized in data pages, which means there is a certain amount of slack space when your record boundaries are not aligned with your data pages.

  2. Row-level locking will greatly reduce the number of possible records, since it forces one record per data page (at most 2GB / 4KB = 524,288 records).

  3. In Jet 4, the data page size was increased to 4KBs (from 2KBs in Jet 3.x). Jet 4 was also the first Jet version to support Unicode, so text is stored at two bytes per character: a 2GB file holds only about 1,000,000,000 characters of text data, or roughly twice that with Unicode compression turned on. So, the number of records is going to be affected by whether or not you have Unicode compression on.

  4. Since we don't know how much room in a Jet/ACE file is taken up by headers and other metadata, nor precisely how much room index storage takes, the theoretical calculation is always going to be under what is practical.

  5. To get the most efficient possible storage, you'd want to use code to create your database rather than the Access UI, because Access creates certain properties that pure Jet does not need (a rough sketch follows this list). Not that there are a lot of these: properties left at the Access defaults are usually not set at all -- a property is created only when you change it from its default value, which you can see by cycling through a field's Properties collection, where many of the properties listed in the Access table designer simply don't appear because they have never been set. You might also want to limit yourself to Jet-specific data types (hyperlink fields, for instance, are Access-only).
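
A minimal sketch of points 3 and 5 in VBA/DAO (the actual code is not shown in this answer; it assumes a DAO reference, and the file path, table and field names are invented for illustration):

Sub CreatePureJetDb()
    ' Create the file through DAO alone, bypassing the Access UI, so no
    ' Access-specific properties are added to the database or its objects.
    Dim db As DAO.Database
    Dim tdf As DAO.TableDef
    Dim fld As DAO.Field

    Set db = DBEngine.CreateDatabase("C:\temp\pure_jet.mdb", dbLangGeneral)

    Set tdf = db.CreateTableDef("t")
    tdf.Fields.Append tdf.CreateField("id", dbLong)
    tdf.Fields.Append tdf.CreateField("txt", dbText, 50)
    db.TableDefs.Append tdf

    ' Fields created in code default to Unicode compression OFF (the Access
    ' table designer defaults it to ON), so turn it on explicitly.
    Set fld = db.TableDefs("t").Fields("txt")
    fld.Properties.Append fld.CreateProperty("UnicodeCompression", dbBoolean, True)

    db.Execute "ALTER TABLE t ADD CONSTRAINT pk PRIMARY KEY (id)", dbFailOnError
    db.Close
End Sub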

I just wasted an hour mucking around with this using Rnd() to populate 4 fields defined as type byte, with composite PK on the four fields, and it took forever to append enough records to get up to any significant portion of 2GBs. At over 2 million records, the file was under 80MBs. I finally quit after reaching 7 million records, and the file compacted to 184MBs. The amount of time it would take to get up near 2GBs is just more than I'm willing to invest!
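
A rough reconstruction of that test in VBA/DAO might look like this (the actual code was not posted; the table and field names are invented):

Sub FillRandomBytes()
    ' Four Byte fields with a composite primary key; duplicate rows generated
    ' by Rnd() are rejected by the key, which is what slows the later batches.
    Dim db As DAO.Database
    Dim i As Long

    Set db = CurrentDb
    db.Execute "CREATE TABLE t4 (a BYTE NOT NULL, b BYTE NOT NULL, " & _
               "c BYTE NOT NULL, d BYTE NOT NULL, " & _
               "CONSTRAINT pk PRIMARY KEY (a, b, c, d))", dbFailOnError

    Randomize
    On Error Resume Next   ' let duplicate-key failures pass silently
    For i = 1 To 2000000   ' one batch; run repeatedly to grow the file
        db.Execute "INSERT INTO t4 (a, b, c, d) VALUES (" & _
                   Int(Rnd * 256) & "," & Int(Rnd * 256) & "," & _
                   Int(Rnd * 256) & "," & Int(Rnd * 256) & ")", dbFailOnError
    Next i
    On Error GoTo 0
End Sub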

David-W-Fenton
The length of time to get the test MDB up is probably because of the time spent indexing the composite key. I wonder if it would've been a lot faster just using an autonumber key.
Tony Toews
+1 for a partial answer in the spirit of the question :)
onedaywhen
Actually, I'm pretty sure the reason the last batch took so long was because there were 9000+ dupes to be discarded (appending 2 million records). What I did was stop using Rnd() and used the table as source, just swapping the columns. This did, of course, result in some duplicates, and I'm certain that's what caused it to take over an hour to append those records.
David-W-Fenton
BTW, I'm right now copying this over to an older PC to finish appending up to 2GBs to see how many records it comes out with. That way I can let it run without it bogging down the machine I'm working on (and it was pretty doggy yesterday when I was running it).
David-W-Fenton
Just an update: I'm now up to over 16 million records, and the file is just over 420MBs. I'm starting one more round of appending random data. The last batch had 2 million dupes discarded, so I expect more discards on this one.
David-W-Fenton
I'm stopping for the night. The database now has 21.7 million records (11 million of the last batch were dupes and rejected), and after a compact is just over 600MBs. My method of generating unique records is obviously less effective than it was before, so I think I'll stop now.
David-W-Fenton
+1  A: 

Here's my attempt:

I created a single-column (INTEGER) table with no key:

CREATE TABLE a (a INTEGER NOT NULL);

I inserted integers in sequence, starting at 1.
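
The insert loop itself is not shown here; a minimal VBA/DAO version might look like this (run from inside the database, with the upper bound chosen arbitrarily):

Sub FillSequential()
    ' Append integers 1, 2, 3, ... via a table-type recordset, which is much
    ' faster than issuing one INSERT statement per row.
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim i As Long

    Set db = CurrentDb
    Set rs = db.OpenRecordset("a", dbOpenTable)
    For i = 1 To 70000000   ' stop whenever you've had enough
        rs.AddNew
        rs!a = i
        rs.Update
    Next i
    rs.Close
End Sub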

I stopped it (arbitrarily after many hours) when it had inserted 65,632,875 rows. The file size was 1,029,772 KB.

I compacted the file, which reduced it very slightly to 1,029,704 KB.

I added a PK:

ALTER TABLE a ADD CONSTRAINT p PRIMARY KEY (a);

which increased the file size to 1,467,708 KB.

This suggests the maximum is somewhere around the 80 million mark.

onedaywhen
Thanks for showing your cards, ODW.
HansUp