views:

72

answers:

3

This throws an out of memory exception when using 10^8 items but not with 10^7. How would you serialize an array with 10^9 values in it so that it could be stored in a database?

Dim List((10 ^ 9) - 1) As Int64
For i = 1 To (10 ^ 9)
    List(i - 1) = i
Next
Dim Format As New Runtime.Serialization.Formatters.Binary.BinaryFormatter
Dim Writer As New System.IO.MemoryStream
Format.Serialize(Writer, List)

[EDIT]

This is on a 64 bit machine with more memory than one could every ask for. 8GB and can page up to 15GB

+1  A: 

Your example simply stores the index + 1 in each array element; you can get this via a calculation at runtime, there is no need to store or serialize anything.

Even if your example is made-up and you are actually trying to store 10^9 arbitrary integers, you are probably going to going to have many duplicates. In this case you should be using a sparse data structure, not an array.

Addendum: If the values are primary keys and must be unique, you may be better off storing the numbers that are not used rather than the ones which are.

Dour High Arch
This is a test for storing Primary Keys for items returned on a Report. It's stored because it's far easier to retrieve than to reproduce the report.
Middletone
10^9 = 1 billion, 2^31 = 2.147 billion; so it should fit.
Edmund
Are the primary keys Int32s? Then you should use a sparse data structure. If they are not, please post the actual example, not a made-up one.
Dour High Arch
That is the example. the only difference is that I've subbed the PK with i-1 to simulate the same results. They could be Int64 but the concept still applies.
Middletone
+1  A: 

Use System.IO.BinaryWriter instead to do your own serialization - just call Write(int) on it. However, a MemoryStream will not support more than 2^31 values, so you'll need to write it to some other kind of stream. UnmanagedMemoryStream is a possibility or your database client may provide something specifically for storing large binary values. (I don't know what kind of database you're writing to.)

Evgeny
I like the concept. I tried it and it failed at 33554432.
Middletone
+1  A: 

An Int64 is 8 bytes; 1e9 of them is 8GB. In order to serialize the array you must have the 8GB array in memory plus 8GB for the MemoryStream, thus clearly requiring 16GB of memory. It's not clear how you are going to store 8GB to your database, but to complete the immediate task you just need to get more memory, make the numbers smaller (i.e. Int32), or stream to disk instead of memory.

Exactly how do you intend to store 8GB of data in your DB? Most that I know of only allow a single value to be at most 2GB or 4GB.

Gabe
I think that it just dawned on me. It's not that I ran out of memory but that the application decided that 2.5 GB would be required in order to make the array large enough to contain the larger values for all of the previous items. So what do companies do that need to datamine such large sets of information if they can’t even store the results?
Middletone
Generally speaking it is rare to store large datasets in single arrays. In a database, for example, you could store integers into a table with 1e9 rows and then ask the database to find you whatever information you needed.
Gabe