tags:
views: 1168
answers: 5

I'm doing some Project Euler exercises and I've run into a scenario where I want arrays which are larger than 2,147,483,647 (the upper limit of int in C#).

Sure, these are large arrays, but for instance, I can't do this:

// fails
bool[] BigArray = new bool[2147483648];

// also fails, cannot convert uint to int
ArrayList BigArrayList = new ArrayList(2147483648);

So, can I have bigger arrays?

EDIT: It was for a Sieve of Atkin, you know, so I just wanted a really big one :D

+11  A: 

Anytime you are working with an array this big, you should probably try to find a better solution to the problem. But that being said, I'll still attempt to answer your question.

As mentioned in this article, there is a 2 GB limit on any object in .NET, for all of x86, x64 and IA64:

As with 32-bit Windows operating systems, there is a 2GB limit on the size of an object you can create while running a 64-bit managed application on a 64-bit Windows operating system.

Also, if you define an array too big on the stack, you will have a stack overflow. If you define the array on the heap, it will try to allocate it all in one big contiguous block. It would be better to use an ArrayList, which has implicit dynamic allocation on the heap. This will not allow you to get past the 2GB limit, but it will probably allow you to get closer to it.

I think the size limit will only be bigger if you are using an x64 or IA64 architecture and operating system. Using x64 or IA64, you will have 64-bit allocatable memory instead of 32-bit.

If you are not able to allocate the array list all at once, you can probably allocate it in parts.

Using an ArrayList and adding one object at a time on an x64 Windows 2008 machine with 6GB of RAM, the largest size I can get the ArrayList to is 134,217,728. So I really think you have to find a better solution to your problem that does not use as much memory. Perhaps write to a file instead of keeping everything in RAM.
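
As a rough sketch of the "allocate it in parts" idea, you could hide a list of fixed-size bool[] chunks behind a long indexer, so that no single object comes anywhere near the 2GB limit. ChunkedBoolArray and the chunk size here are just made up for illustration:

using System;
using System.Collections.Generic;

class ChunkedBoolArray
{
   // Arbitrary chunk size: ~16M bools per chunk, so roughly 16 MB per object.
   private const int ChunkSize = 1 << 24;
   private readonly List<bool[]> _chunks = new List<bool[]>();

   public ChunkedBoolArray(long length)
   {
      long remaining = length;
      while (remaining > 0)
      {
         int size = (int)Math.Min(remaining, ChunkSize);
         _chunks.Add(new bool[size]);
         remaining -= size;
      }
   }

   // Index with a long; map it to (chunk, offset within chunk).
   public bool this[long index]
   {
      get { return _chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)]; }
      set { _chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)] = value; }
   }
}

// Usage: ChunkedBoolArray sieve = new ChunkedBoolArray(2147483648L);
//        sieve[2147483647L] = true;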

Brian R. Bondy
but I can't do this either: ArrayList BigArrayList = new ArrayList(2147483648);
DrG
"stack overflow": I understand the array being on the stack if it's a local variable, but are you saying that the **contents** of an array are allocated on the stack as well (instead of on the heap)?
ChrisW
I agree. This would be a heap limitation, not stack.
recursive
I've allocated 2.5 GB in one block on Linux x86.
Joshua
ChrisW, I agree, I clarified.
Brian R. Bondy
Joshua: Linux is not Windows, and it is not .NET. ;) This answer only says that 1) 32-bit Windows imposes a 2GB limit on 32-bit processes by default, and 2) even in 64-bit, .NET imposes a 2GB limit on any single object. Linux doesn't limit 32-bit processes to 2GB.
jalf
+4  A: 

The array limit is, afaik, fixed as int32 even on 64-bit. There is a cap on the maximum size of a single object. However, you could have a nice big jagged array quite easily.

Worse: because references are larger in x64, for reference-type arrays you actually get fewer elements in a single array.

See here:

I’ve received a number of queries as to why the 64-bit version of the 2.0 .Net runtime still has array maximum sizes limited to 2GB. Given that it seems to be a hot topic of late I figured a little background and a discussion of the options to get around this limitation was in order.

First some background; in the 2.0 version of the .Net runtime (CLR) we made a conscious design decision to keep the maximum object size allowed in the GC Heap at 2GB, even on the 64-bit version of the runtime. This is the same as the current 1.1 implementation of the 32-bit CLR, however you would be hard pressed to actually manage to allocate a 2GB object on the 32-bit CLR because the virtual address space is simply too fragmented to realistically find a 2GB hole. Generally people aren’t particularly concerned with creating types that would be >2GB when instantiated (or anywhere close), however since arrays are just a special kind of managed type which are created within the managed heap they also suffer from this limitation.
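
To illustrate the jagged-array idea, a sketch (inside some method; the row size and total here are arbitrary) could look like this:

// Many smaller rows instead of one huge array, so no single object
// comes near the 2GB cap even though the total exceeds int.MaxValue.
const int RowSize = 1 << 20;                  // 1,048,576 entries per row
long total = 3000000000L;                     // about 3 billion elements (~3 GB of bools)
int rows = (int)((total + RowSize - 1) / RowSize);

bool[][] sieve = new bool[rows][];
for (int r = 0; r < rows; r++) {
   sieve[r] = new bool[RowSize];
}

// Element i lives at sieve[i / RowSize][i % RowSize]:
long i = 2500000000L;
sieve[(int)(i / RowSize)][(int)(i % RowSize)] = true;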

Marc Gravell
Very interesting. If this is true, I wonder what the justification is for the existence of the Array.LongLength property.
DrJokepu
It is presumably needed to get elements between 1GB and 2GB (assuming byte[]) since int is signed, and they didn't want to use uint due to CLS compliance.
Marc Gravell
+2  A: 

I believe that even within a 64-bit CLR, there's a limit of 2GB (or possibly 1GB - I can't remember exactly) per object. That would prevent you from creating a larger array. The fact that Array.CreateInstance only takes Int32 arguments for sizes is suggestive too.

On a broader note, I suspect that if you need arrays that large you should really change how you're approaching the problem.

Jon Skeet
nice, I was hoping I'd get a response from you :D
DrG
In one question you need to get primes up to 50 billion, but the effective way is to use the Sieve of Eratosthenes, which forces you to declare an array with such an index: http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
Canavar
I would argue that at that point it *isn't* an effective way.
Jon Skeet
+5  A: 

You don't need an array that large at all.

When your method runs into resource problems, don't just look at how to expand the resources, look at the method also. :)

Here's a class that uses a 3 MB buffer to calculate primes using the sieve of Eratosthenes. The class keeps track of how far you have calculated primes, and when the range needs to be expanded it creates a buffer to test another 3 million numbers.

It keeps the found prime numbers in a list, and when the range is expanded, the previously found primes are used to rule out numbers in the buffer.

I did some testing, and a buffer around 3 MB is most efficient.

using System;
using System.Collections.Generic;

public class Primes {

   // Each call to Expand() sieves another block of this many numbers.
   private const int _blockSize = 3000000;

   // All primes found so far, in ascending order.
   private List<long> _primes;
   // The first number that has not been sieved yet.
   private long _next;

   public Primes() {
      _primes = new List<long>() { 2, 3, 5, 7, 11, 13, 17, 19 };
      _next = 23;
   }

   // Sieves the numbers from _next to _next + _blockSize - 1 and
   // appends the primes found to _primes.
   private void Expand() {
      bool[] sieve = new bool[_blockSize];
      // Mark multiples of the primes that are already known.
      foreach (long prime in _primes) {
         for (long i = ((_next + prime - 1L) / prime) * prime - _next;
            i < _blockSize; i += prime) {
            sieve[i] = true;
         }
      }
      // Unmarked offsets are new primes; mark their multiples within the block.
      for (int i = 0; i < _blockSize; i++) {
         if (!sieve[i]) {
            _primes.Add(_next);
            for (long j = i + _next; j < _blockSize; j += _next) {
               sieve[j] = true;
            }
         }
         _next++;
      }
   }

   // Gets the prime at the given index (0 => 2, 1 => 3, ...),
   // expanding the sieve as needed.
   public long this[int index] {
      get {
         if (index < 0) throw new IndexOutOfRangeException();
         while (index >= _primes.Count) {
            Expand();
         }
         return _primes[index];
      }
   }

   // Determines whether a number is prime, expanding the sieve
   // until it covers the number.
   public bool IsPrime(long number) {
      while (_primes[_primes.Count - 1] < number) {
         Expand();
      }
      return _primes.BinarySearch(number) >= 0;
   }

}
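
For example, usage could look like this (a small, hypothetical driver, not part of the class above):

using System;

class Program {
   static void Main() {
      Primes primes = new Primes();

      // The indexer expands the sieve on demand; primes[99] is the 100th prime.
      Console.WriteLine(primes[99]);             // 541

      // IsPrime expands until the sieve covers the number, then binary-searches.
      Console.WriteLine(primes.IsPrime(999983)); // True
   }
}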
Guffa
Efficiency-wise, I think it would be more efficient to have your block size aligned to some power of 2 (e.g. 3 MB == 3*1024*1024), because it would make memory management a little easier for the OS (e.g. because your memory is divided evenly into pages).
Hosam Aly
Would it not be more efficient to use bit sets instead of boolean arrays? It could save much space at the very least.
Hosam Aly
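
For reference, the bit-set idea could be sketched with System.Collections.BitArray, which packs eight flags per byte (the numbers are just for illustration, and the statements are assumed to run inside a method):

using System.Collections;

// A 3,000,000-entry BitArray takes roughly 375 KB, versus ~3 MB for a
// bool[] of the same length, at the cost of slightly slower element access.
BitArray sieve = new BitArray(3000000);   // all bits start out false

sieve.Set(42, true);                      // mark offset 42 as composite
bool composite = sieve.Get(42);           // read it back

// Note: BitArray's Get/Set and indexer take int, so the long loop variables in
// Expand() would need an (int) cast; offsets stay below _blockSize, so that is safe.
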
A: 

I'm very much a newbie with C# (i.e. learning it this week), so I'm not sure of the exact details of how ArrayList is implemented. However, I would guess that as you haven't defined a type for the ArrayList example, the array would be allocated as an array of object references. This might well mean that you are actually allocating 4-8 GB of memory depending on the architecture.

Jason Waring
Good point: booleans take up 4 bytes in .NET, so 2 GB of booleans is 8 GB total. The ArrayList class is implemented as an array internally, which re-allocates a new (larger) array as needed to accommodate larger sizes: http://msdn.microsoft.com/en-us/library/system.collections.arraylist.aspx
Mike Rosenblum
Actually, it uses a lot more than that. In a bool array each bool only uses one byte, but in an ArrayList each bool uses 16 bytes: each reference is 4 bytes, and each object boxing a bool has two internal pointers plus 4 bytes for the bool. So an ArrayList with 2 billion booleans uses 32 GB of memory.
Guffa
@Guffa - or worse *again* on x64, since references are bigger ;-p
Marc Gravell
@Marc - correct, I just wanted to keep it simple, and comment space is limited. :)
Guffa
I've used Java for years, so I thought this might be the case. Nice to have my suspicion clarified. Abstraction layers are very useful, but sometimes we do need to know the implementation details :-)
Jason Waring