views:

335

answers:

6

I need to declare square matrices in C# WinForms with more than 20000 items in a row. I read about 2GB .Net object size limit in 32bit and also the same case in 64bit OS. So as I understood the single answer - is using unsafe code or separate library built withing C++ compiler.

The problem for me is worth because ushort[20000,20000] is smaller then 2GB but actually I cannot allocate even 700MB of memory. My limit is 650MB and I don't understand why - I have 32bit WinXP with 3GB of memory. I tried to use Marshal.AllocHGlobal(700<<20) but it throws OutOfMemoryException, GC.GetTotalMemory returns 4.5MB before trying to allocate memory.

I found only that many people say use unsafe code but I cannot find example of how to declare 2-dim array in heap (any stack can't keep so huge amount of data) and how to work with it using pointers. Is it pure C++ code inside of unsafe{} brackets?

PS. Please don't ask WHY I need so huge arrays... but if you want - I need to analyze texts (for example books) and found lot of indexes. So answer is - matrices of relations between words

Edit: Could somebody please provide a small example of working with matrices using pointers in unsafe code. I know that under 32bit it is impossible to allocate more space but I spent much time in googling such example and found NOTHING

+4  A: 

Why demand a huge 2-D array? You can simulate this with, for example, a jagged array - ushort[][] - almost as fast, and you won't hit the same single-object limit. You'll still need buckets-o-RAM of course, so x64 is implied...

        ushort[][] arr = new ushort[size][];
        for(int i = 0 ; i < size ; i++) {
            arr[i] = new ushort[size];
        }

Besides which - you might want to look at sparse-arrays, eta-vectors, and all that jazz.

Marc Gravell
I also tried to use jagged array in the same way but got OutOfMemoryException - I don't know on which step - I think something near 650MB :)
Cheburek
@4eburek - curious, but hard for us to reproduce your setup, unfortunately. You should be able to get a bit more than that, but moving to x64 would be better here.
Marc Gravell
@Marc: Jagged arrays are faster than 2D array
Tomek Tarczynski
@Marc: nothing special:public class TroubleClass{ ushort[,] m; public void Init(/*some params passed here*/){ m = new ushort[20000,20000]; // thats enough for me /* I tried also here Marshal.AllocHGlobal(700<<20); I call it from WinForms application and pass list of data to initialize that array. I tried to allocate such memory size on Load event but also without success */ }}
Cheburek
@Tomek Could you please provide more information about that - I need access items not sequentially. I can believe you if to use parallel threads of C#4.0
Cheburek
@4eburek: You can read more about Jagged arrays here: http://dotnetperls.com/jagged-array . For me it's counterintuitive that jagged arrays are faster, but I've done some benchmarks and there were faster :|
Tomek Tarczynski
@Tomek they shown sequential access to elements but if it is not so but something like: for each i,j take [i,s] and [s,j] elements (in whole matrix but not current row only then results may not be such optimistic - but it is need to benchmark this case
Cheburek
@4eburek: I've checked also non sequential access. In random access (both indexes were generated randomly) jagged array was also faster.
Tomek Tarczynski
@Tomek Thank you in advance. I can rewrite it using jagged arrays... but I still cannot allocate enough memory. I already have one version of the same code using sql database but it is eating too much disk space while filling a huge matrix...
Cheburek
From your explanation (relationships between words) your matrices seem really sparse (most of the cells will be 0) so why use arrays at all and not store the elements as a a hash table (with keys x) of hash tables (with keys y) or a hash table in which the keys are x,y pairs?
reinierpost
A: 

If sparse array does not apply, it's probably better to just do it in C/C++ with platform APIs related to memory mapped file: http://en.wikipedia.org/wiki/Memory-mapped_file

Codism
The .NET 4.0 Framework comes with classes for using Memory Mapped Files. http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles(VS.100).aspx
dtb
@dtb: I think it is much slower then using relational database but anyway thanks.
Cheburek
@dtb: nice to know!
Codism
A: 

If you explained what you are trying to do it would be easier to help. Maybe there are better ways than allocating such a huge amount of memory at once.

Re-design is also choice number one in this great blog post:

BigArray, getting around the 2GB array size limit

The options suggested in this article are:

0xA3
Thank you for your answer. Yes, I already saw BigArray but didn't try - looks like it uses jagged arrays. As I wrote in comment to previous answer I've tried to use jagged arrays - they are bit slower on the stage of initialization but even so I cannot allocate whole required memory - it throws OutOfMemory exception.The last think came in head - maybe something wrong with my instance of OS and need to try another computer. Because I cannot explain why when other men talk about 2GB I even cannot allocate 700Mb
Cheburek
+4  A: 

The reason why you can't get near even the 2Gb allocation in 32 bit Windows is that arrays in the CLR are laid out in contiguous memory. In 32 bit Windows you have such a restricted address space that you'll find nothing like a 2Gb hole in the virtual address space of the process. Your experiments suggest that the largest region of available address space is 650Mb. Moving to 64 bit Windows should at least allow you to use a full 2Gb allocation.

Note that the virtual address space limitation on 32 bit Windows has nothing to do with the amount of physical memory you have in your computer, in your case 3Gb. Instead the limitation is caused by the number of bits the CPU uses to address memory addresses. 32 bit Windows uses, unsurprisingly, 32 bits to access each memory address which gives a total addressable memory space of 4Gbytes. By default Windows keeps 2Gb for itself and gives 2Gb to the currently running process, so you can see why the CLR will find nothing like a 2Gb allocation. With some trickery you can change the OS/user allocation so that Windows only keeps 1Gb for itself and gives the running process 3Gb which might help. However with 64 bit windows the addressable memory assigned to each process jumps up to 8 Terabytes so here the CLR will almost certainly be able to use full 2Gb allocations for arrays.

Andrew O'Reilly
I tried to use 64bit Windows7 without success. But it was on virtual PC under the same WinXP so I cannot guaranty that results are correct
Cheburek
That's not an environment I have any familiarity with, however one thing to check is that your code is compiled such that it will run as a native x64 executable on 64 bit Windows. This MSDN page http://msdn.microsoft.com/en-us/library/ms241064%28VS.80%29.aspx gives more information.
Andrew O'Reilly
@Andrew: Under win7 I used VS2010 and in configuration managed selected Release Mode and x64 type of processors. That is the quick answer but I need to verify if there are all required steps to compile under 64bit platform
Cheburek
A: 

For the OutOfMemoryException read this thread (especially nobugz and Brian Rasmussen's answer):
http://stackoverflow.com/questions/2347935/microsoft-visual-c-2008-reducing-number-of-loaded-dlls

Thanks. I'll read it tonight. But as answer at first look - I tried to run application in Release mode. Results the same
Cheburek
A: 

I'm so happy! :) Recently I played around subject problem - tried to resolve it using database but only found that this way is far to be perfect. Matrix [20000,20000] was implemented as single table. Even with properly set up indexes time required only to create more than 400 millions records is about 1 hour on my PC. It is not critical for me. Then I ran algorithm to work with that matrix (require twice to join the same table!) and after it worked more than half an hour it made no even single step. After that I understood that only way is to find a way to work with such matrix in memory only and back to C# again.

I created pilot application to test memory allocation process and to determine where exactly allocation process stops using different structures.

As was said in my first post it is possible to allocate using 2-dim arrays only about 650MB under 32bit WinXP. Results after using Win7 and 64bit compilation also were sad - less than 700MB.

I used JAGGED ARRAYS [][] instead of single 2-dim array [,] and results you can see below:

Compiled in Release mode as 32bit app - WinXP 32bit 3GB phys. mem. - 1.45GB Compiled in Release mode as 64bit app - Win7 64bit 2GB under VM - 7.5GB

--Sources of application which I used for testing are attached to this post. I cannot find here how to attach source files so just describe design part and put here manual code. Create WinForms application. Put on form such contols with default names: 1 button, 1 numericUpDown and 1 listbox In .cs file add next code and run.

private void button1_Click(object sender, EventArgs e)
        {
            //Log(string.Format("Memory used before collection: {0}", GC.GetTotalMemory(false)));
            GC.Collect();
            //Log(string.Format("Memory used after collection: {0}", GC.GetTotalMemory(true)));
            listBox1.Items.Clear();
            if (string.IsNullOrEmpty(numericUpDown1.Text )) {
                Log("Enter integer value");
            }else{
                int val = (int) numericUpDown1.Value;
                Log(TryAllocate(val));
            }
        }

        /// <summary>
        /// Memory Test method
        /// </summary>
        /// <param name="rowLen">in MB</param>
        private IEnumerable<string> TryAllocate(int rowLen) {
            var r = new List<string>();
            r.Add ( string.Format("Allocating using jagged array with overall size (MB) = {0}", ((long)rowLen*rowLen*Marshal.SizeOf(typeof(int))) >> 20) );
            try {
                var ar = new int[rowLen][];
                for (int i = 0; i < ar.Length; i++) {
                    try {
                        ar[i] = new int[rowLen];
                    }
                    catch (Exception e) {
                        r.Add ( string.Format("Unable to allocate memory on step {0}. Allocated {1} MB", i
                            , ((long)rowLen*i*Marshal.SizeOf(typeof(int))) >> 20 ));
                        break;
                    }
                }
                r.Add("Memory was successfully allocated");
            }
            catch (Exception e) {
                r.Add(e.Message + e.StackTrace);
            }
            return r;
        }

        #region Logging

        private void Log(string s) {
            listBox1.Items.Add(s);
        }

        private void Log(IEnumerable<string> s)
        {
            if (s != null) {
                foreach (var ss in s) {
                    listBox1.Items.Add ( ss );
                }
            }
        }

        #endregion

The problem is solved for me. Guys, thank you in advance!

Cheburek