The .NET Framework has an Array.Sort overload that lets you specify the starting index and length of the range to sort. However, those parameters are only 32-bit, so I don't see a way to sort part of a large array when the indices that describe the sort range require a 64-bit number. I suppose I could copy and modify the framework's sort implementation, but that is not ideal.
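To make the limitation concrete, here is a sketch (the sizes are hypothetical):

    // For scale: a sort range that a 32-bit index cannot describe.
    long start = 3000000000L;   // range start, beyond int.MaxValue
    int length = 1000;
    // The range overload is Array.Sort(array, int index, int length); there is
    // no overload taking longs, and narrowing the start throws or wraps:
    // Array.Sort(data, checked((int)start), length);   // OverflowException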
Update:
I've created two classes to help me work around these and other large-array issues. One such issue was that long before I reached my memory limit, I started getting OutOfMemoryExceptions. I'm assuming this is because the requested memory may be available but not contiguous. So for that, I created the class BigArray, which is a generic, dynamically sizable list of arrays. It has a smaller memory footprint than the framework's generic List class, and it does not require that the entire array be contiguous. I haven't measured the performance hit, but I'm sure it's there.
using System;
using System.Collections.Generic;
using System.Diagnostics;

public class BigArray<T> : IEnumerable<T>
{
    private long capacity;
    private int itemsPerBlock;
    private int shift;
    private List<T[]> blocks = new List<T[]>();

    public BigArray(int itemsPerBlock)
    {
        // Round the block size up to a power of two so that indexing can use
        // a shift and mask instead of division and modulus. (An integer loop
        // avoids floating-point edge cases at exact powers of two.)
        shift = 0;
        while ((1 << shift) < itemsPerBlock)
        {
            shift++;
        }
        this.itemsPerBlock = 1 << shift;
    }

    public long Capacity
    {
        get
        {
            return capacity;
        }
        set
        {
            // Ceiling division: blocks needed to cover the new capacity
            // (also handles a capacity of zero correctly).
            var requiredBlockCount = (value + itemsPerBlock - 1) / itemsPerBlock;
            while (blocks.Count > requiredBlockCount)
            {
                blocks.RemoveAt(blocks.Count - 1);
            }
            while (blocks.Count < requiredBlockCount)
            {
                blocks.Add(new T[itemsPerBlock]);
            }
            capacity = (long)itemsPerBlock * blocks.Count;
        }
    }

    public T this[long index]
    {
        get
        {
            Debug.Assert(index >= 0 && index < capacity);
            var blockNumber = (int)(index >> shift);
            var itemNumber = index & (itemsPerBlock - 1);
            return blocks[blockNumber][itemNumber];
        }
        set
        {
            Debug.Assert(index >= 0 && index < capacity);
            var blockNumber = (int)(index >> shift);
            var itemNumber = index & (itemsPerBlock - 1);
            blocks[blockNumber][itemNumber] = value;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (long i = 0; i < capacity; i++)
        {
            yield return this[i];
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
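A minimal usage sketch (the sizes are arbitrary, and a capacity this large really does allocate all the blocks up front):

    var big = new BigArray<byte>(1 << 20); // block size rounded up to a power of two
    big.Capacity = 3000000000L;            // ~3 GB, held as ~2,862 separate 1 MB blocks
    big[2999999999L] = 42;                 // indexed with a long, past int.MaxValue
    byte b = big[2999999999L];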
And getting back to the original issue of sorting... What I really needed was a way to act on each element of an array in sorted order. But with such large arrays, it is prohibitive to copy the data, sort the copy, act on it, and then discard it (the original order must be maintained). So I created the static class OrderedOperation, which lets you perform an arbitrary operation on each element of an unsorted array in sorted order, and do so with a low memory footprint (trading execution time for memory here).
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class OrderedOperation
{
    public delegate void WorkerDelegate(int index, float progress);

    public static void Process(WorkerDelegate worker, IEnumerable<int> items, int count, int maxItem, int maxChunkSize)
    {
        // Build a histogram of the items (assumed non-negative and <= maxItem),
        // doubling the bin count until no single bin is bigger than a chunk.
        int binCount = 1000;
        int[] bins;
        double binScale;
        bool ok;
        do
        {
            ok = true;
            bins = new int[binCount];
            binScale = (double)(binCount - 1) / maxItem;
            int i = 0;
            foreach (int item in items)
            {
                bins[(int)(binScale * item)]++;
                if (++i == count)
                {
                    break;
                }
            }
            for (int b = 0; b < binCount; b++)
            {
                if (bins[b] > maxChunkSize)
                {
                    ok = false;
                    binCount *= 2;
                    break;
                }
            }
        } while (!ok);

        var chunkData = new int[maxChunkSize];
        var chunkIndex = new int[maxChunkSize];
        var done = new System.Collections.BitArray(count);
        var processed = 0;
        var binsCompleted = 0;
        while (binsCompleted < binCount)
        {
            // Greedily take as many consecutive bins as fit into one chunk.
            var chunkMax = 0;
            var sum = 0;
            do
            {
                sum += bins[binsCompleted];
                binsCompleted++;
            } while (binsCompleted < binCount - 1 && sum + bins[binsCompleted] <= maxChunkSize);
            Debug.Assert(sum <= maxChunkSize);
            chunkMax = (int)Math.Ceiling((double)binsCompleted / binScale);

            // Gather every not-yet-done item below chunkMax, remembering its
            // original position in the input sequence.
            var chunkCount = 0;
            int i = 0;
            foreach (int item in items)
            {
                if (item < chunkMax && !done[i])
                {
                    chunkData[chunkCount] = item;
                    chunkIndex[chunkCount] = i;
                    chunkCount++;
                    done[i] = true;
                }
                if (++i == count)
                {
                    break;
                }
            }
            Debug.Assert(sum == chunkCount);

            // Sort just this chunk, then hand each original index to the
            // worker in ascending item order.
            Array.Sort(chunkData, chunkIndex, 0, chunkCount);
            for (i = 0; i < chunkCount; i++)
            {
                worker(chunkIndex[i], (float)processed / count);
                processed++;
            }
        }
        Debug.Assert(processed == count);
    }
}
The two classes can work together (that's how I use them, as the sketch below shows), but they don't have to. I hope someone else finds them useful, though I'll admit they are fringe-case classes. Questions welcome. And if my code sucks, I'd like to hear tips, too.
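Here is roughly how the two fit together; itemCount and maxKey are hypothetical stand-ins for whatever your data provides:

    int itemCount = 100000000;   // hypothetical element count (an int, see below)
    int maxKey = 1000000;        // hypothetical largest key value
    var keys = new BigArray<int>(1 << 20);
    keys.Capacity = itemCount;
    // ... fill keys[i] with the sort key of element i ...
    OrderedOperation.Process(
        (index, progress) =>
        {
            // Element 'index' is visited here, in ascending key order; the
            // keys themselves are never reordered.
        },
        keys,          // BigArray<int> implements IEnumerable<int>
        itemCount,     // Process reads exactly this many items (capacity is rounded up)
        maxKey,
        1 << 20);      // maxChunkSize: the memory-versus-passes trade-off knob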
One final thought: as you can see in OrderedOperation, I'm using ints, not longs. Currently that is sufficient for me, despite my original question (the application is in flux, in case you can't tell). But the class should be able to handle longs as well, should the need arise.
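If that need does arise, the changes look mostly mechanical. A sketch of the long-friendly signature, assuming a BigArray&lt;bool&gt; replaces the int-indexed BitArray:

    public delegate void WorkerDelegate(long index, float progress);

    public static void Process(WorkerDelegate worker, IEnumerable<long> items,
                               long count, long maxItem, int maxChunkSize)
    {
        // A chunk still holds at most maxChunkSize items, so chunkData and
        // chunkIndex remain plain long[] arrays and the generic
        // Array.Sort(chunkData, chunkIndex, 0, chunkCount) overload still
        // applies. Only the 'done' BitArray needs a long-indexable
        // replacement, such as the BigArray<bool> above.
    }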