views:

82

answers:

1

In the library I'm working on, we have data sets (which may be subsets of other data sets) that are distributed in memory in three-dimensional rectangular strided arrays. That is, an array A can be subscripted as A(i,j,k), where each index ranges from zero to some upper bound, and the location of each element in memory is given by:

A(i,j,k) = A0 + i * A_stride_i + j * A_stride_j + k * A_stride_k

where A0 is a base pointer, and A_stride_i et al are dimensional strides.

Now, because these data sets may be subsets of other data sets rather than each occupying their own independent malloc'ed block of memory, it's entirely possible that they may overlap (where overlap means that A(i,j,k) < B(m,n,p) is neither always true nor always false), and if they overlap they may interleave with each other or they may collide with each other (where collide means that A(i,j,k) == B(m,n,p) for some sextet of indices).

Therein lies the question. Some operations on two data sets (for example, a copy) are only valid if the arrays do not collide with each other, but are valid if they overlap in an interleaved non-colliding fashion. I'd like to add a function for two data sets whether two data sets collide or not.

Is there an existing algorithm for doing this in a reasonably efficient and straightforward way?

It's fairly easy to check whether the data sets overlap or not, so the key question is: Given two data sets of this form that overlap, what is an efficient algorithm to determine if they interleave or collide?

Example:

As a simple example, suppose we have memory locations from 0 to F (in hex):

0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

I'll also consider only 2D arrays here, for simplicity. Suppose we have one of size 2,3 (that is, 0 <= i < 2 and 0 <= j < 3), with a stride_i = 1 and stride_j = 4, at a base address of 2. This will occupy (with occupied locations denoted by their i,j pair):

0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
      *  *        *  *        *  *

Likewise, if we have another array of the same sizes and strides, starting at a base address of 4, that will look like this:

0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
            o  o        o  o        o  o

In the terminology that I was using in describing the problem, these arrays "overlap", but they do not collide.

Restrictions and Assumptions:

We can assume that the strides are positive and, if desired, that they are in increasing order. Neither of things are true in the actual library, but it is reasonably simple to rearrange the array definition to get to this point.

We can assume that arrays do not self-interleave. This is also not enforced by the library, but would be a pathological case, and can be warned about separately. That is (assuming the strides are in increasing order, and i ranges from zero to max_i and so forth):

  • stride_j >= max_i * stride_i
  • stride_k >= max_j * stride_j

Points, of course, for methods that do not require these assumptions, as rearranging the array definition into a canonical order is a bit of work that's ideally avoided.

The two arrays cannot be assumed to have equal sizes or strides.

I don't think there's value in keeping track of things during construction -- there's no information occurring at construction that is not present when doing the test. Also, "construction" may simply be "consider the subset of this larger array with this base pointer, these strides, and these sizes."

Worst Likely Cases

svick's answer reminds me that I should probably add something about some typical "worse" cases that I expect this to see. One of the worst will be when we have an array that represents some very large number of complex values, stored in consecutive (real, imag) pairs, and then we have two sub-arrays containing the real and imaginary parts respectively -- so, you've got a few million elements in the array, alternating between the arrays. As this is not an unlikely case, it should be testable with something other than abysmal performance.

+1  A: 

I think the following C# program should work. It uses the branch and bound method and works for arrays of any number of dimensions.

    using System;
    using System.Collections.Generic;

    namespace SO_strides
    {
        sealed class Dimension
        {
            public int Min { get; private set; }
            public int Max { get; private set; }
            public int Stride { get; private set; }

            private Dimension() { }

            public Dimension(int max, int stride)
            {
                Min = 0;
                Max = max;
                Stride = stride;
            }

            public Dimension[] Halve()
            {
                if (Max == Min)
                    throw new InvalidOperationException();

                int split = Min + (Max - Min) / 2;

                return new Dimension[]
                {
                    new Dimension { Min = Min, Max = split, Stride = Stride },
                    new Dimension { Min = split + 1, Max = Max, Stride = Stride }
                };
            }
        }

        sealed class ArrayPart
        {
            public int BaseAddr { get; private set; }
            public Dimension[] Dimensions { get; private set; }
            public int FirstNonconstantIndex { get; private set; }

            int? min;
            public int Min
            {
                get
                {
                    if (min == null)
                    {
                        int result = BaseAddr;
                        foreach (Dimension dimension in Dimensions)
                            result += dimension.Min * dimension.Stride;
                        min = result;
                    }
                    return min.Value;
                }
            }

            int? max;
            public int Max
            {
                get
                {
                    if (max == null)
                    {
                        int result = BaseAddr;
                        foreach (Dimension dimension in Dimensions)
                            result += dimension.Max * dimension.Stride;
                        max = result;
                    }
                    return max.Value;
                }
            }

            public int Size
            {
                get
                {
                    return Max - Min + 1;
                }
            }

            public ArrayPart(int baseAddr, Dimension[] dimensions)
                : this(baseAddr, dimensions, 0)
            {
                Array.Sort(dimensions, (d1, d2) => d2.Stride - d1.Stride);
            }

            private ArrayPart(int baseAddr, Dimension[] dimensions, int fni)
            {
                BaseAddr = baseAddr;
                Dimensions = dimensions;
                FirstNonconstantIndex = fni;
            }

            public bool CanHalve()
            {
                while (FirstNonconstantIndex < Dimensions.Length
                    && Dimensions[FirstNonconstantIndex].Min == Dimensions[FirstNonconstantIndex].Max)
                    FirstNonconstantIndex++;

                return FirstNonconstantIndex < Dimensions.Length;
            }

            public ArrayPart[] Halve()
            {
                Dimension[][] result = new Dimension[2][];

                Dimension[] halves = Dimensions[FirstNonconstantIndex].Halve();

                for (int i = 0; i < 2; i++)
                {
                    result[i] = (Dimension[])Dimensions.Clone();
                    result[i][FirstNonconstantIndex] = halves[i];
                }

                return new ArrayPart[]
                {
                    new ArrayPart(BaseAddr, result[0], FirstNonconstantIndex),
                    new ArrayPart(BaseAddr, result[1], FirstNonconstantIndex)
                };
            }
        }

        sealed class CandidateSet
        {
            public ArrayPart First { get; private set; }
            public ArrayPart Second { get; private set; }

            public CandidateSet(ArrayPart first, ArrayPart second)
            {
                First = first;
                Second = second;
            }

            public bool Empty
            {
                get
                {
                    return First.Min > Second.Max || Second.Min > First.Max;
                }
            }

            public CandidateSet[] Halve()
            {
                int firstSize = First.Size;
                int secondSize = Second.Size;

                CandidateSet[] result;

                if (firstSize > secondSize && First.CanHalve())
                {
                    ArrayPart[] halves = First.Halve();
                    result = new CandidateSet[]
                    {
                        new CandidateSet(halves[0], Second),
                        new CandidateSet(halves[1], Second)
                    };
                }
                else if (Second.CanHalve())
                {
                    ArrayPart[] halves = Second.Halve();
                    result = new CandidateSet[]
                    {
                        new CandidateSet(First, halves[0]),
                        new CandidateSet(First, halves[1])
                    };
                }
                else
                    throw new InvalidOperationException();

                return result;
            }

            public static bool HasSolution(ArrayPart first, ArrayPart second)
            {
                Stack<CandidateSet> stack = new Stack<CandidateSet>();
                stack.Push(new CandidateSet(first, second));

                bool found = false;

                while (!found && stack.Count > 0)
                {
                    CandidateSet candidate = stack.Pop();
                    if (candidate.First.Size == 1 && candidate.Second.Size == 1)
                        found = true;
                    else
                    {
                        foreach (CandidateSet half in candidate.Halve())
                            if (!half.Empty)
                                stack.Push(half);
                    }
                }

                return found;
            }
        }

        static class Program
        {
            static void Main()
            {
                Console.WriteLine(
                    CandidateSet.HasSolution(
                        new ArrayPart(2, new Dimension[] { new Dimension(1, 1), new Dimension(2, 4) }),
                        new ArrayPart(4, new Dimension[] { new Dimension(1, 1), new Dimension(2, 4) })
                        )
                    );
            }
        }
    }
svick
Thanks! That seems like a fairly clever general solution indeed. It will perform poorly on some of the cases we care about (see the note I added to the question), but I think we could treat those with some special case tests -- for instance, check if the base values are different modulo the least common denominator of all the strides; and check if the larger strides are the same and if so compare "rows".
Brooks Moses