In the library I'm working on, we have data sets (which may be subsets of other data sets) that are distributed in memory in three-dimensional rectangular strided arrays. That is, an array A
can be subscripted as A(i,j,k)
, where each index ranges from zero to some upper bound, and the location of each element in memory is given by:
A(i,j,k) = A0 + i * A_stride_i + j * A_stride_j + k * A_stride_k
where A0
is a base pointer, and A_stride_i
et al are dimensional strides.
Now, because these data sets may be subsets of other data sets rather than each occupying their own independent malloc'ed block of memory, it's entirely possible that they may overlap (where overlap means that A(i,j,k) < B(m,n,p)
is neither always true nor always false), and if they overlap they may interleave with each other or they may collide with each other (where collide means that A(i,j,k) == B(m,n,p)
for some sextet of indices).
Therein lies the question. Some operations on two data sets (for example, a copy) are only valid if the arrays do not collide with each other, but are valid if they overlap in an interleaved non-colliding fashion. I'd like to add a function for two data sets whether two data sets collide or not.
Is there an existing algorithm for doing this in a reasonably efficient and straightforward way?
It's fairly easy to check whether the data sets overlap or not, so the key question is: Given two data sets of this form that overlap, what is an efficient algorithm to determine if they interleave or collide?
Example:
As a simple example, suppose we have memory locations from 0 to F (in hex):
0 1 2 3 4 5 6 7 8 9 A B C D E F
I'll also consider only 2D arrays here, for simplicity. Suppose we have one of size 2,3 (that is, 0 <= i < 2
and 0 <= j < 3
), with a stride_i = 1
and stride_j = 4
, at a base address of 2. This will occupy (with occupied locations denoted by their i,j pair):
0 1 2 3 4 5 6 7 8 9 A B C D E F
* * * * * *
Likewise, if we have another array of the same sizes and strides, starting at a base address of 4, that will look like this:
0 1 2 3 4 5 6 7 8 9 A B C D E F
o o o o o o
In the terminology that I was using in describing the problem, these arrays "overlap", but they do not collide.
Restrictions and Assumptions:
We can assume that the strides are positive and, if desired, that they are in increasing order. Neither of things are true in the actual library, but it is reasonably simple to rearrange the array definition to get to this point.
We can assume that arrays do not self-interleave. This is also not enforced by the library, but would be a pathological case, and can be warned about separately. That is (assuming the strides are in increasing order, and i
ranges from zero to max_i
and so forth):
stride_j >= max_i * stride_i
stride_k >= max_j * stride_j
Points, of course, for methods that do not require these assumptions, as rearranging the array definition into a canonical order is a bit of work that's ideally avoided.
The two arrays cannot be assumed to have equal sizes or strides.
I don't think there's value in keeping track of things during construction -- there's no information occurring at construction that is not present when doing the test. Also, "construction" may simply be "consider the subset of this larger array with this base pointer, these strides, and these sizes."
Worst Likely Cases
svick's answer reminds me that I should probably add something about some typical "worse" cases that I expect this to see. One of the worst will be when we have an array that represents some very large number of complex values, stored in consecutive (real, imag) pairs, and then we have two sub-arrays containing the real and imaginary parts respectively -- so, you've got a few million elements in the array, alternating between the arrays. As this is not an unlikely case, it should be testable with something other than abysmal performance.