I want to use a data structure for sorting space-time data (x,y,z,time).
Currently a processing algorithm searches a set of 4D (x,y,z,time) points, given a spherical (3d) spacial radius and a linear (1d) time radius, marking for each point, which other points are within those radii. The reason is that after processing, I can ask any 4D point for all of its neighbours in O(1) time.
However in some common configurations of space and time radii, the first run of the algorithm takes about 12 hours. Believe it or not, that's actually fast compared to what exists in our industry. Nevertheless, I want to help speed up the initial runs and so I want to know: Is a kd-tree suitable for 4D space-time data?
Note that I am not looking for implementations of nearest-neighbour search or k-nearest-neighbours search.
More Info:
An example dataset has 450,000 4D points.
Some datasets are time-dense so ordering by time certainly saves processing, but still leads to many distance checks.
Time is represented by Excel-style dates, with typical ranges between 30,000-39,000 (approximate). The space ranges are sometimes higher values, sometimes lower values, but the range between each space co-ordinate is similar to time (e.g. maxX-minX ~ maxT-minT).
Even more info:
I thought I'd add some more slightly irrelevant data in case anybody has dealt with a similar dataset.
Basically I'm working with data that represents space-time events that are recorded and corroborated by multiple sensors. Error is involved, so only events that meet an error threshold are included.
The time span of these datasets ranges between 5-20 years of data.
For the really old data (>8 years old), the events were often very spacially dense for two reasons: 1) there were relatively few sensors available back then, and 2) the sensors were placed close together so that nearby events could be properly corroborated with low error. Further events could be recorded, but they had too high an error
For the newer data (<8 years old), the events are often very time dense, for the inverse reasons: 1) there are usually many sensors available, and 2) the sensors are placed at regular intervals over a larger distance.
As a result, the datasets cannot typically be said to be only time-dense or only spacially dense (except in the case of datasets that contain only new data).
Conclusion
I clearly should be asking more questions on this site.
I will be testing out several solutions over the next while which will include the 4d kd-tree, a 3d kd-tree followed by time distance check (suggested by Drew Hall), and the current algorithm I have. Also, I have been suggested another data structure called TSP (time space partitioning) tree, which uses an octree for space and a bsp on each node for time, so I may test that as well.
Assuming I remember, I'll be sure to post some profiling benchmarks on different time/space radii configurations.
Thanks all