views:

329

answers:

8

I'm writing some code (just for fun so far) in Python that will store some data on every point in a 3d space. I'm basically after a 3d matrix object that stores arbitary objects that will allow me to do some advanced selections, like:

  • Get the point where x=1,y=2,z=3.
  • Getting all points where y=2.
  • Getting all points within 3 units of position x=1,y=2,z=3.
  • Getting all points where point.getType() == "Foo"

In all of the above, I'd need to end up with some sort of output that would give me the original position in the space, and the data stored at that point.

Apparently numpy can do what I want, but it seems highly optimised for scientific computing and working out how to get the data like I want above has so far eluded me.

Is there a better alternative or should I return to banging my head on the numpy wall? :)

EDIT: some more info the first three answers made me realise I should include: I'm not worried about performance, this is purely a proof-of-concept where I'd prefer clean code to good performance. I will also have data for every point in the given 3d space, so I guess a Spare Matrix is bad?

+3  A: 

Well ... If you expect to really fill that space, then you're probably best off with a densely packed matrix-like structure, basically voxels.

If you don't expect to fill it, look into something a bit more optimized. I would start by looking at octrees, which are often used for things like this.

unwind
voxels are not a data structure, it's a technique to display a 3D grid.
Luper Rouch
A: 

It depends upon the precise configuration of your system, but from the example you give you are using integers and discrete points, so it would probably be appropriate to consider Sparse Matrix data structures.

Cruachan
A: 

You can do the first 2 queries with slicing in numpy :

a = numpy.zeros((4, 4, 4))
a[1, 2, 3] # The point at x=1,y=2,z=3
a[:, 2, :] # All points where y=2

For the third one if you mean "getting all points within a sphere of radius 3 and centered at x=1,y=2,z=3", you will have to write a custom function to do that ; if you want a cube you can proceed with slicing, e.g.:

a[1:3, 1:3, 1:3] # The 2x2x2 array sliced from the center of 'a'

For the fourth query if the only data stored in your array is the cells type, you could encode it as integers:

FOO = 1
BAR = 2
a = numpy.zeros((4, 4, 4), dtype="i")
a[1, 2, 3] = FOO
a[3, 2, 1] = BAR
def filter(a, type_code):
    coords = []
    for z in range(4):
        for y in range(4):
            for x in range(4):
                if a[x, y, z] == type_code:
                    coords.append((x, y, z))
    return coords
filter(a, FOO) # => [(1, 2, 3)]

numpy looks like the good tool for doing what you want, as the arrays will be smaller in memory, easily accessible in C (or even better, cython !) and extended slicing syntax will avoid you writing code.

Luper Rouch
+1  A: 

One advantage of numpy is that it is blazingly fast, e.g. calculating the pagerank of a 8000x8000 adjacency matrix takes milliseconds. Even though numpy.ndarray will only accept numbers, you can store number/id-object mappings in an external hash-table i.e. dictionary (which in again is a highly optimized datastructure).

The slicing would be as easy as list slicing in python:

>>> from numpy import arange

>>> the_matrix = arange(64).reshape(4, 4, 4)
>>> print the_matrix[0][1][2]
    6
>>> print the_matrix[0][1]
    [4 5 6 7]
>>> print the_matrix[0]
    [[ 0  1  2  3]
    [ 4  5  6  7]
    [ 8  9 10 11]
    [12 13 14 15]]

If you wrap some of your desired functions (distances) around some core matrix and a id-object-mapping hash, you could have your application running within a short period of time.

Good luck!

The MYYN
I thought ndarray could only store numbers too, but it can store anything!numpy.ndarray([w,h,d],object)This is what I'm currently wrestling with, but it's annoying enough I posted here for better solutions that involve less knwledge of matrix stuff!
ok, than my suggestion is somewhat pointless, i'll delete it; but why do you want to avoid ``matrix stuff``; matrices are, i wouldn't say fun, but somewhat neat. and for your slicing, access, the ``ndarray`` is great - so why not use it?
The MYYN
Actually, it's far from pointless, it's the method I'm going with now. I just store my own custom class at each point. The only reason I dislike it is that a lot of numpy seems to assume you know matrix math/terminology, and since I don't, it's just something else to swot up on.
A: 

Here's an approach that may work.

Each point is a 4-tuple (x,y,z,data), and your database looks like this:

database = [ (x,y,z,data), (x,y,z,data), ... ]

Let's look at your use cases.

Get the point where x=1,y=2,z=3.

[ (x,y,z,data) for x,y,z,data in database if (x,y,z) == (1,2,3) ]

Getting all points where y=2.

[ (x,y,z,data) for x,y,z,data in database if y == 2 ]

Getting all points within 3 units of position x=1,y=2,z=3.

[ (x,y,z,data) for x,y,z,data in database if math.sqrt((x-1)**2+(y-2)**2+(z-3)**2)<=3.0 ]

Getting all points where point.getType() == "Foo"

[ (x,y,z,data) for x,y,z,data in database if type(data) == Foo ]
S.Lott
This is going to be insanely slow for searching!
Ed Woodcock
@Ed Woodcock: It's how RDBMS's work, so there is a precedent for this architecture. Further, indexing can be added to this if the data volume warrants it. Since we don't know the volume of data or the frequency of queries, we don't know enough to reject this yet.
S.Lott
The "arbitrary object" requirement is a small problem here: if it is immutable, it requires you to do three steps to change an entry:1. remove old tuple2. build new tuple3. append new tuple.Of course, that won't be a problem if you use your own class for the data object.
RoadieRich
@RoadieRich: Doesn't matter what class the extra piece of data is. We do rebuild bunches of tuples. But these tuples contain the same original objects, unchanged by being reincorporated into numerous new tuples. The underlying objects in all these tuples never change.
S.Lott
+2  A: 

Here's another common approach

class Point( object ):
    def __init__( self, x, y, z, data ):
        self.x, self.y, self.z = x, y, z
        self.data = data
    def distFrom( self, x, y, z )
        return math.sqrt( (self.x-x)**2 + (self.y-y)**2 + (self.z-z)**2 )

database = [ Point(x,y,z,data), Point(x,y,z,data), ... ]

Let's look at your use cases.

Get the point where x=1,y=2,z=3.

[ p for p in database if (p.x, p.y, p.z) == ( 1, 2, 3 ) ]

Getting all points where y=2.

[ p for p in database if p.y == 2 ]

Getting all points within 3 units of position x=1,y=2,z=3.

[ p for p in database if p.distFrom( 1, 2, 3 ) <= 3.0 ]

Getting all points where point.getType() == "Foo"

[ p for p in database if type(p.data) == Foo ]
S.Lott
Thanks! I was getting mixed up in data structures more complicated than they needed to be. This works very well and lets me do any sort of data selection as I want to each time.
A: 

When to use Binary Space Partitioning, Quadtree, Octree?

3d array imo is worthless. Especially if your world is dynamic. You should decide between BSP, Quadtree or Octtree. BSP would do just fine. Since your world is in 3d, you need planes when splitting the bsp, not lines.

Cheers !

Edit

I will also have data for every point in the given 3d space, so I guess a Spare Matrix is bad?

I guess this is alright if always know how large you data set is and that it never changes, i.e. if more points are added to it that in turn are out of bound. You would have to resize the 3d array in that case.

Magnus Skog
A: 

Using a dictionary with x,y,z tuples as keys is another solution, if you want a relatively simple solution with the standard library.

import math

#use indexing to get point at (1,2,3): points[(1,2,3)]
get_points(points, x=None, y=None, x=None):
    """returns dict of all points with given x,y and/or z values.  Call using keywords (eg get_points(points, x=3)"""
    filteredPoints = points.items()
    if x:
        filteredPoints = [p for p in filteredPoints if p[0][0] == x]
    if y:
        filteredPoints = [p for p in filteredPoints if p[0][1] == y]
    if z:
        filteredPoints = [p for p in filteredPoints if p[0][0] == x]
    return dict(filteredPoints)

get_point_with_type(points, type_):
    """returns dict of points with point.getType() == type_"""
    filteredPoints = points.items()
    return dict((position,data) for position,data in points.iterItems() if data.getType == type_)

get_points_in_radius(points,x,y,z,r):
    """Returns a dict of points within radius r of point (x,y,z)"""
    def get_dist(x1,y1,z1,x2,y2,z3):
        return math.sqrt((x1-x2)*(x1-x2)+(y1-y2)*(y1-y2)+(z1-z2)*(z1-z2))
    return dict((position,data) for position,data in points.iterItems() if get_dist(x,y,z, *position) <= r))

And due to python referencing, you can alter "points" in the returned dictionaries, and have the original points change as well (I think).

RoadieRich