views:

269

answers:

3

Hi,

I have a raster file (basically 2D array) with close to a million points. I am trying to extract a circle from the raster (and all the points that lie within the circle. Using ArcGIS is exceedingly slow for this. Can anyone suggest any image processing library that is both easy to learn and powerful and quick enough for something like this?

Thanks!

+1  A: 

Extracting a subset of points efficiently depends on the exact format you are using. Assuming you store your raster as a numpy array of integers, you can extract points like this:

from numpy import *

def points_in_circle(circle, arr):
    "A generator to return all points whose indices are within given circle."
    i0,j0,r = circle
    def intceil(x):
        return int(ceil(x))
    for i in xrange(intceil(i0-r),intceil(i0+r)):
        ri = sqrt(r**2-(i-i0)**2)
        for j in xrange(intceil(j0-ri),intceil(j0+ri)):
            yield arr[i][j]

points_in_circle will create a generator returning all points. Please note that I used yield instead of return. This function does not actually return point values, but describes how to find all of them. It creates a sequential iterator over values of points within circle. See Python documentation for more details how yield works.

I used the fact that for circle we can explicitly loop only over inner points. For more complex shapes you may loop over the points of the extent of a shape, and then check if a point belongs to it. The trick is not to check every point, only a narrow subset of them.

Now an example of how to use points_in_circle:

# raster dimensions, 10 million points
N, M = 3200, 3200
# circle center and its radius in index space
i0, j0, r = 70, 20, 12.3

raster = fromfunction(lambda i,j: 100+10*i+j, (N, M), dtype=int)
print "raster is ready"
print raster

pts_iterator = points_in_circle((i0,j0,r), raster) # very quick, do not extract points yet
pts = array(list(pts_iterator)) # actually extract all points
print pts.size, "points extracted, sum = ", sum(pts)

On a raster of 10 million integers it is pretty quick.

Please describe file format or put a sample somewhere if you need more specific answer.

jetxee
You might want to remove the call to `in_circle` and replace `intcell(j0+r)` with `intcell(j0+ri)`. Also note that you treat the left and bottom of the circle differently from the right and top: the left and bottom limits should be int(floor(…)), for symmetry reasons.
EOL
Yes, thank you. I fixed `+ri` and removed `in_circle`. Left and bottom (top) should be `int(ceil(...))` though, because we need only inner points. For the right and top (bottom) I use the same `int(ceil(...))` because `xrange` is not inclusive with respect to its second argument. So, to make a symmetric selection, I use `ceil` on both ends.
jetxee
Thanks! The speed is very impressive! I am not sure why this will not work for extracting overlapping circles as you mentioned in the comment below.
Thanks again! I was able to combine it with my code which converts a arcGIS raster to a numpy array. What was taking arcGIS 2 minutes is now accomplished in a second :)
A: 

Numpy allows you to do this, and is extremely fast:

import numpy

all_points = numpy.random.random((1000, 1000))  # Input array
(image_size_x, image_size_y) = all_points.shape  # Size of your array of points all_points
# Disk definition:
(center_x, center_y) = (500, 500)
radius = 10

x_grid, y_grid = numpy.meshgrid(numpy.arange(image_size_x), numpy.arange(image_size_y))
disk = ((x_grid-center_x)**2 + (y_grid-center_y)**2) <= radius**2  # array of booleans with the disk shape

# You can now do all sorts of things with the mask "disk":

# For instance, the following array has only 317 points (about pi*radius**2):
points_in_disk = all_points[disk]
# You can also use masked arrays:
points_in_circle2 = numpy.ma.masked_array(all_points, ~disk)
from matplotlib import pyplot
pyplot.imshow(points_in_circle2)
EOL
This approach is likely to be much more memory hungry when extracting small shapes from huge rasters. In fact it is almost three times slower than my script on the same input (10M points, small circle) and almost 10 times slower with 100M points (and 2GB RAM). But unlike my solution, it preserves the shape (when using masked arrays), and it may be better when extracting the same shape twice. (BTW, You forgot to define `all_points`).
jetxee
@jetxee: Yes, this approach consumes memory, and is slower for small radii. However, the code is very legible, and it executes almost instantly, on a modern machine.
EOL
A: 

You need a library that can read your raster. I am not sure how to do that in Python but you could look at geotools (especially with some of the new raster library integration) if you want to program in Java. If you are good with C I would reccomend using something like GDAL.

If you want to look at a desktop tool you could look at extending QGIS with python to do the operation above.

If I remember correctly, the Raster extension to PostGIS may support clipping rasters based upon vectors. This means you would need to create your circles to features in the DB and then import your raster but then you might be able to use SQL to extract your values.

If you are really just a text file with numbers in a grid then I would go with the suggestions above.

TheSteve0
Thanks for the answer! I am planning to use GDAL to convert from raster to numpy array
Glad I could help!
TheSteve0