This question is somewhat language-agnostic, but my tool of choice happens to be a numpy array.
What I am doing is taking the difference of two images via PIL:
img = ImageChops.difference(img1, img2)
And I want to find the rectangular regions that contain changes from one picture to the other. Of course there's the built-in .getbbox() method, but if there are two regions with changes it returns a single box spanning both of them, and if there is just a single changed pixel in each corner it returns the whole image.
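To make the corner case concrete, here is a minimal sketch. The 10x8 image size and the two made-up test images are assumptions for illustration only, not my real webcam frames:

```python
from PIL import Image, ImageChops

# Two hypothetical 10x8 grayscale images that differ only in two opposite corners.
img1 = Image.new("L", (10, 8), 0)
img2 = Image.new("L", (10, 8), 0)
img2.putpixel((0, 0), 255)   # top-left corner
img2.putpixel((9, 7), 255)   # bottom-right corner

diff = ImageChops.difference(img1, img2)

# getbbox() returns one (left, upper, right, lower) box around ALL non-zero pixels.
bbox = diff.getbbox()
print(bbox)  # (0, 0, 10, 8), i.e. the whole image
```

Two changed pixels are enough to make the single bounding box useless, which is why I want one box per region instead.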
For instance, consider the following, where o is a non-zero pixel:
______________________
|o ooo |
| oooo ooo |
| o |
| o o |
| |
| oo o |
| o o ooo |
| oo ooooo |
| ooo |
| o |
|____________________|
I'd like to get four 4-tuples, one bounding box for each non-zero region. For the edge case of the
oooo
o
o o
structure, I'm not terribly worried about how that's handled. Getting the two sections separately or together is fine either way, because the bounds of the inverted-L shape will completely overlap the bounds of the single pixel.
I've never done anything this advanced with image processing so I wanted to get some input before I really write anything (and if there are pre-existing methods in the modules I'm already using, I welcome them!).
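My pseudocode-ish version goes something like this:
    runs = []  # (row, start_x, end_x) for each horizontal run of non-zero pixels
    for y, line in enumerate(image):
        start = None
        for x, pixel in enumerate(line):
            if pixel and start is None:
                start = x                        # save start coords
            elif start is not None and not pixel:
                runs.append((y, start, x - 1))   # save end coords (x - 1 of course)
                start = None
        if start is not None:                    # run reaches the right edge
            runs.append((y, start, len(line) - 1))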
This should give me a list of coordinates, but then I have to determine whether the regions are contiguous. Could I do that with a graph-type search? (We did plenty of DFS and BFS in Algorithms last semester.) I suppose I could also do the search instead of, or in conjunction with, the loops above.
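To show what I mean by the graph-type search, here is a rough sketch of the BFS variant that collects one bounding box per connected region. The function name `bounding_boxes`, the inclusive (left, top, right, bottom) convention, and treating diagonal neighbours as connected are all my own assumptions, and the sample grid is made up:

```python
from collections import deque

def bounding_boxes(grid):
    """Return one (left, top, right, bottom) box per connected non-zero region.

    Bounds are inclusive. Pixels touching diagonally count as connected
    (8-connectivity), which is an assumption, not a requirement.
    """
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not grid[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search from this unvisited non-zero pixel.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes

# Made-up 6x5 test grid with three separate regions.
sample = [
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
print(bounding_boxes(sample))  # [(0, 0, 1, 1), (4, 1, 5, 2), (1, 4, 1, 4)]
```

Every pixel is visited a constant number of times, so this should be fine at 640x480, although a pure-Python loop like this is probably much slower than anything vectorised.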
I won't be doing this on "large" images - they're pulled from a webcam and the best one I currently have does 640x480. At most I'd be doing 720p or 1080p, but that's far enough into the future that it's not a real concern.
So my question(s): Am I on the right path, or am I way off? More importantly, are there any built-in functions that would keep me from reinventing the wheel? And finally, are there any good resources I should look at (tutorials, papers, etc.) that will help out here?
Thanks!