tags:
views: 593
answers: 3

I have an interesting problem coming up soon and I've started to think about the algorithm. The more I think about it, the more I get frightened because I think it's going to scale horribly (O(n^4)), unless I can get smart. I'm having trouble getting smart about this one. Here's a simplified description of the problem.

I have N polygons (where N can be huge, >10,000,000) that are stored as a list of M vertices (where M is on the order of 100). What I need to do is, for each polygon, create a list of any vertices that are shared with other polygons (think of the polygons as surrounding regions of interest; sometimes the regions butt up against each other). I'd like to see something like this

Polygon i | Vertex | Polygon j | Vertex
   1          1         2          2
   1          2         2          3
   1          5         3          1
   1          6         3          2
   1          7         3          3

This means that vertex 1 in polygon 1 is the same point as vertex 2 in polygon 2, and vertex 2 in polygon 1 is the same point as vertex 3 in polygon 2. Likewise, vertex 5 in polygon 1 is the same as vertex 1 in polygon 3....

For simplicity, we can assume that polygons never overlap, the closest they get is touching at the edge, and that all the vertices are integers (to make the equality easy to test).

The only thing I can think of right now is, for each polygon, looping over all of the other polygons and their vertices, which gives a scaling of O(N^2*M^2) -- going to be very bad in my case. The files of polygons can also be too large to fit in RAM, so that would mean multiple reads of the file.

Here's my pseudocode so far

for i=1 to N
  Pi=Polygon(i)
  for j = i+1 to N
    Pj=Polygon(j)
    for ii=1 to Pi.VertexCount()
      Vi=Pi.Vertex(ii)
      for jj=1 to Pj.VertexCount()
        Vj=Pj.Vertex(jj)
        if (Vi==Vj) AddToList(i,ii,j,jj)
      end for
    end for
  end for
end for

I'm assuming that this has come up in the graphics community (I don't spend much time there, so I don't know the literature). Any ideas?

+1  A: 

Well, one simple optimization would be to make a map (hashtable, probably) that maps each distinct vertex (identified by its coordinates) to a list of all polygons of which it is a part. That cuts down your runtime to something like O(NM) - still large but I have my doubts that you could do better, since I can't imagine any way to avoid examining all the vertices.

David Zaslavsky
+3  A: 

This is a classic iteration-vs-memory problem. If you're comparing every polygon with every other polygon, you'll end up with an O(n^2) solution. If you build a table as you step through all the polygons, then march through the table afterwards, you get a nice O(n) solution (two linear passes). I ask a similar question during interviews.

Assuming you have the memory available, you want to create a multimap (one key, multiple entries) with each vertex as the key, and the polygon as the entry. Then you can walk each polygon exactly once, inserting the vertex and polygon into the map. If the vertex already exists, you add the polygon as an additional entry to that vertex key.

Once you've hit all the polygons, you walk the entire map once and do whatever you need to do with any vertex that has more than one polygon entry.

Furious Coder
Now, I just need to think of a way to efficiently create a hash table that won't fit in memory :)
miked
That's a really cool solution!
mmr
Look into the STL Containers. Don't reinvent the wheel, just implement the algorithm. :)
Furious Coder
+2  A: 

If you have the polygon/face data you don't even need to look at the vertices.

  • Create an array of size M (where M here is the total number of distinct vertices)
  • Iterate over the polygons and increment the array entry of each vertex index.

This gives you an array that describes how many times each vertex is used.*

You can then do another pass over the polygons and check the entry for each vertex. If it's > 1 you know that vertex is shared by another polygon.

You can build upon this strategy further if you need to store/find other information. For example instead of a count you could store polygons directly in the array allowing you to get a list of all faces that use a given vertex index. At this point you're effectively creating a map where vertex indices are the key.

(*this example assumes you have no degenerate polygons, but those could easily be handled).

Andrew Grant