views:

262

answers:

5

Problem

This question actually came up at work today. We are planning an experiment where we will show a series of maps to users. Each map has 31 symbols on it. Each symbol also has a label. We noticed that, in a few cases, the labels overlapped, which rendered one of the labels unreadable.

We ended up identifying the problem symbols the old fashioned way--by looking at each map one by one and writing down the ID of any problem symbols we found--but I was thinking this problem could have been solved quite easily with an algorithm. It took about one man hour to visually check all the maps (using an admittedly clunky experiment data collection tool).

I'm curious how quickly people on this site could solve this problem, and I'm also interested in what kind of algorithms you come up with. (Note: this is not a homework question, although I think it would make an interesting homework or interview question.)

Specifications

  • Maps: 24
  • Map Size (pixels): 1024 X 768
  • Symbols (circles) per map: 31
  • Symbol Diameter (pixels): 60

Symbol coordinates are stored in a spreadsheet table (assume a tab-delimited text file) with the following columns:

  • MapId (range: 1 - 24)
  • SymbolId (range: 1 - 744 (24 maps x 31 symbols/map = 744 total symbols)
  • XCoordinate (range: 0 - 1024)
  • YCoordinate (range: 0 - 768)

Assume all four columns contain integers.

Goal

How fast can you come up with an algorithm (language of your choice) that:

  1. Reads in the tab-delimited text file that contains the input data.
  2. Determines, for each map, if any symbols overlap.
  3. If any symbols do overlap, reports which SymbolIds are in violation

Your answer should contain

  1. Your algorithm, which must satisfy all three goals (above).
  2. How long it took you to think of it and write it (you're on your honor). Note: you don't need to count the time it took you to read and understand the question, but once you start thinking of ideas for a solution, start the clock.

I'm not interested in how fast your algorithm runs or how efficient it is in its use of memory. I'm looking for a quick and dirty but accurate and reliable solution to the problem.

+8  A: 

For each map:

If distance(A.center, B.center) < (A.radius+B.radius)
   the circles overlap.

In your case it appears that every circle has the same radius, but I've allowed for the possibility of each circle having a different radius, just in case.

As for how long to think it up, it's a bit hard to say exactly -- less time than it took to load the page with the full description. Then I had to do some reading to confirm that the basic problem was what it appeared to be from the title...

Edit: If you had a lot of circles, it might be worth doing some sorting to eliminate unnecessary tests for overlap, but given only ~30 circles per map, that's probably not worthwhile -- you'd need a truly ancient computer for this to take a whole second.

Sorry, had to leave for a bit to take the kids to a roller hockey game. Anyway, although I haven't tested it, here's what I'd produce for a C++ program in something like 10-12 minutes (I had to make a call in there, so I don't have an exact time):

#include <vector>
#include <math.h>
#include <iostream>

struct point { 
    int id, x, y;
};

std::istream &operator>>(std::istream &is, point &p) { 
    return is >> p.id >> p.x >> p.y;
}

typedef std::vector<point> map;

double dist(point const &a, point const &b) { 
    double x=a.x-b.x;
    double y=a.y-b.y;
    return sqrt(x*x+y*y);
}

int main() { 
    std::vector<map> maps(24);
    int mapID;
    while (std::cin>> mapID) {
        point temp;
        std::cin >> temp;
        maps[mapID].push_back(temp);
    }

    for (int i=0; i<maps.size(); i++)
        for (int j=0; j<maps[j].size(); j++) 
            for (int k=j; k<maps[j].size(); k++) 
                if (dist(maps[i][j], maps[i][k]) < 60)
                    std::cout 
                        << "map " << i << "\t" 
                        << maps[i][j].id 
                        << " overlaps with " 
                        << maps[i][k].id << "\n";
    return 0;
}

I haven't actually tested this, so it might take 5 to 10 more minutes (or something on that order) to make sure it works right, but I wouldn't expect anything a lot longer. At any rate, I'd expect one person to finish in well under an hour. If you were accustomed to something like AWK or Perl, I'd expect to finish a bit faster still -- though, in all honesty, this is sufficiently trivial that it mostly comes down to reduced typing...

Jerry Coffin
Beat me to it...
Drew Hall
indeed, this is really quite boring of a problem
Ron Warholic
Me too. ;) Was going to link to http://www.netsoc.tcd.ie/~jgilbert/maths_site/applets/circles/overlapping_circles_and_lines.html as an example.
Yoopergeek
It's easier to calculate distance squared. Calculating distance from Cartesian coordinates requires a square root.
David Thornley
@Jerry, I just clarified my question a little bit. I'm interested in whether it would have been better to program this or do it by hand (as we did). You've solution checks if two circles overlap, which is great, but I'm curious how long it takes to write code that reads in a file, checks *all* maps, and outputs results.
DanM
+4  A: 

My first thought is the simple O(N2) algorithm Jerry posted already, with one small exception: distance2 is easier to calculate, since it doesn't require a square root, so compare that to (sum of radii)2.

My second thought is to enhance that solution with a quadtree containing all your points, which helps cut down on the number of comparisons which need to be made.

It took longer to write this answer than it did to think up of either of these solutions -- maybe 10 seconds. This kind of problem is very common in computer graphics, I'm just regurgitating what I've seen before.


A quickly-written implementation of solution 1 in Haskell (took under a minute):

import Control.Monad
import Data.List
radius = 30
main = do
    points <- fmap (map (map read . words) . lines) getContents
    mapM_ print $ overlapping (points :: [[Int]])
overlapping list = do
    [m1, s1, x1, y1]:rest <- tails list
    [m2, s2, x2, y2] <- rest
    guard $ m1 == m2 && (x2-x1)^2 + (y2-y1)^2 < 2*radius^2
    return ((m1, s1), (m2, s2))

With the problem as specified, that's less than 3e5 comparisons; it's probably not worth it to write a quadtree insertion/search. Even if a comparison took 3000 clock cycles (which it obviously wouldn't), it would still finish in under a second.

ephemient
That code is 365 characters long. If we take "under a minute" to mean, say, 50 seconds, that works out to typing at just under 90 words per minute. That's well into the range of a really good typist doing transcribing -- just typing in what they read on a piece of paper (with no formatting). I'd have to see it in person before I believed that anybody, for even the most trivial task, can manage to type code in at almost 90 WPM.
Jerry Coffin
Thinking time not included. First I ran some trials in an interpreter, then opened up a text editor and banged this in.
ephemient
A: 

You might also try "Sweep and prune" if you are looking for ways to speed up your answer.

EToreo
+1  A: 

Here's something faster than O(n^2) where n = number of symbols. Use a matrix of dynamic arrays like STL vector to represent each pixel in your image. Compute the pixels occupied by each circle and push its SymbolIDinto the array of each pixel in the circle. Once you're done with all the circles, just look at each pixel and if any pixel has more than one symbol in its array, you know which SymbolIDs are in violation.

Time taken: ~5 mins

UPDATE: With the new requirements, my initial algorithm can be slightly modified to read in each row, update the pixel-SymbolID map I've defined above for each symbol and move on to the next map.

Jacob
+2  A: 

Assuming SQL powers in your spreadsheet software, I'd run it through a query:

select 
   s1.symbolId,s2.symbolId
from 
   symbols s1 
   join 
   symbols s2 
where 
   s1.mapid=s2.mapid 
   and 
   s1.symbolid<s2.symbolid 
   and 
   ((s1.xcoordinate-s2.xcoordinate)*
   (s1.xcoordinate-s2.xcoordinate)+
   (s1.ycoordinate-s2.ycoordinate)*
   (s1.ycoordinate-s2.ycoordinate))
   <(r+r)*(r+r)

(about 5 mins probably with a few mistakes)

spender
+1, you can definitely run a SQL query on a text file, so this is a nice solution. I didn't check for mistakes, but I think it could be debugged pretty fast. I'd give you another +1 for reading the question properly if I could :)
DanM