views:

107

answers:

6

I have a table that looks like this:

       <22  23-27   
8-10   1.3   1.8
11-13  2.2   2.8
14-16  3.2   3.8

and it goes on. So I'd like to lookup a value like this:

lookup(11,25)

and get the response, in this case 2.8. What is the best data structure to use for this? I have the data in CSV format.

I'm looking to program this in PHP.

Thank you.

+1  A: 

How about some kind of two dimensional data structure.

X "coordinates" being <22, 23-27
Y "coordinates" being ...

A two dimensional Array would probably work for this purpose.

You will then need some function to map the specific X and Y values to the ranges, but that should not be too hard.

adamse
A: 

the simplest option: create array of arrays, where each array consists of 5 elements: minX, maxX, minY, maxY, value, in your case it would be

$data = array(
      array(8, 10, 0, 22, 1.3),
      array(8, 10, 23, 27, 1.8),
      array(11, 13, 0, 22, 2.2), etc

write a loop that goes through every element and compares min & max values with your arguments:

 function find($x, $y) {
      foreach($data as $e) {
         if($x <= $e[0] && $x >= $e[1] && $y <= $e[2] && $y >= $e[3])
              return $e[4];
 }

with a small dataset this will work fine, if your dataset is bigger you should consider using a database.

stereofrog
+2  A: 

I'm certainly not claiming this is the best or most efficient data structure, but this is how I'd map your data into a two-dimensional PHP array that very closely resembles your raw data:

$fp = fopen('data.csv', 'r');
$cols = fgetcsv($fp);
array_shift($cols); // remove empty first item
$data = array();
while ($row = fgetcsv($fp)) {
  list($min, $max) = explode('-', $row[0]);
  // TODO: Handle non-range values here (e.g. column header "<22")
  $data["$min-$max"] = array();
  for ($x = 0; $x < count($cols); $x++) {
    $data["$min-$max"][$cols[$x]] = $row[$x + 1];
  }
}

You'd then need to add some parsing logic in your lookup function:

function lookup($row, $col) {
  $return = null;
  // Loop through all rows
  foreach ($data as $row_name => $cols) {
    list($min, $max) = explode('-', $row_name);
    if ($min <= $row && $max >= $row) {
      // If row matches, loop through columns
      foreach ($cols as $col_name => $value) {
        // TODO: Add support for "<22"
        list($min, $max) = explode('-', $col_name);
        if ($min <= $col && $max >= $col) {
          $return = $value;
          break;
        }
      }
      break;
    }
  }
  return $return;
}
pix0r
I think this will be perfect. Probably should have mentioned that efficiency wasn't necessary for this, so that isn't a concern anyway. Thanks for the detailed code!
Jon
+1  A: 

Database structure:

values
------
value
x_range_start
x_range_end
y_range_start
y_range_end

Code:

function lookup(x, y) {
    sql = "
     SELECT * FROM values
     WHERE
      x >= x_range_start
      AND
      x <= x_range_end

      AND
      y >= y_range_start
      AND
      y <= y_range_end
    "

    /---/
}

Your data would map to the database like so:

      <22  23-27   
8-10   1.3   1.8
11-13  2.2   2.8
14-16  3.2   3.8

(value, x start, x end, y start, y end)
1.3, 0, 22, 8, 10
1.8, 23, 27, 8, 10
2.2, 0, 22, 11, 13
...

Basically store the x and y axis start and end numbers for each value in the table.

Joel L
A: 

I'm partial to the 2 Dimensional array with a "hash" function that maps the ranges into specific addresses in the table.

So your underlying data structure would be a 2 dimensional array:

    0     1   
0  1.3   1.8
1  2.2   2.8
2  3.2   3.8

Then you would write two functions:

int xhash(int);
int yhash(int);

That take the original arguments and convert them into indexes into your array. So xhash performs the conversion:

8-10    0
11-13   1
14-16   2

Finally, your lookup operation becomes.

function lookup($x, $y)
{
  $xIndex = xhash($x);
  $yIndex = yhash($y);
  // Handle invalid indices!

  return $data[$xIndex][$yIndex];
}
Noah Callaway
A: 

Well, the other answers all use 2D arrays, which means using a 2D loop to retrieve it. Which, if your ranges are age ranges or something similar, may be finite (there are only so many age ranges!), and not an issue (what's a few hundred iterations?). If your ranges are expected to scale to enormous numbers, a play on a hash map may be your best bet. So, you create a hashing function that turns any number into the relevant range, then you do direct lookups, instead of a loop. It would be O(1) access instead of O(n^2).

So your hash function could be like: function hash(n) { if (n < 22) return 1; if (n < 25) return 2; return -1; }, and then you can specify your ranges in terms of those hash values (1, 2, etc.), and then just go $data[hash(11)][hash(25)]

Nathan