views: 246
answers: 4

Let's assume that I've got a 2D array like:

int[,] my_array = new int[100, 100];

The array is filled with ints. What would be the quickest way to check whether a target value is contained within the array?

(This is not homework; I'm trying to come up with the most efficient solution for this case.)

+13  A: 

If the array isn't sorted in some fashion, I don't see how anything would be faster than checking every single value using two for statements. If it is sorted you can use a binary search.
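
For the sorted case, a minimal sketch (assuming the 100x100 array is sorted in row-major order, which the question does not state) would treat it as one flat sorted sequence and binary search it:

private static bool ContainsSorted(int[,] data, int value)
{
    // Assumes 'data' is sorted in row-major order (an assumption, not given in the question).
    int rows = data.GetLength(0);
    int cols = data.GetLength(1);
    int lo = 0, hi = rows * cols - 1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        int current = data[mid / cols, mid % cols]; // map the flat index back to [row, col]

        if (current == value)
            return true;
        if (current < value)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}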

Edit: If you need to do this repeatedly, your approach would depend on the data. If the integers within this array only range up to 256, you can use a boolean array of that length and go through the values in your data, flipping the bits inside the boolean array. If the integers can range higher, you can use a HashSet. The first call to your contains function would be a little slow because it would have to index the data, but subsequent calls would be O(1).
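
As a rough sketch of the boolean-array idea (assuming every value is known to fall in 0..255; the bound and the method name are illustrative):

private static bool ContainsSmallRange(int[,] data, int value)
{
    // Sketch only: assumes every value in 'data' lies in the range 0..255.
    bool[] seen = new bool[256];

    foreach (int item in data)   // foreach over a 2D array visits every element
        seen[item] = true;

    return value >= 0 && value < seen.Length && seen[value];
}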

Edit1:

This will index the data on the first run; benchmarking found that Contains takes 0 milliseconds to run after the first run, and 13 milliseconds to build the index. If I had more time I might multithread it and have it return the result while asynchronously continuing to index on the first call. Also, since arrays are reference types, changing the data passed in, before or after it has been indexed, will produce strange behavior, so this is just a sample and should be refactored prior to use.

private class DataContainer
{
    private readonly int[,] _data;
    private HashSet<int> _index;

    public DataContainer(int[,] data)
    {
        _data = data;
    }

    public bool Contains(int value)
    {
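        // Build the HashSet index lazily on the first lookup (not thread-safe).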

        if (_index == null)
        {
            _index = new HashSet<int>();
            for (int i = 0; i < _data.GetLength(0); i++)
            {
                for (int j = 0; j < _data.GetLength(1); j++)
                {
                    _index.Add(_data[i, j]);
                }
            }
        }
        return _index.Contains(value);
    }
}
Yuriy Faktorovich
Assuming the array is unsorted, parallelizing the search might make it faster, but it's still O(n).
Winston Smith
@Winston: With such a small array, making sure the threads fall on different processors... I'm not sure the overhead would make it worth it.
Yuriy Faktorovich
+1  A: 

Create a hash out of the 2D array, where

  • 1 --> 1st row
  • 2 --> 2nd row
  • ...
  • n --> nth row

O(n) to check the presence of a given element, assuming each hash check gives O(1).

This data structure gives you an opportunity to preserve your 2d array.

Update: ignore the above, it does not add any value. See the comments.

D_K
elaborate please?
Maciek
int[,] my_array = new int[100, 100];
// my_array --> my_array_int_map
map<int, vector<int>> my_array_int_map;
for (int i = 0; i < 100; i++)
    for (int j = 0; j < 100; j++)
        my_array_int_map[i].push_back(my_array[i][j]);
Now you have your 2d array copied to a std::map, which emulates a hashtable. Each key is the 2d array's row number, and the value is a vector of the elements in that row.
D_K
sorry, the lines got screwed up; fortunately there are only 6 lines
D_K
I think you're thinking of the Hashtable in C#. This may give you a benefit over brute forcing it (O(n)) if you are going to be making a lot of searches. But I'd like to see some benchmarks.
Yuriy Faktorovich
@Yuriy Faktorovich: good point. You got my reasoning right: the hashtable with O(n) lookups may on average outperform the O(n^2) brute force. OTOH, for a small n (like n=100) the performance gain may be insignificant at the cost of extra memory. I think it is up to the author's task definition and performance requirements.
D_K
@D_K, both brute force and hashset are O(n).
leppie
@leppie: once the HashSet is filled, Contains is O(1).
Yuriy Faktorovich
@Yuriy Faktorovich, @leppie ignore the code I gave above, it is wrong. What I would do instead is a hashtable with the elements of the 2D matrix as keys and a vector of their positions in the 2D array as values. But this data structure must be maintained alongside the 2D array. So again, extra space.
D_K
@D_K: why? A simple HashSet in C# would just contain the values and have O(1) lookups. You're trying to answer for C++.
Yuriy Faktorovich
@Yuriy Faktorovich, let's abstract away from a particular language. The idea of a hash table keyed by the 2D elements is to get O(1) for a presence check and, if required, the element's position(s) in the original 2D array. As it is a slightly artificial task (homework), your hashset is perfectly sufficient.
D_K
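
A rough sketch of that value-to-positions index in C# (the names and the flat-index layout here are illustrative, not from the thread):

using System.Collections.Generic;

// Sketch: maps each value to the flat indices (row * cols + col) where it occurs.
// Must be kept in sync with the 2D array if the array changes.
private static Dictionary<int, List<int>> BuildPositionIndex(int[,] data)
{
    int cols = data.GetLength(1);
    var index = new Dictionary<int, List<int>>();

    for (int i = 0; i < data.GetLength(0); i++)
    {
        for (int j = 0; j < cols; j++)
        {
            int value = data[i, j];
            List<int> positions;
            if (!index.TryGetValue(value, out positions))
            {
                positions = new List<int>();
                index[value] = positions;
            }
            positions.Add(i * cols + j); // row = idx / cols, col = idx % cols
        }
    }
    return index;
}
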
@D_K: Ok, I've written the method to index the data the first time Contains is called. And to answer your next question, yes I'm bored and my gf isn't paying attention to me.
Yuriy Faktorovich
@Yuriy Faktorovich sometimes it happens. If gf does not pay attention to you, mb it's time to find another. I'm sarcastic here :)
D_K
+1  A: 

Assumptions:

  • There is no kind of ordering in the array that we can take advantage of
  • You are going to check for existence in the array several times

I think some kind of index might work nicely if you want a yes/no answer for whether a given number is in the array. A hash table could be used for this, giving you constant-time O(1) lookups.

Also, don't forget that realistically, for small MxN array sizes, it might actually be faster just to do a linear O(n) scan.
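
For reference, the straightforward linear scan looks like this (a minimal sketch; the method name is illustrative):

private static bool ContainsLinear(int[,] data, int value)
{
    // Brute-force scan over every element: O(M*N), but with no setup cost or extra memory.
    foreach (int item in data)
    {
        if (item == value)
            return true;
    }
    return false;
}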

csl
A: 

You could encapsulate the data itself, and keep a Dictionary along with it that gets modified as the data gets modified.

The key of the Dictionary would be the target element value, and the value would be the number of occurrences of that element. To test whether an element exists, simply check the dictionary for a count > 0, which is O(1) on average (worst case O(n) with heavy hash collisions). You could also get other statistics on the data much more quickly with this construct, particularly if the data is sparse.

The biggest drawback to this solution is that data modifications have more operations involved (still should be O(1), though), so if you're mostly doing data manipulation, then this might not be suitable.
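
A minimal sketch of that encapsulation (the class and member names are illustrative, not from the answer):

using System.Collections.Generic;

// Sketch: wraps the 2D array and keeps a value -> occurrence-count dictionary in sync.
public class CountedGrid
{
    private readonly int[,] _data;
    private readonly Dictionary<int, int> _counts = new Dictionary<int, int>();

    public CountedGrid(int[,] data)
    {
        _data = data;
        foreach (int item in data)
            Increment(item);
    }

    public bool Contains(int value)
    {
        int count;
        return _counts.TryGetValue(value, out count) && count > 0;
    }

    // All writes go through here so the dictionary stays consistent with the array.
    public void Set(int row, int col, int value)
    {
        Decrement(_data[row, col]);
        _data[row, col] = value;
        Increment(value);
    }

    private void Increment(int value)
    {
        int count;
        _counts.TryGetValue(value, out count);
        _counts[value] = count + 1;
    }

    private void Decrement(int value)
    {
        _counts[value] = _counts[value] - 1;
    }
}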

Jon Seigel