views:

118

answers:

3

I am a bit at a loss how to do the following efficiently in Mathematica:

a = { 1, 2, 3, 4, 5 };  (* list of integers *)
b = { 2, 4, 6, 8 };     (* another list of integers *)
filter = Table[MemberQ[b, element], {element,a}]

Expected output is:

{False, True, False, True, False}

My lists a and b are big, so Mathematica is doing a kazillion linear searches through b. I want it to do faster lookups with a hashtable. But there seems to be no such structure. The closest I could find is a SparseArray, but

sa = SparseArray[{1 -> True, 2 -> True}];
MemberQ[sa, 1]

is False.

I'm sure this must be possible in Mathematica in one line of code or less, I just can't see it for the trees, or something.

Any hero to the rescue? Meanwhile, I'm going to do this with C#.

+3  A: 

The filter constructed by linear search can be simplified to:

MemberQ[b, #]& /@ a

Anyway, you could construct a sparse array from b by:

s = SparseArray[b -> ConstantArray[True, Length@b], Max[a, b], False];

then for indices i in the sparse array, s[[i]] will return True, and for those outside, s[[i]] will return False. So the list can be constructed with

s[[#]]& /@ a

We can compare the results:

In[130]:= alist = Prime /@ Range[2000];
          blist = alist + 2;

In[165]:= Timing[MemberQ[blist, #]& /@ alist;]

Out[165]= {0.91024,Null}

In[168]:= Timing[
           s = SparseArray[blist -> ConstantArray[True, Length@blist],
                           Max[alist, blist], False];
           s[[#]]& /@ alist;]

Out[168]= {0.017772,Null}
KennyTM
Regarding your first sentence, probably the asker just meant MemberQ[b, element] instead of MemberQ[element,b]. Not to say your version isn't better.
dreeves
@dreeves: Actually I am talking about the `Table[..., {element,a}]` part. The MemberQ arguments is a minor thing.
KennyTM
But there's nothing wrong with the Table[..., {element,a}] part (except perhaps that it's not as efficient/elegant as your Map-based version). To remove any ambiguity about this, I think I'll go ahead and fix the typo in the question so you can actually run the asker's example. [Done; added expected output too.]
dreeves
@dreeves Thanks for the clarification. I deleted my answer.
belisarius
@dreeves: Nevermind. It was the problem of my testing code which caused `Table[..., {element, a}]` to throw an error.
KennyTM
+6  A: 

The following question is equivalent to yours and contains an answer in Mathematica:

http://stackoverflow.com/questions/1954636/given-list-subset-of-list-return-bitmask

Here's that answer (which I'll feel free to steal since it's actually mine):

bitmask[a_, b_] := Module[{f},
  f[_] = False;
  (f[#] = True)& /@ b;
  f /@ a]

The idea is to make a hash table, f, that can answer in constant time whether a given element is a member of list b. First f is given a default value of False (if we haven't seen it before it's not in b). Then a single pass through the list b is made, setting f[i] to True for each i in b. That's all the set up. Now a single pass of the hash function f over the list a gives us the answer. Total time is O(n+m) -- one pass through each list.

dreeves
Thanks a million, works like a charm. Where do you learn the subtle nuances of pattern matching anyway? Like how do you know it's roughly O(1) time, etc. The only thing I really dislike about mathematica, is that even though it's supposed to be mathematicy, it almost never lists complexities.
Gleno
@Gleno I think your comment is a very good seed for another question :)
belisarius
@dreeves Very elegant! +1
belisarius
@Gleno: I'm don't think of this as involving subtleties of pattern matching but rather just following Mathematica's syntax for creating a hash function (aka dictionary). I added some exposition about the algorithm in case that's useful.
dreeves
There are some subtleties with function definitions for Module[] variables. I tend to use Dispatch[] to avoid these issues, or call Clear[f] explicitly to make sure everything gets cleaned up.
Joshua Martell
@Joshua: I'd love to learn more about that since this is something I blithely do all the time. The f in the Module is really something like f$246 -- unique to the Module -- so I'm having trouble imagining what could go wrong.
dreeves
I'm not able to reproduce now, but as I recall, it had something to do with setting downvalues of Module[] or Block[]'ed variables. It's possible it was fixed in V7.
Joshua Martell
+5  A: 

Another idea would be to use rules and Dispatch. It seems to be faster when the length of the blist is large:

alist = Prime /@ Range[20000];
blist = alist + 2;

membQ = First@Timing[MemberQ[blist,#]&/@alist;]

sparse = First@Timing[
    s = SparseArray[blist -> ConstantArray[True, Length@blist], Max[alist, blist], False];
    s[[#]] & /@ alist;
]

rules = First@Timing[
    bRules = Dispatch[Append[Rule[#, True] & /@ blist, _ -> False]];
    (# /. bRules) & /@ alist;
]

fct = First@Timing[
    f[_] = False;
    (f[#] = True) & /@ blist;
    f /@ alist;
]

On my laptop, rules (0.06s) < fct (0.09s) < sparse (0.9s) < membQ (40s).

Edit / Comparing fct and rules timings

@Karsten please feel free to rollback to your original answer

Tables generated with Prime[...]

alt text

Tables generated with RandomInteger[...]

alt text

Karsten W.
Thanks for doing the timing comparison. I think you deserve Accepted Answer for that even if you didn't also have the fastest answer. But about that, the .06 vs .09 is unconvincing -- want to try it with order of magnitude bigger input to make sure it holds up?
dreeves
Which one is better, seems to depend on alist and blist, too. With alist and blist set to RandomInteger[**100000**, 800000], the Dispatch solution is with 5.2s better than the functional solution (5.4), while on RandomInteger[**10000**, 800000] the functional solution is faster (5.3s vs. 5.7s). The sparse solution emits error messages, which I did not look into..
Karsten W.
I re-awarded accepted answer to you, because dreeves request it. Thanks for timing comparisons, I feel somewhat uneasy wasting people's time like that. :)
Gleno
@Karsten: Interesting, seems like sparse and functional are rather neck and neck though it looks like functional is simpler, especially if we wanted to figure out how to suppress sparse's error messages.
dreeves
@Gleno: Certainly don't feel uneasy. You asked an awesome question that inspired people to do some great work that will help the whole community. You're living the StackOverflow dream! :)
dreeves
@dreeves @Karsten .. added two comparative plots
belisarius
@belisarius: looks good! @Gleno: it's fun!
Karsten W.