views:

202

answers:

4

I have a list of phone numbers that have been dialed (nums_dialed). I also have a set of phone numbers which are the number in a client's office (client_nums) How do I efficiently figure out how many times I've called a particular client (total)

For example:

>>>nums_dialed=[1,2,2,3,3]
>>>client_nums=set([2,3])
>>>???
total=4

Problem is that I have a large-ish dataset: len(client_nums) ~ 10^5; and len(nums_dialed) ~10^3.

+1  A: 
>>> client_nums = set([2, 3])
>>> nums_dialed = [1, 2, 2, 3, 3]
>>> count = 0
>>> for num in nums_dialed:
...   if num in client_nums:
...     count += 1
... 
>>> count
4
>>> 

Should be quite efficient even for the large numbers you quote.

Eli Bendersky
+6  A: 

which client has 10^5 numbers in his office? Do you do work for an entire telephone company?

Anyway:

print sum(1 for num in nums_dialed if num in client_nums)

That will give you as fast as possible the number.


If you want to do it for multiple clients, using the same nums_dialed list, then you could cache the data on each number first:

nums_dialed_dict = collections.defaultdict(int)
for num in nums_dialed:
    nums_dialed_dict[num] += 1

Then just sum the ones on each client:

sum(nums_dialed_dict[num] for num in this_client_nums)

That would be a lot quicker than iterating over the entire list of numbers again for each client.

nosklo
or simply: len(filter(lambda x: x in client_nums, nums_dialed))
ony
@ony: that's not simpler. And it is slower and uses more memory, because it has to build an entire new list before calculating the `len`. list comprehensions and generator expressions are preferable to using `filter` and `map` for most cases, specially when you need to create and call a new function to use them.
nosklo
@nosklo: sorry, my mistake. yes "filter" and "map" in python2 uses lists, but that will be very strange if "len" in python3 will force list building.
ony
even shorter: `print sum(num in client_nums for num in nums_dialed)`
newacct
@ony: yes. It would be very strange if `len` forced list building. Instead, it fails with `TypeError` on any generators, on both python 2 and 3. Including `itertools.imap` and `itertools.ifilter` on python 2, and `map` and `filter` on python 3. Those just aren't the right tool for this job.
nosklo
@newacct: Yes, this works, but in this case I don't feel *shorter* is better - using `1` makes it clear about what's on the `sum`, while relying on the fact that `True` is an integer seems hackish.
nosklo
A: 

Thats very popular way to do some combination of sorted lists in single pass:

nums_dialed = [1, 2, 2, 3, 3]
client_nums = [2,3]

nums_dialed.sort()
client_nums.sort()

c = 0
i = iter(nums_dialed)
j = iter(client_nums)
try:
    a = i.next()
    b = j.next()
    while True:
        if a < b:
            a = i.next()
            continue
        if a > b:
            b = j.next()
            continue
        # a == b
        c += 1
        a = i.next() # next dialed
except StopIteration:
    pass

print c

Because "set" is unordered collection (don't know why it uses hashes, but not binary tree or sorted list) and it's not fair to use it there. You can implement own "set" through "bisect" if you like lists or through something more complicated that will produce ordered iterator.

ony
+1  A: 

Using collections.Counter from Python 2.7:

dialed_count = collections.Counter(nums_dialed)
count = sum(dialed_count[t] for t in client_nums)
Paul Hankin