ansaurus

Question

Analyse Python list with algorithm for counting occurrences over date ranges

Answer 1

+4 A:

Your result is a dictionary, right?

{ userNumber: setOfDays }

How about this to get started?

from collections import defaultdict
visits = defaultdict(set)
for user, date in someList:
    visits[user].add(date)

This gives you a dictionary mapping each user to the set of dates on which they visited.

counts = defaultdict(int)
for user in visits:
    v = len(visits[user])
    counts[v] += 1

This gives you a dictionary mapping a number of visits to the number of users with that many visits.
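The two steps above can be put together into one small runnable sketch. The sample data and the name `some_list` are made up for illustration, and `collections.Counter` is swapped in for the second `defaultdict(int)` loop, so treat this as one possible reading of the approach rather than the exact code from the answer (Python 3 assumed).

```python
from collections import Counter, defaultdict

# Hypothetical sample log of (user, date) pairs; duplicates are allowed.
some_list = [
    (1, "2008-12-01"),
    (1, "2008-12-01"),
    (1, "2008-12-02"),
    (2, "2008-12-01"),
    (3, "2008-12-04"),
]

# Step 1: collect the distinct dates on which each user visited.
visits = defaultdict(set)
for user, date in some_list:
    visits[user].add(date)

# Step 2: count how many users have each number of distinct visit days.
counts = Counter(len(days) for days in visits.values())

print(dict(counts))
```

Because `visits` stores sets, the repeated entry for user 1 on 2008-12-01 is counted only once.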

Is that the kind of thing you're looking for?

S.Lott 2009-01-11 20:56:29

You declared the first dictionary as 'visits', so you mean:

for user in visits:
    visit = len(visits[user])
    counts[visit] += 1

or even:

for visit in visits.itervalues():
    counts[len(visit)] += 1

hughdbrown 2009-01-12 01:00:17

Answer 2

A:

First, I should mention that you need to store each date as a string. Written unquoted, 2008-12-01 is parsed as a subtraction expression rather than a date. So, if you format the data like this, it will work better:

data = [
  [1,"2008-12-01"],
  [1,"2008-12-01"],
  [2,"2008-12-01"]
]
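Since the question is about date ranges, the string dates can also be converted to `datetime.date` objects when a comparison is needed. The sketch below is only an illustration: the helper `parse_day` and the `start`/`end` bounds are hypothetical names and values, not part of the original answer (Python 3 assumed).

```python
from datetime import date, datetime

data = [
    [1, "2008-12-01"],
    [1, "2008-12-01"],
    [2, "2008-12-01"],
]


def parse_day(text):
    """Convert a 'YYYY-MM-DD' string into a datetime.date."""
    return datetime.strptime(text, "%Y-%m-%d").date()


# Hypothetical range bounds, chosen only for illustration.
start, end = date(2008, 12, 1), date(2008, 12, 7)

# Keep the entries whose date falls inside the inclusive range.
in_range = [(user, day) for user, day in data if start <= parse_day(day) <= end]
print(len(in_range))
```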

Next, we can do something like this to get the number for each day:

result = {}
for (id, date) in data:
    if date not in result:
        result[date] = 1
    else:
        result[date] += 1

Now you can get the number of visits logged on a specific date (repeat entries by the same user are each counted) by doing something like this:

print result[some_date]

Evan Fosmark 2009-01-11 21:11:42

This should use collections.defaultdict, as S.Lott's code does. It's a class that simplifies this kind of dictionary addition.
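As a rough sketch of that suggestion, the per-date counting loop might look like this with `defaultdict(int)`. The variable name `user_id` is an assumption used here to avoid shadowing the built-in `id` (Python 3 assumed).

```python
from collections import defaultdict

data = [
    [1, "2008-12-01"],
    [1, "2008-12-01"],
    [2, "2008-12-01"],
]

# Missing keys start at int() == 0, so no membership test is needed.
result = defaultdict(int)
for user_id, date in data:
    result[date] += 1

print(result["2008-12-01"])
```

Note that this still counts log entries per date, not distinct users per date.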

hughdbrown 2009-01-12 01:01:11

Answer 3

A:

It is unclear what exactly your requirements are. Here's my take:

#!/usr/bin/env python
from collections import defaultdict

data = [ 
  [1,'2008-12-01'],
  [3,'2008-12-25'],
  [1,'2008-12-01'],
  [2,'2008-12-01'],
]

d = defaultdict(set)
for id, day in data:
    d[day].add(id)

for day in sorted(d):
    print('%d user(s) visited on %s' % (len(d[day]), day))

It prints:

2 user(s) visited on 2008-12-01
1 user(s) visited on 2008-12-25

J.F. Sebastian 2009-01-11 23:39:10

This gives how many people visited on given days because you are grouping by date. He wants how many people visited x times. He needs grouping by user id, not date.

hughdbrown 2009-01-12 01:03:14

Answer 4

A:

How about this? It gives you the set of days as well as the count:

In [39]: from itertools import groupby ##itertools is a part of the standard library.

In [40]: l=[[1, '2008-12-01'],
   ....:  [1, '2008-12-01'],
   ....:  [2, '2008-12-01'],
   ....:  [1, '2008-12-01'],
   ....:  [3, '3008-12-04']]

In [41]: l.sort()

In [42]: l
Out[42]: 
[[1, '2008-12-01'],
 [1, '2008-12-01'],
 [1, '2008-12-01'],
 [2, '2008-12-01'],
 [3, '3008-12-04']]

In [43]: for key, group in groupby(l, lambda x: x[0]):
   ....:     group=list(group)
   ....:     print key,' :: ', len(group), ' :: ', group
   ....:     
   ....:     
1  ::  3  ::  [[1, '2008-12-01'], [1, '2008-12-01'], [1, '2008-12-01']]
2  ::  1  ::  [[2, '2008-12-01']]
3  ::  1  ::  [[3, '3008-12-04']]

user::number of visits :: visit dates

Here user 1 visits on 2008-12-01 three times. If you want to count only distinct dates, then:

for key, group in groupby(l, lambda x: x[0]):
    group=list(group)
    print key,' :: ', len(set(each[1] for each in group)), ' :: ', group

1  ::  1  ::  [[1, '2008-12-01'], [1, '2008-12-01'], [1, '2008-12-01']]
2  ::  1  ::  [[2, '2008-12-01']]
3  ::  1  ::  [[3, '3008-12-04']]
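For readers on Python 3, the same grouping idea could be sketched as follows. The names `log` and `summary` are illustrative, `operator.itemgetter` replaces the lambdas, and the sample rows are copied from the session above as-is.

```python
from itertools import groupby
from operator import itemgetter

log = [
    [1, "2008-12-01"],
    [1, "2008-12-01"],
    [2, "2008-12-01"],
    [1, "2008-12-01"],
    [3, "3008-12-04"],
]

# groupby only merges adjacent items, so sort by user id first.
log.sort(key=itemgetter(0))

summary = {}  # user -> (total visits, distinct visit dates)
for user, group in groupby(log, key=itemgetter(0)):
    rows = list(group)
    distinct_days = {day for _, day in rows}
    summary[user] = (len(rows), len(distinct_days))
    print(user, "::", len(rows), "::", len(distinct_days))
```

Sorting is the step that is easy to forget: without it, `groupby` would start a new group every time the user id changes between neighbouring rows.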

JV 2009-01-12 03:13:49

Answer 5

+1 A:

Rewriting S.Lott's answer in SQL as an exercise, just to check that I got the requirements right...

SELECT * FROM someList;

 userid |    date    
--------+------------
      1 | 2008-12-01
      1 | 2008-12-02
      1 | 2008-12-03
      1 | 2008-12-04
      1 | 2008-12-05
      2 | 2008-12-03
      2 | 2008-12-04
      2 | 2008-12-05
      3 | 2008-12-04
      4 | 2008-12-04
      5 | 2008-12-05
      5 | 2008-12-05

SELECT countdates, COUNT(userid) AS nusers
FROM ( SELECT userid, COUNT (DISTINCT date) AS countdates
             FROM someList
             GROUP BY userid ) AS visits
GROUP BY countdates
HAVING countdates <= 25
ORDER BY countdates;

 countdates | nusers 
------------+--------
          1 |      3
          3 |      1
          5 |      1
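To try the query without a database server, one option is Python's built-in `sqlite3` module with an in-memory database. SQLite is an assumption here (the answer does not say which database was used), and the column types in the `CREATE TABLE` statement are guesses made for this sketch.

```python
import sqlite3

rows = [
    (1, "2008-12-01"), (1, "2008-12-02"), (1, "2008-12-03"),
    (1, "2008-12-04"), (1, "2008-12-05"),
    (2, "2008-12-03"), (2, "2008-12-04"), (2, "2008-12-05"),
    (3, "2008-12-04"), (4, "2008-12-04"),
    (5, "2008-12-05"), (5, "2008-12-05"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: the original answer only shows the table contents.
conn.execute("CREATE TABLE someList (userid INTEGER, date TEXT)")
conn.executemany("INSERT INTO someList VALUES (?, ?)", rows)

query = """
SELECT countdates, COUNT(userid) AS nusers
FROM ( SELECT userid, COUNT(DISTINCT date) AS countdates
       FROM someList
       GROUP BY userid ) AS visits
GROUP BY countdates
HAVING countdates <= 25
ORDER BY countdates
"""
result = conn.execute(query).fetchall()
conn.close()

print(result)
```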

Federico Ramponi 2009-01-12 03:18:18

Answer 6

+1 A:

This is probably not the most pythonic or efficient or smartest or whatever way of doing this. But maybe you can confirm if I've understood the requirements correctly:

>>> log=[[1, '2008-12-01'], [1, '2008-12-01'],[2, '2008-12-01'],[2, '2008-12-03'], [1, '2008-12-04'], [3, '2008-12-04'], [4, '2008-12-04']]
>>> all_dates = sorted(set([d for d in [x[1] for x in log]]))
>>> for i in range(0, len(all_dates)):
...     log_slice = [d for d in log if d[1] <= all_dates[i]]
...     num_users = len(set([u for u in [x[0] for x in log_slice]]))
...     print "%d users visited in %d days" % (num_users, i + 1)
... 
2 users visited in 1 days
2 users visited in 2 days
4 users visited in 3 days
>>>
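A Python 3 sketch of the same cumulative count is shown below. It walks the distinct dates once while keeping a running set of users; the names `seen_users` and `cumulative` are made up for illustration. As in the session above, "days" means distinct dates present in the log, not calendar days.

```python
log = [
    [1, "2008-12-01"], [1, "2008-12-01"], [2, "2008-12-01"],
    [2, "2008-12-03"], [1, "2008-12-04"], [3, "2008-12-04"],
    [4, "2008-12-04"],
]

seen_users = set()
cumulative = []  # (distinct dates so far, distinct users so far)

# 'YYYY-MM-DD' strings sort chronologically as plain strings.
for n_days, day in enumerate(sorted({d for _, d in log}), start=1):
    seen_users.update(user for user, d in log if d == day)
    cumulative.append((n_days, len(seen_users)))

for n_days, n_users in cumulative:
    print("%d users visited in %d days" % (n_users, n_days))
```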

PEZ 2009-01-12 10:59:06
