ansaurus

Question

Nested Lists in Dict : Accessing members of list within the list of a dictionary

Answer 1

A:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> cat = [[1,1],[1,1],[1,1],[3,1],[3,1]]
>>> [(k,len(list(v))) for k, v in groupby(cat,itemgetter(0))]
[(1, 3), (3, 2)]

will fix your code, but it doesn't address why the code is doing the wrong thing in the first place. A simpler approach is to use the collections.Counter class, which does the counting for you if you just feed it a list of words.

>>> from collections import Counter
>>> words = "Lorem ipsum dolor sit ames, lorem ipsum dolor sit ames.".split(" ")
>>> Counter(words)
Counter({'ipsum': 2, 'sit': 2, 'dolor': 2, 'lorem': 1, 'ames.': 1, 'ames,': 1, 'Lorem': 1})
>>> Counter(map(str.lower, words))
Counter({'ipsum': 2, 'sit': 2, 'dolor': 2, 'lorem': 2, 'ames.': 1, 'ames,': 1})
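One caveat with the groupby version is that groupby only merges adjacent items with the same key, so it relies on the list already being ordered by document number. A minimal sketch of the difference, assuming the same cat entries arrive unsorted (the names unsorted_counts and sorted_counts are just for illustration):

```python
from itertools import groupby
from operator import itemgetter

# Same entries as the example list, but deliberately out of order.
cat = [[3, 1], [1, 1], [3, 1], [1, 1], [1, 1]]

# Without sorting, each run of equal keys becomes its own group.
unsorted_counts = [(k, len(list(v))) for k, v in groupby(cat, itemgetter(0))]

# Sorting by the same key first gives one group per document number.
ordered = sorted(cat, key=itemgetter(0))
sorted_counts = [(k, len(list(v))) for k, v in groupby(ordered, itemgetter(0))]

print(unsorted_counts)  # [(3, 1), (1, 1), (3, 1), (1, 2)]
print(sorted_counts)    # [(1, 3), (3, 2)]
```

If the entries are appended in document order, as they appear to be here, the sort is unnecessary.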

katrielalex 2010-10-05 07:47:19

Answer 2

A:

Final counts:

{'cat':[[1,3], [3,2]]}

Words in current document:

{'cat':3}

I like that you have chosen to use defaultdict. It makes the following possible and is faster than looping through the keys.

from collections import defaultdict
all_word_counts = defaultdict(list)
all_word_counts['cat'].append([1, 3])

First, count the word frequency in a given document:

word_count = defaultdict(int)  # reset for each document
for term in self.tokenize(lines):
    word_count[term] += 1

Before moving on to the next document, update all_word_counts:

for word, count in word_count.items():  # items() works on Python 2 and 3
    all_word_counts[word].append([docnumber, count])
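Put together, the two steps above might look like the following sketch. The tokenize function here is a hypothetical stand-in for the asker's self.tokenize, and build_index and the sample documents are made up for illustration, so treat this as one possible arrangement rather than the original code.

```python
from collections import defaultdict

def tokenize(text):
    # Hypothetical stand-in for self.tokenize: lowercase, split on whitespace.
    return text.lower().split()

def build_index(documents):
    """Map each word to a list of [docnumber, count] pairs."""
    all_word_counts = defaultdict(list)
    for docnumber, text in enumerate(documents, start=1):
        word_count = defaultdict(int)  # reset for each document
        for term in tokenize(text):
            word_count[term] += 1
        # Record this document's counts before moving to the next one.
        for word, count in word_count.items():
            all_word_counts[word].append([docnumber, count])
    return all_word_counts

# Made-up sample documents; document 2 does not contain "cat".
documents = ["cat cat cat dog", "dog", "cat cat"]
index = build_index(documents)
print(index["cat"])  # [[1, 3], [3, 2]]
print(index["dog"])  # [[1, 1], [2, 1]]
```

Documents that do not contain a word simply add no entry for it, which is why 'cat' has pairs only for documents 1 and 3.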

kevpie 2010-10-10 18:12:17
