mydata = [{'date': datetime.datetime(2009, 1, 31, 0, 0), 'value': 14, 'year': u'2009'},
           {'date': datetime.datetime(2009, 2, 28, 0, 0), 'value': 84, 'year': u'2009'},
           {'date': datetime.datetime(2009, 3, 31, 0, 0), 'value': 77, 'year': u'2009'},
           {'date': datetime.datetime(2009, 4, 30, 0, 0), 'value': 80, 'year': u'2009'},
           {'date': datetime.datetime(2009, 5, 31, 0, 0), 'value': 6, 'year': u'2009'},
           {'date': datetime.datetime(2009, 6, 30, 0, 0), 'value': 16, 'year': u'2009'},
           {'date': datetime.datetime(2009, 7, 31, 0, 0), 'value': 16, 'year': u'2009'},
           {'date': datetime.datetime(2009, 8, 31, 0, 0), 'value': 1, 'year': u'2009'},
           {'date': datetime.datetime(2009, 9, 30, 0, 0), 'value': 9, 'year': u'2009'},
           {'date': datetime.datetime(2008, 1, 31, 0, 0), 'value': 77, 'year': u'2008'},
           {'date': datetime.datetime(2008, 2, 29, 0, 0), 'value': 60, 'year': u'2008'},
           {'date': datetime.datetime(2008, 3, 31, 0, 0), 'value': 28, 'year': u'2008'},
           {'date': datetime.datetime(2008, 4, 30, 0, 0), 'value': 9, 'year': u'2008'},
           {'date': datetime.datetime(2008, 5, 31, 0, 0), 'value': 74, 'year': u'2008'},
           {'date': datetime.datetime(2008, 6, 30, 0, 0), 'value': 70, 'year': u'2008'},
           {'date': datetime.datetime(2008, 7, 31, 0, 0), 'value': 75, 'year': u'2008'},
           {'date': datetime.datetime(2008, 8, 31, 0, 0), 'value': 7, 'year': u'2008'},
           {'date': datetime.datetime(2008, 9, 30, 0, 0), 'value': 10, 'year': u'2008'},
           {'date': datetime.datetime(2008, 10, 31, 0, 0), 'value': 54, 'year': u'2008'},
           {'date': datetime.datetime(2008, 11, 30, 0, 0), 'value': 55, 'year': u'2008'},
           {'date': datetime.datetime(2008, 12, 31, 0, 0), 'value': 40, 'year': u'2008'},
           {'date': datetime.datetime(2007, 12, 31, 0, 0), 'value': 93, 'year': u'2007'},]

In 'mydata' I get a list of sequential monthly data. I wrote some code to group the entries by year.

partial_req_data = {k: list(v) for k, v in itertools.groupby(mydata, key=lambda x: x.get('year'))}

Now I need some efficient code to fill in the missing months with {}, i.e. an empty dict, so that every year ends up with exactly 12 entries in month order. There are clumsy ways to do that, but I am looking for a clean one.

required_data = {"2009": [{'date': datetime.datetime(2009, 1, 31, 0, 0), 'value': 14, 'year': u'2009' },
                  {'date': datetime.datetime(2009, 2, 28, 0, 0), 'value': 84, 'year': u'2009'},
                  {'date': datetime.datetime(2009, 3, 31, 0, 0), 'value': 77, 'year': u'2009'},
                  {'date': datetime.datetime(2009, 4, 30, 0, 0), 'value': 80, 'year': u'2009'},
                  {'date': datetime.datetime(2009, 5, 31, 0, 0), 'value': 6, 'year': u'2009'},
                  {'date': datetime.datetime(2009, 6, 30, 0, 0), 'value': 16, 'year': u'2009'},
                  {'date': datetime.datetime(2009, 7, 31, 0, 0), 'value': 16, 'year': u'2009'},
                  {'date': datetime.datetime(2009, 8, 31, 0, 0), 'value': 1, 'year': u'2009'},
                  {'date': datetime.datetime(2009, 9, 30, 0, 0), 'value': 9, 'year': u'2009'},
                  {}, {}, {}],

         "2008": [{'date': datetime.datetime(2008, 1, 31, 0, 0), 'value': 77, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 2, 29, 0, 0), 'value': 60, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 3, 31, 0, 0), 'value': 28, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 4, 30, 0, 0), 'value': 9, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 5, 31, 0, 0), 'value': 74, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 6, 30, 0, 0), 'value': 70, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 7, 31, 0, 0), 'value': 75, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 8, 31, 0, 0), 'value': 7, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 9, 30, 0, 0), 'value': 10, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 10, 31, 0, 0), 'value': 54, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 11, 30, 0, 0), 'value': 55, 'year': u'2008'},
                  {'date': datetime.datetime(2008, 12, 31, 0, 0), 'value': 40, 'year': u'2008'}],

         "2007": [{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
                  {'date': datetime.datetime(2007, 12, 31, 0, 0), 'value': 93, 'year': u'2007'}]
         }
+6  A: 
import datetime
from itertools import groupby
from pprint import pprint

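required_data = {}
# groupby only merges consecutive items, so mydata must already be ordered by year.
for k, g in groupby(mydata, key=lambda x: x.get('year')):
    # Map month number -> record for this year.
    partial = {}
    for datum in g:
        partial[datum.get('date').month] = datum
    # Months 1..12 in order; missing months fall back to {}.
    required_data[k] = [partial.get(m, {}) for m in range(1, 13)]
pprint(required_data)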

For each year k, partial is a dict whose keys are months. The trick is to use partial.get(m,{}) since this will return the datum when it exists, or {} when it does not.
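To see the partial.get(m, {}) trick in isolation, here is a minimal sketch for a single year. The two-entry `sample` list and the names `by_month` and `filled` are made up for illustration and are not part of the question's data.

```python
import datetime

# Hypothetical sample for one year (not the question's full mydata list).
sample = [
    {'date': datetime.datetime(2007, 3, 31), 'value': 5, 'year': '2007'},
    {'date': datetime.datetime(2007, 12, 31), 'value': 93, 'year': '2007'},
]

# Index the records by month number.
by_month = {d['date'].month: d for d in sample}

# Walk months 1..12; dict.get returns {} for months with no record.
filled = [by_month.get(m, {}) for m in range(1, 13)]

print(len(filled))   # always 12 slots, one per month
print(filled[0])     # January has no record in this sample
print(filled[2])     # March holds the first sample record
```

Because the list is built by walking range(1, 13), the result is in month order even if the input records for that year are not.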

unutbu
Brilliant. Let's see what others come up with.
simplyharsh
Nice solution. I would replace the key=lambda x: x.get('year') with the (imho better) key=operator.itemgetter("year")
Peter Hoffmann
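A small sketch of the itemgetter suggestion, using a made-up `records` list rather than the question's data. It assumes every record has a 'year' key; the two key functions only differ when that key is missing.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records, already ordered by year as groupby requires.
records = [
    {'year': '2009', 'value': 14},
    {'year': '2009', 'value': 84},
    {'year': '2008', 'value': 77},
]

by_lambda = {k: list(g) for k, g in groupby(records, key=lambda x: x.get('year'))}
by_getter = {k: list(g) for k, g in groupby(records, key=itemgetter('year'))}

# Both key functions produce the same grouping when 'year' is always present.
print(by_lambda == by_getter)

# The difference: dict.get returns None for a missing key,
# while itemgetter raises KeyError.
try:
    itemgetter('year')({})
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```

So itemgetter is the stricter choice: a record without 'year' fails loudly instead of being grouped under None.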
I have to accept this, but I am still open to an even more efficient answer. Thanks, unutbu.
simplyharsh