DWH style query in Django to generate summary report at specified levels of detail

I have a catalog of items like so:

class Items(models.Model):
    name=models.CharField()
    type=models.ForeignKey(ItemType)
    quantity=models.IntegerField()
    price=models.DecimalField() # Simplified
    [... other fields]

and it has some attributes governed by:

class ItemAttributes(models.Model):
    name=models.CharField()
    type=modles.ForeignKey(ItemType)

class ItemAttributeRecords(models.Model):
    attribute=ForeignKey(ItemAttributes)
    item=ForeignKey(Items)
    value=CharField() #Actual model stores ints/bools etc. simplified for SO

Different types of items have a different set of ItemAttributes eg. a Tshirt will have a size, design and colour but a Towel may only have a size and design.

I'd like to be able to generate a report by specifying specific attributes that I wish to summarize over and/or see details of and generate summary information such as total quantity, average price that would allow me to generate queries at a higher level view.

Let me illustrate with some examples using the clothing illustration above.

On a predefined queryset, I'd like to know what quantity of different coloured items were sold on a per attribute basis, in this case I would specify the order and attribute, [Type, Attr1, Attr2] and details on Attr3. I might also choose to ignore another attribute, say Attr4(Long/Short sleeve), ie. both long and short seleve types get aggregated. That probably a poor description but the example output is probably a better way to understand.

Lets define that as:

summarize_over= ['Type', 'Attr1', 'Attr2']
detail_on = ['Attr3']
output_values = ['quantity']
ignore = ['Attr4']

Type    Attr1(Design) Attr2(Size) Attr3(Colour) Quantity
Tshirt  Design A      M           Blue          10 (3 long/7short sleeve, aggregated)
"       "             "           Brown         20 [....etc....]
"       "             "           Yellow        30
                                          Total 60

Tshirt  Design A      S           Blue          20
"       "             "           Green         20
                                          Total 40

Tshirt  Design B      M           Blue          30
"       "             "           White         30
                                          Total 60

Attributes such as 'name' (and some others not shown) are ignored in this summary.

A dictionary output would be fine eg. {'type': 'TShirt', 'Attr1': 'Design A', 'Attr2': 'M', 'Quantity':60, 'colour': {'blue':10,'brown':20','yellow':30}}

I'm quite new to Python so I'm not really familiar with the idioms, tricks, various methods and such that might make solving this quite easy.

I think this could either be done by 1)raw SQL (but I'm not too familiar with multiple GROUP_BY statements and rolling up the ignored columns), or 2)using the values() method and then looping through them looking for similar attributes. It also occurred to me it might be possible to use sets to solve this but I dont know how to approach it from that angle or if it is in fact better.

So far the pseudocode have come up with so far is along the lines:

for each item in list given by queryset
    remove all unwanted attributes and rollup output_values, add to new list
for each attribute in ordered attribute list
   group all items in new list that have the same attr into their own dict and add to a list
   <repeat nesting the list of dicts>
for each dict in list of dict
   flatten each dict and generate output_values

Thoughts? Suggestions? Code snipets? Better approaches?

ansaurus

tags:

views:

answers:

DWH style query in Django to generate summary report at specified levels of detail

related questions