I have some data. 224,000 rows of it, in a SQLite database. I want to extract time series information from it to feed a data visualisation tool. Essentially, each row in the db is an event that has (among other things not strictly relevant) a time-date group in seconds since the epoch and a name responsible for it. I want to extract how many events each name has for every week in the db.

That's simple enough:

SELECT COUNT(*), 
       name, 
       strftime('%W:%Y', time, 'unixepoch') 
  FROM events 
 GROUP BY strftime('%W:%Y', time, 'unixepoch'), name 
 ORDER BY time

and we get about six thousand rows of data.

count  name   week:year
   23  fudge  23:2009
etc...

But I don't want a row for each name in each week - I want a row for each name, and a column for each week, like this:

Name   23:2009  24:2009  25:2009
fudge       23        6       19
fish         1        0       12
etc...

Now, the monitoring process has been running for 69 weeks, and the count of unique names is 502. So clearly I'm far from keen on any solution that involves hardcoding all the columns, still less the rows. I'm less averse to something that iterates over the lot, say with Python's executemany(), but I'm willing to accept it if necessary. SQL is meant to be set-wise, dammit.

+1  A: 

A good approach in cases like this is not to push SQL to the point where it becomes convoluted and hard to understand and maintain. Let SQL do what it conveniently can and post-process the query results in Python.

Here's a cut-down version of a simple crosstab generator that I wrote. The full version delivers row/column/grand totals.

You'll note that it has built-in "group by" -- the original use-case was for summarising data obtained from Excel files using Python and xlrd.

The row_key and col_key that you supply don't need to be strings as in the example; they can be tuples -- e.g. (year, week) in your case -- or they could be integers -- e.g. you have a mapping of string column name to integer sort key.

import sys

class CrossTab(object):

    def __init__(
        self,
        missing=0, # what to return for an empty cell. Alternatives: '', 0.0, None, 'NULL'
        ):
        self.missing = missing
        self.col_key_set = set()
        self.cell_dict = {}
        self.headings_OK = False

    def add_item(self, row_key, col_key, value):
        self.col_key_set.add(col_key)
        try:
            self.cell_dict[row_key][col_key] += value
        except KeyError:
            try:
                self.cell_dict[row_key][col_key] = value
            except KeyError:
                self.cell_dict[row_key] = {col_key: value}

    def _process_headings(self):
        if self.headings_OK:
            return
        self.row_headings = sorted(self.cell_dict.keys())
        self.col_headings = sorted(self.col_key_set)
        self.headings_OK = True

    def get_col_headings(self):
        self._process_headings()
        return self.col_headings

    def generate_row_info(self):
        self._process_headings()
        for row_key in self.row_headings:
            row_dict = self.cell_dict[row_key]
            row_vals = [row_dict.get(col_key, self.missing) for col_key in self.col_headings]
            yield row_key, row_vals

    def dump(self, f=None, header=None, footer=''):
        # Debugging aid: print the instance's attributes, sorted by name.
        if f is None:
            f = sys.stdout
        if header is not None:
            print(header, file=f)
        for attr, value in sorted(self.__dict__.items()):
            print("%s: %r" % (attr, value), file=f)
        if footer is not None:
            print(footer, file=f)

if __name__ == "__main__":

    data = [
        ['Rob', 'Morn', 240],
        ['Rob', 'Aft',  300],
        ['Joe', 'Morn',  70],
        ['Joe', 'Aft',   80],
        ['Jill', 'Morn', 100],
        ['Jill', 'Aft',  150],
        ['Rob', 'Aft',   40],
        ['Rob', 'aft',    5],
        ['Dozy', 'Aft',   1],
        # Dozy doesn't show up till lunch-time
        ['Nemo', 'never', -1],
        ]
    NAME, TIME, AMOUNT = range(3)
    xlate_time = {'morn': "AM", "aft": "PM"}

    print()
    ctab = CrossTab(missing=None)
    # ctab.dump(header='=== after init ===')
    for s in data:
        ctab.add_item(
            row_key=s[NAME],
            col_key=xlate_time.get(s[TIME].lower(), "XXXX"),
            value=s[AMOUNT])
        # ctab.dump(header='=== after add_item ===')
    print(ctab.get_col_headings())
    # ctab.dump(header='=== after get_col_headings ===')
    for x in ctab.generate_row_info():
        print(x)

Output:

['AM', 'PM', 'XXXX']
('Dozy', [None, 1, None])
('Jill', [100, 150, None])
('Joe', [70, 80, None])
('Nemo', [None, None, -1])
('Rob', [240, 345, None])
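
For the asker's data, a minimal sketch of how this class might be wired up, assuming the events table (with time and name columns) described in the question; the database filename and the separate '%Y'/'%W' extraction are my own choices for illustration:

import sqlite3

# Sketch only: feed the weekly-count query into CrossTab.  The "events"
# table (columns: time, name) comes from the question; the filename is
# hypothetical.  Year and week are selected separately so that the
# (year, week) tuple column keys sort chronologically.
conn = sqlite3.connect("events.db")
query = """
    SELECT name,
           CAST(strftime('%Y', time, 'unixepoch') AS INTEGER) AS yr,
           CAST(strftime('%W', time, 'unixepoch') AS INTEGER) AS wk,
           COUNT(*) AS cnt
      FROM events
     GROUP BY name, yr, wk
"""
ctab = CrossTab(missing=0)
for name, year, week, count in conn.execute(query):
    ctab.add_item(row_key=name, col_key=(year, week), value=count)

print(ctab.get_col_headings())
for name, counts in ctab.generate_row_info():
    print(name, counts)
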
John Machin
+1  A: 

I would first do your query

SELECT COUNT(*), 
       name, 
       strftime('%W:%Y', time, 'unixepoch') 
  FROM events 
 GROUP BY strftime('%W:%Y', time, 'unixepoch'), name 
 ORDER BY time

and then do the post-processing with Python.

So you don't have to iterate over 224,000 rows but only over roughly 6,000. You can easily hold those 6,000 rows in memory for processing with Python; you could probably hold all 224,000 as well, but it would take considerably more memory.
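
A minimal sketch of that post-processing, assuming Python's sqlite3 module; the database filename is made up, and the week key is written year-first ('%Y:%W') so that plain string sorting is chronological:

import sqlite3
from collections import defaultdict

# Sketch only: run the grouped query, then pivot in Python.  The database
# filename is hypothetical; '%Y:%W' (year first) makes the week strings
# sort chronologically.
conn = sqlite3.connect("events.db")
rows = conn.execute("""
    SELECT COUNT(*) AS cnt,
           name,
           strftime('%Y:%W', time, 'unixepoch') AS week
      FROM events
     GROUP BY week, name
""").fetchall()

weeks = sorted({week for cnt, name, week in rows})   # one column per week
table = defaultdict(dict)                            # name -> {week: count}
for cnt, name, week in rows:
    table[name][week] = cnt

print("Name", *weeks)
for name in sorted(table):
    print(name, *(table[name].get(week, 0) for week in weeks))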

However: newer versions of SQLite support the aggregate function group_concat. Maybe you can use it for pivoting in SQL? I can't try it because I use an older version.
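
For what it's worth, a hedged sketch of that idea; note it isn't a true pivot, since each name comes back as one row with all its week counts packed into a single string that still has to be split client-side:

import sqlite3

# Sketch only: pack each name's weekly counts into one string using
# group_concat (available in SQLite 3.5.4 and later).  The database
# filename is hypothetical.
conn = sqlite3.connect("events.db")
for name, week_counts in conn.execute("""
    SELECT name,
           group_concat(week || '=' || cnt) AS week_counts
      FROM (SELECT name,
                   strftime('%Y:%W', time, 'unixepoch') AS week,
                   COUNT(*) AS cnt
              FROM events
             GROUP BY name, week)
     GROUP BY name
"""):
    print(name, week_counts)   # e.g. fudge 2009:23=23,2009:24=6,...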

tuinstoel