ansaurus

Question

What's a good data model for cross-tabulation?

Answer 1

A:

Why not store it using HTML Tables? It might not be the best, but you could then, very easily, view it in a browser.

Edit:

I just re-read the question and you're asking for a data model, not a storage model. To answer that question...

It all depends on how you're going to report on the data. For example, if you're going to be doing a lot of pivoting or aggregation, it might make more sense to store it in column-major order; that way you can just sum a column to get counts.
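As a rough illustration of the column-major idea, here is a minimal sketch in which each column of the cross-tab is stored as its own list, aligned with a shared list of row labels. The labels and counts are made up for the example, and the names `row_labels` and `columns` are just placeholders.

```python
# Hypothetical cross-tab: rows are values from the first field,
# columns are values from the second field (made-up counts).
row_labels = ["a", "b"]
columns = {
    "x": [2, 0],  # counts for column "x", one entry per row label
    "y": [1, 1],  # counts for column "y"
}

# Summing a column is a single pass over one list.
column_totals = {label: sum(values) for label, values in columns.items()}

# Row totals need one element from every column, which is the
# access pattern that column-major storage makes less convenient.
row_totals = [
    sum(values[i] for values in columns.values())
    for i in range(len(row_labels))
]
```

Whether this layout is the better one depends on which totals you ask for most often.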

It'll help a lot if you explain what kind of information you're trying to extract.

jonnii 2009-06-19 19:29:09

I'm not sure what type of data, actually; the exercises are being parceled out one step at a time (step 1: read a tab-separated file and count pairs in columns 1/2; pivot table? /me wanders off to Wikipedia...). Assume that I want to do everything in here: http://en.wikipedia.org/wiki/Cross_tabulation#Statistics_related_to_cross_tabulations

Chris R 2009-06-19 19:41:09

What do you mean by count pairs in a column?

jonnii 2009-06-19 19:44:00

Answer 2

A:

Since this is an early programming exercise for Python, they probably want you to see which of Python's built-in mechanisms would be appropriate for the initial version of the problem. The dictionary structure seems a good candidate. The first column value from your tab-separated file can be the key into a dictionary. The entry found by that key can itself be a dictionary, whose key is the second column value. Each entry of the subdictionary would be a count, initialized to 1 when a pair is first encountered and incremented each time that pair is seen again.
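A minimal sketch of that nested-dictionary idea might look like the following. The function name `count_pairs` and the sample rows are made up for illustration; the real exercise would read from the tab-separated file instead of an in-memory string.

```python
import csv
import io


def count_pairs(lines):
    """Count (column 1, column 2) pairs from tab-separated lines."""
    counts = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or short lines
        first, second = row[0], row[1]
        # Fetch (or create) the subdictionary for the first column value.
        inner = counts.setdefault(first, {})
        # Start at 1 for a new pair, otherwise increment the count.
        inner[second] = inner.get(second, 0) + 1
    return counts


# Made-up sample data standing in for the tab-separated file.
sample = io.StringIO("a\tx\na\ty\na\tx\nb\ty\n")
table = count_pairs(sample)
```

With a real file you could pass an open file object in place of `sample`, since `csv.reader` accepts any iterable of lines.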

mgkrebbs 2009-06-19 19:57:22

Answer 3

+1 A:

You could use an in-memory sqlite database as a data structure, and define the desired operations as SQL queries.

import sqlite3

c = sqlite3.connect(':memory:')
c.execute('CREATE TABLE data (a, b, c)')

c.executemany('INSERT INTO data VALUES (?, ?, ?)', [
    (1, None,    1),
    (1,    0,    3),
    (1,    0,    3),
    (1,    2,    3),
    (2, None,    1),
    (2,    0, None),
    (2,    2,    2),
    (2,    2,    4),
    (2,    2, None),
])

# queries
# ...
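One query that fits this setup, offered only as a sketch and not necessarily what the answer had in mind, is a GROUP BY count over columns a and b, which gives the pair counts of a cross-tab. The block repeats the sample rows so it runs on its own; the names `conn`, `crosstab`, and `pair_counts` are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a, b, c)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    (1, None, 1), (1, 0, 3), (1, 0, 3), (1, 2, 3),
    (2, None, 1), (2, 0, None), (2, 2, 2), (2, 2, 4), (2, 2, None),
])

# Count how many rows share each (a, b) combination.
# Rows where b is NULL form their own group.
crosstab = conn.execute(
    "SELECT a, b, COUNT(*) FROM data GROUP BY a, b"
).fetchall()

# Index the counts by pair for easy lookup.
pair_counts = {(a, b): n for a, b, n in crosstab}
```

Other statistics could be expressed the same way, for example `SUM(c)` or `AVG(c)` in place of `COUNT(*)`.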

Roberto Bonvallet 2009-06-20 23:03:24

Answer 4

+2 A:

You could look at Andy Mikhailenko's datashaping, an early-stage data-crunching toolkit in Python.

zzr 2009-06-20 23:13:09
