This relates to a project to convert a 2-way ANOVA program in SAS to Python.
I only started learning the language on Thursday, so I know I have a lot of room for improvement. If I'm missing something blatantly obvious, by all means let me know. I haven't got Sage or numpy up and running yet, so for now this is all plain vanilla Python 2.6.1 (portable).
Primary query: I need a good set of list comprehensions that can regroup the data, stored as lists of samples in nested lists, by factor A, by factor B, overall, and by each combination of levels of factors A and B (AxB).
After some work, the data is in the following form (3 layers of nested lists):
response[a][b][n]
(meaning [a1: [b1: [n1, ..., nN], ..., bB: [n1, ..., nN]], ..., aA: [b1: [n1, ..., nN], ..., bB: [n1, ..., nN]]], where the a and b labels only mark positions and are not stored in the lists. Hopefully that's clear.)
Factor levels in my example case: A=3 (0-2), B=8 (0-7), N=8 (0-7)
byA= [[a[i] for i in range(b)] for a[b] in response]
(Can someone explain why this syntax works? I stumbled into it trying to see what the parser would accept. I haven't seen that syntax attached to that behavior elsewhere, but it's really nice. Any good links on sites or books on the topic would be appreciated. Edit: Persistence of variables between runs explained this oddity. It doesn't work.)
byB=lstcrunch([[Bs[i] for i in range(len(Bs)) ]for Bs in response])
(It bears noting that zip(*response) almost does what I want. The above version isn't actually working, as I recall. I haven't run it through a careful test yet.)
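For what it's worth, here is a minimal sketch of how zip(*response) could be finished off to get the by-B grouping. It assumes response is rectangular (every level of A has the same number of B levels). The helper name by_b_zip and the tiny data set are made up for illustration.

```python
def by_b_zip(response):
    """Group all samples by level of factor B (hypothetical helper)."""
    # zip(*response) transposes the two outer levels: element j is a tuple
    # holding the B=j sample list from every level of A.
    transposed = zip(*response)
    # Flatten each tuple of sample lists into one flat list per level of B.
    return [[rep for samples in group for rep in samples]
            for group in transposed]

# Made-up data: A=2, B=3, N=2
response = [[[1, 2], [3, 4], [5, 6]],
            [[7, 8], [9, 10], [11, 12]]]
grouped = by_b_zip(response)
# grouped == [[1, 2, 7, 8], [3, 4, 9, 10], [5, 6, 11, 12]]
```

Within each B group the samples come out ordered by level of A and then by replicate, which makes no difference to sums, means or sums of squares.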
byAxB= [item for sublist in response for item in sublist]
(Stolen from a response by Alex Martelli on this site. Again, could someone explain why it works? List comprehension syntax is not very well explained in the texts I've been reading.)
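As a sketch of why that form works: the for clauses in a list comprehension are read left to right, in the same order as the equivalent nested for statements, and the leading expression is what gets appended on each pass. The two function names below are made up for illustration.

```python
def flatten_comprehension(nested):
    # One comprehension with two for clauses.
    return [item for sublist in nested for item in sublist]

def flatten_loops(nested):
    # The same thing spelled out: the for clauses nest in the order written.
    flat = []
    for sublist in nested:
        for item in sublist:
            flat.append(item)
    return flat

sample = [[1, 2], [3], [4, 5, 6]]
same = flatten_comprehension(sample) == flatten_loops(sample)
# same is True; both give [1, 2, 3, 4, 5, 6]
```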
ByO= [item for sublist in byAxB for item in sublist]
(Obviously, I simply reused the former comprehension here, 'cause it did what I need. Edit:)
I'd like these to end up as the same datatypes, at least when looped through by the factor in question, so that the same average/sum/SS/etc. functions can be applied to each of them.
The lstcrunch helper used above could easily be replaced by something cleaner:
def lstcrunch(Dlist):
    """Returns a list containing the entire
    contents of whatever is imported,
    reduced by one level.
    If a rectangular array, it reduces a dimension by one.
    lstcrunch(DataSet[a][b]) -> DataOutput[a]
    [[1, 2], [[2, 3], [2, 4]]] -> [1, 2, [2, 3], [2, 4]]
    """
    flat = []
    if islist(Dlist):  # 1D top level list
        for i in Dlist:
            if islist(i):
                flat += i
            else:
                flat.append(i)
        return flat
    else:
        return [Dlist]
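One possibly cleaner option, sketched here for the rectangular case only, is itertools.chain.from_iterable, which is available from Python 2.6. The name crunch_rectangular is made up. Unlike lstcrunch, this version assumes every element of the input is itself a list, so a bare number at the top level raises a TypeError instead of being passed through.

```python
from itertools import chain

def crunch_rectangular(nested):
    """Remove one level of nesting (hypothetical helper).

    Assumes every element of nested is itself a list.
    """
    return list(chain.from_iterable(nested))

crunched = crunch_rectangular([[1, 2], [[2, 3], [2, 4]]])
# crunched == [1, 2, [2, 3], [2, 4]]
```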
Oh, while I'm on the topic, what's the preferred way of identifying a variable as a list? I have been using:
def islist(a):
    "Returns 'True' if input is a list and 'False' otherwise"
    return type(a) == type([])
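The usual built-in for this kind of check is isinstance. A minimal sketch follows, with is_list and the Samples subclass as made-up names. The one behavioral difference from the type comparison above is that subclasses of list also count as lists.

```python
def is_list(a):
    """Return True if a is a list or an instance of a list subclass."""
    return isinstance(a, list)

class Samples(list):
    """Made-up list subclass, only here to show the difference."""
    pass

checks = [is_list([1, 2]), is_list((1, 2)), is_list("ab"), is_list(Samples())]
# checks == [True, False, False, True]
```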
Parting query: Is there a way to explicitly force a shallow copy to become a deep copy? Or, similarly, when copying into a variable, is there a way of declaring that the assignment is supposed to replace the pointer too, and not merely the value (so that the assignment won't propagate to other shallow copies)? The propagating behavior might be useful from time to time as well, so being able to control when it does or doesn't occur sounds really nice. (I really stepped all over myself when I prepared my table for inserting by calling: response=[[[0]*N]*B]*A )
Edit: Further investigation led to most of this working fine. I've since made the class and tested it. It works fine. I'll leave the list comprehension forms intact for reference.
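A minimal sketch of what goes wrong with the repetition form, and two ways around it. A, B and N are given small made-up values here. List repetition copies references rather than lists, so the table really contains a single innermost list. Building with comprehensions creates a fresh list at each level, and copy.deepcopy duplicates an existing nested structure all the way down.

```python
import copy

A, B, N = 2, 3, 4  # small made-up sizes

# Repetition copies references: every cell is the same inner list object.
aliased = [[[0] * N] * B] * A
aliased[0][0][0] = 99
leaked = aliased[1][2][0]          # 99, the write shows up everywhere

# Comprehensions build a new list on every pass, so nothing is shared.
independent = [[[0] * N for _ in range(B)] for _ in range(A)]
independent[0][0][0] = 99
untouched = independent[1][2][0]   # still 0

# deepcopy gives a fully separate copy of an existing structure.
snapshot = copy.deepcopy(independent)
snapshot[0][0][0] = -1
original_kept = independent[0][0][0]   # still 99

# Caveat: deepcopy preserves sharing that exists inside the object,
# so it does not repair the aliased table.
still_aliased = copy.deepcopy(aliased)
still_aliased[0][0][0] = 5
also_changed = still_aliased[1][2][0]  # 5
```

On the assignment part of the question: a plain assignment such as x = y never copies anything, it only makes the name x refer to the same object as y, so a copy has to be requested explicitly whenever one is wanted.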
def byB(array_a_b_c):
    y = range(len(array_a_b_c))
    x = range(len(array_a_b_c[0]))
    return [[array_a_b_c[i][j][k]
             for k in range(len(array_a_b_c[0][0]))
             for i in y]
            for j in x]
def byA(array_a_b_c):
    return [[repn for rowB in rowA for repn in rowB]
            for rowA in array_a_b_c]
def byAxB(array_a_b_c):
    return [rowB for rowA in array_a_b_c
            for rowB in rowA]
def byO(array_a_b_c):
    return [rep
            for rowA in array_a_b_c
            for rowB in rowA
            for rep in rowB]
def gen3d(row, col, inner):
    """Produces a 3d nested array without any naughty shallow copies.
    [row[col[inner]] named s.t. the outer can be split on, per lprn for easy display"""
    return [[[k for k in range(inner)]
             for i in range(col)]
            for j in range(row)]
def lprn(X):
    """This prints a list by lines.
    Not fancy, but works"""
    if isiterable(X):
        for line in X:
            print line
    else:
        print X
def isiterable(a):
    return hasattr(a, "__iter__")
Thanks to everyone who responded. I already see a noticeable improvement in code quality thanks to improvements in my gnosis. Further thoughts are still appreciated, of course.