This relates to a project to convert a 2-way ANOVA program in SAS to Python.

I pretty much started trying to learn the language Thursday, so I know I have a lot of room for improvement. If I'm missing something blatantly obvious, by all means, let me know. I haven't got Sage up and running yet, nor numpy, so right now, this is all quite vanilla Python 2.6.1. (portable)

Primary query: Need a good set of list comprehensions that can extract the data in lists of samples in lists by factor A, by factor B, overall, and in groups of each level of factors A&B (AxB).

After some work, the data is in the following form (3 layers of nested lists):

response[a][b][n]

(meaning [a1 [b1 [n1, ... ,nN] ...[bB [n1, ...nN]]], ... ,[aA [b1 [n1, ... ,nN] ...[bB [n1, ...nN]]] Hopefully that's clear.)

Factor levels in my example case: A=3 (0-2), B=8 (0-7), N=8 (0-7)

byA= [[a[i] for i in range(b)] for a[b] in response]

(Can someone explain why this syntax works? I stumbled into it trying to see what the parser would accept. I haven't seen that syntax attached to that behavior elsewhere, but it's really nice. Any good links on sites or books on the topic would be appreciated. Edit: Persistence of variables between runs explained this oddity. It doesn't work.)

byB=lstcrunch([[Bs[i] for i in range(len(Bs)) ]for Bs in response])

(It bears noting that zip(*response) almost does what I want. The above version isn't actually working, as I recall. I haven't run it through a careful test yet.)

byAxB= [item for sublist in response for item in sublist]

(Stolen from a response by Alex Martelli on this site. Again could someone explain why? List comprehension syntax is not very well explained in the texts I've been reading.)

ByO= [item for sublist in byAxB for item in sublist]

(Obviously, I simply reused the former comprehension here, because it did what I needed.)

I'd like these to end up the same datatypes, at least when looped through by the factor in question, s.t. that same average/sum/SS/et cetera functions can be applied and used.

This could easily be replaced by something cleaner:

def lstcrunch(Dlist):
    """Returns a list containing the entire
    contents of whatever is imported,
    reduced by one level.

    If a rectangular array, it reduces a dimension by one.
    lstcrunch(DataSet[a][b]) -> DataOutput[a]
    [[1, 2], [[2, 3], [2, 4]]] -> [1, 2, [2, 3], [2, 4]]
    """
    flat=[]
    if islist(Dlist):#1D top level list
        for i in Dlist:
            if islist(i):
                flat+= i
            else:
                flat.append(i)
        return flat
    else:
        return [Dlist]

Oh, while I'm on the topic, what's the preferred way of identifying a variable as a list? I have been using:

def islist(a):
    "Returns 'True' if input is a list and 'False' otherwise"
    return type(a)==type([])

Parting query: Is there a way to explicitly force a shallow copy to convert to a deep copy? Or, similarly, when copying into a variable, is there a way of declaring that the assignment is supposed to replace the pointer, too, and not merely the value (s.t. the assignment won't propagate to other shallow copies)? Similarly, using that might be useful, as well, from time to time, so being able to control when it does or doesn't occur sounds really nice. (I really stepped all over myself when I prepared my table for inserting by calling: response=[[[0]*N]*B]*A )

Edit: Further investigation led to most of this working fine. I've since made the class and tested it. It works fine. I'll leave the list comprehension forms intact for reference.

def byB(array_a_b_c):
    y=range(len(array_a_b_c))
    x=range(len(array_a_b_c[0]))
    return [[array_a_b_c[i][j][k]
    for k in range(len(array_a_b_c[0][0]))
    for i in y]
    for j in x]


def byA(array_a_b_c):
    return [[repn for rowB in rowA for repn in rowB] 
    for rowA in array_a_b_c]

def byAxB(array_a_b_c):
    return [rowB for rowA in array_a_b_c 
    for rowB in rowA]

def byO(array_a_b_c):
    return [rep
    for rowA in array_a_b_c
    for rowB in rowA
    for rep in rowB]


def gen3d(row, col, inner):
    """Produces a 3d nested array without any naughty shallow copies.

    [row[col[inner]]] named s.t. the outer can be split on, per lprn for easy display"""
    return [[[k for k in range(inner)]
    for i in range(col)]
    for j in range(row)]

def lprn(X):
    """This prints a list by lines.

    Not fancy, but works"""
    if isiterable(X):
        for line in X: print line
    else:
        print X

def isiterable(a):
    return hasattr(a, "__iter__")

Thanks to everyone who responded. Already see a noticeable improvement in code quality due to improvements in my gnosis. Further thoughts are still appreciated, of course.

+5  A: 

byAxB= [item for sublist in response for item in sublist] Again could someone explain why?

I am sure A.M. will be able to give you a good explanation. Here is my stab at it while waiting for him to turn up.

I would approach this from left to right. Take these four words:

for sublist in response

I hope you can see the resemblance to a regular for loop. These four words are doing the ground work for performing some action on each sublist in response. It appears that response is a list of lists. In that case sublist would be a list for each iteration through response.

for item in sublist

This is again another for loop in the making. Given that we first heard about sublist in the previous "loop" this would indicate that we are now traversing through sublist, one item at a time. If I were to write these loops out without comprehensions it would look like this:

for sublist in response:
    for item in sublist:

Next, we look at the remaining words. [, item and ]. This effectively means, collect items in a list and return the resulting list.

Whenever you have trouble creating or understanding list comprehensions, write the relevant for loops out and then compress them:

result = []

for sublist in response:
    for item in sublist:
        result.append(item)

This will compress to:

[
    item 
    for sublist in response
    for item in sublist
]
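The same rule carries over to deeper nesting: the for clauses appear in the same order as the nested loops would. Here is a minimal sketch on a made-up 2 x 2 x 3 list standing in for response[a][b][n]. The sample values and the names by_overall, by_a and by_b are my own, not from the question.

```python
# Hypothetical 2 x 2 x 3 stand-in for response[a][b][n].
response = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

# Loop form: three nested loops, one append.
by_overall_loop = []
for row_a in response:
    for row_b in row_a:
        for rep in row_b:
            by_overall_loop.append(rep)

# Compressed form: same for clauses, same order.
by_overall = [rep for row_a in response for row_b in row_a for rep in row_b]

# Nesting one comprehension inside another keeps one level of grouping,
# here one flat list per level of factor A.
by_a = [[rep for row_b in row_a for rep in row_b] for row_a in response]

# zip(*response) regroups the b-lists by level of factor B; flattening each
# group gives one list per level of B. The order inside each group differs
# from the byB function in the question, but the members are the same.
by_b = [[rep for row_b in rows for rep in row_b] for rows in zip(*response)]

print(by_overall == by_overall_loop)
```

Each result is a plain list (or list of plain lists), so the same sum or mean function can be applied to every grouping.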

List comprehension syntax is not very well explained in the texts I've been reading

Dive Into Python has a section dedicated to list comprehensions. There is also this nice tutorial to read through.

Update

I forgot to say something. List comprehensions are another way of achieving what has been traditionally done using map and filter. It would be a good idea to understand how map and filter work if you want to improve your comprehension-fu.
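To make the correspondence concrete, here is a small sketch with made-up data. The helper names is_even and square are hypothetical, chosen only for illustration.

```python
# Hypothetical sample data, not from the question.
numbers = [1, 2, 3, 4, 5, 6]

def is_even(n):
    return n % 2 == 0

def square(n):
    return n * n

# map/filter version: filter picks the items, map transforms them.
# list() is harmless on Python 2 and required on Python 3, where
# map and filter return iterators.
squares_of_evens_mf = list(map(square, filter(is_even, numbers)))

# Equivalent list comprehension: the "if" plays the role of filter,
# the leading expression plays the role of map.
squares_of_evens_lc = [square(n) for n in numbers if is_even(n)]

print(squares_of_evens_mf == squares_of_evens_lc)
```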

Manoj Govindan
That was quite helpful, thanks. I'm going to go check out your links momentarily.
The Nate
+1  A: 

For the copy part, look into the copy module. Python assignment only binds another reference to the same object, so a change made through one of the "copies" shows up in the original. The copy module makes real copies of objects, and you can choose between a shallow copy (copy.copy) and a deep copy (copy.deepcopy).
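A rough sketch of what goes wrong with the [[[0]*N]*B]*A construction from the question, and of the two copy functions. The small sizes and the variable names are assumptions picked for the demo.

```python
import copy

A, B, N = 2, 3, 4  # small hypothetical sizes

# Repetition with * repeats references, so every cell is the same inner list.
aliased = [[[0] * N] * B] * A
aliased[0][0][0] = 99   # this write is visible through every [a][b]

# Nested comprehensions build a fresh list at each level instead.
independent = [[[0] * N for _ in range(B)] for _ in range(A)]
independent[0][0][0] = 99   # only this one cell changes

# copy.copy duplicates the outer list only; the inner lists stay shared.
shallow = copy.copy(independent)
# copy.deepcopy duplicates every level.
deep = copy.deepcopy(independent)

independent[1][1][1] = 7
print(shallow[1][1][1], deep[1][1][1])
```

After the last assignment the shallow copy sees the change and the deep copy does not.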

SpectralAngel
Noted. I'll check it out.
The Nate
A: 

It is sometimes tricky to produce the right level of nesting in your data structure, but I think in your case it should be relatively simple. To test it as we go, we need some sample data, say:

data = [ [a,
          [b,
           range(1,9)]]
         for b in range(8)
         for a in range(3)]
print 'Origin'
print(data)
print 'Flat'
## from this we see how to produce the c data flat
print([(a,b,c) for a,[b,c] in data])    
print "Sum of data in third level = %f" % sum(point for a,[b,c] in data for point in c)
print "Sum of all data = %f" % sum(a+b+sum(c) for a,[b,c] in data)

As for the type check, you should generally avoid it. If you must check, for example when you do not want to recurse into strings, you can do it like this:

if not isinstance(data, basestring) : ....

If you need to flatten a structure, there is useful code in the Python documentation, shown below. Other ways to express it are chain(*listOfLists) and the list comprehension [d for sublist in listOfLists for d in sublist]:

from itertools import chain
def flatten(listOfLists):
    "Flatten one level of nesting"
    return chain.from_iterable(listOfLists)

This does not work, though, if your data is nested to different depths. For a heavyweight flattener, see: http://www.python.org/workshops/1994-11/flatten.py
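For mixed depths, a recursive version is one option. The following is only a sketch of the idea, not the code behind the link above; the names flatten_deep and string_types are hypothetical. It treats strings as atoms so that it does not recurse into their characters.

```python
try:
    string_types = basestring  # Python 2
except NameError:
    string_types = str         # Python 3 (assumed fallback)

def flatten_deep(items):
    """Recursively flatten nested iterables; strings are kept whole."""
    flat = []
    for item in items:
        is_string = isinstance(item, string_types)
        if hasattr(item, "__iter__") and not is_string:
            # Nested container: flatten it and splice in the result.
            flat.extend(flatten_deep(item))
        else:
            # Number, string, or other non-iterable: keep as is.
            flat.append(item)
    return flat

print(flatten_deep([1, [2, [3, "ab"]], (4, 5)]))
```

Note that any object with __iter__ is descended into, so a dict would contribute its keys. Narrow the test to isinstance(item, (list, tuple)) if that is not wanted.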

Tony Veijalainen
I will definitely add isinstance to my list of tools. Today I started using hasattr(whatever,"__iter__") to be a touch more agnostic about type. The data you show does seem to retain all the important information, but I'm going to need some time cogitating to fully grasp what you did there, I think. Thanks for the response.
The Nate
Be careful with strings, though, if you write a flattening function. A string is iterable, not unbreakable like a Lisp atom. I also saw in the operator module's list of functions that it has 'isCallable', 'isMappingType', 'isNumberType', 'isSequenceType'.
Tony Veijalainen