ansaurus

Question

Reading and Grouping a List of Data in Python

Answer 1

+1 A:

You're off to a good start by noticing that your original solution may work but lacks elegance.

You should parse the string in a loop, starting a new list each time a new observation begins. Here's some sample code:

import re

s = """<1x>begins
<2x>value-1
<3x>value-2
<4x>value-3
 some indeterminate number of other values
<1y>next observation begins
<2y>value-1
<3y>value-2
<4y>value-3"""
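# A line tagged <1...> starts a new observation, whatever letter follows the 1.
firstMatch = re.compile(r'^<1[a-zA-Z]')
# Any other line tagged with a number belongs to the current observation.
numMatch = re.compile(r'^<(\d+)')
listIneed = []
templist = None
# splitlines() keeps each line whole; split() would also break on every space.
for line in s.splitlines():
    if firstMatch.match(line):
        if templist is not None:
            listIneed.append(templist)
        templist = [line]
    elif numMatch.match(line) and templist is not None:
        # print('The matching number is %s' % numMatch.match(line).group(1))
        templist.append(line)
if templist is not None:
    listIneed.append(templist)

print(listIneed)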

RossFabricant 2009-03-27 03:49:53

I appreciate your creativity, but I think my solution is cheaper to implement, though I am not absolutely sure. It took less than two seconds to run against about 750K lines.

PyNEwbie 2009-03-27 04:37:43

If by "cheaper to implement" you mean your approach runs faster, then you are probably right. Your approach will also break if there are 5 variables instead of 4. If my solution doesn't need to work, I can make it run as fast as you want.

RossFabricant 2009-03-27 14:36:44

Well, one problem with my solution is that it won't work if the variables are ever out of order. But it will work fine with five or n variables; I just have to define them. I guess I was wrong: there is no one-liner. I learned a lot from your code.

PyNEwbie 2009-03-27 15:27:02

Answer 2

+1 A:

If you want to pick out the second, third, and fourth elements of each sublist, this should work:

listINeed = [sublist[1:4] for sublist in biglist]
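
As a minimal sketch of what that slice does, assuming biglist is a list of sublists that each hold one observation's lines in order (the sample data here is made up for illustration, not taken from the question):

```python
# Hypothetical input: one sublist per observation, lines in their original order.
biglist = [
    ["<1x>begins", "<2x>value-1", "<3x>value-2", "<4x>value-3", "<5x>extra"],
    ["<1y>next observation begins", "<2y>value-1", "<3y>value-2", "<4y>value-3"],
]

# sublist[1:4] keeps indices 1, 2 and 3: it skips the first element
# and stops before index 4, so longer sublists are simply truncated.
listINeed = [sublist[1:4] for sublist in biglist]

print(listINeed)
```

Note that the slice is purely positional, so it only helps if the wanted values always sit at the same indices.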

David Zaslavsky 2009-03-27 03:50:57

Well, I can't be sure which ones they are, and the entire thing I posted up there is the sublist, so I might need one or ten units that are indicated only by their names.

PyNEwbie 2009-03-27 04:00:32

Then you need to be more specific in your question... I really can't understand what exactly it is you're trying to do.

David Zaslavsky 2009-03-27 04:08:38

Answer 3

+1 A:

itertools.groupby() can get you by.

itertools.groupby(biglist, operator.itemgetter(2))
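
A rough sketch of how that call might be used, assuming each item in biglist is a tuple whose third field identifies the observation (this tuple layout is an assumption for illustration; the question does not show it). Both itertools and operator need to be imported:

```python
import itertools
import operator

# Hypothetical data: (tag, value, observation) tuples. The layout is assumed.
biglist = [
    ("<1x>", "begins", "x"),
    ("<2x>", "value-1", "x"),
    ("<3x>", "value-2", "x"),
    ("<1y>", "next observation begins", "y"),
    ("<2y>", "value-1", "y"),
]

# groupby only merges *consecutive* items with equal keys, so the input
# must already be ordered by observation (sort it first if it is not).
grouped = [
    (key, [record[1] for record in records])
    for key, records in itertools.groupby(biglist, operator.itemgetter(2))
]

print(grouped)
```

The groups that groupby yields are lazy iterators, which is why the sketch copies each one into a list before moving on to the next key.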

Ignacio Vazquez-Abrams 2009-03-27 03:56:19

Answer 4

A:

If I've understood your question correctly:

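import re

def getlines(ori):
    # Group 1 is the whole tagged line; group 2 is the number inside the tag.
    matches = re.finditer(r'(<([1-4])[a-zA-Z]>.*)', ori)
    mainlist = []
    sublist = []
    for sr in matches:
        if int(sr.group(2)) == 1:
            # A <1...> line starts a new observation; the marker line itself is not stored.
            if sublist:
                mainlist.append(sublist)
            sublist = []
        else:
            sublist.append(sr.group(1))
    # Keep the final observation, which no later <1...> line closes.
    if sublist:
        mainlist.append(sublist)
    return mainlist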

...would do the job for you, if you felt like using regular expressions.

The version below would break all of the data down into sublists (not just the values tagged 2 to 4 in each grouping), which might be more useful depending on what else you need to do with the data. Use a slice like David's listINeed = [sublist[1:4] for sublist in biglist], with the indices adjusted to your data, to pick out the values needed for the specific task above.

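import re

def getlines(ori):
    # \d+ (not \d*) so a tag without a number can never reach int() as ''.
    matches = re.finditer(r'(<(\d+)[a-zA-Z]>.*)', ori)
    mainlist = []
    sublist = []
    for sr in matches:
        if int(sr.group(2)) == 1:
            print("1 found!")
            # A <1...> line starts a new observation; the marker line itself is not stored.
            if sublist:
                mainlist.append(sublist)
            sublist = []
        else:
            sublist.append(sr.group(1))
    # Keep the final observation, which no later <1...> line closes.
    if sublist:
        mainlist.append(sublist)
    return mainlist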

mavnn 2009-03-27 13:33:48
