ansaurus

Question

pythonic way to optimize the logic to filter/extract data from list

Answer 1

+3 A:

The best thing to do would be to turn your data into a dict mapping UID to FLAGS; searching it will then be easy. The data will look something like this:

{'3254': '', '3304': '', '3236': '\\Deleted', '3237': '-FLAGS \\Seen +FLAGS', '3234': 'seen \\Seen', '3235': '\\Seen', '3430': '\\Seen', '3431': '', '3252': '\\Seen', '3253':'\\Deleted', '3478': '', '3479': '', '3256': '\\Seen', '3481': '', '3480': '', '3318': '\\Seen', '3434': '\\Seen', '3243': '\\Seen', '3242': '\\Seen', '3241': '-FLAGS \\Seen +FLAGS', '3247': '\\Seen', '3245': '\\Seen', '3244': '\\Seen', '3447': '-FLAGS \\Seen +FLAGS'}

You can do this using a Regular Expression to match each entry in the list. If we get the regexp to return two groups in the match we can easily build the dict.

So we end up with something like this:

items = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))',  '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))',  '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', '11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', '13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', '16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', '18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', '22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

import re
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \(([^)]*)\)\)")
values = dict(pattern.match(item).groups() for item in items)

We can then easily query the items in values to get what you want:

print "All UIDs:",values.keys()
print "Seen UIDs:",[uid for uid,flags in values.iteritems() if r"\Seen" in flags]
print "Deleted UIDs:",[uid for uid,flags in values.iteritems() if r"\Deleted" in flags]
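For anyone on Python 3, the same idea might look roughly like the sketch below. It is not the code from the answer above: `sample` is a shortened, hypothetical subset of the list, and it assumes every entry matches the pattern (if one doesn't, `pattern.match` returns None and `.groups()` raises AttributeError).

```python
import re

# Shortened, hypothetical sample of the data above.
# Assumption: every entry has exactly this shape.
sample = [
    '1 (UID 3234 FLAGS (seen \\Seen))',
    '3 (UID 3236 FLAGS (\\Deleted))',
    '13 (UID 3254 FLAGS ())',
]

pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \(([^)]*)\)\)")

# Each match yields a (uid, flags) pair, which dict() accepts directly.
values = dict(pattern.match(item).groups() for item in sample)

all_uids = list(values)
seen_uids = [uid for uid, flags in values.items() if r"\Seen" in flags]
deleted_uids = [uid for uid, flags in values.items() if r"\Deleted" in flags]

print("All UIDs:", all_uids)
print("Seen UIDs:", seen_uids)
print("Deleted UIDs:", deleted_uids)
```

Note that this still walks `values` once per query, so it trades a little speed for readability.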

Dave Webb 2010-10-08 09:08:01

Aren't you iterating over the list of items multiple times to get Seen and Deleted in your solution here?

Noufal Ibrahim 2010-10-08 09:11:34

@Noufal Ibrahim - Yes. I'm assuming the list isn't horribly long so I'm valuing readability over performance.

Dave Webb 2010-10-08 13:40:42

I totally agree with your approach. The questioner asked for a single iteration. That's why I brought it up.

Noufal Ibrahim 2010-10-08 14:37:06

@Noufal Ibrahim - Good point! I hadn't read the question properly.

Dave Webb 2010-10-08 14:59:52

Answer 2

+1 A:

I'm not sure about list comprehensions since those usually map one list to another (using either filtering or mapping). I've not seen them being used to split lists. However, you could do this with a combination of a genexp and a loop in a single iteration. I've blown this up a little so that it's clear.

import re
grepper = re.compile(r'[0-9]+ \(UID (?P<uid>[0-9]+) FLAGS (?P<flags>\(.*\))\)')

t = [..] #your list

items = (grepper.search(m).groupdict() for m in t)

all = []
seen = []
deleted = []
for i in items:
  if "Seen" in i["flags"]:
    seen.append(i["uid"])
  if "Deleted" in i["flags"]:
    deleted.append(i["uid"])
  all.append(i["uid"])

You should have your 3 lists now.

Noufal Ibrahim 2010-10-08 09:10:22

You are iterating over the list twice :(

Santiago Lezica 2010-10-08 09:21:30

Where? [15 chars...]

Noufal Ibrahim 2010-10-08 09:59:39

Technically, grepper.search and then for i in items.

Just Some Guy 2010-10-08 11:41:58

The `grepper.search` call sits inside a generator expression, so it doesn't iterate over `t` in advance. Of course, if you're referring to scanning over each element to match the regular expression, that is an iteration.
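A small sketch of that laziness, in case it helps. The `record` helper and the toy list are made up for illustration; the point is only that nothing is processed until the loop asks for it.

```python
# Hypothetical helper that records each element it is asked to process.
calls = []

def record(x):
    calls.append(x)
    return x.upper()

t = ['a', 'b', 'c']

# Building the generator expression does not call record() at all.
items = (record(m) for m in t)
calls_before_loop = list(calls)

results = []
for i in items:
    # Each step of the loop pulls exactly one element through record().
    results.append((i, len(calls)))
```

After the loop, `calls_before_loop` is still empty and `results` shows the call count growing by one per step, so the list is walked only once.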

Noufal Ibrahim 2010-10-08 14:36:33

Answer 3

+1 A:

all,deleted,seen = [list(filter(None, a)) for a in \
    zip(*map(lambda a: (a[2], '\\Deleted' in a[-1] and a[2], '\\Seen' in  a[-1] and a[2]), map(lambda a: a.split(' ', 4), items)))]

Whether this is faster with re or without it is something you need to check with timeit!
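One way to run that check is sketched below. The two functions are simplified stand-ins written for this comparison, not the exact code from any answer here, and `items` is a hypothetical shortened sample repeated to give the timer something to measure. Substitute the real list and the real candidates before drawing conclusions.

```python
import re
import timeit

# Hypothetical shortened sample; substitute the real list when measuring.
items = [
    '2 (UID 3235 FLAGS (\\Seen))',
    '3 (UID 3236 FLAGS (\\Deleted))',
    '13 (UID 3254 FLAGS ())',
] * 100

pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \((.*)\)\)")

def with_re(data):
    # One pass; the regex extracts the UID and the flag text.
    all_uids, seen, deleted = [], [], []
    for item in data:
        uid, flags = pattern.match(item).groups()
        all_uids.append(uid)
        if r"\Seen" in flags:
            seen.append(uid)
        if r"\Deleted" in flags:
            deleted.append(uid)
    return all_uids, seen, deleted

def with_split(data):
    # One pass; split(' ', 4) keeps the whole flag text in the last field.
    all_uids, seen, deleted = [], [], []
    for item in data:
        fields = item.split(' ', 4)
        uid, flags = fields[2], fields[-1]
        all_uids.append(uid)
        if r"\Seen" in flags:
            seen.append(uid)
        if r"\Deleted" in flags:
            deleted.append(uid)
    return all_uids, seen, deleted

# Time each candidate on the same input; the numbers depend on the machine.
for func in (with_re, with_split):
    seconds = timeit.timeit(lambda: func(items), number=200)
    print(func.__name__, seconds)
```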

Tumbleweed 2010-10-08 09:31:21

Oh boy. I'm not sure I'd want to see that in production code. :)

Noufal Ibrahim 2010-10-08 15:43:01

Ohhh, too much lambda filter map zip... :-)

Tumbleweed 2010-10-09 04:33:08

Answer 4

A:

all=[]
seen=[]
deleted=[]
for item in alist:
    s=item.split(" ",4)
    all.append(s[2])
    if "seen" in s[-1].lower():
        seen.append(s[2])
    if "delete" in s[-1].lower():
        deleted.append(s[2])

ghostdog74 2010-10-08 09:32:27

Answer 5

A:

The only way I can think of to do it in one iteration, generating the three lists you ask for, is by iterating manually. There's no Python magic I can come up with.

You can easily improve this if you know specifics about the format and how it's generated. I don't know why +FLAGS and -FLAGS appear in some items, for example, and I didn't know when to expect parentheses, so I had to use find(). Also, I could've just split() the string in two, but then again, I don't know what the flag format means...

def parseList(l):
    lall = []
    lseen = []
    ldeleted = []

    for item in l:
        spl = item.split()

        uid = int(spl[2])

        lall.append(uid)

        for word in spl[4:]:
            if word.find("\\Seen") != -1:
                lseen.append(uid)

            elif word.find("\\Deleted") != -1:
                ldeleted.append(uid)

    return lall, lseen, ldeleted
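The "split the string in two" variant mentioned above could be sketched like this. It is only a sketch under an assumption the answer itself does not confirm: that the literal text ' FLAGS ' (with spaces on both sides) always separates the UID part from the flag list. The name `parse_entries` and the sample list are made up for illustration.

```python
def parse_entries(entries):
    """Return (all, seen, deleted) UID lists in one pass over entries."""
    all_uids, seen, deleted = [], [], []
    for item in entries:
        # Assumption: the first ' FLAGS ' separates the UID part from the flags.
        head, _, flags = item.partition(' FLAGS ')
        uid = int(head.split()[-1])
        all_uids.append(uid)
        # Flags are checked independently, so one entry can land in both lists.
        if '\\Seen' in flags:
            seen.append(uid)
        if '\\Deleted' in flags:
            deleted.append(uid)
    return all_uids, seen, deleted

# Hypothetical shortened sample of the question's data.
sample = [
    '1 (UID 3234 FLAGS (seen \\Seen))',
    '3 (UID 3236 FLAGS (\\Deleted))',
    '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
    '13 (UID 3254 FLAGS ())',
]
result = parse_entries(sample)
```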

Santiago Lezica 2010-10-08 09:38:50

Answer 6

+1 A:

import re

data = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
 '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
 '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
 '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
 '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', 
'22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

r = re.compile(r'\d+\s\(UID\s(?P<uid>\d+)\sFLAGS\s\((?P<data>.*)\)\)')
uid_list = []
seen_uid_list = []
deleted_uid_list = []
for s in data:
    m = r.match(s)
    if m:
        uid_list.append(m.group('uid'))
        if 'Seen' in m.group('data'): seen_uid_list.append(m.group('uid'))
        if 'Deleted' in m.group('data'): deleted_uid_list.append(m.group('uid'))

print uid_list
print seen_uid_list
print deleted_uid_list

Fbo 2010-10-08 09:44:19

Answer 7

+1 A:

This one works for your data sample....

uids, seen, deleted = [], [], []
for item in myList:
    uids.append(int(item[7:12]))
    if 'Se' in item[20:]:  seen.append(uids[-1])
    elif 'De' in item[20:]: deleted.append(uids[-1])

2010-10-08 10:31:02
