ansaurus

Question

How to use re to search for items in one list inside another list in Python

Answer 1

A:

Here's another way to do it that is likely faster than Alex's original code:

item_list = ['item1', 'item2']
search_list = ['item1.exe', 'item2.pdf']
matches = []
for item in item_list:
    for filename in search_list:
        if filename.partition(".")[0] == item:
            matches.append((item,filename))

Series8217 2009-12-09 06:52:28

Answer 2

+2 A:

You could combine all the items into a single regexp like this, which will be more efficient:

import re
item_list = ['item1', 'item2']
regex = re.compile("^(" + "|".join(item_list) + r")\.")
search_list = ['item1.exe', 'item2.pdf']
matches = []
for file in search_list:
    match = regex.match(file)
    if match:
        matches.append((match.group(1), file))

A better solution, though, might be to extract the base names from the filenames with the os.path functions and look them up in a set.

Nick Craig-Wood 2009-12-09 06:55:28
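
The os.path-plus-set idea mentioned at the end of the answer could look roughly like the sketch below. This is an illustration, not Nick's code: the name item_set and the extra 'item9999.txt' entry are assumptions added for the example.

```python
import os.path

item_list = ['item1', 'item2']
search_list = ['item1.exe', 'item2.pdf', 'item9999.txt']

# Build the set once so every membership test is a hash lookup
# instead of a scan over the whole list.
item_set = set(item_list)

matches = []
for filename in search_list:
    # splitext removes only the last extension: 'item1.exe' -> 'item1'
    base = os.path.splitext(filename)[0]
    if base in item_set:
        matches.append((base, filename))

print(matches)
```

Note that the results come out in search_list order rather than item_list order.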

If the items can contain regex-special punctuation like `.`, you'll need to `re.escape` each item in `item_list` before joining.

bobince 2009-12-09 14:12:35
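
A minimal sketch of the escaping described in the comment above, assuming an item such as 'item.1' (a made-up name) that contains a regex metacharacter. Without re.escape, the dot in 'item.1' would also match 'itemX1'.

```python
import re

# 'item.1' is a hypothetical item containing a regex-special character
item_list = ['item.1', 'item2']
search_list = ['item.1.exe', 'itemX1.exe', 'item2.pdf']

# Escape each item so it is matched literally, then join into one pattern
pattern = "^(" + "|".join(re.escape(item) for item in item_list) + r")\."
regex = re.compile(pattern)

matches = []
for filename in search_list:
    m = regex.match(filename)
    if m:
        matches.append((m.group(1), filename))

print(matches)
```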

Thanks Nick, this post deserves a hundred useful votes! Found the timeit module and ran tests based on my original algorithm, Dave Kirby's algorithm, and yours. The results were as follows:

alex_k : 15.93
dave_kirby : 6.98
nick_craig_wood : 0.24

Alex 2009-12-10 00:16:05

Answer 3

+2 A:

Use splitext to get the filename without the extension:

import os.path

for item in item_list:
    for filename in search_list:
        if item == os.path.splitext(filename)[0]:
            matches.append((item, filename))

It's more correct, and it also makes your intention easier to see when reading the code. Alternatively, if you want foo to match foo.bar.txt, use filename.startswith(item + '.') instead.

Mark Byers 2009-12-09 07:01:28
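
The startswith alternative from the answer above, as a small sketch. The sample names 'foo', 'foo.bar.txt' and 'foobar.txt' are assumptions chosen to show the difference: a multi-dot filename matches, while a name that merely begins with the same letters does not.

```python
item_list = ['foo', 'item2']
search_list = ['foo.bar.txt', 'foobar.txt', 'item2.pdf']

matches = []
for item in item_list:
    # Appending the dot keeps 'foo' from matching 'foobar.txt'
    prefix = item + '.'
    for filename in search_list:
        if filename.startswith(prefix):
            matches.append((item, filename))

print(matches)
```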

+1 for splitext. Accurately does what it says; more readable than regex.

bobince 2009-12-09 14:27:57

Answer 4

A:

I think you should use .rsplit(".", 1) for that purpose; isn't regex overkill?

>>> item_list = ['item1', 'item2','item3']
>>> search_list = ['item1.exe', 'item2.pdf','item9999.txt']
>>>
>>> [(x.rsplit(".",1)[0],x) for x in search_list if x.rsplit(".",1)[0] in item_list]
[('item1', 'item1.exe'), ('item2', 'item2.pdf')]

Or with a for loop:

matches=[]
for x in search_list:
    y=x.rsplit(".",1)[0]
    if y in item_list:
        matches.append((y,x))

S.Mark 2009-12-09 07:08:14

Answer 5

+1 A:

You do not need to use a regex for this since you are doing exact string matches (no wildcards, groups etc) - you can use str.startswith(..) instead. This is equivalent to your code:

for item in item_list:
    match = item + "."
    for file in search_list:
        if file.startswith(match):
            matches.append((item, file))

However Nick Craig-Wood's suggestion of compiling all the matches into a single regex may be more efficient - I suggest you benchmark both if speed is an issue.

Dave Kirby 2009-12-09 07:19:06

Any tools/commands to help benchmark would be a +1!

Alex 2009-12-09 22:32:25
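
One way to benchmark the two approaches is the standard library's timeit module. The sketch below is only an illustration: the generated test data, the function names and the repeat count are assumptions, and real timings depend heavily on the sizes of the lists.

```python
import re
import timeit

# Hypothetical test data: 50 items, 50 files, half of which match
item_list = ['item%d' % i for i in range(50)]
search_list = ['item%d.exe' % i for i in range(0, 100, 2)]

def nested_startswith():
    # Nested loops with str.startswith, as in the answer above
    matches = []
    for item in item_list:
        prefix = item + "."
        for filename in search_list:
            if filename.startswith(prefix):
                matches.append((item, filename))
    return matches

# One alternation pattern compiled once, outside the timed function
regex = re.compile("^(" + "|".join(re.escape(i) for i in item_list) + r")\.")

def single_regex():
    # A single combined regex applied to each filename
    matches = []
    for filename in search_list:
        m = regex.match(filename)
        if m:
            matches.append((m.group(1), filename))
    return matches

for func in (nested_startswith, single_regex):
    # timeit.timeit accepts a callable and returns total seconds
    seconds = timeit.timeit(func, number=200)
    print("%-20s %.4f s" % (func.__name__, seconds))
```

Checking that both functions return the same matches before comparing their times is a cheap safeguard against benchmarking two different behaviours.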

Answer 6

A:

>>> for file in search_list:
...  tomatch=file.split(".")[0]
...  if tomatch in item_list:
...     found=item_list.index(tomatch)
...     matches.append( ( file, item_list[found] ) )
...
>>> print(matches)
[('item1.exe', 'item1'), ('item2.pdf', 'item2')]
>>>

No need for regex.

2009-12-09 07:21:11

Answer 7

+1 A:

Avoid re unless you really need it. For simple string matching, you don't really need it.

Mark Byers's answer duplicates the original behaviour of keeping matches in item_list-order. If you don't need that, you could do it even more simply/quickly:

for file in search_list:
    item = os.path.splitext(file)[0]
    if item in item_list:
        matches.append((item, file))

If you don't need to keep the (item) matched either (since it's redundant from the filename anyway), you've got a one-liner:

matches = [file for file in search_list if os.path.splitext(file)[0] in item_list]

bobince 2009-12-09 14:24:19

They do need to be matched, but thanks for giving a great example of a one-liner!

Alex 2009-12-09 22:51:50
