Hi.

I have a text file of about 14000 URLs. Below are a few examples:

http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123
http://www.domainname.com/images?IMAGE_ID=10
http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123
http://www.domainname.com/images?IMAGE_ID=11
http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123

I have loaded the text file into a Python list and I am trying to get all the URLs with CONTENT_ITEM_ID separated off into a list of their own. What would be the best way to do this in Python?

Cheers

+3  A: 
list2 = filter( lambda x: x.find( 'CONTENT_ITEM_ID' ) != -1, list1 )

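filter calls the function (first parameter) on each element of list1 (second parameter). If the function returns a true (non-zero) value, the element is copied to the output list. (On Python 3, filter returns a lazy iterator instead of a list; wrap the call in list() if you need an actual list.)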
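To make that concrete, here is a rough sketch of a loop that does the same job as filter. The helper name my_filter and the sample list are made up for illustration; this is not how the built-in is actually implemented, only an equivalent in plain Python.

```python
def my_filter(function, items):
    """Hypothetical stand-in that mimics what filter() does."""
    output = []
    for item in items:
        # Keep the element only if the function returns a true value.
        if function(item):
            output.append(item)
    return output

# Sample data taken from the question.
list1 = [
    'http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123',
    'http://www.domainname.com/images?IMAGE_ID=10',
    'http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123',
]

has_content_id = lambda x: x.find('CONTENT_ITEM_ID') != -1

list2 = my_filter(has_content_id, list1)
# list() makes the comparison work whether filter returns a list or an iterator.
builtin_result = list(filter(has_content_id, list1))
```

Both list2 and builtin_result should end up holding only the two CONTENT_ITEM_ID URLs, in their original order.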

The lambda basically creates a temporary unnamed function. This is just to avoid having to create a function and then pass it, like this:

def look_for_content_item_id( elem ):
    # find() returns -1 when the substring is not present
    if elem.find( 'CONTENT_ITEM_ID' ) == -1:
        return 0
    return 1
list2 = filter( look_for_content_item_id, list1 )
Graeme Perrow
+17  A: 

Here's another alternative to Graeme's, using the newer list comprehension syntax:

list2 = [line for line in file if 'CONTENT_ITEM_ID' in line]

Which you prefer is a matter of taste!

bobince
+1: My taste is to avoid lambdas.
S.Lott
+5  A: 

I liked @bobince's answer (+1), but will up the ante.

Since you have a rather large starting set, you may wish to avoid loading the entire list into memory. Unless you need the whole list for something else, you could use a Python generator expression to perform the same task. It yields each matching line only when the loop asks for it, so the full filtered list is never built:

for filtered_url in (line for line in file if 'CONTENT_ITEM_ID' in line):
   do_something_with_filtered_url(filtered_url)
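A self-contained version of the same idea might look like the sketch below. It assumes the URLs come from a file-like object; io.StringIO stands in for the real file here (the name urls.txt in the comment is hypothetical), and do_something_with_filtered_url is a placeholder that just records what it receives.

```python
import io

# Stand-in for an open text file; in practice this could be open('urls.txt').
url_file = io.StringIO(
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123\n"
    "http://www.domainname.com/images?IMAGE_ID=10\n"
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123\n"
)

seen = []

def do_something_with_filtered_url(url):
    # Placeholder processing step: remember each URL we were handed.
    seen.append(url)

# The generator expression reads one line at a time and yields only matches;
# strip() drops the trailing newline that file iteration leaves on each line.
filtered = (line.strip() for line in url_file if 'CONTENT_ITEM_ID' in line)

for filtered_url in filtered:
    do_something_with_filtered_url(filtered_url)
```

Because the file is consumed line by line, memory use stays roughly constant no matter how many URLs the file holds.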
Blair Conrad
syntax error, unbalanced )
hop
+5  A: 

For completeness: you can also use ifilter. It is like filter, but yields the matching items one at a time instead of building up a list.

from itertools import ifilter

for line in ifilter(lambda line: 'CONTENT_ITEM_ID' in line, urls):
    do_something(line)
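Note that itertools.ifilter exists only on Python 2. On Python 3 the built-in filter is already lazy, so a roughly equivalent sketch, assuming Python 3, could look like this. The urls list and do_something are placeholders for illustration.

```python
# Sample data taken from the question.
urls = [
    'http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123',
    'http://www.domainname.com/images?IMAGE_ID=10',
    'http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123',
]

processed = []

def do_something(line):
    # Placeholder processing step: remember each line we were handed.
    processed.append(line)

# On Python 3, filter() returns a lazy iterator rather than a list,
# so nothing is evaluated until the for loop pulls items from it.
matches = filter(lambda line: 'CONTENT_ITEM_ID' in line, urls)

for line in matches:
    do_something(line)
```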
MizardX