tags:

views:

402

answers:

12

I'm writing a Python script that goes through a directory and gathers certain files, but there are a number of files I want excluded that all start the same.

Example code:

for name in files:
   if name != "doc1.html" and name != "doc2.html" and name != "doc3.html":
      print name

Let's say there are 100 hundred HTML files in the directory all beginning with 'doc'. What would be the easiest way to exclude them?

Sorry I'm new to Python, I know this is probably basic.

Thanks in advance.

+5  A: 
for name in files:
    if not name.startswith("doc"):
        print name
Mike
What I'm lookin for would be the opposite of startswith, if there was a method called doesnotstartwith(), I'd be sorted :)
Ruth
@Ruth: not True == False
telliott99
sorry, dunno how I missed that
Ruth
+11  A: 
if not name.startswith('doc'):
     print name

If you have more prefixes to exclude you can even do this:

if not name.startswith(('prefix', 'another', 'yetanother')):
     print name

startswith can accept a tuple of prefixes.

Nadia Alramli
I new it was gonna be embarrasingly simple, thanks a lot
Ruth
yes, startswith() accepts tuples and its ver 2.5 onwards.
ghostdog74
A: 
for name in files:
    if name[0:3] == "doc":
         continue
Stefano Borini
A: 

If all of them start the same (i.e. with "doc") you could use python's string's startswith() method.

for name in files:
    if not name.startswith("doc"):
       print name
TwentyMiles
A: 

Since you didn't say if there are good files starting with 'doc' and ending with '.html' you will have to declare a set of bad filenames and process only files not in that set.

bad_files = set(["doc1.html", "doc2.html", "doc3.html"])

for file in files:
  if file not in bad_files:
    print file

If you need to dynamically change the list of filenames use a list.

honk
thats good too, thanks
Ruth
At least use a set instead of a list. set lookup is O(1), list lookup is O(N).
Nadia Alramli
@Nadia Alramli changed to example like you suggested
honk
You changed it to a tuple, not a set. It should be: bad_files = set(["doc1.html", "doc2.html", "doc3.html"])
Nadia Alramli
A: 
for name in files:
    if not name.startswith("doc"):
        print name
Alex LE
+1  A: 
import os
os.chdir("/home")
for file in os.listdir("."):
   if os.path.isfile(file) and not file.startswith("doc"):
      print file
ghostdog74
+3  A: 

If you find functional programming matches your style better, Python makes it simple to filter lists with the filter() function:

>>> files = ["doc1.html", "doc2.html", "doc3.html", "index.html", "image.jpeg"]
>>> filter_function = lambda name: not name.startswith("doc")
>>> filter(filter_function, files)
['index.html', 'image.jpeg']

Also take a look at apply(), map(), reduce(), and zip().

Troy J. Farrell
But note that `apply()` has been deprecated since 2.3: http://docs.python.org/library/functions.html#apply
Peter Hansen
+1  A: 

looks like this problem might be a better fit for list stuff so like Troy said (Although I prefer putting the function directly into the filter)

filter(lambda filename: not filename.startswith("doc"),files)

or

[filename for filename in files if not filename.startswith("doc")]
Roman A. Taycher
+1  A: 

You could also use a list comprehension.

cleaned_list = [filename for filename in files if not filename.startswith('doc')]
Josh Wright
A: 

An alternate take to a functional solution to this issue, with the advantage of using recent additions to the standard library (using the same example filenames as Troy J. Farrell in another answer) :

>>> import operator, itertools
>>> filter_fun= operator.methodcaller("startswith", "doc")
>>> files = ["doc1.html", "doc2.html", "doc3.html", "index.html", "image.jpeg"]
>>> list(itertools.ifilterfalse(filter_fun, files))
['index.html', 'image.jpeg']

operator.methodcaller called with methodname, [optional arguments] returns a function that, when called with an object obj as its argument, returns the result of obj.methodname(optional_arguments). itertools.ifilterfalse, unlike filter, returns an iterator instead of a list and the filter decision is negated.

ΤΖΩΤΖΙΟΥ
A: 

This is my 2 cents:
A bit of list comprehension.It's always better for effeciency.

file_list = [file for file in directory if not file.startswith(("name1", "name2", "name3"))]
Kurt Campher