What is the best way to choose a random file from a directory in Python?

Edit: Here is what I am doing:

import os
import random
import dircache

dir = 'some/directory'
filename = random.choice(dircache.listdir(dir))
path = os.path.join(dir, filename)

Is this particularly bad, or is there a particularly better way?

+1  A: 

Independent of the language used, you can read references to all the files in a directory into a data structure such as an array (something like 'listFiles'), get the length of the array, calculate a random number in the range 0 to arrayLength-1, and access the file at that index. This works in any language, not only in Python.

Mork0075
A: 

Check out this page, it might help you achieve what you're trying to do.

http://lost-theory.org/python/randomimg.html

Banang
+1  A: 

If you don't know beforehand which files are there, you will need to get a list and then pick a random index in that list.

Here's one attempt:

import os
import random

def getRandomFile(path):
  """
  Returns a random entry name (file or subdirectory) from the given path.
  Raises ValueError if the directory is empty.
  """
  files = os.listdir(path)  # names only, not full paths
  index = random.randrange(0, len(files))
  return files[index]

EDIT: The question now mentions a fear of a "race condition", which I can only assume is the typical problem of files being added/removed while you are in the process of trying to pick a random file.

I don't believe there is a way around that, other than keeping in mind that any I/O operation is inherently "unsafe", i.e. it can fail. So, the algorithm to open a randomly chosen file in a given directory should:

  • Actually open() the file selected, and handle a failure, since the file might no longer be there
  • Probably limit itself to a set number of tries, so it doesn't die if the directory is empty or if none of the files are readable
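
The two points above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function name open_random_file and the max_tries parameter are made up for this example, and it assumes the files should be opened in binary read mode.

```python
import os
import random


def open_random_file(directory, max_tries=5):
    """Open a randomly chosen file in `directory`, retrying on failure.

    `open_random_file` and `max_tries` are hypothetical names used
    for illustration only.
    """
    for _ in range(max_tries):
        # Re-list on every attempt, since the contents may have changed.
        names = os.listdir(directory)
        if not names:
            break  # empty directory: nothing to pick from
        path = os.path.join(directory, random.choice(names))
        try:
            return open(path, "rb")
        except OSError:
            # The entry vanished, is a directory, or is unreadable.
            continue
    raise FileNotFoundError(f"no readable file found in {directory!r}")
```

The caller is responsible for closing the returned file object, for example by using it in a with statement.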
unwind
Yeah, didn't know about that, saw it in another answer though. Good to know, thanks!
unwind
+10  A: 
import os, random
random.choice(os.listdir("C:\\"))  # change dir name to whatever


Regarding your edited question: first, I assume you know the risks of using dircache, as well as the fact that it has been deprecated since 2.6 and was removed in 3.0.

Second of all, I don't see where any race condition exists here. Your dircache object is basically immutable (after directory listing is cached, it is never read again), so no harm in concurrent reads from it.

Other than that, I do not understand why you see any problem with this solution. It is fine.

Yuval A
Thanks for mentioning that dircache is deprecated.
Cristian Ciupitu
+2  A: 

Language agnostic solution:

1) Get the total no. of files in specified directory.

2) Pick a random number from 0 to [total no. of files - 1].

3) Get the list of filenames as a suitably indexed collection or such.

4) Pick the nth element, where n is the random number.
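
In Python, the four steps above might look something like this. It is a minimal sketch: pick_by_index is a made-up name, and it assumes that subdirectories count as "files", since os.listdir returns them too.

```python
import os
import random


def pick_by_index(directory):
    """Return one entry name from `directory`, chosen by random index.

    `pick_by_index` is a hypothetical name used for illustration.
    """
    names = os.listdir(directory)   # step 3: indexed collection of names
    total = len(names)              # step 1: total number of entries
    if total == 0:
        raise ValueError("directory is empty")
    n = random.randrange(total)     # step 2: random number in 0..total-1
    return names[n]                 # step 4: the nth element
```

The result is a bare name, so join it with the directory (os.path.join) before opening it.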

karim79
A: 

If you want directories included, use Yuval A's answer. Otherwise:

import os, random

random.choice([x for x in os.listdir("C:\\") if os.path.isfile(os.path.join("C:\\", x))])
mavnn