I want to convert all the .pdf files in a specific directory to .txt format using the pdftotext command, but I want to do it from a Python script. My script contains:

import glob 
import os

fullPath = os.path.abspath("/home/eth1/Downloads")

for fileName in glob.glob(os.path.join(fullPath,'*.pdf')):
   fullFileName = os.path.join(fullPath, fileName)
   os.popen('pdftotext fullFileName')

But I get the following error:

Error: Couldn't open file 'fullFileName': No such file or directory.
+3  A: 

You are passing fullFileName literally to os.popen. You should do something like this instead (assuming that fullFileName does not have to be escaped):

os.popen('pdftotext %s' % fullFileName)

Also note that os.popen is considered deprecated; it's better to use the subprocess module instead:

import subprocess
retcode = subprocess.call(["/usr/bin/pdftotext", fullFileName])

It is also much safer: the arguments are passed as a list and never go through a shell, so spaces and special characters in fullFileName are handled properly.
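For the whole directory, the subprocess call goes inside the loop from the question. The following is only a minimal sketch under a few assumptions: the function name `convert_pdfs` and its `command` parameter are made up for illustration, and `pdftotext` is assumed to be on the PATH (pass the full path, e.g. `/usr/bin/pdftotext`, if it is not).

```python
import glob
import os
import subprocess


def convert_pdfs(directory, command=("pdftotext",)):
    """Run the converter once per .pdf file in `directory`.

    Returns a dict mapping each PDF path to the converter's return code.
    `convert_pdfs` and `command` are hypothetical names for this sketch.
    """
    results = {}
    # glob.glob already returns paths that include `directory`,
    # so a second os.path.join with the directory is not needed.
    for pdf_path in sorted(glob.glob(os.path.join(directory, "*.pdf"))):
        # A list of arguments bypasses the shell, so file names with
        # spaces or special characters need no escaping.
        results[pdf_path] = subprocess.call(list(command) + [pdf_path])
    return results
```

Calling `convert_pdfs("/home/eth1/Downloads")` would then run pdftotext on every PDF in that directory; a non-zero value in the returned dict points at a file that failed to convert.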

Tamás
Hey buddy, thanks for the correction.
But one catch: it converts only the first PDF, whereas I wanted to convert all of them from that directory.
Well, of course you need to place the whole thing inside your `for` loop, where you originally had the `os.popen` call.
Tamás
I tried that code as well, but it still converts only the first file in the list.
Is it running `pdftotext` on each `pdf` file? You could tell by putting a `print` statement (or calling the `print()` function) inside the loop, or by adding a counter and printing its value at the end.
martineau
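One way to do that check, as a rough sketch: the helper name `count_pdfs` and its `verbose` flag are invented here, and nothing is converted, the loop is only walked and counted.

```python
import glob
import os


def count_pdfs(directory, verbose=True):
    """Walk the same loop as the conversion script, but only count files.

    `count_pdfs` and `verbose` are hypothetical names for this sketch.
    """
    count = 0
    for pdf_path in sorted(glob.glob(os.path.join(directory, "*.pdf"))):
        count += 1
        if verbose:
            # One line per loop iteration shows which files are visited.
            print("%d: %s" % (count, pdf_path))
    return count
```

If the returned count matches the number of PDFs in the directory but fewer .txt files appear, the loop is fine and the problem lies in the conversion call itself.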
It is being called for all the files.
Thanks, it worked.
+1  A: 

Change the last line to

os.popen('pdftotext {0}'.format(fullFileName))

This way the value of the fullFileName variable is passed instead of its name.

Space_C0wb0y
Hey buddy, thanks for the correction.
But one catch: it converts only the first PDF, whereas I wanted to convert all of them from that directory.