ansaurus

Question

Running a command line from python and piping arguments from memory

Answer 1

+1 A:

Popen.communicate from subprocess takes an input parameter that is used to send data to stdin; you can use that to pass in your data. communicate also returns the output of your program, so you don't have to write it to a file.

The documentation for communicate explicitly warns that everything is buffered in memory, which seems to be exactly what you want to achieve.
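As a minimal sketch of the idea above (written for Python 3, not taken from the original answer): the helper name run_with_input is hypothetical, and a small Python child process stands in for the real command so the example runs anywhere. Note that stdin must be opened as a pipe for the input argument to reach the child.

```python
import subprocess
import sys


def run_with_input(cmd, data):
    """Send `data` (bytes) to the command's stdin; return (stdout, stderr, returncode)."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,   # required so communicate() can write `data`
        stdout=subprocess.PIPE,  # captured in memory instead of a file
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(input=data)
    return out, err, proc.returncode


# Stand-in child process (an assumption for illustration): echoes stdin in upper case.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
out, err, code = run_with_input(child, b"hello from memory")
print(out)
```

Because both the input and the output are held in memory, this is best suited to data that comfortably fits in RAM.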

Fabian 2010-09-19 09:50:19

Answer 2

+1 A:

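With Popen.communicate:

import subprocess
# stdin=subprocess.PIPE is required; without it communicate() has no pipe to write pdf_data to
out, err = subprocess.Popen(["pdftotext", "-", "-"], stdin=subprocess.PIPE, stdout=subprocess.PIPE).communicate(pdf_data)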

tokland 2010-09-19 09:50:47

Another problem, not directly related, is converting the in-memory variable into a "seekable stream". Right now I am getting an error saying "Error: Document base stream is not seekable". I presume there is some method/module I can pass pdf_data to in order to make it a seekable stream?

Chaitanya 2010-09-19 10:33:37

@Chaitanya: This is a pdftotext regression bug that has already been fixed: http://bugs.freedesktop.org/show_bug.cgi?id=7334. Update your poppler package. BTW, you cannot build a "seekable stream" in this scenario: the output is being written to the process's file descriptor, so there is nothing Python can do there.

tokland 2010-09-19 12:00:54

So it's complaining about the output from pdftotext not being seekable then, not the input data file. Fedora 12's software updater doesn't seem to update it, and neither does running su -c 'yum update poppler'. I've downloaded and unzipped version 0.14 from http://poppler.freedesktop.org/ but can't seem to install it (make and make install fail).

Chaitanya 2010-09-19 14:42:22

Sorry to keep coming back to this, but how do I pass optional parameters using Popen? I am using a temporary file as suggested by greggo below. I want to preserve the layout by running "pdftotext -layout". I tried replacing the "-" in Popen with "-layout", replacing "pdftotext" with "pdftotext -layout", passing it into communicate, etc. None of it works. I just get empty text back.

Chaitanya 2010-10-09 01:29:46
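On the question of optional parameters: when Popen is given a list, each element is passed to the program as one separate argument and nothing is split on spaces, so "pdftotext -layout" as a single string would be looked up as a program name. A likely fix is to add the option as its own list element before the two "-" arguments. The sketch below is for Python 3 and assumes the installed pdftotext accepts "-" for stdin and stdout as used elsewhere in this thread; the helper names pdftotext_args and pdf_to_text are hypothetical.

```python
import shlex
import subprocess


def pdftotext_args(options=()):
    """Build the argument list: program, then options, then input and output ("-" twice)."""
    return ["pdftotext", *options, "-", "-"]


def pdf_to_text(pdf_data, options=()):
    """Run pdftotext on in-memory bytes; requires pdftotext to be installed and on PATH."""
    proc = subprocess.Popen(
        pdftotext_args(options),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate(pdf_data)
    return out


# Each option is its own list element, exactly as a shell would split the command line.
args = pdftotext_args(["-layout"])
print(args)
print(shlex.split("pdftotext -layout - -") == args)
```

With a temporary file instead of a pipe, the same argument list applies; only stdin=tf replaces stdin=subprocess.PIPE and communicate() is called without data.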

Answer 3

+1 A:

os.tmpfile is useful if you need a seekable thing. It uses a file, but it's nearly as simple as a pipe approach, no need for cleanup.

tf = os.tmpfile()
tf.write(...)
tf.seek(0)
subprocess.Popen(..., stdin=tf)

This may not work on Posix-impaired OS 'Windows'.
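os.tmpfile exists only in Python 2; it was removed in Python 3. A rough Python 3 equivalent of the same approach, swapping in tempfile.TemporaryFile, might look like the sketch below. The helper name run_with_seekable_stdin is hypothetical, and a small Python child process stands in for the real command so the example is runnable without pdftotext.

```python
import subprocess
import sys
import tempfile


def run_with_seekable_stdin(cmd, data):
    """Write `data` to an anonymous temp file and hand that file to the child as stdin."""
    with tempfile.TemporaryFile() as tf:
        tf.write(data)
        tf.flush()   # make sure the bytes are in the file, not in Python's buffer
        tf.seek(0)   # rewind so the child starts reading from the beginning
        proc = subprocess.Popen(cmd, stdin=tf, stdout=subprocess.PIPE)
        out, _ = proc.communicate()
    return out       # the temp file is removed when the with-block exits


# Stand-in child process (an assumption for illustration): writes stdin back reversed.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read()[::-1])"]
result = run_with_seekable_stdin(child, b"abc")
print(result)
```

Unlike a pipe, the child's stdin here is a regular file, so a program that needs to seek within its input can do so.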

greggo 2010-09-19 14:16:27

This works too. To expand for future users, my 4th line above is this: "out, err = subprocess.Popen(["pdftotext", "-", "-"], stdin=tf, stdout=subprocess.PIPE).communicate()". After this, the variable 'out' contains the PDF's contents as text.

Chaitanya 2010-09-19 14:57:13
