I am trying to write a PHP script that uses the pdftk app to merge an XFDF with a PDF form and output the merged PDF to the user. According to the pdftk documentation, I can pass the form data in via stdin and have the PDF output to the stdout stream. The normal, file-not-stream way to use pdftk from the command line is:

pdftk blankform.pdf fill_form formdata.xfdf output filledform.pdf

To use streams on the command line, you'd enter:

pdftk blankform.pdf fill_form - output -

I have a couple of problems:

1) I have gotten pdftk to return output via stdout using an xfdf file (instead of stdin) like so:

    exec("pdftk blankform.pdf fill_form formdata.xfdf output -", $pdf_output);
    file_put_contents("filledform.pdf",$pdf_output);

But the PDF it creates is corrupt according to Adobe Reader, and a quick peek at the file in a text editor shows that, at the very least, the line endings are not where they should be. I have an identical PDF that pdftk wrote directly to a file, and it looks fine in the text editor, so I know that pdftk itself is not producing bad data.

2) I cannot for the life of me figure out how to set the stdin stream in PHP so that I can use that stream as my input for pdftk. From what I'm reading in the PHP documentation, stdin is read-only, so how does anything ever get into that stream?

Ideally, I would like to keep this really simple and avoid using proc_open(). I attempted to use that function and wasn't very successful, which is probably my fault, not the function's, but my goals are simple enough that I'd rather avoid heavyweight functions I don't need.

Ideally my code would look something like:

 $form_data_raw = $_POST;
 $form_data_xfdf = raw2xfdf($form_data_raw); //some function that turns HTML-form data to XFDF

 $blank_pdf_form = "blankform.pdf";

 header('Content-type: application/pdf');
 header('Content-Disposition: attachment; filename="output.pdf"');

 passthru("pdftk $blank_pdf_form fill_form $form_data_xfdf output -");

Just a heads up, it is possible to put the actual xml string in the command line, but I've had very unreliable results with this.

Edit

With much help, I now understand that my real question was "how can I pipe a variable to a command line execution in PHP". Apparently proc_open is the best way to go, or at least the most straightforward. Since it took me forever to figure this out, and since my research on Google suggests others may be struggling too, I'll post the code that worked for my specific problem:

$blank_pdf_form = "blankform.pdf";
$cmd = "pdftk $blank_pdf_form fill_form - output -";

$descriptorspec = array(
   0 => array("pipe", "r"),
   1 => array("pipe", "w")
);

$process = proc_open($cmd, $descriptorspec, $pipes);

if (is_resource($process)) {

    // raw2xfdf is a made-up function that turns HTML-form data into XFDF
    fwrite($pipes[0], raw2xfdf($_POST));
    fclose($pipes[0]);

    $pdf_content = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    $return_value = proc_close($process);

    header('Content-type: application/pdf');
    header('Content-Disposition: attachment; filename="output.pdf"');
    echo $pdf_content;
}
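
The same pattern can be wrapped in a small reusable function. This is only a sketch: `pipe_through` is a hypothetical name that does not appear in the post above, and `cat` stands in for pdftk so the snippet can be tried on any POSIX system without a PDF form.

```php
<?php
// Hypothetical helper: send $input to a command's stdin and return whatever
// the command writes to stdout, or null if it could not run or exited non-zero.
function pipe_through(string $cmd, string $input): ?string
{
    $descriptorspec = [
        0 => ['pipe', 'r'], // the child reads its stdin from this pipe
        1 => ['pipe', 'w'], // the child writes its stdout to this pipe
    ];

    $process = proc_open($cmd, $descriptorspec, $pipes);
    if (!is_resource($process)) {
        return null;
    }

    fwrite($pipes[0], $input);
    fclose($pipes[0]); // closing stdin tells the child the input is complete

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    // Close all pipes before proc_close() so the child can exit cleanly.
    $exit_code = proc_close($process);

    return ($exit_code === 0 && $output !== false) ? $output : null;
}

// `cat` is a stand-in for "pdftk blankform.pdf fill_form - output -".
$echoed = pipe_through('cat', "<xfdf>example</xfdf>\n");
```

Writing all of the input before reading any output is fine for a command that consumes its whole stdin first. A command that streams large output while it is still reading input could fill the pipe buffer and stall, so that case would need interleaved reads and writes.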
A: 

The method I use to send fdf input is something like this:

$command = "echo '$fdf_data' | pdftk $infile fill_form - output $outfile";
shell_exec($command);

But like you, I am not able to get reliable output to stdout from pdftk. I don't know why.
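
One caveat with interpolating the data between single quotes: it breaks as soon as the data itself contains a single quote, and it lets form input reach the shell unescaped. A minimal sketch of a safer variant, assuming a POSIX shell. Here `printf` and `cat` are stand-ins for the real pipeline, and the sample string is made up:

```php
<?php
// Made-up sample data containing a single quote, which would break
// naive '...' quoting in the shell command.
$fdf_data = "<value>O'Brien</value>";

// escapeshellarg() wraps the string in single quotes and escapes any
// single quotes inside it, so the shell sees one literal argument.
$command = 'printf %s ' . escapeshellarg($fdf_data) . ' | cat';

// shell_exec() returns the command's complete output as one string.
$result = shell_exec($command);
```

Large payloads can still exceed the system's limit on command-line length, which is one reason to write the data to the child's stdin with `proc_open()` instead.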

bmb
This isn't an answer.
Michael Mior
A: 

1) Why are you outputting to standard out and then putting that stuff into a file? Why not just have pdftk dump to the file, i.e.

exec("pdftk blankform.pdf fill_form formdata.xfdf output filledform.pdf");

2) Use proc_open(). Feel free to post any problems you have with the function.

Michael Mior
I am writing to a file for testing purposes. You'll notice that in my final "what I want this to look like" bit, that I'm outputting to the browser after setting the headers. The end goal is to not need any files stored on the server other than the blank pdf form.
Anthony
Ah, sorry. As you can tell, I didn't read your question carefully enough. However, if you do it as I wrote above, do you get a correctly formatted PDF?
Michael Mior
Actually, no. I could have sworn I tried something almost identical to your example when I was testing this yesterday, but today, I get the corrupted pdf with the missing line endings.
Anthony
No, wait. It seems that the stupid program (*cough* Dreamweaver *cough*) wasn't updating my changes to the server. Using all file locations with no streams via exec does work. I have a sneaking suspicion that the problem I'm having with the corrupted PDF output is due to exec returning the output as an array. I'll let you know.
Anthony
Ah, success. It looks like it was due to the variable type. When I changed it to `file_put_contents("filledform",implode("\n",$pdf_output));` it worked great. Of course, it threw a fit when I tried either `\r` or `\r\n`, so anybody looking to do this in their own code should be aware.
Anthony
Now to figure out the right way to not use files at all.
Anthony
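
The corruption discussed in these comments follows from how `exec()` fills its output array: each element is one line of output with the trailing newline removed, and `file_put_contents()` joins an array with no separator at all. A small sketch of the difference, assuming a POSIX shell; `printf` stands in for pdftk so the effect is visible with plain text:

```php
<?php
// printf is a stand-in for pdftk; it emits two newline-terminated lines.
$cmd = "printf 'line1\\nline2\\n'";

// exec() splits the output into lines and drops each line's trailing newline.
exec($cmd, $lines);

// file_put_contents() given an array behaves like implode('', $array),
// so the line endings never make it into the file.
$as_written = implode('', $lines);

// Re-joining with "\n" restores the inner line breaks, though the final
// newline is still missing.
$rejoined = implode("\n", $lines);

// shell_exec() returns the whole output as a single string instead.
$raw = shell_exec($cmd);
```

Because `exec()` trims trailing whitespace from every line, re-joining the array is not guaranteed to reproduce binary output byte for byte. Capturing the output as one string, or using `passthru()` to send it straight through, avoids the round trip.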
A: 

I'm not sure about what you're trying to achieve. You can read stdin with the URL php://stdin. But that's the stdin from the PHP command line, not the one from pdftk (through exec).

But I'll give a +1 for proc_open()


<?php

$cmd = sprintf('pdftk %s fill_form %s output -','blank_form.pdf', raw2xfdf($_POST));

$descriptorspec = array(
   0 => array("pipe", "r"),  // stdin is a pipe that the child will read from
   1 => array("pipe", "w"),  // stdout is a pipe that the child will write to
   2 => null,
);

$process = proc_open($cmd, $descriptorspec, $pipes);

if (is_resource($process)) {
    // $pipes now looks like this:
    // 0 => writeable handle connected to child stdin
    // 1 => readable handle connected to child stdout

    fwrite($pipes[0], stream_get_contents(STDIN)); // file_get_contents('php://stdin')
    fclose($pipes[0]);

    $pdf_content = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    // It is important that you close any pipes before calling
    // proc_close in order to avoid a deadlock
    $return_value = proc_close($process);


    header('Content-type: application/pdf');
    header('Content-Disposition: attachment; filename="output.pdf"');
    echo $pdf_content;
}
?>
Savageman
I guess what I'm not understanding is what pdftk is using when I tell it to use stdin and how to write to that stream before calling pdftk.
Anthony
If you want to modify what is being used as stdin for pdftk, you need to use `proc_open()`. You can't write to `stdin`, so it's really only useful when PHP is being run from the command line.
Michael Mior
That's right. You can't pass something to stdin using exec or similar functions. I'm just editing my answer to show you what it could look like with `proc_open()`.
Savageman
Yeah, after playing around with this more, I can tell I had the wrong idea about what stream wrappers are and how to use them. I thought I could set php://stdin and then use it at the command line via exec as the stdin.
Anthony
I think your answer was slightly based on my misunderstanding, so it was more sophisticated than what I ended up doing, but I wouldn't have gotten there without this, for sure. I'm posting my slightly modified version at the end of my question.
Anthony