I'm using PDF::FromHTML to generate a PDF from HTML (as the name of the module would imply) :)

I'm able to run it from the command line fine and receive my expected output. However, when I use the exact same code in my web app, the output doesn't look correct at all: text appears in the wrong places, and some of it doesn't appear at all.

I am using the exact same input file in the web app and on the command line, but for some reason the output comes out differently when the code is called from inside my web app.

Here is the code:

use PDF::FromHTML;

my $filename = '/tmp/backup.html';

# Conversion settings
my $font      = 'Helvetica';
my $encoding  = 'utf-8';
my $size      = 12;
my $landscape = 0;

my $pdf = PDF::FromHTML->new(
  encoding => $encoding,
);

my $input_file  = $filename;
my $output_file = "$input_file.pdf";

# Log both paths so each run can be identified in the output
warn "$input_file\n$output_file\n";

$pdf->load_file($input_file);
$pdf->convert(
  Font        => $font,
  LineHeight  => $size,
  Landscape   => $landscape,
);

$pdf->write_file($output_file);

The web app code is the same, just with that block thrown into a method.

I have looked at the two generated PDF files in a hex editor and found the differences. They're the same until a block whose intent I can't understand...

Good PDF contents at that block:

/Length 302 >> stream
**binary data
endstream endobj
10 0 obj << /Filter [ /FlateDecode ] /Length 966

Bad PDF contents:

/Length 306 >> stream
**binary data
endstream endobj
10 0 obj << /Filter [ /FlateDecode ] /Length 559

As you can see, the two files differ both in the length and binary data of this stream (length 302 vs. length 306) and in the length of the next stream (966 vs. 559).

I'm not entirely sure what could be causing this discrepancy. The only thing I can think of is some difference between the environment I get when running this as my user on the command line and the one the web app runs in. I don't know where to start debugging that, however.

+4  A: 

In general, the CGI environment is different than your interactive login environment just like someone else's login environment is different than yours. The trick is to figure out what thing you have set or unset on your command line that makes your program work.

You might want to see my Troubleshooting Perl CGI scripts for a step-by-step method to track down these problems.

Some things to investigate:

  • Is your CGI script running on the same platform (i.e. is it a Windows versus Unix sorta thing)?
  • What's different about the environment variables?
  • Does your CGI script use the same version of Perl?
  • Does that perl binary have different compilation options?
  • Are you using the same versions of the modules?
  • If some of those modules use external libraries, are they the same?
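One way to work through the checklist above is to dump the same facts from both environments and diff the two outputs. The following is only a sketch: the helper name `environment_report` and the module list passed to it are my own guesses at what is relevant here (I am assuming PDF::API2 and Compress::Zlib are somewhere in the dependency chain), so adjust the list to whatever your application actually loads.

```perl
use strict;
use warnings;
use Config ();

# Build a plain-text report of things that commonly differ between a
# login shell and a web server process. The module names to check are
# passed in by the caller.
sub environment_report {
    my @modules = @_;
    my @lines;

    push @lines, "perl binary: $^X";
    push @lines, "perl version: $]";
    push @lines, "archname: $Config::Config{archname}";

    push @lines, '@INC:';
    push @lines, map { "  $_" } @INC;

    push @lines, 'modules:';
    for my $module (@modules) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        if (eval { require $file; 1 }) {
            my $version = $module->VERSION;
            $version = 'unknown' unless defined $version;
            # $INC{$file} shows which copy on disk was actually loaded
            push @lines, "  $module $version ($INC{$file})";
        }
        else {
            push @lines, "  $module NOT FOUND";
        }
    }

    # Version of the C zlib library that Perl's zlib bindings report, if present
    my $zlib = eval { require Compress::Raw::Zlib; Compress::Raw::Zlib::zlib_version() };
    push @lines, "zlib: $zlib" if defined $zlib;

    push @lines, 'environment:';
    push @lines, map { "  $_=$ENV{$_}" } sort keys %ENV;

    return join("\n", @lines) . "\n";
}

# Assumed module list; replace with the modules your code really uses.
print environment_report(qw(PDF::FromHTML PDF::Writer PDF::API2 Compress::Zlib));
```

Run it once from the shell and once from inside the web app (there you would probably write the report to a log or a temporary file instead of printing it), then diff the two results.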

A useful technique is to make your login shell temporarily have the same setup as your CGI environment. Once you do that, you should get the same results on the command line even if those results are wrong. However, once you get the wrong results you can start tracking it down from the command line.
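As a rough illustration of that technique, here is a small wrapper that runs a command with a stripped-down environment. The helper `run_with_env` and the contents of `%cgi_like_env` are hypothetical; the real values should be copied from what your web server actually passes to the application.

```perl
use strict;
use warnings;

# Run a command with %ENV replaced by exactly the given variables.
# local() restores the original environment when the sub returns.
# Returns the status from system().
sub run_with_env {
    my ($env, @command) = @_;
    local %ENV = %$env;
    return system(@command);
}

# Hypothetical minimal environment; copy the real one from your server.
my %cgi_like_env = (
    PATH => '/usr/local/bin:/usr/bin:/bin',
);

# Demo: show which variables a child perl process sees.
# To test a real script, replace the '-e' arguments with its path.
my $status = run_with_env(
    \%cgi_like_env,
    $^X, '-e', 'print "$_=$ENV{$_}\n" for sort keys %ENV',
);
warn "child exited with status $status\n" if $status != 0;
```

If the script starts producing the bad PDF under this wrapper, add variables back one at a time until the output changes; that narrows down which setting matters.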

Good luck.

brian d foy
Thanks, that's good advice for general Perl debugging. I'm running them as the same user on the same box with the same perl binary. I'm thinking it's probably an issue with differences in the modules being included.
ashgromnies
Almost certainly, yes. The difference in sizes suggests to me that you are using two different versions of a compression library. Try printing @INC in both environments and looking for differences.
Chris Dolan
@Chris Dolan: setting the lib paths of the script being run from the command line to the lib paths of the web application (verifying both by checking @INC) still resulted in a successful output file being generated.
ashgromnies
I'm wondering (as Chris probably is) if this is maybe a C library problem ... eg: the compression library is pulling in a different libz.so depending on the LD_LIBRARY_PATH.
Sharkey
+1  A: 

Couple of suggestions:

  • PDF::FromHTML uses PDF::Writer, which in turn uses a PDF rendering library as a plugin (I think the options are PDFLib and some others). Is the same version of that library available in both environments?
  • Does your HTML input file have a CSS file that you haven't uploaded?
  • Try setting the other PDF::FromHTML variables: PageWidth, PageResolution, PageSize, etc.

Is the ordering of the output text different, or merely the positions? If it's the positions, then try setting PageWidth etc., as the library being used (PDFLib or whatever) may pick different defaults between the two environments. If the ordering is wrong, then I have no idea.

The two PDF blocks you posted don't really show much; they just show that the compressed sections are of different sizes. There's nothing actually wrong syntactically with either example.
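Since the compressed bytes themselves say little, it may help to inflate the streams and compare the uncompressed content instead. Below is a minimal sketch, assuming the streams use plain FlateDecode (zlib) with no predictor or additional filters; `inflate_streams` is a hypothetical helper, not part of any PDF module, and it simply skips anything it cannot inflate.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib;    # exports Z_STREAM_END by default

# Return the inflated contents of every zlib-compressed stream found
# in the raw bytes of a PDF. Streams that do not inflate cleanly are
# skipped rather than reported.
sub inflate_streams {
    my ($pdf_bytes) = @_;
    my @plain;
    # (?<!end) stops the "stream" inside "endstream" from matching
    while ($pdf_bytes =~ /(?<!end)stream\r?\n(.*?)endstream/sg) {
        my $input    = $1;    # copy: inflate() consumes its input buffer
        my $inflater = Compress::Raw::Zlib::Inflate->new() or next;
        my $output   = '';
        my $status   = $inflater->inflate($input, $output);
        push @plain, $output if $status == Z_STREAM_END;
    }
    return @plain;
}

# Usage: perl inflate.pl good.pdf > good.txt   (file names are examples)
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    my $n = 0;
    for my $stream (inflate_streams($bytes)) {
        $n++;
        print "== $path: stream $n (", length($stream), " bytes) ==\n$stream\n";
    }
}
```

Running this on the good and the bad PDF and diffing the two outputs should show whether the page content (text positions, font references) really differs, or whether only the compression does. Some streams may be embedded font data and will look like binary noise.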

Steve Claridge
I am not using any CSS. Like I said, I am using the same input to PDF::FromHTML from the script and from the web application. If I backtick the script and run that in the web application, it renders the correct PDF.
ashgromnies
What is really alarming to me is the fact that the two compressed sections ARE of different sizes. They should be exactly the same. Both PDF files load; the issue is just that when I generate it with the web app (with the SAME INPUT) it looks different.
ashgromnies
I understand your problem, but you are not saying how the two outputs differ. Do you control the environment in which the web app runs, and can you confirm that Perl and the associated libraries are identical to the ones in your dev environment?
Steve Claridge
A: 

Maybe there is some encoding problem? Have a look at the headers.

Vili
A: 

I would take a good look at what user the web server is running as and what that user's environment variables look like. Also pay attention to that user's permissions on the directories. And are there other things restricting the web server user, such as SELinux on a Linux box?
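A quick way to check the user and permissions is to have the application itself report them. This is only a sketch for Unix-like systems: `access_report` is a hypothetical helper, and the paths passed to it are taken from the question. It does not detect SELinux restrictions.

```perl
use strict;
use warnings;

# Report the effective user of this process and whether each given
# path exists and is readable/writable by that user.
sub access_report {
    my @paths = @_;
    my $user = getpwuid($>);    # scalar context returns the login name
    $user = 'unknown' unless defined $user;
    my @lines = ("effective user: $user (uid $>)");
    for my $path (@paths) {
        my @flags = (
            (-e $path ? 'exists'   : 'MISSING'),
            (-r $path ? 'readable' : 'not readable'),
            (-w $path ? 'writable' : 'not writable'),
        );
        push @lines, "  $path: " . join(', ', @flags);
    }
    return join("\n", @lines) . "\n";
}

# Paths from the question; add any font or library directories you rely on.
print access_report('/tmp', '/tmp/backup.html');
```

Call it from both the command line and the web app and compare the two reports.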

trent