Hello there,

We currently generate all our official documents with an XSL-FO transformation, using .xml files as input and producing .pdfs. Basically all the content within these .xml files is either plain text or XHTML. This works perfectly fine for everyday use cases, but some of our users refer to Microsoft Excel files, which our XSL-FO processor (Antenna House) cannot handle natively (and afaik no other one really does either).

So as an intermediate, short-term solution, we create images from the print areas defined by the users and embed these images in the .pdfs.

However, since the content of these images is obviously not searchable, we looked into OCR'ing the .pdfs as a post-processing step, but to my mind this all goes too deep down the workaround hole.

I had the idea of converting these .xls files to SpreadsheetML and handling that in our XSL-FO stylesheet, but after looking at the SpreadsheetML spec I pretty much gave up that hope too, at least without throwing several dozen man-months at the implementation.

So, to come to my actual question: how do you (or would you) handle Microsoft Excel files within your XSL-FO-driven document generation?

Cheers & thanks, -J

A: 

You could convert the Excel file to a PDF and then merge the resulting documents. It's not direct XSL-FO processing, though.
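One way the conversion step could look, as a rough sketch: it assumes LibreOffice is installed and its `soffice` binary is on the PATH, which is my assumption and not something your setup necessarily has. The function names are made up for illustration. Merging the resulting PDF with the XSL-FO output would be a separate step with whatever PDF tooling you already use.

```python
import subprocess
from pathlib import Path


def build_convert_command(xls_path, out_dir):
    """Build the LibreOffice command line that converts one file to PDF.

    Assumes `soffice` (LibreOffice) is on the PATH; adjust the executable
    name or path for your installation.
    """
    return [
        "soffice",
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(xls_path),
    ]


def convert_to_pdf(xls_path, out_dir):
    """Run the conversion and return the path where the PDF is expected."""
    subprocess.run(build_convert_command(xls_path, out_dir), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(out_dir) / (Path(xls_path).stem + ".pdf")
```

Note that the layout of the PDF then depends on how LibreOffice renders the sheet, which may differ from what Excel itself would print.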

Personally, I wouldn't even try to store the Excel file inside the PDF; I would link to it from the PDF. Excel is closed and very complicated, and even the parts that are open (the new XML format) can still, more often than not, contain binary blobs. It's also a moving target. I don't think it's a war that's worth fighting.

Loki

A: 

Do your .xls files have formulas? If not, just transform the xlsx to XSL-FO using the table, row and cell elements in FO. I've never looked at the XML of an xlsx file with formulas. Makes me wonder if you can save an Excel doc "as visible", with no formulas, just the results? Kinda like "print to Excel"?
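To give an idea of the table/row/cell mapping, here is a minimal sketch in Python. It assumes the cell values have already been read into a list of rows; as far as I know, something like openpyxl's `load_workbook(..., data_only=True)` can return the cached formula results instead of the formulas, but that part is not shown here. The function name `rows_to_fo_table` is made up for illustration, and all styling (borders, column widths, merged cells) is left out.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def rows_to_fo_table(rows):
    """Turn a list of rows (lists of cell values) into an fo:table element."""
    if not rows:
        # fo:table-body needs at least one row, so refuse empty input.
        raise ValueError("need at least one row")
    n_cols = max(len(row) for row in rows)
    table = ET.Element(fo("table"))
    for _ in range(n_cols):
        ET.SubElement(table, fo("table-column"))
    body = ET.SubElement(table, fo("table-body"))
    for row in rows:
        fo_row = ET.SubElement(body, fo("table-row"))
        for i in range(n_cols):
            # Pad short rows with empty cells so every row has n_cols cells.
            value = row[i] if i < len(row) else None
            cell = ET.SubElement(fo_row, fo("table-cell"))
            block = ET.SubElement(cell, fo("block"))
            block.text = "" if value is None else str(value)
    return table


if __name__ == "__main__":
    sample = [["Item", "Qty"], ["Widget", 3]]
    print(ET.tostring(rows_to_fo_table(sample), encoding="unicode"))
```

The resulting `fo:table` fragment would still have to be placed inside an `fo:flow` of the surrounding document before the formatter can render it.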

EthR