Is there a good PDF to XHTML strict converter | ansaurus

+2 Q:

Is there a good PDF to XHTML strict converter

It is basically all in the title: I need to take a bunch of large PDFs and convert them to XHTML 1.0 Strict. Close is good enough; I can clean it up afterwards. Thanks

+2 A:

This is a complex request, because whether it can be done at all depends on the PDF itself and how it was created. As a first attempt, I would try Adobe's own online PDF to HTML converter

http://www.adobe.com/products/acrobat/access_onlinetools.html

and then try to fix up the HTML after the fact with something like HTML Tidy

http://tidy.sourceforge.net/
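As a rough illustration of the clean-up step, here is a minimal Python sketch that drives the `tidy` command-line tool to turn converted HTML into something close to XHTML 1.0 Strict. It assumes `tidy` is installed and on the PATH; the function names and file names are made up for the example, and the option set is only a starting point, not a recommended configuration.

```python
import subprocess


def tidy_command(html_path, out_path):
    """Build a tidy command line that asks for XHTML with a Strict DOCTYPE."""
    return [
        "tidy",
        "-quiet",               # suppress nonessential output
        "-asxhtml",             # convert the input to well-formed XHTML
        "--doctype", "strict",  # request the Strict DOCTYPE
        "-utf8",                # read and write UTF-8
        "-o", out_path,         # write the result here instead of stdout
        html_path,
    ]


def run_tidy(html_path, out_path):
    """Run tidy; return its exit code (0 = clean, 1 = warnings)."""
    result = subprocess.run(
        tidy_command(html_path, out_path),
        capture_output=True,
        text=True,
    )
    # tidy exits with 2 when it hit errors it could not repair
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.returncode
```

Calling `run_tidy("converted.html", "converted.xhtml")` on each converter output would give you a file to validate and then fix by hand. Tidy only repairs markup; it will not fix text that the PDF conversion got wrong.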

If the PDFs were created by scanning pages in as images, there may be no text associated with them at all. In that case the best you can do is either cut the pages apart and turn them into JPG images, or run some sort of OCR software on the PDF itself.
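One way to find out which case you are in is to extract the text first and see whether anything comes back. The sketch below is only a heuristic, and it rests on assumptions not in the answer above: that the text was extracted with a tool such as Poppler's `pdftotext`, which ends each page with a form feed character, and that a page with fewer than `min_chars` characters is probably a scanned image. Both the function name and the threshold are made up for the example.

```python
def pages_without_text(extracted, min_chars=20):
    """Return 1-based numbers of pages that look like they have no text layer.

    `extracted` is the plain text of a whole PDF, with pages separated
    by form feed characters (an assumption about the extraction tool).
    """
    pages = extracted.split("\f")
    # A form feed after the final page leaves an empty trailing entry.
    if pages and pages[-1] == "":
        pages.pop()
    return [
        number
        for number, page in enumerate(pages, start=1)
        if len(page.strip()) < min_chars
    ]
```

If every page of a document is reported, it is most likely a pure scan and needs OCR or the image route; if only a few are, those pages may just be figures or blank pages.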

I warn you that even if the PDFs were not scanned and thus do have text information in them, the conversion is likely to introduce a lot of mistakes that will have to be fixed by hand. I work on a product that does essentially this for corporate annual reports and similar documents, and we ultimately settled on cutting the pages up into JPG/GIF images and wrapping those in HTML, because the other processes we tried introduced too many errors and fixing them all was too labor intensive.

TJ 2009-03-10 21:01:31
