It is basically all in the title: I need to take a bunch of large PDFs and convert them to XHTML 1.0 Strict. Close is good enough; I can clean it up afterwards. Thanks
This is a complex request, because whether it can be done at all depends on the PDF itself (and how it was created). As a first attempt, I would try Adobe's own online PDF-to-HTML converter
http://www.adobe.com/products/acrobat/access_onlinetools.html
and then try to fix up the HTML after the fact with something like HTML Tidy.
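For the cleanup step, here is a rough sketch of driving Tidy from Python. It assumes the `tidy` command-line tool is installed and on your PATH; the function names and file names are made up for illustration, and the options are only the ones I would start with, not a guaranteed recipe for valid Strict output.

```python
import subprocess


def build_tidy_command(src_html, dest_xhtml):
    """Build a tidy command line that asks for XHTML 1.0 Strict output.

    Function and argument names are illustrative, not part of tidy.
    """
    return [
        "tidy",
        "-q",                    # suppress the non-essential summary output
        "-asxhtml",              # convert HTML to well-formed XHTML
        "--doctype", "strict",   # request the Strict doctype
        "-utf8",                 # read and write UTF-8
        "-o", dest_xhtml,        # write the result to this file
        src_html,
    ]


def run_tidy(src_html, dest_xhtml):
    """Run tidy and report whether it finished without errors.

    tidy exits with 0 for a clean run, 1 if it only had warnings,
    and 2 if it hit errors it could not repair on its own.
    """
    completed = subprocess.run(build_tidy_command(src_html, dest_xhtml))
    return completed.returncode in (0, 1)
```

Something like `run_tidy("converted.html", "converted.xhtml")` would then be called once per converted file. Expect warnings on almost every file; it is the ones that come back with errors that need a look by hand.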
If the PDFs were created by scanning images, there may be no text associated with them at all. In that case the best you can do is either cut the pages apart and turn them into JPG images, or run some sort of OCR software on the PDF itself.
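One way to find out which case you are in before committing to an approach is to extract the text and see which pages come back empty. The sketch below assumes Poppler's `pdftotext` is installed (it separates pages with a form feed character); the function names and the 25-character threshold are arbitrary choices of mine, so tune them for your documents.

```python
import subprocess


def split_pages(raw_text):
    """Split pdftotext output into one string per page.

    pdftotext ends every page with a form feed, so the final piece
    after the last form feed is empty and gets dropped.
    """
    pages = raw_text.split("\f")
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def pages_needing_ocr(page_texts, min_chars=25):
    """Return 1-based numbers of pages with almost no extractable text.

    min_chars is a guess: pages with fewer non-whitespace characters
    than this are treated as scanned images.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        if len("".join(text.split())) < min_chars:
            flagged.append(number)
    return flagged


def extract_page_texts(pdf_path):
    """Run pdftotext and return the text of each page (needs Poppler)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" sends text to stdout
        capture_output=True, text=True, check=True,
    )
    return split_pages(result.stdout)
```

If `pages_needing_ocr(extract_page_texts("report.pdf"))` lists most of the pages, the file is probably a scan and a straight PDF-to-HTML conversion will give you next to nothing.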
I warn you that even if the PDFs were created by hand and thus have text information in them, there are likely to be a lot of mistakes in the conversion process that will have to be fixed by hand. I work on a product that basically does this process for corporate annual reports and the like, and we ultimately settled on cutting the pages up into JPG/GIF images and wrapping those in HTML, as the other processes we tried introduced too many errors and it was too labor-intensive to fix them all.
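To give an idea of what the image route looks like, here is a minimal sketch that wraps already-exported page images in an XHTML 1.0 Strict page. It is not the code from our product; the function name, the `page` class, and the file names are invented for the example, and it assumes you have already rendered each PDF page to an image with some other tool.

```python
from html import escape

XHTML_STRICT_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def build_image_page(title, image_files):
    """Return an XHTML 1.0 Strict document with one image per PDF page."""
    if not image_files:
        # Strict requires at least one block element inside <body>.
        raise ValueError("need at least one page image")

    safe_title = escape(title, quote=True)
    blocks = []
    for number, src in enumerate(image_files, start=1):
        # Strict wants every <img> to carry alt text and to sit inside
        # a block-level element, hence the wrapping <div>.
        blocks.append(
            '    <div class="page"><img src="%s" alt="%s, page %d" /></div>'
            % (escape(src, quote=True), safe_title, number)
        )

    return "\n".join(
        [
            XHTML_STRICT_DOCTYPE,
            '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">',
            "  <head>",
            '    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
            "    <title>%s</title>" % safe_title,
            "  </head>",
            "  <body>",
        ]
        + blocks
        + ["  </body>", "</html>", ""]
    )
```

Calling `build_image_page("Annual Report", ["page-001.jpg", "page-002.jpg"])` gives you a string you can write straight to a file. The obvious downside is that the text is no longer selectable or searchable, which is the trade-off we accepted.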