Hi, I'm doing a project as part of an academic programme, and I'm working on Linux. I want to create an application that retrieves some information from PDF files. For example, I have PDFs for subject1 and subject2; each PDF is divided into 4 modules, and I want to get the data for module 1 from the PDF. For this purpose my tutor told me to use the pdftohtml application to convert the PDF files to HTML and JPEG images. Now I want to write a Python script that combines the pages under module 1 (which have been converted to JPEG images) into a single file, which I will then convert back to PDF. How can I do this? If anyone can share a Python script that does something similar, it would be very helpful.

Thanks in advance.

+1  A: 

I don't know exactly what you mean by "sequence", but ImageMagick, especially its 'montage' tool, is probably what you need. ImageMagick has a Python interface too, although I have never used it. EDIT: After your edit I no longer get the point of this, so I cannot recommend anything either. :(

Mart Oruaas
What do "sequence" and "merge" mean? I have a number of JPEG images (produced by converting a PDF file with the pdftohtml tool). Now I need to pick some of the images based on a condition (which I specified only vaguely in the original post), group them into a single file, and convert that file back to PDF. I believe the idea is reasonably clear. Thanks for the reply.
DILi
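
The workflow described in the comment above (pick the page images that belong to one module, then turn them into a single PDF) could be sketched roughly as follows. This is only a minimal sketch under several assumptions that are not from the thread: the page images are assumed to be named with a trailing page number (for example "subject1-12.jpg"), the module is assumed to be identified by a known page range, and ImageMagick's `convert` command is assumed to be installed (newer ImageMagick releases call it `magick`). The function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Assumption: each page image ends in its page number, e.g. "subject1-12.jpg".
# Adjust this pattern to whatever names your pdftohtml run actually produces.
PAGE_RE = re.compile(r"(\d+)\.jpe?g$", re.IGNORECASE)


def select_pages(filenames, first_page, last_page):
    """Return the files whose page number lies in [first_page, last_page],
    sorted numerically so that page 10 comes after page 2."""
    numbered = []
    for name in filenames:
        match = PAGE_RE.search(name)
        if match:
            page = int(match.group(1))
            if first_page <= page <= last_page:
                numbered.append((page, name))
    return [name for _page, name in sorted(numbered)]


def build_convert_command(pages, output_pdf):
    """Build the ImageMagick command line: convert page1 page2 ... out.pdf.
    Given several input images and a .pdf output, ImageMagick writes
    one PDF page per image."""
    return ["convert", *pages, output_pdf]


def merge_module(image_dir, first_page, last_page, output_pdf):
    """Select the pages of one module from image_dir and merge them into a PDF."""
    names = [str(path) for path in Path(image_dir).iterdir()]
    pages = select_pages(names, first_page, last_page)
    if not pages:
        raise ValueError("no page images found in the requested range")
    subprocess.run(build_convert_command(pages, output_pdf), check=True)
```

If module 1 covered pages 1 to 25 of subject1, a call might look like `merge_module("subject1_pages", 1, 25, "subject1_module1.pdf")`. The page range per module has to come from somewhere else, for example from the table of contents in the HTML that pdftohtml generates.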