Hi, I'm working on a project as part of my academic programme, and I'm doing it on Linux. I want to create an application that retrieves information from PDF files. For example, I have PDFs for subject1 and subject2; each PDF is divided into 4 modules, and I want to get the data for module 1 from a PDF. For this purpose, my tutor told me to use the pdftohtml application to convert the PDF files to HTML and JPEG images. Now I want to write a Python script that combines the pages belonging to module 1 (which have been converted into JPEG images) into a single file, which I will then convert back to PDF. How can I do this? If anyone can share a Python script that does something similar, it would be very helpful.
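For concreteness, the merge step might look roughly like the sketch below. It is only an illustration under several assumptions: the third-party Pillow library is installed, each page image's file name ends in its page number (for example the hypothetical `subject1-3.jpg`; the names pdftohtml actually produces may differ), and the page range of module 1 is known in advance. The function names `page_number`, `select_pages` and `merge_to_pdf` are made up for this example.

```python
import re
from pathlib import Path


def page_number(path):
    """Return the number at the end of a file name, or None if there is none.

    Assumption: page images are named like 'subject1-12.jpg'.
    """
    match = re.search(r"(\d+)$", Path(path).stem)
    return int(match.group(1)) if match else None


def select_pages(paths, first, last):
    """Keep images whose page number lies in [first, last], in page order."""
    numbered = [(page_number(p), str(p)) for p in paths]
    wanted = [(n, p) for n, p in numbered
              if n is not None and first <= n <= last]
    # Sort numerically so that page 10 comes after page 2.
    return [p for n, p in sorted(wanted)]


def merge_to_pdf(image_paths, output_pdf):
    """Write the given images, one per page, into a single PDF file."""
    if not image_paths:
        raise ValueError("no page images selected")
    from PIL import Image  # third-party: pip install Pillow

    # Convert to RGB so every page uses a mode the PDF writer accepts.
    pages = [Image.open(p).convert("RGB") for p in image_paths]
    pages[0].save(output_pdf, "PDF", save_all=True, append_images=pages[1:])
```

If module 1 covered pages 1 to 10 (a made-up range), the call would be something like `merge_to_pdf(select_pages(Path("subject1_pages").glob("*.jpg"), 1, 10), "subject1_module1.pdf")`, where the directory and output names are placeholders. This writes the PDF directly from the images, so a separate "merge into one file, then convert" step may not be needed.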
Thanks in advance.