Hi,
I have a question about crawling files that are accessible via HTTP, specifically PDF files.
I am not able to do this using Nutch 1.0. The URL I am crawling is similar to this one: http://www.ontla.on.ca/library/repository/ser/140213/2006/
However, I do not see any data being fetched, and the generated files are only 1 KB each.
On the local file system, using the file protocol, the same crawl works.
Can someone give me some pointers, please?
Thanks