What are the correct content types for these documents?
I need to write a simple crawler that fetches only these kinds of files.
Nowadays http://somedomain.com/index.html can serve, for example, a JPEG file due to mod_rewrite, so I can't trust the file extension. I need to read the Content-Type from the response header and compare it with a list of allowed content types.
Where can I get such a list?
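
To make the intended check concrete, here is a rough Python sketch of what I have in mind. The entries in `ALLOWED_TYPES` are placeholders I picked only for illustration (the real list is exactly what I am asking for), and the helper names `media_type`, `is_allowed`, and `head_is_allowed` are made up for this example.

```python
import urllib.request

# Placeholder allowlist: these entries are assumptions for illustration only,
# not an authoritative list of content types.
ALLOWED_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(header_value):
    """Return the bare media type, lowercased, without parameters like charset."""
    return header_value.split(";", 1)[0].strip().lower()


def is_allowed(header_value, allowed=ALLOWED_TYPES):
    """Check a raw Content-Type header value against the allowlist."""
    if not header_value:
        # No Content-Type header at all: treat it as not allowed.
        return False
    return media_type(header_value) in allowed


def head_is_allowed(url, allowed=ALLOWED_TYPES):
    """Send a HEAD request and check the Content-Type the server reports."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return is_allowed(response.headers.get("Content-Type"), allowed)
```

With this, `is_allowed("text/html; charset=UTF-8")` passes while `is_allowed("image/jpeg")` is rejected, whatever the URL looks like. Some servers may not answer HEAD requests properly, so a real crawler might have to fall back to GET and inspect the headers before reading the body.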