I need a regular expression that detects whether a given string is a URL to a [potential] file, e.g.
/file.pdf
http://www.whatever.com/file.docx
../file.longfileextension
Thanks guys
You might inspect the end to see if it looks like a file extension, but URLs don't actually map to files; what if the URL is rewritten?
If you wanted to determine what a given URL resolved to, you could issue a HEAD request and inspect the content-type and content-disposition headers to see if the content is of a type that implies an underlying file, but even that's not bulletproof, since images, PDFs, etc. could all be dynamically generated.
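As a rough illustration of the HEAD approach, here is a minimal Python sketch using only the standard library. The function names, the timeout, and the FILE_LIKE_TYPES set are my own assumptions for the example, not anything standard; which media types count as "a file" is exactly the judgment call described above. It also needs an absolute URL, so relative strings such as /file.pdf would have to be resolved against a base first.

```python
from urllib.request import Request, urlopen

# Assumption: an illustrative, deliberately incomplete set of media types
# that we choose to treat as "file-like". Adjust to taste.
FILE_LIKE_TYPES = {
    "application/pdf",
    "application/zip",
    "application/msword",
    "application/octet-stream",
}


def looks_like_file(content_type, content_disposition):
    """Guess from two header values whether a response looks like a file."""
    # An "attachment" disposition is a strong hint that a download is intended.
    if content_disposition and content_disposition.strip().lower().startswith("attachment"):
        return True
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in FILE_LIKE_TYPES


def head_looks_like_file(url, timeout=5.0):
    """Issue a HEAD request and apply the heuristic to the response headers."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return looks_like_file(
            response.headers.get("Content-Type"),
            response.headers.get("Content-Disposition"),
        )
```

Even when this returns True, the content may still be generated on the fly, so treat the result as a hint only.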
You can't.
E.g. http://example.com/files/readme might be a text file or a folder (*nix-style OSs conventionally would not add a .txt extension).
Even if there is a file extension, there may be no file at all: server-side code may process the URL and create the content (e.g. an ASP.NET HttpHandler).
Why are you trying to do this? If you wish to detect whether the URL would return a file, you can guess from the extension (remembering that applications are free to invent their own), but the only real way is to perform an HTTP HEAD request and check the returned content type (and even then you have the same problem of deciding what counts as a valid file MIME type).
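If guessing from the extension is good enough, something like the following Python sketch would do it without a regular expression. The function name guess_extension is made up for this example; it only inspects the last path segment and ignores any query string or fragment.

```python
import posixpath
from urllib.parse import urlsplit


def guess_extension(url):
    """Return the extension of the last path segment (e.g. '.pdf'), or ''."""
    # urlsplit separates the path from the host, query string and fragment,
    # and also accepts relative references such as "../file.txt".
    path = urlsplit(url).path
    last_segment = posixpath.basename(path)
    return posixpath.splitext(last_segment)[1]


for candidate in (
    "/file.pdf",
    "http://www.whatever.com/file.docx",
    "../file.longfileextension",
    "http://example.com/files/readme",
):
    print(candidate, "->", repr(guess_extension(candidate)))
```

An empty result only means there is no extension to guess from; as noted above, the URL may still return a file.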
This expression will do the job.
^.*/(?<filename>[^/]+?\.[^/]+)$
^               Anchor to the beginning of the string
.*              Any character, zero or more times
/               Slash
(?<filename>    Named group 'filename'
[^/]+?          Not a slash, at least once, matched lazily
\.              One file extension separator (dot)
[^/]+           Not a slash, at least once
)               End of named group
$               Anchor to the end of the string
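For anyone trying this in Python, a small sketch follows. Note two assumptions: Python's re module spells named groups as (?P<filename>...) rather than (?<filename>...), so the pattern is adapted accordingly, and the helper name extract_filename is invented for the example. The expression needs at least one slash, so a bare "file.pdf" does not match, while a host-only URL such as http://www.whatever.com does match because the host name contains a dot.

```python
import re

# Same expression as above, using Python's (?P<name>...) named-group syntax.
FILE_URL = re.compile(r"^.*/(?P<filename>[^/]+?\.[^/]+)$")


def extract_filename(url):
    """Return the captured 'filename' group, or None if the pattern fails."""
    match = FILE_URL.match(url)
    return match.group("filename") if match else None


for candidate in (
    "/file.pdf",
    "http://www.whatever.com/file.docx",
    "../file.longfileextension",
    "http://example.com/files/readme",
    "http://www.whatever.com",
):
    print(candidate, "->", extract_filename(candidate))
```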