I need a short code snippet to get a directory listing from an HTTP server.

Thanks

+2  A: 

Basic understanding:

Directory listings are just HTML pages generated by a web server. Each web server generates these HTML pages in its own way because there is no standard way for a web server to list these directories.

The best way to get a directory listing is to send an HTTP request to the URL you'd like the listing for, then parse the returned HTML and extract all of its links.

To parse the HTML links, try the HTML Agility Pack.

Directory Browsing:

The web server you'd like to list directories from must have directory browsing turned on; otherwise it will not produce this HTML representation of the files in its directories. In other words, you can only get a directory listing if the HTTP server is configured to give you one.

A quick example of the HTML Agility Pack:

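// Requires the HtmlAgilityPack package (using HtmlAgilityPack;)
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(strURL); // HtmlDocument.Load reads a file or stream, so fetch the URL with HtmlWeb
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        // do something with att.Value
    }
}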

Cleaner alternative:

If it is possible in your situation, a cleaner method is to use a protocol that is designed for directory listings, such as the File Transfer Protocol (FTP), SFTP (an FTP-like protocol that runs over SSH) or FTPS (FTP over SSL/TLS).
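
As a rough sketch of the FTP route, assuming the same files are also reachable over FTP: .NET's FtpWebRequest can ask for a plain list of names with WebRequestMethods.Ftp.ListDirectory. The class and method names below, and whatever FTP URL and credentials you pass in, are placeholders for illustration, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class FtpListingSketch
{
    // Splits a raw name-list response (one entry per line) into a list,
    // skipping blank lines.
    public static List<string> ParseNameList(TextReader reader)
    {
        var names = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string name = line.Trim();
            if (name.Length > 0)
            {
                names.Add(name);
            }
        }
        return names;
    }

    // Hypothetical helper: ftpUrl, user and password are supplied by the caller.
    public static List<string> ListDirectory(string ftpUrl, string user, string password)
    {
        FtpWebRequest request = (FtpWebRequest)WebRequest.Create(ftpUrl);
        request.Method = WebRequestMethods.Ftp.ListDirectory;
        request.Credentials = new NetworkCredential(user, password);

        using (FtpWebResponse response = (FtpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return ParseNameList(reader);
        }
    }
}
```

Unlike the HTML case, the response here is just one name per line, so there is no server-specific markup to parse.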

What if directory browsing is not turned on:

If the web server does not have directory browsing turned on, then there is no easy way to get the directory listing.

The best you could do in this case is to start at a given URL, follow all HTML links on the same page, and try to build a virtual listing of directories yourself based on the relative paths of the resources on these HTML pages. This will not give you a complete listing of what files are actually on the web server though.
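
To make the idea concrete, here is a minimal sketch of the "virtual listing" step only. It assumes you have already extracted the href values from a page (for example with the HTML Agility Pack), resolves them against the page URL, and groups same-site resources by directory. VirtualListingSketch and Build are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class VirtualListingSketch
{
    // Resolves each href against the page URL and groups resources on the
    // same scheme and authority by the directory part of their path.
    public static SortedDictionary<string, SortedSet<string>> Build(Uri pageUrl, IEnumerable<string> hrefs)
    {
        var listing = new SortedDictionary<string, SortedSet<string>>(StringComparer.Ordinal);
        foreach (string href in hrefs)
        {
            Uri resolved;
            if (!Uri.TryCreate(pageUrl, href, out resolved)) continue; // not a usable URI
            // Skip links that leave the site (other hosts, mailto:, and so on).
            if (resolved.Scheme != pageUrl.Scheme || resolved.Authority != pageUrl.Authority) continue;

            string path = resolved.AbsolutePath;
            int slash = path.LastIndexOf('/');
            string directory = path.Substring(0, slash + 1);
            string file = path.Substring(slash + 1);

            SortedSet<string> files;
            if (!listing.TryGetValue(directory, out files))
            {
                files = new SortedSet<string>(StringComparer.Ordinal);
                listing[directory] = files;
            }
            if (file.Length > 0) files.Add(file); // a link ending in "/" only registers the directory
        }
        return listing;
    }
}
```

A crawler would call this for every page it visits and merge the results. It still only sees files that some page links to.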

Brian R. Bondy
A: 

You can't, unless the particular directory you want has directory listing enabled and no default file (usually index.htm, index.html or default.html but always configurable). Only then will you be presented with a directory listing, which will usually be marked up with HTML and require parsing.

roryf
+6  A: 

A few important considerations before the code:

  1. The HTTP server has to be configured to allow directory listing for the directories you want;
  2. Because directory listings are ordinary HTML pages, there is no standard that defines their format;
  3. Because of point 2, you have to write parsing code specific to each server.

My choice is to use regular expressions. They allow for rapid parsing and customization. You can keep one regular expression pattern per site, which gives you a very modular approach. If you plan to add support for new sites without changing the source code, store the mapping from URL to regular expression pattern in an external source.
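
One way the external mapping could look, as a sketch only: a text source with one "URL prefix, TAB, pattern" entry per line, where the longest matching prefix wins. The format and the names ListingPatternsSketch, Load and GetPatternForUrl are assumptions made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ListingPatternsSketch
{
    // Parses lines of the form "<url prefix><TAB><regex pattern>".
    // Blank lines and lines starting with '#' are ignored.
    public static Dictionary<string, string> Load(TextReader reader)
    {
        var patterns = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0 || line.StartsWith("#", StringComparison.Ordinal)) continue;
            int tab = line.IndexOf('\t');
            if (tab <= 0) continue; // malformed line: missing prefix or separator
            patterns[line.Substring(0, tab)] = line.Substring(tab + 1);
        }
        return patterns;
    }

    // Returns the pattern registered for the longest prefix that url starts with.
    public static string GetPatternForUrl(Dictionary<string, string> patterns, string url)
    {
        string best = null;
        foreach (string prefix in patterns.Keys)
        {
            if (url.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
                && (best == null || prefix.Length > best.Length))
            {
                best = prefix;
            }
        }
        if (best == null) throw new NotSupportedException("No listing pattern for " + url);
        return patterns[best];
    }
}
```

In practice you would pass Load a StreamReader over a file kept next to the executable, so supporting a new site only means adding a line to that file.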

Example that prints the directory listing from http://www.ibiblio.org/pub/:

namespace Example
{
    using System;
    using System.Net;
    using System.IO;
    using System.Text.RegularExpressions;

    public class MyExample
    {
        public static string GetDirectoryListingRegexForUrl(string url)
        {
            if (url.Equals("http://www.ibiblio.org/pub/"))
            {
                // [^\"]* and [^<]* keep each match inside a single anchor tag
                return "<a href=\"[^\"]*\">(?<name>[^<]*)</a>";
            }
            throw new NotSupportedException();
        }
        public static void Main(String[] args)
        {
            string url = "http://www.ibiblio.org/pub/";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    string html = reader.ReadToEnd();
                    Regex regex = new Regex(GetDirectoryListingRegexForUrl(url));
                    MatchCollection matches = regex.Matches(html);
                    if (matches.Count > 0)
                    {
                        foreach (Match match in matches)
                        {
                            if (match.Success)
                            {
                                Console.WriteLine(match.Groups["name"]);
                            }
                        }
                    }
                }
            }

            Console.ReadLine();
        }
    }
}
smink
A: 

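Alternatively, if you control the server, you can set it up for WebDAV, which defines a standard request (PROPFIND) for listing the contents of a collection.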
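
A minimal sketch of what the client side might look like, assuming the server answers PROPFIND with a standard 207 Multi-Status XML body. WebDavListingSketch and its methods are hypothetical names, and authentication is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Xml.Linq;

public static class WebDavListingSketch
{
    // Extracts the href of every <response> element in a Multi-Status body.
    public static List<string> ParseHrefs(string multiStatusXml)
    {
        XNamespace dav = "DAV:";
        var hrefs = new List<string>();
        foreach (XElement response in XDocument.Parse(multiStatusXml).Descendants(dav + "response"))
        {
            XElement href = response.Element(dav + "href");
            if (href != null) hrefs.Add(href.Value.Trim());
        }
        return hrefs;
    }

    // Sends PROPFIND with Depth: 1 (the collection plus its direct children).
    public static List<string> ListCollection(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "PROPFIND";
        request.Headers.Add("Depth", "1");

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return ParseHrefs(reader.ReadToEnd());
        }
    }
}
```

With Depth: 1 the response normally includes an entry for the collection itself as well as its children, so you may want to filter that one out.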

Frank Krueger