Hello guys,

How is it possible to generate a list of all the pages of a given website programmatically using PHP?

What I'm basically trying to achieve is to generate something like a sitemap, as a nested unordered list with links to all the pages contained in a website.

Thank you in advance for your answers,
Constantin TOVISI

A: 

You can easily list the files with the glob function. But if the pages use includes/requires and other techniques to combine multiple files into "one page", the file list will not match the page list; in that case you'll need to import the Google "site:mysite.com" search results, or just create a table with the URL of every page :P
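
To illustrate the glob approach, here is a minimal sketch, not a finished implementation. The function name buildSitemapList, its parameters and the default extension list are made up for this example, and it assumes every .php/.html file under the directory maps one-to-one to a URL under $baseUrl, which is exactly the assumption that breaks when pages are assembled from includes.

```php
<?php
// Hypothetical helper: returns a nested <ul> for the page files found under $dir.
// $baseUrl is the URL prefix assumed to map to $dir.
function buildSitemapList(string $dir, string $baseUrl, array $extensions = ['php', 'html']): string
{
    $items = '';

    // glob() returns false on error, so fall back to an empty array.
    foreach (glob(rtrim($dir, '/') . '/*') ?: [] as $path) {
        $name = basename($path);
        $url  = rtrim($baseUrl, '/') . '/' . rawurlencode($name);

        if (is_dir($path)) {
            // A directory becomes a list item that holds its own nested list.
            $nested = buildSitemapList($path, $url, $extensions);
            if ($nested !== '') {
                $items .= '<li>' . htmlspecialchars($name) . "\n" . $nested . "</li>\n";
            }
        } elseif (in_array(strtolower(pathinfo($path, PATHINFO_EXTENSION)), $extensions, true)) {
            // A page file becomes a link.
            $items .= '<li><a href="' . htmlspecialchars($url) . '">'
                    . htmlspecialchars($name) . "</a></li>\n";
        }
    }

    // Directories without any page files produce no list at all.
    return $items === '' ? '' : "<ul>\n" . $items . "</ul>\n";
}
```

It could be called as `echo buildSitemapList($_SERVER['DOCUMENT_ROOT'], 'http://mysite.com');`, assuming the document root is the directory you want listed.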

Maybe this can help: http://www.xml-sitemaps.com/ (SiteMap Generator)

TiuTalk
A: 

If all pages are linked to one another, then you can use a crawler or spider to do this.
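
A rough sketch of such a crawler in PHP could look like the following. The names extractInternalLinks and crawlSite are invented for this example, it assumes the DOM extension is enabled, and it only follows absolute and root-relative links on the same host (relative paths like "about.html" are skipped to keep it short).

```php
<?php
// Returns the same-host links found in $html, as absolute URLs without fragments.
function extractInternalLinks(string $html, string $pageUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $page   = parse_url($pageUrl);
    $origin = $page['scheme'] . '://' . $page['host'];

    // Real-world HTML is rarely valid, so collect parser errors silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Drop the "#fragment" part so the same page is not listed twice.
        $href = explode('#', trim($anchor->getAttribute('href')), 2)[0];
        if ($href === '') {
            continue;
        }
        // Turn root-relative paths ("/about") into absolute URLs.
        if ($href[0] === '/' && substr($href, 0, 2) !== '//') {
            $href = $origin . $href;
        }
        $target = parse_url($href);
        if (isset($target['scheme'], $target['host']) && $target['host'] === $page['host']) {
            $links[$href] = true;
        }
    }
    return array_keys($links);
}

// Breadth-first crawl starting at $startUrl.
// $fetch is any callable that returns the HTML for a URL, or false on failure.
function crawlSite(string $startUrl, callable $fetch, int $maxPages = 200): array
{
    $queue = [$startUrl];
    $seen  = [$startUrl => true];
    $pages = [];

    while ($queue && count($pages) < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        if ($html === false) {
            continue; // unreachable page, skip it
        }
        $pages[] = $url;
        foreach (extractInternalLinks($html, $url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $pages;
}
```

With allow_url_fopen enabled, something like `crawlSite('http://mysite.com/', 'file_get_contents')` should work for a small site. Passing the fetcher as a callable also lets you swap in cURL or a stub for testing.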

If some pages are not linked from any other page, you will need to come up with another method. You can try this:

  1. Add an "image bug" (also called a web beacon or web bug) to each page you want tracked, i.e. a tiny image whose source points at the logger script,
    OR
    add a JavaScript function to each page that makes a call to /scripts/logger.php. You can use any of the JavaScript libraries that make this super simple, like jQuery, MooTools, or YUI.
  2. Create the logger.php script and have it save the request's originating URL somewhere, such as a file or a database.
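
As a starting point, logger.php could be sketched roughly like this. The helper name logPageUrl and the pages.log file are assumptions for this example, and it takes the originating URL from the Referer header, which browsers may omit and clients can fake, so treat the log as a hint rather than a reliable list.

```php
<?php
// Appends $url to $logFile unless it is already recorded.
// Returns true only when a new URL was written.
function logPageUrl(string $url, string $logFile): bool
{
    $url = trim($url);
    if ($url === '' || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false; // missing or malformed URL
    }

    // Read the URLs logged so far (file() may return false on error).
    $known = is_file($logFile)
        ? (file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [])
        : [];
    if (in_array($url, $known, true)) {
        return false; // already logged
    }

    // LOCK_EX keeps concurrent requests from interleaving their writes.
    return file_put_contents($logFile, $url . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// When requested through a web server, record the referring page
// and answer with an empty response.
if (PHP_SAPI !== 'cli') {
    logPageUrl($_SERVER['HTTP_REFERER'] ?? '', __DIR__ . '/pages.log');
    http_response_code(204);
}
```

Re-reading the whole file on every request is fine for a small site; for anything bigger a database table with a unique index on the URL would be the more sensible choice.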

Pros:

  • Fairly simple

Cons:

  • Requires edits to each page
  • Pages that aren't visited don't get logged

Some other techniques that don't really fit your need to do it programmatically but may be worth considering include:

  • Create a spider or crawler
  • Use a ripper such as cURL or Teleport Plus
  • Use Google Analytics (similar to the image bug technique)
  • Use a log analyzer like Webstats or a freeware UNIX webstats analyzer