ansaurus

Question

Answer 1

+1 A:

The links in a site form a graph, not a tree. You should have a Page object, which is identified by its URL, and a Link object, which points from one page to another. Page A can link to page B while page B links back to page A; that cycle is what makes the structure a graph rather than a tree.
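A minimal sketch of those two objects in Python, just to make the idea concrete. The class layout and the example.com URLs are my own assumptions, not anything specific to your site:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str  # the URL is what identifies a page


@dataclass(frozen=True)
class Link:
    source: Page  # page the link appears on
    target: Page  # page the link points to


# Placeholder URLs, for illustration only.
a = Page("http://example.com/a")
b = Page("http://example.com/b")

# A links to B and B links back to A: a cycle, which a tree cannot hold.
links = {Link(a, b), Link(b, a)}
```

Making both classes frozen keeps them hashable, so pages and links can live in sets and two Page objects with the same URL compare equal.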

Scanning algorithm pseudo-code:

process_page(current_page):
    for each link on current_page:
        target_page = the page that the link points to
        if target_page is not already in your graph:
            create a Page object to represent target_page
            add it to the to_be_scanned set
        add a link from current_page to target_page

scan_website(start_page):
    create a Page object for start_page
    to_be_scanned = set containing start_page
    while to_be_scanned is not empty:
        current_page = to_be_scanned.pop()
        process_page(current_page)
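
For what it's worth, here is roughly how that could look as runnable Python. This is only a sketch under some assumptions: the graph is a plain dict mapping each page URL to the set of URLs it links to, and get_links is a hypothetical callback standing in for "fetch the page and extract its links". The FAKE_SITE dict is made-up test data so the example runs without any network access.

```python
def scan_website(start_url, get_links):
    """Build a link graph: page URL -> set of URLs that page links to."""
    graph = {start_url: set()}
    to_be_scanned = {start_url}
    while to_be_scanned:
        current = to_be_scanned.pop()
        for target in get_links(current):
            if target not in graph:
                # First time we see this page: record it and queue it.
                graph[target] = set()
                to_be_scanned.add(target)
            graph[current].add(target)
    return graph


# Made-up site used only for illustration; note the cycles.
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/"],
    "/b": ["/a"],
}


def fake_get_links(url):
    """Hypothetical stand-in for downloading a page and parsing its links."""
    return FAKE_SITE.get(url, [])


graph = scan_website("/", fake_get_links)
```

A page is added to the graph at the same moment it is queued, so each page is scanned once and the loop terminates even when pages link back to each other.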

Ofri Raviv 2010-09-07 13:16:56

Yes, it's totally a graph and not a tree. Thanks!

hyperboreean 2010-09-08 09:36:42


storing links of a site in a tree
