If I have a link, say http://yahoo.com/, can I get the links inside Yahoo? For example, I have a website http://umair.com/ and I know there are just 5 pages: Home, About, Portfolio, FAQ, Contact. Can I get the following links programmatically?

http://umair.com/index.html
http://umair.com/about.html
http://umair.com/portfolio.html
http://umair.com/faq.html
http://umair.com/contact.html
+1  A: 

Define what you mean by "links inside yahoo".

Do you mean all pages that are linked to from the page returned by "http://www.yahoo.com"? If so, you could read the HTML returned by an HTTP GET request and parse through it looking for <a> elements. The "HTML Agility Pack" can help with that.
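A rough sketch of that approach could look like the following (a sketch only; it assumes the HTML Agility Pack is referenced, and http://umair.com/ is just the example domain from the question):

    // Sketch: fetch a page and print the href of every <a> element.
    using System;
    using HtmlAgilityPack;

    class LinkLister
    {
        static void Main()
        {
            var web = new HtmlWeb();
            HtmlDocument doc = web.Load("http://umair.com/");

            // SelectNodes returns null when the page has no matching elements.
            var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
            if (anchors == null)
                return;

            foreach (HtmlNode a in anchors)
            {
                // GetAttributeValue supplies a default when the attribute is missing.
                Console.WriteLine(a.GetAttributeValue("href", string.Empty));
            }
        }
    }

Note this only finds pages that are actually linked from the page you fetch, not every page that happens to exist on the server.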

If you mean, "All pages on the server at that domain", probably not. Most websites define a default page which you get when you don't explicitly request one. (for example, requesting http://umair.com almost certainly returns http://umair.com/index.html). Very few website don't define a default, and they will return a list of files.

If you mean, "All pages on the server at that domain, even if they define a default page", no that cannot be done. It would be an extreme breach of security.

James Curran
The last paragraph you mentioned is my question. You're saying this is not possible, but there is a tool named WebCopier that does it, no?
Umair Ashraf
A: 

This could be done with a web crawler; read some basic information about it here:

http://en.wikipedia.org/wiki/Web_crawler

The article includes a list of open-source crawlers; see if any of them is what you are looking for.
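A very small same-domain crawler along those lines could be sketched like this (it uses the HTML Agility Pack, and http://umair.com/ is just the example domain from the question; a real crawler would also respect robots.txt and limit its depth):

    // Sketch: breadth-first crawl that discovers pages by following links,
    // staying on the starting host.
    using System;
    using System.Collections.Generic;
    using HtmlAgilityPack;

    class MiniCrawler
    {
        static void Main()
        {
            var start = new Uri("http://umair.com/");
            var seen = new HashSet<string>();
            var queue = new Queue<Uri>();
            queue.Enqueue(start);

            var web = new HtmlWeb();
            while (queue.Count > 0 && seen.Count < 50)   // crude safety limit
            {
                Uri current = queue.Dequeue();
                if (!seen.Add(current.AbsoluteUri))
                    continue;                            // already visited

                Console.WriteLine(current.AbsoluteUri);

                HtmlDocument doc;
                try { doc = web.Load(current.AbsoluteUri); }
                catch (Exception) { continue; }          // skip pages that fail to load

                var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
                if (anchors == null)
                    continue;

                foreach (HtmlNode a in anchors)
                {
                    string href = a.GetAttributeValue("href", string.Empty);
                    Uri next;
                    // Resolve relative links against the current page and stay on the same host.
                    if (Uri.TryCreate(current, href, out next) && next.Host == start.Host)
                        queue.Enqueue(next);
                }
            }
        }
    }

Like any crawler, this only finds pages that are reachable by following links; it cannot reveal files the site never links to.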

Eton B.
Thanks, I already know about it. I needed to ask whether I can get around the default page setting on servers.
Umair Ashraf