crawling

SharePoint crawling - Windows authentication failing for STS4?

We are trying to crawl the content of a small, single-node SharePoint 2010 installation. The crawl is partially successful: we can retrieve data delivered by the web services (_vti_bin/sitedata.asmx), but when the crawler tries to access the full page contents, it fails. The error message shown ...

How to block search engines from finding subdomains?

Hi, I want to block search engines like Google and Yahoo from crawling user subdomains like user.example.com. How can I do it? ...
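
One common approach, sketched here under assumptions, is to serve a different robots.txt per host: crawlers request robots.txt separately for each hostname, so every user subdomain can answer with a blanket Disallow while the main site stays crawlable. The names MAIN_HOSTS, robots_txt_for and can_crawl are hypothetical, and the check uses Python's standard urllib.robotparser only to illustrate how a well-behaved crawler would read the result.

```python
from urllib.robotparser import RobotFileParser

# Assumption: these are the only hosts that should remain crawlable.
MAIN_HOSTS = {"example.com", "www.example.com"}


def robots_txt_for(host: str) -> str:
    """Return the robots.txt body to serve for the given Host header."""
    hostname = host.split(":")[0].lower()  # drop an optional port
    if hostname in MAIN_HOSTS:
        # An empty Disallow value permits everything.
        return "User-agent: *\nDisallow:\n"
    # Any other host (e.g. user.example.com) asks crawlers to stay out.
    return "User-agent: *\nDisallow: /\n"


def can_crawl(host: str, path: str = "/", agent: str = "Googlebot") -> bool:
    """Check how a compliant crawler would interpret the served file."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for(host).splitlines())
    return parser.can_fetch(agent, f"http://{host}{path}")
```

In a real deployment the web server or application would call something like robots_txt_for with the incoming Host header when /robots.txt is requested. This only asks compliant crawlers not to fetch pages; it is not access control.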

Crawling a social network in Python

I would like to write a Python script to crawl a social network website. The aim of the script is to retrieve a piece of the social graph (friendship relationships). The website does not provide any API. The problem is: how can I crawl a website in Python that requires a login session to access the contact pages (for example, ...
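
A minimal sketch of one way to approach this with the standard library only: keep the session cookie in a cookie-aware opener, then walk the friendship graph breadth-first. Everything site-specific is an assumption here: the login URL, the form field names "username" and "password", and the fetch_friends callable (which would download and parse a contact page) are hypothetical placeholders, not details from the question.

```python
import urllib.parse
import urllib.request
from collections import deque
from http.cookiejar import CookieJar


def make_logged_in_opener(login_url, username, password):
    """Build an opener that stores cookies, then POST the login form once.

    Assumption: the site uses a plain form login with these field names.
    """
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    form = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode()
    opener.open(login_url, data=form)  # session cookie lands in the jar
    return opener


def crawl_friend_graph(start_user, fetch_friends, max_users=100):
    """Breadth-first crawl; returns {user: [friends]} for visited users.

    fetch_friends(user) is a caller-supplied function returning an
    iterable of friend identifiers for one user.
    """
    graph = {}
    queue = deque([start_user])
    while queue and len(graph) < max_users:
        user = queue.popleft()
        if user in graph:
            continue  # already visited via another friend
        friends = list(fetch_friends(user))
        graph[user] = friends
        queue.extend(f for f in friends if f not in graph)
    return graph
```

In practice fetch_friends would call opener.open(...) on a user's contact page and extract the friend links from the HTML. Keeping it as a parameter lets the traversal be tried with a stub before touching the real site.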