I'm working on a graduation project for one of my university courses, and I need to find somewhere to run several crawlers I wrote in C#. With no web hosting experience, I'm a bit lost. Is this something that any site allows? Do I need a special host that gives more access to the server? The crawler is a simple app that does its work, then periodically writes information to a remote database.

+4  A: 

A web crawler simulates a normal user. It accesses sites the way browsers do, fetching whatever the server returns (HTML, JavaScript, etc.), so it needs no internal access to server code. Because of that, any site can be crawled.

Be aware of web crawler ethics guidelines, though. There are pages you shouldn't index and links you shouldn't follow. Web developers publish files and instructions for crawlers (such as robots.txt and robots meta tags) that say what you may index or follow.
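As a rough illustration of honoring those instructions, here is a minimal Python sketch (the asker's crawler is in C#, so treat this as the idea rather than drop-in code). It uses the standard library's `urllib.robotparser`. The robots.txt content, the `MyCrawler` user agent, and the example.com URLs are all made up for the example, and the rules are parsed from an inline string so it runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real crawler would download it from the site's /robots.txt instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer fetch questions."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

if __name__ == "__main__":
    for url in ("http://example.com/public/page",
                "http://example.com/private/page"):
        print(url, "allowed" if may_fetch(parser, "MyCrawler", url) else "blocked")
```

The crawler would call `may_fetch` before every request and skip anything that comes back `False`. If the site sets a `Crawl-delay`, `parser.crawl_delay("MyCrawler")` exposes it.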

Samuel Carrijo
A: 

You will need a VPS (virtual private server) or a full-on dedicated server. Crawlers are nothing more than applications that "crawl" the internet. While you could set up a web site to act as a crawler, it is not practical, because the page would have to be accessed for your crawler to do any work. Read the host's ToS (terms of service) to see what usage is allowed. Some of the lower-priced hosts will cut your connection, citing "negatively impacting the network", if you use too much bandwidth, even though they have nominally given you plenty to use.

A VPS is around $30-80 for a Linux server and $60+ for a Windows server. Dedicated servers run $100+ for both Linux and Windows.

Tony
A: 

You don't need any web hosting to run your spider. Just ask for a PC with an internet connection that can act as a dedicated server, configure the database, and run the crawler from there.

bruno conde
A: 

This doesn't seem to have anything to do with web hosting. You just need a machine with an internet connection and a database server.

I'd check with your university if I were you. At least in my time, a lot was possible to arrange in-house when it came to graduation projects.

Failing that, you could look into a simple VPS (Virtual Private Server) account. Unless you are sure your app runs under Mono, you will need a Windows one. The resource limits are usually a lot lower than you'd get from a dedicated server, but they're relatively affordable. Some hosts offer an MS SQL Server database (on another machine) that you can use alongside the VPS account. Installing SQL Server on the VPS itself can be a problem licensing-wise.

Make sure you check the terms of usage and the (virtual) system specs before you open an account. Also check whether there is a minimum contract period. Sometimes this can be longer than a single month, especially if there is no setup fee.

If at all possible, find a host that's geographically close to you. A server on the other side of the world can get a little annoying to access remotely using Remote Desktop.

Thorarin
A: 

If you can't run it off your desktop for some reason, you'll need a host that lets you execute arbitrary C# code. Most cheap web servers don't do this due to the potential security implications, since there will be several other people running on the same server.

This means you'll need to be on a server where you have your own OS: either a VPS (Virtual Private Server), where virtualization gives you your own OS but the hardware is shared, or your own dedicated server, where you have both the hardware and software to yourself.

Note that if you're running on a server that's shared in any way, you'll need to throttle yourself so you don't cause problems for your neighbors; your primary concern will be not using too much CPU or bandwidth. This isn't just for politeness. Most web hosts will suspend your hosting if you're causing problems on their network, such as starving the other users of your hardware by consuming all of its resources yourself. You can usually burst to higher usage levels, but they'll cut you off if you sustain them for a significant period of time.
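One simple way to throttle is to enforce a minimum gap between requests, which indirectly caps both bandwidth and CPU. Below is a minimal Python sketch of that idea (again, not the asker's C# code). The `Throttle` class, its parameter names, and the 2-second interval are all assumptions for illustration; the right interval depends on your host's limits.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last = now
        return waited


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)  # hypothetical limit: one request per 2 s
    for page in ("page-1", "page-2", "page-3"):
        throttle.wait()
        print("fetching", page)  # a real crawler would issue its HTTP request here
```

Calling `wait()` before each fetch keeps the sustained rate at or below one request per `min_interval`, which is the kind of steady usage hosts tend to tolerate. A token bucket would be the usual next step if you want to allow short bursts.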

matthock
A: 

80legs lets you use their crawlers to process millions of web pages with your own program.

The rates are:

  • $2.00 per million pages
  • $0.03 per CPU-hour

They claim to crawl 2 billion web pages a day.
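To get a feel for what those rates mean for a project budget, here is a small Python sketch of the arithmetic. It assumes the two charges simply add up and that the rates quoted above still apply, neither of which is confirmed here. The function name and the sample workload (5 million pages, 100 CPU-hours) are made up for illustration.

```python
# Rates as quoted in this answer; they may be out of date.
PRICE_PER_MILLION_PAGES = 2.00  # USD
PRICE_PER_CPU_HOUR = 0.03       # USD


def estimate_cost(pages: int, cpu_hours: float) -> float:
    """Estimate the total cost in USD, assuming the two charges are additive."""
    page_cost = pages / 1_000_000 * PRICE_PER_MILLION_PAGES
    cpu_cost = cpu_hours * PRICE_PER_CPU_HOUR
    return round(page_cost + cpu_cost, 2)


if __name__ == "__main__":
    # Hypothetical workload: 5 million pages using 100 CPU-hours.
    print(f"${estimate_cost(5_000_000, 100):.2f}")
```

Under those assumptions the sample workload comes to $10.00 for pages plus $3.00 for CPU time.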

f3lix