I'm working on some academic research projects involving scraping large data sets from the web using Python. It's been inconvenient to work on my academic institution's Linux server because (1) I don't have superuser access, meaning I'm dependent on the IT staff to install my packages, and (2) my disk quota is somewhat limited (I would ideally want ~10 GB). What is the simplest way for me to get access to a machine that solves these problems? I don't need huge processing power; I just need access to a reasonably fast machine that runs 24/7, so that my programs can run continuously, and above all, something very simple to get running, use, and maintain, since I have a few non-CS people working on this project with me. Linux would be preferable, but I'd consider Windows too.

I'm aware of Amazon Web Services, but am wondering if there's something more appropriate to my specific needs.

By the way, it would be a huge bonus if I could get some sort of remote desktop access to this machine so I wasn't limited to using SSH and SFTP.

Suggestions?

EDIT: I can't use VirtualBox or Virtual PC because I need the program to be running around the clock, and I need to turn off my laptop often, etc.

A: 

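If you have a Linux machine you can use, then ssh -X (X11 forwarding) will let you start GUI programs on it and have their windows show up on your local screen, provided you have an X server running locally. It's not remote desktop, but it's close.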

ssh -X user@remotehost
firefox

Then bam: a Firefox window pops up on your desktop.

Grant
A: 

Why does it need to be remote? It seems to me that you could buy a cheap box -- maybe even get one from FreeCycle -- and bring up Linux on it quickly. Ubuntu, for example.

kdgregory
The issue is that there are other people working on the project, and we're geographically distributed. So this solution would get complex.
RexE
+1  A: 

If you do want to stick with running on your CS department's machines, use virtualenv to solve your package installation woes: it creates a Python environment in a directory you own, so you can install packages without root. And if disk space is an issue, you could use S3 (and perhaps a FUSE mount of it) to store huge amounts of data extremely cheaply.
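As a rough sketch of the no-root setup, the snippet below creates an isolated environment in a directory you own. It assumes Python 3.3+ and uses the standard-library venv module as a stand-in for the virtualenv tool mentioned above; the helper name create_env and the path ~/envs/scraper are made up for illustration.

```python
import os
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create an isolated Python environment under env_dir (no root needed).

    Returns the path to the environment's own interpreter.
    create_env is a hypothetical helper name, not part of any library.
    """
    env_dir = Path(env_dir).expanduser()
    on_windows = os.name == "nt"
    # Symlink the interpreter on POSIX systems, copy it on Windows.
    venv.create(env_dir, with_pip=with_pip, symlinks=not on_windows)
    if on_windows:
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Calling create_env("~/envs/scraper") should leave you with ~/envs/scraper/bin/python and, when with_pip is left on, ~/envs/scraper/bin/pip, so packages install under your home directory rather than system-wide. Note that they still count against your disk quota.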

However, if that's not really what you're after, I can recommend Slicehost very highly. They give you a virtual private server - so you have complete control over what gets installed, users, admin, etc.

In principle, it's very much like EC2 (which I prefer to use for "real" servers), but has a friendly interface, great customer service and is aimed at smaller projects like yours.

Alabaster Codify
A: 

I have been pretty happy with TekTonic Virtual Private Servers. It's a virtualized environment, but you have full root access to install any packages you need. I'm not sure what your CPU and memory requirements are, but if they aren't too demanding then this should fit the bill nicely for you. I don't know whether you would be able to enable a remote desktop, as I've never tried, but it may be possible to install the requisite packages.

The plans range from $15/mo to $100/mo; the $15/mo plan comes with 294MB RAM, 13GB disk space, and 2.6GHz max CPU speed. I ran on that plan for quite a while and eventually moved up to the next level, with double the disk/CPU/memory, and I've been quite happy with it. I've been with them since 2003 and have yet to find anyone who offers equivalent plans at these prices.

Jay
+1  A: 

Use x11vnc with ssh. Install it on your remote server with 'sudo apt-get install x11vnc' (on Debian or Ubuntu).

Once you have that, you can access your remote server via VNC, and the great thing is that you can tunnel VNC over ssh like so:

ssh -X -C -L 5900:localhost:5900 remotehost x11vnc -localhost -display :0

With the tunnel up, point a VNC viewer on your local machine at localhost:5900 (display :0). For more details, see the x11vnc manpage.
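Since some of the people on the project aren't comfortable with the command line, it might help to wrap that tunnel command in a small script. This is only a sketch: build_tunnel_command and open_tunnel are hypothetical helper names, "remotehost" is a placeholder, and it assumes the ssh client is on your PATH and that x11vnc listens on its default port 5900 on the server.

```python
import shlex
import subprocess


def build_tunnel_command(host, local_port=5900, display=":0"):
    """Return the argument list for the ssh + x11vnc command shown above.

    local_port is the port on your own machine; the remote side is left at
    5900, which is assumed to be the port x11vnc listens on.
    """
    return [
        "ssh", "-X", "-C",
        "-L", f"{local_port}:localhost:5900",
        host,
        "x11vnc", "-localhost", "-display", display,
    ]


def open_tunnel(host, **kwargs):
    """Start the tunnel in the background and return the Popen handle."""
    cmd = build_tunnel_command(host, **kwargs)
    print("Running:", shlex.join(cmd))  # shlex.join needs Python 3.8+
    return subprocess.Popen(cmd)
```

For example, open_tunnel("remotehost", local_port=5901) would let a VNC viewer connect to localhost:5901, which is handy if 5900 is already taken on your laptop.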

Or just set up remote desktop (which is actually VNC) on your Linux distribution. Most distributions come with a GUI to configure remote desktop access.

Steve Lazaridis