As part of a desktop application, I need a file server that responds as fast as possible to file transfer requests from remote clients, usually located on the same LAN. There will be many requests for small files. The server should provide both upload and download services.

I am not tied to any particular technology, so I am open to any programming language, toolkit, or library as long as it runs on Windows.

My initial take is to go with a C/C++ implementation using Windows Sockets, or to use the services provided by libraries such as Boost (Asio or similar). I have also considered Erlang, but I would have to learn it, so its performance benefits would need to justify the extra development time.

LATER EDIT: I appreciate the answers that suggest FTP, HTTP, or basically anything that already exists, but supposing you still wanted to write one from scratch, what would you do?

+4  A: 

Why not just go with FTP? You should be able to find an adequate server implementation in any language, and client access libraries too.

It sounds like a lot of wheel-reinvention. Granted, FTP is not ideal, and has a few odd spots, but ... it's there, it's standard, well-known, and already very widely implemented.

unwind
Yeah, or even TFTP! :)
Greg D
+1  A: 

Sounds like you should use an SFTP (SSH) server: it's firewall/NAT-friendly, secure, and already does what you want and more. You could also use Samba or Windows file sharing for an even simpler implementation.

Brian R. Bondy
+2  A: 

If all the machines are running on Windows on the same LAN, why do you need a server at all? Why not simply use Windows file sharing?

anon
This would more likely be the better solution, especially on the Windows Server 200x range, since file sharing uses high-throughput APIs (TransmitFile/TransmitPackets et al.).
Indeera
I can see using Windows File Sharing as creating management headaches - it would be at the mercy of however the end user (or administrator) happened to have configured it, which might not be compatible with the requirement.
MarkR
This would be a shrink-wrapped product, so sharing is not an option, as pointed out by MarkR.
grivei
+1  A: 

Why not use something that already exists? A normal web server, for example, handles lots of small files (images) very well and fast.

And lots of people have already spent time optimizing the code.

The second benefit is that the transfer is done over HTTP, which is an established protocol and is easily switched to SSL if you need more security.

Uploads are also no problem with a script or custom module, and the same method lets you add authorization.

As long as you don't need to dynamically seek the files, I guess this would be one of the best solutions.
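A minimal sketch of the idea, assuming Python's standard-library `http.server` stands in for a real web server: GET `/name` downloads a file and POST `/name` uploads one into a single flat directory. `make_server`, `FileHandler` and the `root` attribute are hypothetical names; there is no authorization, and a production web server would be considerably faster.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlsplit


class FileHandler(BaseHTTPRequestHandler):
    """Serves files from self.server.root: GET downloads, POST uploads."""

    def _target(self):
        # Map "/name" to root/name; accept only a bare file name (no subdirectories).
        name = unquote(urlsplit(self.path).path.lstrip("/"))
        if not name or name in (".", "..") or os.path.basename(name) != name:
            return None
        return os.path.join(self.server.root, name)

    def do_GET(self):
        path = self._target()
        if path is None:
            self.send_error(400, "bad file name")
            return
        try:
            with open(path, "rb") as f:
                data = f.read()  # acceptable for small files
        except OSError:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        path = self._target()
        if path is None:
            self.send_error(400, "bad file name")
            return
        length = int(self.headers.get("Content-Length", 0))
        data = self.rfile.read(length)
        tmp = path + ".part"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # readers never see a half-written file
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet; real code would log somewhere


def make_server(root, host="0.0.0.0", port=8080):
    server = ThreadingHTTPServer((host, port), FileHandler)
    server.root = root  # read by FileHandler via self.server.root
    return server
```

Calling `make_server("files").serve_forever()` would serve an existing `files` directory on port 8080; any browser can then test downloads.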

Fionn
+2  A: 

I would suggest not using FTP, SFTP, or any other connection-oriented technique. Instead, go for a connectionless protocol or technique.

The reason is that if you need lots of small files uploaded or downloaded, and the response should be as fast as possible, you want to avoid the cost of setting up and tearing down connections.

I would suggest that you look at either using an existing implementation or implementing your own HTTP or HTTPS server/service.
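HTTP itself runs over TCP, so in practice the setup cost is avoided by reusing connections. A minimal sketch, assuming HTTP/1.1 persistent connections and Python's standard library: the client sends several GET requests over one `http.client.HTTPConnection`, and the hypothetical `KeepAliveHandler` serves files from an in-memory dict and counts how many TCP connections it accepted.

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

FILES = {"/a.txt": b"alpha", "/b.txt": b"bravo"}  # hypothetical in-memory file set


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keeps the connection open between requests
    connections = 0  # TCP connections accepted (illustration only, not thread-safe)

    def setup(self):
        super().setup()  # called once per TCP connection, not per request
        KeepAliveHandler.connections += 1

    def do_GET(self):
        body = FILES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        # Content-Length lets the client find the end of the body without closing.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def fetch_many(host, port, paths):
    """Download several small files over a single persistent connection."""
    conn = http.client.HTTPConnection(host, port)
    try:
        results = {}
        for p in paths:
            conn.request("GET", p)
            resp = conn.getresponse()
            results[p] = resp.read()  # drain the body before reusing the connection
        return results
    finally:
        conn.close()
```

With many small files, the handshake is paid once per client instead of once per file.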

Dave Van den Eynde
Implementing my own HTTP server seems a bit of overkill to me, as I need only a tiny subset of an HTTP server's functionality and my authentication model would be quite different (and simpler). Thanks for answering.
grivei
A: 

It's a new part to an existing desktop application? What's the goal of the server? Is it protecting the files that are uploaded/downloaded and providing authentication and/or authorisation? Does it provide some kind of structure for the uploads to be stored in?

One option may be to install Apache HTTP Server on the machine and serve the files through it. Use POST to upload and GET to download.

If the clients are within a LAN could you not just share a drive?

Steve Claridge
It is a new shrink-wrapped desktop app; not a line of code has been written yet. Sharing is not an option for deployment/setup reasons. HTTP has been suggested before; I'll look into it. Thanks
grivei
+1  A: 

Your bottlenecks are likely to come from one of the following sources:

  • Hard-disk I/O - The WD VelociRaptor is supposed to sustain a transfer rate of about 100 MB/s. It also matters whether you set it up as RAID 0, 1, 5 or whatnot; some levels read fast but write slowly. Trade-offs.

  • Network I/O - Even with the fastest hard disks in a fast RAID setup, your network will be slow unless you use gigabit Ethernet. And if your pipes are big, you still need to keep them supplied with data.

  • Memory cache - The in-memory file-system cache will need to be big enough to buffer all the network I/O so that it does not slow you down. That will require large amounts of memory for the kind of work you're looking at.

  • File-system structure - Assuming that you have gigabytes worth of memory, then the bottleneck will most likely be the data-structure that you use for the file-system. If the file-system structure is cumbersome it will slow you down.
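The operating system's file cache normally does this job; if you want explicit control over which small files stay in memory, an application-level cache is one option. A sketch, assuming a hypothetical `SmallFileCache` that evicts least-recently-used entries once a byte budget is exceeded and re-reads a file when its modification time changes. It is not thread-safe as written.

```python
import os
from collections import OrderedDict


class SmallFileCache:
    """Keep recently read small files in memory, evicting the least recently
    used entries once the total cached size exceeds max_bytes."""

    def __init__(self, max_bytes, max_file_size=64 * 1024):
        self.max_bytes = max_bytes
        self.max_file_size = max_file_size  # larger files bypass the cache
        self._entries = OrderedDict()  # path -> (mtime_ns, data)
        self._size = 0

    def read(self, path):
        st = os.stat(path)
        hit = self._entries.get(path)
        if hit is not None and hit[0] == st.st_mtime_ns:
            self._entries.move_to_end(path)  # mark as most recently used
            return hit[1]
        with open(path, "rb") as f:
            data = f.read()
        if hit is not None:  # stale entry: the file changed on disk
            self._size -= len(hit[1])
            del self._entries[path]
        if len(data) <= self.max_file_size:
            self._entries[path] = (st.st_mtime_ns, data)
            self._size += len(data)
            while self._size > self.max_bytes:  # evict oldest entries first
                _, (_, old) = self._entries.popitem(last=False)
                self._size -= len(old)
        return data
```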

Only once all the other problems are solved should you worry about the application itself. Notice that most of the bottlenecks are outside your software's control, so whether you code it in C/C++ or use specific libraries, you will still be at the mercy of the OS and hardware.

sybreon
+1  A: 

For frequent uploads of small files, the fastest way would be to implement your own proprietary protocol, but that would require a considerable amount of work. It would also be non-standard, so future integration would be difficult unless you are able to implement your protocol in every client you'll support. If you choose to do it anyway, this is my suggestion for a simple protocol:

  1. Command: 1 byte to identify what'll be done: (0x01 for upload request, 0x02 for download request, 0x11 for upload response, 0x12 for download response, etc).
  2. File name: can be fixed-size or prefixed with a byte for the length (assuming the name is at most 255 bytes)
  3. Checksum, MD5 for instance (if upload request or download response)
  4. File size (if upload request or download response)
  5. payload (if upload request or download response)
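A sketch of encoding and decoding that layout with Python's `struct` and `hashlib` modules. The byte order (big-endian) and the 8-byte width of the size field are assumptions, since the answer does not fix them; the command codes are the ones listed above, and `encode`/`decode` are hypothetical names.

```python
import hashlib
import struct

UPLOAD_REQ, DOWNLOAD_REQ = 0x01, 0x02
UPLOAD_RESP, DOWNLOAD_RESP = 0x11, 0x12
CARRIES_PAYLOAD = {UPLOAD_REQ, DOWNLOAD_RESP}  # these carry checksum, size, payload


def encode(command, name, payload=b""):
    name_bytes = name.encode("utf-8")
    if len(name_bytes) > 255:
        raise ValueError("file name longer than 255 bytes")
    # 1-byte command, 1-byte name length, then the name itself.
    msg = struct.pack("!BB", command, len(name_bytes)) + name_bytes
    if command in CARRIES_PAYLOAD:
        msg += hashlib.md5(payload).digest()  # 16-byte checksum
        msg += struct.pack("!Q", len(payload))  # 8-byte size (assumed width)
        msg += payload
    return msg


def decode(msg):
    command, name_len = struct.unpack_from("!BB", msg, 0)
    pos = 2
    name = msg[pos:pos + name_len].decode("utf-8")
    pos += name_len
    payload = b""
    if command in CARRIES_PAYLOAD:
        checksum = msg[pos:pos + 16]
        (size,) = struct.unpack_from("!Q", msg, pos + 16)
        pos += 24
        payload = msg[pos:pos + size]
        if len(payload) != size or hashlib.md5(payload).digest() != checksum:
            raise ValueError("truncated or corrupted payload")
    return command, name, payload
```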

This could be implemented on top of a simple TCP socket. You can also use UDP, avoiding the cost of establishing a connection, but then you have to handle retransmission yourself.
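Because TCP is a byte stream, a single `recv` may return only part of a message, so the reader has to loop until each field is complete. A sketch, assuming the layout above with big-endian fields, a 16-byte MD5 digest and an 8-byte size (the widths are assumptions); `recv_exact` and `read_message` are hypothetical helper names.

```python
import hashlib
import socket
import struct

PAYLOAD_COMMANDS = {0x01, 0x12}  # upload request, download response


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because TCP may deliver data in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(n - len(buf), 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return bytes(buf)


def read_message(sock: socket.socket):
    """Read one message: command, name length, name and, for payload
    commands, a 16-byte MD5 digest, an 8-byte size and the payload."""
    command, name_len = struct.unpack("!BB", recv_exact(sock, 2))
    name = recv_exact(sock, name_len).decode("utf-8")
    payload = b""
    if command in PAYLOAD_COMMANDS:
        checksum = recv_exact(sock, 16)
        (size,) = struct.unpack("!Q", recv_exact(sock, 8))
        payload = recv_exact(sock, size)
        if hashlib.md5(payload).digest() != checksum:
            raise ValueError("checksum mismatch")
    return command, name, payload
```

A server would call `read_message` in a loop on each accepted connection, so one TCP connection can carry many small transfers.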

Before deciding to implement your own protocol, take a look at existing HTTP libraries (libcurl, for instance, on the client side); you could make your server use standard HTTP methods like GET for download and POST for upload. This would save a lot of work, and you'd be able to test downloads with any web browser.

Another suggestion to improve performance is to use something like SQLite, rather than the filesystem, as the file repository. You can create a single table with one text column for the file name and one BLOB column for the file contents. Since SQLite is lightweight and caches efficiently, most of the time you'll avoid the disk access overhead.
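A sketch of that single-table layout with Python's built-in `sqlite3` module; the `files` table, its column names and `BlobStore` are hypothetical. `INSERT OR REPLACE` overwrites an existing file of the same name, and by default a `sqlite3` connection may only be used from the thread that created it.

```python
import sqlite3


class BlobStore:
    """Store small files as rows in one SQLite table (name -> contents)."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )
        self.conn.commit()

    def put(self, name, data):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)", (name, data)
            )

    def get(self, name):
        row = self.conn.execute(
            "SELECT data FROM files WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else bytes(row[0])

    def close(self):
        self.conn.close()
```

Passing `":memory:"` instead of a file path gives a purely in-memory store, which is handy for testing.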

I'm assuming you don't need client authentication.

Finally: although you prefer C++ for raw native-code speed, that is rarely the major bottleneck in this kind of application; most probably it will be disk access and network bandwidth. I'm mentioning this because in Java you could probably write a servlet that does exactly the same thing (HTTP GET for download and POST for upload) in fewer than 100 lines of code. Use Derby instead of SQLite in that case, put the servlet in any container (Tomcat, GlassFish, etc.) and you're done.

Fabio Ceconello
Thank you for your answer
grivei