Could MapReduce be used to implement a web server?

I'm thinking of something like this: when a request comes in, it sits on a queue until a server is free to process it. Or am I missing the point here?

+2  A: 

I think you're missing the point. MapReduce is a way of breaking up a large data set so that it can be distributed among a number of compute nodes for parallel processing.
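
To make that concrete, here is a minimal single-process sketch of the MapReduce pattern, using word counting as the example. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration and are not taken from Hadoop or any other framework; in a real deployment each chunk would be mapped on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    # Map: turn one chunk of the data set into (key, value) pairs.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine the values for one key into a single result.
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently, which is the part that
    # could be spread across many compute nodes.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

The point to notice is that the reduce step needs results from every chunk before it can produce an answer, which is why the pattern suits batch jobs over a large data set.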

Paul R
But aren't the web requests coming into a web server the data, and the HTML pages the response? Isn't this what a load balancer does?
Zubair
@Paul: It sounds like you're describing Hadoop, not MapReduce. The latter is just an algorithm for processing data.
skaffman
@skaffman: noted - I've edited my answer to hopefully make it more accurate
Paul R
@Zubair - individual web server requests are small and asynchronous - there is no large data set to break up for parallel processing - unless I'm missing something?
Paul R
@Paul. Are you saying that MapReduce is only useful for large input datasets? What distributed algorithm can I use for small input datasets then?
Zubair
@Zubair - it's not clear to me why you would need to consider distributed processing if you only have a small amount of data?
Paul R
@Paul. The output data is not small; it is quite large HTML files. But I'm still trying to understand what you said about MapReduce not being suitable for small input requests. For example, if there were 1000 requests a second, say for Twitter, wouldn't they use MapReduce to process them even though each request is small? So the datasets I need it for are huge in total size, but each request is small.
Zubair
@Zubair - for this particular scenario it's not apparent what benefit there could possibly be - all it would seem to do is add latency. There's an old adage about a man who has only a hammer seeing every problem as a nail...
Paul R
@Paul. But Google uses this exact same thing to scale requests to their website. Also, most high-volume web sites use load balancers, so it isn't just Google. I guess you are right and we are talking about different things, as I'm talking about high-volume web sites, all of which distribute their requests to different machines. Thanks for looking into it though :)
Zubair
@Zubair: I think if you read the Wikipedia page on MapReduce carefully you'll get a better idea of what it does and the kind of applications that Google uses it for.
Paul R
@Paul. I read through that and I see where I went wrong now. A web server only implements the Map part of MapReduce, as it produces HTML output on each Mapper node, but there is no Reduce phase for a web server. Thanks, that was very helpful.
Zubair
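
For anyone landing here later, the conclusion above can be sketched roughly as follows: serving requests looks like a map-only job, where each request is handled independently and nothing is combined afterwards. The names `render_page` and `serve` are hypothetical, and a thread pool stands in for the separate machines behind a load balancer; this is an illustration, not how any real web server is built.

```python
from concurrent.futures import ThreadPoolExecutor


def render_page(request_path):
    # "Map" step: one request in, one HTML page out.
    # No other request's result is needed to build this page.
    return f"<html><body>You asked for {request_path}</body></html>"


def serve(requests, workers=4):
    # Hand each request to whichever worker is free. There is no
    # reduce step: each response goes straight back to its client.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_page, requests))


if __name__ == "__main__":
    for page in serve(["/home", "/about"]):
        print(page)
```

Because no step has to wait for all the other results, a plain load balancer or work queue covers this case without the batch machinery of MapReduce.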