I've been looking at Hadoop with keen interest, and to be honest I'm fascinated; things don't get much cooler.

My only minor issue is I'm a C# developer and it's in Java.

It's not that I don't understand Java; rather, I'm looking for a Hadoop.NET, an NHadoop, or some other .NET project that embraces the Google MapReduce approach. Does anyone know of one?

Thanks

+10  A: 

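Have you looked at Hadoop Streaming? It lets any executable that reads from stdin and writes to stdout act as the mapper or the reducer, so the job logic does not have to be written in Java.

I use it with Python all the time :-).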
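To give a feel for what a streaming job looks like, here is a minimal word-count sketch in Python. It assumes the usual streaming contract: the mapper writes `key<TAB>value` lines to stdout, and the reducer receives those lines sorted by key. The function names and the `map`/`reduce` command-line argument are made up for this example, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: the first argument picks the role ("map" or "reduce").
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

You can try it locally without a cluster by letting `sort` stand in for the shuffle phase, e.g. `cat input.txt | python wc.py map | sort | python wc.py reduce` (the file names are placeholders). On a cluster the same two commands would be passed to the streaming jar's `-mapper` and `-reducer` options. A C# console app that follows the same stdin/stdout contract should slot in the same way.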

I'm starting to see that the heterogeneous approach is often the best and it looks like other folks are doing the same.

If you look at projects like Protocol Buffers or Facebook's Thrift, you see that sometimes it's best to use an app written in another language and build the glue in the language you prefer.

chews
A: 

There's a pretty cute MapReduce implementation for .NET at: http://mapsharp.codeplex.com/

+3  A: 

MySpace recently released their .NET MapReduce framework, Qizmt, as open source, so it is another potential contender in this space.

foxxtrot
Their license is GPL ;( It would have been great if they had chosen something less restrictive...
IgorK
+2  A: 

I would say that DryadLINQ is the closest thing that we .NET folk have to Hadoop. But it depends on what you want to use Hadoop for. If you are looking for the optimized, self-maintaining distributed file system (DFS), then DryadLINQ isn't what you are looking for. It has an analog to the DFS, but you have to build the partitions manually and distribute each partition yourself.

That being said, if it's the distributed execution aspect of Hadoop that you are looking for, then DryadLINQ is truly wonderful (and no, I'm not affiliated with MS). As long as you have a Microsoft HPC cluster set up, getting going with DryadLINQ is really easy.

The code you write is really just straight LINQ code, except that instead of executing the LINQ on an IEnumerable you execute it on a PartitionedTable (the distributed data structure you build yourself).

What has been really cool about DryadLINQ is the fast turnaround time (try, test, adjust, repeat) when developing algorithms. You just write LINQ code to do your calculations and DryadLINQ takes care of the whole distributed execution part. It's the most natural analog I've come across that makes writing code for distributed processing feel just like writing code for a single process.
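To illustrate the idea (this is an analogy in plain Python, not DryadLINQ or its API): you write the query once, split the data into partitions by hand, and a small runner applies the same query to every partition and merges the partial results. All names below are made up for the sketch, and threads on one machine stand in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """The 'query': written once, with no knowledge of how the data is stored."""
    return Counter(word for line in lines for word in line.split())


def make_partitions(lines, n):
    """Manually split the data into n partitions (round-robin)."""
    return [lines[i::n] for i in range(n)]


def run_partitioned(query, partitions):
    """Run the query on each partition, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(query, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # counts are additive, so merging is a plain sum
    return total
```

Running `count_words` directly on the full list and running it through `run_partitioned` on the partitions give the same counts. That only holds because word counts merge by addition; a query whose partial results cannot be combined this simply would need its own merge step.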

Turbo
A: 

It may be better to use Apache Hadoop with streaming, because Apache Hadoop is actively developed and maintained by industry giants such as Yahoo and Facebook, so it can be expected to do the job. If you need something in .NET, check out MySpace's implementation: 'MySpace Qizmt - MySpace's Open Source MapReduce Framework'.

Dileep stanley