tags:
views: 350
answers: 2

Currently my application uses C# with Mono on Linux to work with local file systems (e.g. ext2, ext3). The basic operations are: open a file, read from/write to the file, and close/delete the file. For this I currently use the native C# APIs (like File.Open) to operate on the file.

My question is: if I install the Hadoop file system (HDFS) on my Linux box, what changes do I need to make to my existing functions so that they perform these basic operations against HDFS? Since the Hadoop infrastructure is based on Java, how would a C# (Mono on Linux) application do basic operations with Hadoop? Do the basic C# file APIs (like File.Open or File.Copy) work with Hadoop file systems too?

I was thinking of something like this: since Hadoop exposes a C API (libhdfs) for file operations, I could write a C wrapper, build a shared library (DLL) out of it, and then use that library from my C# code to communicate with HDFS.
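Roughly, I imagine the C# side looking like this (an untested sketch: it assumes libhdfs.so is on the loader path and that libhdfs can find a JVM and the Hadoop jars via CLASSPATH; the declarations follow hdfs.h, and the path /tmp/test.txt is just an example):

    using System;
    using System.Runtime.InteropServices;
    using System.Text;

    class HdfsSketch
    {
        // P/Invoke bindings for a few libhdfs calls, as declared in hdfs.h.
        [DllImport("hdfs")] static extern IntPtr hdfsConnect(string host, ushort port);
        [DllImport("hdfs")] static extern IntPtr hdfsOpenFile(IntPtr fs, string path,
            int flags, int bufferSize, short replication, int blockSize);
        [DllImport("hdfs")] static extern int hdfsWrite(IntPtr fs, IntPtr file,
            byte[] buffer, int length);
        [DllImport("hdfs")] static extern int hdfsCloseFile(IntPtr fs, IntPtr file);
        [DllImport("hdfs")] static extern int hdfsDisconnect(IntPtr fs);

        const int O_WRONLY = 1;  // fcntl.h open-for-write flag accepted by hdfsOpenFile

        static void Main()
        {
            IntPtr fs = hdfsConnect("default", 0);  // "default"/0 = use the default config
            IntPtr file = hdfsOpenFile(fs, "/tmp/test.txt", O_WRONLY, 0, 0, 0);  // 0s = library defaults
            byte[] data = Encoding.UTF8.GetBytes("hello hdfs\n");
            hdfsWrite(fs, file, data, data.Length);
            hdfsCloseFile(fs, file);
            hdfsDisconnect(fs);
        }
    }

(If P/Invoke like this works, the separate C wrapper might not even be necessary, since C# could call libhdfs directly.)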

Does this seem right? Or can someone suggest a document or the steps so that my C# programs can open/read/write files on Hadoop file systems?

Thanks, Anil.

+1  A: 

Hadoop supports mounting HDFS via FUSE: http://wiki.apache.org/hadoop/MountableHDFS. This is probably a simpler solution than wrapping the native C libraries, although that approach would also work.
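For example, once HDFS is mounted (the mount point /mnt/hdfs below is hypothetical), existing C# code using the standard File APIs works unchanged; the main caveat is that HDFS files are write-once, so random writes through the mount will fail:

    using System;
    using System.IO;

    class FuseMountExample
    {
        static void Main()
        {
            // /mnt/hdfs is a hypothetical fuse-dfs mount point; the standard
            // File APIs then treat HDFS like any local file system.
            File.WriteAllText("/mnt/hdfs/tmp/test.txt", "hello hdfs");
            Console.WriteLine(File.ReadAllText("/mnt/hdfs/tmp/test.txt"));
        }
    }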

Jakob Homan
+1  A: 

Hey Anil,

You can also use the Thrift interface to HDFS to get a C# client. See http://wiki.apache.org/hadoop/HDFS-APIs for more information.

I'd recommend the FUSE route, however, as there is significant work underway to improve that interface and allow use of HDFS with a native client.

Lastly, there is a WebDAV interface that we have used internally for access to HDFS files from a Windows machine. Here's the internal wiki page:

How to configure CDH2 + WebDAV.

  1. Clone the HDFS-over-Webdav repository

     1. git clone git://github.com/huyphan/HDFS-over-Webdav.git
     2. Set HDFS_WEBDAV_SRC_DIR to the path you cloned it to
    
  2. Edit conf/hadoop-webdav.sh

    export HADOOP_WEBDAV_HOST=xxx.xxx.xxx.xx                 # Externally accessible NN host/IP
    export HADOOP_WEBDAV_PORT=9001                           # Choose a port
    export HADOOP_WEBDAV_HDFS=hdfs://localhost:9000/         # fs.default.name
    export HADOOP_WEBDAV_CLASSPATH=$HDFS_WEBDAV_SRC_DIR/lib  # See above

  3. Build/install

    export HADOOP_HOME=XXX
    cd HDFS-over-Webdav
    ant -Dhadoop.dir=$HADOOP_HOME
    cp bin/* $HADOOP_HOME/bin
    cp conf/* $HADOOP_HOME/conf

  4. Start WebDav server

    cd $HADOOP_HOME
    ./bin/start-webdav.sh  # logs in $HADOOP_HOME/logs

  5. Access

     1. You can use the user@ authority syntax in the URLs below if you have HDFS permissions set up
     2. XP: Add a network place http://$HADOOP_WEBDAV_HOST:$HADOOP_WEBDAV_PORT/ under "My Network Places"
     3. Vista/Win7: "Map Network Drive" using the above URL
     4. Linux CLI: cadaver $HADOOP_WEBDAV_HOST:$HADOOP_WEBDAV_PORT
     5. Linux Nautilus: Go -> Location, use the above URL
     6. From C#: see the HTTP read sketch after this list
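
For programmatic access from C#, a read can also be issued as a plain HTTP GET against the WebDAV server (an untested sketch: the host, port, and path are placeholders matching the HADOOP_WEBDAV_* settings above, and it assumes the server allows unauthenticated reads):

    using System;
    using System.Net;

    class WebDavRead
    {
        static void Main()
        {
            // Host and port are placeholders for the HADOOP_WEBDAV_* values above.
            var client = new WebClient();
            Console.WriteLine(client.DownloadString(
                "http://xxx.xxx.xxx.xx:9001/tmp/test.txt"));
        }
    }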
    

Regards,

Jeff

Jeff Hammerbacher