I am starting a new Hadoop project that will have multiple Hadoop jobs (and hence multiple JAR files). Since I'm using Mercurial for source control, I was wondering what the optimal way of organizing the repository structure would be. Should each job live in a separate repo, or would it be more efficient to keep them all in the same repo, broken down into folders?
A:
If you're pipelining the Hadoop jobs (the output of one is the input of another), I've found it's better to keep most of it in the same repository, since I tend to accumulate a lot of common methods that get reused across the various MR jobs.
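For what it's worth, a single-repo layout along these lines has worked for me (the directory names here are purely illustrative):

    hadoop-jobs/
      common/           shared parsing and I/O helpers reused across jobs
      job-sessionize/   one MR job (Mapper, Reducer, driver)
      job-aggregate/    another MR job that depends on common/
      lib/              third-party JARs
      build.xml         the build that packages the job JAR(s)

That way the shared code lives next to the jobs that use it, and refactoring a common method doesn't mean synchronizing changes across two repos.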
Personally, I keep streaming jobs in a separate repo from my more traditional jobs, since there are generally no shared dependencies between the two.
Are you planning on using the DistributedCache or streaming jobs? You might want a separate directory for files you distribute. Do you really need a JAR per Hadoop job? I've found I don't.
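For instance, Hadoop's ProgramDriver (the same class the bundled examples JAR uses for its entry point) lets a single JAR dispatch to several jobs by name. A minimal sketch, where WordCount and LogParse stand in for your own job classes:

    import org.apache.hadoop.util.ProgramDriver;

    // One entry point that dispatches to multiple jobs by name,
    // so a single JAR can carry every job in the project.
    public class JobDriver {
      public static void main(String[] args) {
        int exitCode = -1;
        ProgramDriver pgd = new ProgramDriver();
        try {
          // WordCount and LogParse are hypothetical job classes from your project
          pgd.addClass("wordcount", WordCount.class, "Counts words in the input");
          pgd.addClass("logparse", LogParse.class, "Parses raw log files");
          pgd.driver(args); // runs the job named by args[0]
          exitCode = 0;
        } catch (Throwable t) {
          t.printStackTrace();
        }
        System.exit(exitCode);
      }
    }

Then "hadoop jar myjobs.jar wordcount <in> <out>" selects the job at run time, and adding a new job is just another addClass() line rather than another JAR.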
If you give more details about what you plan on doing with Hadoop, I can see what else I can suggest.
Eric Wendelin
2010-06-02 04:34:44
Thanks Eric. I'm not planning on doing any streaming jobs yet (I may get there in the future, but not yet). The project is very young and still growing, so I'm curious how to lay a good foundation that can accommodate further growth.
Alex N.
2010-06-02 22:51:02