This weekend I was working on a project and I needed to use a binomial distribution to test the probability of an event (the probability that x of y characters would be alphanumeric given random bytes). My first solution was to write the test myself since it is rather simple.

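def factorial(n):
    # n! computed recursively; assumes n is a non-negative integer
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

def binomial_prob(n,k,p):
    # probability of exactly k successes in n trials, each with success probability p
    # // keeps the coefficient an exact integer (plain / yields a float on Python 3)
    bin_coeff = factorial(n) // (factorial(k)*factorial(n-k))
    return bin_coeff * pow(p,k) * pow((1 - p),(n-k))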

And I used that. However, SciPy already includes a binom_test function (in scipy.stats) for this kind of test. Depending on it would probably increase the size of what I distribute significantly (both SciPy and NumPy would be required), all for a relatively simple test. An auxiliary question is how intelligent py2exe is: does it bundle only the modules I use from SciPy and NumPy, or the whole libraries? I expect just the modules that I reference, but then the next question is how many modules scipy.stats itself depends on. But I digress... So my question is this: when should I use code already written at the cost of including far more than I need, and when should I just write my own implementation?

(I tagged this as python, but I suppose it could be a more general question)
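For reference, here is a minimal sketch of a middle ground between the two options in the question: the same probability computed with only the standard library, assuming Python 3.8 or later for math.comb. The name binomial_pmf and the value 62/256 are assumptions made for illustration (62 ASCII letters and digits out of 256 possible byte values); they are not taken from the original project.

```python
import math

def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Assumption: "alphanumeric" means the 62 ASCII letters and digits,
# and every one of the 256 byte values is equally likely.
P_ALNUM = 62 / 256

# Probability that exactly 8 of 16 random bytes are alphanumeric.
prob = binomial_pmf(16, 8, P_ALNUM)
print(prob)
```

This avoids both the hand-written factorial and the SciPy dependency, though it only gives the probability of a single outcome, not a full statistical test.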

+5  A: 

"when should I use code already written at the cost of including far more than I need"

Always.

"when should I just write my own implementation?"

Never.

The "including far more than I need" question is generally quite silly. What do you care how much is "included"?

The only time this can ever matter is when you're writing embedded software and are severely memory-constrained.

For all other programming -- All other programming -- don't think twice. Include pre-written code early and often. Write less. Solve problems more quickly. The operating system will swap the unused pages out of memory. You can safely ignore them.

Programming is about solving problems, not producing code. Less code is better. No code is best.

S.Lott
+1, as reinventing the wheel is very seldom a good idea. Additionally, using these libraries gives you more robustness. For example, the code above for computing binomial coefficients performs far too many multiplications, so using the library also gives you better performance.
Frank
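To illustrate the point about redundant multiplications: the three-factorial formula multiplies out terms that then cancel. A rough sketch of the usual multiplicative alternative is below. The function name binomial_coefficient is hypothetical, and the comparison against math.comb assumes Python 3.8 or later; this is not necessarily how any particular library implements it.

```python
import math

def binomial_coefficient(n, k):
    """Compute C(n, k) using about min(k, n - k) multiply/divide steps."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # C(n, k) == C(n, n - k), so work on the smaller side
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result

# Cross-check against the standard library for a few values.
for n, k in [(5, 2), (10, 3), (52, 5)]:
    assert binomial_coefficient(n, k) == math.comb(n, k)
```

The loop runs min(k, n - k) times, whereas the factorial version performs roughly 2n multiplications before dividing.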
I'm wary of using words like "always" and "never": the universe is full of special cases.
Derrick Turk
@Derrick Turk: I always use those terms specifically to force people to try and locate their actual edge cases. It's never possible for me to begin to enumerate all of their possible situations. It's always easier to say "always" and "never". Note that I actually provided an edge case AND also said "always".
S.Lott
Thanks for the pointers. I certainly believe in not reinventing the wheel; I just didn't know whether something as simple as this counted as reinventing the wheel.
ZVarberg
If it's been done, and you're doing it again, you're "reinventing the wheel". It's software, it doesn't wear out. Use what's been written before. Worse than reinventing the wheel is justifying it by claiming there's a "cost of including far more than I need".
S.Lott
A: 

The answer depends on who will use your application and how widely it will be distributed. The Unix/Linux folks tend to heavily favour use of existing libraries, because they are used to every machine being a development machine that can rebuild its own software from source. Partly this is because of necessity, as native code libraries typically need to be compiled and linked against the local environment. But on Windows it's a different proposition entirely, since most users can't, won't, and indeed shouldn't do that, so you have to consider how the use of these 3rd party libraries will affect your distribution plans - in terms of the license, in terms of the download size, the usability, etc.

You're talking about py2exe which suggests to me that you're making a single-file executable for distribution to Windows users. This means that your main concerns will be compatibility (since libraries containing native code can only run on one type of platform - Win32 code should be fine though) and size, since py2exe will not do anything cunning with the dependencies; expect the whole thing to be bundled into your executable. The best approach is to package it up and see what happens. It's a simple and non-destructive step, so you should try it for yourself as soon as possible.
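Before packaging, one rough way to gauge how much a dependency drags in is to count the modules that a single import loads. The sketch below does that in a fresh interpreter using only the standard library. The helper name modules_loaded_by is hypothetical, and this measures what Python loads at import time, which is only a proxy for what py2exe would actually bundle.

```python
import subprocess
import sys

def modules_loaded_by(module_name):
    """List the modules that importing `module_name` adds to sys.modules,
    measured in a fresh interpreter so earlier imports do not hide anything."""
    probe = (
        "import importlib, sys\n"
        "before = set(sys.modules)\n"
        "importlib.import_module(sys.argv[1])\n"
        "print('\\n'.join(sorted(set(sys.modules) - before)))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe, module_name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()

# Example with a standard-library package; with SciPy installed,
# "scipy.stats" could be passed instead.
loaded = modules_loaded_by("json")
print(len(loaded), "modules loaded")
```

A large count for scipy.stats would suggest that the bundle will grow accordingly, but building the executable and checking its size remains the definitive test.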

You also need to consider the licenses of any libraries you distribute. Again the 're-use everything' crowd sometimes forget this because they often work on software that they don't have to redistribute and so this isn't an issue. For you, it might be, especially if you have code that is owned by your employer or institution, although it's important to realise that when you distribute Python apps, you essentially distribute source code for anybody to look at anyway.

Kylotan