ansaurus

Question

Answer 1

+2 A:

Familiar code, every programmer knows C or C-like code

Many devs are familiar with C or C-like code, but that doesn't make them competent in C++. Inexperienced C++ devs can do a lot of harm to such a complex project, and you would have to take extra care.

I can't speak for Python, but I've heard it's more beginner-friendly.

I'd say, once again, you should go for the language you (as a team) know best.

f4 2010-02-26 19:01:18

Your first point is really interesting, but the second point isn't important for us: we have a good group of programmers in both languages.

Khaled Al Hourani 2010-02-26 21:34:21

Answer 2

+5 A:

Write it in Python, profile it, and if you need to speed parts of it up, write them in C++. Python and C++ are similar enough that the "familiar" advantage with C++ will be irrelevant pretty quick.
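
As a rough illustration of the "profile it" step, here is a minimal sketch using the standard-library cProfile and pstats modules. The function names count_ing_words and profile_call and the sample text are made up for this example; they are assumptions, not part of any real project.

```python
import cProfile
import io
import pstats


def count_ing_words(text):
    """Count how often each word ending in 'ing' occurs in text."""
    counts = {}
    for word in text.split():
        if word.endswith("ing"):
            counts[word] = counts.get(word, 0) + 1
    return counts


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    # Show the five entries with the highest cumulative time.
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


# Hypothetical workload: the same short sentence repeated many times.
sample = "parsing and tagging are fun but parsing is slow " * 1000
counts, report = profile_call(count_ing_words, sample)
print(counts)
print(report)
```

The report shows where the time actually goes; only the functions that dominate it would be candidates for a rewrite in C or C++.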

I say this as someone who has developed primarily in C++ and has recently gotten serious with Python. I like them both, but I can get Python code working a lot faster than C++. Seriously, dict beats std::map in usability.

P.S. Here's some information on how to call C code from Python.
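
One possible way to do this, sketched with the standard-library ctypes module (not necessarily what the linked material describes). It assumes a POSIX system, where ctypes.CDLL(None) exposes the symbols of the already-loaded C library; the wrapper name c_strlen is made up for this example.

```python
import ctypes

# Assumption: POSIX (Linux/macOS). CDLL(None) opens the running process,
# which already has the C standard library loaded.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the byte length of text as computed by C's strlen()."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("parsing"))
```

For your own hot spots you would compile them into a shared library and load that file by path with ctypes.CDLL instead of passing None.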

Mike DeSimone 2010-02-26 19:02:29

+1. For algorithm development and prototyping, Python wins; it's easy to then move computationally intensive parts into a C/C++ module if need be.

Autopulated 2010-02-26 19:03:47

I did that already. C++ has amazing execution time; however, we may be able to ignore that given Python's advantages. But your idea is really cool and pragmatic.

Khaled Al Hourani 2010-02-26 21:16:09

@Khaled: That's been our experience. Sure, I can get FFTs going insanely fast (using fftw or MKL) in C++, but >95% of my code isn't `fft()`, it's decision making, initialization, and management. Also, that's the code that gets changed most of the time, not the inner-loop stuff. And when I did that part in Python, I was impressed with how it was measurably slower but not practically slower in my application, while being far faster to develop.

Mike DeSimone 2010-02-26 22:30:24

Answer 3

+9 A:

Although this is subjective and argumentative, there is evidence that you can write a successful NLP project in Python: NLTK is one. Its authors also have a comparison of NLP functionality in different languages:

(Quoting from the comparison)

Many programming languages have been used for NLP. As explained in the Preface, we have chosen Python because we believe it is well-suited to the special requirements of NLP. Here we present a brief survey of several programming languages, for the simple task of reading a text and printing the words that end with ing. We begin with the Python version, which we believe is readily interpretable, even by non Python programmers:

import sys
for line in sys.stdin:
    for word in line.split():
        if word.endswith('ing'):
            print(word)

[...]

The C programming language is a highly-efficient low-level language that is popular for operating system and networking software:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
   int i = 0;
   int c = 1;
   char buffer[1024];

   while (c != EOF) {
       c = fgetc(stdin);
       if ( (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ) {
           buffer[i++] = (char) c;
           continue;
       } else {
           if (i > 2 && (strncmp(buffer+i-3, "ing", 3) == 0 || strncmp(buffer+i-3, "ING", 3) == 0 ) ) {
               buffer[i] = 0;
               puts(buffer);
           }
           i = 0;
       }
   }
   return 0;
}
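
For what it's worth, the C program above is not a line-for-line equivalent of the Python loop: it breaks words at any non-alphanumeric character and also accepts an upper-case "ING" suffix, while the Python version splits on whitespace only. A rough Python sketch of the C behavior, assuming ASCII input (the function name ing_words_like_c is made up for this illustration):

```python
import re


def ing_words_like_c(text):
    """Return words ending in 'ing' or 'ING', splitting the way the C code does.

    A word is a run of ASCII letters and digits; everything else separates words.
    """
    words = re.findall(r"[0-9A-Za-z]+", text)
    return [word for word in words if word.endswith(("ing", "ING"))]


sample = "Parsing, TAGGING and sing-along; thing."
print(ing_words_like_c(sample))

# The whitespace-splitting approach finds nothing in the same input,
# because every candidate word carries trailing punctuation.
print([word for word in sample.split() if word.endswith("ing")])
```

So the length gap between the two quoted programs partly reflects that they solve slightly different problems.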

Edit: I didn't include comparable code in C++/Boost, so I've added a code sample from the Boost documentation that does something similar, although not identical. Note that this isn't the cleanest version.

// char_sep_example_1.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
  // Split on any of '-', ';' or '|'; empty tokens are dropped by default.
  boost::char_separator<char> sep("-;|");
  tokenizer tokens(str, sep);
  // Print each token wrapped in angle brackets.
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout << "<" << *tok_iter << "> ";
  std::cout << "\n";
  return EXIT_SUCCESS;
}

Otto Allmendinger 2010-02-26 19:05:50

+1 Above is directly applicable to the question and provides additional info.

Dana the Sane 2010-02-26 19:08:33

Thanks. It would be nice to know who -1ed me without comment, though.

Otto Allmendinger 2010-02-26 19:09:57

The question is about Python and C++/Boost; this answer is about Python and C. You can write a much cleaner equivalent in C++ here

f4 2010-02-26 19:16:29

I really appreciate your work, and if I could mark a second answer as correct, I'd give it to you :)

Khaled Al Hourani 2010-02-26 21:19:08

@Otto: I was an early upvote for you; your answer covers a lot I didn't. Sometimes people just hit you with a downvote and don't say why. It's annoying, but no big deal in the long run. Also, I'm no Boost expert, but the C++ solution doesn't seem to do what the Python or C solutions do... nothing in there involving `"ing"`...

Mike DeSimone 2010-02-26 22:35:47

@Khaled No problem, it doesn't matter that much. @Mike I haven't found analogous code for Boost; the sample I found at least demonstrates word iteration. I'll try to find a more representative sample.

Otto Allmendinger 2010-02-27 00:28:02

Answer 4

+1 A:

This is more or less a reply/supplement to Otto Allmendinger's answer. If you honestly wanted to implement something (roughly) similar to his Python example in C++, I think something like this would be closer:

#include <string>
#include <iostream>

int main() { 
    std::string temp;
    while (std::cin>>temp) 
        if (temp.size()>2 && temp.substr(temp.size()-3, 3)=="ing")
           std::cout << temp << '\n';
}

This does essentially the same thing as the Python does, and is about the same length as well -- the C++ has more syntactic "fluff", but they have exactly the same number of lines of code that really do anything (though there's no question that the individual lines in the C++ version are longer).

Don't get me wrong: I'm certainly not trying to claim that development with C++ will be as quick or easy as with Python. I do think the margin might be a tad smaller than some of the code presented here might imply though.

Edit: If you did want to claim C++ would be faster and easier, you could present code like:

for (std::string temp; std::cin>>temp; )
    temp.size()>2 && temp.substr(temp.size()-3, 3)=="ing" && std::cout << temp << '\n';

...along with a factually accurate (though grossly misleading) claim like: "The C++ code has only half as many statements as the Python implementation."

Jerry Coffin 2010-02-27 07:57:12

NLP project, python or C++
