I'm amazed to see that Civilization IV has 200,000 lines of code just for its rules.

I don't know how someone could reason about 100,000+ lines of code. Have you worked on a similarly big project? How would you know that a new feature is not going to break something?

A side note: The lead designer and AI programmer of Civ4 recently spoke about the game AI.

+3  A: 

The web app that I work on has many thousands of lines of code, but we use the MVC pattern and follow consistent code-structure practices, so finding, reading, adding or maintaining code really isn't that difficult.

Don't make the mistake of thinking those 200,000 lines of code are all in the same file, minified and compressed. They will be structured to keep development efficient.

ILMV
Well, I am interested in the question itself. What if you have a single large file of several thousand lines of code? I had the bad luck of having to support a system made up of files of at least 2000 lines each, and it was very difficult. I can't imagine what to do with larger files.
Alexander
+17  A: 

How do you eat an elephant? One bite at a time.

James Gregory
That's about how to progress. Now how do you _make_ an elephant, and how do you _repair_ one? You can't do that without proper knowledge of the whole.
xtofl
On large enough systems it's simply not feasible for a single person to know the whole system well. Individuals may know all of it vaguely and parts of it specifically, but it's the team and how it's led that will dictate whether a large system is maintainable.
James Gregory
A: 

KDE contains over 6 million lines of code, and Qt is larger than that.

Jords
This is not relevant. The OP asked how people manage large projects, not whether people could quote the largest projects they know of.
Paul Hadfield
+2  A: 

Divide et impera, said the ancient Romans. You can't work on all that code as a single entity, but if you divide it into many small entities, each with a precise, confined function, then you can conquer them easily. See, for instance, the Rails application structure. That's a very good example of small files with a semantic structure.

To ensure new features do not break anything, you rely on testing. While you develop, you write tests: a set of functions, each with an expected result. For instance, if you program a calculator, you can write a test that says "hey, 2+2 should be 4". When you add new functionality, you just run your set of tests and see if any of them fail. Tests are usually written against a sample data set, and are often divided into test sets (like math tests, network tests, and so on). There is also an interesting approach to programming called test-driven development, where you first write the tests and then implement the function to satisfy them.
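As a minimal sketch of the calculator example, here is what such a test could look like with Python's standard `unittest` module. The `add` function, the test class and the `run_suite` helper are hypothetical names made up for illustration, not taken from any real project.

```python
import unittest


def add(a, b):
    """Hypothetical calculator function under test."""
    return a + b


class AdditionTests(unittest.TestCase):
    """A small test set for the 'math' part of the calculator."""

    def test_two_plus_two_is_four(self):
        # The "hey, 2+2 should be 4" check from the text.
        self.assertEqual(add(2, 2), 4)

    def test_adding_zero_changes_nothing(self):
        self.assertEqual(add(7, 0), 7)


def run_suite():
    """Run the test set and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AdditionTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` after every change to `add` tells you whether the existing behaviour still holds. With test-driven development you would write `AdditionTests` first, watch it fail, and only then write `add`.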

On a final note, don't be afraid of large numbers. Once you see behind the matrix, code starts to flow and you'll be writing thousands of lines without even noticing. Good work.

Marek
+1  A: 

I joined a group not too long ago that's working on a code base of around 900 kLoC of C++. At least for me, the following strategies have been helpful:

  1. Not being afraid to ask my coworkers for help with unfamiliar parts when needed. I find that nearly every part of the program has at least one person who's the expert on it.
  2. Test suites. Run them to sanity check your changes.
  3. Perform experiments. Trace code in the debugger. Take copious notes on how the data flows.
  4. As for tools, Ack, GNU Global, and Emacs' ido-mode together have been very useful (putting the latter two together is particularly potent).
Boojum
A: 

If software lives long enough, it can't avoid growing big, even huge.

I don't consider 200kLoC all that big. As a student, during a four-month internship in the '90s, I alone wrote ~20kLoC of C++ code.

I once worked on a C++ project which, over almost a decade, grew to several MLoC, half of which was written by not even a dozen developers. (The rest was either licensed or open-source third-party code.) That software has several million installations (there's a good chance you have it installed on your desktop machine), so it has to be quite robust. (The cost of 0.1% of such a customer base needing support could quickly kill bigger companies than that one.) Compiling it from scratch took 1 hour on a dual-core Wintel machine running VS.
It was sold to a major company that shipped it as part of their product.

sbi
+2  A: 

As said before: divide and conquer. But in order to keep things maintainable, one addition is needed: divide, conquer and reuse.

By the way, each line of code may translate to more than 100 lines of assembly; language designers did a great job abstracting away the underlying machine. You should also abstract away serialization, GUI creation, ... and build larger blocks with fewer lines of code.

If you have 200kLoC of which 80% is boilerplate, you have a problem: you should find a way of getting rid of the boilerplate code, e.g. by using a preprocessor, a DSL, metaprogramming... It's both more fun and more maintainable. Every construct that has to be written twice is a candidate for reuse.
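To make the metaprogramming idea concrete, here is a small sketch in Python, assuming the boilerplate in question is hand-written serialization code. The `serializable` decorator and the `Unit` and `City` classes are hypothetical examples, not code from any real project.

```python
from dataclasses import asdict, dataclass, fields


def serializable(cls):
    """Class decorator: write the to_dict/from_dict logic once, reuse it everywhere."""
    cls = dataclass(cls)  # generates __init__, __repr__ and __eq__ from the annotations

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(klass, data):
        # Ignore keys that are not fields of the class.
        names = {f.name for f in fields(klass)}
        return klass(**{k: v for k, v in data.items() if k in names})

    cls.to_dict = to_dict
    cls.from_dict = from_dict
    return cls


@serializable
class Unit:
    name: str
    strength: int


@serializable
class City:
    name: str
    population: int
```

Each new class now costs one decorator line instead of its own pair of conversion functions, for example `Unit("Warrior", 2).to_dict()` or `City.from_dict({"name": "Rome", "population": 5})`. The price is the one mentioned next: the shared helper itself has to be maintained.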

On the other hand, code reuse requires meticulous bookkeeping: nothing must creep into the code base that's already written, and everything that's written may be a reuse candidate. You need a budget to maintain your library, too!

xtofl
+1  A: 

Read the documentation (if there's any), ask the guy who wrote it (if he's still around) or go through the relevant code (which as you suggested isn't a simple task).

Oren A