This is a debate I'm taking part in. I would like to get more opinions and points of view.

We have some classes that are generated at build time to handle DB operations (in this specific case with SubSonic, but I don't think that's important to the question). The generation is set up as a pre-build step in Visual Studio, so every time a developer (or the official build process) runs a build, these classes are generated and then compiled into the project.
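For context, the pre-build step amounts to a command line that Visual Studio runs before compilation. A sketch of what such an event might look like (the tool path and verb are illustrative assumptions, not SubSonic's exact CLI):

```shell
# Visual Studio pre-build event command line
# (Project Properties > Build Events > Pre-build event).
# The generator tool path and its "generate" verb are hypothetical:
#
#   "$(SolutionDir)tools\sonic.exe" generate
#
# Visual Studio expands $(SolutionDir) itself before running the command,
# so the generated classes exist on disk before the compiler is invoked.
```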

Now some people are claiming that keeping these classes in source control could cause confusion, in case the code you get doesn't match what would have been generated in your own environment.

I would like to have a way to trace back the history of the code, even if it is usually treated as a black box.

Any arguments or counter arguments?

UPDATE: I asked this question since I really believed there was one definitive answer. Looking at all the responses, I can say with a high level of certainty that there is no such answer. The decision should be made based on more than one parameter. Reading the answers below provides a very good guide to the types of questions you should be asking yourself when deciding on this issue.

I won't select an accepted answer at this point for the reasons mentioned above.

+23  A: 

Put it in source code control. The advantage of having the history of everything you write available for future developers outweighs the minor pain of occasionally rebuilding after a sync.

-1 because ....
That's not an advantage: since the code that created it is checked in, you already have 'everything you write' available for future developers.
Shane C. Mason
@Shane, I strongly disagree. Having the code that created it does not equal having the code. Any extra steps that must be included for generation are an extra annoyance when tracking down a bug. It's much simpler to go through the history of the code than it is to check out N versions of the file and re-generate N versions of the generated code.
It is sometimes beneficial to have the generated files in source control. For example, if you upgrade a component (in this case SubSonic), you can easily detect changes in the generated source, which can be useful in tracking down bugs and issues. I wouldn't add all generated code to source control, but sometimes it is very useful. Most source control systems will let you do a diff to see whether the files have really changed, although it may be more of a manual process if you have to revert files by hand when the only change is the timestamp.
+1 having a repository for your application is most valuable when the *working application* is in the repository, not just the tools to eventually create the working application. Minimize friction.
Rex M
+1 Anything you need to debug should be in source control. Generated or not...its code.
Jason Punyon
Trying to debug based on intermediate code you didn't generate from original source just makes my head hurt. :(
le dorfier
By that logic you should also check in your compiled object files, libraries and executables.
Laurence Gonsalves
@Laurence and pdbs.
@Laurence, I disagree. DLLs provide no historical value in the sense that you cannot understand the history by examining diffs.
How often do you need to look at the history of the generated files, as opposed to the history of the source files they were generated from? In my experience: virtually never (and I've worked on a code generator for the last 7 years). The extra costs imposed by checking in the generated code aren't huge for each change, but they add up quickly. Every single change that modifies the source of a generated file has an extra source of mistakes ("oops, forgot to check in the generated file") and is more tedious to deal with in code review. Writing a script to regenerate historical versions is trivial.
Laurence Gonsalves
@Laurence, I disagree on 2 of your points. First, I find it to be extremely useful, especially if I'm not familiar with the tool that generated the code. The original language is meaningless; all I care about is "what got compiled". Second, writing a script is not trivial. Your script must take into account all versions of the code generator and know which code generator was used for which changelist. That is certainly not a trivial problem.
What kind of code generator are you using where "the original language is meaningless"? As for the point about keeping track of what versions of your tools you're using to build each version of the code, you already need to solve that problem for your entire toolchain. After all, how do you expect to backport a bug to an older version of your product unless you know what version of the compiler and linker you were using back then? A code generator is no different from your C++/Java/C# compiler. The fact that you might be able to read its output is immaterial: its input is the source.
Laurence Gonsalves
In some cases the "original" no longer exists. Consider a DAL (NHibernate, LINQ, etc.) generated from a database. Unless you also snapshot your database, the input to your generator will never exist in exactly the same state again...
@Laurence Gonsalves: I don't think it's black and white. For example, code generated by Visual Studio designers, and by tools such as MSDataSetGenerator or ResXFileCodeGenerator, is generally checked in to source control. But I can imagine situations where I'd generate files using a preprocessor that aren't in source control.
@Laurence, the code generator I'm talking about is mainly aimed at several old, hacky generators I've seen in source trees. The weird format that the original (and long since departed) author decided upon is meaningless. All I cared about in that case was the code that made it into the product. I don't feel the rest of your comment addresses how writing a script to re-generate the code would be a trivial operation. How, for instance, would you get your script to know that changelist Y was built with version X of the generator? Much easier to just browse the source history.
@Joe: the original question was talking about code generated by the build process (ie: "every time a developer ... runs a build, these classes are generated"). Code generated by your IDE is a very different situation because there is no "source" for the generated code.
Laurence Gonsalves
@JaredPar: you have to deal with the issue of what tools are used in your build for each version of your code regardless of code generators. Checking in your generated code doesn't help for those build artifacts you aren't checking in or in cases when you need to change the source of generated code in an old version (say, to backport a bug fix). It's not only an incomplete solution, but a poor one as it imposes a recurring cost on every change that affects your generated code. Solve the real problem: keep either your build tools or pointers to them in source control.
Laurence Gonsalves
@Laurence, I fail to follow your argument as to why checking in *everything* is an incomplete solution. Yes, you must have generators available, and likely checked in, for every version of the product you build. That is not the issue I am raising. You said it would be trivial to write a code generation script. I am stating that even with all of the generators checked in, writing a script to regenerate all of the versions of the actual compiled code is a non-trivial operation. It involves lots of syncing, building, mapping versions to generators, branch issues, etc.
(cont) As a developer I want to be able to quickly view the history of the code that went into my product over time. Typically, the primary goal is to identify when a particular change made it into the product. The easiest and simplest way to do this is to have all source code checked in, period. Then I can rely on my source control server to do the job it was designed to do: show me the history of my code.
@JaredPar: I don't know what your build environment looks like, but in mine the script to sync to a particular version and then build the project using the right version of the code generators is 2 lines. Roughly: "p4 sync @changelist; build projectname"
Laurence Gonsalves
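A hedged sketch of that two-line approach, expanded to diff a generated file between two historical versions. The `p4` client setup, the `build` command, and the `gen/Dal.cs` file name are all hypothetical:

```shell
#!/bin/sh
# Regenerate a generated file at two historical changelists and diff them.
# Assumes a Perforce client and a "build" command that reruns the generator.

regen_at() {
  change="$1"; out="$2"
  p4 sync "@$change"     # sync the whole client to that changelist
  build MyProject        # rebuild, which reruns the code generator
  cp gen/Dal.cs "$out"   # keep a copy of the regenerated file
}

# Usage:
#   regen_at 1234 /tmp/Dal.cs.1234
#   regen_at 1300 /tmp/Dal.cs.1300
#   diff -u /tmp/Dal.cs.1234 /tmp/Dal.cs.1300
```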
@JaredPar: I think the fundamental problem is that you're using a code generator where "the original language is meaningless". Because you're not fixing that problem you end up having what are really workarounds. It makes no sense to have a code generator where its output is easier to deal with than its input. If you find yourself in that situation, go ahead and check in the generated code, but then also delete the code generator and the source as they aren't adding any value.
Laurence Gonsalves
@Laurence, sure, that will work in most environments if you sync the world. I find that terrifically unacceptable and very costly. You also must generate all versions of the file, and a script to run the diffs. Building one version is easy; building a history is not.
@Laurence, the original language being meaningless is an extreme example of a more general problem. The more general problem is that you simply cannot expect every developer to be able to look at any code generator input file and know exactly what the output will be. There are quite simply too many generators, and too many differences between versions of the same generator, to make this assumption. At the end of the day, what matters is what source was actually used to build the product, and hence it should be checked in.
You're contradicting yourself. You already claimed that you can identify which input files led to changes in your generated files. After all, that's how you know that you need to resubmit them to source control. So you can use the same logic to determine which files to sync to.
Laurence Gonsalves
As for "the original language being meaningless is an extreme example": I agree, but it's also the only case where your argument holds water. The whole point of having a code generator is that its input is more concise than its output. Every code generator I've worked with generates code that is *far* more complicated than the input to the code generator. I cannot remember a single instance of someone wanting to look at the history of the generated code.
Laurence Gonsalves
@Laurence, I disagree that I am contradicting myself. The only way to *definitively* produce the exact same file is to sync the file and all of its dependencies, including the generator dependencies (transitive and direct). In a project of medium or greater size, the safest way to do this is to sync everything.
@Laurence, Yes generator code can be complex. But that is not the primary issue. What is valuable about checking the code into source code control is being able to view the history of the code. This is oftentimes much more valuable than having the actual code.
@Laurence, There are many cases where history is important. I've already cited identifying a regression. Another is Beta/RC bug fixes. These often have to be reviewed by many devs (and often non-devs) outside your organization who will not be familiar with every generator you use. They simply don't care what I changed in an XML file, because the change is meaningless to them. They want to see how my changes affect the shipping code. Having a version history of the file provides this function.
The contradiction is that you claim that it's too hard to figure out which files to sync yet at the same time you think it's easy to figure out when a generated file needs to be resubmitted. These amount to the same problem. The question is, do you want to have to figure it out on every single change (pretty much guaranteeing screw-ups) or do you want to do it the handful of times that you need to look at the historical changes to a generated file (if ever -- where I work we use a bunch of code generators which generate hundreds of files and not once have I heard of anyone needing to do this).
Laurence Gonsalves
I also have my doubts about non-devs preferring to see your generated code over an XML file. :-)
Laurence Gonsalves
@Laurence, it's simple to know when a generated file needs to be submitted. If I change a generator source, build, and the file is different, I should submit. What is difficult about that? If the file is unchanged, most source control providers will simply ignore it during check-in.
@Laurence, I understand the doubts about non-devs wanting to look at code, but I assure you this is indeed true. Half of preparing an RC/Beta bug fix is figuring out how to explain a code diff to people who are not everyday devs. The first time I had to do this was an ... interesting experience.
@Laurence @JaredPar: I have never needed to review revision history for a generated file in 10 years of coding. I agree with Laurence's point that the input is easier to deal with than the output. The generated code should be a black box; if you use it, you should understand its input. In my experience, the real logic is in the input, or in the code generator. Therefore, the bug will have to be fixed either in the code generator or in the input file.
Juan Mendes
@Juan, just because you haven't had the experience doesn't mean it's not common. I've had to review generated code files several times (which doesn't make my experience the norm either).
@JaredPar, I mentioned my experience, but the real point of my comment is that the real logic that may need to be fixed is in the input or the code generator, not the output. Therefore, it's a lot more likely that inspecting that history will give you more insight into the bug.
Juan Mendes
@Juan, in crunch time late in the cycle, changing a generator that would affect hundreds of files in a project is simply not a good suggestion when you could just change one of them.
@JaredPar Sorry, but hacking a generated file by hand does not sound like a better alternative. You do that and now you have some files that are generated and some that are hand maintained.
Juan Mendes
@Juan hacking a single generated file on a release only branch has no long term maintenance issues.

I would argue for. If you're using a continuous integration process that checks out the code, modifies the build number, builds the software and then tests it, then it's simpler and easier to just have that code as part of your repository.

Additionally, it's part and parcel of every "snapshot" that you take of your software repository. If it's part of the software, then it should be part of the repository.

I love the drive-by -1's. If you don't agree, don't vote it up - vote up the other answers. Save the downvotes for a wrong answer. This is a subjective question.
+2  A: 

I agree with the other people: it can cause you a lot of problems. :-)

Agreed - since that source code is really output from a generation process, there's really no point in storing it in source control.
@mark_s: What if you need to roll back to that version? One could argue that you cannot successfully roll back unless you have the source as it was generated (i.e., regenerating may not produce the exact same result). So I would prefer to have the source as it was generated, for this reason.
That's true, but by the same reasoning, executables should also be saved (which is necessary in some cases).
Jason S
JD: if you need to roll back to a version, you ought to have the original values you generated your source from somewhere, so you should be able to re-generate your code without any problems. You need to store the *SOURCE* (whether it be XML or whatever) inside your source control - *NOT* what gets generated from it!
+24  A: 

Saving it in source control is more trouble than it's worth.

You have to do a commit every time you do a build for it to be of any value.

Generally we leave generated code (IDL, JAXB stuff, etc.) outside source control where I work, and it's never been a problem.

I disagree with "you have to do a commit every time you build". This should cause no extra commit, because the only thing that should affect the commit is a change to the code, which in turn changes the generated source. So in effect you have to commit the generated code only when you're already committing the change to its source.
Agree with JaredPar. Also, your code generator may be an external tool, and if you update it, the generated code may change, and therefore you may need to commit changes. But in this case I would really want to see those changes in source control anyway.
+17  A: 

Every time I want to show changes to a source tree in my own personal repo, all the 'generated files' will show up as having changed and needing committing.

I would prefer to have a cleaner list of modifications that only include real updates that were performed, and not auto-generated changes.

Leave them out, and then after a build, add an 'ignore' on each of the generated files.
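As a sketch of that, with Git and Subversion (the directory and the file pattern here are hypothetical):

```shell
# Demo in a scratch directory; in practice run this at the repository root.
cd "$(mktemp -d)"

# Git: ignore the generated files via .gitignore
echo "src/DataAccess/*.generated.cs" >> .gitignore

# Subversion equivalent (sets the ignore property on the directory):
#   svn propset svn:ignore "*.generated.cs" src/DataAccess
```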

Also, on updates, you can get strange conflicts that the VCS will consider as needing resolution, but will actually resolve themselves the next time you build. Not to mention the clutter in the logs, which I consider even worse than the clutter in your local tree.
Where I'm at, they don't show up as 'having changed' unless they really have changed. If they were regenerated but still have the same content so the only thing different is file create/modified dates, the system thinks they haven't changed and everything is fine.
Joel Coehoorn
+1 I only want to be responsible for what code I write, not some code that got generated by some toolkit that may have had issues at the time that now are impossible to duplicate (but someone could spend a lot of time trying.)
le dorfier
I've seen autogenerating tools that update the timestamp every time they run. I curse them.
+13  A: 

I would say that you should avoid adding any generated code (or other artifacts) to source control. If the generated code is the same for the given input then you could just check out the versions you want to diff and generate the code for comparison.

+1  A: 

My preference is not to, but we do it anyway, and it's never caused a problem.

Mike Dunlavey
+12  A: 

I really don't think you should check them in.

Surely any change in the generated code is either going to be noise - changes between environments, or changes as a result of something else - e.g. a change in your DB. If your DB's creation scripts (or any other dependencies) are in source control then why do you need the generated scripts as well?

+5  A: 

The general rule is no, but if it takes time to generate the code (because of DB access, web services, etc.), then you might want to save a cached version in source control and save everyone the pain.

Your tooling also needs to be aware of this and handle checking out from source control when needed; too many tools decide to check out from source control for no good reason.
A good tool will use the cached version without touching it (or modifying the timestamps on the file).

Also, you need to put a big warning inside the generated code telling people not to modify the file. A warning at the top is not enough; you have to repeat it every dozen lines.

Shay Erlichmen
+10  A: 

I call the DRY principle. If you already have the "source files" in the repository which are used to generate these code files at build time, there is no need to have the same code committed "twice".

Also, you might avert some problems this way if for example the code generation breaks someday.


I would say that yes, you want to put it under source control. From a configuration management standpoint, EVERYTHING that is used to produce a software build needs to be controlled so that it can be recreated. I understand that generated code can easily be recreated, but an argument can be made that it is not the same, since the date/timestamps will differ between the two builds. In some areas, such as government work, this is often what's required.

Do you check in your object files (.o)?
+3  A: 

We don't store generated DB code either: since it is generated, you can get it at will at any given version from the source files. Storing it would be like storing bytecode or such.

Now, you need to ensure the code generator used at a given version is available! Newer versions can generate different code...

+8  A: 

Look at it this way: do you check your object files into source control? Generated source files are build artifacts just like object files, libraries and executables. They should be treated the same. Most would argue that you shouldn't be checking generated object files and executables into source control. The same arguments apply to generated source.

If you need to look at the historical version of a generated file you can sync to the historical version of its sources and rebuild.

Checking generated files of any sort into source control is analogous to database denormalization. There are occasionally reasons to do this (typically for performance), but this should be done only with great care as it becomes much harder to maintain correctness and consistency once the data is denormalized.

Laurence Gonsalves
+1  A: 

In some projects I add generated code to source control, but it really depends. My basic guideline is: if the generated code is an intrinsic part of the compiler, then I won't add it. If the generated code is from an external tool, such as SubSonic in this case, then I would add it to source control. If you periodically upgrade the component, then I want to know the changes in the generated source in case bugs or issues arise.

If generated code does need to be checked in, a worst-case scenario is manually diffing the files and reverting them if necessary. If you are using svn, you can add a pre-commit hook to deny a commit if the file hasn't really changed.
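A minimal sketch of such a pre-commit check, assuming the generated files carry a timestamp header. The `*.generated.cs` pattern and the "Generated on:" header text are hypothetical:

```shell
#!/bin/sh
# Sketch of an svn pre-commit check: reject a commit to generated files
# when the only changed lines are the generator's timestamp header.

# Count the "real" changed lines in a unified diff read from stdin:
# keep added/removed lines (but not the +++/--- file markers),
# then drop the timestamp header and print how many lines remain.
real_changes() {
  grep -E '^[+-][^+-]' | grep -cv 'Generated on:' || true
}

# In the actual hook ($1 = REPOS, $2 = TXN) you would pipe svnlook in:
#   if [ "$(svnlook diff -t "$2" "$1" | real_changes)" -eq 0 ]; then
#     echo "Only generator timestamps changed; revert before committing." >&2
#     exit 1
#   fi
```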

+1  A: 

In general, generated code need not be stored in source control because the revision history of this code can be traced by the revision history of the code that generated it!

However, it sounds like the OP is using the generated code as the data access layer of the application instead of manually writing one. In this case, I would change the build process and commit the code to source control, because it is a critical component of the runtime code. This also removes the build process's dependency on the code generation tool, in case the developers need to use different versions of the tool for different branches.

It seems that the code only needs to be generated once instead of on every build. When a developer needs to add/remove/change the way an object accesses the database, the code should be generated again, just like making manual modifications. This speeds up the build process, allows manual optimizations to be made to the data access layer, and retains the history of the data access layer in a simple manner.

I disagree. If you make it a manual process, it *will* get broken, and no one will notice until it comes time to rerun it. If it's generated every day on your build servers (and on every developer's machine when they do a 'clean' build), you won't get surprised.
If the data access layer code is checked into source control, there should be no surprises, because people will be forced to update the code. If someone happens to change the version of the code generation tool on the build machine while developers have old versions on their development machines (a different branch of code, perhaps), then there will be headaches. I'm suggesting that he remove the code generation step from the build process, since they are not the maintainers of the code generator.

I would leave generated files out of a source tree, but put it in a separate build tree.

e.g. workflow is

  1. checkin/out/modify/merge source normally (w/o any generated files)
  2. At appropriate occasions, check out source tree into a clean build tree
  3. After a build, check in all "important" files ("real" source files, executables + generated source files) that must be present for auditing/regulatory purposes. This gives you a history of all relevant generated code, executables, and so on, at time increments related to releases/testing snapshots, and decoupled from day-to-day development.
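The workflow above can be sketched roughly as follows. The repository URL, the build command, and the directory names are all hypothetical; substitute your own:

```shell
# Hedged sketch of the clean-build-tree snapshot workflow.

snapshot_build() {
  url="$1"; tree="$2"; msg="$3"
  # Step 2: check the source out into a clean build tree
  svn checkout "$url" "$tree" || return 1
  cd "$tree" || return 1
  # Build; this also reruns the code generators
  msbuild MyProject.sln || return 1
  # Step 3: check in the "important" outputs for auditing purposes
  svn add --force Generated/ bin/ || return 1
  svn commit -m "$msg"
}

# e.g. snapshot_build http://svn.example.com/trunk /tmp/build-tree "RC1 snapshot"
```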

There are probably good ways in Subversion/Mercurial/Git/etc. to tie the history of the real source files in both places together.

Jason S
+1  A: 

It really depends. Ultimately, the goal is to be able to reproduce what you had if need be. If you are able to regenerate your binaries exactly, there is no need to store them. But you need to remember that in order to recreate your stuff, you will probably need the exact configuration you built it with in the first place, and that means not only your source code, but also your build environment, your IDE, maybe even other libraries, generators, or other tools, in the exact configuration (versions) you used.

I have run into trouble in projects where we upgraded our build environment to newer versions, or even to another vendor's, and we were unable to recreate the exact binaries we had before. This is a real pain when the binaries to be deployed depend on a kind of hash, especially in a secured environment, and the recreated files somehow differ because of compiler upgrades or whatever.

So, would I store generated code? I would say no. What I would store are the released binaries or deliverables, along with the tools you produced them with. And there is no need to store those in source control; just make a good backup of the files.

"...that not only means your source code, but also your build environment, your IDE, maybe even other libraries, generators or stuff." That's all stuff I would check in. If you build your compiler from source on every developer machine as part of the same build as your apps (i.e., you type 'make' once), check in the source. If you don't, then check in the binaries.
Norman Ramsey

If it is part of the source code then it should be put in source control regardless of who or what generates it. You want your source control to reflect the current state of your system without having to regenerate it.

"without having to regenerate it." so you check in compiled binaries? Do you also check in a version of the target platform as well? That strategy won't scale well. :(
And that gets me a down vote?? Of course you don't check in compiled binaries (unless they are from third party libraries) since they can be regenerated from your source code. I was talking about having to regenerate the generated code not the binaries. But hey, if you want to misinterpret what I'm saying then go right ahead...
This answer wasn't worth a downvote! At the very least, it seems sound to put generated code in SC (maybe in a clearly identified place) so that at the very least you can compare the hash of the code used to generate the object against the new code you're going to generate for a new build. Interesting how polarizing this question is.
+5  A: 

No, for three reasons.

  1. Source code is everything necessary and sufficient to reproduce a snapshot of your application as of some current or previous point in time - nothing more and nothing less. Part of what this implies is that someone is responsible for everything checked in. Generally I'm happy to be responsible for the code I write, but not the code that's generated as a consequence of what I write.

  2. I don't want someone to be tempted to try to shortcut a build from primary sources by using intermediate code that may or may not be current (and, more importantly, that I don't want to accept responsibility for). And it's too tempting for some people to get caught up in a meaningless process of debugging conflicts in intermediate code based on partial builds.

  3. Once it's in source control, I accept responsibility for a. it being there, b. it being current, and c. it being reliably integratable with everything else in there. That includes removing it when I'm no longer using it. The less of that responsibility the better.

le dorfier

Absolutely have the generated code in source control, for many reasons. I'm reiterating what a lot of people have already said, but some reasons I'd do it are

  1. With codefiles in source control, you'll potentially be able to compile the code without using your Visual Studio pre-build step.
  2. When you're doing a full comparison between two versions, it would be nice to know if the generated code changed between those two tags, without having to manually check it.
  3. If the code generator itself changes, then you'll want to make sure that the generated code changes appropriately. I.e., if your generator changes but the output isn't supposed to change, then when you go to commit your code there should be no differences between what was previously generated and what's generated now.
Joe Enos
And your code generator itself isn't in source control because...?
Jeffrey Hantin
@Jeffrey: I never said the code generator wasn't in source control.
Joe Enos
I know, I'm just teasing. :-) I've found that a lot of CodeDom-based code generators like to produce their output in random order, though, so for repeatability (and thus the ability to readily tell if the generated code changes from run to run) I've written a routine that sorts the contents of a `CodeCompileUnit` into a canonical order.
Jeffrey Hantin

There is a special case where you want to check in your generated files: when you may need to build on systems where tools used to generate the other files aren't available. The classic example of this, and one I work with, is Lex and Yacc code. Because we develop a runtime system that has to build and run on a huge variety of platforms and architectures, we can only rely on target systems to have C and C++ compilers, not the tools necessary to generate the lexing/parsing code for our interface definition translator. Thus, when we change our grammars, we check in the generated code to parse it.

+1  A: 

arriving a bit late ... anyway ...

Would you put a compiler's intermediate files into version control? In the case of code generation, by definition the source code is the input of the generator, while the generated code can be considered intermediate files between the "real" source and the built application.

So I would say: don't put generated code under version control, but the generator and its input.

Concretely, I work with a code generator I wrote, and I have never had to keep the generated source code under version control. I would even say that since the generator reached a certain maturity level, I haven't had to inspect the contents of the generated code, even as the input (for instance, the model description) changed.


Looks like there are very strong and convincing opinions on both sides. I would recommend reading all the top voted answers, and then deciding what arguments apply to your specific case.


Ron Harlev
+1  A: 

Leave it out.

If you're checking in generated files, you're doing something wrong. What's wrong may differ; it could be that your build process is inefficient, or something else, but I can't see it ever being a good idea. History should be associated with the source files, not the generated ones.

It just creates a headache for people who then end up trying to resolve differences, find the files that are no longer generated by the build and then delete them, etc.

A world of pain awaits those who check in generated files!


The job of configuration management (of which version control is just one part) is to be able to do the following:

  • Know which changes and bug fixes have gone into every delivered build.
  • Be able to reproduce exactly any delivered build, starting from the original source code. Automatically generated code does not count as "source code" regardless of the language.

The first one ensures that when you tell the client or end user "the bug you reported last week is fixed and the new feature has been added" they don't come back two hours later and say "no it hasn't". It also makes sure they don't say "Why is it doing X? We never asked for X".

The second one means that when the client or end user reports a bug in some version you issued a year ago, you can go back to that version, reproduce the bug, fix it, and prove that it was your fix that eliminated the bug, rather than some perturbation of compiler and other fixes.

This means that your compiler, libraries etc also need to be part of CM.

So now to answer your question: if you can do all the above then you don't need to record any intermediate representations, because you are guaranteed to get the same answer anyway. If you can't do all the above then all bets are off because you can never guarantee to do the same thing twice and get the same answer. So you might as well put all your .o files under version control as well.

Paul Johnson

The correct answer is "it depends". It depends upon what the client's needs are. Unless you can roll back code to a particular release and stand up to any external audits, you're not on firm ground. As devs, we need to consider not just 'noise', pain, and disk space, but the fact that we are tasked with generating intellectual property, and there may be legal ramifications. Would you be able to prove to a judge that you can regenerate a web site exactly the way a customer saw it two years ago?

I'm not suggesting you save or don't save generated files; whichever way you decide, if you're not involving the subject matter experts in the decision, you're probably wrong.

My two cents.

James Fleming