views:

774

answers:

15

I came across code generators recently and couldn't understand why they are needed.

Can you explain why I should use them in my projects and how they can make my life easier?

Examples would be great, as would pointers to where I can learn more about this topic.

+2  A: 

How 'bout an example of a good use of a code generator?

This uses T4 templates (a code generator built into Visual Studio) to generate compressed CSS from .less files: http://haacked.com/archive/2009/12/02/t4-template-for-less-css.aspx

Basically, it lets you define variables, real inheritance, and even behavior in your style sheets, and then create normal CSS from that at compile time.
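
To make the "variables in, normal CSS out" idea concrete, here is a minimal sketch in Python. It is not how the T4 template or the real LESS compiler works; it only handles `@name: value;` declarations on their own line, and the sample stylesheet and the function name `compile_css` are made up for illustration.

```python
import re

# Hypothetical input: a stylesheet with LESS-style variable declarations.
SOURCE = """\
@brand: #336699;
@pad: 4px;
a { color: @brand; padding: @pad; }
"""

DECLARATION = re.compile(r"\s*@([\w-]+)\s*:\s*(.+?)\s*;\s*")
REFERENCE = re.compile(r"@([\w-]+)")


def compile_css(source):
    """Drop variable declarations and substitute their values into the other lines."""
    variables = {}
    output = []
    for line in source.splitlines():
        declaration = DECLARATION.fullmatch(line)
        if declaration:
            variables[declaration.group(1)] = declaration.group(2)
        else:
            # Unknown names (e.g. @media) are left untouched.
            output.append(REFERENCE.sub(
                lambda ref: variables.get(ref.group(1), ref.group(0)), line))
    return "\n".join(output)


if __name__ == "__main__":
    print(compile_css(SOURCE))
```

The point is only that the stylesheet you maintain is the short one with variables; the plain CSS is a build artifact.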

Joel Coehoorn
+13  A: 

Well, it's either:

  • you write 250 classes, all pretty much the same, but slightly different, e.g. to do data access; takes you a week, and it's boring and error-prone and annoying

OR:

  • you invest 30 minutes into generating a code template, and let a generation engine handle the grunt work in another 30 minutes

So a code generator gives you:

  • speed
  • reproducibility
  • far fewer errors
  • a lot more free time! :-)

Excellent examples:

  • LINQ to SQL T4 templates by Damien Guard to generate one separate file per class in your database model, using the best-kept Visual Studio 2008 secret - T4 templates

  • PLINQO - same thing, but for CodeSmith's generator

and countless more.....
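
As a rough illustration of "one template plus a generation engine", here is a minimal Python sketch using the standard library's `string.Template`. The schema, the template text and the `...Repository` naming are assumptions made up for this example; T4 and CodeSmith work on the same principle but with their own template syntax.

```python
from string import Template

# Hypothetical schema: table name -> column names (a stand-in for a real database model).
SCHEMA = {
    "customer": ["id", "name", "email"],
    "invoice": ["id", "customer_id", "total"],
}

CLASS_TEMPLATE = Template('''\
class ${class_name}Repository:
    """Generated data access for table '${table}'. Do not edit by hand."""

    TABLE = "${table}"
    COLUMNS = (${column_tuple})

    def select_sql(self):
        return "SELECT ${column_list} FROM ${table}"
''')


def generate_repository(table, columns):
    """Render the source code of one repository class from the template."""
    return CLASS_TEMPLATE.substitute(
        class_name=table.title().replace("_", ""),
        table=table,
        column_tuple=", ".join(repr(c) for c in columns) + ",",
        column_list=", ".join(columns),
    )


def generate_module(schema):
    """Concatenate all generated classes into the source text of one module."""
    return "\n\n".join(
        generate_repository(table, columns) for table, columns in sorted(schema.items())
    )


if __name__ == "__main__":
    print(generate_module(SCHEMA))
```

With 250 tables instead of two, the template stays the same size; only the schema grows.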

marc_s
"you write 250 classes, all pretty much the same, but slightly different" I can't help but think that if I ever find myself in this position, I'd be "doing it wrong". Somewhere along the way I should have stepped back and tried to figure out why I'm repeating myself so much and built things better (make a parent class, or something).
MGOwen
@MGOwen: but inheritance doesn't always help. It's not always possible to factor out common code in that way. Consider the case of a data layer where the difference between the classes representing two entities will include the set of columns for each entity. You can't factor that out using inheritance.
John Saunders
@MGOwen: what if you have a database of 250 tables, and you need to write some sort of a generic data access repository? All the tables are basically the same - columns and rows - but the columns vary from table to table. Not sure if inheritance could help much in such a case - but you still need to write a mapping layer between 250 database tables and some object model in memory...
marc_s
I don't understand why there is a direct mapping between your data tables and your object model. You should be using stored procedures/routines to further abstract your data... meaning multiple tables will use one procedure etc... Also, you should be trying to make your tables as generic as possible... This way one table/stored procedure can serve multiple object representations. Using all of these combined, you should be able to cut down on the number of tables/classes you will need.
Polaris878
Polaris878: It's been years since I've had the luxury of defining my own database. For the most part I have to interoperate with other vendors' systems where I have no ability to change their schema or make stored procedures. Why wouldn't I use codegen?
Gabe
@gabe: I think he's saying your object model should be an abstraction of the database, and not map to it directly.
John Saunders
John: I don't understand why this abstraction shouldn't be generated from, say, an XML file that describes how the schema should map to objects.
Gabe
Your answer ignores the maintenance burdens that code generation adds to any solution, but like all solutions there is a time and place where code generation is appropriate - I just don't think there are as many scenarios as some people seem to think there are.
Neal
@Neal: what about the maintenance burden if you have to maintain 250 manually created files? Doesn't sound like fun to me either.....
marc_s
The burden is created because you need to maintain the system in multiple places, and unless your templating tool is very, very well written and your processes for using it are well understood, you add more friction than you take away. It is rare that, once created, you would need to touch all 250 files to fix a problem or add a feature. But with code generation you cannot take a code-first approach: you must first alter the source of your model (e.g. the database) and then regenerate all 250 classes. This adds friction and over time leads to workarounds and hacks that corrupt the original intent.
Neal
@Neal: the database is not the source of the model. The model is likely to be at a higher level of abstraction than the database. Part of the model will describe the mapping between the conceptual level and the logical level. The code generation tools will likely ignore the logical level and just generate code based on the conceptual model. I'm free to optimize the database as I like, while maintaining the conceptual model the developers and code generators use.
John Saunders
+2  A: 

Using GUI builders that generate code for you is a common practice. Thanks to this you don't need to create all the widgets manually: you just drag and drop them and then use the generated code. For simple widgets this really saves time (I have used this a lot with wxWidgets).

gruszczy
+3  A: 

Anytime you need to produce large amounts of repetitive boilerplate code, a code generator is the guy for the job. The last time I used a code generator was when creating a custom Data Access Layer for a project, where the skeleton for various CRUD actions was created based on an object model. Instead of hand-coding all those classes, I put together a template-driven code generator (using StringTemplate) to make it for me. The advantages of this procedure were:

  • It was faster (there was a large amount of code to generate)
  • I could regenerate the code on a whim in case I detected a bug (code can sometimes have bugs in the early versions)
  • Less error prone; when we had an error in the generated code it was everywhere which means that it was more likely to be found (and, as noted in the previous point, it was easy to fix it and regenerate the code).
Fredrik Mörk
A: 

If by code generator you also mean snippets, compare typing ctor + TAB with writing the constructor by hand in each of your classes. Or check how much time you save by using the snippet that creates a switch statement for an enum with many values.
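
To show what such a snippet does under the hood, here is a small Python sketch that produces a branch-per-member skeleton for an enum. The `Color` enum and the function `dispatch_skeleton` are hypothetical; an IDE snippet does the same expansion interactively and leaves the cursor on the first placeholder.

```python
from enum import Enum


class Color(Enum):  # hypothetical enum, purely for illustration
    RED = 1
    GREEN = 2
    BLUE = 3


def dispatch_skeleton(enum_cls, variable="value"):
    """Return the source of an if/elif chain with one empty branch per enum member."""
    lines = []
    for index, member in enumerate(enum_cls):
        keyword = "if" if index == 0 else "elif"
        lines.append(f"{keyword} {variable} is {enum_cls.__name__}.{member.name}:")
        lines.append("    pass  # TODO: handle " + member.name)
    lines.append("else:")
    lines.append(f"    raise ValueError({variable})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dispatch_skeleton(Color))
```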

Maurizio Reginelli
+12  A: 

At least you have framed the question from the correct perspective =)

The usual reasons for using a code generator are given as productivity and consistency because they assume that the solution to a consistent and repetitive problem is to throw more code at it. I would argue that any time you are considering code generation, look at why you are generating code and see if you can solve the problem through other means.

A classic example of this is data access; you could generate 250 classes ( 1 for each table in the schema ) effectively creating a table gateway solution, or you could build something more like a domain model and use NHibernate / ActiveRecord / LightSpeed / [pick your orm] to map a rich domain model onto the database.

While both the hand-rolled solution and the ORM are effectively code generators, the primary difference is when the code is generated. With the ORM it is an implicit step that happens at run-time and is therefore one-way by its nature. The hand-rolled solution requires an explicit step to generate the code during development, and the generated classes are likely to need customising at some point, which creates problems when you re-generate the code. That explicit development-time step introduces friction into the development process and often leads to code that violates DRY (although some argue that generated code can never violate DRY).

Another reason for touting code generation comes from the MDA / MDE world (Model Driven Architecture / Engineering). I don't put much stock in this, but rather than providing a number of poorly expressed arguments, I'm simply going to co-opt someone else's - http://www.infoq.com/articles/8-reasons-why-MDE-fails.

IMHO code generation is the only solution in an exceedingly narrow set of problems and whenever you are considering it, you should probably take a second look at the real problem you are trying to solve and see if there is a better solution.

One type of code generation that really does enhance productivity is "micro code-generation", where macros and templates allow a developer to generate new code directly in the IDE and tab / type their way through placeholders (e.g. namespace, class name). This sort of code generation is a feature of ReSharper and I use it heavily every day. The reason that micro-generation benefits where most large-scale code generation fails is that the generated code is not tied back to any other resource that must be kept in sync; once the code is generated, it is just like all the other code in the solution.

@John
Moving the creation of "basic classes" from the IDE into xml / dsl is often seen when doing big bang development - a classic example would be developers trying to reverse engineer the database into a domain model. Unless the code generator is very well written, it simply introduces an additional burden on the developer: every time they need to update the domain model, they either have to context-switch and update the xml / dsl, or they have to extend the domain model and then port those changes back to the xml / dsl (effectively doing the work twice).

There are some code generators that work very well in this space ( the LightSpeed designer is the only one I can think of atm ) by acting as the engine for a design surface but often these code generators generate terrible code that cannot be maintained (eg winforms / webforms design surfaces, EF1 design surface) and therefore rapidly undo any productivity benefits gained from using the code generator in the first place.

Neal
Does your domain model have no repetitive code? Really? You don't use the same code for every string property?
John Saunders
@Neal: How does NHibernate, etc. remove the need for a code generation? Aren't you just defining classes in that case and generating the scripts to create the database tables? Or are you manually coding somewhere less than 250 classes and 250 tables which share a lot of characteristics in common (i.e. violating DRY)?
Michael Maddox
@John - not sure what you are getting at here, your argument lacks substance. @Michael - I model only what I need and if I need a large domain I would model that but storage concerns should not leak into domain concerns which is what happens when you codegen from a db. NH + FNH (or ConfORM) allows you to express the model and mappings in code and is a very powerful approach esp when coupled with convention over configuration. NH does allow generation of the DDL to create the database and this is one of the few "appropriate" cases for code (SQL) generation.
Neal
@Neal: I didn't make an argument, I asked a question that you didn't answer. I also didn't say anything about codegen from a DB. Code gen could come from an XML or other representation of your domain model.
John Saunders
@John Your question makes no sense and your argument is fallacious - just because the basic structure of a class is the same as all others is not a good reason to generate code. When you generate code you merely shift the burden of maintenance into a less flexible / more complicated medium - the time saved is rarely repaid over the life of the system.
Neal
@Neal: I have no idea what bad code generation tools you've used, but there is no reason for them to be inflexible. Duplicate code structure is a perfect reason to generate code, assuming the differences between the classes can be factored out into declarative data.
John Saunders
Code generation is more than just for data access. The ASP.NET engine for processing .aspx/.ascx files is a code generator. Are you saying you wouldn't use those and just `Response.Write("<b>blah</b>")` all your pages?
Dean Harding
@codeka Templating engines for generating presentation code are great places to use code generators, but my answer is focussed on generating business code rather than presentation code. As an aside - I don't use Response.Write, if I need something in a webform I use placeholders and literals.
Neal
@John It's hard to argue with you as I have yet to see a clearly stated rationale as to why you believe using code generation is better than not using it so I have no point of purchase against which to prepare a counter-argument.
Neal
@Neal: I have made no such blanket statement. I've been stating that code generation can be valuable for creating the basic classes from your domain model (in XML or some DSL). You can then extend the domain classes with domain logic, but creating the basic structure of the classes is a mechanical process, and could be performed mechanically.
John Saunders
@John See the additional points I added in the edit of my original answer
Neal
@Neal: I guess I don't know why you don't have tools that can generate code directly from your domain model. Even if it's necessary to translate the domain model into XML or some other format, the entire process should be fairly automatic.
John Saunders
@Neal: perhaps we're talking about very different things and using the same terms. What tools are you using for your domain models? When I think of domain models and code generation, I'm thinking about either models built on the DSL Toolkit (now the Visualization and Modeling SDK), part of the Visual Studio SDK; otherwise, built in Sparx Enterprise Architect. The VS tool can generate code or other text-based artifacts through T4 templates; EA follows MDA and uses "transforms". They both have codegen very close to the model.
John Saunders
@John My domain is modeled in code and specified using tests. Your tools show you follow the "MSDN way" whereas I am firmly in the alt.net camp (if you don't understand the references look up Karl Seguin's Foundations of Programming). I seriously doubt I can win an argument with you because your mindset is firmly stuck in MDA-land whereas I look at MDA with pity and wish people would recognise that building a system requires more than just modelling a bunch of entities.
Neal
@Neal: Glad we got to the bottom of that. I wouldn't say you were using modeling at all. I've never heard of a model that is code - do you have a reference that uses the word "model" in this way? As I've understood the term, a model and code are meant to have different levels of abstraction.
John Saunders
@Neal: *"ZOMG asp.net/nhibernate guru with 983 rep on SO, he knowz it allz and has made answers to every comment, including 3l33t speak once he got pwned"* (code != model).
Webinator
My comment was poor and has been deleted (commenting while grumpy is bad karma). It is hard to write a counter-argument when the core belief is not clearly stated. MDA was not referred to until the 13th comment. If MDA had been raised earlier, I would have better understood the POV and therefore responded with comments that were more in line with that POV. I gave up because winning an argument requires that both sides have strong opinions weakly held. I never claim to be a guru; I merely post what I understand and accept corrections to my understanding when it is shown to be flawed.
Neal
@Neal: what do you think NHibernate / ActiveRecord / LightSpeed / [pick your orm] do? They generate code for you. It is not important that they emit code into dynamic assemblies, the question is — is it cheaper to shoehorn your problem into an existing ORM or to write your own generators?
Anton Tykhyy
Thank you for distilling the essence of the argument to its simplest form. Code generation is essentially the transformation of one abstraction to another; the differences lie in how far apart the levels of abstraction of the source and target are, and in how explicit the code generation step is. With ORMs the levels are very close and the code generation is implicit (and explicitly one way - code to sql). This is important because of *when* the sql is generated. Explicit one way code generation adds a burden to any system in that the system cannot be changed without regenerating it.
Neal
@Neal: I don't usually say this sort of thing, but you really need to get some experience of .NET. Most code generators in the .NET space generate the code into partial classes. These can be extended by adding another part of the class, in a separate file that is not touched by subsequent code generation. Also, not all ORMs produce a direct mapping to the database. In some (.NET's Entity Framework for example), one models at a conceptual level then maps to the database. Code is only generated for the conceptual level, so db changes don't affect the consumers of the generated code.
John Saunders
John, I am well aware of partial classes and I have used EF; both leave a sour taste in my mouth (although EF4 sounds like it may be a step in the right direction). I am a developer, which means I convert the client's wishes into specifications and use those specs to write code that does exactly what is required - no more and no less. I see MDA as an ivory tower approach for non-coding architects.
Neal
+1  A: 

If you're paid by LOC and work for people who don't understand what code generation is, it makes a lot of sense. This is not a joke, by the way - I have worked with more than one programmer who employs this technique for exactly this purpose. Nobody gets paid by LOC formally any more (that I know of, anyway), but programmers are generally expected to be productive, and churning out large volumes of code can make someone look productive.

As an only slightly tangential point, I think this also explains the tendency of some coders to break a single logical unit of code into as many different classes as possible (ever inherit a project with LastName, FirstName and MiddleInitial classes?).

MusiGenesis
+3  A: 

Really, when you are using almost any programming language, you are using a "code generator" (except for assembly or machine code.) I often write little 200-line scripts that crank out a few thousand lines of C. There is also software you can get which helps generate certain types of code (yacc and lex, for example, are used to generate parsers to create programming languages.)

The key here is to think of your code generator's input as the actual source code, and think of the stuff it spits out as just part of the build process. In which case, you are writing in a higher-level language with fewer actual lines of code to deal with.

For example, here is a very long and tedious file I (didn't) write as part of my work modifying the Quake2-based game engine CRX. It takes the integer values of all #defined constants from two of the headers, and makes them into "cvars" (variables for the in-game console.)
http://meliaserlow.dyndns.tv:8000/alienarena/lua_source/game/cvar_constants.c

Here is the short Bash script which generated that code at compile-time:
http://meliaserlow.dyndns.tv:8000/alienarena/lua_source/autogen/constant_cvars.sh

Now, which would you rather maintain? They are both equivalent in terms of what they describe, but one is vastly longer and more annoying to deal with.
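
Since the linked script may not be reachable, here is a minimal sketch of the same idea in Python rather than Bash: read the integer `#define` constants from a header and emit the repetitive C that registers them. The header text and the C function `register_constant` are invented for this example; the real engine's API is not shown in this answer.

```python
import re

# Hypothetical header text; the real input would be the engine's own header files.
HEADER = """\
#define MAX_CLIENTS 256
#define MAX_EDICTS 1024
#define GAME_NAME "example"
#define DEBUG_FLAG 0x10
"""

# Matches "#define NAME <integer>" where the integer is decimal or 0x-prefixed hex.
DEFINE_RE = re.compile(
    r"^#define[ \t]+([A-Za-z_]\w*)[ \t]+(0[xX][0-9a-fA-F]+|\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_int(text):
    """Convert a decimal or 0x-prefixed hexadecimal literal to an int."""
    return int(text, 16) if text.lower().startswith("0x") else int(text, 10)


def integer_defines(header_text):
    """Return (name, value) pairs for every integer-valued #define."""
    return [(name, parse_int(value)) for name, value in DEFINE_RE.findall(header_text)]


def generate_c(header_text):
    """Emit a C function with one registration call per constant."""
    lines = [
        "/* GENERATED FILE - edit the generator, not this file. */",
        "void register_constants(void)",
        "{",
    ]
    for name, value in integer_defines(header_text):
        lines.append(f'    register_constant("{name}", {value});')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_c(HEADER))
```

Run as part of the build, the output never needs to be checked in or edited; adding a constant to the header is the only change a human makes.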

Max E.
+1  A: 

For domain-driven or multi-tier apps, code generation is a great way to create the initial model or data access layer. It can churn out the 250 entity classes in 30 seconds ( or in my case 750 classes in 5 minutes). This then leaves the programmer to focus on enhancing the model with relationships, business rules or deriving views within MVC.

The key thing here is when I say initial model. If you are relying on the code generation to maintain the code, then the real work is being done in the templates. (As stated by Max E.) And beware of that because there is risk and complexity in maintaining template-based code.

If you just want the data layer to be "automagically created" so you can "make the GUI work in 2 days", then I'd suggest going with a product/toolset which is geared towards the data-driven or two-tier application scenario.

Finally, keep in mind "garbage in=garbage out". If your entire data layer is homogeneous and does not abstract from the database, please please ask yourself why you are bothering to have a data layer at all. (Unless you need to look productive :) )

Jennifer Zouak
"750 classes in 5 minutes", now that's a large domain model!
Mark Rogers
+1  A: 

I'm actually adding the finishing touches to a code generator I'm using for a project I've been hired on. We have a huge XML file of definitions, and in a day's worth of work I was able to generate over 500 C# classes. If I want to add functionality to all the classes, say an attribute on all the properties, I just add it to my code-gen, hit go, and bam! I'm done.

It's really nice, really.

Joel
+3  A: 

The canonical example of this is data access, but I have another example. I've worked on a messaging system that communicates over serial port, sockets, etc., and I found I kept having to write classes like this over and over again:

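using System;     // DateTime, Int16
using System.IO;  // BinaryReader, BinaryWriter
using System.Text; // Encoding

public class FooMessage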
{
    public FooMessage()
    {
    }

    public FooMessage(int bar, string baz, DateTime blah)
    {
        this.Bar = bar;
        this.Baz = baz;
        this.Blah = blah;
    }

    public void Read(BinaryReader reader)
    {
        this.Bar = reader.ReadInt32();
        this.Baz = Encoding.ASCII.GetString(reader.ReadBytes(30));
        this.Blah = new DateTime(reader.ReadInt16(), reader.ReadByte(),
            reader.ReadByte());
    }

    public void Write(BinaryWriter writer)
    {
        writer.Write(this.Bar);
        writer.Write(Encoding.ASCII.GetBytes(
            this.Baz.PadRight(30).Substring(0, 30)));
        writer.Write((Int16)this.Blah.Year);
        writer.Write((byte)this.Blah.Month);
        writer.Write((byte)this.Blah.Day);
    }

    public int Bar { get; set; }
    public string Baz { get; set; }
    public DateTime Blah { get; set; }
}

Try to imagine, if you will, writing this code for no fewer than 300 different types of messages. The same boring, tedious, error-prone code being written, over and over again. I managed to write about 3 of these before I decided it would be easier for me to just write a code generator, so I did.

I won't post the code-gen code, it's a lot of arcane CodeDom stuff, but the bottom line is that I was able to compact the entire system down to a single XML file:

<Messages>
    <Message ID="12345" Name="Foo">
        <ByteField Name="Bar"/>
        <TextField Name="Baz" Length="30"/>
        <DateTimeField Name="Blah" Precision="Day"/>
    </Message>
    (More messages)
</Messages>

How much easier is this? (Rhetorical question.) I could finally breathe. I even added some bells and whistles so it was able to generate a "proxy", and I could write code like this:

var p = new MyMessagingProtocol(...);
SetFooResult result = p.SetFoo(3, "Hello", DateTime.Today);

In the end I'd say this saved me writing a good 7500 lines of code and turned a 3-week task into a 3-day task (well, plus the couple of days required to write the code-gen).
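
For readers who want a feel for what such a generator does, here is a minimal sketch in Python (not the CodeDom tool described above, which is not shown). It reads a trimmed-down version of the XML and emits the source of one class per message. The mapping from field elements to `struct` format codes, the little-endian layout and the `read`/`write` shape of the generated class are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed-down input in the style of the XML above (two field types only).
MESSAGES_XML = """\
<Messages>
    <Message ID="12345" Name="Foo">
        <ByteField Name="Bar"/>
        <TextField Name="Baz" Length="30"/>
    </Message>
</Messages>
"""


def struct_code(field):
    """Map one field element to a struct format code (an assumed mapping)."""
    if field.tag == "ByteField":
        return "B"
    if field.tag == "TextField":
        return field.get("Length") + "s"
    raise ValueError(f"unsupported field type: {field.tag}")


def generate_message_class(message):
    """Return Python source for one message class."""
    names = [field.get("Name").lower() for field in message]
    layout = "<" + "".join(struct_code(field) for field in message)
    args = ", ".join(names)
    self_args = ", ".join("self." + name for name in names)
    lines = [
        f"class {message.get('Name')}Message:",
        f"    MESSAGE_ID = {int(message.get('ID'))}",
        f"    FORMAT = struct.Struct({layout!r})",
        "",
        f"    def __init__(self, {args}):",
    ]
    lines += [f"        self.{name} = {name}" for name in names]
    lines += [
        "",
        "    def write(self):",
        f"        return self.FORMAT.pack({self_args})",
        "",
        "    @classmethod",
        "    def read(cls, data):",
        "        return cls(*cls.FORMAT.unpack(data))",
    ]
    return "\n".join(lines) + "\n"


def generate_module(xml_text):
    """Return the source of a module containing one class per <Message>."""
    root = ET.fromstring(xml_text)
    classes = [generate_message_class(m) for m in root.findall("Message")]
    return "import struct\n\n\n" + "\n\n".join(classes)


if __name__ == "__main__":
    print(generate_module(MESSAGES_XML))
```

Text fields stay as fixed-width bytes here; encoding, date fields, proxies and the other details of a real protocol are deliberately left out.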

Conclusion: Code generation is only appropriate for a relatively small number of problems, but when you're able to use one, it will save your sanity.

Aaronaught
I've used Google's protobuf library (http://code.google.com/p/protobuf/) for generating extremely high-performance message serialization/deserialization code. Would highly recommend it for situations such as this.
sblom
@sblom: protobuf and protobuf-net are great when you need to put together a messaging protocol from scratch - not so great when you need to adhere to an existing protocol that some hardware dudes threw together. ;)
Aaronaught
I applaud your conclusion however the problem could also be solved by building something that could read meta-data associated with the message and use that to intelligently deserialise the binary stream into a message. This approach would significantly reduce the amount of actual code in the system. This is a classic example of how, by re-evaluating the core of the problem, you can create a simpler solution sans code-generation.
Neal
@Neal: Great, that eliminates the `Read` and `Write` methods but **I still have to create 300 message classes** and now add a whole bunch of serialization attributes/configuration. Your solution would have eliminated maybe 20% of the work, not the 90% that I was able to eliminate with code-gen.
Aaronaught
I was thinking something like this http://gist.github.com/348784 which has about the same content as your xml based solution (where you still effectively wrote 300 messages).
Neal
I remember doing something similar for C to Cobol conversions with validation functions. Meta-data reading was out of choice as it had to run as quickly as possible on several platforms.
weismat
@Neal: Now extend your solution to handle sub-messages, polymorphic messages (of *course* they reused message IDs), message sequences, little-endian/big-endian encodings, custom text encodings, checksums, and everything else that goes into a real-life messaging system, and actually write the serializer, not just an example message class. And you *still* have a solution that is significantly more verbose, less transparent, harder to extend (i.e. using partial classes), more error-prone (can't enforce specific types i.e. with XSD), and only 50% complete (no request/response pairings or proxies).
Aaronaught
Programmers tend to be heavily optimistic - they look at any complex system, make a dozen simplifying assumptions, and go "bah, I could do that!" Don't think for a minute that I didn't thoroughly investigate the possibility of a Reflection-based serializer. Oh, and keep in mind that this has to support the .NET 2.0 Framework, so no Linq or expression trees or any of that fun stuff allowed.
Aaronaught
My "simplifying assumptions" were based on the content of the xml file you provided. The xml snippet provides no indication of any of the additional details that you raised therefore where are you injecting the additional context required to allow you to handle these concerns you raise?
Neal
@Neal: There *is* no additional context. The code gen infers it. Polymorphic messages use `IsCustom = "True"`, which generates a partial class, and sequence messages use `IsSequence = "True"`, so that the codegen uses an enumerable/list instead. No further information is needed to generate the proxy, the syntax is a good deal more succinct than code, XSD provides compile-time validation (which an attribute mapping could not possibly do), and on top of that, generated serialization code is several orders of magnitude more efficient than Reflection, which is important for a low-level protocol.
Aaronaught
Are the details of the protocol publicly available? I am curious about this beast you have tamed.
Neal
@Neal: They are not "secret", but I am bound by more than one NDA, which was probably evident from the anonymized code. And to be clear, I'm not claiming to have performed any Herculean task; my assertion is simply that it was faster to develop a code-gen than it would have been to develop a complete serialization library. I think you may be overestimating the difficulty of transforming XML into CodeDom; the entire tool is not more than 500 LOC or so.
Aaronaught
Possibly - but I'm now more curious about the protocol than the task itself because it is an interesting problem to solve and if it were a public protocol I could then play around in my own time =)
Neal
@Neal: Well I am sure I could concoct something that's "close enough" to the original without exposing any significant details of the technology, and perhaps post it as a "how would you implement this" CW - of course, what you would be lacking is an endpoint to test it with, and it wouldn't be very enlightening as to the real-world applications. If your interest is academic, though... sure, stick around, I'll post something when I have some time. Incidentally, given the option of using .NET 3.5, a *really* quick solution would be to just write a custom encoding/transport for WCF.
Aaronaught
+1  A: 

A code generator is useful if:

  1. The cost of writing and maintaining the code generator is less than the cost of writing and maintaining the repetition that it is replacing.

  2. The consistency gained by using a code generator will reduce errors to a degree that makes it worthwhile.

  3. The extra problem of debugging generated code will not make debugging inefficient enough to outweigh the benefits from 1 and 2.

Larry Watanabe
+1  A: 

Everyone talks here about simple code generation, but what about model-driven code generation (like MDSD or DSM)? This helps you move beyond the simple ORM/member accessors/boilerplate generators and into code generation of higher-level concepts for your problem domain.

It's not productive for one-off projects, but even for these, model-driven development introduces additional discipline, better understanding of employed solutions and usually a better evolution path.

Just as 3GLs and OOP provided an increase in abstraction by generating large quantities of assembly code based on a higher-level specification, model-driven development allows us to again increase the abstraction level, with yet another gain in productivity.

MetaEdit+ from MetaCase (mature) and ABSE from Isomeris (my project, in alpha, info at http://www.abse.info) are two technologies at the forefront of model-driven code generation.

What is needed really is a change in mindset (like OOP required in the 90's)...

Rui Curado
@Rui: although it's easy to look at your profile and see your association with ABSE, I recommend you explicitly state that in your answer so you will not be considered a spammer.
John Saunders
Sorry, John. You're right. I usually refer to ABSE as "my project" (check other posts of mine), but somehow I missed that this time. I've edited the answer.
Rui Curado
+1  A: 

There are many uses for code generation.

Writing code in a familiar language and generating code for a different target language.

  • GWT - Java -> JavaScript
  • MonoTouch - C# -> native iOS code (compiled ahead of time)

Writing code at a higher level of abstraction.

  • Compilers
  • Domain Specific Languages

Automating repetitive tasks.

  • Data Access Layers
  • Initial Data Models

Ignoring all preconceived notions of code-generation, it is basically translating one representation (usually higher level) to another (usually lower level). Keeping that definition in mind, it is a very powerful tool to have.

The current state of programming languages has by no means reached its full potential and it never will. We will always be abstracting to get to a higher level than where we stand today. Code generation is what gets us there. We can either depend on the language creators to create that abstraction for us, or do it ourselves. Languages today are sophisticated enough to allow anybody to do it easily.

Anurag
A: 

Here's some heresy:

If a task is so stupid that it can be automated at program-writing time (i.e. source code can be generated by a script from, let's say, XML), then the same can also be done at run-time (i.e. some representation of that XML can be interpreted at run-time) or using some meta-programming. So in essence, the programmer was lazy, did not attempt to solve the real problem but took the easy way out and wrote a code generator. In Java / C#, look at reflection, and in C++ look at templates.
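
As a sketch of what "interpret it at run-time" can look like, here is a small Python example in which a field description is read by one generic class instead of being turned into source code. The field list, the `struct` format codes and the `MessageCodec` name are assumptions for illustration; in Java or C# the equivalent would lean on reflection.

```python
import struct

# Hypothetical field description: the run-time counterpart of a generator's input file.
FOO_FIELDS = [("bar", "B"), ("baz", "30s")]


class MessageCodec:
    """Serializes dicts according to a field list that is interpreted at run time."""

    def __init__(self, fields):
        self.names = [name for name, _ in fields]
        # One little-endian layout built from the per-field format codes.
        self.layout = struct.Struct("<" + "".join(code for _, code in fields))

    def write(self, values):
        """Pack the named values into bytes, in field order."""
        return self.layout.pack(*(values[name] for name in self.names))

    def read(self, data):
        """Unpack bytes into a dict keyed by field name."""
        return dict(zip(self.names, self.layout.unpack(data)))


if __name__ == "__main__":
    codec = MessageCodec(FOO_FIELDS)
    payload = codec.write({"bar": 3, "baz": b"Hello"})
    print(codec.read(payload))
```

The trade-off is the one raised in the comments: there is no generated code to maintain, but there are also no per-message types for a compiler or IDE to check, and the description is walked on every call.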

Carsten Kuckuk
Although I very much like reflection and metaprogramming, there's usually a performance hit associated with doing it at run-time.
Anurag
True. But with n-core CPUs running at 3+GHz it doesn't matter that much anymore on the client side.
Carsten Kuckuk
in a development environment on the client side, yes it's perfectly acceptable.
Anurag
@Carsten: it depends on the platform. If I tried this "runtime" thing in .NET, I'd have my framework generate and compile the code at runtime, after which I'd be running compiled code.
John Saunders