For those of you who are not familiar with D string mixins, they're basically compile-time evals. You can take any compile-time string (whether a literal or one generated by template metaprogramming or compile-time function evaluation) and compile it as code. If you use a simple string literal, it's basically compiler-automated copy-paste.

Would you consider it an antipattern to use string mixins of literals as a means of simple code reuse where other methods of factoring don't quite fit? On the one hand, it's basically compiler-automated literal copy-and-paste, meaning that once mixed in, the instances have nothing whatsoever to do with each other. Bad Things will happen (though at compile time, not at runtime) if a symbol in the string mixin collides with a symbol in the scope it is mixed into. It's relatively unstructured in that one can, for example, mix a string into the middle of a function, where it will work if and only if variables in the scope are named according to a certain convention. Mixins can also declare variables that outer scopes can then use as they see fit.

On the other hand, because the copy-and-paste is compiler automated, there is a single point of truth for the code in question at the source level and if it needs to be modified, it only needs to be modified in one place, and everything stays in sync. String mixins also vastly simplify reusing code that's very hard to factor in any other way and would otherwise have a very high probability of being cut-and-pasted manually.
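To make the trade-off concrete, here is a minimal sketch of the kind of literal mixin I mean. The names `sumValues`, `values` and `total` are made up for illustration: the snippet only compiles where a variable called `values` is in scope, and it declares `total` for the enclosing function to use.

```d
import std.stdio;

// Hypothetical snippet: expects a variable named `values` in the enclosing
// scope and declares `total` there.
enum sumValues = q{
    int total = 0;
    foreach (v; values) total += v;
};

int sumA()
{
    int[] values = [1, 2, 3];
    mixin(sumValues);   // pasted here at compile time
    return total;       // `total` was declared by the mixin
}

int sumB()
{
    int[] values = [10, 20];
    mixin(sumValues);   // a second, completely independent copy
    return total;
}

void main()
{
    writeln(sumA()); // prints 6
    writeln(sumB()); // prints 30
}
```

Renaming `values` in either function, or declaring another `total` before the mixin, produces a compile error at that mixin site. That is both the single point of truth and the lack of structure in one example.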

+6  A: 

All the criticisms you raised are true.

Regardless, it's still superior to manual copypaste.

Actually, I've got something similar running in my tools library, string table expansion. Example code, from a path tracer's dynamic value implementation:

  T to(T)() {
    static if (!is(T == Scope)) {
      T value;
      if (flatType == FlatType.ScopeValue) value = sr.value().to!(T);
    }
    const string Table = `
                 | bool          | int         | string               | float   | Scope
      -----------+---------------+-------------+----------------------+---------+----------
      Boolean    | b             | b           | b?q{true}p:q{false}p | ø       | ø
      Integer    | i != 0        | i           | Format(i)            | i       | ø
      String     | s == q{true}p | atoi(s)     | s                    | atof(s) | ø
      Float      | ø             | cast(int) f | Format(f)            | f       | ø
      ScopeRef   | !!sr          | ø           | (sr?sr.fqn:q{(null:r)}p) | ø   | sr
      ScopeValue | value         | value       | value                | value   | sr`;
    mixin(ctTableUnrollColMajor(Table,
      `static if (is(T == $COL))
        switch (flatType) {
          $BODY
          default: throw new Exception(Format("Invalid type: ", flatType));
        }
      else `,
      `case FlatType.$ROW:
        static if (q{$CELL}p == "ø")
          throw new Exception(q{Cannot convert $ROW to $COL: }p~to!(string)~q{! }p);
        else return $CELL;
      `
    ).litstring_expand() ~ `static assert(false, "Unsupported type: "~T.stringof); `);
  }

I'm sure it's easy to see what a horrible, redundant mess of nested ifs and case statements that would be without string mixins - this way, all the ugliness is concentrated at the bottom, and the actual behavior of the function is easy to read off at a glance.

FeepingCreature
*sniff* I heart you, Feep. This one really is one of your more awesome ideas.
DK
Every credibly semi-neat trick like that has three or four ideas that are just plain _horrible_ - like the CTFE text-based state machine graph parser buried deep in tools.base :)
FeepingCreature
+2  A: 

While other, more elegant solutions may be better to use if you can, string mixins can be extremely useful. They allow for both code re-use and code generation. They're checked at compile time. The code that results is exactly the same as if you'd written it by hand, so it's no less safe than hand-written code.

The problem with string mixins is that they're harder to control than hand-written code in the sense that it's not physically laid out in your source in the same manner with line numbers clearly traceable to errors, and it may be harder to debug. For instance, take hello world with a string mixin:

import std.stdio;

void main()
{
    mixin(hello());
}

string hello()
{
    return "
    writeln(\"hello world\");
";
}

If we were to remove the semicolon after writeln(), then the error we got would be

d.d(7): found 'EOF' when expecting ';' following statement

The mixin is on line 5, and line 7 is a blank line, so the reported line number is of limited use. This mixin is short enough that we could have put it on a single line, in which case the error would be reported on the same line as the mixin, but that obviously won't work for more complicated mixins. So using a string mixin impairs your ability to locate an error. If the code is generated using CTFE, it becomes that much harder to work out what the code even looks like, let alone what's wrong with it. It's a lot like figuring out what a C-style macro expands to, except that it can be worse because the mixed-in code may be generated rather than directly substituted. However, string mixins are only expanded where you explicitly tell them to be, so they're much safer than C-style macros.
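One thing that can help with the "what does the code even look like" problem is printing the generated string during compilation with `pragma(msg, ...)`. The sketch below assumes a made-up generator function, `makeGreeting`; only the pragma and the mixin are the point.

```d
import std.stdio;

// Hypothetical generator: builds the source text at compile time via CTFE.
string makeGreeting(string name)
{
    return "writeln(\"hello " ~ name ~ "\");";
}

void main()
{
    // Evaluate the generator once, at compile time.
    enum code = makeGreeting("world");

    // Shows the generated source in the compiler's output, not at run time.
    pragma(msg, code);

    // Compiles the same text as a statement.
    mixin(code);
}
```

This doesn't fix the line numbers, but at least you can read the code the compiler is complaining about.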

String mixins are totally safe, and there's nothing particularly wrong with them, but they do make maintenance harder in some ways. The corresponding hand-written code would be easier to debug. However, string mixins are powerful enough that they can do a lot of code generation for you and save you a lot of maintenance costs in that sense, and they allow you to re-use code, which can be a big maintenance gain as well.

So, whether using a string mixin is a good idea in a particular situation depends on that situation. I don't see anything particularly wrong with them, and I certainly wouldn't call them an anti-pattern, but there are both pros and cons to using them such that whether they're a good idea depends on what you're doing. In many cases, there are more elegant, cleaner solutions which would be better. In others, they're exactly what the doctor ordered.

Personally, I think that they're fantastic for generating code. They save you the effort of writing that code by hand, can make it easier to generate correct code for a variety of situations, and avoid the risk of introducing new bugs in each place where you would otherwise have written the code yourself. They're also one of the ways to outright re-use code without having to worry about the cost of a function call, the limits of single inheritance, or anything else that makes re-use through function calls or inheritance harder. You're simply copying and pasting the code into each place, in a manner which ensures that if you change the code, the change is pasted everywhere without you having to track down every copy as you would with hand copy-and-paste.
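As a small illustration of the code generation case, here is a sketch in which a CTFE function builds declarations from a list of names. `declareFields` and `Point` are invented for the example and don't come from any library.

```d
import std.stdio;

// Hypothetical CTFE generator: emits one int field declaration per name.
string declareFields(string[] names)
{
    string code;
    foreach (name; names)
        code ~= "int " ~ name ~ ";\n";
    return code;
}

struct Point
{
    // Expands to: int x; int y; int z;
    mixin(declareFields(["x", "y", "z"]));
}

void main()
{
    auto p = Point(1, 2, 3);
    writeln(p.x + p.y + p.z); // prints 6
}
```

Adding a field means adding one name to the list, and every struct built with the same generator stays consistent.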

So, use string mixins where appropriate, and it's probably best not to use them if they're not needed, but there's nothing really wrong with using them.

Jonathan M Davis
I found that the only way to ameliorate the line number problem was to insert `#line` directives into the string mixin source. That way, errors usually fell in roughly the right spot; made debugging an entire binding library built out of CTFE-generated string mixins less of a complete and utter nightmare.
DK
You can just write a CTFE function to remove newlines, you know. ^^
FeepingCreature
@FeepingCreature LOL. Cute. That would help with the line in the error message indicating the line that the string is mixed in on as well as error messages later in the file, but it wouldn't help with finding the error in the mixin itself. Of course, since you don't even necessarily know what the code being mixed in looks like, knowing which line number in it is bad doesn't necessarily help you anyway. I suppose that the moral of the story is that string mixins that work are great, but buggy ones can be a pain to fix.
Jonathan M Davis
Weeell .. point taken, but principally speaking once you know the mixin you can just put the newlines back in and subtract line numbers. :-)
FeepingCreature
However, given how mixing in strings tends to mess up the line numbers of errors for the rest of the file (even if the mixin is fine), it might be good practice to mixout the newlines in the mixin just so that the other error messages are sensible.
Jonathan M Davis
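The newline-stripping idea from the comments above could be sketched roughly like this. `flatten` is a hypothetical helper, and this naive version assumes the mixed-in code contains no `//` line comments and no string literals that span lines, since either would change meaning once the line breaks are gone.

```d
import std.stdio;

// Hypothetical helper: replaces every line break with a space so that the
// mixed-in code occupies a single line. Usable in CTFE.
string flatten(string code)
{
    string result;
    foreach (char c; code)
        result ~= (c == '\n' || c == '\r') ? ' ' : c;
    return result;
}

string hello()
{
    return "
    writeln(\"hello world\");
";
}

void main()
{
    // The mixed-in text now has no line breaks of its own.
    mixin(flatten(hello()));
}
```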
+1  A: 

String mixin is like goto: it should be avoided wherever possible and should be used wherever required.

BCS
There's some truth to that, but I'd generally avoid goto like the plague while I'd be quite willing to use string mixins in a variety of circumstances. Now, if there's a better solution than string mixins, I'll definitely take it, but I'm perfectly willing to put string mixins in my code while I'll think hard and definitely reconsider before actually putting a goto in my code.
Jonathan M Davis
@Jonathan M Davis: It sounds to me like you are more reluctant to use `goto` than I am. OTOH there are very few cases where you have to use `goto` in D what with labeled break and continue.
BCS
@BCS I'd essentially never use goto in any language unless I had no other choice to get the performance I needed. I don't really like labeled continues or labeled breaks either, but I'd be willing to use them if it made good sense in a particular situation. But straight up goto? Pretty much no chance of that. Unless you *really* need the performance, there's always a better way. So, yeah, it looks like I'm more reluctant to use goto than you are.
Jonathan M Davis
@Jonathan: What's wrong with labeled continues/breaks? How else (besides goto) are you supposed to break out of a nested control structure?
dsimcha
Oh, they can be quite useful. But code is generally much cleaner if it's written in a way that doesn't require jumping around like that. I would argue that in most cases, code that requires a labeled continue or break could be rewritten in a cleaner manner which didn't require them (and without stuff like ugly flags which indicate that you should just fall out of the set of loops that you're in - labeled continues and breaks are certainly better than *that* sort of code). So, I think that the language should have them, but if your code uses them, it should be refactored if at all possible.
Jonathan M Davis
@Jonathan: @dsimcha: The simplest way to avoid labeled breaks is to wrap the outer loop in a function and do a return. OTOH that (if you're lucky) ends up being inlined back to the same thing. @Jonathan: Could you give an example of what kind of refactoring you are thinking of?
BCS
@BCS I'd really have to look at specific examples to say - though refactoring some of it out into a function is one way. A lot of times, it just means going about the problem slightly differently or breaking it up differently. Personally, I rarely run into the need to break to an outer loop, and when I have, I've usually found ways to refactor out the need. But it's infrequent enough that I don't have any good examples off the top of my head.
Jonathan M Davis
I've rarely run into it either but I never felt any need to re-factor it away.
BCS
@BCS It's more of an issue in C++ than D because you have to use a flag variable of some kind in C++, since there is no labeled continue or break in C++. I wouldn't be quite as concerned about refactoring it out in D as in C++, but it would probably still bother me enough to refactor it if I could reasonably do so.
Jonathan M Davis
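For reference, the two ways of leaving a nested loop discussed in these comments, a labeled break and wrapping the loops in a function that returns, might look like this in D. The function names are made up for illustration.

```d
import std.stdio;

// Labeled break: leaves both loops as soon as the target is found.
bool containsLabeled(int[][] grid, int target)
{
    bool found = false;
    outer: foreach (row; grid)
    {
        foreach (cell; row)
        {
            if (cell == target)
            {
                found = true;
                break outer;
            }
        }
    }
    return found;
}

// Function-and-return: the nested loops live in their own function,
// so a plain return exits both of them.
bool containsReturn(int[][] grid, int target)
{
    foreach (row; grid)
        foreach (cell; row)
            if (cell == target)
                return true;
    return false;
}

void main()
{
    int[][] grid = [[1, 2], [3, 4]];
    writeln(containsLabeled(grid, 3)); // prints true
    writeln(containsReturn(grid, 5));  // prints false
}
```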