It is "common knowledge" that source filters are bad and should not be used in production code.

When answering a similar but more specific question, I couldn't find any good references that clearly explain why filters are bad and when they can be safely used. I think now is the time to create one.

  1. Why are source filters bad?
  2. When is it OK to use a source filter?
+14  A: 

Why source filters are bad:

  1. Nothing but perl can parse Perl. (Source filters are fragile.)
  2. When a source filter breaks, pretty much anything can happen. (They can introduce subtle and very hard to find bugs.)
  3. Source filters can break tools that work with source code. (PPI, refactoring, static analysis, etc.)
  4. Source filters are mutually exclusive. (You can't use more than one at a time -- unless you're psychotic).
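To make points 1 and 2 concrete, here is a minimal sketch of the kind of text rewrite a naive filter performs. It only simulates the rewrite on strings rather than installing a real filter, and the `loop` keyword and the `naive_filter` helper are made-up names for illustration, not taken from any actual module.

```perl
use strict;
use warnings;

# Hypothetical rewrite rule: pretend we want a "loop" keyword that means
# "while (1)". A real source filter would apply this to the caller's source.
sub naive_filter {
    my ($source) = @_;
    (my $out = $source) =~ s/\bloop\b/while (1)/g;
    return $out;
}

# The line the filter author had in mind.
my $intended = 'loop { last if done() }';

# A line the filter author never thought about: "loop" inside a string.
my $collateral = 'print "stuck in a loop\n";';

print naive_filter($intended),   "\n";
print naive_filter($collateral), "\n";
```

The regex has no idea that the second `loop` sits inside a string literal, so it rewrites that one too. Telling the two cases apart reliably means parsing Perl, which is exactly what a regex cannot do.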

When they're okay:

  1. You're experimenting.
  2. You're writing throw-away code.
  3. Your name is Damian and you must be allowed to program in Latin.
  4. You're programming in Perl 6.
Michael Carman
+9  A: 

I don't like source filters because you can't tell what code is going to do just by reading it. Additionally, things that look like they aren't executable, such as comments, might magically be executable with the filter. You (or more likely your coworkers) could delete what you think isn't important and break things.

Having said that, if you are implementing your own little language that you want to turn into Perl, source filters might be the right tool. However, just don't call it Perl. :)

brian d foy
In that case, can we implement Perl 6 as a source filter? ;-)
Bill
Take a look at some of the Perl6::* modules on CPAN; a few of them are source filters :-)
Eric Strom
Perl 6 is explicitly designed to have user-extensible syntax; one of its mottos is "All's fair if you predeclare"
ysth
You guys missed the bit about "just don't call it Perl" :)
Bill
+12  A: 

Only perl can parse Perl (see this example):

@result = (dothis $foo, $bar);

# Which of the following is it equivalent to?
@result = (dothis($foo), $bar);
@result = dothis($foo, $bar);

This kind of ambiguity makes it very hard to write source filters that always succeed and do the right thing. When things go wrong, debugging is awkward.
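As a rough illustration of why the answer depends on declarations the filter cannot see, the sketch below calls two subs with the same bare syntax. The names `takes_one` and `takes_list` are hypothetical; the only difference between them is the `($)` prototype.

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub takes_one ($) { return "one(@_)" }    # prototype: exactly one scalar
sub takes_list    { return "list(@_)" }   # no prototype: takes the whole list

# Identical call syntax, different parse:
my @a = (takes_one 'foo', 'bar');    # parsed as (takes_one('foo'), 'bar')
my @b = (takes_list 'foo', 'bar');   # parsed as takes_list('foo', 'bar')

print scalar(@a), " element(s): @a\n";
print scalar(@b), " element(s): @b\n";
```

A filter looking only at the text of the call has no way of knowing which of the two parses perl will choose, because the prototype may be declared in another file entirely.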

After crashing and burning a few times, I have developed the superstitious approach of never trying to write another source filter.

I do occasionally use Smart::Comments for debugging, though. When I do, I load the module on the command line:

$ perl -MSmart::Comments test.pl

so as to avoid any chance that it might remain enabled in production code.

See also: Perl Cannot Be Parsed: A Formal Proof

Sinan Ünür
According to `$ perl -MO=Deparse -e '@result = (dothis $foo, $bar)'` it parses as `@result = ($foo->dothis, $bar);` Talk about ambiguity. If we predeclare `sub dothis` with no prototype or a prototype of `($$)` or `(@)` it parses as `@result = dothis($foo, $bar)`. It only parses as `@result = (dothis($foo), $bar)` if we declare it with a prototype of `($)`.
Chris Lutz
@Chris Lutz: Yup, I remember doing the same thing when I first saw that snippet in the PPI docs. It is a very clever example.
Sinan Ünür
+4  A: 

The problem I see is the same problem you encounter with any C/C++ macro more complex than defining a constant: It degrades your ability to understand what the code is doing by looking at it, because you're not looking at the code that actually executes.

bradheintz
What about the macro `#define ARRAY_SIZE(x) (sizeof(x)/sizeof((x)[0]))`? Does that degrade your ability to understand what the code is doing just by looking at it?
Chris Lutz
@Chris: in that case, I would *far* rather you simply define an inline function than a macro.
Ether
@Ether: `sizeof` won't work as an inline function. Chris's macro has to be a macro.
Kinopiko
To expand on @Kinopiko's point, if you define `ARRAY_SIZE` as an inline function, the array argument `x` will decay to a pointer and the trick in @Chris Lutz's comment will not work.
Sinan Ünür
The problem with the macro is that you can call it on anything that has a size and that supports bracket operators. That includes pointers, vectors, and maps, all of which are inappropriate for such a macro. A real function works great: `template <typename T, std::size_t N> inline std::size_t size(T (&)[N]) { return N; }` You can't call that function on a pointer; its argument must be an array.
Rob Kennedy
@Chris That's a function which you happen to write as a macro for esoteric reasons, which side-steps the point. But forget those outer parens and you're in a world of hurt, underscoring the danger involved in injecting code. Macros *are* less dangerous than source filters, as they get inserted into the code by the compiler at points where they're used by the caller. Source filters just rewrite all the code. To best see Brad's point, look at the Perl 5 source code some time. It's more C macros than C.
Schwern
@Rob - That's a good solution for C++. Some of us still use C, and in the case of C the macro is by far sufficient. It'll break for pointers, but when you're writing C, you should know that already, and it shouldn't be a problem.
Chris Lutz
+6  A: 

It's worth mentioning that Devel::Declare keywords (and starting with Perl 5.11.2, pluggable keywords) aren't source filters, and don't run afoul of the "only perl can parse Perl" problem. This is because they're run by the perl parser itself, they take what they need from the input, and then they return control to the very same parser.

For example, when you declare a method in MooseX::Declare like this:

method frob ($bubble, $bobble does coerce) {
  ... # complicated code
}

The word "method" invokes the method keyword parser, which uses its own grammar to get the method name and parse the method signature (which isn't Perl, but it doesn't need to be -- it just needs to be well-defined). Then it leaves perl to parse the method body as the body of a sub. Anything anywhere in your code that isn't between the word "method" and the end of a method signature doesn't get seen by the method parser at all, so it can't break your code, no matter how tricky you get.

hobbs
wow Perl is extremely slow because of these complete improvisations in the language
xxxxxxx
I think Python is much much better. Perl seems to be extremely archaic and you guys are trying to put bells and whistles on something that is a relic
xxxxxxx
+1  A: 

In theory, a source filter is no more dangerous than any other module, since you could easily write a module that redefines builtins or other constructs in "unexpected" ways. In practice, however, it is quite hard to write a source filter in a way where you can prove that it's not going to make a mistake. I tried my hand at writing a source filter that implements the Perl 6 feed operators in Perl 5 (Perl6::Feeds on CPAN). You can take a look at the regular expressions to see the acrobatics required to simply figure out the boundaries of expression scope. While the filter works, and provides a test bed to experiment with feeds, I wouldn't consider using it in a production environment without many, many more hours of testing.

Filter::Simple certainly comes in handy by dealing with 'the gory details of parsing quoted constructs', so I would be wary of any source filter that doesn't start there.

In all, it really depends on the filter you are using, and how broad a scope it tries to match against. If it is something simple like a C macro, then it's "probably" OK, but if it's something complicated then it's a judgement call. I personally can't wait to play around with Perl 6's macro system. Finally Lisp won't have anything on Perl :-)

Eric Strom
This simply isn't true. In theory a source filter is infinitely more dangerous. Firstly, not all internal CORE functions can be redefined in Perl, and given that parsing Perl requires perl (for the aforementioned reasons of prototyping and indirect object notation), it simply isn't fair to say "no more dangerous." A source filter is by its very design totally and unavoidably dependent on assumptions, whereas code isn't. Additionally, there is a mechanism to warn you or error during compilation if it can be detected that there is a problem, such as that of code-composition.
Evan Carroll
@EvanCarroll My point was that any module can manipulate the caller's space in potentially unexpected or dangerous ways, so you should always be cautious and prefer well-tested modules. I then go on to explain how it is much harder to ensure that a source filter will be safe. You might have seen that if you read more than the first sentence of my post.
Eric Strom
A module author has to go out of their way to do something really wacky. You choose what's going to affect your caller; everything else is contained. Thus "modular". For a source filter, wackiness is the default. A filter touches *every line of code* in the caller, so you have to be really careful to only affect the ones you mean. Even the simplest source filter contains danger, whereas simple modules do not.
Schwern