I have a simple program which reads a bunch of things from the filesystem, filters the results , and prints them. This simple program implements a domain specific language to make selection easier. This DSL "compiles" down into an execution plan that looks like this (Input was C:\Windows\System32\* OR -md5"ABCDEFG" OR -tf
):
Index Success Failure Description
0 S 1 File Matches C:\Windows\System32\*
1 S 2 File MD5 Matches ABCDEFG
2 S F File is file. (Not directory)
The filter is applied to the given file, and if it succeeds, the index pointer jumps to the index indicated in the success field, and if it fails, the index pointer jumps to the number indicated in the failure field. "S" means that the file passes the filter, F means that the file should be rejected.
Of course, a filter based upon a simple file attribute (!FILE_ATTRIBUTE_DIRECTORY) check is much faster than a check based upon the MD5 of the file, which requires opening and performing the actual hash of the file. Each filter "opcode" has a time class associated with it; MD5 gets a high timing number, ISFILE gets a low timing number.
I would like to reorder this execution plan so that opcodes that take a long time are executed as rarely as possible. For the above plan, that would mean it would have to be:
Index Success Failure Description
0 S 1 File is file. (Not directory)
1 S 2 File Matches C:\Windows\System32\*
2 S F File MD5 Matches ABCDEFG
According to the "Dragon Book", picking the best order of execution for three address code is an NP-Complete problem (At least according to page 511 of the second edition of that text), but in that case they are talking about register allocation and other issues of the machine. In my case, the actual "intermediate code" is much simpler. I'm wondering of a scheme exists that would allow me to reorder the source IL into the optimal execution plan.
Here is another example:
{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }
Parsed to:
Index Success Failure Description
0 1 2 File Matches C:\Windows\Inf\*
1 S 2 File is a Portable Executable
2 3 F File is file. (Not directory)
3 F S File Matches C:\Windows\System32\Drivers\*
which is optimally:
Index Success Failure Description
0 1 2 File is file. (Not directory)
1 2 S File Matches C:\Windows\System32\Drivers\*
2 3 F File Matches C:\Windows\Inf\*
3 S F File is a Portable Executable