Say I have a list of integers, where each element is a number from 1 to 20. (That's not what I'm trying to sort.)
Now, I have an array of "operations", where each operation:
- Removes certain (known) numbers from the list
- and Adds certain other (known) numbers to the list
- and Is unable to handle the list if it contains certain (known) numbers at the beginning of the operation (call these Prevent)
Edit: There can be zero or more numbers in each of Adds, Removes, and Prevent for each operation, and each number can appear zero or more times in each group for some operation. For any given operation, Adds and Removes are disjoint, Prevent and Removes are disjoint, but Adds and Prevent may overlap.
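The per-operation constraints in the edit above can be checked mechanically before any ordering is attempted. This is a minimal sketch that assumes each group is a plain `int[]` (duplicates allowed). `OperationInvariants.Violations` is a hypothetical helper, not part of the question's code.

```csharp
using System.Collections.Generic;
using System.Linq;

internal static class OperationInvariants
{
    // Hypothetical helper: reports violations of the stated per-operation rules.
    // Numbers may repeat inside a group, so only overlap between groups matters.
    public static List<string> Violations(int[] adds, int[] removes, int[] prevent)
    {
        var problems = new List<string>();
        foreach (int n in adds.Intersect(removes))
            problems.Add($"{n} is in both Adds and Removes");
        foreach (int n in prevent.Intersect(removes))
            problems.Add($"{n} is in both Prevent and Removes");
        // Adds and Prevent are allowed to overlap, so that pair is not checked.
        return problems;
    }
}
```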
I want to sort the array of operations so that for each operation:
- If the operation has Prevent items, it must be placed after an operation that Removes those numbers. If it is not placed immediately after, no operation that Adds those numbers back may appear between the last Removes operation and the Prevent operation.
- If the operation Removes items, all operations that Add any of those items must be placed before it.
In the event of a circular dependency, the chain of operations should remove as many numbers as possible and inform me that it could not remove all the numbers.
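One common way to frame these rules is as a dependency graph ordered by a topological sort (Kahn's algorithm); operations that can never be placed are exactly those on, or downstream of, a cycle. This is a minimal sketch under an assumed, simplified model. Rule 2 becomes an edge from every adder of n to every remover of n. Rule 1 is conservatively strengthened to "after *every* remover of n", because a plain topological sort cannot express "after at least one". It does not attempt the "remove as many numbers as possible" optimization for cycles; it only reports which operations could not be ordered. It also does not flag a Prevent number that no operation removes. `Operation` and `OperationSorter` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of one operation from the question.
public sealed class Operation
{
    public Operation(string name, int[] adds, int[] removes, int[] prevent)
    {
        Name = name;
        Adds = adds;
        Removes = removes;
        Prevent = prevent;
    }

    public string Name { get; }
    public int[] Adds { get; }
    public int[] Removes { get; }
    public int[] Prevent { get; }
}

public static class OperationSorter
{
    // Kahn's topological sort over an assumed edge model:
    //   adder of n   -> remover of n    (rule 2)
    //   remover of n -> preventer of n  (rule 1, strengthened to "after every remover")
    // Operations whose in-degree never reaches 0 lie on or behind a cycle and are
    // returned in Unsorted instead of being placed.
    public static (List<Operation> Sorted, List<Operation> Unsorted) Sort(IReadOnlyList<Operation> ops)
    {
        int count = ops.Count;
        var successors = new List<int>[count];
        var inDegree = new int[count];
        for (int i = 0; i < count; i++)
            successors[i] = new List<int>();

        for (int i = 0; i < count; i++)
        {
            for (int j = 0; j < count; j++)
            {
                bool mustPrecede =
                    ops[i].Adds.Intersect(ops[j].Removes).Any() ||
                    ops[i].Removes.Intersect(ops[j].Prevent).Any();
                if (mustPrecede)
                {
                    successors[i].Add(j);
                    inDegree[j]++;
                }
            }
        }

        // Start with every operation that has no prerequisites, in input order.
        var ready = new Queue<int>(Enumerable.Range(0, count).Where(i => inDegree[i] == 0));
        var placed = new bool[count];
        var sorted = new List<Operation>();
        while (ready.Count > 0)
        {
            int i = ready.Dequeue();
            placed[i] = true;
            sorted.Add(ops[i]);
            foreach (int j in successors[i])
                if (--inDegree[j] == 0)
                    ready.Enqueue(j);
        }

        var unsorted = Enumerable.Range(0, count).Where(i => !placed[i]).Select(i => ops[i]).ToList();
        return (sorted, unsorted);
    }
}
```

For example, given P (prevents 5), R (removes 5), and A (adds 5) in that input order, the sketch yields A, R, P. Two operations that each add what the other removes both end up in `Unsorted`.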
Is there a name/implementation for this type of algorithm that outperforms the one I have below?
Added 8/23: The bounty is for covering the sort requirements considering both the OpCodes (set of structs) and InstructionSemantics (set of bit flags from an enumeration).
Added later 8/23: I made an 89:1 performance improvement by heuristically pre-sorting the source array. See my current answer for details.
```csharp
namespace Pimp.Vmx.Compiler.Transforms
{
    using System;
    using System.Collections.Generic;
    using System.Reflection.Emit;

    internal interface ITransform
    {
        IEnumerable<OpCode> RemovedOpCodes { get; }
        IEnumerable<OpCode> InsertedOpCodes { get; }
        IEnumerable<OpCode> PreventOpCodes { get; }

        InstructionSemantics RemovedSemantics { get; }
        InstructionSemantics InsertedSemantics { get; }
        InstructionSemantics PreventSemantics { get; }
    }

    [Flags]
    internal enum InstructionSemantics
    {
        None = 0,
        ReadBarrier = 1 << 0,
        WriteBarrier = 1 << 1,
        BoundsCheck = 1 << 2,
        NullCheck = 1 << 3,
        DivideByZeroCheck = 1 << 4,
        AlignmentCheck = 1 << 5,
        ArrayElementTypeCheck = 1 << 6,
    }

    internal class ExampleUtilityClass
    {
        public static ITransform[] SortTransforms(ITransform[] transforms)
        {
            // Placeholder: this is the method the question is asking how to implement.
            throw new NotImplementedException("Gotta do something about this...");
        }
    }
}
```
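Because `ITransform` describes its effects with both opcode sets and `InstructionSemantics` flags, a pairwise "must run before" test needs a set intersection for the opcodes and a bitwise AND for the flags. This is a minimal sketch assuming the same edge rules as the simplified model above (inserting something the other removes, or removing something the other prevents). The interface and enum are copied so the block compiles on its own. `TransformOrdering.MustPrecede` and `SimpleTransform` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection.Emit;

// Copies of the question's types so this sketch is self-contained.
[Flags]
internal enum InstructionSemantics
{
    None = 0,
    ReadBarrier = 1 << 0,
    WriteBarrier = 1 << 1,
    BoundsCheck = 1 << 2,
    NullCheck = 1 << 3,
    DivideByZeroCheck = 1 << 4,
    AlignmentCheck = 1 << 5,
    ArrayElementTypeCheck = 1 << 6,
}

internal interface ITransform
{
    IEnumerable<OpCode> RemovedOpCodes { get; }
    IEnumerable<OpCode> InsertedOpCodes { get; }
    IEnumerable<OpCode> PreventOpCodes { get; }
    InstructionSemantics RemovedSemantics { get; }
    InstructionSemantics InsertedSemantics { get; }
    InstructionSemantics PreventSemantics { get; }
}

// Hypothetical plain implementation, used only to exercise MustPrecede.
internal sealed class SimpleTransform : ITransform
{
    public IEnumerable<OpCode> RemovedOpCodes { get; set; } = Array.Empty<OpCode>();
    public IEnumerable<OpCode> InsertedOpCodes { get; set; } = Array.Empty<OpCode>();
    public IEnumerable<OpCode> PreventOpCodes { get; set; } = Array.Empty<OpCode>();
    public InstructionSemantics RemovedSemantics { get; set; }
    public InstructionSemantics InsertedSemantics { get; set; }
    public InstructionSemantics PreventSemantics { get; set; }
}

internal static class TransformOrdering
{
    // True when, under the assumed rules, 'first' must run before 'second'.
    public static bool MustPrecede(ITransform first, ITransform second)
    {
        // Rule 2: anything 'first' inserts that 'second' removes.
        bool insertsWhatSecondRemoves =
            first.InsertedOpCodes.Intersect(second.RemovedOpCodes).Any() ||
            (first.InsertedSemantics & second.RemovedSemantics) != InstructionSemantics.None;

        // Rule 1 (conservative form): anything 'first' removes that 'second' cannot handle.
        bool removesWhatSecondPrevents =
            first.RemovedOpCodes.Intersect(second.PreventOpCodes).Any() ||
            (first.RemovedSemantics & second.PreventSemantics) != InstructionSemantics.None;

        return insertsWhatSecondRemoves || removesWhatSecondPrevents;
    }
}
```

Plugging `MustPrecede` into the edge-building loop of a topological sort gives an ordering that considers both OpCodes and InstructionSemantics at once.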
Edit: Below this line is background info on what I'm actually doing, in case people are wondering why I'm asking this. It doesn't change the problem, just shows the scope.
I have a system that reads in a list of items and sends it to another "module" for processing. Each item is an instruction in my compiler's intermediate representation: basically a number from 1 to ~300 plus some combination of about 17 available modifiers (a flags enumeration). The complexity of the processing system (a machine code assembler) is proportional to the number of possible unique inputs (number + flags), because I have to hand-code every single handler. On top of that, I have to write at least 3 independent processing systems (X86, X64, ARM), and very little of the actual processing code can be shared between them.
By inserting "operations" between reading and processing, I can ensure that certain items never reach processing. I do this by expressing the numbers and/or flags in terms of other numbers. I can code each "transformation operation" as a black box by describing its effects, which saves me a ton of complexity per operation. The operations are complex and unique to each transformation, but much easier to write than the processing system. To show how much time this saves, one of my operations completely removes 6 of the flags by writing their desired effects in terms of about 6 numbers (without flags).
In order to keep things in the black box, I want an ordering algorithm to take all the operations I write, order them to have the greatest impact, and inform me about how successful I was at simplifying the data that will eventually reach the processing system(s). Naturally, I'm targeting the most complex items in the intermediate representation and simplifying them to basic pointer arithmetic where possible, which is the easiest to handle in the assemblers. :)
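For the "inform me about how successful I was" part, one simple measure is to push the set of numbers that *might* be present through the chosen order and see what is left. This is a minimal sketch assuming every number in `possibleInputs` may appear initially and that each operation is reduced to its Adds and Removes. Prevent constrains order, not contents, so it is ignored here. `ReachabilityReport.Survivors` is a hypothetical name.

```csharp
using System.Collections.Generic;

internal static class ReachabilityReport
{
    // Returns the numbers that can still reach the processing system after the
    // operations run in the given order. Adds and Removes are disjoint per
    // operation, so the order of the two steps inside one operation does not matter.
    public static SortedSet<int> Survivors(IEnumerable<int> possibleInputs,
                                           IEnumerable<(int[] Adds, int[] Removes)> orderedOps)
    {
        var present = new SortedSet<int>(possibleInputs);
        foreach (var (adds, removes) in orderedOps)
        {
            present.ExceptWith(removes);
            present.UnionWith(adds);
        }
        return present;
    }
}
```

Comparing the survivors of two candidate orders shows directly which order simplifies more. For instance, an operation that adds 5 only helps if it runs before the one that removes 5.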
With all that said, I'll add another note. The operation effects are described as "attribute effects" over the list of instructions. In general the operations behave well, but some of them only remove numbers that fall after other numbers (like remove all 6's that don't follow a 16). Others remove all instances of a particular number that contains certain flags. I'll be handling these later - AFTER I figure out the basic problem of guaranteed add/remove/prevent listed above.
Added 8/23: In this image, you can see a call instruction (gray) that had InstructionSemantics.NullCheck being processed by the RemoveNullReferenceChecks transform, which removes the semantics flag in exchange for adding another call (with no semantics attached to the added call either). Now the assembler doesn't need to understand/handle InstructionSemantics.NullCheck, because it will never see it. No criticizing the ARM code; it's a placeholder for now.
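The transform described above can be pictured as a rewrite over the instruction list. This is a minimal sketch, not the actual RemoveNullReferenceChecks implementation. It assumes a hypothetical `Instruction` struct (an opcode number plus flags) and a hypothetical opcode number for the inserted call. It also assumes the added call goes immediately before the original instruction.

```csharp
using System;
using System.Collections.Generic;

// Trimmed copy of the question's flags enum (only the members used here).
[Flags]
internal enum InstructionSemantics
{
    None = 0,
    NullCheck = 1 << 3,
}

// Hypothetical IR instruction: an opcode number plus modifier flags.
internal readonly struct Instruction
{
    public Instruction(int code, InstructionSemantics semantics)
    {
        Code = code;
        Semantics = semantics;
    }

    public int Code { get; }
    public InstructionSemantics Semantics { get; }
}

internal static class RemoveNullReferenceChecksSketch
{
    // Hypothetical opcode number for the inserted null-check call.
    public const int NullCheckHelperCall = 999;

    // For each instruction carrying NullCheck, emit an explicit call with no
    // semantics, then the original instruction with the NullCheck flag cleared.
    public static List<Instruction> Apply(IEnumerable<Instruction> input)
    {
        var output = new List<Instruction>();
        foreach (var insn in input)
        {
            if ((insn.Semantics & InstructionSemantics.NullCheck) != InstructionSemantics.None)
            {
                output.Add(new Instruction(NullCheckHelperCall, InstructionSemantics.None));
                output.Add(new Instruction(insn.Code, insn.Semantics & ~InstructionSemantics.NullCheck));
            }
            else
            {
                output.Add(insn);
            }
        }
        return output;
    }
}
```

In the terms of the sorting problem, this transform Removes the NullCheck flag and Adds the call opcode, so any transform that removes that call must run after it.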