Say I have a list of integers, where each element is a number from 1 to 20. (That's not what I'm trying to sort.)

Now, I have an array of "operations", where each operation:

  • Removes certain (known) numbers from the list
  • and Adds certain other (known) numbers to the list
  • and Is unable to handle the list if it contains certain (known) numbers at the beginning of the operation - call these Prevent

Edit: There can be zero or more numbers in each of Adds, Removes, and Prevent for each operation, and each number can appear zero or more times in each group for some operation. For any given operation, Adds and Removes are disjoint, Prevent and Removes are disjoint, but Adds and Prevent may overlap.

I want to sort the array of operations so that for each operation:

  • If the operation has Prevent items, it is placed after an operation that Removes those numbers. If not immediately after, there cannot be an Adds operation that adds those numbers back between the last Removes and the Prevent.
  • If the operation Removes items, all operations that Add any of those items are placed before it.

In the event of a circular dependency, the chain of operations should remove as many numbers as possible and inform me that it could not remove all the numbers.
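
As a hypothetical illustration (not my real operations): if opA adds {5}, opB removes {5} and adds {9}, and opC has Prevent = {5}, then opA must come before opB (second rule), and opC must come after opB with nothing adding 5 in between (first rule). So opA, opB, opC is a valid ordering, while opB, opA, opC is not.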

Is there a name/implementation for this type of algorithm that outperforms the one I have below?

Added 8/23: The bounty is for covering the sort requirements considering both the OpCodes (set of structs) and InstructionSemantics (set of bit flags from an enumeration).

Added later 8/23: I made an 89:1 performance improvement by heuristically pre-sorting the source array. See my current answer for details.

namespace Pimp.Vmx.Compiler.Transforms
{
    using System;
    using System.Collections.Generic;
    using System.Reflection.Emit;

    internal interface ITransform
    {
        IEnumerable<OpCode> RemovedOpCodes { get; }
        IEnumerable<OpCode> InsertedOpCodes { get; }
        IEnumerable<OpCode> PreventOpCodes { get; }

        InstructionSemantics RemovedSemantics { get; }
        InstructionSemantics InsertedSemantics { get; }
        InstructionSemantics PreventSemantics { get; }
    }

    [Flags]
    internal enum InstructionSemantics
    {
        None,
        ReadBarrier = 1 << 0,
        WriteBarrier = 1 << 1,
        BoundsCheck = 1 << 2,
        NullCheck = 1 << 3,
        DivideByZeroCheck = 1 << 4,
        AlignmentCheck = 1 << 5,
        ArrayElementTypeCheck = 1 << 6,
    }

    internal class ExampleUtilityClass
    {
        public static ITransform[] SortTransforms(ITransform[] transforms)
        {
            throw new MissingMethodException("Gotta do something about this...");
        }
    }
}


Edit: Below this line is background info on what I'm actually doing, in case people are wondering why I'm asking this. It doesn't change the problem, just shows the scope.

I have a system that reads in a list of items and sends it to another "module" for processing. Each item is an instruction in my intermediate representation in a compiler - basically a number from 1 to ~300 plus some combination of about 17 available modifiers (flags enumeration). The complexity of the processing system (machine code assembler) is proportional to the number of possible unique inputs (number+flags), since I have to hand-code every single handler. On top of that, I have to write at least 3 independent processing systems (X86, X64, ARM) - very little of the actual processing code can be shared between them.

By inserting "operations" between reading and processing, I can ensure that certain items never appear for processing - I do this by expressing the numbers and/or flags in terms of other numbers. I can code each "transformation operation" in a black box by describing its effects, which saves me a ton of complexity per-operation. The operations are complex and unique for each transformation, but much easier than the processing system is. To show how much time this saves, one of my operations completely removes 6 of the flags by writing their desired effects in terms of about 6 numbers (without flags).

In order to keep things in the black box, I want an ordering algorithm to take all the operations I write, order them to have the greatest impact, and inform me about how successful I was at simplifying the data that will eventually reach the processing system(s). Naturally, I'm targeting the most complex items in the intermediate representation and simplifying them to basic pointer arithmetic where possible, which is the easiest to handle in the assemblers. :)

With all that said, I'll add another note. The operation effects are described as "attribute effects" over the list of instructions. In general the operations behave well, but some of them only remove numbers that fall after other numbers (like remove all 6's that don't follow a 16). Others remove all instances of a particular number that contains certain flags. I'll be handling these later - AFTER I figure out the basic problem of guaranteed add/remove/prevent listed above.

Added 8/23: In this image, you can see that a call instruction (gray) which had InstructionSemantics.NullCheck was processed by the RemoveNullReferenceChecks transform to remove the semantics flag in exchange for adding another call (with no semantics attached to the added call either). Now the assembler doesn't need to understand/handle InstructionSemantics.NullCheck, because it will never see it. No criticizing the ARM code - it's a placeholder for now.
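
To make this concrete, here is a hypothetical sketch of what such a transform could look like against the ITransform interface above (the class name and the OpCodes.Call choice are illustrative, not the real RemoveNullReferenceChecks implementation; same namespace and usings as the first listing):

// Hypothetical sketch only: trades the NullCheck semantics flag for an extra
// call instruction, so the assemblers never have to handle InstructionSemantics.NullCheck.
internal class RemoveNullReferenceChecksSketch : ITransform
{
    public IEnumerable<OpCode> RemovedOpCodes
    {
        get { return new OpCode[0]; }
    }

    public IEnumerable<OpCode> InsertedOpCodes
    {
        get { return new[] { OpCodes.Call }; }
    }

    public IEnumerable<OpCode> PreventOpCodes
    {
        get { return new OpCode[0]; }
    }

    public InstructionSemantics RemovedSemantics
    {
        get { return InstructionSemantics.NullCheck; }
    }

    public InstructionSemantics InsertedSemantics
    {
        get { return InstructionSemantics.None; }
    }

    public InstructionSemantics PreventSemantics
    {
        get { return InstructionSemantics.None; }
    }
}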

+1  A: 

I think you are talking about an ordering algorithm here rather than a sorting algorithm. That is you want to find an ordering that has the properties listed.

I'd be surprised if a specific ordering algorithm already existed that would satisfy these properties.

Be aware that you may not be able to find an ordering for a given set of operations. Indeed there may not even be a partial ordering / lattice. A trivial example is:

op1(adds(1),removes(2))
op2(adds(2),removes(1))
Stephen C
That's what I meant by a circular dependency. In that case you can't remove both 1 and 2. However, if `op1` removed 2 and 3 (nothing else changed), then I need to know that 1 couldn't be removed but 2 and 3 were (the maximum removed for the given ops). Of course, getting the lists of items actually removed and items I wanted removed is trivial set manipulation after the ordering.
280Z28
+3  A: 

Sounds like a topological sort: think of each operation as a node in a directed graph, with the edges being the constraints you mentioned.


Edit: @280Z28 commented on this answer:

I'm using a topological sort right now, but it's technically too strong. I need some way to have "weak" groups of edges (one or more edges of the group holds in the final ordering)

I'm not sure I follow what you mean regarding weak groups of edges. If this refers to breaking cycles, then topological sort can handle it. The way I did it was to maintain an in-count showing how many unvisited nodes point to each node. Then, for each iteration, you work on (one of) the nodes with the minimal in-count; if that count is not zero, there's a cycle, and you arbitrarily break it in order to complete the sorting.
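
A minimal sketch of that in-count approach might look like the following (illustrative only; assumes using System, System.Collections.Generic and System.Linq, and that successors only returns nodes from the input list):

// Sketch: emit the node with the smallest remaining in-count each iteration;
// a nonzero minimum means a cycle is being broken arbitrarily.
static List<T> OrderWithCycleBreaking<T>(IList<T> nodes, Func<T, IEnumerable<T>> successors)
{
    var inCount = nodes.ToDictionary(node => node, node => 0);
    foreach (var node in nodes)
        foreach (var successor in successors(node))
            inCount[successor]++;

    var remaining = new HashSet<T>(nodes);
    var result = new List<T>(nodes.Count);
    while (remaining.Count > 0)
    {
        T next = remaining.OrderBy(node => inCount[node]).First();
        remaining.Remove(next);
        result.Add(next);

        // The chosen node no longer counts as an unvisited predecessor.
        foreach (var successor in successors(next))
            if (remaining.Contains(successor))
                inCount[successor]--;
    }
    return result;
}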

Motti
I'm using a topological sort right now, but it's technically too strong. I need some way to have "weak" groups of edges (one or more edges of the group holds in the final ordering).
280Z28
A: 

Since whether or not element X can appear next in the list depends not just on the last element in the list but also on previous elements, you are right to say that topological sorting is too strong. This is a more general search problem, so I would try a more general solution: either backtracking search or dynamic programming. The former can always be done, but is sometimes incredibly slow; the latter will lead to a much faster (but more memory-intensive) solution, but it requires that you can find a D.P. formulation of the problem, which isn't always possible.

j_random_hacker
Eventually, I'll be able to save the final sorted order. However, right now I'm adding and modifying the operations in a guided (by the sorting results) attempt to remove as many inputs as I possibly can.
280Z28
+2  A: 

This works for now. It will find an ordering if and only if one exists that meets the conditions. I haven't tried optimizing it yet. It works in reverse, keeping track of which items the operations placed earlier in the ordering are not allowed to add.

Edit: I can't mark this as an answer because I'm already taking a huge performance hit from it and I only have 17 operations (ITransforms). It takes 15s to sort right now lol/fail.

Added 8/23: Check the next code section for how I improved the sort to something that is actually workable once again.

Edit 8/25: Things got nasty again as I crossed 25 items, but it turned out I had a small problem in the pre-sort that's fixed now.

    // Entry point for the backtracking search (requires using System.Linq).
    // The initial prevent sets contain everything any transform removes, because
    // the ordering is built from back to front.
    private static ITransform[] OrderTransforms(ITransform[] source)
    {
        return OrderTransforms(
            new List<ITransform>(source),
            new Stack<ITransform>(),
            new HashSet<OpCode>(source.SelectMany(transform => transform.RemovedOpCodes)),
            source.Aggregate(InstructionSemantics.None, (preventedSemantics, transform) => preventedSemantics | transform.RemovedSemantics)
            );
    }

    private static ITransform[] OrderTransforms(List<ITransform> source, Stack<ITransform> selected, HashSet<OpCode> preventAdd, InstructionSemantics preventSemantics)
    {
        if (source.Count == 0 && preventAdd.Count == 0)
            return selected.ToArray();

        // Try each remaining transform as the operation placed just before the ones
        // already selected; skip any whose insertions are currently forbidden.
        for (int i = source.Count - 1; i >= 0; i--)
        {
            var transform = source[i];
            if ((preventSemantics & transform.InsertedSemantics) != 0)
                continue;

            if (preventAdd.Intersect(transform.InsertedOpCodes).Any())
                continue;

            selected.Push(transform);
            source.RemoveAt(i);
#if true
            // Earlier operations may add what this transform removes, but must not add anything in its Prevent sets.
            var result = OrderTransforms(source, selected, new HashSet<OpCode>(preventAdd.Except(transform.RemovedOpCodes).Union(transform.PreventOpCodes)), (preventSemantics & ~transform.RemovedSemantics) | transform.PreventSemantics);
#else // this is even slower:
            OpCode[] toggle = preventAdd.Intersect(transform.RemovedOpCodes).Union(transform.PreventOpCodes.Except(preventAdd)).ToArray();
            preventAdd.SymmetricExceptWith(toggle);
            var result = OrderTransforms(source, selected, preventAdd, (preventSemantics & ~transform.RemovedSemantics) | transform.PreventSemantics);
            preventAdd.SymmetricExceptWith(toggle);
#endif
            if (result != null)
                return result;

            // Backtrack and try the next candidate.
            source.Insert(i, transform);
            selected.Pop();
        }

        return null;
    }

In an effort to reclaim my bounty, I have reduced the sort time from 15380ms to 173ms by heuristically preconditioning the array and then immediately running the sort routine above on the result.

// Heuristic pre-sort (requires using System.Linq).
private static ITransform[] PreSortTransforms(ITransform[] source)
{
    // maps an opcode to the set of transforms that remove it
    ILookup<OpCode, ITransform> removals =
        source
        .SelectMany(transform => transform.RemovedOpCodes.Select(opcode => new
        {
            OpCode = opcode,
            Transform = transform
        }))
        .ToLookup(item => item.OpCode, item => item.Transform);

    // maps an opcode to the set of transforms that add it
    ILookup<OpCode, ITransform> additions =
        source
        .SelectMany(transform => transform.InsertedOpCodes.Select(opcode => new
        {
            OpCode = opcode,
            Transform = transform
        }))
        .ToLookup(item => item.OpCode, item => item.Transform);

    // maps a set of items (A) to a set of items (B), where ALL elements of B must come before SOME element of A
    ILookup<IEnumerable<ITransform>, ITransform> weakForwardDependencies =
        removals
        .SelectMany(item => additions[item.Key].Select(dependency => new
        {
            Transform = item,
            Dependency = dependency
        }))
        .ToLookup(item => item.Transform.AsEnumerable(), item => item.Dependency);

    /* For items in the previous map where set A only had one element, "somewhat" order the
     * elements of set B before it. The order doesn't [necessarily] hold when a key from one
     * relationship is a member of the values of another relationship.
     */
    var ordered =
        weakForwardDependencies
        // keep only relationships where exactly one transform removes the opcode
        .Where(dependencyMap => dependencyMap.Key.Count() == 1)
        .SelectMany(dependencyMap => dependencyMap.AsEnumerable());

    // Add the remaining transforms from the original array before the semi-sorted ones.
    ITransform[] semiSorted = source.Except(ordered).Union(ordered).ToArray();

    return semiSorted;
}
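
For completeness, combining the pre-sort with the backtracking search looks something like this (the wrapper method itself is illustrative):

// Illustrative wrapper: heuristic pre-sort first, then the exhaustive search above.
private static ITransform[] SortTransformsCombined(ITransform[] transforms)
{
    return OrderTransforms(PreSortTransforms(transforms));
}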
280Z28