views:

144

answers:

1

I am searching for an efficient a technique to find a sequence of Op occurences in a Seq[Op]. Once an occurence is found, I want to replace the occurence with a defined replacement and run the same search again until the list stops changing.

Scenario:

I have three types of Op case classes. Pop() extends Op, Push() extends Op and Nop() extends Op. I want to replace the occurence of Push(), Pop() with Nop(). Basically the code could look like seq.replace(Push() ~ Pop() ~> Nop()).

Problem:

Now that I call seq.replace(...) I will have to search in the sequence for an occurence of Push(), Pop(). So far so good. I find the occurence. But now I will have to splice the occurence form the list and insert the replacement.

Now there are two options. My list could be mutable or immutable. If I use an immutable list I am scared regarding performance because those sequences are usually 500+ elements in size. If I replace a lot of occurences like A ~ B ~ C ~> D ~ E I will create a lot of new objects If I am not mistaken. However I could also use a mutable sequence like ListBuffer[Op].

Basically from a linked-list background I would just do some pointer-bending and after a total of four operations I am done with the replacement without creating new objects. That is why I am now concerned about performance. Especially since this is a performance-critical operation for me.

Question:

How would you implement the replace() method in a Scala fashion and what kind of data structure would you use keeping in mind that this is a performance-critical operation?

I am happy with answers that point me in the right direction or pseudo code. No need to write a full replace method.

Thank you.

+4  A: 

Ok, some considerations to be made. First, recall that, on lists, tail does not create objects, and prepending (::) only creates one object for each prepended element. That's pretty much as good as you can get, generally speaking.

One way of doing this would be this:

def myReplace(input: List[Op], pattern: List[Op], replacement: List[Op]) = {
  // This function should be part of an KMP algorithm instead, for performance
  def compare(pattern: List[Op], list: List[Op]): Boolean = (pattern, list) match {
    case (x :: xs, y :: ys) if x == y => compare(xs, ys)
    case (Nil, Nil)                   => true
    case _                            => false
  }

  var processed: List[Op] = Nil
  var unprocessed: List[Op] = input
  val patternLength = pattern.length
  val reversedReplacement = replacement.reverse

  // Do this until we finish processing the whole sequence
  while (unprocessed.nonEmpty) {

    // This inside algorithm would be better if replaced by KMP

    // Quickly process non-matching sequences
    while (unprocessed.nonEmpty && unprocessed.head != pattern.head) {
      processed ::= unprocessed.head
      unprocessed = unprocessed.tail
    }

    if (unprocessed.nonEmpty) {
      if (compare(pattern, unprocessed)) {
        processed :::= reversedReplacement
        unprocessed = unprocessed drop patternLength
      } else {
      processed ::= unprocessed.head
      unprocessed = unprocessed.tail
      }          
    }
  }

  processed.reverse
}

You may gain speed by using KMP, particularly if the pattern searched for is long.

Now, what is the problem with this algorithm? The problem is that it won't test if the replaced pattern causes a match before that position. For instance, if I replace ACB with C, and I have an input AACBB, then the result of this algorithm will be ACB instead of C.

To avoid this problem, you should create a backtrack. First, you check at which position in your pattern the replacement may happen:

val positionOfReplacement = pattern.indexOfSlice(replacement)

Then, you modify the replacement part of the algorithm this:

      if (compare(pattern, unprocessed)) {
        if (positionOfReplacement > 0) {
          unprocessed :::= replacement
          unprocessed :::= processed take positionOfReplacement
          processed = processed drop positionOfReplacement 
        } else {
          processed :::= reversedReplacement
          unprocessed = unprocessed drop patternLength
        }
      } else {

This will backtrack enough to solve the problem.

This algorithm won't deal efficiently, however, with multiply patterns at the same time, which I guess is where you are going. For that, you'll probably need some adaptation of KMP, to do it efficiently, or, otherwise, use a DFA to control possible matchings. It gets even worse if you want to match both AB and ABC.

In practice, the full blow problem is equivalent to regex match & replace, where the replace is a function of the match. Which means, of course, you may want to start looking into regex algorithms.

EDIT

I was forgetting to complete my reasoning. If that technique doesn't work for some reason, then my advice is going with an immutable tree-based vector. Tree-based vectors enable replacement of partial sequences with low amount of copying.

And if that doesn't do, then the solution is doubly linked lists. And pick one from a library with slice replacement -- otherwise you may end up spending way too much time debugging a known but tricky algorithm.

Daniel
Thank you Daniel, a throughout informative answer.
Joa Ebert