Say I'm running a service where users can submit a regex to search through lots of data. If a user submits a regex that is very slow (i.e. Matcher.find() takes minutes to return), I want a way to cancel that match. The only way I can think of is to have another thread monitor how long the match is taking and call Thread.stop() to cancel it if necessary.

Member variables:

long REGEX_TIMEOUT = 30000L;
Object lock = new Object();
boolean finished = false;
Thread matcherThread;

Matcher thread:

try {
    matcherThread = Thread.currentThread();

    // imagine code to start monitor thread is here

    try {
        matched = matcher.find();
    } finally {
        synchronized (lock) {
            finished = true;
            lock.notifyAll();
        }
    }
} catch (ThreadDeath td) {
    // send angry message to client
    // handle error without rethrowing td
}

Monitor thread:

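synchronized (lock) {
    // wait() may return before the timeout elapses, so track an absolute deadline
    long deadline = System.currentTimeMillis() + REGEX_TIMEOUT;
    while (! finished) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
            matcherThread.stop();
            break;
        }
        try {
            lock.wait(remaining);
        } catch (InterruptedException ex) {
            // ignore, top level method in dedicated thread, etc..
        }
    }
}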

I've read java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html and I think this usage is safe: I control where ThreadDeath is thrown via synchronisation and handle it, and the only objects that could be damaged are my Pattern and Matcher instances, which will be discarded anyway. I think this breaks the Thread.stop() contract because I'm not rethrowing the error, but I don't really want the thread to die, just to abort the find() method.

I've managed to avoid using these deprecated API components so far, but Matcher.find() does not seem to be interruptible and can take a very long time to return. Is there any better way to do this?

+10  A: 

From Heritrix: (crawler.archive.org)

/**
 * CharSequence that notices thread interrupts -- as might be necessary
 * to recover from a loose regex on unexpected challenging input.
 * 
 * @author gojomo
 */
public class InterruptibleCharSequence implements CharSequence {
    CharSequence inner;
    // public long counter = 0; 

    public InterruptibleCharSequence(CharSequence inner) {
        super();
        this.inner = inner;
    }

    public char charAt(int index) {
        if (Thread.interrupted()) { // clears flag if set
            throw new RuntimeException(new InterruptedException());
        }
        // counter++;
        return inner.charAt(index);
    }

    public int length() {
        return inner.length();
    }

    public CharSequence subSequence(int start, int end) {
        return new InterruptibleCharSequence(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

Wrap your CharSequence with this one and Thread interrupts will work ...
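
To illustrate how the wrapper might be driven with a timeout, here is a minimal sketch. It is not from Heritrix: the RegexTimeoutDemo class and the findWithTimeout helper are hypothetical names, and the nested class is just a condensed copy of the wrapper above so that the example compiles on its own. The match runs on a worker thread via an ExecutorService; if no result arrives in time, Future.cancel(true) interrupts the worker and the next charAt() call aborts the match.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

public class RegexTimeoutDemo {

    /** Condensed copy of the wrapper shown above, repeated so this sketch is self-contained. */
    public static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        public InterruptibleCharSequence(CharSequence inner) {
            this.inner = inner;
        }

        public char charAt(int index) {
            if (Thread.interrupted()) { // clears the flag if set
                throw new RuntimeException(new InterruptedException());
            }
            return inner.charAt(index);
        }

        public int length() {
            return inner.length();
        }

        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /**
     * Hypothetical helper: runs find() on a worker thread.
     * Returns the find() result, or an empty Optional if the match
     * was abandoned because it exceeded timeoutMillis.
     */
    public static Optional<Boolean> findWithTimeout(Pattern pattern, CharSequence input,
                                                    long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Boolean> task =
                () -> pattern.matcher(new InterruptibleCharSequence(input)).find();
        Future<Boolean> result = executor.submit(task);
        try {
            return Optional.of(result.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the worker; its next charAt() throws
            return Optional.empty();
        } catch (InterruptedException e) {
            result.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A quick match completes well within the timeout.
        System.out.println(findWithTimeout(Pattern.compile("b+c"), "aaabbbc", 1000));

        // A pattern that tends to backtrack heavily on input like this;
        // how slow it actually is depends on the JVM's regex implementation.
        StringBuilder xs = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            xs.append('x');
        }
        System.out.println(findWithTimeout(Pattern.compile("(x+x+)+y"), xs, 200));
    }
}
```

Creating an executor per call keeps the sketch short; a real service would more likely share a pool. Note that the worker only stops once the regex engine next calls charAt(), which in practice happens very frequently during backtracking.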

Kris
+1 for clever hack to implement a missing feature!
Aaron Digulla
It would be slightly faster if you moved the exception bit out of charAt, although the real problem is likely to be inefficient patterns rather than large target text.
Tom Hawtin - tackline
VERY clever.... I would +5 if I could....
Jared
A: 

You have found a case where reality trumps documentation. Thread.stop() is the way to go.

Alternatively, you can copy the source of Java's regex package (java.util.regex) and add a couple of Thread.currentThread().isInterrupted() checks, but that seems like overkill.

Aaron Digulla
A: 

Another workaround would be to limit the region of the matcher, then call find(), repeating until the thread is interrupted or a match is found.
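
A rough sketch of that idea, under some assumptions of mine: the ChunkedFind class, the findChunked helper and the chunkSize parameter are hypothetical names, and cancellation is reported with an unchecked CancellationException. The matcher's region is moved forward one window at a time and the interrupt flag is checked between windows. Two limitations are worth stating: a match that straddles a window boundary is missed, and a single find() that is slow within one window still cannot be interrupted.

```java
import java.util.concurrent.CancellationException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkedFind {

    /**
     * Hypothetical helper: searches input one window of chunkSize characters
     * at a time. Returns the start index of the first match that lies
     * entirely inside a window, or -1 if there is none.
     * Throws CancellationException if the calling thread is interrupted
     * between windows (the interrupt flag is left set).
     */
    public static int findChunked(Pattern pattern, CharSequence input, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Matcher matcher = pattern.matcher(input);
        // Keep ^ and $ from matching at artificial window edges, and let
        // lookahead/lookbehind see text outside the current window.
        matcher.useAnchoringBounds(false);
        matcher.useTransparentBounds(true);

        int length = input.length();
        int start = 0;
        do {
            if (Thread.currentThread().isInterrupted()) {
                throw new CancellationException("regex search cancelled");
            }
            int end = Math.min(start + chunkSize, length);
            matcher.region(start, end); // restrict the next find() to this window
            if (matcher.find()) {
                return matcher.start();
            }
            start = end;
        } while (start < length);
        return -1;
    }

    public static void main(String[] args) {
        // Three windows: [0,4), [4,8), [8,10); the match is in the last one.
        System.out.println(findChunked(Pattern.compile("c"), "aaaabbbbcc", 4)); // prints 8
    }
}
```

Smaller windows give more frequent interrupt checks but miss more boundary-spanning matches, so this only suits patterns whose matches are known to be short or aligned with the chunking.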

erickson