I have a function that uses Pattern.compile and a Matcher to search a list of strings for a pattern. This function is used in multiple threads. Each thread has a unique pattern, which is passed to Pattern.compile when the thread is created. The number of threads and patterns is dynamic, meaning that I can add more patterns and threads during configuration.

Do I need to mark this function "synchronized" if it uses regex? Is regex in Java thread safe?

TIA

+2  A: 

While you need to remember that thread safety has to take into account the surrounding code as well, you appear to be in luck. The fact that Matchers are created using the Pattern's matcher factory method and lack public constructors is a positive sign. Likewise, you use the compile static method to create the encompassing Pattern.

So, in short, if you do something like the example:

Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();

you should be doing pretty well.

Follow-up to the code example for clarity: note that this example strongly implies that the Matcher stays local to the thread that created it, along with the Pattern and the test. I.e., you should not expose the Matcher thus created to any other threads.

Frankly, that's the risk of any thread-safety question. The reality is that any code can be made thread-unsafe if you try hard enough. Fortunately, there are wonderful books that teach us a whole bunch of ways that we could ruin our code. If we stay away from those mistakes, we greatly reduce our own probability of threading problems.

Bob Cross
what does this have to do with thread safety?
Jason S
@Jason S: thread locality is one very straightforward way to achieve thread safety even if the internal code isn't thread safe. If only one thread could ever possibly access a particular method at a time, you've enforced thread safety externally.
Bob Cross
ok, so you are just saying that re-creating a pattern from a string at the point of use, is better than storing it to be efficient, at the risk of dealing with concurrency issues? i'll grant you that. I was confused with that sentence about factory methods and public constructors, that seems like a red herring w/r/t this topic.
Jason S
@Jason S, no, the factory methods and lack of constructors are some of the ways that you can reduce the threat of coupling with other threads. If the only way you can get the Matcher that goes with my Pattern is via p.matcher(), nobody else can side-effect my Matcher. However, I can still cause trouble for myself: if I have a public method that returns that Matcher, another thread could get at it and side-effect it. In short, concurrency is hard (in ANY language).
Bob Cross
+2  A: 

Thread-safety with regular expressions in Java

SUMMARY:

The Java regular expression API has been designed to allow a single compiled pattern to be shared across multiple match operations.

You can safely call Pattern.matcher() on the same pattern from different threads and safely use the matchers concurrently. Pattern.matcher() is safe to construct matchers without synchronization. Although the method isn't synchronized, internal to the Pattern class, a volatile variable called compiled is always set after constructing a pattern and read at the start of the call to matcher(). This forces any thread referring to the Pattern to correctly "see" the contents of that object.

On the other hand, you shouldn't share a Matcher between different threads. Or at least, if you ever did, you should use explicit synchronization.
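
To illustrate the summary above, here is a minimal sketch of one Pattern shared by several threads, with each thread creating its own Matcher. The class name SharedPatternDemo, its method names, and the "a*b" pattern are made up for the example and are not from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: one shared Pattern, one Matcher per call.
final class SharedPatternDemo {
    // Compiled once and shared; Pattern instances are immutable.
    private static final Pattern SHARED = Pattern.compile("a*b");

    // Counts the inputs that fully match. The Matcher is a local variable,
    // so it is never visible to any other thread.
    static long countMatches(List<String> inputs) {
        long count = 0;
        for (String s : inputs) {
            Matcher m = SHARED.matcher(s);
            if (m.matches()) {
                count++;
            }
        }
        return count;
    }

    // Runs countMatches on several threads at once and sums the results.
    static long countConcurrently(final List<String> inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                Callable<Long> task = () -> countMatches(inputs);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

No synchronization is needed here because the only shared object is the Pattern; the inputs list is only read.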

adatapost
do you have a reference for this quote?
akf
@akf I found a near direct quote here: http://www.javamex.com/tutorials/regular_expressions/thread_safety.shtml I don't know if that's the original source.
Bob Cross
cool, thanks
akf
@akf, BTW, you should note that that's a discussion site (much like this one). I'd consider anything you find there no better or worse than information that you'd find here (i.e., it isn't The One True Word From James Gosling).
Bob Cross
I apologize for not adding a link to my post.
adatapost
+8  A: 

From the Java API documentation for the Pattern class:

Instances of this (Pattern) class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use.

In performance-critical code, try reusing a Matcher by calling its reset() method instead of creating new instances. This resets the state of the Matcher instance, making it usable for the next regex operation. In fact, it is the state maintained in the Matcher instance that makes it unsafe for concurrent access.
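
A rough sketch of the reset() idea, assuming the Matcher is only ever used by the thread that created it. MatcherResetDemo and countFullMatches are hypothetical names; Matcher.reset(CharSequence) is the standard overload that also swaps in a new input.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: reuse a single Matcher within one thread.
final class MatcherResetDemo {
    // Counts the inputs that fully match the given pattern.
    static int countFullMatches(Pattern pattern, List<String> inputs) {
        // One Matcher for the whole loop; it never leaves this method.
        Matcher m = pattern.matcher("");
        int count = 0;
        for (String s : inputs) {
            m.reset(s); // discard the old state and set the new input
            if (m.matches()) {
                count++;
            }
        }
        return count;
    }
}
```

Whether this is measurably faster than calling pattern.matcher(s) each time depends on the workload, so it is worth profiling before relying on it.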

Vineet Reynolds
Pattern objects are thread safe, but the `compile()` method might not be. There have been two or three bugs over the years that caused compilation to fail in multithreaded environments. I would recommend doing the compilation in a synchronized block.
Alan Moore
Yes, there have been concurrency bugs raised in the Pattern class, and your advice of synchronized access is appreciated. However, the original developers of the Pattern class intended it to be thread safe, and that is the contract that any Java programmer should be able to rely on. To be frank, I'd rather have thread local variables and accept the minimal performance hit than rely on thread safe behavior by contract (unless I've seen the code). As they say "Threading is easy, correct synchronization is hard".
Vineet Reynolds
A: 

A quick look at the code for Matcher.java shows a bunch of member variables including the text that is being matched, arrays for groups, a few indexes for maintaining location and a few booleans for other state. This all points to a stateful Matcher that would not behave well if accessed by multiple Threads. So does the JavaDoc:

Instances of this class are not safe for use by multiple concurrent threads.

This is only an issue if, as @Bob Cross points out, you go out of your way to allow use of your Matcher in separate Threads. If you need to do this, and you think that synchronization will be an issue for your code, an option you have is to use a ThreadLocal storage object to maintain a Matcher per working thread.
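
One way the ThreadLocal approach might look, as a sketch only. It assumes a single fixed pattern ("a*b") for brevity, whereas the question has one pattern per thread; ThreadLocalMatcherDemo and its members are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: one Matcher per thread via ThreadLocal.
final class ThreadLocalMatcherDemo {
    // Shared, immutable Pattern (assumed fixed for this example).
    private static final Pattern PATTERN = Pattern.compile("a*b");

    // Each thread lazily gets its own Matcher on first access.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> PATTERN.matcher(""));

    // Resets the calling thread's Matcher to the new input and tests it.
    static boolean matches(CharSequence input) {
        return MATCHER.get().reset(input).matches();
    }
}
```

Callers just invoke ThreadLocalMatcherDemo.matches(text) from any thread; the Matcher itself is never handed out, so no other thread can touch it.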

akf