views: 231
answers: 5

I want to compare a URI string against different patterns in Java, and I want the fastest code possible.

Should I use:

if(uri.contains("/br/fab") || uri.contains("/br/err") || uri.contains("/br/sts"))

Or something like:

if(uri.matches(".*/br/(fab|err|sts).*"))

Note that I may have many more URIs, and this method is called very often.

Which of these two options is the better choice?

+1  A: 

I would expect contains() to be faster since it won't have to compile and iterate through a (relatively) complex regular expression, but rather simply look for a sequence of characters.

But (as with all optimisations) you should measure this. Your particular situation may impact results, to a greater or lesser degree.
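For instance, a rough hand-rolled micro-benchmark along these lines (the class name, sample URIs and iteration counts below are made up for illustration, and naive timing loops like this are easily distorted by the JIT, so treat the numbers as ballpark only):

import java.util.regex.Pattern;

public class UriMatchBenchmark {

    private static final Pattern PATTERN = Pattern.compile(".*/br/(fab|err|sts).*");

    // Made-up sample URIs; replace them with realistic ones from your application.
    private static final String[] URIS = {
        "http://example.com/br/fab",
        "http://example.com/br/err",
        "http://example.com/br/sts",
        "http://example.com/other/path"
    };

    public static void main(String[] args) {
        int sink = 0;

        // Warm up the JIT before timing anything.
        for (int i = 0; i < 100000; i++) {
            sink += runContains();
            sink += runRegex();
        }

        long start = System.nanoTime();
        for (int i = 0; i < 1000000; i++) {
            sink += runContains();
        }
        System.out.println("contains(): " + (System.nanoTime() - start) / 1000000 + " ms");

        start = System.nanoTime();
        for (int i = 0; i < 1000000; i++) {
            sink += runRegex();
        }
        System.out.println("regex:      " + (System.nanoTime() - start) / 1000000 + " ms");

        // Print the accumulated result so the JIT cannot discard the work entirely.
        System.out.println("(checksum: " + sink + ")");
    }

    private static int runContains() {
        int hits = 0;
        for (String uri : URIS) {
            if (uri.contains("/br/fab") || uri.contains("/br/err") || uri.contains("/br/sts")) {
                hits++;
            }
        }
        return hits;
    }

    private static int runRegex() {
        int hits = 0;
        for (String uri : URIS) {
            if (PATTERN.matcher(uri).matches()) {
                hits++;
            }
        }
        return hits;
    }
}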

Furthermore, is this known to be causing you grief (wrt. performance) ? If not, I wouldn't worry about it too much, and choose the most appropriate solution for your requirements regardless of performance issues. Premature optimisation will cause you an inordinate amount of grief if you let it!

Brian Agnew
+3  A: 

If you're going to use a regular expression, create it up-front and reuse the same Pattern object:

private static final Pattern pattern = Pattern.compile(".*/br/(fab|err|sts).*");

Do you actually need the ".*" at each end? I wouldn't expect it to be required, if you use Matcher.find().
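For instance, a minimal sketch (the field and method names are placeholders, not from the question):

private static final Pattern BR_PATTERN = Pattern.compile("/br/(fab|err|sts)");

private static boolean matchesBr(String uri) {
    // find() looks for the pattern anywhere in the input,
    // so the leading and trailing ".*" needed by matches() can be dropped.
    return BR_PATTERN.matcher(uri).find();
}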

Which is faster? The easiest way to find out is to measure it against some sample data, with samples as realistic as possible. (The fastest solution may very well depend on your actual data.)

Are you already sure this is a bottleneck though? If you've already measured the code enough to find out that it's a bottleneck, I'm surprised you haven't just tried both already. If you haven't verified that it's a problem, that's the first thing to do before worrying about the "fastest code possible".

If it's not a bottleneck, I would personally opt for the non-regex version unless you're a regex junkie. Regular expressions are very powerful, but also very easy to get wrong.

Jon Skeet
Let's say it's a theory question. Optimization can be interesting even if it's not a bottleneck.
Robert Fraser
@Robert: It can be, but this is likely to depend on the actual data. You could come up with a test showing one approach to be faster - and then with the OP's real data you could get the other answer. I think it's actually more important to learn the lesson that you should check for bottlenecks before you assume you need the "fastest code possible".
Jon Skeet
+2  A: 

They're both fast enough to be over before you know it. I'd go for the one that you can read more easily.

Ewan Todd
A: 

If the bit you are trying to match against is always at the beginning, or end, or is in some other way predictable then: neither!

For example, if URLs are like http://example.com/br/fab or http://example.com/br/err all the time, then you could store "br/fab", "br/err", etc. in a HashSet or similar, and then, given an incoming URL, chop off the last part of it and query the set to see if it contains it. This will scale better than either method you gave (with a HashSet, lookups should get no slower no matter how many entries there are).
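A minimal sketch of that idea (the class name and the way the trailing segments are extracted are my own reading of the suggestion, not code from the post):

import java.util.HashSet;
import java.util.Set;

public class UriSuffixLookup {

    // Illustrative entries; register whatever suffixes you actually need to match.
    private static final Set<String> SUFFIXES = new HashSet<String>();
    static {
        SUFFIXES.add("br/fab");
        SUFFIXES.add("br/err");
        SUFFIXES.add("br/sts");
    }

    static boolean matches(String uri) {
        // Take the last two path segments, e.g. "br/fab" out of "http://example.com/br/fab",
        // and look them up in the set.
        int last = uri.lastIndexOf('/');
        if (last <= 0) {
            return false;
        }
        int secondLast = uri.lastIndexOf('/', last - 1);
        if (secondLast < 0) {
            return false;
        }
        return SUFFIXES.contains(uri.substring(secondLast + 1));
    }
}

The set lookup stays constant-time as the list of suffixes grows, which is the scaling argument above.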

If you do need to match against substrings appearing in arbitrary locations... it depends on what you mean by "a lot more". One thing you should do, regardless of the specifics of the problem, is try things out and benchmark them!

ZoFreX
If it's only a few entries, using a HashSet would be slower than just using "endsWith" three times... A HashSet would scale well in terms of the number of entries to match *against*, but there's no indication that there will be a large number of these. It won't scale any better than the other methods for large numbers of URIs to check.
Jon Skeet
I was under the impression he meant that he might be matching against many more entries. Yes, if it's just lots of URIs and few entries, any method will scale linearly.
ZoFreX
+1  A: 

OK, as many of you suggested, I've done a test and contains() is faster. As Ewan Todd said, they're both fast enough that it's not really worth worrying about.

Thank you everyone.

Mike