Hi

I have a fairly general question about Java and regular expressions.

If we look at embedded use, say mobile phones with J2ME or Android, how common is it for regex support to be included, and how resource-hungry is it?

I mean, regular expressions are a powerful beast, and a lot of magic happens in the background to make them work. My question is whether there is too much magic, or whether it is safe to use them with care (like most things).

Thanks, Johan


Update: Thanks DigitalRoss for pointing out that java.util.regex is a part of android.

+3  A: 

Something about regex solutions bothers me too, probably because I've seen too many code-golf answers built on a regex that barely works for the example case.

But they rule, and I love them in vi(1).

java.util.regex is certainly a part of android.
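Since java.util.regex is available, one common way to keep its cost down is to compile each pattern once and reuse it. According to the Javadoc, `str.matches(regex)` behaves like `Pattern.matches(regex, str)`, which compiles the pattern on every call, and the `Pattern` docs recommend compiling once when a pattern is used repeatedly. The class and method names below (`PatternReuse`, `lookup`) and the key=value format are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternReuse {
    // Compiled once. Pattern objects are immutable and safe to share between threads.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    /** Returns the integer value for key in text like "a=1 b=2", or -1 if the key is absent. */
    public static int lookup(String text, String key) {
        // A Matcher is cheap to create but is NOT thread-safe, so make one per call.
        Matcher m = KEY_VALUE.matcher(text);
        while (m.find()) {
            if (m.group(1).equals(key)) {
                return Integer.parseInt(m.group(2));
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(lookup("a=1 b=22 c=3", "b"));
        System.out.println(lookup("a=1", "z"));
    }
}
```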

Regular expressions use memory dynamically. Most don't use very much, but here is one that can potentially use a lot. It apparently originated in Perl, but these days it mostly circulates as a Ruby one-liner:

ruby -wle 'puts "Prime" unless ("1" * ARGV[0].to_i) =~ /^1$|^(11+?)\1+$/'  THENUMBER

So, say something like:

ruby -wle 'puts "Prime" unless ("1" * ARGV[0].to_i) =~ /^1$|^(11+?)\1+$/' 8191

Yes, it's a regular expression that tests primality.
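For a Java view of the same trick, here is a rough port using java.util.regex with the identical pattern. The number n is written in unary; `^1$` catches n = 1, and `^(11+?)\1+$` matches when the string splits into two or more equal blocks of at least two 1s, i.e. when n is composite. The backreference `\1` is what makes the engine backtrack through every candidate divisor, so the work grows with n; it's a curiosity, not a practical prime test. The class name `RegexPrime` and the guard for n < 1 are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class RegexPrime {
    // Same pattern as the Ruby one-liner: "1" alone, or a block of two or more
    // 1s (tried shortest first) repeated at least twice, i.e. a composite length.
    private static final Pattern ONE_OR_COMPOSITE = Pattern.compile("^1$|^(11+?)\\1+$");

    /** True when n, written in unary, does NOT match the pattern. */
    public static boolean isPrime(int n) {
        if (n < 1) {
            return false; // guard added here: 0 and negatives are not prime
        }
        char[] unary = new char[n];
        Arrays.fill(unary, '1');
        return !ONE_OR_COMPOSITE.matcher(new String(unary)).matches();
    }

    public static void main(String[] args) {
        // expected: 2 3 5 7 11 13 17 19 23 29
        for (int n = 1; n <= 30; n++) {
            if (isPrime(n)) {
                System.out.print(n + " ");
            }
        }
        System.out.println();
    }
}
```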

DigitalRoss
+5  A: 

Regex is a programming language -- it's a way of defining a finite state machine, and there's really no upper limit to the complexity of that FSM beyond your own sanity.

It's not "magic" - you can understand how RE matching works behind the scenes, and once you do that, you'll be in control of how resource-hungry your REs are.

Simple REs are very cheap, but it's possible to write expensive REs that have to look ahead and do lots of backtracking.
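To make "what happens behind the scenes" concrete, here is a sketch: a Java port of Rob Pike's tiny backtracking matcher from Kernighan and Pike's "The Practice of Programming". It supports only literal characters, `.`, `^`, `$` and `*`. The step counter is an addition for this illustration and is not part of the original; it counts recursive calls so you can see how much more backtracking `a*a*a*b` causes than `a*b` on the same failing input. java.util.regex is a far more elaborate engine, so treat this as a teaching toy rather than a model of its actual cost.

```java
/** Toy backtracking matcher (after Rob Pike): c, '.', '^', '$', '*'. */
public class TinyMatcher {
    private long steps; // number of matchHere calls during the last match()

    public long steps() {
        return steps;
    }

    /** Searches for re anywhere in text. */
    public boolean match(String re, String text) {
        steps = 0;
        if (!re.isEmpty() && re.charAt(0) == '^') {
            return matchHere(re, 1, text, 0);
        }
        // Try every start position, including the empty suffix at the end.
        for (int ti = 0; ti <= text.length(); ti++) {
            if (matchHere(re, 0, text, ti)) {
                return true;
            }
        }
        return false;
    }

    /** Matches re[ri..] at the start of text[ti..]. */
    private boolean matchHere(String re, int ri, String text, int ti) {
        steps++;
        if (ri == re.length()) {
            return true;
        }
        if (ri + 1 < re.length() && re.charAt(ri + 1) == '*') {
            return matchStar(re.charAt(ri), re, ri + 2, text, ti);
        }
        if (re.charAt(ri) == '$' && ri + 1 == re.length()) {
            return ti == text.length();
        }
        if (ti < text.length() && (re.charAt(ri) == '.' || re.charAt(ri) == text.charAt(ti))) {
            return matchHere(re, ri + 1, text, ti + 1);
        }
        return false;
    }

    /** c* followed by re[ri..]: try zero c's first, then one more each time (backtracking). */
    private boolean matchStar(char c, String re, int ri, String text, int ti) {
        do {
            if (matchHere(re, ri, text, ti)) {
                return true;
            }
        } while (ti < text.length() && (text.charAt(ti++) == c || c == '.'));
        return false;
    }

    public static void main(String[] args) {
        TinyMatcher m = new TinyMatcher();
        String input = "aaaaaaaaaaaa"; // 12 'a's and no 'b', so both searches fail
        System.out.println("a*b:     " + m.match("a*b", input) + ", steps=" + m.steps());
        System.out.println("a*a*a*b: " + m.match("a*a*a*b", input) + ", steps=" + m.steps());
    }
}
```

Both patterns describe the same set of strings, but each extra `*` multiplies the ways the matcher can divide the run of `a`s before giving up, which is the kind of cost the answer warns about.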

I thoroughly recommend Jeff Friedl's "Mastering Regular Expressions". It's not just for Perl, and you don't have to grind through the whole thing to lose the idea that RE is magic and learn that it's a programming language you can optimise (or, indeed, write poorly performing code in).

slim
+1 for the language inside the language description.
Johan
+1  A: 

I think REs can be faster than any ad hoc string search/replace solution you could come up with, so I would go ahead with REs. However, as many have pointed out, a badly built regex, or one used in a situation where it shouldn't be, can be bad.
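As a small side-by-side, here is one rule ("the string is one or more ASCII digits") written both as a regex and as an ad hoc loop. Which one is faster depends on the pattern, the input, and the VM, so it is worth measuring on the target device rather than assuming either way. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class DigitsCheck {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    /** Regex version: one or more ASCII digits and nothing else. */
    public static boolean allDigitsRegex(String s) {
        return DIGITS.matcher(s).matches();
    }

    /** Ad hoc version: the same rule as a hand-written loop. */
    public static boolean allDigitsLoop(String s) {
        if (s.isEmpty()) {
            return false; // "+" in the regex requires at least one digit
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"", "0", "12345", "12a45", " 42"};
        for (String s : samples) {
            System.out.println("\"" + s + "\": regex=" + allDigitsRegex(s) + " loop=" + allDigitsLoop(s));
        }
    }
}
```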

rohit.arondekar
@rohit: actually, a badly designed regex (or a regex used in the wrong situation) can be *disastrously* worse than an ad hoc solution.
Stephen C
I honestly didn't know - edited my answer.
rohit.arondekar
+1 for the new answer :)
Johan
+1  A: 

Most definitely they are faster than an "ad hoc string search/replace solution".

As well, I would think they are included in CLDC 1.1/MIDP 2.0, so (if you find them there) we can conclude that their footprint is negligible, and that they are probably implemented in an optimized, built-in way, making them virtually free to use.

I now use .split("\\p{Cntrl}") routinely to store data to disk and recover it; it seems to be a built-in, no-cost tool.
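A sketch of that store/recover idea, assuming records are written as fields joined by a single control character. In Java source, `"\\p{Cntrl}"` is the regex `\p{Cntrl}`, which matches `[\x00-\x1F\x7F]`; that means tabs, newlines and carriage returns also count as separators, so fields must not contain them. The separator choice (U+001F) and the `pack`/`unpack` names are assumptions for illustration. Passing a limit of -1 to `Pattern.split` keeps trailing empty fields, which the default `split` drops.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CntrlRecords {
    // Hypothetical separator: U+001F (unit separator), one of the chars \p{Cntrl} matches.
    static final String SEP = String.valueOf((char) 0x1F);
    // Compiled once; "\\p{Cntrl}" in Java source is the regex \p{Cntrl}.
    static final Pattern CNTRL = Pattern.compile("\\p{Cntrl}");

    /** Joins fields into one record string. Fields must not contain control chars. */
    static String pack(List<String> fields) {
        return String.join(SEP, fields);
    }

    /** Splits a record back into fields; limit -1 keeps trailing empty fields. */
    static List<String> unpack(String record) {
        return Arrays.asList(CNTRL.split(record, -1));
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("alice", "", "42", "");
        String record = pack(fields);
        System.out.println(unpack(record));
    }
}
```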

Nicholas Jordan