views:

296

answers:

8

Is there any regular expression that will, for some input string, search for a match forever?

+1  A: 

Not in the sense you are describing, you can have some very inefficient regular expressions that take up loads of resources and end up killing the regex engine, this is not the same as halting.

I don't think halting really applies here, as the other commenters of this post have so astutely pointed out. http://en.wikipedia.org/wiki/Halting_problem

Robert Greiner
There isn't a way to make a program that, _for every possible program_ will tell you if it halts or not. But that doesn't mean you can't do that for a subset. Maybe regexes are one such subset, but I don't know.
obvio171
Referring to the halting problem here is not very useful; the algorithm used for RE matching is a particular algorithm, the interesting thing about the halting problem is to solve it for all program-input pairs.
Sinan Taifour
(wow! exact same second!)
Sinan Taifour
yeah, that's what I was trying to say in my answer.
Robert Greiner
+1  A: 

According to this question, every regular expression halts.

obvio171
+2  A: 

I imagine, it is not possible to find a regular expression that doesn't halt.

The size of your input is finite. The maximum size of any matched subgroup of the regular expression is, at max, the size of your input.

Unless the algorithm being used is pretty stupid (going over cases multiple times), the number of matched subgroups, will too, be finite.

So, it will halt.

Sinan Taifour
A: 

I can't imagine an input string that would be parsed forever, although an infinitely long string would be parsed forever. Given that a regular expression can describe a regular language, which is potentially an infinite set of words, then a regex can describe a language of infinite words, including words of infinite length. However, no input string can be infinitely long, so at some point it would have to halt.

For example, if a*b is accepted in the language, and you have an infinitely long string of 'a's, then yes, the regex would never halt. Practically, though, this is impossible.

bkritzer
+6  A: 

Formal regex is actually a method of describing a deterministic finite automaton for parsing strings. The regex "matches" if the DFA winds up in an accepting state at the end of input. Since the DFA reads its input sequentially, it will always halt when it reaches the end of the input, and whether or not there is a match is merely a matter of examining which state of the DFA it halts at.

Substring matching is effectively the same, except instead of being forced to halt at the end of one read-through of the string, the DFA would instead be forced to halt after reading each possible substring once - still a finite case. (Yes, most regex engines implement this in a bit more optimized manner than just throwing every possible substring at a DFA - but conceptually it the limit is still there).

Thus the only possible case in which the DFA would not halt is if the input were infinite, which is generally considered beyond the scope of the halting problem.

Amber
+23  A: 

For a finite input, there is no formal regular expression that will not halt.

Any formal regular expression can be translated into a Deterministic Finite Automata. A DFA reads the input one character at a time, and, at the end of the input, you are either in an accepting state or in a non-accepting state. If the state is accepting, then the input matches the regular expression. Otherwise, it doesn't.

Now, most "regular expression" libraries support things which are not regular expressions, such as back references. As long as you keep away from those features, and have a finite input, you are guaranteed a halt. If you don't... depending on exactly what you are using, you might well not be guaranteed a halt. Perl allows arbitrary code to be inserted, for instance, and arbitrary, turing-machine equivalent code is not guaranteed to halt.

Now, if the input is infinite, then trivial regular expressions can be found which will never halt. For example, ".*".

Daniel
+1 for mentioning back-references.
Brian
The only quibble: they are called deterministic finite automata, not definite. To contrast with (ironically, equivelant) non-deterministic finite automata.
Agor
@Agor: I *hate* it when I do that. I'm well aware of the correct name, but I always type the wrong name for some reasons. :-(
Daniel
A: 

Yes.

A regular expression can be represented by a finite state machine. Every time you receive an atomic input, it will cause any well-defined FSM to transition to a known state.

The exception is when you have infinite input, but this is not applicable to the halting problem because it deals with finite input. When you have a finite state machine and finite input, it is always possible to determine whether your machine will halt or not.

http://en.wikipedia.org/wiki/Finite_state_machine

Unknown
A: 

+1 for Daniel's answer: all finite inputs cause true regex's (i.e., without backreferences or other non-regex features) to halt, and regex's are equivalent to DFAs.

Bonus: Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)

http://swtch.com/~rsc/regexp/regexp1.html

Note that the two graphs at the top of the article have different scales on the y-axis: one is seconds, the other is microseconds!

Peyton