The question Complexity of Regex substitution comes close to this one, but it is not the same. According to the reply by theprise, the complexity (for a DFA engine) is:

O(2^m + n) [where m is the length of the regex and n is the length of the string]

The red book "Algorithm Design Manual" discusses the running times of different algorithms on pages 15-16. According to it, once the input size reaches 1,000,000, an O(m^2) algorithm is hopeless: at one operation per nanosecond, (10^6)^2 operations take 10^3 seconds, or about 16.7 minutes.

The statement in the book piqued my interest. Can a 1,000,000-character regex be processed in under 16.7 minutes? Can the processing time still hold for a catastrophic regex? I really doubt it.
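
For concreteness, here is a minimal sketch of what "catastrophic" means, using Python's backtracking re engine; the pattern and timing loop are only illustrative. Even ~25 characters of input make it crawl, never mind 1,000,000:

import re
import time

# Nested quantifiers plus a near-miss input force exponential backtracking
# in a backtracking engine such as Python's re module.
catastrophic = re.compile(r'(a+)+$')

for n in (20, 22, 24):
    start = time.perf_counter()
    catastrophic.match('a' * n + 'b')  # fails, but only after ~2^n attempts
    print(n, round(time.perf_counter() - start, 2))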

Assuming one operation per nanosecond, what is the longest regex that can still be processed in polynomial time?

A: 

The actual complexity probably depends on the specific regular expression. A simple match for 1000000 'a's takes about 10 seconds:

import re

# Pattern and subject are the same string of 1,000,000 literal characters;
# re.match compiles the pattern and runs the match in one call.
expr = 'a' * 1000000
re.match(expr, expr)

The complexity in this case seems to be about O(m), but more complex expressions surely would take longer.
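
A rough way to check that observation is to time the same literal self-match at a few sizes; the exact numbers will vary by machine and Python version, and much of the cost here is compiling the huge pattern:

import re
import time

# Roughly linear growth in these timings would support the O(m) estimate.
for m in (100000, 300000, 1000000):
    expr = 'a' * m
    start = time.perf_counter()
    re.match(expr, expr)
    print(m, round(time.perf_counter() - start, 3))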

sth
I'm sorry, but what is "pebably"?
Svante
If the complexity is greater than O(m), I don't want to use your regex lib ;)
jfclavette
+2  A: 

This is a meaningless question. Regular expressions are not an algorithm; they are a language. There are many implementations of this language, each with its own performance characteristics, and each individual regular expression has its own costs. For instance, alternation /(a|b|c)/ is a parallelizable problem, so engines that execute the expression in parallel will have better performance than those that don't.
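
Even within a single engine, two patterns for the same language can cost differently. A small illustrative sketch with Python's re, comparing the alternation above against an equivalent character class (timings vary by engine and version):

import re
import time

text = 'abc' * 300000  # 900,000 characters drawn from {a, b, c}

# Both patterns recognize the same language; backtracking engines often
# handle the character class faster than the alternation.
for pat in (r'(?:a|b|c)+', r'[abc]+'):
    compiled = re.compile(pat)
    start = time.perf_counter()
    compiled.match(text)
    print(pat, round(time.perf_counter() - start, 3))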

This is similar to asking what the best case for sorting is. Some people will tell you O(n log n), but they would be wrong. The answer depends on the algorithm used: some sorts (such as radix sort) have a worst case of O(n).
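
To make the analogy concrete, here is a minimal LSD radix sort sketch for non-negative integers of fixed byte width; for a fixed width k it does O(n * k) = O(n) work, sidestepping the O(n log n) bound that applies only to comparison sorts:

# Minimal LSD radix sort; key_bytes fixes the key width, so the running
# time is O(n) rather than O(n log n).
def radix_sort(nums, key_bytes=4):
    for shift in range(0, key_bytes * 8, 8):
        buckets = [[] for _ in range(256)]
        for x in nums:
            buckets[(x >> shift) & 0xFF].append(x)
        nums = [x for bucket in buckets for x in bucket]
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# prints [2, 24, 45, 66, 75, 90, 170, 802]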

Chas. Owens