How long can the expression string passed to the REGEXP function in MATLAB be? For example, I'd like to list many words to match, like 'abc|defg|hij|...'.

It worked fine for me with about 500 words (~3K characters), but with a very large list (>300K) MATLAB simply crashed without any error log. Does anybody have an idea what the limit is? Might it depend on the expression syntax?

I know I can use a cell array of strings, but in this case I cannot use an arbitrary number of strings to match. Anyway, I don't need alternatives, just the limit, please.
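For reference, the kind of pattern described in the question can be built from a word list instead of being typed by hand. This is a minimal sketch, not an answer to the size-limit question: the word list and search string are hypothetical placeholders, and escaping each word with regexptranslate('escape', ...) is an added precaution (not from the question) so that words containing metacharacters such as '.' or '|' are matched literally.

```matlab
% Hypothetical word list and text to search (placeholders).
words = {'abc', 'defg', 'hij'};
str = 'xx abc yy hij zz';

% Escape regex metacharacters in each word so it is matched literally.
escaped = cellfun(@(w) regexptranslate('escape', w), words, 'UniformOutput', false);

% Join the words with '|' to form 'abc|defg|hij', dropping the trailing '|'.
pattern = sprintf('%s|', escaped{:});
pattern(end) = [];

% Return every substring of str that matches one of the alternatives.
matches = regexp(str, pattern, 'match');

fprintf('Pattern length: %d characters\n', numel(pattern));
disp(matches);
```

Joining with sprintf rather than strjoin keeps the sketch working in older MATLAB releases; numel(pattern) is the length that grows as the word list gets longer.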

A:

This could be due to memory limitations; you can increase MATLAB's Java memory using a java.opts file. Search for "java.opts" and "MATLAB" to find out how to increase your working memory size.

That said, I don't believe this is the intended use of regexp.

The maximum length depends on the platform. See here for details.

dbrien
Agreed. The *theoretical* limit is probably the size of the string that holds the regex, but that's moot. It's just a really bad idea to use a regex like this.
Alan Moore
On my fairly old machine with 32-bit XP I can create a 5e7-character string (100 MB in memory). However, regexp crashed with an expression string of ~1e6 characters (after the string was successfully stored in memory). So it looks like the problem is not only the memory needed to hold the string, but also the regexp implementation and how it uses memory while running. I agree it's not the best use of regexp; this is more of academic interest.
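The 100 MB figure follows from MATLAB storing each char as 2 bytes, so 5e7 characters take about 1e8 bytes. A minimal sketch to check this on your own machine (variable names are illustrative; other environments such as GNU Octave store 1 byte per char, so the number reported there will be half as large):

```matlab
% Allocate a 5e7-character string, as in the comment above.
s = repmat('a', 1, 5e7);

% whos returns a struct whose 'bytes' field is the variable's memory use.
info = whos('s');
fprintf('%d characters use %d bytes\n', numel(s), info.bytes);
```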
yuk