ansaurus

Question

Regular expression algorithm for integer lists.

Answer 1

A:

I doubt it, mostly because it is so ambiguous. Just looking at the example that you provided, do you mean to match this:

{1, 2, 2, 2, 2, 5}

or this:

{12, ..., 5}

Sure, you could possibly improve the syntax slightly to fix this, but you would likely end up with a very messy syntax.

It would be just too complicated, and I'm sure that there are much better ways of doing it (list comprehensions, LINQ, etc).

a_m0d 2010-07-15 04:38:03

It's not really "ambiguous", one could easily have a syntax like `sequence(1, repeated(2), 5)` etc.

polygenelubricants 2010-07-15 04:54:34

@polygenelubricants - sure, that would work (would be an interesting exercise to implement it) but I really don't think that the string pattern would work.

a_m0d 2010-07-15 05:03:44

Hi, thanks for the reply. Let me make it clearer. Regular expression algorithms usually work on ASCII characters, so Σ = {ASCII codes}. I want an algorithm that supports a self-defined Σ, such as Σ' = {integers}. Say I have a list a = {1,2,2,2,2,5}, which contains a 1, followed by four 2s, followed by one 5. I then specify the regex pattern (1)(2)*(5), where each number in parentheses denotes one integer symbol, and check whether the list matches the pattern. Thanks again.

ausgoo 2010-07-15 06:46:25

Answer 2

A:

You can use something like marge(), where marge() simply builds a string (character sequence) from all members of the array:

a.marge().match("12*5");

Sadat 2010-07-15 04:43:27

That's nice for single digits. Sure, you can use a separator, `join` the array, and work on a `"1,2,2,2,2,5"` string with a regex (which isn't the best idea, by the way), but somehow I think the OP needs more than that, maybe to combine regex abilities on a collection. Something like `"[:even:]*(?=[:negative:])"`

Kobi 2010-07-15 05:00:03
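For reference, a minimal Java sketch of the separator approach described in the comment above. The class and helper names (`JoinedListMatch`, `matchesJoined`) are made up for illustration; only `java.util.regex.Pattern` and the stream joining are standard library. The pattern (1)(2)*(5) has to be rewritten as `1(,2)*,5` so that the commas keep multi-digit numbers such as 12 apart from the sequence 1, 2.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class JoinedListMatch {
    // Hypothetical helper: joins the integers with commas and matches the
    // whole resulting string against an ordinary character regex.
    static boolean matchesJoined(List<Integer> list, String regex) {
        String joined = list.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return Pattern.matches(regex, joined);
    }

    public static void main(String[] args) {
        // (1)(2)*(5), rewritten for the comma-separated form.
        String regex = "1(,2)*,5";
        System.out.println(matchesJoined(List.of(1, 2, 2, 2, 2, 5), regex)); // true
        System.out.println(matchesJoined(List.of(1, 5), regex));             // true
        System.out.println(matchesJoined(List.of(12, 5), regex));            // false
    }
}
```

This works, but every pattern has to carry the separator by hand, which is part of why the thread looks for something that operates on the list directly.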

@kobi, that would be really good, if found

Sadat 2010-07-15 05:22:19

Hi, thanks for the reply. I guess I need more than that. Of course I can convert the Integer[] to a String and run regular expression matching on that String, but I think working on the Integer[] directly should be faster than using a String. That's my only concern. Currently, regex algorithms work on ASCII characters, so the symbol set is Σ = {ASCII codes}. What I want is a symbol set Σ' = {integers}, so that every single symbol is an integer. Thanks again.

ausgoo 2010-07-15 06:37:20

@ausgoo - and then what? Most modern regex flavors **don't use** ASCII characters, but Unicode characters, and you can map them to a number (`\u263A` or `\x{263A}`). What's next? How will you capture, search, validate or replace?

Kobi 2010-07-15 06:50:08

Hi, I just want to check whether the list matches a given pattern, using Java syntax. Say we have an ArrayList that contains a bunch of numbers, and I want to check whether the sequence of numbers matches the pattern (1)(2)*(5). I could specify the pattern as sequence(1,repeat(2),5) and then get the result by calling list.match(sequence(1,repeat(2),5)); or something like that. Thanks again.

ausgoo 2010-07-15 07:14:27

Answer 3

A:

int[] a = {1,2,2,2,2,5}; 
a.match("12*5");

I assume you are trying to match "122225" against the regular expression "12*5". Generating the string from the array, using snprintf in C/C++ or .toString() in Java, etc., should be clean and simple.

I would not recommend looking for a special algorithm or tool for this.

ttchong 2010-07-15 04:48:46

What language are you using that provides a `match()` function on lists / arrays?

a_m0d 2010-07-15 04:53:03

@a_m0d - that code is just copied from the question.

Kobi 2010-07-15 04:54:59

@a_m0d: I copied the code from the question into my answer for easy reference.

ttchong 2010-07-15 05:00:50

oh, sorry, I thought that this was a sample implementation

a_m0d 2010-07-15 05:02:06

Answer 4

+1 A:

I've done something like that before, though I had to basically write my own engine for it. There's nothing magic about ASCII (or Unicode or any other character set), and when they teach regular expressions in school they usually use a tiny set of arbitrary symbols (like Σ = {a, b}) to keep things simple. The algorithms still work the same.

Most of the features of Perl-style regex engines are specific to characters. Some features like ^ and $ still work fine. Some like [:alnum:] make no sense at all. And others like [3-5] can be adapted to work with non-character strings.

One tricky bit (already noted by polygenelubricants and others) is that Perl regexes work well because the thing you're using to describe the language, and the thing you're matching, are both character strings -- the syntax doesn't work nearly as well for non-character-string alphabets. So /[3-5]/ in characters might need to be [3,4,5] (a list of integers), and so you need to build the language from expressions, rather than strings (unless you want to write your own parser!).

Why aren't most regex libraries generic on alphabet? Beats me -- it's a tremendously useful tool, and seems a terrible waste to apply it only to character strings. LINQ is nice but I'm not sure how it would help here.

Ken 2010-07-15 05:33:29
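A rough Java sketch of the "build the language from expressions" idea in the answer above, not the engine the answer's author wrote. Everything here is hypothetical: the names `ListRegex`, `Pat`, `symbol`, and `lit` are invented, and `sequence` and `repeat` are borrowed from the syntax suggested earlier in the thread. Each pattern returns the set of positions where a match starting at a given index could end, so alternatives are tracked as sets of positions rather than by backtracking. Features such as captures, anchors, and search are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

public class ListRegex {
    /** Maps a start index to every index at which a match could end. */
    interface Pat<T> {
        Set<Integer> ends(List<T> input, int start);

        /** True if the pattern consumes the whole list. */
        default boolean matches(List<T> input) {
            return ends(input, 0).contains(input.size());
        }
    }

    // Matches exactly one element that satisfies the predicate.
    static <T> Pat<T> symbol(Predicate<T> test) {
        return (in, i) -> {
            Set<Integer> out = new TreeSet<>();
            if (i < in.size() && test.test(in.get(i))) out.add(i + 1);
            return out;
        };
    }

    // Matches one element equal to the given value.
    static <T> Pat<T> lit(T value) {
        return symbol(value::equals);
    }

    // Matches the parts one after another.
    @SafeVarargs
    static <T> Pat<T> sequence(Pat<T>... parts) {
        return (in, i) -> {
            Set<Integer> current = new TreeSet<>();
            current.add(i);
            for (Pat<T> p : parts) {
                Set<Integer> next = new TreeSet<>();
                for (int pos : current) next.addAll(p.ends(in, pos));
                current = next;
            }
            return current;
        };
    }

    // Kleene star: zero or more repetitions of the inner pattern.
    static <T> Pat<T> repeat(Pat<T> inner) {
        return (in, i) -> {
            Set<Integer> seen = new TreeSet<>();
            Deque<Integer> todo = new ArrayDeque<>();
            todo.add(i);
            while (!todo.isEmpty()) {
                int pos = todo.pop();
                // Visiting each position once also guards against empty matches looping.
                if (seen.add(pos)) todo.addAll(inner.ends(in, pos));
            }
            return seen;
        };
    }

    public static void main(String[] args) {
        Pat<Integer> p = sequence(lit(1), repeat(lit(2)), lit(5));
        System.out.println(p.matches(List.of(1, 2, 2, 2, 2, 5))); // true
        System.out.println(p.matches(List.of(1, 2, 3, 5)));       // false

        // The character class [3-5] becomes a predicate on integers.
        Pat<Integer> threeToFive = symbol(x -> x >= 3 && x <= 5);
        System.out.println(repeat(threeToFive).matches(List.of(3, 4, 5, 3))); // true
    }
}
```

Because `symbol` takes an arbitrary predicate, the alphabet is whatever the element type `T` is, and classes like "even" or "negative" from the earlier comment would just be other predicates.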

Hi, thanks for the reply. Yes, my understanding is also that regular expressions can be applied to a self-defined symbol set (like Σ = {a, b}). The reason I don't want to use String is that I think a String is slow compared with an integer list.

ausgoo 2010-07-15 06:32:08
