The textbook teaches us to write regular expressions using the epsilon (ε) symbol, but how can I translate that symbol directly to code without having to completely rework my regular expression?

For instance, how would I write this regex, which should match all lowercase strings that either begin or end with "a" (or both)?

Not 100% sure this is correct but...

((a|epsilon)[a-z]*a) | (a[a-z]*(a|epsilon))

So some strings that should match include:

a //single "a" starts or ends with "a"

aa //starts and ends with "a"

ab //starts with "a"

ba //ends with "a"

aba //starts and ends with "a"

aaaaaaaa //starts and ends with "a"

abbbbbbb //starts with "a"

bbbbbbba //ends with "a"

abbbbbba //starts and ends with "a"

asdfhgdu //starts with "a"

onoineca //ends with "a"

ahnrtyna //starts and ends with "a"

I only want to exchange epsilon for the correct symbol; I do not want to modify any other part of the expression. Also, to be clear, I am not actually checking for the epsilon symbol. I want a choice between a character and nothing (well, not nothing... epsilon).

Does such a symbol exist?

Is what I want possible?

+4  A: 

Just omit the ε, since it denotes the empty string:

([1-9]|)[0-9]*

There’s also a shortcut for this particular case:

([1-9]?)[0-9]*

The ? means zero or one occurrences of the preceding token.

Konrad Rudolph
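As a rough illustration of the answer above, here is a minimal Python sketch that applies the "just omit the ε" idea to the regex from the question. The choice of Python's `re` module, the pattern names, and the `matches` helper are assumptions made for this example, not part of the original thread. The spaces around `|` in the question's expression are dropped here, since most engines would treat them as literal characters.

```python
import re

# Textbook form from the question: ((a|ε)[a-z]*a) | (a[a-z]*(a|ε))
# ε is written as an empty alternative, i.e. (a|) means "a or the empty string".
EPSILON_AS_EMPTY = re.compile(r"((a|)[a-z]*a)|(a[a-z]*(a|))")

# The same expression using the ? shortcut in place of (a|).
EPSILON_AS_OPTIONAL = re.compile(r"(a?[a-z]*a)|(a[a-z]*a?)")


def matches(pattern, text):
    """Return True if the whole string is matched by the pattern."""
    return pattern.fullmatch(text) is not None


# A few strings from the question, plus some that should be rejected.
samples = ["a", "aa", "ab", "ba", "aba", "asdfhgdu", "onoineca", "b", "bob", ""]
for s in samples:
    print(repr(s), matches(EPSILON_AS_EMPTY, s), matches(EPSILON_AS_OPTIONAL, s))
```

Both patterns should give the same verdict for every sample, which is the point: the empty alternative and `?` are two spellings of "this character or ε".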
@Konrad Rudolph thanks, the example I gave in my question originally was not a good one, please look at my revised question. Also, I am aware of `?`, but I was not sure if that was as close to an epsilon as I could get or not.
typoknig
@typoknig: Once again, just omit the symbol! Your new example introduces nothing that invalidates this technique, and in fact this technique will **always** work. Apart from that, your expression can be made much simpler: `a[a-z]*|[a-z]*a`
Konrad Rudolph
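One way to check the simplification suggested in the comment above is to compare both expressions on every short string over a small alphabet. This is only a sketch: the `agree_up_to` helper, the three-letter alphabet, and the length bound are assumptions chosen for illustration, and a brute-force check like this is evidence rather than a proof of equivalence.

```python
import re
from itertools import product

# The question's expression, with ε written as an empty alternative.
ORIGINAL = re.compile(r"((a|)[a-z]*a)|(a[a-z]*(a|))")

# The simplified expression from the comment.
SIMPLIFIED = re.compile(r"a[a-z]*|[a-z]*a")


def agree_up_to(max_len, alphabet="abc"):
    """Check that both patterns accept exactly the same strings
    of length 0..max_len built from the given alphabet."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(ORIGINAL.fullmatch(s)) != bool(SIMPLIFIED.fullmatch(s)):
                return False
    return True


print(agree_up_to(5))
```

The alphabet only needs "a" plus a couple of other letters, because both expressions treat every non-"a" lowercase letter the same way.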
@Konrad Rudolph thanks for your input, and I see now that the regex in my example could have been simpler. The reason I wanted to use `epsilon` instead of some other symbol is that `epsilon` is what is used in my textbook, so when I am discussing these expressions with others in class I want to be on the same page... using the same methods and symbols. To that end I wanted to use epsilon in a coded regex so I could check my work as I went along.
typoknig
@Konrad Rudolph in the regex checker I use ( http://gskinner.com/RegExr/ ) an "empty string" after a "`|`" is not picked up, but that is not to say that it wouldn't work in some/most code. Thanks for your help.
typoknig
@typoknig: I suspect that that’s a bug since all major regex engines do in fact support this (I checked …, `foo|` works everywhere while the gskinner RegExr marks it as an error).
Konrad Rudolph