Hi, I'm currently building a toy assembler in C# (working through The Elements of Computing Systems book).

I need to match a very simple pattern. I thought this would be a good time to learn some regex, but I'm struggling!

In the following examples I'd just like to match the letters before the '=':

M=A

D=M

MD=A

A=D

AD=M

AMD=A

I've come up with the following:

([A-Z]{1,3})=

However this also matches the '=' which I don't want.

I also tried:

([A-Z^\=]{1,3})=

But I still have the same problem: it matches the '=' sign as well.

I'm using this site to test my regexes.

Any help would be really appreciated. Thank you in advance.

+3  A: 

You need a positive lookahead assertion:

([A-Z]{1,3})(?==)
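
A minimal sketch of the difference, written in Python only for illustration (the lookahead syntax is the same in .NET's Regex class). The names LINES and whole_match are hypothetical helpers, not part of the question. It compares the whole match of the original pattern with the whole match of the lookahead version:

```python
import re

# Sample instructions taken from the question.
LINES = ["M=A", "D=M", "MD=A", "A=D", "AD=M", "AMD=A"]

# Original attempt: the '=' is consumed, so it is part of the whole match.
with_equals = re.compile(r"([A-Z]{1,3})=")

# Lookahead version: '=' must follow, but it is not consumed.
lookahead = re.compile(r"([A-Z]{1,3})(?==)")


def whole_match(pattern, text):
    """Return the full matched text (group 0), or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


dests = [whole_match(lookahead, line) for line in LINES]
print(dests)                               # whole matches without the '='
print(whole_match(with_equals, "MD=A"))    # whole match includes the '='
print(with_equals.search("MD=A").group(1)) # group 1 alone never includes it
```

Note that even with the original pattern, capture group 1 holds only the letters; it is the overall match that includes the '='.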
RichieHindle
thanks for that!!
bplus
+2  A: 

What you want is called a zero-width lookahead assertion. The general form is:

(Match this and capture)(?=before this)

In your case, this would be:

([A-Z]{1,3})(?==)
Conspicuous Compiler
A: 

The following will group everything before the "=" and everything after.

([^=]*)=([^=]*)

It reads something like this:

Match any number of characters that are not "=", followed by "=", then any number of characters that are not "=".

Nippysaurus
I tried your regex at http://www.nregex.com/nregex/default.aspx and it didn't seem to work. Could it be something to do with the regex engine that site uses? Anyway, I've marked an answer now, so not to worry. Thanks though.
bplus
The problem with this regular expression might be that, if the input is multiline, the second wildcard will match the part after the current equals sign, the newline, and then the characters before the next equals sign. You'd want to add the delimiter character (here, the newline) inside the second pair of square brackets.
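
A small sketch of that behaviour, in Python for illustration only. The two-line input and the names loose, second_only and strict are assumptions made for this example. Excluding the newline from the first character class as well goes one step beyond the suggestion above; it is shown because otherwise the first group of the second match picks up the leading newline:

```python
import re

# Two instructions separated by a newline (assumed input for illustration).
text = "M=A\nD=M"

# Pattern from the answer: the second group can run across the newline.
loose = re.compile(r"([^=]*)=([^=]*)")

# Newline excluded from the second class only, as the comment suggests.
second_only = re.compile(r"([^=]*)=([^=\n]*)")

# Newline excluded from both classes (an extra step, not in the comment).
strict = re.compile(r"([^=\n]*)=([^=\n]*)")

loose_pairs = loose.findall(text)
second_only_pairs = second_only.findall(text)
strict_pairs = strict.findall(text)

print(loose_pairs)        # second group of the first match spans two lines
print(second_only_pairs)  # first group of the second match keeps the newline
print(strict_pairs)       # one clean (dest, comp) pair per line
```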
Conspicuous Compiler
A: 

Just to thank you guys. Saved me a lot of time.

A: 

You can also put the equals sign in non-capturing parentheses with (?: ... ):

([ADM]{1,3})(?:=)

It's been a while since I did this chapter of the book, but I think that, since you need both parts of the expression anyway, I just split on the '=', giving myArray[0] == "M" and myArray[1] == "A".
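
The split approach can be sketched as follows, in Python for illustration (C#'s string.Split plays the same role). The function name split_instruction and the error handling are assumptions for this example, not something from the book:

```python
def split_instruction(line):
    """Split 'dest=comp' into (dest, comp) on the first '='.

    Raises ValueError when the line contains no '=' at all.
    """
    parts = line.split("=", 1)  # at most one split, so at most two parts
    if len(parts) != 2:
        raise ValueError(f"no '=' in instruction: {line!r}")
    return parts[0], parts[1]


dest, comp = split_instruction("AMD=A")
print(dest, comp)
```

This avoids regular expressions entirely, which may be enough when the only goal is to separate the two sides of the instruction.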

Dinah
The non-capturing parens won't do anything useful. The equals sign will still be "captured" as part of the overall match, which is what the OP was trying to avoid.
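
A quick check of this point, in Python for illustration (non-capturing groups work the same way in .NET). The variable names are assumptions for this example:

```python
import re

# Pattern from the answer above: '=' sits in a non-capturing group.
non_capturing = re.compile(r"([ADM]{1,3})(?:=)")

m = non_capturing.search("AMD=A")

overall = m.group(0)  # the whole match still consumes the '='
first = m.group(1)    # the only capture group holds just the letters
groups = m.groups()   # the non-capturing group adds no entry here

print(overall, first, groups)
```

The non-capturing group only prevents the '=' from getting its own group number; it does not remove it from the overall match the way a lookahead does.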
Alan Moore