I want to make pattern matches conditional on whether they occur after a word character or after a non-word character. In other words, I want to simulate the \b word-boundary assertion at the beginning of a pattern, which flex/lex does not support.
Here's my attempt (which does not work as desired):
%{
#include <stdio.h>
%}
%x inword
%x nonword
%%
[a-zA-Z] { BEGIN inword; yymore(); }
[^a-zA-Z] { BEGIN nonword; yymore(); }
<inword>a { printf("'a' in word\n"); }
<nonword>a { printf("'a' not in word\n"); }
%%
Input:
a
ba
a
Expected output:
'a' not in word
'a' in word
'a' not in word
Actual output:
a
'a' in word
'a' in word
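To pin down the behavior I'm after, independent of flex, here is a minimal plain-C sketch of the classification I expect. It is not a lexer and does not solve the flex problem; it only spells out the rule. The names `a_counts` and `classify_a` are hypothetical, and it assumes that the start of the input counts as a non-word position and that "word character" means exactly [a-zA-Z], as in my rules.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical result type: how many times 'a' was seen in each context. */
typedef struct {
    int in_word;      /* 'a' immediately preceded by a letter */
    int not_in_word;  /* 'a' at start of input or after a non-letter */
} a_counts;

/* Returns nonzero if c is in [a-zA-Z], mirroring the character class above. */
static int is_word_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Hypothetical helper: reports every 'a' in text together with its context. */
a_counts classify_a(const char *text)
{
    a_counts counts = {0, 0};
    int prev_is_word = 0;  /* assumption: start of input is a non-word position */

    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] == 'a') {
            if (prev_is_word) {
                printf("'a' in word\n");
                counts.in_word++;
            } else {
                printf("'a' not in word\n");
                counts.not_in_word++;
            }
        }
        /* Remember the context for the next character. */
        prev_is_word = is_word_char(text[i]);
    }
    return counts;
}
```

Calling `classify_a("a\nba\na\n")` on the input above is meant to print the three lines of the expected output, in that order; the 'b' and the newlines only affect the context and print nothing.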
I'm doing this because I want to build something like the dialectizer, and I have always wanted to learn how to use a real lexer. Some of the patterns I want to replace should match fragments of words; others should match whole words only.