I've done something like that before, though I had to basically write my own engine for it. There's nothing magic about ASCII (or Unicode or any other character set), and when they teach regular expressions in school they usually use a tiny set of arbitrary symbols (like Σ = {a, b}) to keep things simple. The algorithms still work the same.
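To make that concrete, here is a minimal sketch of a DFA runner that never assumes its symbols are characters. Everything in it (`run_dfa`, the state names, the tuple-valued symbols `A` and `B`) is made up for illustration and not taken from any particular library; the only requirement on a symbol is that it be hashable.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

def run_dfa(
    transitions: Dict[Tuple[str, Hashable], str],
    start: str,
    accepting: Set[str],
    symbols: Iterable[Hashable],
) -> bool:
    """Return True if the DFA accepts the whole symbol sequence."""
    state = start
    for s in symbols:
        key = (state, s)
        if key not in transitions:
            return False  # no transition defined: reject
        state = transitions[key]
    return state in accepting

# A two-symbol alphabet whose "letters" are tuples, not characters.
A = (1, 0)
B = (0, 1)

# The language (A B)*, i.e. the textbook (ab)* with a and b swapped out.
AB_STAR = {("q0", A): "q1", ("q1", B): "q0"}

accepted = run_dfa(AB_STAR, "q0", {"q0"}, [A, B, A, B])
rejected = run_dfa(AB_STAR, "q0", {"q0"}, [A, B, A])
```

The transition table is just a dictionary keyed on (state, symbol), so swapping the alphabet means swapping the keys and nothing else.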
Most of the features of Perl-style regex engines are specific to characters. Some features like ^ and $ still work fine. Some like [:alnum:] make no sense at all. And others like [3-5] can be adapted to work with non-character strings.
One tricky bit (already noted by polygenelubricants and others) is that Perl regexes work well because the thing you're using to describe the language, and the thing you're matching, are both character strings -- the syntax doesn't work nearly as well for non-character-string alphabets. So /[3-5]/ in characters might need to be [3,4,5] (a list of integers), and so you need to build the language from expressions, rather than strings (unless you want to write your own parser!).
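As a rough sketch of what "building the language from expressions" can look like, here is a tiny combinator-style matcher in Python. This is not the engine I wrote, and none of these names (`sym`, `one_of`, `in_range`, `concat`, `alt`, `star`, `fullmatch`) come from an existing library; they are hypothetical helpers. It is a plain backtracking matcher, assumed good enough for illustration, not a compiled automaton.

```python
from typing import Any, Callable, Iterator, Sequence

# A matcher takes (sequence, start index) and yields every end index it can reach.
Matcher = Callable[[Sequence[Any], int], Iterator[int]]

def sym(pred: Callable[[Any], bool]) -> Matcher:
    """Match exactly one element that satisfies pred."""
    def m(items, i):
        if i < len(items) and pred(items[i]):
            yield i + 1
    return m

def one_of(*values) -> Matcher:
    """Analogue of a character class such as [789]."""
    allowed = set(values)
    return sym(lambda x: x in allowed)

def in_range(lo, hi) -> Matcher:
    """Analogue of [3-5] for any ordered element type."""
    return sym(lambda x: lo <= x <= hi)

def concat(*parts: Matcher) -> Matcher:
    """Match each part in order."""
    def m(items, i):
        def go(k, pos):
            if k == len(parts):
                yield pos
            else:
                for nxt in parts[k](items, pos):
                    yield from go(k + 1, nxt)
        yield from go(0, i)
    return m

def alt(*options: Matcher) -> Matcher:
    """Match any one of the options."""
    def m(items, i):
        for opt in options:
            yield from opt(items, i)
    return m

def star(inner: Matcher) -> Matcher:
    """Zero or more repetitions; the seen set guards against empty-match loops."""
    def m(items, i):
        seen = {i}
        stack = [i]
        while stack:
            pos = stack.pop()
            yield pos
            for nxt in inner(items, pos):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return m

def fullmatch(pattern: Matcher, items: Sequence[Any]) -> bool:
    """Anchored at both ends, like ^...$ on a string."""
    return any(end == len(items) for end in pattern(items, 0))

# Roughly /^[3-5][78]*9$/, but over a list of integers.
pattern = concat(in_range(3, 5), star(one_of(7, 8)), one_of(9))
ok = fullmatch(pattern, [4, 7, 8, 7, 9])
bad = fullmatch(pattern, [2, 9])
```

Because `sym` takes an arbitrary predicate, the elements can be integers, tuples, tokens, or whatever else you are matching over, and there is no pattern string to parse.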
Why aren't most regex libraries generic on alphabet? Beats me -- it's a tremendously useful tool, and seems a terrible waste to apply it only to character strings. LINQ is nice but I'm not sure how it would help here.