I was wondering if there is such a thing as regular expressions for sequential data that isn't a string.
I know that regular expressions essentially boil down to DFAs, but I'm more interested in higher-level languages for specifying these DFAs.
I was wondering if there is such a thing as regular expressions for sequential data that isn't a string.
I know that regular expressions essentially boil down to DFAs, but I'm more interested in higher-level languages for specifying these DFAs.
You can argue that a grammar is a form of regular expression for things that are more complex than just strings. In principle, you can devise regular expressions on other tokens than just characters. As one option, you could argue that a regex for Unicode is such a creature - it certainly isn't matching simple bytes as the classic regex does.
Hi,
you can use normal context-free parser generators (such as Yacc / Bison) to generate regular language parsers even if the "atoms" are not characters. By hooking into the scanner function, you can then make the grammar to parse "anything" regardless of whether it is a string or not.
Another thing is that in the domain of logics, there are "temporal logics" such as LTL and CTL, which are basically subset of regular expressions for "events". Logical formulae in LTL are translated typically to finite state automata.
There is absolutely nothing in the theory of regular expressions that prevents them from being applied to something else than just strings of characters. It's just that most regular expression engine implementations don't allow that.
However, if you have a regular expression engine that allows you to treat a string as un-encoded 8-Bit data (sometimes called BINARY
, 8BIT
or ASCII-8BIT
), then you can use that engine to parse byte-oriented binary data.
Ragel is a state machine compiler that is specifically designed for parsing binary protocols. You write your state machine in a high-level (regexp-like) DSL and Ragel then compiles that into your target language – Ragel currently supports C, C++, Objective-C, D, Java and Ruby.
Most functional programming languages have powerful pattern matching facilities baked right into the language itself. those facilities can be used to pattern match binary data. One example of this is Erlang's support for building and pattern matching binary data structures.
OMeta is a pattern matching and pattern transformation language that is basically a superset of regular expressions, on steroids. It supports matching of not only strings of characters but also arrays and lists of integers and arbitrary objects.