If a Ruby regular expression is matched against an object that isn't a String, Ruby calls that object's to_str method to get an actual String to match against. I want to avoid this behavior. I'd like to match regular expressions against objects that aren't Strings but can logically be thought of as randomly accessible sequences of bytes, where all access goes through a byte_at() method (similar in spirit to Java's CharSequence.charAt() method).
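The to_str coercion mentioned above is easy to observe with a small stand-in class. ByteSeq, its backing String and its call counter are hypothetical and exist only for illustration. The sketch shows that Regexp#=~ asks the object for a real String via to_str and never touches byte_at().

```ruby
# Hypothetical byte sequence that also defines to_str, used only to show
# that Regexp matching coerces its operand through to_str.
class ByteSeq
  attr_reader :to_str_calls

  def initialize(bytes)
    @bytes = bytes      # backing String, for illustration only
    @to_str_calls = 0
  end

  # Random access to the nth byte (the interface the question wants)
  def byte_at(n)
    @bytes.getbyte(n)
  end

  def length
    @bytes.bytesize
  end

  # Regexp#=~ calls this to obtain an actual String to match against
  def to_str
    @to_str_calls += 1
    @bytes.dup
  end
end

seq = ByteSeq.new("abc\ndef")
p(/c\nd/ =~ seq)     # => 2
p seq.to_str_calls   # => 1
```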
For example, suppose I want to find the byte offset of a match for an arbitrary regular expression in an arbitrary file. The expression might span multiple lines, so I can't just read the file a line at a time and look for a match in each line. If the file is very big, it won't fit in memory, so I can't read it in as one big string either. However, it would be simple enough to define a method that returns the nth byte of the file (with buffering and caching as needed for speed).
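A file-backed byte_at() along those lines might look roughly like the following minimal sketch. The class name FileBytes, the default block size and the least-recently-used eviction policy are all assumptions chosen for illustration. It only provides random access to the bytes; it does not by itself let a Regexp match against the file.

```ruby
require "tempfile" # only needed for the usage example below

# Hypothetical random-access view of a file's bytes. Reads fixed-size
# blocks on demand and keeps a small least-recently-used block cache.
class FileBytes
  attr_reader :size

  def initialize(path, block_size: 4096, max_blocks: 64)
    @file = File.open(path, "rb")
    @size = @file.size
    @block_size = block_size
    @max_blocks = max_blocks
    @cache = {} # block index => String; Hash preserves insertion order
  end

  # Returns the nth byte as an Integer (0..255), or nil if out of range.
  def byte_at(n)
    return nil if n < 0 || n >= @size
    index, offset = n.divmod(@block_size)
    block(index).getbyte(offset)
  end

  def close
    @file.close
  end

  private

  # Fetches a block from the cache, or reads it from disk on a miss.
  def block(index)
    if (data = @cache.delete(index))
      @cache[index] = data # re-insert to mark as most recently used
      return data
    end
    @file.seek(index * @block_size)
    data = @file.read(@block_size)
    @cache.shift if @cache.size >= @max_blocks # evict least recently used
    @cache[index] = data
  end
end

Tempfile.create("bytes") do |f|
  f.write("hello\nworld")
  f.flush
  bytes = FileBytes.new(f.path, block_size: 4)
  p bytes.byte_at(6).chr # => "w"
  bytes.close
end
```

Keeping the cache in an insertion-ordered Hash makes Hash#shift a cheap way to drop the oldest block.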
Eventually, I'd like to build a fully featured rope class, like the one in Ruby Quiz #137, and I'd like to be able to use regular expressions on it without the cost of first converting it to a String.
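To make the rope case concrete, here is a deliberately minimal rope exposing the same byte_at() interface. It is a hypothetical sketch, not the Ruby Quiz #137 solution: leaves wrap Strings, inner nodes concatenate two subtrees, and there is no rebalancing, so lookups cost time proportional to the tree's depth.

```ruby
# Hypothetical minimal rope with byte_at() random access.
class Rope
  # Leaf node wrapping a String
  class Leaf
    def initialize(str)
      @str = str
    end

    def bytesize
      @str.bytesize
    end

    def byte_at(n)
      @str.getbyte(n)
    end
  end

  # Inner node: the concatenation of two subtrees
  class Concat
    attr_reader :bytesize

    def initialize(left, right)
      @left = left
      @right = right
      @bytesize = left.bytesize + right.bytesize
    end

    # Descend into whichever side holds byte n
    def byte_at(n)
      if n < @left.bytesize
        @left.byte_at(n)
      else
        @right.byte_at(n - @left.bytesize)
      end
    end
  end

  def self.from(str)
    new(Leaf.new(str))
  end

  def initialize(root)
    @root = root
  end

  # Concatenation builds a new node without copying any bytes
  def +(other)
    Rope.new(Concat.new(@root, other.root))
  end

  def bytesize
    @root.bytesize
  end

  def byte_at(n)
    return nil if n < 0 || n >= bytesize
    @root.byte_at(n)
  end

  protected

  attr_reader :root
end

r = Rope.from("abc") + Rope.from("\ndef")
p r.bytesize       # => 7
p r.byte_at(4).chr # => "d"
```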
I don't want to get up to my elbows in the innards of Ruby's regular expression implementation, so any insight would be appreciated.