I'm trying to write a parser for the EDI data format, which is just delimited text, except that the delimiters themselves are defined at the top of the file.
Essentially it's a bunch of split() calls based on delimiter values I read from the top of the file. The problem is that there's also a custom "escape character", which means the delimiter that follows it should be ignored and treated as ordinary text.
For example, assuming * is the delimiter and ? is the escape character, I'm doing something like this:
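```perl
use strict;
use warnings;
use Data::Dumper;
my $delim  = "*";
my $escape = "?";
my $edi    = "foo*bar*baz*aster?*isk";
# "\\" . $delim is the pattern \*, so this splits on every literal '*'
my @split = split("\\" . $delim, $edi);
print Dumper(\@split);    # gives 'foo', 'bar', 'baz', 'aster?', 'isk': the escaped * is split too
```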
I need the last element to be "aster*isk" instead, i.e. the escape character dropped and the * kept as literal text.
My original idea was to replace every occurrence of the escape character plus the character after it with a custom-mapped unprintable ASCII sequence before calling split(), and then use another regexp afterwards to swap those sequences back to the right values.
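For a single delimiter, that placeholder idea might look roughly like the sketch below. The "\x01" placeholder is an assumption (it must never occur in real data), and the sketch only handles an escaped delimiter, not an escaped escape character.

```perl
use strict;
use warnings;

my $delim  = '*';
my $escape = '?';
my $edi    = 'foo*bar*baz*aster?*isk';

# Hypothetical placeholder: assumed never to occur in real input.
my $placeholder = "\x01";

# 1. Hide each escaped delimiter ("?*") behind the placeholder.
(my $masked = $edi) =~ s/\Q$escape$delim\E/$placeholder/g;

# 2. Split on the now-unambiguous delimiter (-1 keeps trailing empty fields).
my @fields = split /\Q$delim\E/, $masked, -1;

# 3. Put the literal delimiter back inside each field.
s/\Q$placeholder\E/$delim/g for @fields;

print join('|', @fields), "\n";    # foo|bar|baz|aster*isk
```

Every additional delimiter needs its own mask/unmask pair, which is where this approach starts to get unwieldy.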
That is doable, but it feels like a hack and will get pretty ugly once I do it for all 5 potential delimiters. Any of the delimiters could also be a regexp metacharacter, which means a lot of escaping in my own regular expressions.
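On the metacharacter point specifically, Perl's \Q...\E (equivalently, the quotemeta function) escapes whatever characters a variable contains, so patterns can be compiled once per delimiter without hand-escaping. A minimal sketch; the delimiter names and values below are hypothetical stand-ins for whatever the file header specifies:

```perl
use strict;
use warnings;

# Hypothetical delimiter set; a real parser would read these from the header.
my %delims = (
    segment   => '~',
    element   => '*',
    component => ':',
    repeat    => '^',
);

# Compile one literal-matching pattern per delimiter.
my %re;
for my $name (keys %delims) {
    my $d = $delims{$name};
    $re{$name} = qr/\Q$d\E/;    # \Q...\E neutralises any metacharacters
}

# '^' is matched literally here, not as a start-of-line anchor.
my @reps = split $re{repeat}, 'a^b^c', -1;
print join(',', @reps), "\n";    # a,b,c
```

This takes care of the escaping problem, but not the escape-character problem: these patterns still split on escaped delimiters.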
Is there any way to avoid this, possibly with a special regexp passed to my split() calls?
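One possible approach, sketched below, sidesteps split() and walks the string with a \G-anchored regex that treats "escape + any character" as a single unit. It assumes the escape character can itself be escaped ("??" meaning a literal "?"). That assumption is why a plain split with a negative lookbehind such as (?<!\?)\* is not enough: it handles "aster?*isk" but would refuse to split "a??*b". The helper name split_escaped is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split $str on $delim while honouring $escape.
# An escape followed by any character yields that character literally
# ("?*" becomes "*", "??" becomes "?"). A lone escape at the very end
# of the string is silently dropped in this sketch.
sub split_escaped {
    my ($str, $delim, $escape) = @_;
    my $d = quotemeta $delim;
    my $e = quotemeta $escape;

    my @fields = ('');
    # \G anchors each match where the previous one ended, so /g in
    # scalar context walks the string one token at a time.
    while ($str =~ /\G(?:$e(.)|($d)|((?:(?!$e|$d).)+))/gs) {
        if    (defined $1) { $fields[-1] .= $1 }    # escaped character
        elsif (defined $2) { push @fields, '' }     # real delimiter: new field
        else               { $fields[-1] .= $3 }    # run of ordinary text
    }
    return @fields;
}

my @fields = split_escaped('foo*bar*baz*aster?*isk', '*', '?');
print join('|', @fields), "\n";    # foo|bar|baz|aster*isk
```

Because this helper removes escapes as it goes, in a multi-level parse (segments, then elements, then components) it is only safe at the innermost level; the outer levels would need a variant that keeps the escape pairs intact.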