I have some data files to import into a database with some "unique" delimiters:
Field Separator (FS): SOH (ASCII character 1)
Record Separator (RS): STX (ASCII character 2) + '\n'
I'd like to import the files into Postgres using the COPY command but while I can specify a custom field delimiter, it can't handle the record separator.
I can't just strip out the \002 from the data either, because if there is a newline in one of the fields (and there are some), it will incorrectly cause COPY to treat it as a new record when in fact it is not.
One important thing to note: newlines within fields don't need to be preserved; it's fine if they are simply converted into spaces.
With this in mind, I was thinking of using something like sed to convert newlines into spaces, then convert \002 into newlines. However, since sed is a line-based tool, it strips the trailing newline from each line before processing it, so I can't match or replace the newlines at all.
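To make the transformation I'm after concrete, here's a rough Python sketch (function name is just for illustration). It assumes the \002+'\n' sequence only ever appears as a record terminator:

```python
def fix_records(data: bytes) -> bytes:
    # Split on the real record terminator (STX followed by newline)
    # first, so embedded newlines can be flattened safely.
    records = data.split(b"\x02\n")
    # Flatten any newlines inside fields to spaces, then rejoin the
    # records with plain newlines so COPY sees one record per line.
    return b"\n".join(rec.replace(b"\n", b" ") for rec in records)

# Two records; the first contains an embedded newline in a field.
sample = b"a\x01b\nc\x02\nd\x01e\x02\n"
print(fix_records(sample))  # b'a\x01b c\nd\x01e\n'
```

Ideally I'd get the same effect from a standard command-line tool rather than writing a script.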
Are there any other unix command-line tools that could do the job?
EDIT: I guess what I'm really asking for is a unix utility that can perform search/replace on a file as a single "binary" blob, without splitting it up into lines.