I have some data files to import into a database with some "unique" delimiters:

Field Separator (FS): SOH (ASCII character 1)

Record Separator (RS): STX (ASCII character 2) + '\n'

I'd like to import the files into Postgres using the COPY command, but while I can specify a custom field delimiter, it can't handle a custom record separator.
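For context, something along these lines handles the field separator fine (the database, table, and file path here are just placeholders); it's only the record separator I can't express:

psql mydb -c "COPY my_table FROM '/path/to/datafile' WITH DELIMITER E'\001'"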

I can't just strip out the \002 from the data either, because if there is a newline in one of the fields (and there are), it will incorrectly cause COPY to treat it as a new record when in fact it is not.

One important thing to note: it's not important that newlines in fields are preserved; it's fine if they are just converted into spaces.

With this in mind, I was thinking of using something like sed to convert newlines into spaces, then convert \002 into newlines. However, since sed is a line-based tool, it strips the newline from the end of each line before applying the script, so a search/replace can never match it.
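For example, even a trivial newline-to-space substitution is a no-op, because the trailing newline never appears in sed's pattern space:

printf 'one\ntwo\n' | sed 's/\n/ /g'    # output is identical to the input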

Are there any other unix command-line tools that could do the job?

EDIT: I guess what I'm really asking for is a unix utility that can process a file (perform search/replace) as "binary" data, without splitting it up into lines.

+1  A: 

Could you do multiple passes through the file? Pass 1 converts all \002\n to \002\002, say. Pass 2 converts all the remaining \n to spaces. Pass 3 converts all the \002\002 back to \n.

Patrick Cuff
Multiple passes aren't the problem - I can do that just by piping several unix commands together. The problem I'm seeing is that "sed" consumes the data on a line-by-line basis, which effectively hides the final "\n" from the substitution command - it's untouchable.
Marc Novakowski
+2  A: 

Based on the suggestion given by Patrick, I have been able to do it using Perl:

cat file | perl -pe 's/\002\n/\002\002/g' | perl -pe 's/\n/ /g' | perl -pe 's/\002\002/\n/g'
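The reason this works where sed didn't is that perl -p keeps the trailing newline in $_, so the first pass can actually match the \002\n record terminator. The same pipeline broken out, with the input and output file names as placeholders:

# pass 1: hide the real record terminators (\002\n becomes \002\002)
# pass 2: any newline still left was embedded inside a field, so turn it into a space
# pass 3: turn the \002\002 markers back into plain newlines
perl -pe 's/\002\n/\002\002/g' datafile |
    perl -pe 's/\n/ /g' |
    perl -pe 's/\002\002/\n/g' > datafile.clean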

Marc Novakowski