I have a text file containing two non-ASCII bytes (0xFF and 0xFE):
??58832520.3,ABC
348384,DEF
The hex for this file is:
FF FE 35 38 38 33 32 35 32 30 2E 33 2C 41 42 43 0A 33 34 38 33 38 34 2C 44 45 46
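To reproduce this, the sample file can be recreated byte for byte. This is a minimal sketch, assuming a POSIX shell with printf and od; the name test.csv matches the file used in the commands in this question, and the octal escapes \377 and \376 stand for 0xFF and 0xFE.

```shell
# Write 0xFF 0xFE (octal \377 \376), then the two CSV lines.
# No trailing newline is written, matching the hex dump above.
printf '\377\376%s\n%s' '58832520.3,ABC' '348384,DEF' > test.csv

# Dump the file one byte at a time in hex to confirm the contents.
od -An -tx1 test.csv
```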
In this sample FF and FE happen to be the leading bytes of the file, but that is coincidental: they occur throughout my real file, although seemingly always at the beginning of a line.
I am trying to strip these bytes out with sed, but nothing I do seems to match them.
$ sed 's/[^a-zA-Z0-9\,]//g' test.csv
??588325203,ABC
348384,DEF
$ sed 's/[a-zA-Z0-9\,]//g' test.csv
??.
Main question: How do I strip these bytes?
Bonus question: The two regexes above are direct negations of each other, so one of them logically has to filter out these bytes, right? Why does neither of them match the 0xFF and 0xFE bytes?
Update: the direct approach of stripping out a range of hex bytes (suggested by two answers below) seems to strip out the first "legit" byte from each line and leave the bytes I'm trying to get rid of:
$ sed 's/[\x80-\xff]//' test.csv
??8832520.3,ABC
48384,DEF
The hex for this output is:
FF FE 38 38 33 32 35 32 30 2E 33 2C 41 42 43 0A 34 38 33 38 34 2C 44 45 46 0A
Notice the missing "5" and "3" from the beginning of each line, and the new 0A added to the end of the file.
Bigger Update: This problem seems to be system-specific. It was observed on OS X, but the suggestions (including my original sed statement above) work as I expect them to on NetBSD.
A solution: This same task seems easy enough via Perl:
$ perl -pe 's/^\xFF\xFE//' test.csv
58832520.3,ABC
348384,DEF
However, I'll leave this question open since this is only a workaround, and doesn't explain what the problem was with sed.
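One lead still to be confirmed: the two systems may differ in locale. The sketch below assumes that sed on OS X, running under a UTF-8 locale, treats 0xFF and 0xFE as invalid characters that no bracket expression matches, and that forcing the C locale makes it handle them as plain bytes. A period is added to the set so that the decimal point survives.

```shell
# Recreate the sample file (0xFF 0xFE, then the two CSV lines).
printf '\377\376%s\n%s' '58832520.3,ABC' '348384,DEF' > test.csv

# Assumption: in the C locale every byte is a single character,
# so the negated set can match 0xFF and 0xFE and delete them.
LC_ALL=C sed 's/[^a-zA-Z0-9,.]//g' test.csv
```

If that assumption holds, this would also cover stray bytes that are not at the start of a line, which the anchored Perl substitution above does not.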