I've got a bunch of binary files, each containing an embedded string near the end of the file, at a different offset in each file (it occurs only once per file). I need to extract the part of the file from the start of that string to the end of the file and write it to a new file.

E.g., if the file's contents are "AWREDEDEDEXXXERESSDSDS" and the string of interest is "XXX", then the part of the file I need is "XXXERESSDSDS".

What's the easiest way to do this in bash?

A: 

Would strings and grep do what you want?

e.g.

strings -n 3 myfilename | grep XXX
Colin Pickard
It only returns the string, not the bit that follows. I need everything from the start of the string till the end of the file.
ilitirit
+1  A: 

In Perl, there is a built-in variable ($') that holds the part of the string after the text matched by a regular expression. That is the method I would use. It is not just Bash and standard utilities, but Perl is so commonly installed that you should be OK.

Grant Johnson
Most text-oriented utilities on the standard Unix command line handle binary data poorly or incorrectly, because they make assumptions such as the file containing no '\0' characters. This is why you will have more success with a language like Perl or Python, which has no such limitations.
msw
A: 
 strings -n3 file_binary | awk '/XXX/{gsub(/.*XXX/,"");print}'
ghostdog74
Prints a single blank line on my system.
ilitirit
This output stops at the next newline character!
ypnos
... `awk '/XXX/{gsub(/.*XXX/,"");p=1}p{print}'`
vladr
+1  A: 

The following is a small shell hack that is not very performant, but it works.

Write the script file tail.sh as follows:

#!/bin/sh
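# $1 = input file, $2 = output file, $3 = pattern; skip= is the byte offset of the first match
dd bs=1 if="$1" of="$2" skip="$(grep --binary-files=text -m1 -b -o -- "$3" "$1" | cut -d ':' -f 1 | head -1)"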

Call tail.sh INPUTNAME OUTPUTNAME PATTERN
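
A possibly faster variant of the same idea, sketched under the assumption that GNU grep is available (it relies on the -a, -b and -o options): look up the byte offset once, then let tail copy the rest of the file in large blocks instead of having dd copy one byte at a time. The function name extract_from is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract_from INPUT OUTPUT PATTERN
# Copies INPUT from the first occurrence of the fixed string PATTERN to EOF.
extract_from() {
    # -a treats binary data as text, -b prints the 0-based byte offset,
    # -o reports the match itself, -F takes PATTERN literally.
    offset=$(LC_ALL=C grep -a -b -o -F -- "$3" "$1" | head -n 1 | cut -d: -f1)
    # Bail out instead of handing tail an empty number.
    [ -n "$offset" ] || { echo "pattern not found in $1" >&2; return 1; }
    # tail -c +N starts at byte N counting from 1, hence offset + 1.
    tail -c "+$((offset + 1))" "$1" > "$2"
}
```

For example, extract_from input.bin output.bin XXX. The explicit check for an empty offset is there to avoid the kind of "invalid number" failure that an unmatched pattern would otherwise cause.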

P.S.: Sorry, I forgot one grep option in my first post.

ypnos
Gives me this error: "dd: invalid number `'". By the way, this was on a test file. I let it run for a few minutes on a 9 MB file and it didn't complete.
ilitirit
Well, as I said, it is very slow. Maybe it was even slower for you because the grep didn't work right. Better try again.
ypnos
Now it gives me the error dd: invalid number `\r'
ilitirit
I cannot reproduce your error. Are you using bash?
ypnos
A: 

Try this:

grep -ao 'string.*' filename

Since you have binary data, you might want to redirect the output to a file.

grep -ao 'string.*' filename > binary.out

Or pipe it through hexdump or similar for testing:

grep -ao 'string.*' filename | hd
Dennis Williamson
Thanks, it fails when it hits a newline character though.
ilitirit
This output stops at the next newline character!
ypnos
A: 

I came up with this solution:

ls -1 *.bin | xargs strings -n4 --radix=d -f | grep "string" | awk '{sub(/:/, ""); print $2 " " $1 " " $1".";}' | xargs -l1 split -b && rm *.aa

ls -1 *.bin: Print only the filenames with the extension "bin", one per line.

xargs strings -n4 --radix=d -f: List all the strings in each file with their decimal positions, and include the filename in the output.

grep "string": Print the lines containing "string" (it only occurs once in each file).

awk '{sub(/:/, ""); print $2 " " $1 " " $1".";}': Remove the colon that strings adds after the filename, then print the position of the string, the filename, and the filename followed by a period (this line is used as the arguments for the split command).

xargs -l1 split -b: Execute the split command once per line, using the output of awk as the rest of the arguments.

rm *.aa: Delete the first part of each split file. "aa" is the default suffix for the first part.

There are probably better/faster/safer ways of doing this but it's fine for my purposes.

ilitirit