ansaurus

Question

Answer 1

A:

Not having done binary seds before, I can't be sure, but that looks like it will replace all occurrences of what you want with those same occurrences, but leave the rest of the file as is. In other words, I don't think it will change the file at all.

I usually just code up a stdio filter program for small jobs, something like this (filter.c):

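#include <stdio.h>

int main(void) {
    int saving = 0;       /* nonzero while between a start and an end marker */
    int ch, lastch = -1;  /* lastch is the previous byte; -1 before the first */

    while ((ch = getchar()) != EOF) {
        if (saving) {
            /* End marker 0xff/0xd9: echo this last byte, then stop echoing. */
            if ((lastch == 0xff) && (ch == 0xd9))
                saving = 0;
            putchar (ch);
        } else {
            /* Start marker 0xff/0xd8: emit both marker bytes, start echoing. */
            if ((lastch == 0xff) && (ch == 0xd8)) {
                saving = 1;
                putchar (lastch);
                putchar (ch);
            }
        }
        lastch = ch;
    }
    return 0;
}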

Compile that then just run your input through it:

gcc -o filter filter.c
./filter <inputfile >outputfile

This is a pretty standard filter program which just starts off by echoing nothing. When it finds the character sequence 0xff/0xd8, it starts echoing. When it finds 0xff/0xd9, it stops.

Keep in mind this is what you asked for in the text - no account is taken as to whether it has hex digits only (as per your regex). If this is a problem, the filter program becomes a little more difficult inasmuch as you'll need to store all characters up to the closing 0xff/0xd9 and only output the lot if they were all valid hex digits.

Changing 0xff to 'x', 0xd8 to 'y' and 0xd9 to 'z' (all to make debugging easier), then piping in:

"hello1xyhello2xzhello3xyhello4xzhello5"

gives you:

xyhello2xzxyhello4xz

as you would expect.
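For the hex-digits-only variant mentioned above, here is a rough Python sketch of the buffering idea (not the filter program itself). It assumes the whole input fits in memory and that a span with any non-hex byte is simply dropped; the function name extract_hex_only and its start/end parameters are made up for illustration.

```python
import string

START = b"\xff\xd8"
END = b"\xff\xd9"
# Byte values of the ASCII hex digits 0-9, a-f, A-F.
HEX_DIGITS = frozenset(string.hexdigits.encode("ascii"))


def extract_hex_only(data, start=START, end=END):
    """Return the concatenation of every start...end span whose payload
    (the bytes between the two markers) consists only of hex digits."""
    out = bytearray()
    pos = 0
    while True:
        begin = data.find(start, pos)
        if begin == -1:
            break
        stop = data.find(end, begin + len(start))
        if stop == -1:
            break  # unterminated span: nothing more to output
        payload = data[begin + len(start):stop]
        # Only emit the buffered span if every payload byte is a hex digit.
        if payload and all(b in HEX_DIGITS for b in payload):
            out += data[begin:stop + len(end)]
        # Either way, resume scanning after the closing marker.
        pos = stop + len(end)
    return bytes(out)


if __name__ == "__main__":
    # Same debugging trick as above: 'xy' opens a span, 'xz' closes it.
    sample = b"hello1xyDEADxzhello3xyhello4xzhello5xybeefxz"
    print(extract_hex_only(sample, b"xy", b"xz"))
```

With the debugging markers, only the DEAD and beef spans survive, because hello4 contains characters that are not hex digits.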

paxdiablo 2010-04-09 03:51:34

Answer 2

+1 A:

sed might be able to do it, but it could be tricky. Here's a Python script that does the same thing (note that it edits the file in-place, which is what I assume you want to do based on your sed script):

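import re

# Open in binary read/write mode so the file can be rewritten in place.
f = open('file.jpeg', 'rb+')
data = f.read()
# Bytes pattern: start marker, one or more hex digits, then the end marker.
match = re.search(b'(\xff\xd8[0-9A-Fa-f]+)\xff\xd9', data)
if match:
    result = match.group(1)
    f.seek(0)
    f.write(result)
    f.truncate()
else:
    print('No match')
f.close()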

Adam Rosenfield 2010-04-09 03:52:16

awesome alternative, thank you!

Ryan 2010-04-09 04:14:37

Answer 3

+4 A:

Is there a good way to do this

Yes, of course: use an image editing tool that knows how to edit jpg metadata, such as those from ImageMagick (search the net for "linux jpeg exif editor" etc.). I am sure you can find a tool that suits you. Don't try to do this the hard way. :)

ghostdog74 2010-04-09 04:05:04

Agree, this is essentially random binary data, so any given 2-byte sequence has a 1 / (2 ** 16) chance of showing up at each position. That's a false positive about once every 64 KiB (65,536 bytes) of data.
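To put rough numbers on that estimate, here is a small Python sketch. It assumes every byte is uniformly random and independent, which real JPEG or RAID data only approximates, and the helper name expected_random_hits is made up for illustration.

```python
def expected_random_hits(n_bytes, pattern_len=2):
    """Expected number of positions at which one fixed pattern of
    pattern_len bytes appears in n_bytes of uniformly random data."""
    positions = max(n_bytes - pattern_len + 1, 0)
    return positions / float(256 ** pattern_len)


if __name__ == "__main__":
    # Sizes in bytes: 64 KiB, 1 MiB, 100 MiB.
    for size in (64 * 1024, 1024 * 1024, 100 * 1024 * 1024):
        print(size, expected_random_hits(size))
```

Under that assumption the expected count grows linearly with file size, so a stray start or end marker becomes likely once the data runs to megabytes.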

snoopy 2010-04-09 05:24:49

exiftool (http://search.cpan.org/dist/Image-ExifTool/exiftool) is the killer application for media metadata.

daxim 2010-04-09 08:00:13

Just copying my above comment down here: FYI, the purpose of this question was to do manual file carving in a RAID 5 scenario. When grabbing stripes and chunks you will get data before and after the jpg (or any other file). This was meant to clean it.

Ryan 2010-04-09 15:22:35

Answer 4

A:

Also, this Perl might work (not tested, caveat emptor)... if Python is not installed :)

open(FILE, "file.jpg") || die "no open $!\n";
while (read(FILE, $buff, 8 * 2**10)) {
    $content .= $buff;
}
@matches = ($content =~ /(\xFF\xD8[[:xdigit:]]+?\xFF\xD9)/g);
print STDOUT join("", @matches);

You need to add binmode(FILE); binmode(STDOUT); on DOS or VMS after the open() call - not needed on Unix.

DVK 2010-04-09 04:07:50

I will give this a shot when I can, thank you for this alternative!

Ryan 2010-04-09 04:14:57

Why the downvote? If this has a bug/doesn't work, please tell me the details and I'll fix it. If you think this is off-topic, re-read the OP: "Even if not using sed?". If you're an anti-Perl bigot, don't be a coward; explain yourself.

DVK 2010-04-09 06:02:58

Sorry DVK - that was me. I've been bitten by bugs myself when trying to grep for short patterns in binary data. I think there's a good chance of this mismatching, either on one or the other of the anchors or by picking up a completely random 'phantom pattern'. Sooner or later the OP is likely to end up with the odd scrambled jpeg and wonder why! I also downvoted the others for the same reason.

snoopy 2010-04-09 06:21:35

If you're saying that the OP has an XY problem, please present a better solution than a regex before downvoting regex solutions as "bad". If this answer has a bug, please point it out. If there's a specific pattern where the regexp approach would fail, please clarify that as an answer (again, XY).

DVK 2010-04-09 06:29:08

Also, please note that this solution does NOT change the jpg file. Merely outputs found strings (which I'm guessing might be metadata) to standard out for later redirect/consumption

DVK 2010-04-09 06:31:47

binary sed replacement