Hello,

I am doing a regex search on binary files, and I've just discovered a problem: every so often, a 64-byte checksum is inserted, which throws my searches off. Is there a way to ignore these 64 bytes, regardless of where they appear in my data?

My regex is \x18\xC0\x40[\x42\x43][\x00\x01]\x00\x00\x00

My problem is illustrated below:

0230000000FF45198085B918C0404301

FFFFFFFFFFFFFFFFC03CCFFFFFFFFFFF

FFFFFFFFFFFFFFFF3C0CFFFFFFFFFFFF

FFFFFFFFFFFFFFFF0300F0FFFFFFFFFF

FFFFFFFFFFFFFFFF030F0FFFFFFF4700

000000B9000000003C8085B9EDDF0000

In my example, the bytes I need (18C0404301 at the end of the first row and 000000 at the start of the last row) are separated by a checksum, so my regex doesn't pick up the match. The checksum can fall at any point within the required data.

One observation about the checksum data: it always ends with 4700, and it is always 8 bytes of FF, followed by 3-4 bytes of values, followed by 4-5 bytes of FF again.

Any help would be greatly appreciated. Thanks, James

+1  A: 
\x18\xC0\x40[\x42\x43][\x00\x01][^\x00\x00\x00]*\x00\x00\x00
Erik
Tried this, but it doesn't work. I used PowerGREP as well, but no luck.
James
Which language / regex parser are you using?
Erik
`[^\x00\x00\x00]*` - What do you expect that to do?
Alan Moore
[] - group tag / [^] matches on anything except the following sequence / []* matches never or one time => so: [^\x00\x00\x00]* should match on anything except "\x00\x00\x00".
Erik
[] is a character class, so [^\x00\x00\x00] is equivalent to [^\x00]
M42
@Erik — You need a negative look-ahead for that: `(?:(?!\x00\x00\x00).)*` (I think)
Ben Blank
@Ben: yes, that's right.
Alan Moore
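
To make the point in the comments above concrete, here is a minimal sketch in Python (an assumption; the thread never says which regex engine is in use). The sample bytes in `data` are invented for illustration. It checks that repeating `\x00` inside a character class changes nothing, and shows what the negative look-ahead form consumes.

```python
import re

# Invented sample: a few bytes, a lone \x00, then a run of three \x00 bytes.
data = b"\x01\x02\x00\x03\x00\x00\x00\x04"

# A character class lists single bytes, so listing \x00 three times
# is the same as listing it once.
same = re.findall(rb"[^\x00\x00\x00]", data) == re.findall(rb"[^\x00]", data)

# Look-ahead form: consume one byte at a time, but only while the next
# three bytes are not \x00\x00\x00. DOTALL lets "." match any byte.
tempered = re.compile(rb"(?:(?!\x00\x00\x00).)*", re.DOTALL)
prefix = tempered.match(data).group()

print(same)
print(prefix)
```

Note that the look-ahead version happily passes over the single `\x00` and stops only in front of the three-byte run, which the negated character class cannot do.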
+2  A: 

You should probably use two passes for your search. In the first pass, you delete all the checksum blocks, which should be easy enough to identify. In the second pass, you do your actual search.

Otherwise, you'd have to allow for a checksum block after each byte of your expression, resulting in a very long and hard-to-read pattern.

Jens
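
The two-pass idea could look roughly like the sketch below. It is written in Python as an assumption (the question does not name a language), and it takes the checksum layout from the thread: 8 bytes of FF, 54 further bytes, then 47 00, for 64 bytes in total. The names `CHECKSUM` and `TARGET` are made up for this example, and the sample bytes are the dump from the question.

```python
import re

# Pass 1 pattern: 8 x FF, 54 arbitrary bytes, then 47 00 (64 bytes).
# DOTALL lets "." match every byte value, including 0x0A.
CHECKSUM = re.compile(rb"\xFF{8}.{54}\x47\x00", re.DOTALL)

# Pass 2 pattern: the original search expression from the question.
TARGET = re.compile(rb"\x18\xC0\x40[\x42\x43][\x00\x01]\x00\x00\x00")

# The hex dump from the question, one row per string.
data = bytes.fromhex(
    "0230000000FF45198085B918C0404301"
    "FFFFFFFFFFFFFFFFC03CCFFFFFFFFFFF"
    "FFFFFFFFFFFFFFFF3C0CFFFFFFFFFFFF"
    "FFFFFFFFFFFFFFFF0300F0FFFFFFFFFF"
    "FFFFFFFFFFFFFFFF030F0FFFFFFF4700"
    "000000B9000000003C8085B9EDDF0000"
)

# Pass 1: delete every checksum block.
stripped = CHECKSUM.sub(b"", data)

# Pass 2: run the real search on the cleaned data.
match = TARGET.search(stripped)
print(match.start() if match else None)
```

Two caveats: offsets reported by the second pass refer to the stripped data, not to the original file, and any real data that happens to look like a checksum block would be deleted as well.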
+2  A: 

Try this:

\x18\xC0\x40[\x42\x43][\x00\x01](?:\x00{8}[\x00-\xFF]*?\x47\x00)\x00{3}

Updated: this will work if the checksum can appear anywhere. I inserted line feeds for readability.

\x18(?:\x00{8}[\x00-\xFF]*?\x47\x00)
\xC0(?:\x00{8}[\x00-\xFF]*?\x47\x00)
\x40(?:\x00{8}[\x00-\xFF]*?\x47\x00)
[\x42\x43](?:\x00{8}[\x00-\xFF]*?\x47\x00)
[\x00\x01](?:\x00{8}[\x00-\xFF]*?\x47\x00)
\x00(?:\x00{8}[\x00-\xFF]*?\x47\x00)
\x00(?:\x00{8}[\x00-\xFF]*?\x47\x00)
\x00
M42
This worked, thanks. The problem is that it will only work if the checksum falls at that point in the data. I need to account for the checksum appearing at ANY point in the data. I think I'm either going to have a very large regex, or remove the checksums in a first pass, as Jens says.
James
Hmm, I can't get this to work; I will keep tweaking it. Your example takes a long time to run, so I don't think it will be usable. I have tried improving the checksum part as follows (note that the first part is 8 FF, not 8 00): (?:\xFF{8}[\x00-\xFF]{54}\x47\x00). This works on its own in PowerGREP and finds all the checksums, but when I run the complete search, I get no results.
James
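
One possible explanation for the empty result is that the checksum groups in the multi-line pattern are mandatory, so it only matches when a checksum follows every single byte. A hedged sketch of a variant that makes each group optional is shown below. It uses Python (an assumption, since the thread only mentions PowerGREP) and the fixed-length checksum pattern from the last comment. `OPTIONAL_CHECKSUM` and `ELEMENTS` are names invented for this example.

```python
import re

# Checksum block as described in the thread: 8 x FF, 54 arbitrary bytes,
# then 47 00. The trailing "?" makes the whole block optional.
OPTIONAL_CHECKSUM = rb"(?:\xFF{8}[\x00-\xFF]{54}\x47\x00)?"

# One entry per byte of the original search expression.
ELEMENTS = [rb"\x18", rb"\xC0", rb"\x40", rb"[\x42\x43]", rb"[\x00\x01]",
            rb"\x00", rb"\x00", rb"\x00"]

# Allow at most one checksum block between any two consecutive bytes.
pattern = re.compile(OPTIONAL_CHECKSUM.join(ELEMENTS))

# The hex dump from the question: the checksum splits the wanted bytes.
with_checksum = bytes.fromhex(
    "0230000000FF45198085B918C0404301"
    "FFFFFFFFFFFFFFFFC03CCFFFFFFFFFFF"
    "FFFFFFFFFFFFFFFF3C0CFFFFFFFFFFFF"
    "FFFFFFFFFFFFFFFF0300F0FFFFFFFFFF"
    "FFFFFFFFFFFFFFFF030F0FFFFFFF4700"
    "000000B9000000003C8085B9EDDF0000"
)
# The same data with the four checksum rows left out.
without_checksum = with_checksum[:16] + with_checksum[80:]

m_split = pattern.search(with_checksum)
m_plain = pattern.search(without_checksum)
print(m_split.span() if m_split else None)
print(m_plain.span() if m_plain else None)
```

Because the block has a fixed length and a fixed FF prefix, each optional group fails quickly when no checksum is present, which may also help with the slow run time reported for the lazy `*?` version. Unlike the two-pass approach, the match offsets here refer to the original, unmodified data.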