Hi,

I'd like to find repeated patterns in a hex file I have and sort them by number of occurrences.

I am not looking for any specific pattern; I just want to gather statistics on which patterns occur there and sort them by frequency.

DB0DDAEEDAF7DAF5DB1FDB1DDB20DB1BDAFCDAFBDB1FDB18DB23DB06DB21DB15DB25DB1DDB2EDB36DB43DB59DB32DB28DB2ADB46DB6FDB32DB44DB40DB50DB87DBB0DBA1DBABDBA0DB9ADBA6DBACDBA0DB96DB95DBB7DBCFDBCBDBD6DB9CDBB5DB9DDB9FDBA3DB88DB89DB93DBA5DB9CDBC1DBC1DBC6DBC3DBC9DBB3DBB8DBB6DBC8DBA8DBB6DBA2DB98DBA9DBB9DBDBDBD5DBD9DBC3DB9BDBA2DB84DB83DB7DDB6BDB58DB4EDB42DB16DB0DDB01DB02DAFCDAE9DAE5DAD9DAE2DAB7DA9BDAA6DA9EDAAADAC9DACADAC4DA92DA90DA84DA89DA93DAA9DA8CDA7FDA62DA53DA6EDA

That's an excerpt of the hex file. As an example, I'd like to get output like:

XX occurrences of BDBDBD

XX occurrences of B93D

Is there a way to mine the file to generate that output?

A: 

You can use regular expressions to build a pattern to search for.

The regex needed would be very simple: just use the exact phrase you're searching for. The language you're using (you didn't specify) should have a regular expression function that can count the number of matches.

Use that to create a simple counter.
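
As a rough illustration of that counter, here is a minimal sketch in Python (chosen only for brevity; PHP's preg_match_all could play the same role). The function name count_occurrences and the overlapping flag are made up for this example. One thing to watch: a plain regex search counts non-overlapping matches, so a zero-width lookahead is used when overlapping hits should count too.

```python
import re

def count_occurrences(hex_text: str, pattern: str, overlapping: bool = True) -> int:
    """Count how often a literal pattern appears in hex_text (hypothetical helper)."""
    literal = re.escape(pattern)  # treat the pattern as plain text, not regex syntax
    if overlapping:
        # A zero-width lookahead matches at every start position,
        # so occurrences that share characters are all counted.
        regex = re.compile(f"(?={literal})")
    else:
        # A plain search resumes after the end of each match.
        regex = re.compile(literal)
    return sum(1 for _ in regex.finditer(hex_text))

# First 32 hex digits of the excerpt from the question.
sample = "DB0DDAEEDAF7DAF5DB1FDB1DDB20DB1B"
print(count_occurrences(sample, "DB1"), "occurrences of DB1")
```

This only helps once you already know which pattern to look for; it does not discover patterns by itself.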

Crowe T. Robot
Sorry for not specifying the language. It would be either Objective-C/Cocoa or PHP, which are the two I can code in.
Cy.
+1  A: 

This is a pretty classic CS problem. In general the code is non-trivial to implement: it requires at least one full pass over the sequence, and depending on your efficiency and memory/processor constraints it might require several. See here.

You will need to partition your input string in some way to ensure that you get a good subsequence across it.

If there is a specific problem we might be able to help more, but the general strategy is in the Wikipedia article above.

GrayWizardx
Can you tell me what CS means, please?
Cy.
Uh-oh! Houston, I think we have a problem.
pavium
Computer Science
Martin Wickman
Thanks wic. I am Spanish and not very used to the acronyms used in the educational world. @pavium
Cy.
+2  A: 

Sure. Use a sliding window to create the counts (the link is for Perl, but it seems general enough to understand the algorithm). Your patterns are called n-grams. You will have to limit the maximum pattern length, though.
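
The sliding-window idea can be sketched roughly as follows. This is not the code from the linked Perl page, just a minimal Python illustration; the names count_ngrams and report, and the parameters min_len, max_len, step, min_count and top, are all assumptions made for this example. It assumes the file has already been read into a string of hex digits. Setting step=2 with even lengths would keep windows aligned to whole bytes, which is also an assumption about what counts as a pattern.

```python
from collections import Counter

def count_ngrams(hex_text, min_len=2, max_len=6, step=1):
    """Slide a window of every length in [min_len, max_len] over hex_text
    and count each substring seen. Lengths are in hex digits."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        # The last valid start index is len(hex_text) - n.
        for i in range(0, len(hex_text) - n + 1, step):
            counts[hex_text[i:i + n]] += 1
    return counts

def report(counts, min_count=2, top=10):
    """Format the most frequent patterns, most frequent first (ties sorted by pattern)."""
    ranked = sorted(
        ((p, c) for p, c in counts.items() if c >= min_count),
        key=lambda pc: (-pc[1], pc[0]),
    )
    return [f"{c} occurrences of {p}" for p, c in ranked[:top]]

# First 32 hex digits of the excerpt from the question.
sample = "DB0DDAEEDAF7DAF5DB1FDB1DDB20DB1B"
for line in report(count_ngrams(sample, min_len=4, max_len=6)):
    print(line)
```

Capping max_len matters because the number of distinct substrings, and so the memory used by the counter, grows quickly with the window length.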

Yuval F
Thank you, this solved my issue!
Cy.