I work a lot with serial communications with a variety of devices, so I often have to analyze hex dumps in log files. Currently, I do this manually by reading the dumps alongside the protocol spec and writing down the results. However, this is tedious and error-prone, especially when messages contain hundreds of bytes and mix big-endian and little-endian data, ASCII, Unicode, compression, CRCs, and so on.

I have written a few Python scripts to assist with the more common cases. But there are lots of protocols to deal with, and it doesn't make sense to spend the time writing a custom script unless I know I'll have a lot of dumps to analyze.

What I'd like is some sort of utility that can automate this activity. So, for example, if I have a textual hex dump like this:

7e ff 00 7b  00 13 86 04
00 41 42 43  44 56 ef 7e

and some sort of description of the message format, like this:

# Field         Size        Byte Order  Output Format
Flag            1                       hex
Address         1                       hex
Control         1                       hex
DataType        1                       decimal
LineIndex       1                       decimal
PollAddress     2           msb         hex
DataSize        2           lsb         decimal
Data            (DataSize)              ascii
CRC             2           lsb         hex
Flag            1                       hex

I'd get output like this:

Flag            0x7e
Address         0xff
Control         0x00
DataType        123
LineIndex       0
PollAddress     0x1386
DataSize        4
Data            "ABCD"
CRC             0xef56
Flag            0x7e

Hardware-based protocol analyzers often have fancy features for doing this kind of thing, but I need to work with textual log files.

Does any such utility or library exist?


Some good answers have come up since I set up the bounty. I guess bounties work!

Wireshark and HexEdit both look promising; I'll take a look at those, and will probably award the bounty to whichever one suits my needs. But I'm still open to other ideas.

A: 

I'm pretty sure I saw something like that on CPAN. I could be more vague if you like. :-)

Update: It's not exactly what you want, but have a look at Parse::Binary::FixedFormat

Paul Tomblin
Uh-oh, looks like my bounty posting has increased the negative rating for this answer.
Kristopher Johnson
It went down one after you posted the bounty, but that made me post the update, and then it went up one, so I'm net positive on rep.
Paul Tomblin
+2  A: 

Wireshark is quite good at dissecting network protocols.

iny
Can Wireshark analyze a hex dump from a text log file, or only the stuff it captures itself?
Kristopher Johnson
Wireshark usually expects things to be in PCAP format. However, wrapping things in PCAP dump format isn't very tricky (at least it wasn't very tricky to write a PCAP dump reader).
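
As a rough illustration of the wrapping step, here is a minimal Python sketch that builds a classic PCAP file image around raw frames. The function name `pcap_bytes`, the placeholder timestamps, and the choice of link type 147 (the first of the "user" link types reserved for private use) are assumptions made for this example; Wireshark would still need a dissector configured for that link type before it could decode the payload.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4   # classic pcap format with microsecond timestamps
LINKTYPE_USER0 = 147      # private-use link type; an assumption for this sketch


def pcap_bytes(frames, linktype=LINKTYPE_USER0, snaplen=65535):
    """Build a classic pcap file image from an iterable of raw frames (bytes)."""
    # Global header: magic, version 2.4, timezone offset, sigfigs, snaplen, link type.
    out = [struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)]
    for index, frame in enumerate(frames):
        # Record header: ts_sec, ts_usec, captured length, original length.
        # Timestamps are placeholders (one second apart), not real capture times.
        out.append(struct.pack("<IIII", index, 0, len(frame), len(frame)))
        out.append(frame)
    return b"".join(out)


if __name__ == "__main__":
    frame = bytes.fromhex("7eff007b00138604004142434456ef7e")
    with open("dump.pcap", "wb") as handle:
        handle.write(pcap_bytes([frame]))
```

Real timestamps from the log file, if it has them, would replace the placeholder values in the record header.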
Vatine
+1  A: 

Typically, I use Emacs hexl-mode to view binary files as a "text-dump". When I need more specific output, I do as you do and write a parser, though in C++.

Bill Perkins
+1  A: 

In my job we were designing network and serial protocols to control embedded hardware. I also got tired of misreading dumps and writing scripts for each protocol, so I wrote a library to do exactly what you describe. You could give it a text-file description of the protocol, and it had a GUI with check boxes for setting single bits, radio buttons for choosing between the valid combinations of bits, and drop-down lists when there were a lot of choices. You could edit the hex view of the data, the binary view of each field, or even point and click at the fields, and all the other views would update. It saved us a ton of time. It's a little quick and dirty, but I'd post it if it weren't owned by my employer. The point is, it wasn't very hard to write, and once I moved from a script per protocol to one program that could understand a description of the protocol, things were great. We stopped making mistakes from misreading dumps, and adding new protocols became trivial. Plus, the textual description of the protocol went straight into the development specs, so the software guys would know what to do with the hardware. I encourage you to take a crack at it.
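
To give a feel for how small the core of such a program can be, here is a minimal Python sketch of a description-driven decoder. It is not the library described above. The `MESSAGE_FORMAT` table, `parse_hex_dump`, and `decode` are hypothetical names invented for this example, and the table simply mirrors the field list from the question.

```python
# Hypothetical format description mirroring the table in the question:
# (field name, size in bytes or name of an earlier field, byte order, output format)
MESSAGE_FORMAT = [
    ("Flag", 1, None, "hex"),
    ("Address", 1, None, "hex"),
    ("Control", 1, None, "hex"),
    ("DataType", 1, None, "decimal"),
    ("LineIndex", 1, None, "decimal"),
    ("PollAddress", 2, "msb", "hex"),
    ("DataSize", 2, "lsb", "decimal"),
    ("Data", "DataSize", None, "ascii"),
    ("CRC", 2, "lsb", "hex"),
    ("Flag", 1, None, "hex"),
]


def parse_hex_dump(text):
    """Convert a whitespace-separated textual hex dump to bytes."""
    return bytes.fromhex("".join(text.split()))


def decode(data, fields):
    """Return a list of (name, formatted value) pairs for one message."""
    values = {}  # integer values seen so far, used to resolve variable sizes
    rows = []
    offset = 0
    for name, size, order, fmt in fields:
        if isinstance(size, str):
            size = values[size]  # size given by an earlier field, e.g. DataSize
        chunk = data[offset:offset + size]
        if len(chunk) != size:
            raise ValueError(f"message too short for field {name!r}")
        offset += size
        if fmt == "ascii":
            rows.append((name, '"' + chunk.decode("ascii") + '"'))
            continue
        number = int.from_bytes(chunk, "little" if order == "lsb" else "big")
        values[name] = number
        if fmt == "hex":
            width = 2 * size
            rows.append((name, f"0x{number:0{width}x}"))
        else:
            rows.append((name, str(number)))
    return rows


if __name__ == "__main__":
    dump = "7e ff 00 7b  00 13 86 04\n00 41 42 43  44 56 ef 7e"
    for name, value in decode(parse_hex_dump(dump), MESSAGE_FORMAT):
        print(f"{name:<16}{value}")
```

Adding a protocol then means adding another table rather than another script. Bit fields, CRC checking, and reading the table from a text file are left out of this sketch.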

James Caccese
+2  A: 

I suppose you need a good hex editor. Have a look at HexEdit. I have used the free version in the past and it is good, but I don't know if it offers what you are looking for. Basically, you want to be able to define a struct and then decode hex data against it. I suppose a good hex editor would support this. Check the paid version of HexEdit or google for another editor; there are many available.

kgiannakakis
+1  A: 

One possible starting point would be libPDL, a C++ library.

Another option may be NetPDL.

Vatine
+1  A: 

You should use the Tcl binary commands for stuff like this. What follows is the starting point for your example above. Tcl is really easy to learn and write scripts in. If you're doing serial comm stuff you owe it to yourself to learn at least the basics.

bash$ tclsh
% binary scan [binary format H* 7eff007b00138604004142434456ef7e] \
  H2H2H2ccH4sa4h4H2 \
  flag1 addr ctl datatype lineidx polladdr datasize data crc flag2
10
% puts "$flag1 $addr $ctl $datatype $lineidx \
  $polladdr $datasize $data $crc $flag2"
7e ff 00 123 0 1386 4 ABCD 65fe 7e

When you did your byte-order stuff you switched around the bytes but not the bits, so I'm not really sure what you were looking for there. Anyway, this will get you started.
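
For comparison, since the question mentions Python scripts, the same fixed-format scan can be sketched with Python's standard `struct` module. The variable names below simply follow the Tcl example, and the offsets are hard-coded for this one message layout. A `struct` format string carries a single byte order, so the little-endian fields are unpacked in separate calls.

```python
import struct

data = bytes.fromhex("7eff007b00138604004142434456ef7e")

# Fixed part: five single bytes, then a big-endian 16-bit PollAddress.
flag1, addr, ctl, datatype, lineidx, polladdr = struct.unpack_from(">5BH", data, 0)

# DataSize is little-endian, so it gets its own format string.
(datasize,) = struct.unpack_from("<H", data, 7)

# Variable-length payload, then the little-endian CRC and the closing flag.
payload = data[9:9 + datasize]
crc, flag2 = struct.unpack_from("<HB", data, 9 + datasize)

print(f"{flag1:02x} {addr:02x} {ctl:02x} {datatype} {lineidx} "
      f"{polladdr:04x} {datasize} {payload.decode('ascii')} {crc:04x} {flag2:02x}")
```

Reading the CRC as a little-endian 16-bit integer yields 0xef56, the value shown in the question, whereas the Tcl `h4` specifier reverses the hex digits within each byte and so produces 65fe.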

Zac Thompson
+1  A: 

Have a look at Hex Workshop.

I have been using it for years to analyze hex dumps. It has a Structure Viewer that lets you define a data structure in C/C++ style and then displays the data in that format.

Charles Faiga
+1  A: 

WinHex supports displaying/editing user-defined record formats. There are some examples at http://www.x-ways.net/winhex/templates/index.html

Andrew Medico