ansaurus

Question

Answer 1

+1 A:

If I understand your description correctly, the whole file consists of a number of such "blocks" with a fixed structure?

In that case, I suggest scanning them one by one and skipping the ones that are not of interest to you. Each step should do the following:

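1. Read 8 bytes (using IO#read, IO#readbytes on old Ruby 1.8, or a similar method).
2. From the header you just read, extract the size (first 4 bytes) and the tag (next 4 bytes).
3. Depending on the tag:
   - If the tag is the one you need, skip the following 16 bytes and read size-24 bytes.
   - If the tag is not of interest, skip the following size-8 bytes, i.e. the rest of the block, assuming the size counts the whole block including its 8-byte header.
4. Repeat.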

For skipping bytes, you can use IO#seek.
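
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the size is taken to be a 4-byte big-endian integer that counts the whole block (header included), and the method name scan_blocks is made up for this example. It works on any IO-like object, so a StringIO can stand in for a file.

```ruby
require 'stringio'

# Walk the blocks one by one and collect the payload of every block whose
# tag matches wanted_tag. Assumes (not confirmed by the question):
#   - size is a 4-byte big-endian integer covering the whole block
#   - a wanted block carries 16 more bytes to skip before its payload
def scan_blocks(io, wanted_tag)
  found = []
  while (header = io.read(8)) && header.bytesize == 8
    size, tag = header.unpack('Na4')
    break if size < 8 # malformed size; stop instead of looping forever

    if tag == wanted_tag && size >= 24
      io.seek(16, IO::SEEK_CUR)       # skip the 16 bytes after the header
      found << io.read(size - 24)     # the remaining bytes are the payload
    else
      io.seek(size - 8, IO::SEEK_CUR) # skip the rest of this block
    end
  end
  found
end
```

With a real file you would open it in binary mode, e.g. File.open(path, 'rb') { |f| scan_blocks(f, 'tvsh') }. Nested blocks are not handled here.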

Mladen Jablanović 2010-08-06 19:06:20

One annoying aspect of the format is that atoms (blocks) can be nested.

BaroqueBobcat 2010-08-06 20:55:44

Nothing a nice piece of recursion couldn't solve! ;)

Mladen Jablanović 2010-08-07 05:49:41

There is this library, haven't used it though: http://github.com/arbarlow/ruby-mp4info

BaroqueBobcat 2010-08-07 17:34:35

This has to be the right way forward, this way I don't have to scan through all the atoms I don't actually need (seems most mov files have the metadata at the *end* of the file). Just gotta figure out which atoms to push into! Oh, and figure out why some don't fit the pattern…

JP 2010-08-07 20:08:04

Answer 2

A:

Theoretically you can use regexes against any arbitrary data, including binary strings. HTH.

rogerdpack 2010-08-06 23:37:45

Nothing theoretical about it, as long as the regex engine you're using has an 8-bit mode where one byte equals one character.

Jan Goyvaerts 2010-08-07 11:55:43

Answer 3

A:

In Ruby you can use the /n flag when creating your regex to tell Ruby that your input is 8-bit data.

You could use /(.{4})tvsh(.{4})data(.{8})([\x20-\x7F]+)/mn to match 4 bytes, tvsh, 4 bytes, data, 8 bytes, and one or more ASCII characters. The m flag is needed because, without it, the dot does not match a newline byte (0x0A), which can easily occur in the binary size fields. I don't see any reason why this regex would be significantly slower to execute than hand-coding a similar search. If you don't care about the 4-byte and 8-byte blocks, /tvsh.{4}data.{8}([\x20-\x7F]+)/mn should be nearly as fast as a literal text search for tvsh.
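
As a rough sketch of the regex approach: the method name find_show_name and the constant PATTERN are made up for this example. The m flag is added here as an assumption on my part, so that the dot also matches a newline byte (0x0A) inside the binary fields.

```ruby
# n: treat the pattern as 8-bit (binary) data
# m: let "." match a newline byte too, since binary fields can contain 0x0A
PATTERN = /tvsh.{4}data.{8}([\x20-\x7F]+)/mn

# Returns the first run of ASCII characters that follows the tvsh/data
# headers, or nil when the pattern does not occur in the given bytes.
def find_show_name(bytes)
  match = PATTERN.match(bytes.b) # String#b gives a binary (ASCII-8BIT) copy
  match && match[1]
end
```

For a file, something like find_show_name(File.binread(path)) would apply it to the whole content at once, which is simple but reads the entire file into memory.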

Jan Goyvaerts 2010-08-07 11:55:06

Searching Binary Data in Ruby