I needed to create a custom file format with embedded meta information. Instead of whipping up my own format, I decided to just use Lua.

texture
{
   format=GL_LUMINANCE_ALPHA;
   type=GL_UNSIGNED_BYTE;
   width=256;
   height=128;
   pixels=[[
<binary-data-here>]];
}

texture is a function that takes a table as its sole argument. It looks up the various parameters by name in the table and forwards the call to my C++ routine. Nothing out of the ordinary, I hope.

Occasionally the files fail to parse with the following error:

my_file.lua:8: unexpected symbol near ']'

What's going on here?
Is there a better way to store binary data in Lua?


Update

It turns out that storing binary data in a Lua string is non-trivial, but it is possible if you take care with three sequences.

  • Long string literals cannot have an embedded closing long bracket.
    This one is pretty obvious.

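  • Long string literals cannot end with a prefix of the closing long bracket (for example, data ending in ]== when the closer is ]==]). This one is more subtle: the tail of the data plus the first ] of the real closer terminates the string early. Luckily the script will refuse to compile if you get this wrong.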

  • Cannot embed \n or \r.
    Lua's built-in line-end processing messes these up. This problem is much more subtle. The script will compile fine but it will yield the wrong data: byte 13 (0x0D, CR) gets replaced with byte 10 (0x0A, LF), and some CR bytes go missing entirely.

To get around these limitations I split the binary data up and then concatenate the parts back together. I use a Python script to generate output like this:

input:='XXXX\nXX]]XX\r\nXXXX]='

texture
{
   format = RGB;
   lg_width = 8;
   lg_height = 7;
   pixels= '' ..
      [[XXXX]] ..
      '\n' ..
      [=[XX]]XX]=] ..
      '\r\n' ..
      [==[XXXX]=]==];
}
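
The split-and-concatenate approach above could be generated along these lines. This is a minimal sketch, not the original script: the function name `lua_pixels_expr` is made up for illustration, and instead of searching for the shortest safe delimiter it conservatively uses one more `=` than the longest run of `=` in each chunk, which rules out both an embedded closer and a clashing tail.

```python
import re


def lua_pixels_expr(data: bytes) -> bytes:
    """Build a Lua expression (as bytes) that evaluates to `data`.

    Hypothetical helper: CR/LF runs become quoted escapes, everything
    else goes into long-bracket literals, joined with `..`.
    """
    parts = [b"''"]
    # Split into runs of CR/LF bytes and runs of all other bytes.
    for piece in re.findall(rb"[\r\n]+|[^\r\n]+", data):
        if piece[0] in b"\r\n":
            # Line-end bytes are written as escapes in a quoted string.
            escaped = piece.replace(b"\r", b"\\r").replace(b"\n", b"\\n")
            parts.append(b"'" + escaped + b"'")
        else:
            # Conservative level: one more '=' than the longest run of '='
            # in the chunk, so the closer cannot occur inside the chunk and
            # the chunk cannot end with a prefix of the closer.
            longest = max((len(m) for m in re.findall(rb"=+", piece)), default=0)
            eq = b"=" * (longest + 1)
            parts.append(b"[" + eq + b"[" + piece + b"]" + eq + b"]")
    return b" ..\n      ".join(parts)
```

The result should be written to a file opened in binary mode (`open(path, "wb")`) so that Python does not translate line endings itself. The sketch only handles the three sequences listed above; any other byte that the platform's text mode treats specially is left untouched.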
+1  A: 

The binary data needs to be encoded into printable characters. The simplest method for decoding purposes would be to use C-like escape sequences for all bytes. For example, hex bytes 13 41 42 1E would be encoded as '\19\65\66\30'. Of course, then the encoded data is three to four times larger than the source binary.
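
A generator for this escape-based encoding might look like the following Python sketch. The function name `lua_escaped_literal` is hypothetical. It pads every escape to three decimal digits, on the assumption that unpadded escapes could absorb a following digit character, so each byte costs a fixed four characters.

```python
def lua_escaped_literal(data: bytes) -> str:
    """Return a single-quoted Lua string literal made only of \\ddd escapes.

    Hypothetical helper. Every byte is written as a backslash followed by
    exactly three decimal digits, so no escape can run into the next one.
    """
    return "'" + "".join(f"\\{b:03d}" for b in data) + "'"


# Hex bytes 13 41 42 1E from the example above.
print(lua_escaped_literal(bytes([0x13, 0x41, 0x42, 0x1E])))
```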

Alternatively, you could use something like Base64, but that would have to be decoded at runtime instead of relying on the Lua interpreter. Personally, I'd probably go the Base64 route. There are Lua examples of Base64 encoding and decoding.
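
On the generator side, the Base64 route could be sketched like this in Python. The name `lua_base64_literal` and the 76-column line width are assumptions for illustration, and the Lua side would still need a Base64 decoder that ignores line breaks, which is not shown. The Base64 alphabet contains no `]`, so a level-0 long bracket is enough.

```python
import base64


def lua_base64_literal(data: bytes, width: int = 76) -> str:
    """Return a Lua long-bracket literal holding `data` as Base64 text.

    Hypothetical helper. The text is wrapped at `width` columns; the
    newline right after the opening bracket is skipped by Lua itself.
    """
    text = base64.b64encode(data).decode("ascii")
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    return "[[\n" + "\n".join(lines) + "]]"
```

The output is plain printable ASCII, so the line-ending and end-of-file concerns raised for raw bytes do not apply; the cost is roughly a one-third size increase plus the decode step at load time.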

Another alternative would be to have two files. Use a well-defined image file format (e.g. TGA) that is pointed to by a separate Lua script with the additional metadata. If you don't want two files to move around, they could be combined in an archive.

Judge Maygarden
I chose to use the Lua raw strings after reading the remark that they allow for embedded NULLs. NULL isn't printable, so I assumed it was allowed to embed the others as well. Usually (>90%) it works. To quote §2.1 of the manual: *"Strings in Lua can contain any 8-bit value, including embedded zeros"* and, speaking of `long bracket` form strings, *"They can contain anything except a closing bracket of the proper level."*
caspin
The former quote specifically refers to escaped values (i.e. \ddd) while the latter probably assumes use of a text editor--and thus exclusive use of printable characters--to generate the source code.
Judge Maygarden
How are you putting the binary data into the script file?
Judge Maygarden
With Python, though I could do it with any programming language. My script accepts the image file on the command line, grabs and filters the image data, writes the Lua code up to the `[[\n`, then dumps the image data. Finally it writes the closing `]];\n}`
caspin
I don't know this for sure, but I imagine that the Lua interpreter chokes on non-printable characters. Yes, Lua strings can contain non-printable characters, but that doesn't mean the Lua interpreter can. I would still advise using a text encoding and then decode into a binary at runtime.
Judge Maygarden
Even C compilers can't do what you are asking. The closest thing is to use the common 'bin2c' program to turn a binary into an array of characters using hexadecimal literals. See http://lua-users.org/wiki/BinToCee
Judge Maygarden
+2  A: 

Lua is able to encode most characters in long bracket format including nulls. However, Lua opens the script file in text mode and this causes some problems. On my Windows system the following characters have problems:

Char code(s)      Problem
--------------    -------------------------------
13 (CR)           Is translated to 10 (LF)
13 10 (CR LF)     Is translated to 10 (LF)
26 (EOF)          Causes "unfinished long string near '<eof>'"

If you are not using Windows, then these may not cause problems, but there may be different text-mode based problems.
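
A generator script could check for these bytes before emitting a long string. Below is a minimal Python sketch, assuming only the two byte values from the table matter (other platforms may differ); `find_text_mode_hazards` is a made-up name.

```python
# Byte values from the table above; the set is an assumption for Windows only.
TEXT_MODE_HAZARDS = {13: "CR", 26: "EOF (Ctrl-Z)"}


def find_text_mode_hazards(data: bytes) -> list:
    """Return (offset, name) pairs for bytes that text mode may alter."""
    return [(i, TEXT_MODE_HAZARDS[b])
            for i, b in enumerate(data)
            if b in TEXT_MODE_HAZARDS]


print(find_text_mode_hazards(b"ab\r\ncd\x1a"))
```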


I was only able to produce the error you received by encoding multiple close brackets:

a=[[
]]] --> a.lua:2: unexpected symbol near ']'

But, this was easily fixed with the following:

a=[==[
]]==]
gwell
That did it! thanks. Besides checking that there isn't a `]]` in the binary data, I need to check that the data doesn't end with `]`. When I find either of those I add some `=`s to the string delimiter.
caspin
Well, that almost did it. `[==[` still produces an error when the end of the data is `]==`. So now I check that the first part of the delimiter (`]=*`) doesn't show up at the end of the binary data. When it does, I keep adding `=`s until `]={n}]` isn't found in the string and `]={n}` doesn't show up at the end.
caspin
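
The delimiter rule described in the comment above can be sketched in Python, the language of the generator script. The names `bracket_level` and `lua_long_string` are hypothetical, and the sketch only picks the delimiter: it does nothing about CR, LF, or byte 26, and it assumes the data does not start with a line break.

```python
def bracket_level(data: bytes) -> int:
    """Smallest n such that ]={n}] is not in data and data does not end with ]={n}."""
    level = 0
    while True:
        eq = b"=" * level
        closer = b"]" + eq + b"]"
        # Unsafe if the closer occurs inside the data, or if the data ends
        # with "]" + eq, which would fuse with the first "]" of the real closer.
        if closer not in data and not data.endswith(b"]" + eq):
            return level
        level += 1


def lua_long_string(data: bytes) -> bytes:
    """Wrap data in long brackets of the level chosen by bracket_level."""
    eq = b"=" * bracket_level(data)
    return b"[" + eq + b"[" + data + b"]" + eq + b"]"
```

The loop always terminates, because a level higher than the longest run of `=` in the data can satisfy neither condition.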