Hello guys, I need a tool to parse Lua table expressions. If all else fails, I will eventually just write a small Lua module to convert the tables to XML, but for the time being I am interested in a Ruby library that does that. Failing that, I would accept a tool in any language, provided I can look at its source.

Here is an example snippet (it's a WoW addon output):

CT_RaidTracker_RaidLog = {
{
    ["PlayerInfos"] = {
        ["Nyim"] = {
            ["race"] = "Orc",
            ["guild"] = "Excubitores Noctae",
            ["sex"] = 2,
            ["class"] = "HUNTER",
            ["level"] = 70,
        },
        ["Zyrn"] = {
            ["race"] = "BloodElf",
            ["guild"] = "Excubitores Noctae",
            ["sex"] = 2,
            ["class"] = "WARLOCK",
            ["level"] = 70,
        },
...

The basic idea is nested associative arrays. Any help or pointer will be examined, and any idea is appreciated. Thank you for your attention.

EDIT #1

Due to the disputes, let me clarify what I tried. I extended the string/regex replacement chain provided by one of the participants, like so:

str.gsub(/--.+$/, "").gsub("=", ":").gsub(/[\[\]]/,"").gsub('" :','":').gsub(/,\s*\n(\s*)}/, "\n\\1}")

I (1) added removal of Lua comments, and (2) replaced one of the regex replacements: the last element in a Lua object/array may still have a comma after it, which JSON rejects, so that case must be covered and the comma removed.
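A minimal, self-contained sketch of that chain in Ruby, run against a shortened sample of the addon output. It assumes the `CT_RaidTracker_RaidLog = ` prefix has already been dealt with and that no string value contains `--`, `=`, `[` or `]`; `lua_table_to_json` and the sample are illustrative names, not part of any library.

```ruby
require 'json'

# The replacement chain from EDIT #1, one step per line.
# Assumption: no string value contains "--", "=", "[" or "]".
def lua_table_to_json(str)
  s = str.gsub(/--.+$/, "")          # 1. drop Lua line comments
  s = s.gsub("=", ":")               # 2. key = value  ->  key : value
  s = s.gsub(/[\[\]]/, "")           # 3. ["key"]  ->  "key"
  s = s.gsub('" :', '":')            # 4. tidy up  "key" :  ->  "key":
  s.gsub(/,\s*\n(\s*)\}/, "\n\\1}")  # 5. drop the trailing comma before }
end

# Shortened sample; the "CT_RaidTracker_RaidLog = " prefix is already removed.
sample = <<~LUA
  {
      ["PlayerInfos"] = {
          ["Nyim"] = {
              ["race"] = "Orc", -- a trailing comment
              ["level"] = 70,
          },
      },
  }
LUA

data = JSON.parse(lua_table_to_json(sample))
p data["PlayerInfos"]["Nyim"]
```

Each step is a plain text substitution, so the chain only works as long as the input sticks to this simple shape.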

Guys, do you notice the double opening curly braces? JSON doesn't accept anonymous objects. The structure looks like this:

"xxx" = {
  {
    ["aaa"] = {
      ["bbb"] = {
        "ccc" = 7,
        "ddd" = "a string",
        "eee" = "a date/time pattern"
      }
    },
    ["qqq"] = {
      "hm" = "something"
    }
  },
  {
    ["aaa"] = {
    -- ...
    },
    ["qqq"] = {
    -- ...
    }
  }
}

Basically, on the root level we actually have a list/array of similar objects, each having an "aaa" and a "qqq" section, to follow the example. Lua obviously allows that, while JSON doesn't: the opening curly brace is treated as "start an object", but that object doesn't have a name.

I tried to detect that case with a regex and replace the curly braces with "[]" pairs. While the resulting regex worked, the problem stayed the same: we now define an array of similar objects instead, but the declaration of the array is still nameless.

A possible solution would be, instead of detecting those braces and replacing them with [], to christen the objects with indexes, like: "0" = { "aaa" = {...} }, "1" = { "aaa" = {...} }, etc. That (hopefully final) workaround will probably make it work... Will report back again. ;)
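The "christen the objects with indexes" idea could be done as a pre-pass before the regex chain. The sketch below is one hypothetical way to do it: a small character scanner (not a regex) that gives every key-less table entry a synthetic key `["0"]`, `["1"]`, ... It assumes comments were already stripped and strings are double-quoted without escaped quotes; positional scalars such as `{ "a", "b" }` are not handled, and `name_anonymous_tables` is an illustrative name.

```ruby
# Hypothetical pre-pass: give every key-less (positional) table a synthetic
# key ["0"], ["1"], ... so that the later regex chain produces JSON objects.
# Assumes comments are already removed and strings are double-quoted
# without escaped quotes.
def name_anonymous_tables(src)
  out = +""
  counters = []      # one positional counter per currently open table
  in_string = false
  src.each_char do |ch|
    if in_string
      in_string = false if ch == '"'
    elsif ch == '"'
      in_string = true
    elsif ch == "{"
      prev = out.rstrip[-1]  # last non-blank character written so far
      # A "{" right after "{" or "," starts a table with no key in front of it.
      if !counters.empty? && (prev == "{" || prev == ",")
        out << %(["#{counters.last}"] = )
        counters[-1] += 1
      end
      counters.push(0)
    elsif ch == "}"
      counters.pop
    end
    out << ch
  end
  out
end

sample = <<~LUA
  {
    {
      ["aaa"] = 1,
    },
    {
      ["aaa"] = 2,
    },
  }
LUA
puts name_anonymous_tables(sample)
```

The two inner tables come out as `["0"] = {` and `["1"] = {`, so after the bracket stripping they turn into named JSON members instead of anonymous objects.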

+3  A: 

It's simple to write a Lua program that outputs tables as XML, but the result depends on how you want the XML formatted. See also LuaXML, which has xml.save (but is written in C), and this question.

lhf
That's my last resort. ;) It might prove the best solution, though.
dimitko
If you could tell the format of the XML you want, I can post Lua code for the conversion. My point is that the easiest way to parse Lua data files is to load them in Lua.
lhf
Well, at the moment I am not 100% clear on the *exact* schema of the data (googling like mad, though), but what's wrong with recursing over the nested associative arrays and emitting the proper XML along the way?
dimitko
Nothing wrong, but the exact XML does depend on what you want to output.
lhf
Given that the schema is pretty much non-existent (or at least not locatable by Google), I would rather first get a blind conversion going, and then inspect the XML to find out how I am supposed to consume it.
dimitko
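The recursion dimitko describes can be sketched in Ruby as well, as a counterpart to converting in Lua. This assumes the data is already a nested Hash (for example, the result of JSON.parse); arrays are not handled, and the "item" prefix for keys starting with a digit is a hypothetical naming choice. As lhf says, the exact XML layout is a decision you still have to make; this is only one blind mapping.

```ruby
# Escape the characters that would otherwise break XML text content.
def escape_xml(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Recursively emit one XML element per Hash key; nested Hashes become
# nested elements. Keys starting with a digit get a hypothetical "item"
# prefix, because XML element names cannot start with a digit.
def hash_to_xml(hash, indent = 0)
  pad = "  " * indent
  hash.map do |key, value|
    tag = key.to_s.match?(/\A\d/) ? "item#{key}" : key.to_s
    if value.is_a?(Hash)
      "#{pad}<#{tag}>\n#{hash_to_xml(value, indent + 1)}#{pad}</#{tag}>\n"
    else
      "#{pad}<#{tag}>#{escape_xml(value)}</#{tag}>\n"
    end
  end.join
end

players = { "Nyim" => { "race" => "Orc", "level" => 70 } }
puts hash_to_xml(players)
```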
+2  A: 

It is probably going to be simpler to use JSON than XML in this case.

The translation from Lua tables is nearly 1-to-1 (change = to :, remove [ and ] from the keys, and drop trailing commas). This is the JSON equivalent of your example:

{
  "PlayerInfos": {
    "Nyim": {
      "race": "Orc",
      "guild": "Excubitores Noctae",
      "sex": 2,
      "class": "HUNTER",
      "level": 70
    },
    "Zyrn": {
      "race": "BloodElf",
      "guild": "Excubitores Noctae",
      "sex": 2,
      "class": "WARLOCK",
      "level": 70
    },

...

Besides, Ruby's standard library parses JSON (via JSON::parse), and Rails loads it by default.

In order to read it from a Ruby app, you would do something similar to this:

require 'json' # This is already included on Rails apps

info = JSON::parse(File.read("PlayerInfos.json"))

Then the player infos would be available at:

player_infos = info["PlayerInfos"]
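Once parsed, `info` is an ordinary nested Hash, so the players can be walked with normal Ruby iteration. A small sketch, assuming the converted text has the shape of the JSON equivalent above; the inline string is a shortened stand-in for the file contents, not real data from the addon.

```ruby
require 'json'

# Shortened stand-in for the converted file (assumed shape: a top-level
# object with a "PlayerInfos" key mapping player names to attribute Hashes).
json_text = '{"PlayerInfos": {"Nyim": {"class": "HUNTER", "level": 70}, "Zyrn": {"class": "WARLOCK", "level": 70}}}'
info = JSON.parse(json_text)
player_infos = info["PlayerInfos"]

# Each player name maps to a Hash of attributes.
summary = player_infos.map { |name, attrs| "#{name}: #{attrs['class']} #{attrs['level']}" }
puts summary
```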

There's also a Java JSON parser, but I have no experience with it.

egarcia
Very nice idea, mate, I was thinking about that too. Only one problem, though: I must come up with regexp-based replacements, which by itself is easy stuff, but so far I haven't succeeded in finding a *short* description of the Lua table format (TL;DR variants are all over, though).
dimitko
Probably it is going to be even simpler to use Lua itself to parse the source file and generate a JSON file.
egarcia
That was my point.
lhf
Thing is, I can do that easily in my home environment, but the hosting I will put it on doesn't have Lua. :/ Thus, I need a Ruby (or Java, or actually PHP too) means to do it.
dimitko
+4  A: 

I'm probably stating the obvious, but Lua can certainly parse Lua tables. And you can "embed" Lua in pretty much any mainstream language, including Java and Ruby (scroll down the link for the Java and Ruby bindings). By embed, I mean parsing source files, calling Lua functions and exploring tables, maybe even calling functions written in your host language from Lua. These binding libraries may well be more work than exporting your tables to XML/JSON, but they are at least worth a look.

Edit: level 70? That's so last decade ;)

sbk
+1  A: 

You mention you can only use Java, Ruby or PHP for the parsing of this. An option is to use a tool like ANTLR to generate a little parser for you.

The ANTLR grammar:

grammar Test;

parse
  :  Identifier '=' table EOF
  ;

table
  :  '{' (entry (',' entry)* ','?)? '}'
  ;

entry
  :  key ('=' value)?
  |  String
  |  Number
  ;

key
  :  '[' (String | Number) ']'
  |  Identifier
  ;

value
  :  String 
  |  Number
  |  Identifier
  |  table
  ;

Identifier
  :  ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
  ;

String
  :  '"' ~'"'* '"'
  ;

Number
  :  '0'..'9'+
  ;

Space
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

generates a parser that can take input like:

Table = {
  ["k1"] = "v1",
  ["k2"] = {["x"]=1, ["y"]=2},
  ["k3"] = "v3"
}

and transform it into:

[Image: the parse tree ANTLR generates for this input]

Writing an XML from that tree structure is child's play.

But like I said, Lua tables can look quite different from the grammar I posted above: strings can look like:

'a string'
[===[ also ]==] a string ]===]

and keys and values can consist of expressions. But if the trees always look like how you posted it, it might be an option for you.
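If the input really stays within such a subset, roughly the same grammar can also be coded by hand as a recursive-descent parser in Ruby, which needs neither ANTLR nor its runtime on the server. This is a swapped-in technique, not what ANTLR generates; `LuaTableParser` is a hypothetical name. It covers only double-quoted strings with simple backslash escapes, numbers, true/false/nil, `--` line comments and nested tables (no long strings, single-quoted strings, identifiers as values, or expressions). Positional entries get Lua-style keys 1, 2, 3, ..., which also avoids the nameless-table problem.

```ruby
require 'strscan'

# Hand-written recursive-descent parser for roughly this subset:
#   parse : Identifier '=' table EOF
#   table : '{' (entry ((',' | ';') entry)* (',' | ';')?)? '}'
#   entry : '[' value ']' '=' value | Identifier '=' value | value
#   value : String | Number | true | false | nil | table
# Lua tables become Ruby Hashes; positional entries get keys 1, 2, 3, ...
class LuaTableParser
  def initialize(src)
    @s = StringScanner.new(src)
  end

  def parse
    name = expect(/[A-Za-z_]\w*/)
    expect(/=/)
    table = parse_table
    skip_ws
    raise "trailing input at #{@s.pos}" unless @s.eos?
    { name => table }
  end

  private

  # Skip whitespace and "--" line comments.
  def skip_ws
    @s.skip(/(?:\s+|--[^\n]*)+/)
  end

  def expect(re)
    skip_ws
    @s.scan(re) or raise "expected #{re.inspect} at #{@s.pos}"
  end

  def parse_table
    expect(/\{/)
    result = {}
    index = 1
    loop do
      skip_ws
      break if @s.scan(/\}/)                    # empty table or trailing comma
      if @s.scan(/\[/)                          # ["key"] = value
        key = parse_value
        expect(/\]/)
        expect(/=/)
        result[key] = parse_value
      elsif @s.check(/[A-Za-z_]\w*\s*=/)        # name = value
        key = @s.scan(/[A-Za-z_]\w*/)
        expect(/=/)
        result[key] = parse_value
      else                                      # positional entry
        result[index] = parse_value
        index += 1
      end
      skip_ws
      next if @s.scan(/[,;]/)
      expect(/\}/)
      break
    end
    result
  end

  def parse_value
    skip_ws
    if @s.check(/\{/)
      parse_table
    elsif (str = @s.scan(/"(?:[^"\\]|\\.)*"/))
      str[1..-2].gsub(/\\(.)/, '\1')            # naive unescape
    elsif (num = @s.scan(/-?\d+(?:\.\d+)?/))
      num.include?(".") ? num.to_f : num.to_i
    elsif (word = @s.scan(/true|false|nil/))
      { "true" => true, "false" => false, "nil" => nil }[word]
    else
      raise "unexpected input at #{@s.pos}"
    end
  end
end

example = <<~LUA
  Table = {
    ["k1"] = "v1",
    ["k2"] = {["x"]=1, ["y"]=2},
    ["k3"] = "v3"
  }
LUA
p LuaTableParser.new(example).parse
```

Because each nested `{` simply recurses into `parse_table`, arbitrary nesting depth is handled, which is the part a regex chain cannot guarantee.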

Good luck!

Bart Kiers
Wow. Going to try it ASAP. :) I figure I might get away with ANTLR and derive hardcoded functions in Ruby to do the job, but probably the best way to go is to produce the ANTLR-generated parser and then try to re-code it myself (if at all necessary). Thumbs up!
dimitko
That's a great deal of work for something that would be a tiny bitty Lua program. And Lua is very easy to pick up.
Shadowfirebird
I certainly love ANTLR and text parsing... but I think this is a bit of overkill. Moreover, if the OP is able to install ANTLR on the server, he's going to be able to install Lua too... and use Lua itself to do the export.
egarcia
@egarcia, err, you're mistaken (or I don't understand what you mean): you don't need to install ANTLR. You use ANTLR to generate the parser only once. That generated parser can have various target languages, both Java and Ruby are supported, both of which dimitko stated in the original question he could use.
Bart Kiers
@Shadowfirebird, sure, Lua is simple, but to quote dimitko: *"..but the hosting I will put it on, doesn't have Lua. :/ Thus, I need Ruby or Java..."*. As I mentioned in my previous remark: ANTLR can produce a parser for both Ruby and Java. As to your remark *"great deal of work"*: I disagree, the grammar I posted above is all you need to generate a parser that handles recursively nested table without any hassle. I could be wrong, but I guess you never worked with ANTLR to come to that conclusion. I'm not saying that to put you down, but once you know a bit of ANTLR, ... t.b.c.
Bart Kiers
... you'll see that it's not just a tool for parsing complicated languages, it's perfect to parse simple languages (with a recursive nature) and performing some sort of translation from one language to another: a table-like structure to XML in this case.
Bart Kiers
@bart - I might be mistaken, but you still need the ANTLR "runtime" installed - the generated parser depends on libraries that must be installed on the server.
egarcia
@egarcia, yes, good point. You indeed need one (or more) .rb files present on the server. Note that you don't need to install them: just copying them by FTP is enough.
Bart Kiers
I have to check on ANTLR runtimes for Ruby/PHP. I would surely give first priority to a proper parser; plus, I know the ANTLR community is insanely helpful, and it is very likely somebody there has already tackled the problem. While chained regex replacements are very clever and make you feel powerful, smart and all, I have enough experience as a developer in 3 languages to try to find the "proper" implementations first.
dimitko
I researched it a bit, and it seems to me the above grammar may be incomplete; since I am no Lua expert, I won't risk it. I could successfully generate Ruby code for parsing ANY Lua, but that seems too big a performance issue. So I will stick to the risky approach with the chained regex replacements.
dimitko
@dimitko, it is definitely incomplete, like I said in my answer. It all depends on what the exact input can be. But when going the regex-way, remember that it has quite some flaws. You will not be able to properly handle a table entry like `["k[e]y"] = 123` (note the square brackets inside the string literal), to name just one shortcoming.
Bart Kiers
I am well aware of that, mate. Unfortunately, that undertaking was supposed to be short, and once I started digging into it, it actually consumed quite some effort. So, given that I will only be processing a small subset of the possible Lua table syntax, I will probably just live with the possible risks for the time being.
dimitko
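The `["k[e]y"] = 123` shortcoming Bart Kiers mentions is easy to reproduce in Ruby. The first substitution is the bracket-stripping step from the regex chain; the second is a hypothetical, slightly safer variant that only strips the brackets around a double-quoted key at the start of a line. It is still regex-based, not a real parser, and it does not handle numeric keys like `[1]`.

```ruby
# A key with square brackets inside the string literal.
line = '["k[e]y"] = 123'

# The bracket-stripping step removes every [ and ], including the ones
# inside the string, so the key silently changes.
naive = line.gsub(/[\[\]]/, "")
puts naive  # prints: "key" = 123

# Hypothetical safer variant: strip only the brackets around a quoted key
# at the start of a line, leaving the string contents alone.
safer = line.gsub(/^(\s*)\[("(?:[^"\\]|\\.)*")\]/, '\1\2')
puts safer  # prints: "k[e]y" = 123
```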
+3  A: 

Skipping the first line and then some ad hoc transformation to JSON.

require 'json'

# Skip the first line (the variable assignment), then rewrite the rest as JSON
s = File.readlines("test.luatable")[1..-1].join
JSON.parse(s.gsub("=", ":").gsub(/[\[\]]/,"").gsub('" :','":').gsub(/,\n(.+)\}/,"\n\\1}"))
=> {"PlayerInfos"=>{"Nyim"=>{"guild"=>"Excubitores Noctae", "class"=>"HUNTER",  
    "level"=>70, "sex"=>2, "race"=>"Orc"}, "Zyrn"=>{"guild"=>"Excubitores Noctae", 
    "class"=>"WARLOCK", "level"=>70, "sex"=>2, "race"=>"BloodElf"}}}
Jonas Elfström
Gonna try that very soon as well! I too figured RegExps should be the way to go, but I needed the hard-coded knowledge. Thanks! Will report back when I get the real results. :)
dimitko
+1. I think this is the simplest, most correct answer.
egarcia
I have shown a slightly modified version in the `EDIT #1` section of the original post.
dimitko
Ultimately, that is the answer to my problem. Albeit a bit risky, and although it took some polishing, this is the approach that costs the least effort and time to get the job done. I have done at least 10 different tests, and the modified regex chain plus Ruby code now reliably consume the info. Thanks a bunch! :)
dimitko
A: 

Untested code. And I'm just following sbk's link here, so I don't deserve any credit at all.

require 'rubyluabridge'

def lua_data_to_ruby_hash(data)
    luastate = Lua::State.new
    luastate.eval "d = #{data}"
    return luastate.d.to_hash
end
Shadowfirebird
I get that you don't have Lua on the server, but really, it's trivial to install. You have to decide whether you want to do a major parsing job in Ruby, with a lot of code, or install Lua.
Shadowfirebird
... or a quick and dirty parse with a couple of regexps in Ruby and a JSON parse, with no installs.
egarcia
@egarcia, no, regex cannot parse nested tables.
Bart Kiers
@Bart K. But they can remove and replace characters very easily :)
egarcia
@egarcia, yes, I'm not disputing that. But arbitrary nested structures (like these Lua tables) is not something regex can parse/match.
Bart Kiers
What Bart said.
Shadowfirebird
As mentioned in my edit to the original post, I eventually realized there is one more issue than just replacing characters to make it proper JSON: Lua doesn't mind nesting object definitions without giving any name before the opening curly brace, and that is invalid in JSON, I am afraid. Also, as you well know, guys, hosting is not something you can really influence; they say "we do have Ruby and 43 gems for it, we do have PHP with these libraries", etc., and if they say "sorry, we don't have the Lua dynamic libraries and Ruby bindings for them", that's a no-no.
dimitko
Ah well, if that's the way it is... sorry...
Shadowfirebird
A: 

Can I point out that Lua does not have regex capabilities, just its own pattern matching for text replacement?

MrBones
I don't need Lua to have regex capabilities. I need Ruby code that can recognize a Lua table and export it to a structured, well-supported format like XML or YAML. From there on, my Ruby code can read the data and process it accordingly.
dimitko
+2  A: 

Try this code

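-- Recursively print a Lua table as indented XML-like markup.
-- t: the table to dump; n: current indentation level (start with 0)
local function toxml(t, n)
        local s = string.rep(" ", n)
        for k, v in pairs(t) do
                -- numeric (array) keys are not valid XML element names, so prefix them
                local tag = type(k) == "number" and ("item" .. k) or tostring(k)
                print(s .. "<" .. tag .. ">")
                if type(v) == "table" then
                        toxml(v, n + 1)
                else
                        -- tostring() avoids an error on booleans; escape & and <
                        local text = tostring(v):gsub("&", "&amp;"):gsub("<", "&lt;")
                        print(s .. " " .. text)
                end
                print(s .. "</" .. tag .. ">")
        end
end

-- assumes the saved-variables file has already been loaded (e.g. with dofile)
toxml(CT_RaidTracker_RaidLog, 0)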
lhf
Sorry for adding a new answer but formatted code is not allowed in comments.
lhf
For all I know, I might end up doing that manually at home and periodically uploading the result to the web service. Thanks, I will try it when I get the time today or tomorrow.
dimitko