Many of us have been indoctrinated in using XML for storing data. Its benefits and drawbacks are generally known, and I surely don't want to discuss them here. However, in the project I'm writing in C++, I'm also using Lua. I've been very surprised how well Lua can be used to store and handle data. Yet this aspect of Lua is less recognized, at least in the game programming world.

I'm aware that XML has its advantages in cases like sending data over the internet, in places where safety comes into play (using data downloaded from the net, for example, or loading user-editable configuration files), and finally in cases where the same data is being read by programs in different languages.

However, once I learned how nice and easy it is to handle data using Lua (especially having luabind to back you up!), I started to wonder: is there any reason to use XML to store game data if we already use Lua anyway?
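As a minimal sketch of what "data in Lua" can look like (the monster record and its field names are made up for illustration, and the data is loaded from a string rather than a file to keep the example self-contained):

```lua
-- Hypothetical game data, written as a plain Lua table constructor.
local source = [[
return {
  name = "goblin",
  hp = 12,
  attacks = { "bite", "claw" },
  resist = { fire = 0.5, cold = 1.0 },
}
]]

-- loadstring exists in Lua 5.1; load accepts strings from Lua 5.2 onwards.
local compile = loadstring or load
local chunk = assert(compile(source, "=goblin_data"))

-- Running the chunk yields the table; no separate parsing step is needed.
local monster = chunk()

print(monster.name, monster.hp, #monster.attacks, monster.resist.fire)
```

From C++ the same chunk would be loaded through the Lua C API or a binding such as luabind; that side is not shown here.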

Blizzard, while using Lua for scripting the UI, still stores the layout in XML. Is the reason something that is only UI related?

What are the drawbacks of using Lua as a data storage language?

A: 

I usually stay away from XML, mainly due to its verbosity. Most of the time I use CSVs or SQLite for data storage. I use scripting languages like Lua, Python or Scheme to provide an interface for the user to extend the software.

Vijay Mathew
CSVs and SQLite may not be verbose, but they are not easily human-readable without tools either...
Kornel Kisielewicz
+3  A: 

If the Lua scripts are user accessible, then the subset of Lua you are using for data specification must be enforced, otherwise what you have is not just a data language, but a security hole. The problem of enforcing such a subset is not trivial.

Are you aware of JSON? That might be more the sort of thing you're after.

There are two implementations, here: http://json.luaforge.net/

and here: http://www.chipmunkav.com/downloads/Json.lua

If JSON isn't powerful enough, another option is YAML, a superset of JSON, though I couldn't tell you if there are any decent implementations of that in Lua.

Breton
Security isn't an issue in my case, as I'll allow modding of the engine anyway. However, while JSON isn't something I was looking for, you pointed me to something that has really grabbed my interest for other purposes :)
Kornel Kisielewicz
+5  A: 

I would say the biggest disadvantage is that it's harder for other tools to manipulate that data. If you store the data directly in Lua, then you need to write a Lua parser in order to manipulate that data automatically, while every environment has XML parsers and generators. This isn't a problem just for tools in other languages; if you want to write a GUI for editing your configuration, do you have tools which can parse, modify, and write out the configuration data in a way which doesn't affect merging in your version control system? That can be important for big projects with lots of people who may be editing the configuration concurrently.

That said, this same idea is what led to JSON, which is a subset of JavaScript used as a data format. But there are a lot of tools currently which support JSON, and probably not many that support Lua's syntax.

Another problem that comes up occasionally when you have your code be your configuration is that people start writing code to generate configuration, or to add abstractions to your configuration file. And then any users who might want to customize your program could wind up confused, having to learn how to program and a whole programming language rather than a relatively simple configuration language.

Brian Campbell
I'm not worried about the downsides that I mentioned in my own post. It's the question of whether there are other ones that I haven't thought of that is bugging me.
Kornel Kisielewicz
Oops, I missed the part where you already mentioned that problem.
Brian Campbell
Added a bit more detail to go beyond what you had already mentioned. I've worked a lot recently with code as configuration in a multimedia engine (Scheme, not Lua, but the same basic idea), and it has its advantages and disadvantages, so I figured I'd mention some of the issues we've run into.
Brian Campbell
Lua parses itself, and is its own greatest tool :> -- check the example here: http://www.lua.org/pil/12.html Considering that they'd need to write Lua scripts that are already in the engine, the last argument is not valid in this case -- especially since Lua-based data files look a lot more user-friendly than XML files.
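A rough sketch of the data-description style that chapter describes, where each record in the data file is a call to a constructor function supplied by the loader. The `Entry` name, the item fields, and the `compile` helper are assumptions made for this example:

```lua
-- Hypothetical data file content: each record is a call to Entry with a table.
local source = [[
Entry{ name = "sword", damage = 7 }
Entry{ name = "bow", damage = 4, range = 30 }
]]

local items = {}

-- The environment the data chunk sees: only the Entry constructor.
local env = {
  Entry = function(record) items[#items + 1] = record end,
}

-- Compile a string with a given environment on Lua 5.1 through 5.4.
local function compile(text, name, environment)
  if setfenv then                              -- Lua 5.1 / LuaJIT
    local chunk, err = loadstring(text, name)
    if not chunk then return nil, err end
    return setfenv(chunk, environment)
  end
  return load(text, name, "t", environment)    -- Lua 5.2 and later
end

local chunk = assert(compile(source, "=items", env))
chunk()  -- running the data file fills the items table

for _, item in ipairs(items) do
  print(item.name, item.damage, item.range)
end
```

The Lua compiler does the parsing; the loader only decides what `Entry` does with each record.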
Kornel Kisielewicz
I imagine that Scheme can be scary for an unprepared user though :)
Kornel Kisielewicz
Hmm. In that example, it looks like you need to hand-roll your serialization, and it won't necessarily retain the original formatting (if, say, someone hand edited, and then the tool wrote it out with different formatting), which can be important if you need to be able to check it into a version control system. Anyhow, I'm not saying this is a bad idea; we chose the same basic idea for our system, because having your configuration be actual code can be quite powerful. You asked for disadvantages in the question, so I'm pointing some out, but there are plenty of advantages as well.
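To make the hand-rolled serialization concrete, here is a minimal sketch of a writer that sorts keys so that the same data always produces the same text, which keeps version-control diffs small. It does not preserve hand-written formatting or comments, and the `serialize` name and the restriction to string and number keys are assumptions of this example:

```lua
-- Serialize strings, numbers, booleans and nested tables as Lua source.
-- Assumes keys are strings or numbers and that the data has no cycles.
-- Note: tostring may round some floating-point values.
local function serialize(value, indent)
  indent = indent or ""
  local kind = type(value)
  if kind == "string" then
    return string.format("%q", value)
  elseif kind == "number" or kind == "boolean" then
    return tostring(value)
  elseif kind ~= "table" then
    error("cannot serialize a " .. kind)
  end

  -- Collect and sort the keys so the output does not depend on hash order.
  local keys = {}
  for key in pairs(value) do
    local key_kind = type(key)
    if key_kind ~= "string" and key_kind ~= "number" then
      error("unsupported key type: " .. key_kind)
    end
    keys[#keys + 1] = key
  end
  table.sort(keys, function(a, b)
    local ta, tb = type(a), type(b)
    if ta ~= tb then return ta < tb end  -- numbers sort before strings
    return a < b
  end)

  local inner = indent .. "  "
  local lines = {}
  for _, key in ipairs(keys) do
    lines[#lines + 1] = inner .. "[" .. serialize(key) .. "] = "
      .. serialize(value[key], inner) .. ","
  end
  if #lines == 0 then return "{}" end
  return "{\n" .. table.concat(lines, "\n") .. "\n" .. indent .. "}"
end

print(serialize({ b = 2, a = 1, list = { "x", "y" } }))
```

One line per field with a trailing comma means that adding or removing a field touches a single line in a diff.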
Brian Campbell
If you're using C, just link with and load the Lua engine. If you're using Python, Ruby, or Haskell, just write a binding. Done.
Justice
+2  A: 

If not security, then consider discipline. If the full range of Lua is available in a data file, the possibility remains of adding logic or behavior to your data file. Having these entities that are both data and behavior can complicate the management of a large project: Suppose for instance you built a game engine, and a whole game. Now you want to strip out all the specific content and reuse the game engine to make a new game. If content/data is safely partitioned from behavior, then this is fairly straightforward. If in a moment of weakness you decided one day that the best solution to a problem is to store a function as data, then things get a bit weird.

You might be able to enforce such discipline within your team, but if the possibility is there for a user, it will get exploited. Then down the road you decide to change the data format, and those user extensions are not portable because they include Lua which is not data!

Such entities are more difficult to manipulate programmatically on a massive scale.

And there are all the other issues that come from not properly separating concerns. If you allow your data and code to mix, it will all probably still work, but the more tangled up things get, the more difficult they will be to reason about.

Breton
Valuable remarks, thank you! Yet still, the data loader can enforce discipline, like the old "define" trick locking freeform declaration of global functions/variables. And, for instance, you can still enforce that a separate Lua VM loads the data files (and that VM is stripped of any scripting API), while the other loads the script files.
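One possible reading of the "define" trick, sketched under assumptions: the data chunk runs in its own environment table whose metatable rejects plain global assignment, so new names can only be introduced through an explicit `define` call. The helper names `make_locked_env` and `load_data` are made up for this example, and running the data in a separate VM is not shown:

```lua
-- Build an environment in which globals can only be created via define().
local function make_locked_env()
  local env = {}
  function env.define(name, value)
    rawset(env, name, value)  -- bypasses the __newindex guard below
  end
  setmetatable(env, {
    __newindex = function(_, name)
      error("undeclared global '" .. tostring(name) .. "'; use define()", 2)
    end,
    __index = function(_, name)
      error("unknown global '" .. tostring(name) .. "'", 2)
    end,
  })
  return env
end

-- Run a data chunk inside a locked environment (Lua 5.1 through 5.4).
local function load_data(text)
  local env = make_locked_env()
  local chunk, err
  if setfenv then
    chunk, err = loadstring(text, "=data")
    if chunk then setfenv(chunk, env) end
  else
    chunk, err = load(text, "=data", "t", env)
  end
  if not chunk then return nil, err end
  local ok, message = pcall(chunk)
  if not ok then return nil, message end
  return env
end

-- Use rawget on the result, since reading an unknown field raises an error.
local good = assert(load_data('define("gravity", 9.8)'))
print(rawget(good, "gravity"))
print(load_data('difficulty = "hard"'))  -- rejected: plain global assignment
```

This only guards against accidental globals; a data file can still pass a function to `define`, so it is a convention aid rather than a security boundary.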
Kornel Kisielewicz
+3  A: 
Norman Ramsey
Is it easily convertible "provided that...", or just simply easily convertible in any case, even given the full range of expressible entities in Lua, such as virtual tables?
Breton
Thanks, the first point is one that I didn't really think about, but I doubt that it will be an issue, as in the case of data I'm using it similarly to XML. The second point also won't be a problem -- I don't plan on storing meshes as Lua structures (yes, that would be an unwise idea).
Kornel Kisielewicz
@Breton: easily convertible provided the result can be represented at all. And since XML is, in the end, text, stuff is likely to be representable. But edited answer anyway.
Norman Ramsey
+4  A: 

This might not be the kind of answer you expected, but it might help you make your decision.

Blizzard (WoW) uses XML to define UI. It's kinda like XAML in C#, just a lot less powerful, and most addons just use XML to bootstrap the addon and then build the UI in Lua code.

Also, WoW actually stores addon "Saved Variables" in .lua files.

In my opinion it doesn't matter that much. Choose something you like and which is easy to use for those who are going to extend your engine.

The good thing about XML is that there are A LOT of tools and code already written to test, write and parse XML, which means it could save you some time. For example, XML Schemas are very useful for validating user-written files (security is just a side effect; the good thing is that if it passes your schema, the data is most likely 100% safe and ready to be plugged into your engine) and there are quite a few validators already written for you to use.

Then again, some users are scared of XML files (even though they are very readable, maybe too readable) and would prefer something "simpler". If it's just for storage (not configuration) then no one is going to edit those files anyway in most cases. XML will also take more space than a Lua variable dump (shouldn't matter, unless you have a lot of data).

I don't think you can go wrong here. Blizzard is using Lua for storage, and I quite like how it works.

Maiku Mori
Actually this is exactly the kind of answer that I wanted to get :). Thank you! Yes, I agree that validation and other XML tools are a great benefit, and I didn't think of that before.
Kornel Kisielewicz
To clarify this a bit: Blizzard uses XML for files that are sourced from external tools. When games are persisting their own data, for their own consumption, it's better to store the data in a format that's native to the game engine.
Chris Becke
+5  A: 

Thank you for your answers so far! I'll take the liberty of summing up the points for future reference.

Drawbacks of using Lua to store data, compared to XML

  1. Security both when transferring data and when storing it, especially when receiving data from an unknown source
  2. Portability, and ease of access to data by other tools
  3. Fewer useful existing tools, such as XML Schema validators
  4. Less support for parsing the data
  5. Problems with circular references
  6. Less restrictive -- harder to enforce proper conventions, usage and design
  7. Slower parsing and probably slower lookup
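To illustrate point 5: a table constructor cannot express a table that refers back to itself, so a naive recursive writer would never terminate on such data. A minimal sketch of a check a saver could run first (the `find_cycle` name is an assumption, and table keys are not inspected):

```lua
-- Return a dotted path to the first circular reference found, or nil.
-- Tables that are merely shared (referenced twice, no loop) are not reported.
local function find_cycle(value, path, active)
  if type(value) ~= "table" then return nil end
  path = path or "root"
  active = active or {}
  if active[value] then return path end  -- already on the current path
  active[value] = true
  for key, child in pairs(value) do
    local found = find_cycle(child, path .. "." .. tostring(key), active)
    if found then
      active[value] = nil
      return found
    end
  end
  active[value] = nil
  return nil
end

local parent = { name = "parent" }
local child = { name = "child", owner = parent }
parent.child = child  -- parent -> child -> parent

print(find_cycle(parent))
print(find_cycle({ a = { b = 1 } }))
```

Data with legitimate back-references would need a different scheme, such as storing identifiers instead of direct references and resolving them after loading.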

Benefits of using Lua to store data, compared to XML

  1. Using a single language for both scripting and data (no need for separate files, no need for special cases when loading)
  2. Less code and dependencies because of the usage of a single language
  3. Less verbose and (arguably) more human readable
  4. A lot more flexible, and extensible (it's a programming language after all)
  5. Data is "executed" and can be accessed when needed as it sits in the virtual machine
  6. Takes less space, especially if compiled to bytecode

If I missed something in the first list, please point it out!

Kornel Kisielewicz
+2  A: 

I have done a few projects that used Lua as the data storage/config language.

The key element in the decision to use it was "Were we already using Lua on that project?"

Another thing is that it's portable to anywhere you can compile the Lua interpreter. It's the same on all the platforms - there are no special XML libraries.

JSON is a good alternative.

The only way I'll use XML these days is with annotated code-gen for the serializers (.NET XML or JAXB, for example).

sylvanaar
+1  A: 

Remember that the Lua VM and language are very flexible. You can use function environments to implement whatever form of safety you want, and then use policy to avoid running "malicious" code. Simply eliminate EVERYTHING from the environment you loadstring() your data with and the only dangerous thing your "data" can do is run a loop and burn CPU.
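A minimal sketch of that idea, under assumptions: the data chunk gets an empty environment, precompiled bytecode is refused, and a count hook from the standard `debug` library aborts chunks that run too long. The `run_sandboxed` name and the limit value are made up for this example:

```lua
-- Run a data chunk with an empty environment and an instruction budget.
local function run_sandboxed(text, limit)
  -- Refuse precompiled chunks, which start with the escape character.
  if text:byte(1) == 27 then
    return nil, "binary chunks are not allowed"
  end

  local env = {}  -- nothing is visible: no os, io, require, ...
  local chunk, err
  if setfenv then                         -- Lua 5.1 / LuaJIT
    chunk, err = loadstring(text, "=data")
    if chunk then setfenv(chunk, env) end
  else                                    -- Lua 5.2 and later
    chunk, err = load(text, "=data", "t", env)
  end
  if not chunk then return nil, err end

  -- Abort once `limit` VM instructions have been executed.
  debug.sethook(function()
    error("instruction limit exceeded", 0)
  end, "", limit)
  local ok, result = pcall(chunk)
  debug.sethook()  -- remove the hook again
  if not ok then return nil, result end
  return result
end

local data = assert(run_sandboxed("return { hp = 10 }", 100000))
print(data.hp)
print(run_sandboxed("return os.time()", 100000))  -- fails: os is not visible
```

Two caveats: string methods remain reachable through the string metatable even with an empty environment (for example `("x"):rep(n)` can still allocate a lot of memory), and count hooks may not fire inside JIT-compiled code on implementations such as LuaJIT.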

The other thing to remember is that you can always convert a Lua table into XML and vice-versa, though the mapping of elements and attributes onto Lua tables will result in some weird patterns in tables.
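As a rough sketch of one such mapping (the choice is arbitrary and is an assumption of this example): scalar fields become attributes, table fields become child elements named after the key, and array entries become `<item>` children. Keys are assumed to be valid XML names, and `to_xml` is a made-up helper:

```lua
-- Escape the characters that are special in XML text and attribute values.
local function escape(text)
  return (tostring(text):gsub('[&<>"]', {
    ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;",
  }))
end

-- Convert a Lua value into an XML element called `name`.
local function to_xml(name, value, indent)
  indent = indent or ""
  if type(value) ~= "table" then
    return indent .. "<" .. name .. ">" .. escape(value) .. "</" .. name .. ">"
  end

  -- Sort string keys so the output order is stable.
  local keys = {}
  for key in pairs(value) do
    if type(key) == "string" then keys[#keys + 1] = key end
  end
  table.sort(keys)

  local attrs, children = {}, {}
  for _, key in ipairs(keys) do
    local field = value[key]
    if type(field) == "table" then
      children[#children + 1] = to_xml(key, field, indent .. "  ")
    else
      attrs[#attrs + 1] = " " .. key .. '="' .. escape(field) .. '"'
    end
  end
  for _, entry in ipairs(value) do  -- the array part of the table
    children[#children + 1] = to_xml("item", entry, indent .. "  ")
  end

  local open = indent .. "<" .. name .. table.concat(attrs)
  if #children == 0 then return open .. "/>" end
  return open .. ">\n" .. table.concat(children, "\n")
    .. "\n" .. indent .. "</" .. name .. ">"
end

print(to_xml("monster", { name = "goblin", hp = 12, attacks = { "bite", "claw" } }))
```

The reverse direction is where the weird patterns show up, because a table has no natural place to record whether a value came from an attribute or from a child element.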

Armentage
+1: thank you for the added valuable comment!
Kornel Kisielewicz