The best thing about lua is that it has a lightweight VM, and after the chunks get precompiled running them in the VM is actually quite fast, but still not as fast as a C++ code would be, and I don't think calling lua every rendered frame would be a good idea.
I'd put the game state in C++, and add functions in lua that can reach, and modify the state. An event based approach is almost better, where event registering should be done in lua (preferably only at the start of the game or at specific game events, but no more than a few times per minute), but the actual events should be fired by C++ code. User inputs are events too, and they don't usually happen every frame (except for maybe MouseMove but which should be used carefully because of this). The way you handle user input events (whether you handle everything (like which key was pressed, etc) in lua, or whether there are for example separate events for each keys on the keyboard (in an extreme case) depends on the game you're trying to make (a turn based game might have only one event handler for all events, an RTS should have more events, and an FPS should be dealt with care (mainly because moving the mouse will happen every frame)). Generally the more separate kinds of events you have, the less you have to code in lua (which will increase performance), but the more difficult it gets if a "real event" you need to handle is actually triggered by more separate "programming level events" (which might actually decrease performance, because the lua code needs to be more complex).
Alternatively if performance is really important you can actually improve the lua VM by adding new opcodes to it (I've seen some of the companies to do this, but mainly to make decompilation of the compiled lua chunks more harder), which is actually not a hard thing to do. If you have something that the lua code needs to do a lot of times (like event registering, event running, or changing the state of the game) you might want to implement them in the lua VM, so instead of multiple getglobal
and setglobal
opcodes they would only take one or two (for example you could make a SETSTATE opcode with a 0-255 and a 0-65535 parameter, where the first parameter descibes which state to modify, and the second desribes the new value of the state. Of course this only works if you have a maximum of 255 events, with a maximum of 2^16 values, but it might be enough in some cases. And the fact that this only takes one opcode means that the code will run faster). This would also make decompilation more harder if you intend to obscure your lua code (although not much to someone who knows the inner workings of lua). Running a few opcodes per frame (around 30-40 tops) won't hit your performance that badly. But 30-40 opcodes in the lua VM won't get you far if you need to do really complex things (a simple if-then-else can take up to 10-20 or more opcodes depending on the expression).