I am writing a programming language text parser, out of curiosity. Say I want to define an immutable (at runtime) graph of tokens as vertices/nodes. These are naturally of different types: some tokens are keywords, some are identifiers, and so on. However, they all share a common trait: each token in the graph points to another token. This property lets the parser know what may follow a particular token, and so the graph defines the formal grammar of the language. My problem is that I stopped using C++ on a daily basis some years ago and have used a lot of higher-level languages since then, so my understanding of heap allocation, stack allocation and the like is completely fragmented. Alas, my C++ is rusty.
Still, I would like to climb the steep hill in one go, and I have set myself the goal of defining this graph in this imperative language in the most performant way. For instance, I want to avoid allocating each token object separately on the heap using 'new'. I think that if I allocate the entire graph of tokens back-to-back, so to speak (linearly, like elements in an array), performance would benefit from locality of reference. That is, isn't it a plus when the entire graph is compacted to take up minimal space along a 'line' in memory, rather than having its token objects at random locations? Anyway, as you can see, this is a rather open question.
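To make the back-to-back idea concrete, here is a rough sketch of the kind of layout I imagine. All names (`kind`, `node`, `nodes`, `edges`, `may_follow`) and the toy grammar are made up for illustration, and I am assuming that successors can be stored as indices into a second flat array rather than as pointers. I have not measured whether this is actually faster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this only illustrates a contiguous layout.
enum class kind : std::uint8_t { word, ident };

struct node {
    kind          type;        // which sort of token this vertex is
    const char*   text;        // keyword spelling, or nullptr for identifiers
    std::uint16_t first_edge;  // index into edges[] of the first successor
    std::uint16_t edge_count;  // number of successors
};

// Toy grammar (an assumption): "let" ident "=" ident ";"
constexpr node nodes[] = {
    {kind::word,  "let",   0, 1},  // node 0
    {kind::ident, nullptr, 1, 1},  // node 1
    {kind::word,  "=",     2, 1},  // node 2
    {kind::ident, nullptr, 3, 1},  // node 3
    {kind::word,  ";",     4, 0},  // node 4: nothing may follow
};

// All successor lists stored back-to-back; each node owns a slice of this array.
constexpr std::uint16_t edges[] = {1, 2, 3, 4};

constexpr std::size_t node_count = sizeof(nodes) / sizeof(nodes[0]);

// Returns true if the token at index `to` may follow the token at index `from`.
inline bool may_follow(std::size_t from, std::size_t to) {
    assert(from < node_count && to < node_count);
    const node& n = nodes[from];
    for (std::uint16_t i = 0; i < n.edge_count; ++i) {
        if (edges[n.first_edge + i] == to) return true;
    }
    return false;
}
```

With this layout both arrays are constant-initialized and sit in static storage, so nothing is allocated with 'new'. A call such as `may_follow(0, 1)` would ask whether an identifier may follow "let". The price is that the different token types collapse into one tagged struct instead of a class hierarchy.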
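// Rough outline of the token types I have in mind.
class token
{
};
class word: public token
{
public:
    const char* chars;
    word(const char* s): chars(s)
    {
    }
};
class ident: public token
{
    /// haven't thought about these details yet
};
template<int N> class composite_token: public token
{
public:
    const token* tokens[N]; // pointers: an array of token by value would slice derived types
};
class graph
{
public:
    const token* p_root_token;
};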
The immediate question is: what would be the procedure to create this graph object? It is immutable and its structure is known at compile time, which is why I can, and want to, avoid copying things by value and so on. Should it be possible to compose this graph out of literals? I hope I am making sense here... (It wouldn't be the first time I didn't.) The graph will be used by the parser at runtime as part of a compiler. And although this is C++, I would be happy with a C solution as well. Thank you very much in advance.
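To illustrate what I mean by composing the graph out of literals, here is a minimal sketch of what I picture, assuming C++11 `constexpr` is available. The `kind` tag, the `next`/`next_count` members, the `successor` helper and the toy grammar ("let" ident ";") are all my own inventions for this example and not part of any existing design. Because the toy grammar has no cycles, I can define the tokens in reverse order so that each one only refers to objects already defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag so the parser can tell token types apart without virtual functions.
enum class kind { word, ident };

struct token {
    kind type;
    const token* const* next;  // array of pointers to the tokens that may follow
    std::size_t next_count;    // length of that array
    constexpr token(kind k, const token* const* n, std::size_t c)
        : type(k), next(n), next_count(c) {}
};

struct word : token {
    const char* chars;
    constexpr word(const char* s, const token* const* n, std::size_t c)
        : token(kind::word, n, c), chars(s) {}
};

struct ident : token {
    constexpr ident(const token* const* n, std::size_t c)
        : token(kind::ident, n, c) {}
};

struct graph {
    const token* p_root_token;
};

// Toy grammar (an assumption): "let" ident ";", defined last token first.
constexpr word semi{";", nullptr, 0};
constexpr const token* after_ident[] = {&semi};
constexpr ident name{after_ident, 1};
constexpr const token* after_let[] = {&name};
constexpr word let_kw{"let", after_let, 1};

constexpr graph grammar{&let_kw};

// The links can be checked at compile time.
static_assert(grammar.p_root_token->next_count == 1, "root has one successor");

// Runtime accessor for the i-th token that may follow `t`.
inline const token* successor(const token& t, std::size_t i) {
    assert(i < t.next_count);
    return t.next[i];
}
```

As far as I understand, every object here is constant-initialized in static storage, so there is no 'new' and no copying at runtime, although the compiler and linker decide where the individual objects end up, so they are not guaranteed to be adjacent in memory. A parser would start at `grammar.p_root_token` and walk the graph with `successor`. A grammar with cycles would need forward declarations, which this sketch does not cover.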