I've pretty much finished coding a SIC assembler for my systems programming class but I'm stumped on the tokenizing part.
The format (free format) is: {LABEL} OPCODE {OPERAND{,X}} {COMMENT}
The curly braces indicate that the field is optional.
Also, each field must be separated by at least one space or tab.
For example, take this line of source code:
ENDFIL LDA EOF COMMENT GOES HERE
The line above is fairly easy to tokenize, but the following line is giving me difficulties.
RSUB COMMENT GOES HERE
My code will read in the first word of the comment as if it were an OPERAND.
Here is my code:
//tokenize line
if(currentLine[0] != ' ' && currentLine[0] != '\t')
{
    stringstream stream(currentLine);
    stream >> LABEL;
    stream >> OPCODE;
    stream >> OPERAND;
    stream.str("");
    if(LABEL.length() > 6 || isdigit(LABEL[0]) || !alphaNum(LABEL))
    {
        errors[1] = 1;
    }
    else if(LABEL.length() == currentLine.length())
    {
        justLabel = true;
        errors[6] = 1;
        return;
    }
}
else
{
    stringstream stream(currentLine);
    stream >> OPCODE;
    stream >> OPERAND;
    stream.str("");
}
My professor requires that the assembler be tested with two versions of the source code--one with errors and one without.
The RSUB OPCODE doesn't take an OPERAND, so I understand that everything after RSUB can be considered a comment. But if the erroneous source code contains a value in the OPERAND field, or if an OPCODE that requires an OPERAND is missing it, how do I handle this? I need to flag these as errors and print out the erroneous OPERAND value (or lack thereof).
My question is: How do I prevent the comment portion of the code from being considered an OPERAND?
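One direction that might work, sketched here under assumptions rather than as a finished solution: read the OPCODE first, look it up, and only extract an OPERAND if that OPCODE takes one. Everything left on the line is then kept as the comment. The names `Fields`, `tokenize`, and `takesNoOperand` are hypothetical, and the set assumes RSUB is the only OPCODE without an OPERAND. Because the format has no comment delimiter, this sketch cannot tell a stray OPERAND after RSUB from a comment, and for an OPCODE that needs an OPERAND it will still take the first comment word if the OPERAND is missing; that case would have to be caught later, for example when the symbol lookup fails.

```cpp
#include <set>
#include <sstream>
#include <string>

// Fields of one source line: {LABEL} OPCODE {OPERAND{,X}} {COMMENT}
struct Fields {
    std::string label;
    std::string opcode;
    std::string operand;
    std::string comment;
    bool missingOperand = false;  // OPCODE needs an OPERAND but the line ended
};

// Assumption: RSUB is the only OPCODE that takes no OPERAND.
// Extend this set if your instruction table says otherwise.
static bool takesNoOperand(const std::string& opcode) {
    static const std::set<std::string> noOperand = {"RSUB"};
    return noOperand.count(opcode) != 0;
}

Fields tokenize(const std::string& line) {
    Fields f;
    std::istringstream stream(line);

    // A line that starts in column 1 with a non-blank character has a LABEL.
    if (!line.empty() && line[0] != ' ' && line[0] != '\t') {
        stream >> f.label;
    }
    stream >> f.opcode;

    // Only read an OPERAND when the OPCODE actually takes one.
    if (!takesNoOperand(f.opcode)) {
        if (!(stream >> f.operand)) {
            f.missingOperand = true;  // nothing at all after the OPCODE
        }
    }

    // Skip leading whitespace, then keep the rest of the line as the comment.
    std::getline(stream >> std::ws, f.comment);
    return f;
}
```

With this sketch, `tokenize("\tRSUB\tCOMMENT GOES HERE")` leaves `operand` empty and puts the whole trailing text in `comment`, while `tokenize("\tLDA")` sets `missingOperand`.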