
I've pretty much finished coding a SIC assembler for my systems programming class but I'm stumped on the tokenizing part.

The source format (free format) is: {LABEL} OPCODE {OPERAND{,X}} {COMMENT}

The curly braces indicate that a field is optional, and each field must be separated by at least one space or tab.

For example, take this line of source code:

ENDFIL   LDA  EOF   COMMENT GOES HERE

The line above is fairly easy to split up, but the following one is giving me difficulties:

  RSUB    COMMENT GOES HERE

My code will read in the first word of the comment as if it were an OPERAND.
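A minimal sketch that reproduces this, assuming the fields are plain std::string values as in the question's code (the Fields struct and the naiveSplit function are hypothetical names). operator>> on a string stream skips any run of spaces or tabs and stops at the next one, so nothing distinguishes the first word of a comment from an operand.

```cpp
#include <sstream>
#include <string>

// Hypothetical holder for the two fields read when a line has no label.
struct Fields {
    std::string opcode;
    std::string operand;
};

// Mirrors the question's "no label" branch: read OPCODE, then OPERAND.
// operator>> skips any run of spaces or tabs and stops at the next one, so
// the first word of a comment is read exactly like an operand would be.
Fields naiveSplit(const std::string& line) {
    Fields f;  // freshly constructed, so both fields start out empty
    std::istringstream stream(line);
    stream >> f.opcode;
    stream >> f.operand;  // if no token remains, f.operand is left unchanged
    return f;
}
```

Calling naiveSplit("  RSUB    COMMENT GOES HERE") gives an opcode of RSUB and an operand of COMMENT, which is the misreading described above. Note also that operator>> leaves a std::string unchanged when no token is left, so if OPERAND is reused across lines without being cleared, a line with no operand keeps the previous line's value.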

Here is my code:

    // tokenize line
    if (currentLine[0] != ' ' && currentLine[0] != '\t')
    {
        // line starts with a non-blank character: the first token is the LABEL
        stringstream stream(currentLine);
        stream >> LABEL;
        stream >> OPCODE;
        stream >> OPERAND;

        if (LABEL.length() > 6 || isdigit(LABEL[0]) || !alphaNum(LABEL))
        {
            errors[1] = 1;
        }
        else if (LABEL.length() == currentLine.length())
        {
            // the line holds nothing but a label
            justLabel = true;
            errors[6] = 1;
            return;
        }
    }
    else
    {
        // line starts with a space or tab: there is no LABEL
        stringstream stream(currentLine);
        stream >> OPCODE;
        stream >> OPERAND;
    }

My professor requires that the assembler be tested with two versions of the source code: one with errors and one without.

RSUB doesn't take an OPERAND, so I understand that everything after the RSUB OPCODE can be treated as a comment. But if the erroneous source code has a value in the OPERAND field of RSUB, or if an OPCODE that requires an OPERAND is missing it, how do I handle that? I need to flag these cases as errors and print out the erroneous OPERAND value (or its absence).
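A minimal sketch of the missing-operand check (C++17, for std::optional), assuming the assembler keeps a table recording whether each mnemonic requires an operand. The table kTakesOperand and the function checkOperand are hypothetical, and only a few SIC mnemonics are listed for illustration. The returned message names the offending opcode so it can be printed.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical opcode table: true if the mnemonic requires an operand.
// Only a few SIC mnemonics are listed here for illustration.
const std::map<std::string, bool> kTakesOperand = {
    {"LDA", true}, {"STA", true}, {"JSUB", true}, {"RSUB", false},
};

// Returns an error message, or std::nullopt if the fields look valid.
// `operand` is the token read after the opcode (empty if there was none).
std::optional<std::string> checkOperand(const std::string& opcode,
                                        const std::string& operand) {
    auto it = kTakesOperand.find(opcode);
    if (it == kTakesOperand.end())
        return "unknown opcode '" + opcode + "'";
    if (it->second && operand.empty())
        return "missing operand for " + opcode;
    return std::nullopt;  // no error detected
}
```

This only covers an opcode that needs an operand but has none. For RSUB, the token after the opcode cannot be flagged by a table lookup alone, because under the stated format it could just as well be the first word of a comment; that needs an additional rule.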

My question is: How do I prevent the comment portion of the code from being considered an OPERAND?

A: 

In the assembly languages (as in other programming languages) that I've seen, there's a delimiter that marks a comment, for example a semicolon before the comment:

ENDFIL LDA EOF ;COMMENT GOES HERE
RSUB ;ANOTHER COMMENT GOES HERE

In your syntax, however, can you tell whether something is a comment by the amount of whitespace that precedes it on the line, e.g. by the fact that there are two (not just one) whitespace events between the opcode and the comment?

{LABEL}<whitespace>OPCODE<whitespace>{OPERAND{,X}}<whitespace>{COMMENT}
ChrisW
That's the thing: there is no delimiter I could use to tell the comment field apart from the operand field. As for whitespace, my professor's spec sheet says: "Source code is in free format and the only rule is that each field should be separated by at least a space or tab". I'm not sure how to deal with something like this.
Mikey D
Could it be that you're supposed to know that RSUB doesn't have any operand, and that therefore anything after RSUB must be a comment?
ChrisW
You're probably right. Maybe I'm just over-complicating things. Thanks a lot!
Mikey D
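Following that suggestion, a tokenizer can let the opcode decide whether an operand field exists at all: for an opcode that never takes an operand, everything after it is the comment. A minimal sketch, assuming a hypothetical kNoOperand set containing only RSUB (extend it with any other operand-less mnemonics) and a hypothetical Line struct; as in the question's code, a label is assumed present only when the line does not start with a space or tab.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical parsed-line record.
struct Line {
    std::string label, opcode, operand, comment;
};

// Hypothetical set of mnemonics that never take an operand.
const std::set<std::string> kNoOperand = {"RSUB"};

// Skip spaces/tabs starting at pos; return the index of the next non-blank.
static std::size_t skipBlanks(const std::string& s, std::size_t pos) {
    while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t')) ++pos;
    return pos;
}

// Read one blank-delimited token starting at pos; advance pos past it.
static std::string readToken(const std::string& s, std::size_t& pos) {
    pos = skipBlanks(s, pos);
    std::size_t start = pos;
    while (pos < s.size() && s[pos] != ' ' && s[pos] != '\t') ++pos;
    return s.substr(start, pos - start);
}

Line parseLine(const std::string& text) {
    Line ln;
    std::size_t pos = 0;
    // A label is present only if the line does not start with a space or tab.
    if (!text.empty() && text[0] != ' ' && text[0] != '\t')
        ln.label = readToken(text, pos);
    ln.opcode = readToken(text, pos);
    // Only read an operand if this opcode is not known to be operand-less.
    if (kNoOperand.count(ln.opcode) == 0)
        ln.operand = readToken(text, pos);
    // Whatever is left, minus leading blanks, is the comment.
    pos = skipBlanks(text, pos);
    ln.comment = text.substr(pos);
    return ln;
}
```

With this, "  RSUB    COMMENT GOES HERE" yields opcode RSUB, no operand, and the comment "COMMENT GOES HERE", while "ENDFIL   LDA  EOF   COMMENT GOES HERE" still yields label ENDFIL, opcode LDA, and operand EOF. Scanning by character position rather than with operator>> is a design choice here: it keeps the rest of the line intact as the comment, inner spacing included.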
A: 

How can you tell whether text on a given line is an operand or a comment? Is it based on context? For example, if the OPCODE is "RSUB", do you know that no OPERAND is required? If so, you can adjust the OPERAND based on which OPCODE was read:

if (OPCODE == "RSUB") OPERAND.clear();
1800 INFORMATION
Right, I wish it were that simple. I need to run erroneous source code as part of my project. If there is an operand after the RSUB opcode, then I need to flag an error and print out the operand. If I clear the operand field, I'd have nothing to print out.
Mikey D
In that case, you might have to count the whitespace between the opcode and the operand. If there is more than one whitespace character, assume that what follows is a comment.
1800 INFORMATION
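The whitespace-counting rule can be sketched as below, assuming the source files follow a convention in which the operand is separated from the opcode by at most maxGap blanks (one by default) and a longer gap introduces a comment. That convention is an assumption: the stated format only requires at least one space or tab. splitAfterOpcode is a hypothetical helper that receives the text following the opcode and returns the operand and the comment.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Given the text that follows the opcode, decide whether the next token is
// an operand or the start of a comment, based on the length of the blank gap.
// Returns {operand, comment}. maxGap is the largest gap still treated as
// opcode-to-operand spacing (1 under the assumed convention).
std::pair<std::string, std::string> splitAfterOpcode(const std::string& rest,
                                                     std::size_t maxGap = 1) {
    std::size_t pos = 0;
    while (pos < rest.size() && (rest[pos] == ' ' || rest[pos] == '\t')) ++pos;
    std::size_t gap = pos;  // number of blanks before the next token
    if (pos == rest.size()) return {"", ""};          // nothing after the opcode
    if (gap > maxGap) return {"", rest.substr(pos)};  // long gap: a comment
    std::size_t end = pos;
    while (end < rest.size() && rest[end] != ' ' && rest[end] != '\t') ++end;
    std::string operand = rest.substr(pos, end - pos);
    // Skip the blanks between the operand and the comment.
    while (end < rest.size() && (rest[end] == ' ' || rest[end] == '\t')) ++end;
    return {operand, rest.substr(end)};
}
```

Note that the question's own example, ENDFIL   LDA  EOF   COMMENT GOES HERE, has two spaces between LDA and EOF, so with the default maxGap of 1 this rule would misread EOF as the start of a comment. The heuristic only works if the test files use consistent spacing.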