Hi! I have been reading SO for some time now, but I truly cannot find any help for my problem.

I have a c++ assignment to create an IAS Simulator.

Here is some sample code...

0   1   a
1   2   b
2   c
3   1
10  begin
11  . load a, subtract b and offset by -1 for jump+
11  load M(0)
12  sub M(1)
13  sub M(3)
14  halt

Using c++, I need to be able to read these lines and store them in a "memory register" class that I already have constructed...

For example, the first line would need to store "1 a" in register zero.

How can I parse out the number at the line beginning and then store the rest as a string?

I have set up storage in a class that is called using mem.set(int, string);, where int is the memory location at the beginning of the line and string is the stored instruction.

Edit: Some Clarifications:

+1  A: 

I'd suggest taking a look at the std::ifstream class, which is declared in the <fstream> header.
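A minimal sketch of what that could look like, using only the standard library. The helper name read_program_lines is made up for illustration, and the file name is whatever your assignment uses:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of the file at `path` into a vector.
// Returns an empty vector if the file cannot be opened.
std::vector<std::string> read_program_lines(const std::string& path)
{
    std::vector<std::string> lines;
    std::ifstream in(path.c_str());   // .c_str() also works on pre-C++11 compilers
    if (!in)
        return lines;                 // the file could not be opened

    std::string line;
    while (std::getline(in, line))    // one line per iteration, newline removed
        lines.push_back(line);
    return lines;
}
```

Each element of the returned vector is one raw line, still including its leading number, so it would need to be split before calling mem.set.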

Amber
A: 

Something like this may be a good use of the Boost.Spirit library. It's an EBNF parser generator in C++, like flex and yacc without the extra compilation steps.

Noah Roberts
I can only use standard libraries--which is somewhat daunting.
Yes, that is daunting. You should consider work elsewhere. Working for firms with NDIH (not done in house) syndrome is just too painful.
Noah Roberts
A: 
#include <iostream> // #include <fstream> for file objects as others suggest
#include <string>
#include <map>
using namespace std;

map<int, string> my_program;

int main() {
    int line_num;
    string line_text;
    while ( cin >> line_num ) { // or any input stream such as a file
        getline( cin, line_text ); // standard function defined in <string>
        my_program[ line_num ] = line_text; // store line for next phase
    }
}

This will read lines of the file until either the end is encountered, or a line which begins with something besides a number. Use cin.eof() to verify that the entire file was read, if you care.

Of course, since map sorts its contents, the lines will be in numerical order for the next phase.

Potatoswatter
Detail: getline should be error-checked too. And for extra terseness, take advantage of operator>> returning the stream: `while(getline(std::cin>>line_num, line_text)){...`
Éric Malenfant
@Eric: Sounds reasonable, but that would strip a final line with just a number. `getline` is guaranteed to clear the string if `(cin)` (which is guaranteed) and to return as much input as possible, even if the final state is `(!cin)`.
Potatoswatter
@Éric Malenfant: by the way, is it ok to address you as Eric? Does SO's mail system handle missing diacritics?
Potatoswatter
Recommending `using namespace std;`? How unusual. The compromise is, of course, saying `using std::string; using std::map;`, and so on. Having an explicit list of imports is surely better than the "import bloody everything" statement.
Jon Purdy
@Jon: that is a very old argument. My usual response is that `using namespace std;` results in more maintainable code, as there are fewer dependencies to specify, less incremental overhead to using standard tools, and less confusion about accidentally aliased names. Use in a header file is unfriendly to other programmers, though.
Potatoswatter
@Potatoswatter: I guess it depends on how you want to handle a line with a number only: Should it be really interpreted as a number plus an empty string or as a syntax error?(And yes, the accent is necessary, I was only notified of your second comment)
Éric Malenfant
A: 

Here's the easy part:

#include <istream>
#include <string>

std::string text_to_parse;
unsigned int register_number;

void Parse_Line(std::istream& data_file)
{
  // Read in the register number.
  if(data_file >> register_number)
  {
    // Read the remainder of the line into the string variable.
    std::getline(data_file, text_to_parse);

    //  Now do something with the text...
  }
}

One issue with your data file is that it does not follow an easy grammar or syntax. For example, you have two text lines that start with 11. Line 10 is not a "memory register" line, but an instruction line. Also, lines 2 and 3 don't follow the same grammar as lines 0 and 1.

For more assistance, please post the grammar rules (preferably in BNF syntax or ASCII art).

Thomas Matthews
My apologies! I meant to include the IAS grammar: http://www.cs.uwyo.edu/~seker/courses/2150/iascode.pdf I only need to complete the data transfer instructions (but I think I have those handled...). If there are any lines that are stated more than once, the LAST LINE is loaded. The two lines that start with 11 are a comment and an instruction... The comment will be overwritten by the instruction, since they are loaded line-by-line... All of these instructions and "memory registers" are merely loaded into my memory class.
A: 

Splitting the leading number from the rest of the line is not too difficult. Use something like getline to read one line at a time from your input file, and store the line in a character buffer char cur_line[]. For each line, try something like this:

  • Declare a pointer char* pString and an integer int line_num
  • Use the strstr function to find the first whitespace character (strpbrk is an alternative if the separator may be either a space or a tab), and assign the result to pString.
  • Move pString forward one character at a time until it points to a non-whitespace character. This is the start of the string containing the "rest of the line".
  • Use atoi on cur_line to convert the first entry in the string to an integer, and store the results in line_num
  • Now, you should be able to call your function like mem.set(line_num, pString)
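The steps above can be sketched roughly as follows. The function name split_line is hypothetical, and the sketch assumes every line starts with the number and has no leading whitespace:

```cpp
#include <cctype>
#include <cstdlib>
#include <cstring>

// Hypothetical helper following the steps above.
// On return, *line_num holds the leading number and the result points at the
// first non-blank character after it (or at the terminating '\0').
const char* split_line(const char* cur_line, int* line_num)
{
    *line_num = std::atoi(cur_line);             // atoi stops at the first non-digit

    // Find the first space or tab after the number; strpbrk is used here
    // instead of strstr so that either character matches.
    const char* pString = std::strpbrk(cur_line, " \t");
    if (pString == NULL)
        return cur_line + std::strlen(cur_line); // number only: the rest is empty

    // Advance past the run of whitespace.
    while (*pString != '\0' && std::isspace(static_cast<unsigned char>(*pString)))
        ++pString;
    return pString;
}
```

With this, the call would look something like `const char* rest = split_line(cur_line, &line_num); mem.set(line_num, rest);`.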

Interpreting those strings is going to be much more difficult, however...

Edit: As Mike DeSimone mentions, you can combine the strstr and atoi steps above if you use one of the strto* functions instead of atoi.

bta
This worked so well. Thanks! It took me forever to get `strstr` to work correctly.
A: 

If the first part of the line is always a number, look at the strtoul function. From the man page:

strtoul -- convert a string to an unsigned long integer

LIBRARY

Standard C Library (libc, -lc)

SYNOPSIS

 #include <stdlib.h>
 unsigned long strtoul(const char *restrict str, char **restrict endptr, int base);

DESCRIPTION

The strtoul() function converts the string in str to an unsigned long value. The conversion is done according to the given base, which must be between 2 and 36 inclusive, or be the special value 0.

The string may begin with an arbitrary amount of white space (as determined by isspace(3)) followed by a single optional + or - sign. If base is zero or 16, the string may then include a 0x prefix, and the number will be read in base 16; otherwise, a zero base is taken as 10 (decimal) unless the next character is 0, in which case it is taken as 8 (octal).

The remainder of the string is converted to an unsigned long value in the obvious manner, stopping at the end of the string or at the first character that does not produce a valid digit in the given base. (In bases above 10, the letter A in either upper or lower case represents 10, B represents 11, and so forth, with Z representing 35.)

If endptr is not NULL, strtoul() stores the address of the first invalid character in *endptr. If there were no digits at all, however, strtoul() stores the original value of str in *endptr. (Thus, if *str is not \0 but **endptr is \0 on return, the entire string was valid.)

RETURN VALUES

The strtoul() function returns either the result of the conversion or, if there was a leading minus sign, the negation of the result of the conversion, unless the original (non-negated) value would overflow; in the latter case, strtoul() returns ULONG_MAX and errno is set to ERANGE. If no conversion could be performed, 0 is returned and the global variable errno is set to EINVAL.


The key here is the endptr parameter. Through it, strtoul() sets a pointer to where you need to continue parsing. If *endptr == str after the call, then you know the line didn't start with a number.

I like the strto___ family of functions a lot more than the ato__ functions because you can set the base (including the context-sensing "base 0") and because the endptr tells me where to continue from. (And for embedded applications, strto___ is a lot smaller footprint than __scanf functions.)
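As a small illustration of the "base 0" behaviour described in the man page, here is a sketch with a made-up wrapper name, parse_auto_base:

```cpp
#include <cstdlib>

// Hypothetical wrapper: let strtoul pick the base from the prefix.
unsigned long parse_auto_base(const char* s)
{
    // With base 0: "0x..." is read as hex, a leading "0" as octal, anything else as decimal.
    return std::strtoul(s, NULL, 0);
}
```

So parse_auto_base("0x1F") gives 31, parse_auto_base("017") gives 15, and parse_auto_base("42") gives 42. For the IAS file you probably want a fixed base of 10, so that a line number written with a leading zero is not read as octal.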

EDIT: Sorry to miss your comment. To use endptr, write code like:

char* restOfLine = NULL;
unsigned long result = strtoul(lineBuffer, &restOfLine, 10);
if(restOfLine == NULL || restOfLine == lineBuffer)
{
     /* Handle error. */
}
else
{
    // Use result, and do further parsing starting at restOfLine.
}

Usually, the "handle error" clause returns or breaks or throws an exception or does something else to bail out of further processing, so you wouldn't need an explicit else clause.
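Putting it together, a rough sketch of a complete helper might look like the following. The name parse_line and its out-parameters are invented for illustration, and it assumes spaces or tabs separate the number from the text:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parses "<number> <text>" from a C string.
// Returns false if the line does not start with a number.
bool parse_line(const char* lineBuffer, unsigned long* number, std::string* text)
{
    char* restOfLine = NULL;
    // &restOfLine has type char**, which is what the endptr parameter expects;
    // strtoul writes the address of the first unconverted character through it.
    unsigned long result = std::strtoul(lineBuffer, &restOfLine, 10);
    if (restOfLine == lineBuffer)
        return false;                 // no digits were consumed

    while (*restOfLine == ' ' || *restOfLine == '\t')
        ++restOfLine;                 // skip the blanks before the text

    *number = result;
    *text = restOfLine;               // copies the remainder of the line
    return true;
}
```

The double pointer is only there so that strtoul can modify your char* variable: you declare an ordinary char* and pass its address.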

Mike DeSimone
This is working very well to parse the line numbers. Currently, I'm feeding endptr NULL. What's with the double pointer (**)? How do I set up a variable to catch that?