I am making my first attempt at writing a very basic lexical analyzer for ASCII text files. So far it reads the input and compares it against my token list properly, but I am unable to grab the final token unless it is followed by a space or a newline. I tried adding ^Z (ASCII 26) as another delimiter before comparing the string to my token list; that did not work. I also tried moving the f->eof() check below the comparison so that the last token would be picked up before the EOF flag is checked, with no luck. Could anyone enlighten me? The code for the read method is below. m_TokenList is just a vector of type string.

void CelestialAnalyzer::ReadInTokens(ifstream *f){
    vector<string> statement;
    vector<string> tokens;
    string token;
    char c;
    do{
        f->get(c);   // Read in each character
        if(f->eof())
            break;

        if(c == '\n' || c == ' ' || c == '^Z' || c == '\r'){ // 26 ASCII ^Z (end of file marker)
            for(unsigned int i=0; i<m_TokenList.size(); i++){
                if(!token.compare(m_TokenList[i])){
                    tokens.push_back(token);
                    token.clear();
                }
            }
        } else {
            token.push_back(c); // Add it to the token array
        }
    } while (true);

    f->close();

    for(unsigned int i=0; i<tokens.size(); i++){
        cout << "Found Token: " << tokens[i].c_str() << endl;
    }
}

The m_TokenList is initialized as

CelestialAnalyzer::CelestialAnalyzer(){
    m_TokenList.push_back("KEY");      // Prints data
    m_TokenList.push_back("GETINPUT"); // Grabs user data
    m_TokenList.push_back("+");        // Addition/Concatenation
    m_TokenList.push_back("-");        // Subtraction
    m_TokenList.push_back("==");       // Equality
    m_TokenList.push_back("=");        // Assignment
    m_TokenList.push_back(";");        // End statement
    m_TokenList.push_back(" ");        // Blank
    m_TokenList.push_back("{");        // Open Grouping
    m_TokenList.push_back("}");        // Close Grouping
    m_TokenList.push_back("(");        // Parameter opening
    m_TokenList.push_back(")");        // Parameter closing
    // ASCII 48 through 57 are the digits '0' through '9'
    for(unsigned int i=48; i<=57; i++){
        string s; s.push_back((char)i);
        m_TokenList.push_back(s); s.clear();
    }
}

A test file for reading is this simple example:

1 + 2 = KEY

It will register all but 'KEY' unless there is a space or a newline after it.

A: 

What about a double newline? As far as I know, several messaging protocols treat \r\n\r\n as the end of a message. I think that's pretty reasonable. :)

Hongseok Yoon
No luck, I just tried '\r\n' and '\r\n\r\n'. Either way, I am only comparing one character at a time. It would only pick up the first one, right?
Justin Sterling
A: 

Why don't you just delete:

if(f->eof()) break;

and use

if(f->eof() || c == '\n' || c == ' ' || c == '^Z' || c == '\r'){

then break afterwards? That way, when you hit EOF, you will add whatever remaining token you have.

Alternately, you could just check if the token is nonempty after you break out of the loop, and add it in that case.
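A minimal sketch of that second option, assuming a free function in place of the asker's member function. The names readTokens and flush are hypothetical and not from the original code, and the token list is passed in as a parameter instead of being read from m_TokenList. Two deliberate differences from the question's code: the ^Z delimiter is written as '\x1A', since '^Z' is a multi-character literal rather than the single character with code 26, and an unrecognized candidate is discarded at each delimiter instead of being left in the buffer.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits the stream on whitespace and ^Z, keeping only
// candidates that appear in tokenList.
std::vector<std::string> readTokens(std::istream& in,
                                    const std::vector<std::string>& tokenList) {
    std::vector<std::string> tokens;
    std::string token;

    // Keep the current candidate if it is a known token, then reset it.
    auto flush = [&]() {
        if (!token.empty() &&
            std::find(tokenList.begin(), tokenList.end(), token) != tokenList.end()) {
            tokens.push_back(token);
        }
        token.clear();
    };

    char c;
    while (in.get(c)) {  // the loop ends once get() fails, e.g. at end of file
        if (c == '\n' || c == ' ' || c == '\r' || c == '\x1A') {
            flush();     // a delimiter ends the current candidate
        } else {
            token.push_back(c);
        }
    }
    flush();  // the last token has no trailing delimiter, so handle it here
    return tokens;
}
```

Called on a std::istringstream holding the test input 1 + 2 = KEY, with a token list like the one in the question, this should yield KEY as the last token even though nothing follows it in the input.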

Fixed it with:

bool eof = false;
do{
    f->get(c); // Read in each character
    if(c == '\n' || c == ' ' || c == '^Z' || c == '\r' || f->eof()){ // 26 ASCII ^Z (end of file marker)
        for(unsigned int i=0; i<m_TokenList.size(); i++){
            if(!token.compare(m_TokenList[i])){
                tokens.push_back(token);
                token.clear();
            }
        }
    } else {
        token.push_back(c); // Add it to the token array
    }
    if(f->eof())
        eof = true;
} while (!eof);
Justin Sterling