There's something really weird going on: strcmp() returns -1 though both strings are exactly the same. Here is a snippet from the output of the debugger (gdb):
(gdb) print s[i][0] == grammar->symbols_from_int[107][0]
$36 = true
(gdb) print s[i][1] == grammar->symbols_from_int[107][1]
$37 = true
(gdb) print s[i][2] == grammar->symbols_from_int[107][2]
$38 = true
(gdb) print s[i][3] == grammar->symbols_from_int[107][3]
$39 = true
(gdb) print s[i][4] == grammar->symbols_from_int[107][4]
$40 = true
(gdb) print s[i][5] == grammar->symbols_from_int[107][5]
$41 = false
(gdb) print grammar->symbols_from_int[107][4]
$42 = 0 '\0'
(gdb) print s[i]
$43 = (char * const&) @0x202dc50: 0x202d730 "Does"
(gdb) print grammar->symbols_from_int[107]
$44 = (char * const&) @0x1c9fb08: 0x1c9a062 "Does"
(gdb) print strcmp(s[i],grammar->symbols_from_int[107])
$45 = -1
Any idea what's going on?
Thanks in advance,
Onur
Edit 1: Here are some snippets of my code:
# include <unordered_map> // Used as hash table
# include <stdlib.h>
# include <string.h>
# include <stdio.h>
# include <vector>
using namespace std;
using std::unordered_map;
using std::hash;
struct eqstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) == 0;
}
};
...
<some other code>
...
class BPCFG {
public:
char *symbols; // Character array holding all grammar symbols, with NULL seperating them.
char *rules; // Character array holding all rules, with NULL seperating them.
unordered_map<char *, int , hash<char *> , eqstr> int_from_symbols; // Hash table holding the grammar symbols and their integer indices as key/value pairs.
...
<some other code>
...
vector<char *> symbols_from_int; // Hash table holding the integer indices and their corresponding grammar symbols as key/value pairs.
void load_symbols_from_file(const char *symbols_file);
}
void BPCFG::load_symbols_from_file(const char *symbols_file) {
char buffer[200];
FILE *input = fopen(symbols_file, "r");
int symbol_index = 0;
while(fscanf(input, "%s", buffer) > 0) {
if(buffer[0] == '/')
strcpy(symbols + symbol_index, buffer+1);
else
strcpy(symbols + symbol_index, buffer);
symbols_from_int.push_back(symbols + symbol_index);
int_from_symbols[symbols+symbol_index] = symbols_from_int.size()-1;
probs.push_back(vector<double>());
hyperprobs.push_back(vector<double>());
rules_from_IntPair.push_back(vector<char *>());
symbol_index += strlen(symbols+symbol_index) + 1;
}
fclose(input);
}
This last function (BPCFG::load_symbols_from_file) seems to be the only function I modify symbols_from_int in my whole code. Please tell me if you need some more code. I'm not putting everything because it's hundreds of lines.
Edit 2: OK, I think I should add one more thing from my code. This is the constructor of BPCFG class:
BPCFG(int symbols_length, int rules_length, int symbol_count, int rule_count):
int_from_symbols(1.5*symbol_count),
IntPair_from_rules(1.5*rule_count),
symbol_after_dot(10*rule_count)
{
symbols = (char *)malloc(symbols_length*sizeof(char));
rules = (char *)malloc(rules_length*sizeof(char));
}
Edit 3: Here is the code on the path to the point of error. It's not compilable, but it shows where the code stepped through (I checked with next and step commands in the debugger that the code indeed follows this route):
BPCFG my_grammar(2000, 5500, 194, 187);
my_grammar.load_symbols_from_file("random_50_1_words_symbols.txt");
<some irrelevant code>
my_grammar.load_rules_from_file("random_50_1_words_grammar.txt", true);
<some irrelevant code>
my_grammar.load_symbols_after_dots();
BPCFGParser my_parser(&my_grammar);
BPCFGParser::Sentence s;
// (Sentence is defined in the BPCFGParser class with
// typedef vector<char *> Sentence;)
Edge e;
try {
my_parser.parse(s, e);
}
catch(char *e) {fprintf(stderr, "%s", e);}
void BPCFGParser::parse(const Sentence & s, Edge & goal_edge) {
/* Initializing the chart */
chart::active_sets.clear();
chart::passive_sets.clear();
chart::active_sets.resize(s.size());
chart::passive_sets.resize(s.size());
// initialize(sentence, goal);
try {
initialize(s, goal_edge);
}
catch (char *e) {
if(strcmp(e, UNKNOWN_WORD) == 0)
throw e;
}
<Does something more, but the execution does not come to this point>
}
void BPCFGParser::initialize(const Sentence & s, Edge & goal_edge) {
// create a new chart and new agendas
/* For now, we plan to do this during constructing the BPCFGParser object */
// for each word w:[start,end] in the sentence
// discoverEdge(w:[start,end])
Edge temp_edge;
for(int i = 0;i < s.size();i++) {
temp_edge.span.start = i;
temp_edge.span.end = i+1;
temp_edge.isActive = false;
/* Checking whether the given word is ever seen in the training corpus */
unordered_map<char *, int , hash<char *> , eqstr>::const_iterator it = grammar->int_from_symbols.find(s[i]);
if(it == grammar->int_from_symbols.end())
throw UNKNOWN_WORD;
<Does something more, but execution does not come to this point>
}
}
Where I run the print commands in the debugger is the last
throw UNKNOWN_WORD;
command. I mean, I was stepping with next on GDB and after seeing this line, I ran all these print commands.
Thank you for your interest,
Onur