ansaurus

Question

Answer 1

+1 A:

If the file has words of length 20 or more, file >> word_list[i] will write past the end of the allocated buffer, which can result in the error you saw. This is called a buffer overflow.

This is also an issue when writing to temp_list, but in that case the buffer overflow is less damaging since it will probably just overwrite the memory used for the next word.

One way to fix this is to use an array of std::string instead of char * - allocation will be handled automatically that way.

interjay 2010-01-17 17:31:42

Answer 2

+4 A:

I'm guessing that one of your words is longer then the value specified in x.

When this happens, you are going to overflow your malloc buffer.

If you allocate N bytes, you need to make sure you write no more then N bytes.

Using operator>> and char buffers is a recipe for disaster. operator>> will keep reading/writing until it reaches the word separator. Since operator>> doesn't know how big the char * buffer is, it will overflow the buffer when the word is longer then the buffer. If you want to use operator>> to extract words, use std::string.

What is happening

A very common way to implement malloc is to have bookkeeping data in between the buffers returned from malloc. When you overwrite this data, the assumptions that malloc has made about the structure of the data no longer exist.

So, malloc has something like this:

+------------------+-------------+------------------+-------------+-----------
| malloc internals | user buffer | malloc internals | user buffer | etc...
+------------------+-------------+------------------+-------------+-----------

So, if you allocated 8 bytes to user buffer, but then write 12 bytes, you've just trashed the first 4 bytes of the next malloc internals record.

R Samuel Klatchko 2010-01-17 17:32:13

Ahhh. Thanks. I've begun redesigning from scratch based on Roger Pate's advice, but this makes much more sense.

Dane Larsen 2010-01-18 10:08:29

Answer 3

+5 A:

There are three major issues with that code, two of them here:

while (!file.eof())
{   
    file >> temp_list[index];
    index++;
}

You cannot test file.eof() to see if the next operation would fail, only if the previous hit eof, and that's generally only useful if it failed, so change it to:

while (file >> temp_list[index]) {
    index++;
}

As extractions (>>) return the stream and the stream is testable directly, this code is now testing the stream on every iteration and only incrementing index if the extraction was successful.

Now when extracting into a char array, input streams stop at whitespace, but they don't know the maximum length they can store unless you tell them. This same error later in the code is probably why you see what you do, because I suspect you're reading far more data than you expect, and thus trampling all over your memory. Fixed:

while (file >> std::setw(15) >> temp_list[index]) {
    index++;
}

However, the last major problem is you allocate resources and leak them, so use vector and string instead:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

void init() {
  typedef std::vector<std::string> C; // for later convenience
  C words;
  {
    ifstream file ("en_US.dic");
    if (!file) {
      std::cerr << "could not open file\n";
      // handle error: throw an exception, call abort(), etc.
    }
    for (std::string word; file >> word;) {
      words.push_back(word);
    }
    // if you want to read lines instead:
    //for (std::string line; std::getline(file, line);) {
    //  words.push_back(line);
    //}
  }
  // now use words[0] through words[words.size() - 1]
  std::cout << "Read " << words.size() << " words:\n";
  for (int i = 0; i < words.size(); ++i) {
    std::cout << "  " << words[i] << '\n';
  }
  std::cout << "Output again:\n";
  for (C::const_iterator i = words.begin(); i != words.end(); ++i)
  {
    std::cout << "  " << *i << '\n';
  }
}

Roger Pate 2010-01-17 18:15:28

+1 For noting the EOF issue, and suggesting `std::string` and `std::vector`.

Thomas Matthews 2010-01-17 19:49:27

Wow! Thanks a ton! I knew very little about vectors before this. My experience with the stl is limited, so this should throw me in a bit further. Thanks again!

Dane Larsen 2010-01-18 06:51:54

Answer 4

A:

You may want to change your design here. Dictionaries are huge.
Do you need to haul all the words (data) into memory?

Since dictionaries are huge, they are designed so they don't need to be entirely in memory at the same time. Professional dictionaries have index tables which are smaller than the whole data file. The principle idea is that index tables are small and can be hauled into memory and kept in memory rather than hauling in all the data at once.

I optimized a program by keeping the initial index table into memory. The result of the first index table is a file offset to another table (or the name of another file). This secondary table would be hauled into memory, if necessary, and so on until the exact item was found.

See the following topics (search the web):

B+ Tree
Index tables
Block I/O
File Offsets

Thomas Matthews 2010-01-17 19:58:27

Basically, I'm making a program for scrabble, so I need to search the entire dictionary pretty much every time. Thanks for the references though.

Dane Larsen 2010-01-18 10:17:52

Answer 5

A:

This will really mess up:

for(int i = 0; i < listlen; i++)
{
    word_list[i] = new char[21];
    file >> word_list[i];
}

If any of the words are bigger than 20 characters (+1 for the '\0'). Then basically you would be scribbling on memory that wsa used by the memory manager. This will cause all sorts of problems with subsequent allocations and de-allocations.

It worked in the previous loop because the buffer was contiguous:

char temp_list[60000][15];

Though a word from one line may have overlapped onto the next line, this would not have been a problem unless you were actually reading a large word into temp_list[59999] (which would overlap onto another variable).

Martin York 2010-01-17 20:13:51

I think you're right. I'm completely redesigning the program though, so I shouldn't have any problems.Thanks!

Dane Larsen 2010-01-18 10:19:24

ansaurus

tags:

views:

answers:

Very strange malloc error.

related questions