I am trying to read a large list of English words from a text file into an array of strings. The number of words is 2016415, and the maximum length of a word is 69 characters.

If I define the array as "char data[2016415][70];", I get a stack overflow when I run the program.

So I am trying to use calloc() instead, but I can't work out how to type-cast the result so that it becomes equivalent to "char data[2016415][70];".

The following program produces a "passing arg 1 of `fgets' makes pointer from integer without a cast" warning when compiled. When I run it, it fails with "Exception: STATUS_ACCESS_VIOLATION".

Can you help me?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void){
char *data;  //data[2016415][70];

int i;
FILE *fsol;

fsol = fopen("C:\\Downloads\\abc\\sol2.txt","r");

data = (char*) calloc(2016415,70);

for(i=0;i<2016415;i++){
    fgets(data[i] , 70 , fsol);
}

fclose(fsol);

return 0;

}

+1  A: 

Okay, sorry about the previous suggestion. I forgot how horrible arrays can be. This one is tested with a small data set of 10 words, but it should scale to your word count. Note that fgets() keeps the line ending as part of the word it reads.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_CNT 2016415
#define MAX_WORD_LEN 70

int main(void)
{
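    char *data;  // flat buffer used like data[MAX_WORD_CNT][MAX_WORD_LEN]

    int i;
    FILE *fsol;

    fsol = fopen("C:\\Downloads\\abc\\sol2.txt","r");

    // check that the file opened
    if (fsol == NULL)
    {
        return 1;
    }

    data = (char*) calloc(MAX_WORD_CNT, MAX_WORD_LEN);

    // check for valid allocation
    if (data == NULL)
    {
        fclose(fsol);
        return 1;
    }

    for(i=0; i<MAX_WORD_CNT; i++)
    {
        // word i starts at offset i * MAX_WORD_LEN; stop early at end of file
        if (fgets(&data[i * MAX_WORD_LEN], MAX_WORD_LEN, fsol) == NULL)
        {
            break;
        }
    }

    fclose(fsol);
    free(data);

    return 0;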
}
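
Because fgets() keeps the newline, each stored word ends in "\n" unless it is stripped. Below is a minimal sketch of one common way to do that with strcspn(). The helper name strip_newline is made up for illustration and is not part of the answer above. It is also worth keeping in mind that a 69-character word plus its newline and terminator needs 71 bytes, so with 70-byte slots fgets() would leave that newline behind for the next call.

```c
#include <string.h>

/* Hypothetical helper: cut the string at the first '\r' or '\n',
   removing the line ending that fgets() leaves in the buffer. */
void strip_newline(char *line)
{
    /* strcspn() returns the index of the first '\r' or '\n',
       or the string length if neither is present. */
    line[strcspn(line, "\r\n")] = '\0';
}
```

Calling strip_newline(&data[i * MAX_WORD_LEN]) right after each successful fgets() would leave just the word in the slot.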
Amardeep
This just crashes when `fgets` tries to dereference the null pointer you're giving it.
Anon.
It shouldn't be a NULL pointer if it gets past the `if` statement.
Amardeep
Still not it. `calloc` fills its allocation with zeroes - hence, `data[i]` will always be zero. Bam, NPE.
Anon.
Okay, give this edited (and tested) version a try.
Amardeep
Thanks, this works fine.
Sunny88
@Anon, there's nothing to BAM. relax and take your time ;)
Nyan
+1  A: 

calloc just allocates a big swath of memory - not an array of pointers to other arrays.

fgets expects a pointer to the memory location it should dump its stuff at.

So instead of giving it the contents of data[i], you want to give it the address of data[i] so it can put its stuff there.

fgets(&data[i], 70, fsol);

You'll also need to adjust your loop so that the index goes up by 70 characters at a time rather than one; otherwise each word overwrites most of the previous one.

Anon.
Thanks! I used fgets( and it works fine now.
Sunny88
A: 

Here's how I would allocate the array

char **data = malloc(MAX_WORD_CNT * sizeof(char *));
for(int i = 0; i < MAX_WORD_CNT; i++)
    data[i] = malloc(MAX_WORD_LEN);

You might want to add some error checking for malloc, though.
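
A rough sketch of what that error checking could look like. The function names alloc_words and free_words are hypothetical, and the rows use calloc() rather than malloc() (a small change from the snippet above) so that every slot starts out as an empty string.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate `count` buffers of `len` bytes each.
   Returns NULL on failure, after freeing whatever was already allocated. */
char **alloc_words(size_t count, size_t len)
{
    char **data = malloc(count * sizeof *data);
    if (data == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        data[i] = calloc(len, 1);   /* zeroed, so each row is "" initially */
        if (data[i] == NULL) {
            /* roll back the rows allocated so far */
            while (i > 0)
                free(data[--i]);
            free(data);
            return NULL;
        }
    }
    return data;
}

/* Hypothetical helper: release everything alloc_words() handed out. */
void free_words(char **data, size_t count)
{
    if (data == NULL)
        return;
    for (size_t i = 0; i < count; i++)
        free(data[i]);
    free(data);
}
```

With roughly two million rows this means roughly two million separate allocations, each carrying some allocator overhead, so a single contiguous block is likely to be cheaper if all rows have the same length.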

Bwmat
A: 

data is a pointer to char (also addressable as an array of char), so data[i] is a single char. fgets expects a pointer to char but you're passing it a single char. That's why you're getting the warning: you're trying to use a char (an integer type) as a pointer.

When you run the program, it then takes that single char argument and interprets it as a pointer to char, hence the access violation because the value of that char is not a valid address.

So, in your loop you should pass fgets a pointer into data and advance it by 70 with each iteration. You can use the "pointer to an array element" form &data[i], increasing i by 70 each time, or the simple pointer form, with another pointer variable initially set to data and itself advanced by 70.
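
A minimal sketch of the simple pointer form, assuming a flat buffer of max_words * 70 bytes. The function name read_words and its parameters are invented for this example.

```c
#include <stdio.h>

#define MAX_WORD_LEN 70

/* Hypothetical helper: read up to max_words lines from fp into a flat
   buffer of max_words * MAX_WORD_LEN bytes. Returns the number read. */
size_t read_words(FILE *fp, char *data, size_t max_words)
{
    char *p = data;     /* points at the start of the current slot */
    size_t n = 0;

    while (n < max_words && fgets(p, MAX_WORD_LEN, fp) != NULL) {
        p += MAX_WORD_LEN;   /* step to the next 70-byte slot */
        n++;
    }
    return n;
}
```

Word k then starts at data + k * MAX_WORD_LEN, which is the same address as &data[k * MAX_WORD_LEN].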

Paul Richter
A: 

The answer is simple: you DON'T cast it. Casting the results of malloc/calloc/etc. has no purpose but can have the side-effect of hiding a major bug if you forgot to include stdlib.h. The return type of these allocation functions, which is void *, is implicitly converted to whatever pointer type you need.

If you really want to know the type, it's (char (*)[70]). But please don't actually obfuscate your program by writing that. (Unless you're actually writing C++, in which case you should have tagged your question C++ and not C, or better yet used new.)
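
For illustration, here is a sketch of how that type can appear in a declaration rather than in a cast, so that data[i] behaves like a row of "char data[N][70]". The function name read_table is hypothetical.

```c
#include <stdio.h>

#define MAX_WORD_LEN 70

/* Hypothetical helper: `data` points to rows of MAX_WORD_LEN chars, so
   data[n] is a char[70] that decays to char * when passed to fgets().
   Reads up to `count` lines and returns how many were read. */
size_t read_table(FILE *fp, char (*data)[MAX_WORD_LEN], size_t count)
{
    size_t n = 0;

    while (n < count && fgets(data[n], MAX_WORD_LEN, fp) != NULL)
        n++;
    return n;
}
```

The caller could allocate the table with "char (*data)[MAX_WORD_LEN] = calloc(count, sizeof *data);" (no cast, with stdlib.h included), check the result for NULL, and release it with a single free(data).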

R..