ansaurus

Question

Answer 1

A:

You can open the file, seek to the end of the file, store the offset, and then seek back to the start. The stored offset is the size of the file in bytes.

Aif 2009-08-16 19:16:33

Answer 2

+1 A:

You can use fseek for text files as well.

fseek to the end of the file
ftell the offset
fseek back to the beginning

and you have the size of the file in bytes
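
The three steps above can be sketched roughly as follows. The function name stream_size is made up for illustration, and the sketch assumes the stream is seekable and was opened in binary mode ("rb"); for a text-mode stream the C standard does not promise that the value returned by ftell is a byte count.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size in bytes of an open, seekable
   stream, or -1 on failure. The stream is left positioned at the start. */
long stream_size(FILE *fp)
{
    long size;

    if (fseek(fp, 0, SEEK_END) != 0)   /* step 1: go to the end */
        return -1;

    size = ftell(fp);                  /* step 2: read the offset (-1L on error) */

    if (fseek(fp, 0, SEEK_SET) != 0)   /* step 3: go back to the beginning */
        return -1;

    return size;
}
```

A caller would typically allocate size + 1 bytes afterwards so there is room for a terminating '\0'.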

Darth 2009-08-16 19:17:35

Answer 3

+3 A:

You cannot determine the size of a file in characters without reading the data, unless you're using a fixed-width encoding.

For example, a file in UTF-8 which is 8 bytes long could be anything from 2 to 8 characters in length.

That's not a limitation of the file APIs, it's a natural limitation of there not being a direct mapping from "size of binary data" to "number of characters."

If you have a fixed-width encoding then you can just divide the size of the file in bytes by the number of bytes per character. ASCII is the most obvious example of this, but if your file is encoded in UTF-16 and you happen to be on a system which treats UTF-16 code units as the "native" internal character type (which includes Java, .NET and Windows) then you can predict the number of "characters" to allocate as if UTF-16 were fixed width. (UTF-16 is variable width because Unicode characters above U+FFFF are encoded as a surrogate pair of two code units, but a lot of the time developers ignore this.)
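
To illustrate why the data has to be read, here is a minimal sketch of counting characters (code points) in a UTF-8 buffer. The function name utf8_count_code_points is hypothetical, and the sketch assumes the input is already valid UTF-8; it performs no validation.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a UTF-8 buffer by counting
   every byte that is NOT a continuation byte. Continuation bytes have the
   bit pattern 10xxxxxx. Assumes valid UTF-8 input. */
size_t utf8_count_code_points(const unsigned char *data, size_t n_bytes)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n_bytes; i++) {
        if ((data[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

With this, 8 bytes of plain ASCII count as 8 characters, while 8 bytes holding two 4-byte sequences count as 2, which matches the range given above.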

Jon Skeet 2009-08-16 19:19:14

I hadn't realized that... so I should read the whole file, incrementing a counter? Wouldn't that be pretty slow?

Javier Badia 2009-08-16 19:26:58

Or use fstat(2). See http://www.gnu.org/s/libc/manual/html_node/Reading-Attributes.html

scvalex 2009-08-16 19:44:04

@reyjavikvi: Do you want fast, or do you want accurate? There's just logically no way of doing it *without* reading the file's data if you're using a variable width encoding - unless something else has done it first (such as the operating system) and cached the data.

Jon Skeet 2009-08-16 19:48:01

(I've been assuming that you *are* interested in the number of characters instead of the number of bytes, by the way... and that you've got a variable width encoding. If you really just want to know the file size in bytes, that's a different and far simpler matter.)

Jon Skeet 2009-08-16 19:49:20

@jbcreix: My point is that many platforms - including Java and .NET - use UTF-16 code units as "characters". For example, if you want to read in a file which contains 120 UTF-16 code units, you allocate a character array of size 120, and if the file is encoded in UTF-16 you can predict that size based on the file size in bytes. You can argue all you want about whether or not that's a good idea (I wasn't giving it as "advice", btw) but it's the way that major systems are implemented. I'll edit the answer to make this clearer though...

Jon Skeet 2009-08-17 05:25:13

Answer 4

A:

Kind of hard with no sample code, but fstat (or stat) will tell you how big the file is. You allocate the memory required, and slurp the file in.
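
Something along these lines, as a rough sketch for POSIX systems. The name slurp_file is made up for illustration, and it uses stat on the path rather than fstat on a descriptor, so it assumes the file does not change size between the stat call and the read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: reads a whole file into a newly allocated,
   NUL-terminated buffer. Returns NULL on failure; the caller must
   free() the result. */
char *slurp_file(const char *path)
{
    struct stat st;
    FILE *fp;
    char *buf;
    size_t n;

    /* Ask the file system how big the file is, in bytes. */
    if (stat(path, &st) != 0 || st.st_size < 0)
        return NULL;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* One extra byte for the string terminator. */
    buf = malloc((size_t)st.st_size + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* fread may return fewer bytes than requested; terminate at what we got. */
    n = fread(buf, 1, (size_t)st.st_size, fp);
    buf[n] = '\0';

    fclose(fp);
    return buf;
}
```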

xcramps 2009-08-16 19:20:02

Answer 5

+1 A:

If you're developing for Linux (or other Unix-like operating systems), you can retrieve the file-size with stat before opening the file:

#include <stdio.h>
#include <sys/stat.h>

int main(void) {
   struct stat file_stat;

   if (stat("main.c", &file_stat) != 0) {
      perror("could not stat");
      return 1;
   }
   /* st_size is an off_t, which may be wider than int */
   printf("%lld\n", (long long) file_stat.st_size);

   return 0;
}

HTH, flokra

EDIT: Now that I've seen the code, I have to agree with the other posters:

The argv array that holds the program's command-line arguments is laid out like this:

[0] name of the program itself
[1] first argument given
[2] second argument given
[n] n-th argument given

You should also check argc before trying to use any element of the argv array other than argv[0]:

if (argc < 2) {
   printf("Usage: %s arg1\n", argv[0]);
   return 1;
}

HTH, flokra

flokra 2009-08-16 19:22:07

Answer 6

+1 A:

Hi, I'm pretty sure argv[0] won't be a text file.

phoku 2009-08-16 19:47:08

Answer 7

+1 A:

argv[0] is the path to the executable, and thus argv[1] will be the first user-supplied argument. Try changing that and adding some simple error checking, such as checking whether fp == NULL, and we might be able to help you further.

Håkon 2009-08-16 19:54:54

Answer 8

+8 A:

The root of the problem is here:

FILE* fp = fopen(argv[0], "r");

argv[0] is your executable program, NOT the parameter. It certainly won't be a text file. Try argv[1], and see what happens then.

Roddy 2009-08-16 19:55:00

Wow, thanks. I feel stupid now.

Javier Badia 2009-08-16 20:23:22

@reyjaviki - good :-) It'll be my turn next...

Roddy 2009-08-16 20:28:03

Answer 9

A:

Give this a try (haven't compiled this, but I've done this a bazillion times, so I'm pretty sure it's at least close):

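#include <stdio.h>
#include <stdlib.h>

/* Returns a NUL-terminated copy of the file's contents, or NULL on failure.
   The caller is responsible for calling free() on the result. */
char* readFile(const char* filename)
{
    FILE* file = fopen(filename,"r");
    if(file == NULL)
    {
        return NULL;
    }

    /* Seek to the end to find the size, then go back to the start. */
    fseek(file, 0, SEEK_END);
    long int size = ftell(file);
    rewind(file);
    if(size < 0)
    {
        fclose(file);
        return NULL;
    }

    /* calloc zero-fills, so the extra byte acts as the terminator. */
    char* content = calloc((size_t)size + 1, 1);
    if(content != NULL)
    {
        fread(content,1,(size_t)size,file);
    }

    fclose(file);
    return content;
}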

Imagist 2009-08-16 19:59:09

Answer 10

A:

Another approach is to read the file a piece at a time and extend your dynamic buffer as needed:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGESIZE 128

int main(int argc, char **argv)
{
  char *buf = NULL, *tmp = NULL;
  size_t bufSiz = 0;
  char inputBuf[PAGESIZE];
  FILE *in;

  if (argc < 2)
  {
    printf("Usage: %s filename\n", argv[0]);
    return 0;
  }

  in = fopen(argv[1], "r");
  if (in)
  {
    /**
     * Read up to PAGESIZE - 1 characters (or one line, if shorter)
     * at a time until reaching the end of the file
     */
    while (fgets(inputBuf, sizeof inputBuf, in) != NULL)
    {
      /**
       * Extend the dynamic buffer by the length of the string
       * in the input buffer
       */
      tmp = realloc(buf, bufSiz + strlen(inputBuf) + 1);
      if (tmp)
      {
        /**
         * Append to the dynamic buffer; bufSiz is the length of the
         * string stored so far, not counting the terminator
         */
        buf = tmp;
        strcpy(buf + bufSiz, inputBuf);
        bufSiz += strlen(inputBuf);
      }
      else
      {
        printf("Unable to extend dynamic buffer: releasing allocated memory\n");
        free(buf);
        buf = NULL;
        break;
      }
    }

    if (feof(in))
      printf("Reached the end of input file %s\n", argv[1]);
    else if (ferror(in))
      printf("Error while reading input file %s\n", argv[1]);

    if (buf)
    {
      printf("File contents:\n%s\n", buf);
      printf("Read %lu characters from %s\n", 
       (unsigned long) strlen(buf), argv[1]);
    }

    free(buf);
    fclose(in);   
  }
  else
  {
    printf("Unable to open input file %s\n", argv[1]);
  }

  return 0;
}

There are drawbacks with this approach; for one thing, if there isn't enough memory to hold the file's contents, you won't know it immediately. Also, realloc() is relatively expensive to call, so you don't want to make your page sizes too small.

However, this avoids having to use fstat() or fseek()/ftell() to figure out how big the file is beforehand.
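
One common way to keep the number of realloc() calls down is to double the buffer's capacity each time it fills up instead of growing it by one chunk. The following is only a sketch of that idea, not part of the program above: the name read_all, the out_len parameter, and the starting capacity of 4096 are all arbitrary choices made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from an open stream into a
   NUL-terminated buffer, doubling the capacity whenever it fills up.
   Returns NULL on error; the caller must free() the result.
   If out_len is not NULL, it receives the number of bytes read. */
char *read_all(FILE *in, size_t *out_len)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    char *tmp;

    if (buf == NULL)
        return NULL;

    /* Always leave one byte free for the terminator. */
    while ((n = fread(buf + len, 1, cap - len - 1, in)) > 0) {
        len += n;
        if (len == cap - 1) {
            /* Buffer is full: double it, guarding against overflow. */
            if (cap > (size_t)-1 / 2) { free(buf); return NULL; }
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(in)) { free(buf); return NULL; }

    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

With doubling, a file of n bytes needs on the order of log2(n) reallocations rather than one per chunk, at the cost of possibly allocating up to about twice the memory actually needed.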

John Bode 2009-08-17 03:46:52


How to copy text file to string in C?
