views:

1623

answers:

8

I'm writing a program where performance is quite important, but not critical. Currently I read text from a FILE* line by line, using fgets to obtain each line. After profiling with some performance tools, I've found that my application spends 20% to 30% of its running time inside fgets.

Are there faster ways to get a line of text? My application is single-threaded with no intentions to use multiple threads. Input could be from stdin or from a file. Thanks in advance.

+2  A: 

If the data is coming from disk, you could be IO bound.

If that is the case, get a faster disk (but first check that you're getting the most out of your existing one; some Linux distributions don't tune disk access out of the box, see hdparm), stage the data into memory ahead of time (say, by copying it to a RAM disk), or be prepared to wait.


If you are not IO bound, you could be wasting a lot of time copying. You could benefit from so-called zero-copy methods, such as memory-mapping the file and accessing it only through pointers.

That is a bit beyond my expertise, so you should do some reading or wait for more knowledgeable help.

BTW-- You might be getting into more work than the problem is worth; maybe a faster machine would solve all your problems...

NB-- It is not clear that you can memory map the standard input either...
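A rough sketch of the zero-copy idea on a POSIX system (the function name and the line-counting task are just illustrative; this only works for regular files, not stdin, as noted above):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file and scan its bytes in place: no per-line copy. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        return 0;
    }

    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping survives the close */
    if (data == MAP_FAILED)
        return -1;

    long lines = 0;
    const char *p = data, *end = data + st.st_size;
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;
    }
    if (end[-1] != '\n')
        lines++;                  /* count an unterminated final line */

    munmap((void *)data, (size_t)st.st_size);
    return lines;
}
```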

dmckee
Sometimes it comes from the disk, sometimes it is fed through stdin, but in both cases the time spent in fgets is roughly the same. Even creating a RAM disk for the file doesn't speed things up much.
dreamlax
After edit: the problem is that this application will be run on end users' computers; that's why performance is quite important.
dreamlax
+1  A: 

You might try minimizing the time you spend reading from disk by reading large amounts of data into RAM and then working on that. Reading from disk is slow, so ideally read the entire file once, then work on it in memory.

Sorta like the way CPU cache minimizes the time the CPU actually goes back to RAM, you could use RAM to minimize the number of times you actually go to disk.

GMan
Stdio already is buffered, isn't it?
Paul Tomblin
I think so but I'm sure it's less than a megabyte, so reading more than that should still help.
GMan
+1  A: 

Depending on your environment, using setvbuf() to increase the size of the internal buffer used by file streams may or may not improve performance.

The syntax is:

setvbuf(InputFile, NULL, _IOFBF, BUFFER_SIZE);

Where InputFile is a FILE* to a file just opened using fopen() and BUFFER_SIZE is the size of the buffer (which is allocated by this call for you).

You can try various buffer sizes to see if any have positive influence. Note that this is entirely optional, and your runtime may do absolutely nothing with this call.
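For example, something along these lines (the file name, the 1 MiB buffer size, and the line-counting loop are all illustrative choices to be tuned by measurement):

```c
#include <stdio.h>

#define BUFFER_SIZE (1 << 20)  /* 1 MiB; try various sizes */

int count_lines(const char *path)
{
    FILE *in = fopen(path, "r");
    if (!in)
        return -1;

    /* Must be called after fopen() but before any other operation on
       the stream; NULL asks the library to allocate the buffer. */
    if (setvbuf(in, NULL, _IOFBF, BUFFER_SIZE) != 0)
        fprintf(stderr, "setvbuf failed; falling back to default buffering\n");

    char line[4096];
    int lines = 0;
    while (fgets(line, sizeof line, in))
        lines++;

    fclose(in);
    return lines;
}
```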

Hexagon
+2  A: 
  1. Use fgets_unlocked(), but read carefully what it does first

  2. Get the data with fgetc() or fgetc_unlocked() instead of fgets(). With fgets(), your data is copied into memory twice: first by the C runtime library from the file into an internal buffer (stream I/O is buffered), then from that internal buffer into an array in your program
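A sketch of a character-at-a-time line reader (the function name and buffer handling are my own illustration; on POSIX systems, getc_unlocked() can be swapped in for fgetc() in single-threaded code to skip per-call locking):

```c
#include <stdio.h>

/* Build one line at a time with fgetc(); returns the number of
   characters stored, 0 at end of file. */
size_t read_line(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int c;
    while (n + 1 < cap && (c = fgetc(in)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;              /* keep the newline, stop the line */
    }
    buf[n] = '\0';
    return n;
}
```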

dmityugov
Thanks for the suggestion, but I forgot to mention I am using Mac OS X. fgets_unlocked is not available since it is a GNU extension. I will look into using fgetc_unlocked.
dreamlax
Well, OS X is running GCC, you should get the GNU extensions, right?
Martin Cote
@Martin: It is not an extension of the GNU compiler, but the GNU C runtime library.
dreamlax
+1  A: 

Read the whole file in one go into a buffer.

Process the lines from that buffer.

That's the fastest possible solution.
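Something like this, perhaps (the helper names and the memchr()-based line scan are illustrative; error handling is kept minimal):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Slurp the whole file into one heap buffer; caller frees it. */
char *slurp(const char *path, size_t *len)
{
    FILE *in = fopen(path, "rb");
    if (!in)
        return NULL;
    fseek(in, 0, SEEK_END);
    long size = ftell(in);
    rewind(in);
    char *buf = malloc((size_t)size + 1);
    if (buf) {
        *len = fread(buf, 1, (size_t)size, in);
        buf[*len] = '\0';
    }
    fclose(in);
    return buf;
}

/* Walk the buffer, finding line boundaries without copying lines. */
int count_lines_in(const char *buf, size_t len)
{
    int lines = 0;
    const char *p = buf, *end = buf + len;
    const char *nl;
    while (p < end && (nl = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p = nl + 1;
    }
    if (p < end)
        lines++;                /* final line without trailing newline */
    return lines;
}
```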

Blank Xavier
+3  A: 

You don't say which platform you are on, but if it is UNIX-like, then you may want to try the read() system call, which does not perform the extra layer of buffering that fgets() et al do. This may speed things up slightly, on the other hand it may well slow things down - the only way to find out is to suck it and see.
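A sketch of what doing your own buffering over read() can look like (the 64 KiB chunk size and function name are illustrative guesses; the newline counting stands in for real per-line processing):

```c
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (64 * 1024)       /* tune by measurement */

/* Pull large chunks straight from the file descriptor and scan them,
   bypassing stdio's internal buffer entirely. */
long count_newlines_fd(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    static char chunk[CHUNK];
    long newlines = 0;
    ssize_t got;
    while ((got = read(fd, chunk, sizeof chunk)) > 0)
        for (ssize_t i = 0; i < got; i++)
            if (chunk[i] == '\n')
                newlines++;

    close(fd);
    return got < 0 ? -1 : newlines;
}
```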

anon
This turned out to be the fastest method of all. I eventually went down this route. It was simpler than I had thought to do "my own buffering" and it turned out to be much, much faster (almost 4 times) than using `fgets()`.
dreamlax
A: 

If the OS supports it, you can try asynchronous file reading, that is, the file is read into memory whilst the CPU is busy doing something else. So, the code goes something like:

  start asynchronous read
loop:
  wait for asynchronous read to complete
  if end of file goto exit
  start asynchronous read
  do stuff with data read from file
  goto loop
exit:

If you have more than one CPU then one CPU reads the file and parses the data into lines, the other CPU takes each line and processes it.
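On POSIX systems that pattern can be sketched with aio_read(); the double-buffering scheme, chunk size, and function name below are my own illustration, and the "processing" step just counts newlines:

```c
#include <aio.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096              /* illustrative; tune by measurement */

/* One chunk is processed while the kernel reads the next one. */
long count_newlines_aio(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char bufs[2][CHUNK];
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_nbytes = CHUNK;
    cb.aio_offset = 0;

    int cur = 0;
    long total = 0;
    cb.aio_buf = bufs[cur];
    if (aio_read(&cb) != 0) {   /* start asynchronous read */
        close(fd);
        return -1;
    }

    for (;;) {
        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);       /* wait for read to complete */
        ssize_t got = aio_return(&cb);
        if (got <= 0)                     /* end of file (or error) */
            break;

        char *data = bufs[cur];
        cb.aio_offset += got;
        cur ^= 1;
        cb.aio_buf = bufs[cur];
        if (aio_read(&cb) != 0) {         /* start the next read... */
            close(fd);
            return -1;
        }

        for (ssize_t i = 0; i < got; i++) /* ...while processing this one */
            if (data[i] == '\n')
                total++;
    }
    close(fd);
    return total;
}
```

Note that older glibc versions keep the AIO routines in librt, so linking may require -lrt there.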

Skizz
A: 

Look into fread(). It reads much faster for me, especially with the fread buffer set to 65536 bytes. Cons: you have to do a lot of work, essentially writing your own getline function to convert from binary reads back into text lines. Check out: file I/O