I am preparing a lecture on files for the more junior students in a programming class. One of the points I want to elaborate on is good practices with files.

What are the things to keep in mind when using files in any programming language?

A: 

Remember to close them when you're done with them.

Lasse V. Karlsen
A: 

A file is always insecure

Like any other input, files can increase security risks. Additionally, a file might be malformed, either because it was created by an outdated application or because the end user tried to modify it by hand.

Either entirely ignore such a file or get as much as you can and discard the rest.
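
As a rough illustration of "get as much as you can and discard the rest", here is a minimal Python sketch. The `name=score` line format and the function name `parse_scores` are made up for this example; a real format would need its own validation rules.

```python
def parse_scores(lines):
    """Return (records, rejected): the valid (name, score) pairs and the lines skipped."""
    records, rejected = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are harmless, skip them silently
        name, sep, raw_score = line.partition("=")
        try:
            if not sep or not name.strip():
                raise ValueError("missing name or '='")
            score = int(raw_score)  # raises ValueError on non-numeric input
        except ValueError:
            rejected.append(line)  # keep the bad line so it can be reported
            continue
        records.append((name.strip(), score))
    return records, rejected


records, rejected = parse_scores(["alice=10", "bob", "carol=abc", "", "dave = 7"])
print(records)
print(rejected)
```

Returning the rejected lines alongside the good ones lets the caller decide whether to warn the user or ignore the file entirely.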

Alex Brault
A: 

Always close the file and dispose of all resources when you are finished with them.

Open binary files in binary mode and text files in text mode. (I can't remember the number of times I've helped people whose code didn't read the whole file because they were reading a binary file with a text construct and the file happened to have a ^Z in the middle.)
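
A small Python sketch of both points, assuming a throwaway file named `sample.bin` (a name invented for this example). The `with` block closes the file even if an exception is raised, and the `"wb"`/`"rb"` modes keep the bytes untouched, including the ^Z (0x1A) byte and the CR LF pair.

```python
import os
import tempfile

# Hypothetical sample data containing a Ctrl-Z byte (0x1A) and a CR LF pair.
payload = b"\x00\x01\x1a\r\n\xff"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")

    # "wb" writes the bytes exactly as given; leaving the with-block closes the file.
    with open(path, "wb") as out_file:
        out_file.write(payload)

    # "rb" reads the raw bytes back with no newline translation or decoding.
    with open(path, "rb") as in_file:
        data = in_file.read()

print(data == payload)
print(in_file.closed)
```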

nzpcmad
A: 

Do:

  • Delete temporary files when you no longer need them

Don't:

  • Don't create random files in user directories (for example My Documents). Use a temporary or program folder for that
  • Don't keep sensitive information in plain text files
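
One way to follow the first two points in Python is the standard `tempfile` module, which picks a suitable temporary location and cleans up afterwards. This is only a sketch; the file name `work.tmp` is made up.

```python
import os
import tempfile

# TemporaryDirectory creates a directory in the system's temp location and
# removes it, along with everything inside, when the with-block exits.
with tempfile.TemporaryDirectory() as scratch_dir:
    scratch_path = os.path.join(scratch_dir, "work.tmp")
    with open(scratch_path, "w", encoding="utf-8") as f:
        f.write("intermediate results\n")
    existed_inside = os.path.exists(scratch_path)

existed_after = os.path.exists(scratch_path)
print(existed_inside, existed_after)
```
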
aku
Files in the home directory are a platform-specific thing. I can see how this would be frowned upon in Windows, but in *nix systems, dotfiles are used a lot for this type of thing.
Jason Baker
+1  A: 

Always mount a scratch monkey.

http://catb.org/jargon/html/S/scratch-monkey.html

Paul Tomblin
+4  A: 
  • Files can be considered to contain a set of records, each either of a fixed length or ending with a delimiter.

  • Files are generally optimized for sequential access, not random-access. It's hard to insert data into the middle of a file, and it's typically faster to process files linearly (like a cassette tape) than randomly (like a CD in "shuffle" mode).

  • Random-access files usually contain fixed-length records, most of which contain empty space, making them larger than sequential-access files.

  • Files are temperamental and unpredictable creatures. They can change length, disappear, change access permissions, etc. between accesses, so validate your operations carefully and check return codes.

  • Files can be used as buffers if you read from the beginning (tail) and write to the end (head).

  • Flush your buffers!
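
To make the "temperamental" and "flush" points concrete, here is a hedged Python sketch. The helper names `read_text_or_none` and `write_durably` are invented for this example. Python reports failures as exceptions rather than return codes, so the equivalent of checking return codes is catching `OSError` subclasses.

```python
import os
import tempfile


def read_text_or_none(path):
    """Open directly instead of checking for existence first:
    the file could vanish or change permissions between the check and the open."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except (FileNotFoundError, PermissionError):
        return None


def write_durably(path, text):
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()             # push Python's own buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to push its cache to the storage device


with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, "notes.txt")
    before = read_text_or_none(example_path)  # file does not exist yet
    write_durably(example_path, "hello\n")
    after = read_text_or_none(example_path)

print(before, repr(after))
```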

Since no one's had the poor taste to say it yet: this should give your students a handle on the subject.

Adam Liss
I think you should say "files are contiguous, it's hard to insert data into the middle of a file". Whether or not files are random-access is a property of the underlying storage medium. Often enough, fseek works great. What makes it hard to insert in the middle is their logically contiguous nature.
Johannes Schaub - litb
Good point; my original thought was too vague. I hope the edit is clearer -- thanks!
Adam Liss
A: 

If you're writing server-side or concurrent code, pay close attention to file locking: too little and you'll have data corruption; too much and your app will deadlock.
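
One simple, portable form of locking is a separate lock file created with exclusive-create semantics. The sketch below is only an illustration of that one technique, not a recommendation over OS-level locks, and the helper name `lock_file` is hypothetical. A known weakness: if the process crashes while holding the lock, the stale lock file has to be cleaned up by other means.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def lock_file(lock_path):
    # O_CREAT | O_EXCL makes the open fail with FileExistsError
    # if another process (or thread) already created the lock file.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # release the lock


with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "data.lock")
    with lock_file(lock_path):
        try:
            with lock_file(lock_path):
                second_attempt = "acquired"
        except FileExistsError:
            second_attempt = "refused"
    released = not os.path.exists(lock_path)

print(second_attempt, released)
```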

Tim Howland
A: 

Vista has new permission rules for the programs directory, so programs that create files there may have issues when installed on Vista (easy to fix but annoying all the same).

Rod
A: 
  • Don't flush after every single write.
  • Don't worry about making small writes and reads; the OS is pretty good at buffering these. Don't try to reinvent your own buffering scheme: in the best case it will do nothing, and in the worst it will actually work against the buffering the OS does.
  • Don't create directories with thousands of files.
  • If there's a chance a human would want to look at or edit the data, make sure it is human-readable.
  • Before considering rolling your own format, consider XML.
shoosh
+1  A: 

I've found that junior programmers often have poor intuition or learned incorrect lessons about the speed of accessing files.

Very new programmers assume that files are very fast and need help understanding why reading one byte at a time from an unbuffered file is a bad idea. Similarly, accessing directory information can be very slow and should be cached if possible.
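
A small Python sketch that students can time for themselves. Both functions give the same answer, but the first issues one read request per byte on an unbuffered file, while the second reads in chunks. The function names, the 64 KiB chunk size, and the sample file are assumptions made for this example, and no timing figures are claimed here.

```python
import os
import tempfile


def count_newlines_byte_by_byte(path):
    count = 0
    # buffering=0 disables Python's buffer, so every read(1) goes to the OS.
    with open(path, "rb", buffering=0) as f:
        while True:
            byte = f.read(1)
            if not byte:
                break
            if byte == b"\n":
                count += 1
    return count


def count_newlines_chunked(path, chunk_size=64 * 1024):
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.log")
    with open(sample_path, "wb") as f:
        f.write(b"line\n" * 2000)
    slow_count = count_newlines_byte_by_byte(sample_path)
    fast_count = count_newlines_chunked(sample_path)

print(slow_count, fast_count)
```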

Unfortunately, some more experienced programmers learn the wrong lesson and assume that everything always needs to be cached in RAM or it will be too slow. Modern operating systems have very sophisticated disk caches, so the second time you access the same part of a file might be significantly faster.

Finally, interactive programs should do all file operations in another thread, so your application doesn't slow to a crawl or stop working when the disk is busy, or when a remote volume is temporarily unavailable.
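
As a minimal sketch of that last point in Python, the standard `concurrent.futures` module can run the read on a worker thread while the main thread stays free. The helper name `load_bytes` and the file name are invented for this example; a real GUI program would typically use a completion callback or its toolkit's event mechanism instead of blocking on `result()`.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def load_bytes(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    doc_path = os.path.join(tmp, "document.dat")
    with open(doc_path, "wb") as f:
        f.write(b"example contents")

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_bytes, doc_path)  # runs on a worker thread
        # The main thread could keep handling user input here.
        loaded = future.result(timeout=10)  # the timeout avoids waiting forever

print(loaded)
```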

dmazzoni