I am preparing a lecture on files for the more junior students in a programming class. One of the points I want to elaborate on is good practices with files.
What are the things to keep in mind when using files in any programming language?
Like any other input, files can introduce security risks. Additionally, a file might be malformed, either because it was created by an outdated application or because the end user tried to edit it by hand.
Either entirely ignore such a file or get as much as you can and discard the rest.
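As a rough illustration of the "get as much as you can" option, here is a minimal sketch that assumes a made-up `key=value` settings format. The function name `parse_settings` and the format itself are hypothetical, chosen only for the example.

```python
def parse_settings(lines):
    """Parse 'key=value' lines; keep the valid ones, report the bad ones."""
    settings, rejected = {}, []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are not errors
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            rejected.append(number)  # malformed: remember it and keep going
            continue
        settings[key.strip()] = value.strip()
    return settings, rejected


# Hypothetical input: two good lines, two malformed ones.
settings, rejected = parse_settings(["name=demo", "garbage", "=x", "size = 10"])
```

Returning the rejected line numbers lets the caller decide whether to warn the user or to refuse the file entirely.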
Always close the file and dispose of all resources when you are finished with them.
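In Python, one common way to make sure a file gets closed is a `with` block, which closes the file even if an exception is raised inside it. A small sketch, using a temporary directory and a made-up file name:

```python
import os
import tempfile

# Hypothetical scratch file, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w", encoding="utf-8") as out:
    out.write("hello\n")
# Leaving the block closes the file, whether or not an error occurred.

file_closed = out.closed
```

Other languages have similar constructs (`using` in C#, try-with-resources in Java, RAII in C++), so the habit carries over.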
Open binary files in binary mode and text files in text mode. (Can't remember the number of times I've helped people whose code didn't read the whole file because they were reading a binary file with a text construct and the file happened to have a ^Z in the middle.)
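A short sketch of the binary-mode habit in Python. As far as I know, the ^Z-as-end-of-file behaviour comes from text mode in some C runtimes on Windows rather than from Python 3 itself, but text mode in Python still decodes bytes and translates newlines, so binary data should go through `"wb"` and `"rb"`. The payload and file name are made up.

```python
import os
import tempfile

# Arbitrary bytes, including 0x1A (^Z), CR, LF and a byte that is not valid UTF-8.
payload = bytes([0x42, 0x1A, 0x0D, 0x0A, 0xFF])

path = os.path.join(tempfile.mkdtemp(), "blob.bin")

with open(path, "wb") as f:   # binary mode: bytes are written untouched
    f.write(payload)

with open(path, "rb") as f:   # binary mode: bytes are read back untouched
    data = f.read()
```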
Do:
Don't:
Files can be considered to contain a set of records, each either of a fixed length or ending with a delimiter.
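To make the two record styles concrete, here is a sketch that reads both from in-memory streams. The record layout (a 4-byte id followed by an 8-byte name) and the helper names are assumptions for the example only.

```python
import io
import struct

# Hypothetical fixed-length layout: little-endian 4-byte id + 8-byte name.
RECORD = struct.Struct("<I8s")


def read_fixed(stream):
    """Yield (id, name) from a stream of fixed-length records."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of stream (or a truncated final record)
        ident, name = RECORD.unpack(chunk)
        yield ident, name.rstrip(b"\0").decode("ascii")


def read_delimited(stream, delimiter=b"\n"):
    """Yield records that each end with a delimiter."""
    for raw in stream.read().split(delimiter):
        if raw:
            yield raw.decode("ascii")


fixed = io.BytesIO(RECORD.pack(1, b"ann") + RECORD.pack(2, b"bob"))
fixed_records = list(read_fixed(fixed))
delimited_records = list(read_delimited(io.BytesIO(b"ann\nbob\n")))
```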
Files are generally optimized for sequential access, not random-access. It's hard to insert data into the middle of a file, and it's typically faster to process files linearly (like a cassette tape) than randomly (like a CD in "shuffle" mode).
Random-access files usually contain fixed-length records, most of which contain empty space, making them larger than sequential-access files.
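The trade-off can be sketched like this: with fixed-length records you can jump straight to record N with a seek, but every record pays for its padding. The 16-byte record size and the function names are assumptions for the example.

```python
import os
import tempfile

RECORD_SIZE = 16  # assumed size; shorter names are padded with spaces


def write_records(path, names):
    with open(path, "wb") as f:
        for name in names:
            raw = name.encode("ascii")
            if len(raw) > RECORD_SIZE:
                raise ValueError("name does not fit in one record")
            f.write(raw.ljust(RECORD_SIZE, b" "))


def read_record(path, index):
    """Jump directly to record `index` without reading the ones before it."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        return f.read(RECORD_SIZE).rstrip(b" ").decode("ascii")


path = os.path.join(tempfile.mkdtemp(), "people.dat")
write_records(path, ["ann", "bob", "carol"])
third = read_record(path, 2)
size_on_disk = os.path.getsize(path)
```

Here 11 bytes of names occupy 48 bytes on disk, which is the price of being able to seek by index.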
Files are temperamental and unpredictable creatures. They can change length, disappear, change access permissions, etc. between accesses, so validate your operations carefully and check return codes.
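One way to act on this in Python is to attempt the operation and handle the failure, rather than checking that the file exists first (it can vanish between the check and the open). A minimal sketch, with a hypothetical helper name:

```python
import os
import tempfile


def read_text_or_none(path):
    """Return the file's text, or None if it is missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except (FileNotFoundError, PermissionError):
        return None  # the caller decides how to recover


# A path that does not exist, used only to show the failure branch.
missing = os.path.join(tempfile.mkdtemp(), "missing.txt")
result = read_text_or_none(missing)
```

In languages without exceptions, the equivalent is checking the return code of every open, read, write, and close.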
Files can be used as buffers if you read from the beginning (tail) and write to the end (head).
Flush your buffers!
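For students who have not seen it, a sketch of what flushing looks like in Python: `flush()` pushes the program's buffer to the operating system, and `os.fsync()` asks the operating system to push it to the storage device. The file name is made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.log")

with open(path, "w", encoding="utf-8") as log:
    log.write("checkpoint reached\n")
    log.flush()              # program buffer -> operating system
    os.fsync(log.fileno())   # operating system -> storage device

    # A second handle can now see the data, although `log` is still open.
    with open(path, encoding="utf-8") as reader:
        visible = reader.read()
```

Without the `flush()` call, the second read may well come back empty, because a short write usually sits in the program's buffer until the file is closed.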
Since no one's had the poor taste to say it yet: this should give your students a handle on the subject.
If you're writing server-side or concurrent code, pay close attention to file locking: too little and you'll get data corruption, too much and your app will deadlock.
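Locking APIs differ a lot between platforms, so this is only a sketch of one simple, portable technique: a lock file created with `O_CREAT | O_EXCL`, which fails if the file already exists. It is not the only way to lock, the function names are hypothetical, and a real program would also need to deal with stale lock files left behind by a crash.

```python
import os
import tempfile


def try_acquire(lock_path):
    """Try to take the lock; return True on success, False if it is held."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(lock_path):
    os.remove(lock_path)


lock_path = os.path.join(tempfile.mkdtemp(), "data.lock")
first = try_acquire(lock_path)    # lock is free, so this succeeds
second = try_acquire(lock_path)   # lock is held, so this is refused
release(lock_path)
third = try_acquire(lock_path)    # free again after release
release(lock_path)
```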
Windows Vista tightened permissions on the Program Files directory, so programs that create files there may have issues when installed on Vista (easy to fix but annoying all the same).
I've found that junior programmers often have poor intuition or learned incorrect lessons about the speed of accessing files.
Very new programmers assume that files are very fast and need help understanding why reading one byte at a time from an unbuffered file is a bad idea. Similarly, accessing directory information can be very slow and should be cached if possible.
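A sketch of the alternative to byte-at-a-time reads: pull the file in reasonably large chunks and do the work on each chunk in memory. The 64 KiB chunk size is an arbitrary assumption, not a tuned value, and `count_byte` is a made-up helper.

```python
import os
import tempfile


def count_byte(path, target, chunk_size=64 * 1024):
    """Count occurrences of a single byte by reading the file in chunks."""
    total = 0
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            total += chunk.count(target)
    return total


path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "wb") as f:
    f.write(b"a\nb\nc\n" * 1000)

newlines = count_byte(path, b"\n")
```

Because the target is a single byte, it can never straddle a chunk boundary; searching for longer patterns this way would need extra care at the edges.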
Unfortunately, some more experienced programmers learn the wrong lesson and assume that everything always needs to be cached in RAM or it will be too slow. Modern operating systems have very sophisticated disk caches, so the second time you access the same part of a file might be significantly faster.
Finally, interactive programs should do all file operations in another thread, so your application doesn't slow to a crawl or stop working when the disk is busy, or when a remote volume is temporarily unavailable.
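As a sketch of that last point, assuming Python's standard `concurrent.futures` module: the read is handed to a worker thread, and the main (UI) thread only collects the result once it is ready. The `load` function and the file name are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def load(path):
    """Blocking file read; runs on the worker thread, not the UI thread."""
    with open(path, "rb") as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "document.bin")
with open(path, "wb") as f:
    f.write(b"payload")

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(load, path)   # returns immediately

# ... the main thread is free to keep handling events here ...

data = future.result(timeout=5)        # collect the result when needed
executor.shutdown()
```

In a real GUI you would normally attach a callback or poll `future.done()` from the event loop instead of blocking on `result()`.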