This might be a really trivial one.

Is file storage OS dependent?

Why do text files change when moved from DOS to Unix? Is it that editors on Unix interpret certain characters differently, or does the file itself change when it is moved, which is why the dos2unix utility exists?

Why can a Java class file be moved from DOS to Unix without changing?

What is platform-independent storage?

+1  A: 

There's a fundamental difference in the way that bytes and characters are stored. See:

http://www.joelonsoftware.com/articles/Unicode.html

for a description of various character sets and how they differ between operating systems (plus a whole lot more).

Java class files are binary, and their multi-byte values are always stored in big-endian order. This means that no matter which operating systems they are moved between, they will always be the same.
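As a rough illustration of a fixed byte order, here is a minimal Python sketch. It uses `0xCAFEBABE`, the magic number at the start of a Java class file, and packs it with an explicit byte order so the result does not depend on the machine running it. The variable names are only for illustration.

```python
import struct

# 0xCAFEBABE is the magic number that opens a Java class file.
MAGIC = 0xCAFEBABE

big = struct.pack(">I", MAGIC)     # explicit big-endian, the order class files use
little = struct.pack("<I", MAGIC)  # explicit little-endian, shown for comparison

print(big.hex())     # cafebabe
print(little.hex())  # bebafeca

# Reading back with the agreed byte order recovers the value on any machine.
(value,) = struct.unpack(">I", big)
```

Because the format fixes the byte order, a reader never has to guess how the writer's processor lays out integers.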

Jon
How is big or little endianness related to file storage? I can understand that a Java class file might contain integers and floats, which could be affected on systems with different endianness. But how does it matter when you are reading a file? E.g. if I am reading a text file, why would endianness matter?
Geek
Endianness is just a convention for ordering the bytes. When the file is written, multi-byte values are stored in little, middle or big endian, typically depending on the processor or on what the file format specifies. To read the file back we have to know how it was written; for Unicode text this is often signalled by a byte order mark. See http://stackoverflow.com/questions/701624/difference-between-big-endian-and-little-endian-byte-order
Jon
A: 

No, files do not change; only the conventions for editing them do.

What can change is the filesystem structure and the metadata used to catalogue and list directories (e.g. timestamps). Files will also naturally be encrypted on an encrypted filesystem, but filesystem complexities are nearly always transparent to an application reading the file through system calls (they would be relevant if you were writing a partition resizer or other low-level disk tool).

To clarify, there is nothing in Linux that requires vim or emacs to use the Unix convention. In fact, many editors and applications can detect the newline convention a file uses and adapt. It's up to the software how to treat files, not the OS.
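A minimal Python sketch of how such detection might work is below. The function name `detect_newline` and the simple counting heuristic are assumptions for illustration, not how any particular editor does it.

```python
def detect_newline(data: bytes) -> str:
    """Guess the dominant newline convention in raw file bytes.

    Hypothetical helper: counts CRLF pairs, lone LF and lone CR,
    and returns the most frequent one ("" if none is found).
    """
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF bytes that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf  # CR bytes that are not part of a CRLF pair
    counts = {"\r\n": crlf, "\n": lf, "\r": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else ""


print(repr(detect_newline(b"one\r\ntwo\r\n")))  # '\r\n'
print(repr(detect_newline(b"one\ntwo\n")))      # '\n'
```

The bytes passed in are never modified; the program only decides how to interpret them.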

SpliFF
A: 

There are two differences:

Newlines

  • Unix: \n
  • Mac OS before X: \r
  • Windows: \r\n

Little/big endianness

The endianness might be different, but this usually only matters for Unicode text in multi-byte encodings and for binary data.
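To make the Unicode case concrete, here is a small Python sketch that encodes the same character in UTF-8 and in both byte orders of UTF-16. The variable names are illustrative only.

```python
import codecs

text = "A"

utf8 = text.encode("utf-8")          # one-byte units, so there is no byte order to choose
utf16_le = text.encode("utf-16-le")  # two-byte units, low byte first
utf16_be = text.encode("utf-16-be")  # two-byte units, high byte first

print(utf8)      # b'A'
print(utf16_le)  # b'A\x00'
print(utf16_be)  # b'\x00A'

# A byte order mark at the start tells the reader which order was used.
with_bom = codecs.BOM_UTF16_BE + utf16_be
decoded = with_bom.decode("utf-16")  # the generic codec reads the BOM and strips it
```

Plain ASCII or UTF-8 text has no byte order issue, which is why newlines are usually the only visible difference for ordinary text files.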

Georg
You're confusing the question. He asked what changes in the file, and the answer is nothing. There will still be DOS newlines in a file copied to a Unix machine.
SpliFF
Windows: \r\n, not \n\r
laalto
If nothing changes, why do I have to run dos2unix?
Geek
Because your EDITOR changed. It's an editing convention adopted by editors on that platform, nothing more.
SpliFF
+1  A: 

File storage is not OS independent: even though the contents may be the same, the way they are interpreted is different. A case in point: many years ago I had to port (Business Basic) programs and data from a Data General minicomputer to DOS. The files came across with 8-bit encoding and had to be translated (to 7-bit) before they could be "understood" on DOS.

mm2010
I think I agree with you. Different file formats need to be stored differently, and the OS should be capable of storing and reading them in that way.
Geek
+1  A: 

Short answer: it depends.

Text files do not change by themselves when moved from DOS to Unix: try moving them using a USB key, for instance.

They may change when copied using third-party software: ftp, for example, has an option to handle text conversions.
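The difference between a plain copy and a converting transfer can be sketched in a few lines of Python. The helpers `dos_to_unix` and `unix_to_dos` are hypothetical names; they only approximate what a text-mode transfer or a dos2unix-style tool does and ignore edge cases such as lone `\r` bytes.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Rewrite CRLF line endings as LF (hypothetical helper)."""
    return data.replace(b"\r\n", b"\n")


def unix_to_dos(data: bytes) -> bytes:
    """Rewrite LF line endings as CRLF (hypothetical helper)."""
    # Normalise first so existing CRLF pairs do not become CR CR LF.
    return dos_to_unix(data).replace(b"\n", b"\r\n")


original = b"line one\r\nline two\r\n"
byte_copy = bytes(original)        # a plain copy: every byte is preserved
converted = dos_to_unix(original)  # a converting transfer: the bytes really change

print(byte_copy == original)           # True
print(len(original), len(converted))   # 20 18
```

Only the converting path changes the file; a byte-for-byte copy keeps the DOS newlines intact.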

Moreover, this depends not only on the OS but also on the filesystem. On pre-OS X Macs (HFS filesystem), files were stored in two forks: data and resource. When copied to a filesystem without the fork concept, a file could be copied as a single file containing both the data and resource forks (AppleSingle) or as two files in two separate directories (AppleDouble).

mouviciel
Hi mou, can you please elaborate..
Geek
A: 

In general the filesystem will store the file the way it was asked to write it. The program that reads the file will interpret the bytes from the file.

For example, in DOS (and Windows) the newline consists of two bytes, but on Unix it's only one byte. But this is only a convention. Programs on Unix can read files with two-byte newlines; they just need to know what the newlines are.
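As a small sketch of that idea in Python: the same bytes can be read by a newline-aware reader or by a naive one that assumes one-byte newlines. The sample data and variable names are made up for illustration.

```python
dos_bytes = b"first\r\nsecond\r\n"
unix_bytes = b"first\nsecond\n"

# bytes.splitlines() treats \n, \r and \r\n all as line boundaries.
dos_lines = dos_bytes.splitlines()
unix_lines = unix_bytes.splitlines()
print(dos_lines == unix_lines)  # True

# A naive reader that only knows the one-byte newline leaves a stray \r
# at the end of each line, which some editors display as ^M.
naive = dos_bytes.split(b"\n")
print(naive[0])  # b'first\r'
```

The file on disk is identical in both cases; only the reader's assumption about newlines differs.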

Peter Stuifzand