Differences between unix and windows files

+5 A:

This difference only affects text files: UNIX uses a single Line Feed (LF) to signify a new line, Windows uses a Carriage Return/Line Feed pair (CRLF), and Mac uses just a CR.
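To see which convention a given file actually uses, you can count the three kinds of line ending. Below is a minimal Java sketch. The class and method names are hypothetical, not from any library. It treats CR followed by LF as one Windows ending, a lone LF as a Unix (and OS X) ending, and a lone CR as a classic Mac OS ending:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Counts which line-ending conventions appear in a piece of text (hypothetical helper). */
public class LineEndings {

    /** Returns counts of CRLF (Windows), LF (Unix, OS X) and lone CR (classic Mac OS) endings. */
    public static Map<String, Integer> count(String text) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // A CR immediately followed by LF is a single Windows line ending
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this CRLF pair
                } else {
                    cr++;
                }
            } else if (c == '\n') {
                lf++;
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        result.put("CRLF", crlf);
        result.put("LF", lf);
        result.put("CR", cr);
        return result;
    }

    public static void main(String[] args) {
        String mixed = "windows\r\nunix\nclassic mac\rend";
        System.out.println(count(mixed)); // {CRLF=1, LF=1, CR=1}
    }
}
```

A file that reports only CRLF came from Windows; a mix of counts usually means it was edited on more than one platform.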

For binary files there should be no difference (e.g. a JPEG on a Windows machine will be byte-for-byte identical to the same JPEG on a Unix box).
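One way to confirm that a binary file survived a transfer unchanged is to compute a checksum on each machine and compare the results. A minimal Java sketch, assuming SHA-256 through the standard `MessageDigest` class. The `FileChecksum` name is hypothetical, and for simplicity the whole file is read into memory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileChecksum {

    /** Returns the SHA-256 digest of the given bytes as lowercase hex. */
    public static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b)); // %x on a Byte prints it as unsigned
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Print "<hash>  <name>" for each file given on the command line
        for (String name : args) {
            System.out.println(sha256Hex(Files.readAllBytes(Path.of(name))) + "  " + name);
        }
    }
}
```

The hex digest should match what a tool such as `sha256sum` prints for the same file, so the Java program only needs to run on one side.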

samjudson 2008-08-20 09:22:07

+4 A:

There could also be a difference in character encoding for national characters. There is no single "Unix encoding", but many Linux variants use UTF-8 as the default encoding. Mac OS (which is also a Unix) uses its own encoding (MacRoman). I am not sure what the Windows default encoding is.

But this could be another source of trouble (apart from the different linebreaks).
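The following Java snippet illustrates that trouble. It assumes text is written on one machine in one encoding and read on another in a different one. For portability it uses only the charsets every Java platform is guaranteed to support (UTF-8 and ISO-8859-1), not MacRoman or a Windows code page. The class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {

    /** Encodes text with one charset, then decodes those bytes with another. */
    public static String roundTrip(String text, Charset written, Charset read) {
        return new String(text.getBytes(written), read);
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "cafe" with an acute accent on the final e

        // UTF-8 stores the accented e as two bytes (C3 A9); ISO-8859-1 decodes each
        // byte as its own character, so two wrong characters replace the accented e
        System.out.println(roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // ISO-8859-1 stores it as the single byte E9, which is not valid UTF-8 on its
        // own, so Java substitutes the replacement character U+FFFD (often a box or '?')
        System.out.println(roundTrip(word, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

The first case produces the "extra characters" symptom and the second the "question mark or box" symptom that are often seen when files move between systems with different default encodings.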

What problems are you having? The line-break-related problems can easily be corrected with the programs dos2unix or unix2dos on the Unix machine.
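If those tools are not installed, the core of what they do is small. Below is a minimal Java sketch of dos2unix/unix2dos-style conversion. It is not a reimplementation of either tool's options, and the class name is hypothetical. It works on raw bytes so the file's character encoding is left untouched. That assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1; it would corrupt UTF-16 files:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineEndingConverter {

    /** CRLF -> LF, like dos2unix. Lone CRs are left as they are. */
    public static byte[] dosToUnix(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            // Drop a CR only when it is the first half of a CRLF pair
            if (in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n') {
                continue;
            }
            out.write(in[i]);
        }
        return out.toByteArray();
    }

    /** LF -> CRLF, like unix2dos, without doubling CRs that are already there. */
    public static byte[] unixToDos(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + in.length / 16);
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r')) {
                out.write('\r');
            }
            out.write(in[i]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2 || !(args[0].equals("dos2unix") || args[0].equals("unix2dos"))) {
            System.err.println("usage: java LineEndingConverter dos2unix|unix2dos FILE");
            return;
        }
        Path file = Path.of(args[1]);
        byte[] data = Files.readAllBytes(file);
        byte[] converted = args[0].equals("unix2dos") ? unixToDos(data) : dosToUnix(data);
        Files.write(file, converted); // overwrites the file in place
    }
}
```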

Mo 2008-08-20 09:22:15

+2 A:

If you are just interested in the content of text files, then yes, the line endings are different. Take a look at something like dos2unix; it may help here.

(Of course there are many other things that make unix and windows files different, but I don't think you're interested in those other differences right now.)

pauldoo 2008-08-20 09:23:29

+1 A:

In addition to the new-line differences, the byte-order mark can cause problems if files are treated as Unicode on Windows.
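A UTF-8 byte-order mark is the three bytes EF BB BF at the start of a file, which some Windows editors add. Java's UTF-8 decoder does not discard it. Instead it appears as an invisible U+FEFF character at the start of the first line, which can break parsers and string comparisons. The following minimal sketch strips it after decoding; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class BomStripper {
    private static final char BOM = '\uFEFF';

    /** Decodes UTF-8 bytes and removes a leading byte-order mark if one is present. */
    public static String decodeUtf8(byte[] bytes) {
        // The standard decoder keeps the BOM as a U+FEFF character
        String s = new String(bytes, StandardCharsets.UTF_8);
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(new String(withBom, StandardCharsets.UTF_8).length()); // 3: BOM survives
        System.out.println(decodeUtf8(withBom).length()); // 2
    }
}
```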

McDowell 2008-08-20 09:28:41

+3 A:

@samjudson

OS X uses LF, the same as UNIX; Mac OS 9 and earlier did use CR, though.

Cebjyre 2008-08-20 09:34:24

+1 A:

As pauldoo suggests, tools like dos2unix can be very useful. Note that these may be installed on your Linux/Unix system as fromdos or tofrodos, or the same job may be done by the general-purpose tool recode.

However, another set of problems you may come across relates to single-byte versus multi-byte character encodings. If you see strange, unexpected characters (not at the end of a line), this could be the reason, especially if you see square boxes, question marks, upside-down question marks, extra characters or unexpected accented characters.

Running the command locale on your *nix box will tell you what the system locale is. If this differs from the encoding used in the text files transferred from the Windows machine, it can sometimes cause issues, depending on how those files are used. You can use the very powerful recode command to try to convert between the different charsets, and it can fix line-ending issues as well. recode -l lists all of the formats and encodings the tool can convert between; it is likely to be a VERY long list.
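Inside a Java program, the closest equivalent of checking the locale is `Charset.defaultCharset()`. On older JDKs running on Unix it is usually derived from the locale. From JDK 18 onward it is UTF-8 by default. A recode-style conversion is just "decode with the source charset, encode with the target charset". The following minimal sketch shows that; the class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcoder {

    /** Interprets the input bytes using one charset and re-encodes the text using another. */
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        // Characters the target charset cannot represent are replaced (typically with '?')
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // "cafe" with an accented final e, as stored in ISO-8859-1 (one byte per character)
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 5: the accented e needs two bytes in UTF-8
    }
}
```

As with recode, choosing the correct source charset is the hard part; decoding with the wrong one silently produces the garbled characters described above.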

When writing to or reading from files that you control, it is often worth specifying the encoding explicitly, as most Java I/O methods allow this. However, also making sure the system locale matches can save a lot of pain.
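For example, the `java.nio.file.Files` methods accept a `Charset` argument, so the platform default never comes into play. Below is a minimal sketch (the class and method names are hypothetical) that writes UTF-8 with Unix line endings and reads it back as UTF-8:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ExplicitEncodingIO {

    /** Writes lines as UTF-8, ending each with LF regardless of the platform. */
    public static void writeUtf8(Path file, List<String> lines) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n'); // newLine() would use the platform separator (CRLF on Windows)
            }
        }
    }

    /** Reads lines as UTF-8; LF, CR and CRLF are all accepted as line terminators. */
    public static List<String> readUtf8(Path file) throws IOException {
        // Throws MalformedInputException if the file is not actually valid UTF-8
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("encoding-demo", ".txt");
        List<String> lines = List.of("caf\u00e9", "na\u00efve");
        writeUtf8(tmp, lines);
        System.out.println(readUtf8(tmp).equals(lines)); // true
        Files.delete(tmp);
    }
}
```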

Cheekysoft 2008-08-20 09:40:26

+2 A:

In addition to the answers given, you may find issues with the different file systems:

On Unix, files whose names start with a . are hidden. On Windows, hidden is a filesystem attribute that you probably don't have easy access to. As a result, files that are supposed to be hidden may become visible on the client machines.
File permissions differ between the two. When you copy files onto a Unix system, you will probably find that they now belong to the user who did the copying and have limited rights. You'll need to use chown/chmod to make sure the correct users have access to them.
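If the files are handled from Java, both points can be checked in code. `Files.isHidden` applies the platform's own rule: a leading dot on Unix, the DOS hidden attribute on Windows. The POSIX permission API gives a chmod-like call on file systems that support it. The following minimal sketch is not a replacement for chown/chmod. The class and method names are hypothetical, and the permission methods only work where the "posix" attribute view is available (Linux, macOS):

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class UnixFileAttributes {

    /** True if the default file system supports POSIX permissions. */
    public static boolean posixSupported() {
        return FileSystems.getDefault().supportedFileAttributeViews().contains("posix");
    }

    /** chmod-style helper, e.g. setMode(file, "rw-r--r--"). */
    public static void setMode(Path file, String mode) throws IOException {
        Files.setPosixFilePermissions(file, PosixFilePermissions.fromString(mode));
    }

    /** Returns the permissions in "rwxr-x---" form. */
    public static String getMode(Path file) throws IOException {
        return PosixFilePermissions.toString(Files.getPosixFilePermissions(file));
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]);
        // Unix: true for names starting with '.'; Windows: reads the DOS hidden attribute
        System.out.println("hidden: " + Files.isHidden(file));
        if (posixSupported()) {
            System.out.println("owner: " + Files.getOwner(file) + ", mode: " + getMode(file));
        }
    }
}
```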

Marcus Downing 2008-08-20 09:42:10

+1 A:

@sadie You may also need to use chgrp to set group ownership. I know this can be done through chown, but chgrp lets you keep the current owner while changing the group.
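From Java, the equivalent of chgrp is `PosixFileAttributeView.setGroup`, which changes only the group and leaves the owning user alone. The following minimal sketch uses a hypothetical class and method name. As with the real chgrp, the caller normally has to own the file and belong to the target group (or be root):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.GroupPrincipal;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.UserPrincipal;

public class ChangeGroup {

    /** Like `chgrp GROUP FILE`: changes the group and keeps the current owner. */
    public static void chgrp(Path file, String groupName) throws IOException {
        GroupPrincipal group = file.getFileSystem()
                .getUserPrincipalLookupService()
                .lookupPrincipalByGroupName(groupName); // throws if the group does not exist
        PosixFileAttributeView view = Files.getFileAttributeView(file, PosixFileAttributeView.class);
        if (view == null) {
            throw new UnsupportedOperationException("POSIX attributes are not supported here");
        }
        view.setGroup(group);
    }

    public static void main(String[] args) throws IOException {
        // usage: java ChangeGroup GROUP FILE
        Path file = Path.of(args[1]);
        UserPrincipal ownerBefore = Files.getOwner(file);
        chgrp(file, args[0]);
        System.out.println("owner unchanged: " + Files.getOwner(file).equals(ownerBefore));
    }
}
```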

ZombieSheep 2008-08-20 09:50:05

+1 A:

I'd just like to take this opportunity to thank everybody who helped make EBCDIC die.

Greg Hewgill 2008-08-20 09:52:46

ansaurus
