(Of course, all of the following applies only to x86 and x86-64 assembly language, for IA-32 and AMD64 processors and operating systems.)
The other answers currently visible are all correct, but, in my opinion, miss the point. AT&T versus Intel syntax is a complete non-issue; any decent tool will work with both syntaxes or have a counterpart or replacement that does. And they assemble the same anyway. (Protip: you really want to use Intel syntax. All the official processor documentation does. AT&T syntax is just one giant headache.) Yes, finding the right flags to pass to the assembler and linker can be tricky, but you'll know when you've got it and you only have to do it once per OS (if you remember to write it down somewhere!).
Assembly instructions themselves, of course, are completely OS-agnostic. The CPU does not care what operating system it's running. Unless you're doing extremely low-level hackery (that is, OS development), the nuts and bolts of how the OS and CPU interact are almost totally irrelevant.
The Outside World
The trouble with assembly language comes when you interact with the outside world: the OS kernel, and other userspace code. Userspace is trickiest: you have to get the ABI right or your assembly program is all but useless. This part is generally not portable between OSes unless you use trampolines/thunks (basically another layer of abstraction that has to be rewritten for every OS you intend to support).
The most important part of the ABI is whatever the calling convention is for C-style functions. They're what are most commonly supported, and what you're probably going to be interfacing with if you're writing assembly. Agner Fog maintains several good resources on his site; the detailed description of calling conventions is particularly useful. In his answer, Norman Ramsey mentions PIC and dynamic libraries; in my experience you usually do not have to bother with those if you do not want to. Static linking works fine for typical uses of assembly language (like rewriting core functions of an inner loop or other hotspot).
The calling convention works in two directions: you can call C from assembly or assembly from C. The latter tends to be a bit easier but there's not a big difference. Calling C from assembly lets you use things like the C standard library output functions, while calling assembly from C is typically how you access an assembly implementation of a single performance-critical function.
System Calls
The other thing your program will do is make system calls. You can write a complete and useful assembly program that never calls external C functions, but if you want to write a pure assembly language program that doesn't outsource the Fun Stuff to someone else's code, you are going to need system calls. And, unfortunately, system calls are totally and completely different on every OS. Unix-style system calls you'll need include (but are most assuredly not limited to!) open
, creat
, read
, write
, and the all-important exit
, along with mmap
if you like allocating memory dynamically.
While every OS is different, most modern OSes follow a general pattern: you load the number of the system call you want into a register, typically EAX
in 32-bit code, then load the parameters (how you do that varies wildly), and finally issue an interrupt request: it's INT 2E
for Windows NT kernels or INT 80h
for Linux 2.x and FreeBSD (and, I believe, OSX). The kernel then takes over, executes the system call, and returns execution to your program. Depending on the OS, it might trash registers or stack as part of the system call; you'll have to make sure you read the system call documentation for your platform to be sure.
SYSENTER
Linux 2.6 kernels (and, I believe, Windows XP and newer, though I have never actually attempted it on Windows) also support a newer, faster method to make a system call: the SYSENTER
instruction introduced by Intel in newer Pentium chips. AMD chips have SYSCALL
, but few 32-bit OSes use it (though it's the standard for 64-bit, I think; I haven't had to make direct system calls from a 64-bit program so I'm not sure on this). SYSENTER
is significantly more complicated to set up and use (see, for example, Linus Torvalds on implementing SYSENTER
support for Linux 2.6: "I'm a disgusting pig, and proud of it to boot.") I can personally attest to its peculiarity; I once wrote an assembly function that issued SYSENTER
directly to a Linux 2.6 kernel, and I still don't understand the various stack and register tricks that got it to work... but work it did!
SYSENTER
is somewhat faster than issuing INT 80h
, and so its use is desirable when available. To make it easier to write both fast and portable code, Linux maps a VDSO called linux-gate
into the address space of every program; calling a special function in this VDSO will issue a system call by the fastest available mechanism. Unfortunately, using it is generally more trouble than it's worth: INT 80h
is so much simpler to do in a small assembly routine that it's worth the small speed penalty. Unless you need ultimate performance... and if you need that, you probably don't want to call into a VDSO anyway, and you know your hardware, so you can just do the horribly unsafe thing and issue SYSENTER
yourself.
Everything Else
Other than the demands imposed by interacting with the kernel and other programs, there are very, very few differences between operating systems. Assembly exposes the soul of the machine: you can work as you like, and inside your own code you are not bound by any particular calling convention. You have free access to the FPU and SSE units; you can PREFETCH
directly to stream data from memory into the L1 cache and make sure it's hot for when you need it; you can munge the stack at will; you can issue INT 3
if you want to interface with a (properly configured; good luck!) external debugger. None of these things depend on your OS. The only real restriction you have is that you are running at Ring 3, not Ring 0, and so some processor control registers will be unavailable to you. (But if you need those, you're writing OS code, not application code.) Other than that, the machine is laid bare to you: go forth and compute!