Just wondering how the world of assembly works, and I was reading about the assembly language on wiki and this quote struck me:

It implements a symbolic representation of the numeric machine codes and other constants needed to program a particular CPU architecture.

I always thought assembly was a fixed language based on your CPU (with different compilers and languages based on said CPU) so that for your CPU you could only use this type of assembly to talk to your hardware.

But based on that quote, there could be other languages that use other symbols to represent the same numeric machine code.

So, are there any other languages that talk straight to the hardware that aren't assembly? Or am I getting it wrong?

+4  A: 

You are getting it wrong (or possibly right - it's difficult to tell from your question). Assembly language is a symbolic (easy for humans to read) representation of the binary patterns of instructions for a particular CPU architecture. One does occasionally come across references to "portable assembler" (Scott Nudds, anyone?) but these are really slightly higher-level languages.

anon
What I'm thinking is, aren't there other languages that call the same instructions but with a different syntax (on the same CPU)
Ólafur Waage
Yes - most CPU architectures actually have several different assemblers, which use subtly different symbolic representations of the same machine instructions.
anon
Exactly what I was thinking about. Thanks.
Ólafur Waage
Another thing that "portable assembler" might sometimes mean, is an assembler language for a bytecode which is not a hardware machine code. For instance, jasm is a Java (dis)assembler, which addresses the JVM instruction set directly rather than through the Java Programming Language.
Steve Jessop
... So it's a higher-level language in the sense that it doesn't directly address the hardware (unless your hardware has Jazelle, in which case it does), and in the case of Java it also offers some very sophisticated ops. But it needn't - an ARM emulator allows "portable ARM assembly" code...
Steve Jessop
The name of Scott Nudds shall not be invoked in any online forums.
Jimmy J
Portable Assembler? Isn't that called "C"? I mean arguably C is a preprocessor for an assembler. Or can be. And it's basically PDP assembler anyway.
Peter Wone
C is "portable assembler" in the same sense that a modern laser printer is a "typesetting-free printing press" - only in loose metaphor. Yes, you get (almost) direct access to memory and can arbitrarily decide what that memory means, but the grammar allows for abstractions that cannot exist in ASM.
Jeff Shannon
+1  A: 

Assembly intermixed with C is used a lot. Some CPUs (like the 8052 chip) come with a higher-level language burned into ROM. These languages have special statements that allow interaction with hardware at a low level.

A family of CPUs is generally designed to use the same machine codes, which means the same assembly language. A specific CPU may have more cache, pipelines, etc., but otherwise can run the same machine code as the other CPUs in the same family.

So software compiled for one CPU will run on all of them. One of the most popular is the i386 instruction set, which is found powering nearly all Windows machines. There is a 16-bit predecessor and a 64-bit successor.

RS Conley
+2  A: 

Assembly languages are very closely related to the hardware architecture of the target system.

To a large extent there is a one-to-one mapping from asm code to machine instructions. That's the whole point, really: you can manipulate the hardware at the level of individual instructions.

They also allow you to access and manipulate memory in a manner that matches the machine's memory architecture (monolithic, segmented, virtual, etc.).

Assemblers vary greatly. Some do little more than translate three-letter codes to 4-byte instructions; others, like the venerable OS/390 assembly language, are sophisticated programming environments in their own right.

Having said all this, most modern chips emulate ancient instruction sets, so you are not really that close to the wire anyway. The better C compilers are also aware of the underlying micro-architecture (things like pipelines, how many integer instructions are executed every cycle, etc.), so a good C compiler will nearly always outperform mediocre assembly code!

James Anderson
+5  A: 

You could use a different set of symbols to represent the machine codes. But nobody bothers, because you wouldn't gain much.

ARM has an instruction called ADD. In ARM assembler, "ADD r0, r0, #1" represents the 4 bytes of machine code which constitute an instruction to increment register 0.

Whatever you call that instruction, you can't change the set of instructions available and still call it ARM assembler. It's still fundamentally the same programming language whether you call the ADD operation "ADD", or "SUM", or "PLUS", or "ADDITION". Since it's easier to use existing references if everyone uses the same names for everything, that's what happens.
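
As a rough illustration of the point above, here is a minimal sketch of a toy assembler for this one ARM (A32) instruction form. It assumes the simplest case only (always-execute condition, no flag update, unrotated 8-bit immediate), and the "SUM" mnemonic table is hypothetical: it exists only to show that renaming the symbol leaves the machine code unchanged.

```python
# Toy encoder for one ARM (A32) instruction form: ADD Rd, Rn, #imm8.
# Only the always-executed, no-flags, unrotated-immediate case is handled.

ADD_OPCODE = 0b0100  # data-processing opcode field for ADD


def encode_add_imm(rd, rn, imm8):
    """Return the 32-bit instruction word for ADD rd, rn, #imm8."""
    if not (0 <= rd <= 15 and 0 <= rn <= 15 and 0 <= imm8 <= 255):
        raise ValueError("operand out of range for this sketch")
    cond = 0b1110        # AL: always execute
    immediate_flag = 1   # operand 2 is an immediate, not a register
    return ((cond << 28) | (immediate_flag << 25) | (ADD_OPCODE << 21)
            | (rn << 16) | (rd << 12) | imm8)


def assemble(line, mnemonics):
    """Assemble one line such as 'ADD r0, r0, #1' using a mnemonic table."""
    name, operands = line.split(None, 1)
    if mnemonics.get(name.upper()) != "add":
        raise ValueError("unknown mnemonic: " + name)
    rd, rn, imm = [part.strip() for part in operands.split(",")]
    return encode_add_imm(int(rd[1:]), int(rn[1:]), int(imm.lstrip("#")))


STANDARD = {"ADD": "add"}
RENAMED = {"SUM": "add"}  # hypothetical alternative spelling, same operation

word_a = assemble("ADD r0, r0, #1", STANDARD)
word_b = assemble("SUM r0, r0, #1", RENAMED)
print(hex(word_a), word_a == word_b)  # prints: 0xe2800001 True
```

Both spellings produce the same 32-bit word, which is why a different set of symbols would still be the same language as far as the CPU is concerned.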

One useful change might be to represent the instruction as "INC r0", since ARM doesn't have an INC instruction, and it's a common operation. This leads to macros in assembler languages. These genuinely do change the language, but once you have macros which emit multiple ARM instructions, you start to lose the WYSIWYG nature of assembly. Eventually you start to think that maybe you might as well just write C. I speak from experience (it wasn't ARM, but it was a macroised assembler).
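
The macro idea can be sketched as a simple text expansion pass that runs before the real assembler. This is only an illustration: the MACROS table and the SWAP macro are hypothetical, and real macro assemblers have far richer syntax. It does show how one source line can turn into several instructions, which is where the WYSIWYG property starts to slip.

```python
# Hypothetical macro table: each macro name maps to a list of templates
# for the real instructions it expands into ({0}, {1} are the arguments).
MACROS = {
    "INC": ["ADD {0}, {0}, #1"],
    # Three-instruction XOR swap; EOR is ARM's exclusive-or instruction.
    "SWAP": ["EOR {0}, {0}, {1}", "EOR {1}, {0}, {1}", "EOR {0}, {0}, {1}"],
}


def expand(line):
    """Expand one source line; lines that are not macros pass through as-is."""
    parts = line.split(None, 1)
    name = parts[0].upper()
    if name not in MACROS:
        return [line]
    args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
    return [template.format(*args) for template in MACROS[name]]


def expand_program(lines):
    """Expand every line of a program into plain instructions."""
    out = []
    for line in lines:
        out.extend(expand(line))
    return out


program = ["INC r0", "SWAP r1, r2", "ADD r3, r3, #4"]
for instruction in expand_program(program):
    print(instruction)
```

Three source lines become five instructions here, so the listing no longer matches the machine code line for line.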

One common difference is case - if you felt like being pedantic, you could argue that there are two different versions of ARM assembler language, one in uppercase and one in lowercase (or argue that there's one language, with multiple symbols for the same thing). Different disassemblers of the same machine code sometimes output different formats. Sometimes these are different enough that a particular assembler won't cope with all of them, or assemblers will offer their own conveniences which are incompatible with another assembler on the same platform. But really, it's all the same thing, and if you're bothering to draw the distinction, it's generally because you've been bitten in the ass rather than because anything good is happening...
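
To make the "different formats for the same machine code" point concrete, here is a minimal sketch of a toy disassembler for a single x86 instruction form (move a 32-bit immediate into a register, opcodes 0xB8 to 0xBF), printed in both Intel-style and AT&T-style notation. It assumes a 32-bit operand size, and the exact spacing and number formatting are choices made for this sketch rather than the output of any particular tool.

```python
import struct

# Register numbering used by the 0xB8+r opcode family.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_imm32(code):
    """Decode a 5-byte x86 'mov r32, imm32' into (register, immediate)."""
    if len(code) != 5 or not 0xB8 <= code[0] <= 0xBF:
        raise ValueError("not a mov r32, imm32 encoding")
    reg = REGS32[code[0] - 0xB8]
    (imm,) = struct.unpack("<I", code[1:5])  # little-endian 32-bit value
    return reg, imm


def intel_syntax(code):
    """Destination first, no sigils."""
    reg, imm = decode_mov_imm32(code)
    return "mov {}, {:#x}".format(reg, imm)


def att_syntax(code):
    """Source first, '$' on immediates, '%' on registers, size suffix."""
    reg, imm = decode_mov_imm32(code)
    return "movl ${:#x}, %{}".format(imm, reg)


code = bytes([0xB8, 0x01, 0x00, 0x00, 0x00])
print(intel_syntax(code))  # prints: mov eax, 0x1
print(att_syntax(code))    # prints: movl $0x1, %eax
```

The five bytes are identical in both cases; only the text differs, and an assembler that accepts one notation will generally reject the other.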

Steve Jessop
Jeff Shannon
+1  A: 

... So that for your CPU you could only use this type of assembly to talk to your hardware.

All languages eventually convert to instructions that are executed on real hardware, whether that is done fairly directly as with an assembler or through a high level of abstraction as with C. The tricky bit is actually getting the machine instructions to manipulate the hardware in ways that you want since one point of higher level languages is to shield you from the hardware details.

Some languages, like C, are designed with the intent to manipulate hardware directly, so they include keywords like volatile to prevent the compiler from optimizing away references to device registers. Such a register may be written and never read back, so the compiler would otherwise conclude that the stored value is never used again. Or it may be necessary to read a device register even though the value is never used. There are also miscellaneous instructions for such operations as enabling and disabling interrupts that an ordinary program will not generate.

This may also require linker support so that memory locations (for memory mapped I/O) can be located at the correct addresses for device registers. However some processors use distinct instructions for I/O and there must be some facility for inserting them in the code stream, so in many cases it may not be possible to access H/W unless there is explicit language support.

And finally, with most modern operating systems like Windows and Linux, applications run in virtual memory, where program addresses do not match physical addresses, and programs are usually denied access to the hardware. Code that tries to access hardware when the OS has not granted it specific permissions will generate an interrupt, return to the OS and no longer execute.

HankB
+3  A: 

Here is an example from Clozure Common Lisp. It allows you to write inline assembly code in Lisp. The following defines a function %safe-get-ptr written in its x86 assembler notation:

(defx86lapfunction %safe-get-ptr ((src arg_y) (dest arg_z))
  (check-nargs 2)
  (save-simple-frame)
  (macptr-ptr src imm0)
  (leaq (@ (:^ done) (% fn)) (% ra0))
  (movq (% imm0) (@ (% :rcontext) x8664::tcr.safe-ref-address))
  (movq (@ (% imm0)) (% imm0))
  (jmp done)
  (:tra done)
  (recover-fn-from-rip)
  (movq ($ 0) (@ (% :rcontext) x8664::tcr.safe-ref-address))
  (movq (% imm0) (@ x8664::macptr.address (% dest)))
  (restore-simple-frame)
  (single-value-return))

It is still assembly. Besides that, there are lots of languages which have low-level constructs to set/read values from memory or registers, etc.

The CPU does not execute assembly language. Assembly language is only some (more or less direct) textual representation of the specific CPU machine code.

Rainer Joswig
Only Lisp could make assembly language harder to read/learn
kibibu
A: 

Your question was:

So, are there any other languages that talk straight to the hardware that aren't assembly? Or am I getting it wrong?

I'm surprised no one's mentioned Register Transfer Language, or any of the hardware description languages, such as Verilog or VHDL.

RTL isn't a programming language per se, and is generally hardware-neutral (assembly is definitely NOT neutral, it's targeted to a specific architecture).

VHDL and Verilog are most commonly used for programmable logic, which I think qualifies as "talking straight to the hardware". Soft cores are often implemented in programmable logic, so you could use one of these to implement (for example) an ARM processor, which itself could be programmed in assembly....

Fun stuff.... makes me wish I could go back & do all my EE/CE work again....

Dan
If it is a hardware-neutral language, then it can't talk directly to the machine. Though I appreciate you mentioning RTL here.
phresnel
I wouldn't say Verilog/VHDL "talk straight to the hardware" - they describe logic operations, and go through a multiple-step compilation process (including chip-specific libraries and some hefty nondeterministic simulation) to produce a binary file that gets loaded into the programmable logic chip.
Jeff Shannon
+1  A: 

Sure, there are lots of languages that talk directly to the hardware that are not assembly. For example, on the Burroughs B5000, the CPU was programmed in a variant of ALGOL; on the Lisp Machine, the CPU executed Lisp code directly; on the early Smalltalk workstations, the CPU executed Smalltalk bytecode directly. Researchers have built CPUs based on graph-reduction engines that execute Lambda Calculus directly. More than one company builds Java processors, which are of course programmed in JVM bytecode.
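
To give a feel for what "programmed in JVM bytecode" means, here is a minimal software sketch of a stack machine that runs a handful of real JVM opcodes (iconst_0 to iconst_5, bipush, iadd, imul, ireturn). It is only a simulation under simplifying assumptions: there are no locals, frames, or class files, and it says nothing about how any actual Java processor is built. The run and wrap32 names are made up for this example.

```python
# Opcode values as defined by the JVM instruction set.
ICONST_0 = 0x03  # iconst_0 .. iconst_5 occupy 0x03 .. 0x08
BIPUSH = 0x10    # push the next byte as a signed int
IADD = 0x60
IMUL = 0x68
IRETURN = 0xAC


def wrap32(value):
    """Wrap a Python int to the signed 32-bit range used by JVM ints."""
    return (value + 2**31) % 2**32 - 2**31


def run(code):
    """Execute a byte sequence on an operand stack; return the ireturn value."""
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if ICONST_0 <= op <= ICONST_0 + 5:
            stack.append(op - ICONST_0)
        elif op == BIPUSH:
            operand = code[pc]
            pc += 1
            stack.append(operand - 256 if operand > 127 else operand)
        elif op == IADD:
            b, a = stack.pop(), stack.pop()
            stack.append(wrap32(a + b))
        elif op == IMUL:
            b, a = stack.pop(), stack.pop()
            stack.append(wrap32(a * b))
        elif op == IRETURN:
            return stack.pop()
        else:
            raise ValueError("opcode not handled in this sketch: " + hex(op))
    raise ValueError("code ended without ireturn")


# bipush 40; iconst_2; iadd; ireturn
print(run(bytes([0x10, 40, 0x05, 0x60, 0xAC])))  # prints: 42
```

The bytes fed to run are the machine code of this toy machine; a mnemonic listing such as "bipush 40" is just its assembly-style notation.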

Jörg W Mittag