Is there ever a situation where ASM just isn't low-level enough? After all, assembly still has to be assembled. Has anyone ever written a program directly in binary? I'm just wondering whether there's ever a reason doing so might be practical, and whether it's even possible on modern computers.
It used to be fairly common to translate binary back into assembler in order to understand a dump.
But not using assembler at all? I can't think of any reason. Assembler is already programming the bare metal; its only benefit is letting you use mnemonics such as "add" in place of the actual (binary) instruction encodings, plus labels and the like.
Historical reason: You are running a machine that requires its boot code to be toggled in on the front panel. (And yes, this was done, regularly, in the first couple of generations of machines.)
Not-what-you-were-looking-for modern reason: When you are writing an assembler, you first have to work out the binary encodings yourself.
You got it — if no [dis]assembler is available. I've been in firmware hacking situations where I spent enough time looking at raw PowerPC instruction streams to be able to recognize and hand-assemble a few kinds of instructions. (I ended up porting a disassembler: http://homepage.mac.com/potswa/source/DisDave.sit, if you can manage to install it.)
Some ISAs are much simpler than others. RISCs follow simple formats, and it's easy to orient yourself because instructions are typically all the same length and aligned to word boundaries. x86-64, on the other hand, is full of variable-length encodings and prefix bytes.
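For instance, decoding a fixed-width instruction stream is just shifts and masks. A minimal sketch in C (the field positions are the standard PowerPC ones; everything else here is illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* Every PowerPC instruction is exactly 32 bits, so walking a raw
       instruction stream is easy, and the primary opcode is always the
       top 6 bits. */
    int main(void) {
        uint32_t insn = 0x38600001;            /* li r3,1 (i.e. addi r3,r0,1) */
        unsigned opcode = insn >> 26;          /* primary opcode: 14 = addi   */
        unsigned rd     = (insn >> 21) & 0x1F; /* destination register: 3     */
        printf("opcode=%u rd=%u\n", opcode, rd);
        return 0;
    }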
In FPGA projects or when custom circuitry is involved, it is very common to devise some kind of instruction stream and hand-encode it in binary.
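As a sketch of what that looks like (the 16-bit format here is completely made up; real soft cores define their own):

    #include <stdint.h>

    /* Hypothetical home-grown 16-bit ISA for an FPGA soft core, with
       fields [15:12] opcode | [11:8] rd | [7:0] imm8.  Hand-"assembling"
       is just packing bitfields. */
    static uint16_t enc_li(unsigned rd, unsigned imm)   /* "load immediate" */
    {
        const unsigned OP_LI = 0x1;            /* made-up opcode */
        return (uint16_t)((OP_LI << 12) | ((rd & 0xF) << 8) | (imm & 0xFF));
    }

    /* The image you'd feed to the core's instruction memory: */
    static const uint16_t boot_rom[] = {
        0x1105,   /* li r1, 5  (same as enc_li(1, 5), written by hand) */
        0x1203,   /* li r2, 3 */
    };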
In a post-apocalyptic world where all keyboards and monitors have been destroyed and the only way to program Tetris into your computer is through toggles on your front panel, yes.
But seriously why would anyone want to do such a thing?
Edit: obviously there are people out there designing processors who have to program in binary until they can get an assembler running on their processors, but they are a very small group of people.
When you're hacking binary formats by hand, as A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux does.
Well, you might use hex to program some basic boot-load instructions into a RAM or a ROM instead of using an assembler, if you were the chip developer. I've done so for a soft core I wrote.
Realistically, after you've done that, the next step is to write a basic assembler in Perl or something.
Dynamic code generation:
If you have a very simple problem to solve and performance is important, it is often a good idea to analyze the problem space and generate a specialized function on the fly to solve it.
One practical example: High performance math with sparse matrices.
This often involves multiplying arrays of numbers, thousands to millions of times. Since many of the matrix elements may be zero or one, you can save a significant amount of time by removing all of the trivial multiplications.
To do this, a little code generator can analyze the matrices and generate the machine code for the matrix arithmetic on the fly. How to do this can range from using a JIT library (or a built-in language feature) to very simple schemes.
For the sparse matrix multiplication case you may get great performance by just gluing pre-built code snippets for the different cases together; this can be done in 50 lines of C code.
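To make the mechanism concrete, here is a minimal sketch of runtime code generation in C on x86-64 Linux. It is not the sparse-matrix generator itself, just the emit-bytes-then-call plumbing (note that hardened systems may refuse writable+executable mappings):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* x86-64 machine code for "int f(int x) { return x + 1; }":
           89 F8     mov eax, edi   (argument arrives in EDI, SysV ABI)
           83 C0 01  add eax, 1
           C3        ret                                               */
        unsigned char code[] = { 0x89, 0xF8, 0x83, 0xC0, 0x01, 0xC3 };

        /* Get a page we can both write to and execute. */
        void *buf = mmap(NULL, sizeof code,
                         PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;
        memcpy(buf, code, sizeof code);

        int (*fn)(int) = (int (*)(int))buf;
        printf("%d\n", fn(41));   /* prints 42 */

        munmap(buf, sizeof code);
        return 0;
    }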
- Taking advantage of undocumented opcodes (still there on several modern-day processors!). I had to do this not too long ago on 6502-based processors.
- Flashing a program into home-built circuits with a microcontroller. Microcontrollers are useful for all sorts of things these days.
Back in 1997 I used to do this on TI-83 calculators when I was at school and didn't have access to a link cable.
Normally at that time, you would just write an assembly program, use TASM to build it, and then transfer it to the calculator via a link cable. But if I was bored and wanted to put something small together, I had memorized enough of the byte instructions to be able to type them in for certain things.
Side note: Of course this was fun when there was a bug in the program, because it could easily corrupt the entire calculator's RAM. Then you would have to hold down the ON button and/or remove the AAA batteries and hope that was enough to restore the calc (sans any programs that were in memory). Otherwise, to do a hard reset, you would have to use a screwdriver to unscrew a special backup battery. Good times...
I recall reading that Woz wrote the first Apple BASIC (Apple I? Apple II?) in machine language. Before they had storage devices, you needed to enter hex codes in the monitor.
Even if you find yourself skipping the assembler and going straight to machine code, you won't be using binary, but hex instead.
In school, I had to patch code in-memory using a debugger without the benefit of an assembler. While entertaining, this is a skill with virtually no value outside of embedded systems debugging.
Also, consider that opcode mnemonics used in assembly have a near-1:1 correspondence with actual opcodes (thus the term "mnemonic"), so you won't be able to do anything by pounding out machine code by hand that you couldn't do in assembly. The assembler's role is to convert mnemonics to opcodes (also choosing which encoding of a specific instruction to use: immediate vs. indirect MOVs, for instance), labels to addresses, and similar tasks.
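For instance, one MOV mnemonic hides several opcodes, and the assembler picks the right one for you (byte values from the 8086 reference, 16-bit code assumed):

    /* One mnemonic, several encodings: */
    unsigned char mov_al_imm[] = { 0xB0, 0x01 };       /* mov al, 1    */
    unsigned char mov_ax_imm[] = { 0xB8, 0x01, 0x00 }; /* mov ax, 1    */
    unsigned char mov_al_mem[] = { 0x8A, 0x07 };       /* mov al, [bx] */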
It's good to know what's going on inside of the assembler, but this will almost never come up unless you're looking for a bug in an assembler, hacking an embedded gadget or MacGyvering your way out of a really, really weird situation.
If you're creating an interpreter. Perhaps you have the interpreter completed, but not the parser. You could test out the interpreter by writing the to-be-interpreted program in pure binary.
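A toy sketch of that idea in C, with an invented bytecode: the interpreter is "done", and the test program is hand-encoded bytes rather than parser output:

    #include <stdio.h>

    /* Everything here is invented for illustration: a tiny stack machine. */
    enum { OP_PUSH = 0x01, OP_ADD = 0x02, OP_PRINT = 0x03, OP_HALT = 0xFF };

    static void run(const unsigned char *code)
    {
        int stack[16], sp = 0;
        for (;;) {
            switch (*code++) {
            case OP_PUSH:  stack[sp++] = *code++;          break;
            case OP_ADD:   sp--; stack[sp-1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[--sp]);    break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void)
    {
        /* The "binary" program: push 2, push 3, add, print, halt. */
        const unsigned char program[] = { 0x01, 2, 0x01, 3, 0x02, 0x03, 0xFF };
        run(program);   /* prints 5 */
        return 0;
    }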
I didn't have an assembler for my eight-bit Atari, so I wrote the machine code directly. To start the code from BASIC you either wrote the code as decimal data bytes or as a string. (Yes, you could actually put code in a string; the only character code of the 256 that you couldn't type was 155, the code for Return. Luckily there is no 6502 machine code instruction with that value, so it was only a problem when a branch happened to be 101 bytes backwards (-101 = 155).)
I still remember a common piece of code to start a timer:
    104          (pla)
    169, 7       (lda #7)
    162, 6       (ldx #6)
    160, 10      (ldy #10)
    76, 92, 228  (jmp 0xE45C)
In recent years I have attended some size-optimisation assembly competitions. Even though most of the code is assembly, you still have to know exactly which instructions the assembler produces so that you know how many bytes they take. Also, you sometimes use tricks like having some bytes serve both as data and as code, or having some bytes decode as different instructions depending on whether you enter at the first byte or in the middle of an instruction. Then you write instructions as data bytes in the middle of the assembly code.
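A classic instance of the overlapping-decode trick on x86, shown as commented byte values:

    /* The same two bytes decode differently depending on where you enter:
     *   entered at offset 0:  3C C3   cmp al, 0C3h   (harmless, falls through)
     *   entered at offset 1:  C3      ret
     * The 3Ch ("cmp al, imm8") opcode "eats" the next byte, a common way
     * to skip over a one-byte instruction for free in size-optimised code. */
    unsigned char overlap[] = { 0x3C, 0xC3 };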
When I was in training during my Navy days (some time around 1986), we had a computer that we were given to learn electronics troubleshooting on, not programming troubleshooting. It was programmed by entering binary information into the front panel of the computer, and we had to tell the instructor what they had broken in the machine based on the results, in addition to troubleshooting the hardware itself. As far as I know there might still be one of those machines around.
I wish I could find my source code for it; I actually wrote a simulator of the machine and a compiler for its language. It was amazing how much work you could get done with 1024 bytes of memory! :)
There are a few times you benefit from working with raw machine code, not just assembly language. For example, consider sending a binary file via email with an email program that didn't know how to decode attachments. At one time, a few people wrote small programs that could decode the rest of an attachment, where everything in the program was a printable character. So to decode your attachment, you'd save the body of the email as whatever.com and then execute it. It would decode the attachment and write out a binary file you could then execute.
For another example, years ago on Fidonet there was a rather simple challenge: write a program that simply prints out a number that increments each time it's run -- but (the part that made it tricky) it's not allowed to use any external files or other storage to do the job. To keep this from getting too boring, it was also a code-golf kind of thing, though the measured size was executable bytes, not source code. Quite a few of the entries to this challenge used self-modifying code that depended heavily on exactly how instructions were encoded and such.
Looking for a second, I see I still have the source code to one of my attempts:
.model tiny,c
.286
.code
.startup
main proc
        mov si,offset count             ; SI -> the counter byte stored in the file itself
        inc byte ptr [si]               ; bump the counter
        mov al, [si]
        mov bx,4090h                    ; constants for the hex-digit trick below
        shr al, 4                       ; high nibble first
        call convert                    ; print it
        lodsb                           ; AL = counter again; SI now points at the filename
        and al,0fh                      ; low nibble
        mov byte ptr end_convert, 08bh  ; self-modify: turn the RET below into "mov dx,si"
convert:
        add al,bl                       ; classic nibble-to-ASCII-hex sequence:
        daa                             ;   add 90h / daa / adc 40h / daa
        adc al,bh
        daa
        int 29h                         ; DOS fast console output of AL
end_convert:
        ret                             ; called pass: return; fall-through pass: this byte
        db 0d6h                         ;   is now 8Bh, so 8Bh,0D6h decodes as...
; mov dx, si                            ; ...DX -> filename
        mov ah,3ch                      ; DOS create file (name at DS:DX)
        xor cx, cx                      ; normal attributes
        int 21h
        xchg bx, ax                     ; handle into BX; AH = old BH = 40h (DOS write)
        mov dx,offset main
        mov cx,offset the_end - offset main
        int 21h                         ; write the updated image back out as c.com
        ret                             ; return to PSP -> exit
main endp
count:
        db 0
name:
        db 'c.com', 0
the_end:
end
I'd better quit now, before I'm responsible for anybody having apoplectic fits (hoping I'm not too late...)
A really cool example is this famous polyglot, which is a valid DOS .COM file among other things because the ASCII in its source code doubles as binary x86 instructions! http://ideology.com.au/polyglot/polyglot.txt
More boring examples...
Many processors implement ISA instructions as sequences of more primitive micro-instructions (basically collections of datapath control signals) which are "microcoded" in a microcode ROM.
For a simple enough processor, you might write microcode directly in binary rather than assembling it from a mnemonic language. Or if you're reverse engineering a processor, you might not know its micro-instruction set and just have to guess at the format of micro-instructions... in which case you're probably also working in binary. Either way this is lower level than assembly language.
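As a rough sketch of what "microcode in binary" might look like (the control-signal layout here is entirely invented):

    #include <stdint.h>

    /* A micro-instruction for a simple datapath is just a bundle of
       control signals, so writing microcode in binary means setting
       bits in a control word.  Hypothetical field assignments: */
    #define REG_WRITE   (1u << 0)   /* latch ALU result into register file */
    #define ALU_ADD     (1u << 1)   /* ALU operation select                */
    #define MEM_READ    (1u << 2)   /* drive a memory read cycle           */
    #define PC_INC      (1u << 3)   /* advance the program counter         */

    /* Micro-program implementing a made-up "load and add" ISA instruction: */
    static const uint8_t urom[] = {
        MEM_READ,                   /* fetch the operand from memory */
        ALU_ADD | REG_WRITE,        /* add it to the accumulator     */
        PC_INC,                     /* move on to the next instruction */
    };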
Sometimes code for old processors like the 6502 used undocumented instructions which didn't have official mnemonics, so you had to write binary values rather than assembly instructions.
For a college project I had to design a simplified microcontroller in VHDL (a hardware description language). To test it, I wrote an extremely simple program in binary, because it was the most convenient way to feed the program into the simulated microcontroller.