views:

668

answers:

18

Is there ever a situation when ASM just isn't low-level enough? After all, assembler still has to be assembled. Has anyone ever written a program in binary? I'm just wondering if there's ever a theoretical reason why doing so might be practical or even if it's possible on modern computers.

A: 

It used to be fairly common to translate from binary back to assembler by hand in order to understand a dump.

But not using assembler at all? I can't think of any reason. Assembler is already programming the bare metal; its only benefit is letting you use mnemonics such as "add" in place of the actual (binary) instruction encodings, labels in place of addresses, etc.

Larry K
+6  A: 

Historical reason: You are running a machine that requires its boot code to be toggled in on the front panel. (And yes, this was done. Regularly, in the first couple of generations of machines.)

Not-what-you-were-looking-for modern reason: When you are writing an assembler, you have to figure out the binary encoding process anyway.

dmckee
There are many other situations, and writing an assembler doesn't/shouldn't actually involve writing any functions/programs in binary.
Potatoswatter
@Potatoswatter: You don't write the assembler in machine code unless you're writing the first assembler. But *you* must hand- (or head-) translate enough code to know what the assembler is supposed to do. I stand by the answer.
dmckee
This was done in the very, very early days of personal computers. I saw a review of an IMSAI computer that praised it for having paddle switches on its front panel, a lot nicer than the Altair 8800's switches. (IIRC, it was also the review that praised the kit for ease of assembly, since the reviewer only had to bring his oscilloscope out once.) That was, however, a while ago.
David Thornley
@David: Even the kit PCs predate me by a few years; I've never actually done this. Though I have been toying with building a micro-controller board that supports that kind of operation.
dmckee
+3  A: 

You got it — if no [dis]assembler is available. I've been in firmware hacking situations where I spent enough time looking at raw PowerPC instruction streams to be able to recognize and hand-assemble a few kinds of instructions. (I ended up porting a disassembler: http://homepage.mac.com/potswa/source/DisDave.sit, if you can manage to install it.)

Some ISAs are much simpler than others. RISCs follow simple formats and it's easy to orient yourself because instructions are typically the same length and aligned to word boundaries. x86-64, on the other hand, is full of variable-length encodings and prefix codes.
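
The fixed-width point is easy to see in code. Here is a minimal Python sketch (my illustration, not something from the original answer) that walks a buffer of 32-bit big-endian PowerPC instruction words and extracts the 6-bit primary opcode field; with a fixed 4-byte stride there is no question of where each instruction begins:

```python
# Sketch: decoding the primary opcode field of 32-bit PowerPC instructions.
# Every instruction is exactly 4 bytes, so a raw dump can be walked in fixed
# steps -- unlike x86, where instruction lengths vary.
import struct

def primary_opcodes(raw: bytes):
    """Yield (offset, word, primary opcode) for each 32-bit big-endian word."""
    for off in range(0, len(raw), 4):
        (word,) = struct.unpack_from(">I", raw, off)
        yield off, word, word >> 26   # primary opcode is the top 6 bits

# 0x38600001 encodes addi r3, r0, 1 (the "li r3, 1" idiom);
# its primary opcode is 14.
dump = bytes.fromhex("38600001")
for off, word, op in primary_opcodes(dump):
    print(f"{off:04x}: {word:08x}  primary opcode {op}")
```

Recognizing instruction families from a hex dump largely comes down to eyeballing that top field, which is why hand-reading RISC streams is feasible at all.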

In FPGA projects or when custom circuitry is involved, it is very common to devise some kind of instruction stream and hand-encode it in binary.

Potatoswatter
+1  A: 

In a post apocalyptic world where all keyboards and monitors have been destroyed and the only way to program tetris into your computer is through toggles on your front-panel, yes.

But seriously, why would anyone want to do such a thing?

Edit: Obviously there are people out there designing processors who have to program in binary until they can get an assembler running on their processors, but they are a very small group of people.

Jeffrey Hines
First you'll need to find a computer with toggle switches on the panel. I remember a million dollar CDC Cyber mainframe. It had toggles, but you had to first find the right cabinet and then open the door. There they were: 60 toggles for the 60 bit words. Ahh memories.
Larry K
My first paid programming job was on a Burroughs 1800, which had toggles and blinking lights. When we upgraded to a Burroughs 5900 we lost the toggles and blinking lights and we had no idea what the computer was doing. Yes... memories.
Jeffrey Hines
Memories, all alone with the blinking lights, way back in the old days, there were more toggles then. I remember, the time punch cards were luxury items, and debugging made you insane. (Well, I personally don't.)
Potatoswatter
Heh! I don't think processor designers program in binary to get new systems bootstrapped. Surely they cross-compile.
Jason Orendorff
Cross compiling is the way to go these days, but back at the dawn of time things were different.
dmckee
@Larry K: Those toggles were to enter the boot code (had to do it somehow). Typically, they were set once according to manufacturer's spec and completely ignored thereafter. When taking down campus computers was considered a sport, one trick was to get access and flip a few switches before bringing the computer down, and seeing how long it took the operators to think of looking.
David Thornley
@David Thornley: To do a memory dump on the Burroughs 1800 we had to set the toggles and then press a start button.
Jeffrey Hines
@David Thornley: Speaking from experience, no, the switches didn't stay the same. To toggle in the boot code, you set the switches for the bits of a single word, then hit another switch to actually write that value to memory and increment the PC. Then you changed the switches to the bits for the next word, wrote it to memory, and so on. Needless to say, there was a high priority on keeping the boot code short.
Jerry Coffin
@Jerry Coffin: Not on the CDC computers I used, at least not where I was. Those switches (and there were multiple behind the panel) were set up and left. There were other systems that worked as you said.
David Thornley
@David: That sort of makes sense -- it's hard to guess what switches you were looking at, but thinking back on it, they almost certainly would *not* have been for toggling code into memory. When you did a system dead-start on a CDC, you did it from a PPU, not the CPU. The PPU copied a dead-start tape into main memory, then started the CPU executing the code.
Jerry Coffin
@Jerry: That's what wasn't quite working in my memory. The switches I'm remembering were on the primary PPU. That's why the 60 bits were bothering me as wrong; it was rows of 12 switches. Since a PPU was an actual computer, I think this still counts as programming a computer in binary.
David Thornley
@David: Yes, the PPU definitely was a computer, complete with its own memory, and so on. The switches would have been toggling code into the PPU's memory rather than the CPU's, but I'd agree that it was programming a computer in binary.
Jerry Coffin
+3  A: 

When you're hacking binary formats by hand, as A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux does.

ephemient
A: 

Well, you might use hex to program some basic bootload instructions in a RAM or a ROM instead of using an assembler, if you were the chip developer. I've done so for a softcore I wrote.

Realistically, after you've done that, the next step is to write a basic assembler in Perl or something.

Paul Nathan
+3  A: 

Dynamic code generation:

If you have a very simple problem to solve, and performance is important it is often a good idea to analyze the problem-space and generate a specialized function on the fly to solve the problem.

One practical example: High performance math with sparse matrices.

This often involves multiplying arrays of numbers, thousands to millions of times. Since lots of the matrix elements may be zero or one, you can save a significant amount of time if you remove all of the trivial multiplications.

To do this, a little code generator can analyze the matrices and generate the machine code for the matrix arithmetic on the fly. How to do this can range from using a JIT library (or a built-in language feature) to very simple schemes.

For the sparse matrix multiplication case you may get great performance just by gluing pre-built code snippets for the different cases together. This can be done in 50 lines of C code.
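
The specialization idea can be sketched one level up from machine code. This Python illustration (my own, under the assumption that generating source and `exec`-ing it stands in for a JIT emitting machine code) builds a dot-product function tailored to one fixed sparse row, dropping zero terms and turning multiplications by 1 into plain additions:

```python
# Sketch of run-time specialization: given a fixed sparse matrix row, emit
# source for a function that skips zero entries entirely and reduces
# multiplications by 1 to additions. A real JIT would emit machine code here.

def specialize_row(row):
    """Generate a function computing dot(row, x) with trivial terms removed."""
    terms = []
    for i, a in enumerate(row):
        if a == 0:
            continue                  # drop the multiplication altogether
        elif a == 1:
            terms.append(f"x[{i}]")   # multiply-by-one becomes a plain load
        else:
            terms.append(f"{a} * x[{i}]")
    body = " + ".join(terms) or "0"
    src = f"def dot_row(x):\n    return {body}\n"
    namespace = {}
    exec(src, namespace)              # "assemble" the specialized function
    return namespace["dot_row"]

row = [0, 1, 0, 3]                    # mostly zeros and ones
dot = specialize_row(row)
print(dot([10, 20, 30, 40]))          # 20 + 3*40 = 140
```

The snippet-gluing scheme the answer describes is the machine-code analogue of building `body` out of pre-written fragments.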

Nils Pipenbrinck
+2  A: 
  1. Taking advantage of undocumented opcodes (still present on several modern-day processors!). I had to do this not too long ago on 6502-based processors.
  2. Flashing a program into home-built circuits with a microcontroller. Microcontrollers are useful for all sorts of things these days.
Sanjay Manohar
+8  A: 

Back in 1997 I used to do this on TI-83 calculators when I was at school and didn't have access to a link cable.

Normally at that time, you would just write an assembly program, build it with TASM, and then transfer it to the calculator via a link cable. But if I was bored and wanted to put something small together, I had memorized enough of the byte instructions to be able to type them in directly for certain things.

Side note: Of course this was fun when there was a bug in the program, because it could easily corrupt the entire calculator's RAM. So then you would have to hold down the ON button and/or remove the AAA batteries and hope that was enough to restore the calc (sans any programs that were in memory). Otherwise, to do a hard reset, you would have to use a screwdriver to unscrew a special backup battery. Good times...

Justin Ethier
Short version of this (and indeed all the real answers): when no tool is available to do it for you.
dmckee
+1  A: 

I recall reading that Woz wrote the first Apple BASIC (Apple I? Apple II?) in machine language. Before they had storage devices, you needed to enter hex codes in the monitor.

Ken
+2  A: 

Even if you find yourself skipping the assembler and going straight to machine code, you won't be using binary, but hex instead.

In school, I had to patch code in-memory using a debugger without the benefit of an assembler. While entertaining, this is a skill with virtually no value outside of embedded systems debugging.

Also, consider that opcode mnemonics used in assembly should have a 1:1 correspondence with actual opcodes (thus the term "mnemonic"), so you won't be able to do anything by pounding out machine code by hand that you couldn't do in assembly. The assembler's role is to convert mnemonics to opcodes (also determining which version of a specific instruction should be used - immediate vs indirect MOVs, for instance), labels to addresses, and similar tasks.

It's good to know what's going on inside of the assembler, but this will almost never come up unless you're looking for a bug in an assembler, hacking an embedded gadget or MacGyvering your way out of a really, really weird situation.
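
The 1:1 mnemonic-to-opcode correspondence can be made concrete with a toy table. This sketch (my illustration; the three opcodes are real single-byte x86 encodings, but the `assemble` helper is hypothetical) shows that for the simplest instructions an "assembler" is nothing more than a lookup:

```python
# Toy illustration of the 1:1 mnemonic-to-opcode mapping. These are real
# single-byte x86 opcodes; multi-byte instructions with operands and
# addressing modes are where a real assembler earns its keep.
OPCODES = {
    "nop":  0x90,
    "ret":  0xC3,
    "int3": 0xCC,
}

def assemble(mnemonics):
    """Translate a list of mnemonics into machine-code bytes."""
    return bytes(OPCODES[m] for m in mnemonics)

code = assemble(["nop", "nop", "ret"])
print(code.hex())   # 9090c3 -- what you'd key into a hex editor by hand
```

Everything beyond the lookup (picking immediate vs. indirect encodings, resolving labels to addresses) is exactly the bookkeeping the answer says the assembler does for you.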

David Lively
No difference. Hex (or octal which I'm told many old timers preferred (different word length, 'ya ken?)) is just binary shorthand.
dmckee
And the sun is hot and water's wet. However, would you rather type "01011010" or "5A"? Also, I have yet to see a debugger that would let you enter binary; every one I've seen required hex or (ugh) octal entry.
David Lively
A: 

If you're creating an interpreter: perhaps you have the interpreter completed, but not the parser. You could test the interpreter by writing the to-be-interpreted program in pure binary.

Wallacoloo
+1  A: 

I didn't have an assembler for my eight-bit Atari, so I wrote the machine code directly. To start the code from BASIC you either wrote the code as decimal data bytes or as a string. (Yes, you could actually write code in a string; the only character code out of the 256 that you couldn't type was 155, the code for return. Luckily there is no 6502 machine-code instruction with that value, so it was only a problem when a branch happened to be 101 bytes backwards (-101 = 155).)

I still remember a common piece of code to start a timer:

104 (pla)
169, 7 (lda #7)
162, 6 (ldx #6)
160, 10 (ldy #10)
76, 92, 228 (jmp 0xE45C)
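
Those decimal data bytes are just the 6502 opcodes written in base 10. A quick Python check (my addition, converting the listing above) shows the hex encoding an assembler listing would print, including the little-endian address in the JMP operand:

```python
# The decimal bytes from the Atari BASIC listing, converted to hex:
# 104 = 0x68 (PLA), 169 = 0xA9 (LDA #), 162 = 0xA2 (LDX #),
# 160 = 0xA0 (LDY #), 76 = 0x4C (JMP absolute). The JMP target bytes
# 92, 228 are the address 0xE45C stored little-endian (low byte first).
code = [104, 169, 7, 162, 6, 160, 10, 76, 92, 228]
print(" ".join(f"{b:02X}" for b in code))  # 68 A9 07 A2 06 A0 0A 4C 5C E4

target = code[-2] | (code[-1] << 8)        # reassemble the operand bytes
print(hex(target))                         # 0xe45c
```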

In recent years I have attended some size-optimisation assembly competitions. Even though most of the code is assembly, you still have to know exactly which instructions the assembler produces so that you know how many bytes they are. Also, sometimes you use tricks like having some bytes serve both as data and as code, or having some bytes decode as different instructions depending on whether you enter at the first byte or jump into the middle of an instruction. Then you write instructions as data bytes in the middle of the assembly code.

Guffa
+3  A: 

When I was in training during my Navy days (some time around 1986), we had a computer that we were given to learn electronics troubleshooting, not programming troubleshooting. It was programmed by entering binary information into the front of the computer, and we had to tell the instructor what they had broken in the machine based on the results, as well as troubleshoot the hardware. As far as I know there might still be one of those machines around.

I wish I could find my source code for it; I actually wrote a simulator of the machine and a compiler for its language. It was amazing how much work you could get done with 1024 bytes of memory! :)

David Parvin
A: 

Hazing ritual for new team member.

alchemical
A: 

There are a few times you benefit from working with raw machine code, not just assembly language. For example, consider sending a binary file via email, but with an email program that didn't know how to decode attachments. At one time, a few people wrote small programs that could decode the rest of an attachment, where everything in the program itself was a printable character. So to decode your attachment, you'd save the body of the email as whatever.com and then execute it; it would decode the attachment and write a binary file you could then execute.

For another example, years ago on Fidonet there was a rather simple challenge: write a program that simply prints out a number that increments each time it's run -- but (the part that made it tricky) it's not allowed to use any external files or other storage to do the job. To keep this from getting too boring, it was also a code-golf kind of thing, though the measured size was executable bytes, not source code. Quite a few of the entries to this challenge used self-modifying code that depended heavily on exactly how instructions were encoded and such.

Looking for a second, I see I still have the source code to one of my attempts:

.model tiny,c
.286
.code
.startup
main proc
    mov     si,offset count
    inc     byte ptr [si]
    mov     al, [si]
    mov     bx,4090h
    shr     al, 4
    call    convert
    lodsb
    and     al,0fh
    mov     byte ptr end_convert, 08bh
convert:
    add     al,bl
    daa
    adc     al,bh
    daa
    int     29h
end_convert:
    ret
    db      0d6h
;    mov     dx, si
    mov     ah,3ch
    xor     cx, cx
    int     21h
    xchg    bx, ax
    mov     dx,offset main
    mov     cx,offset the_end - offset main
    int     21h
    ret
main endp

count:
        db 0
name:
        db 'c.com', 0
the_end:
    end

I'd better quit now, before I'm responsible for anybody having apoplectic fits (hoping I'm not too late...)

Jerry Coffin
A: 

A really cool example is this famous polyglot, which is a valid DOS .COM file among other things because the ASCII in its source code doubles as binary x86 instructions! http://ideology.com.au/polyglot/polyglot.txt

More boring examples...

Many processors implement ISA instructions as sequences of more primitive micro-instructions (basically collections of datapath control signals) which are "microcoded" in a microcode ROM.

For a simple enough processor, you might write microcode directly in binary rather than assembling it from a mnemonic language. Or if you're reverse engineering a processor, you might not know its micro-instruction set and just have to guess at the format of micro-instructions... in which case you're probably also working in binary. Either way this is lower level than assembly language.

Sometimes code for old processors like the 6502 used undocumented instructions which didn't have official mnemonics, so you had to write binary values rather than assembly instructions.

ccmonkey
A: 

For a college project I had to design a simplified microcontroller in VHDL (a hardware description language). To test it, I wrote an extremely simple program in binary, because it was the most convenient way to feed the program into the simulated microcontroller.

Emilio M Bumachar