views: 1026

answers: 9

Hello,

Recently I've been reading some SO archives and encountered statements against the x86 architecture, and many more comments like:

  • Compared to most architectures, X86 sucks pretty badly.

  • It's definitely the conventional wisdom that X86 is inferior to MIPS, SPARC, and PowerPC.

  • x86 is ugly.

I tried searching but didn't find any reasons. I don't find x86 bad, probably because it's the only architecture I'm familiar with.

Can someone kindly give me reasons for considering x86 ugly/bad/inferior compared to other architectures?

+27  A: 

A couple of possible reasons:

  1. x86 is a relatively old architecture (its progenitor was the 8086, after all).
  2. x86 has evolved significantly several times, but the hardware is required to maintain backwards compatibility with old binaries. Modern x86 hardware still has to contain support hardware to run 16-bit code natively. Additionally, several memory addressing models have to be in place to allow older code to interoperate on the same processor without modification. This can be confusing to some.
  3. In some respects the x86 isn't inferior, it's simply different from how perhaps every other processor architecture operates. For example, input/output is handled as memory mapping on the vast majority of architectures, while the x86 also has a separate I/O address space with its own in/out instructions.
  4. x86 is a CISC machine, which for a long time meant it was slower than RISC machines. Nowadays the x86 is translated into RISC-style instructions before it's executed anyway, so this matters little in practice.
  5. The x86 has a very small number of registers compared to most other architectures. Again, most modern x86 chips are RISCs internally, so this is less of a problem, but that isn't visible at the ISA level.
  6. x86 assembly code is complicated because x86 is a complicated architecture with many features. I can fit the features available on a MIPS machine on a single letter-sized piece of paper. Even a simple lookup table of x86 instructions fills several pages. While that doesn't necessarily make MIPS superior, for teaching an introductory assembler class it'd make sense to start with a simpler architecture.
  7. The x86 uses variable-length opcodes, which add hardware complexity with respect to the parsing of instructions (see the small encoding sketch below).
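
As a rough illustration of point 7, here are a few instructions with their encodings (Intel/NASM syntax; the byte sequences are from memory, so treat this as a sketch and check the Intel manuals before relying on it):

ret                              ; C3                    -- 1 byte
push eax                         ; 50                    -- 1 byte
mov eax, 1                       ; B8 01 00 00 00        -- 5 bytes
mov dword [eax+4], 0x12345678    ; C7 40 04 78 56 34 12  -- 7 bytes

With prefixes (segment overrides, operand-size overrides, REX, SSE escape bytes, ...) a single instruction can legally grow to 15 bytes, and the decoder has to find the length of each instruction before it even knows where the next one starts.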

EDIT: This is not supposed to be a "bash the x86!" party. I had little choice but to do some amount of bashing given the way the question is worded. But with the exception of (1), all these things were done for good reasons (see comments). Intel designers aren't stupid -- they wanted to achieve certain things with their architecture, and these are some of the taxes they had to pay to make those things a reality.

Billy ONeal
I see variable-length opcodes as a source of strength, as x86 machine code tends to take less space than PowerPC code, for instance. I may be wrong.
Joey Adams
It's a tradeoff. It's a strength in that the binary size might be smaller, but it's a weakness in that you need to have very complicated hardware to implement a parser for these instructions. The vast majority of instructions are the same size anyway -- most of the reason for variable length opcodes on x86 is for when they decided to add features and found they couldn't represent what they wanted in the number of bits they had to work with. The vast majority of people aren't concerned with binary size nearly as much as hardware complexity or power consumption.
Billy ONeal
@Joey Adams: Contrast the x86's variable length instructions with the ARM's Thumb Mode ( http://en.wikipedia.org/wiki/ARM_architecture#Thumb ). Thumb Mode results in significantly smaller object code for the ARM because the shorter instructions map directly to normal instructions. But since there is a 1:1 mapping between the larger instructions and the smaller ones, the parsing hardware is simple to implement. The x86's variable length instructions don't have these benefits because they weren't designed that way in the first place.
Billy ONeal
+1 Wow!! Those were some negatives of x86. Great stuff!! I wonder what the positives of x86 over other architectures would be.
claws
(7) "The x86 uses variable length opcodes, which add hardware complexity with respect to the parsing of instructions" - that's more of a problem for compiler writers and those writing self-modifying code. The hardware doesn't give a shit.
Chris Kaminski
(6) Not every op-code needs to be used by every program, but dammit, when I need SSE3, I'm glad I have it.
Chris Kaminski
@claws: It's not entirely meant to be read that way -- almost everything I've listed above (except for #1) is a tradeoff. For example, #2 and #7 are the way they are to maintain backward compatibility with existing code. That's a definite +. #3 is just different, no win or loss. #4 means that for somebody who knows what they're doing, writing hand-coded assembly can be significantly easier. #5 is a consequence of being a CISC, and doesn't matter much in practice nowadays. #6 means that the processor does more work for the assembly programmer -- again, a trade-off. Can't have everything.
Billy ONeal
@Chris Kaminski: How does that not affect the hardware? Sure, on a modern full sized computer nobody's going to care, but if I'm making something like a cell phone, I care more about power consumption than almost anything else. The variable length opcodes don't increase execution time but the decode hardware still requires power to operate.
Billy ONeal
I've never had the chance to design ICs professionally, but I have had the opportunity to concoct routines to ensure processors do work. I have yet to understand how much hardware complexity variable-length instruction sets add. I always thought the complexity fell more on the designer's task and on the amount of cell modularity you have to forgo. I never thought such complexity added much to how much pain a processor would suffer.
Blessed Geek
@Billy ONeal: I'm not saying it can't be better - just that the people bitching about how horrid the x86 is are compiler writers. The ones (I think) with a legitimate beef were the OS authors in the early 90's and the whole Protected Mode (and then PAE) mess. I'm glad AMD finally forced Intel to see the 64-bit light.
Chris Kaminski
@Chris: Agreed. I'm not trying to bash up on the x86. Just citing some of the things people typically bash it for.
Billy ONeal
@Billy ONeal: And you missed the biggest one of all - All the damn memory models!
Chris Kaminski
(3) Doesn't DMA change this (as well as introduce its own share of NEW headaches)?
Chris Kaminski
@Chris: Regarding memory models, that's part of what I meant by (2). The memory models were not designed that way in the first place; they are a result of changing significant things (like native word size) and a desire to maintain backwards compatibility with code using the old model. Regarding DMA, yes, that changes things (like it changes things for *every* CPU), but that's the same for RISCs as well. Not all hardware (i.e. the keyboard) uses DMA though, so the x86's I/O differences are still alive and well.
Billy ONeal
@Billy: Maybe you should edit (2) to include your comment, because it's not obvious from (2).
claws
@claws: Done. Happy now? :)
Billy ONeal
Breaking compatibility is sometimes the only way toward real enhancements and improvements; the only reason not to do it is something I'll call, to be brief, "marketing", and that is often not a good thing (apart from lining some people's pockets, of course, it does not make the hardware better for real).
ShinTakezou
@ShinTakezou: Frankly, I agree with you. However, x86 is one case where I'm quite happy they left things as they are. I like the ability to run software that came out 15 years ago without modification and without keeping old hardware around. The amount of software written for x86 is vast enough to make a backwards-incompatible change a problem.
Billy ONeal
@Billy luckily some evolution exists and hardware emulation can be done in software (with the help of some features of modern processors too, thinking of "virtualization"), so 15-year-old software can run, and sometimes faster than on the original hardware; I think that with cheap memory, powerful processors (still more powerful if backward compatibility is broken), and innovative hardware design (more asynchronous architectures?), we can run the old software without modification on a "hosted virtual hardware" that performs even better than the real old one.
ShinTakezou
@ShinTakezou: Good for you. Problem is that the proposed hardware does not exist. The only architectures competing with x86 nowadays are RISC architectures, and none of them offers better practical performance than x86 anymore. If you want innovative hardware design, you should be looking at GPU computing. No general-purpose computer offers better performance than the existing architectures do.
Billy ONeal
@Billy ONeal, we don't need async archs in particular: current "bad" hardware is able to run emulations smoothly enough! Why do they insist on a "compatibility" very few people (or none) are interested in? I am not in the x86 (CISC?) vs RISC debate; I am in the debate that more promising and "better" processors/architectures existed and could exist (Intel or whatever, it is not important for the final user, but unluckily it is for Intel, which "drives" the market), and I suspect that, as often happens, slowing down innovation is a way of maximizing profits; thus the promises of the '70s magazines were "strangely" betrayed.
ShinTakezou
@ShinTakezou: Your assertion that current hardware is "bad" is simply incorrect. If Intel hardware is so bad, then I ask why no other architecture offers significantly better performance. ARM and friends certainly have an incentive to produce such a platform, and they produce as many if not more chips than Intel, for devices such as the iPod, iPhone, and iPad, and other types of embedded devices, such as television sets.
Billy ONeal
I put " around bad purposely to avoid such a unsenseful talk;but I can also state it is __bad__, I've heard a lot of technical people saying that and trying to explain technical points I am not able to reproduce or fully understand,but I am more interested in logic in this case(to be "precise",we should enlarge the touched topics):the fact that no other archs compete on the consumer market in"performance",can't be used as an argument to prove "PC" current hardware is not bad.(Note:I stress the usage of generic "hw"/arch against "x86 arch" where people may think I talk about x86 internals)
ShinTakezou
@ShinTakezou: What specifically is bad? What architecture specifically exists that does not have the same "badness"?
Billy ONeal
Once there were horses, and when someone talked about moving machines people asked what's wrong with horses, until prototypes started to become widespread; other hardware can be a reality, there simply isn't enough "enterprise" to make it widespread (at the consumer level), so, as already said, the current hardware is decided by the market (and its saturation, to maximize profits), not by the possibilities of known technology (moldering in R&D labs, I think). From a "historical" point of view, current PC hardware is basically the same as the '80s IBM PC: faster clocks, larger buses etc. are not innovations (and real innovations are very few, if any).
ShinTakezou
@ShinTakezou: When something does not change, that generally indicates that something was done correctly in the first place. If you can't point to a case where such a change resulted in increased performance, then your argument has no leg to stand on. More to the point, there have been considerable changes in PC operation. Look at GPU computing -- that's a significant model departure. Look at recent chips from Intel and AMD: They don't have the traditional Northbridge/Southbridge pair that was standard for many years. What sort of innovation are you thinking of?
Billy ONeal
Logic again; "generally" means nothing; 10 million people saying a lie don't make it a truth. No need for more legs than you, since your only argument is "it works"/"it is so", basically. GPU usage is a false innovation; the idea of using coprocessors is old (so no surprise things change and become a bit better... just not as fast as they could). I cited async hardware; a very old idea, but only recently reconsidered, since >>>>
ShinTakezou
<<<< (continuing) since we are reaching the limits of the possibilities of the current approach... we knew it would happen, but we continued anyway on the less innovative path, because it was considered for some reason easier, cheaper, whatever... anyway, as already said, for economic/market reasons, not for strictly technical impossibilities. I have old magazines promising incredible things in a few years... before the "market" became so important, slowing down the pace. So, as already said, these _possible_ promises were betrayed, and we are, to pick a meaningless number, 10 years late.
ShinTakezou
@ShinTakezou: Name such a "promise", please.
Billy ONeal
To be brief, without searching: imagine that they talked about what we can do today with computers as being possible in a few years (mags from 1980 or so), say by 1985; maybe optimistic, so let us imagine they said it would be 2000. They talked of what computers do now as possible in 1985-1990. It was a projection of the trend, I think. That was what technology was promising. Then things started to slow down, from the consumer's point of view. Innovative machines (or machines trying to be innovative) were simply "ruled out" by the market/-ing... save for taking from them years later and selling the result as "innovative".
ShinTakezou
@ShinTakezou: Name such a "promise", please. I still have yet to see you write one.
Billy ONeal
A promise is something that you say can happen soon,
ShinTakezou
@ShinTakezou: Yes, the projection was wrong. They thought they were going to be making chips in the 20 GHz range too, until they started to push things and found there were physical limits to switching speed.
Billy ONeal
No, the physical limits were well enough known. They were wrong because they thought that if a working tech is ready in a lab, the day after they could start producing it for the masses. That assumption is simply wrong.
ShinTakezou
@ShinTakezou: "physical limits were well enough known" <-- Really? You expect me to believe they were pushing tens of GHZ 15 years ago? What other architecture exists that has this "working tech ready in a lab"? There are plenty of thriving architectures in use, most notably PPC (IBM's POWER Architecture, most notably) and ARM (and various children from ARM Holdings). Just because Intel's architecture took over the desktop market does not mean that Intel is the only game in town. If there existed the tech co accomplish what you propose, rest assured someone like IBM or ARM would now sell it.
Billy ONeal
The limits of how fast you can dissipate heat "produced" inside a small area by any physical process could be estimated since before the advent of chips; "they" knew that miniaturization and increasing switching speed impose limits (adjustable, but not beyond the physical threshold, again knowable, and nobody wants a liquid-He cooler in his desktop, right?); the __very same__ tech you have today, likely, is what was in the labs before; I am not talking about mysterious things/techs, but simply about the gap between their working existence and their commercialization; it makes a big difference >>>
ShinTakezou
Since artificially enlarging that gap is an opportunity to increase profits, all these tech companies (Intel, Motorola, IBM or whatever) do their R&D that way; not that they do so because they are perverse, likely they also have no choice... because of the market... so again, my previous statement: the market, its mechanisms and self-referentiality slow down innovation (and increase profits, so nobody is really interested in changing those mechanisms); so what's on the market now could have appeared 5, 10, maybe 15 years ago. So, if there's no market for a thing, __nobody__ >>>
ShinTakezou
>>> tries to sell it, even though it is better than what is currently selling. --- Stop with these long comments: democratically I am right, you're optimistically wrong :D (just since this is not a comfortable place in which to express opinions at length about the matter)
ShinTakezou
+3  A: 

I'm not an expert, but it seems that many of the features people don't like about it may be the reasons it performs well. Several years ago, having registers (instead of a stack), register frames, etc. were seen as nice solutions for making the architecture seem simpler to humans. However, nowadays what matters is cache performance, and x86's variable-length instructions allow it to pack more instructions into the cache. The "instruction decode", which I believe opponents once pointed out took up half the chip, no longer takes up nearly as much.

I think parallelism is one of the most important factors nowadays -- at least for algorithms that already run fast enough to be usable. Expressing high parallelism in software allows the hardware to amortize (or often completely hide) memory latencies. Of course, the further-reaching architectural future is probably in something like quantum computing.

I have heard from nVidia that one of Intel's mistakes was keeping the binary formats close to the hardware. CUDA's PTX does some fast register-usage calculations (graph coloring), so nVidia can use a register machine instead of a stack machine but still have an upgrade path that doesn't break all the old software.

gatoatigrado
RISC was not designed with human developers in mind. One of the ideas behind RISC was to offload some of the complexity of the chip onto whoever wrote the assembly, ideally the compiler. More registers meant less memory usage and fewer dependencies between instructions, allowing deeper pipelines and higher performance. Note that x86-64 has twice as many general registers as x86, and this alone is responsible for significant performance gains. And instructions on most x86 chips are decoded before they are cached, not after (so size doesn't matter here).
Dietrich Epp
@Dietrich Epp: That's not entirely true. The x86-64 does have more registers visible in the ISA, but modern x86 implementations usually have a RISC style register file which is mapped to the ISA's registers on demand to speed up execution.
Billy ONeal
"I have heard from nVidia that one of Intel's mistakes was that they kept the binary formats close to the hardware." -- I didn't get this and the CUDA's PTX part.
claws
@Dietrich Epp: "And instructions on most x86 chips are decoded before they are cached, not after" -- that's not true. They are cached before they are decoded. I believe the Pentium 4 had an additional trace cache that cached after decode, but that's been discontinued.
Nathan Fellman
+5  A: 

x86 assembler language isn't so bad. It's when you get to the machine code that it starts to get really ugly. Instruction encodings, addressing modes, etc. are much more complicated than the ones for most RISC CPUs. (There's actually an addressing mode for [BX+SI], iirc, but not one for [AX+BX].) Inconsistencies like that complicate register usage, since you need to ensure your value is in a register you can use the way you need to.
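
To make the [BX+SI] point concrete, here's a tiny fragment in 16-bit Intel syntax (the exact registers are just for illustration):

mov ax, [bx+si]   ; legal: BX or BP as base, SI or DI as index
mov ax, [ax+bx]   ; not encodable -- there is no [AX+BX] mode
mov si, ax        ; to use AX as part of an address, you first have to
mov ax, [bx+si]   ; shuffle it into one of the "allowed" registers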

There are also leftovers from the olden days, when Intel was trying to make x86 the ultimate processor: instructions a couple of bytes long that performed tasks no one actually does any more, because they were frankly too freaking slow or complicated. The ENTER and LOOP instructions, for two examples -- note that the C stack frame code is "push ebp; mov ebp, esp" and not "enter" for most compilers (see the sketch below).
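
For what it's worth, here is roughly what the two forms of a C function prologue/epilogue look like (Intel syntax; the 16-byte local area is an arbitrary example):

; what most compilers actually emit
push ebp
mov  ebp, esp
sub  esp, 16        ; reserve space for locals
; ... function body ...
mov  esp, ebp
pop  ebp
ret

; the "convenient" CISC equivalent, rarely emitted
enter 16, 0
; ... function body ...
leave
ret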

cHao
I believe the "enter" versus "push/mov" issue arose because on some processors, "push/mov" is faster. On some processors, "enter" is faster. C’est la vie.
Dietrich Epp
When I was forced onto an x86-based machine and started to take a look at it (having an m68k background), I started to find asm programming frustrating... as if I had learned programming with a language like C and then been forced to get in touch with asm... you "feel" you lose power of expression, ease, clarity, "coherence", "intuitiveness". I am sure that if I had started asm programming on x86, I would have thought it is not so bad... maybe... I also did MMIX and MIPS, and their "asm lang" is far better than x86's (if this is the right PoV for the question, but maybe it is not).
ShinTakezou
+10  A: 

In the Power instruction set, most instructions take two input registers, perform a computation, and store the result in an output register. There are 32 registers: r0 through r31. So you can:

divw r3,r4,r5

Which stores r4 / r5 into r3. That's very simple. There are no addressing modes. The registers aren't different from each other. You always specify which registers you use. The registers are always 32 or 64 bits wide, and the 64 bit support was designed into the spec right from the start (even though processors didn't implement it until later).

Compare to x86.

idivl %ebx

The dividend and result always live in %edx:%eax, so you have to use those registers. This is already weird, and we haven't even left the confines of the ALU. Also, the divisor can be a register or a memory location (addressed in one of several ways). Many instructions have their own special rules like this. For example, "push" always implicitly uses %esp. Memory operands are also somewhat special, because you can't use two of them in a single operation. These are extra rules you have to remember and your compiler has to follow.
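
As a concrete sketch (AT&T syntax, register choices arbitrary), a single 32-bit signed division ends up looking something like this:

movl  %ecx, %eax     # the dividend has to sit in %edx:%eax
cltd                 # sign-extend %eax into %edx
idivl %ebx           # quotient -> %eax, remainder -> %edx
                     # anything already in %eax or %edx had to be saved
                     # somewhere else beforehand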

The difference is even wider when you look at floating point. On x86, you can use x87 or SSE for floating point, each with their own quirks. If you've ever written for the x87, you know just how bizarre it is (it uses a stack, registers are 80 bits). On Power, there's just a separate set of 32 floating point registers in IEEE double.
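
A quick sketch of the contrast (AT&T syntax; a, b and c are hypothetical memory labels):

# x87: values live on an 8-entry stack of 80-bit registers
flds  a              # push a onto the x87 stack
fadds b              # st(0) = st(0) + b
fstps c              # store st(0) to c and pop the stack

# SSE: flat xmm registers, much closer to a conventional FPU
movss a, %xmm0
addss b, %xmm0
movss %xmm0, c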

As Billy ONeal mentioned, there's also a set of instructions which really only exist for backwards compatibility. There are instructions for 8-bit and 16-bit registers, binary-coded decimal, and other random cruft that few people use these days. This isn't a burden for the software folk (who can just ignore them), but the hardware folk have to build those instructions out of transistors, which translates into larger dies and more power consumption. Embedded hardware often doesn't need to support legacy software, so it usually uses a simpler architecture.

Dietrich Epp
Compare to e.g. `divs.l <ea>,Dr:Dq` with <ea> being almost any of the allowed addressing modes (except using A* regs); having three regs is typical of RISC, while typically on a CISC the destination reg(s) is the same as one operand (in the divs.l example above, the destination registers are Dr and Dq... instead of being "fixed", say, D0:D1), and an `ea` can be a D* reg, or something like `(xyz,A*)`, and the whole operation corresponds to at least 2 x86 instructions (at least, since if you have to preserve eax and edx, you must copy them somewhere)... this is "CISC"... x86 is poor even as a CISC instruction set! (1/3 of the required PoV)
ShinTakezou
+4  A: 

The x86 architecture dates from the design of the 8008 microprocessor and relatives. These CPUs were designed in a time when memory was slow and if you could do it on the CPU die, it was often a lot faster. However, CPU die-space was also expensive. These two reasons are why there are only a small number of registers that tend to have special purposes, and a complicated instruction set with all sorts of gotchas and limitations.

Other processors from the same era (e.g. the 6502 family) also have similar limitations and quirks. Interestingly, both the 8008 series and the 6502 series were intended as embedded controllers. Even back then, embedded controllers were expected to be programmed in assembler and in many ways catered to the assembly programmer rather than the compiler writer. (Look at the VAX chip for what happens when you cater to the compiler writer.) The designers didn't expect them to become general purpose computing platforms; that's what things like the predecessors of the POWER architecture were for. The Home Computer revolution changed that, of course.

staticsan
+1 for the only answer here from someone who actually seems to have historical background on the issue.
Billy ONeal
There are other CISC processors coming out of the 8-bit era (the m68k can be considered the descendant of the 6800; the Z8000 or the like of the Z80...) that evolved into "better" CISCs, so it's not a good excuse. Extinction is the only path to real evolution, and trying to be backward compatible is a defect and a limitation, not a feature. The Home Computer status is late if you think about the Home Computer revolution's promises. And I believe part of the guilt lies with the "backward compatibility" issue, which is about _marketing_, not technology.
ShinTakezou
Yes, I didn't mention Marketing. This was a hugely powerful force in the lifeline of the x86 architecture and I don't know why I missed it.
staticsan
+8  A: 

The main knock against x86 in my mind is its CISC origins - the instruction set contains a lot of implicit interdependencies. These interdependencies make it difficult to do things like instruction reordering on the chip, because the artifacts and semantics of those interdependencies must be preserved for each instruction.

For example, most x86 integer add & subtract instructions modify the flags register. After performing an add or subtract, the next operation is often to look at the flags register to check for overflow, sign bit, etc. If there's another add after that, it's very difficult to tell whether it's safe to begin execution of the 2nd add before the outcome of the 1st add is known.

On a RISC architecture, the add instruction would specify the input operands and the output register(s), and everything about the operation would take place using only those registers. This makes it much easier to decouple add operations that are near each other because there's no bloomin' flags register forcing everything to line up and execute single file.
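
A tiny example of that implicit flags traffic (Intel syntax; the label is hypothetical):

add eax, ebx        ; result in eax, but OF/SF/ZF/CF... are rewritten too
jo  overflowed      ; reads the flags the first add just wrote
add ecx, edx        ; rewrites that same flags register again

To reorder or overlap those adds, the hardware has to treat the flags register as a hidden extra input/output of nearly every instruction, typically by renaming it.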

The DEC Alpha AXP chip, a MIPS-style RISC design, was painfully spartan in the instructions available, but the instruction set was designed to avoid inter-instruction implicit register dependencies. There was no hardware-defined stack register. There was no hardware-defined flags register. Even the return address was left to software - if you wanted to return to the caller, you had to work out how the caller was going to let you know what address to return to. This was usually defined by the OS calling convention. On the x86, though, it's defined by the chip hardware.

Anyway, over 3 or 4 generations of Alpha AXP chip designs, the hardware went from being a literal implementation of the spartan instruction set with 32 int registers and 32 float registers to a massively out of order execution engine with 80 internal registers, register renaming, result forwarding (where the result of a previous instruction is forwarded to a later instruction that is dependent on the value) and all sorts of wild and crazy performance boosters. And with all of those bells and whistles, the AXP chip die was still considerably smaller than the comparable Pentium chip die of that time, and the AXP was a hell of a lot faster.

You don't see those kinds of bursts of performance-boosting things in the x86 family tree largely because the x86 instruction set's complexity makes many kinds of execution optimizations prohibitively expensive, if not impossible. Intel's stroke of genius was giving up on implementing the x86 instruction set directly in hardware - all modern x86 chips are actually RISC cores that, to a certain degree, interpret the x86 instructions, translating them into internal micro-ops which preserve all the semantics of the original x86 instructions but allow a bit of that RISC out-of-order execution and other optimizations over the micro-ops.

I've written a lot of x86 assembler and can fully appreciate the convenience of its CISC roots. But I didn't fully appreciate just how complicated x86 was until I spent some time writing Alpha AXP assembler. I was gobsmacked by AXP's simplicity and uniformity. The differences are enormous, and profound.

dthorpe
I'll listen to no bashing of CISC *per se* unless and until you can explain m68k.
dmckee
@dmckee: I'm the OP. I don't know anything about the m68k, but can you explain why these things don't hold for it?
claws
I'm not familiar with the m68k, so I can't critique it.
dthorpe
I don't think this answer is bad enough to downvote, but I do think the whole "RISC is smaller and faster than CISC" argument isn't really relevant in the modern era. Sure, the AXP might have been a hell of a lot faster for its time, but the fact of the matter is that modern RISCs and modern CISCs are about the same when it comes to performance. As I said in my answer, the slight power penalty for x86 decode is a reason not to use x86 for something like a mobile phone, but that's little argument against it for a full-sized desktop or notebook.
Billy ONeal
@Billy: size is more than just code size or instruction size. Intel pays quite a penalty in chip surface area to implement the hardware logic for all those special instructions, RISC microcode core under the hood or not. Size of the die directly impacts cost to manufacture, so it's still a valid concern with modern system designs.
dthorpe
@dthorpe: Really? I'd like to see that statement backed up with some actual data regarding the die area spent on x86 decode. Otherwise I have no choice but to discount it as FUD. Looking at recent Intel chips, over half the chip is cache. Somehow I don't think x86 decode is a significant portion of die area.
Billy ONeal
I'd like to add that RISC-like orthogonality in a CISC-like design is possible: Texas Instruments did it decades ago with their 990 architecture. The 9900 microprocessor was a wonderfully clean design. I've written assembly for Z80, 8086/88, 6502 and 9900 chips and the 9900 is far-and-away the best design IMO. However, the x86 was where I was getting paid.
staticsan
+2  A: 

Besides the reasons people have already mentioned:

  • x86-16 had a rather strange memory addressing scheme which allowed a single memory location to be addressed in up to 4096 different ways, limited RAM to 1 MB, and forced programmers to deal with two different sizes of pointers. Fortunately, the move to 32-bit made this feature unnecessary, but x86 chips still carry the cruft of segment registers.
  • While not a fault of x86 per se, x86 calling conventions weren't standardized the way MIPS's were (mostly because MS-DOS didn't come with any compilers), leaving us with the mess of __cdecl, __stdcall, __fastcall, etc. (a quick sketch of the difference follows below).
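
As a small illustration of the last point, here is the same two-argument call under two of those conventions (Intel syntax; the function names and the "ret 8" amount are made up for the example):

; __cdecl: the caller removes the arguments
push 2
push 1
call f_cdecl
add  esp, 8          ; caller pops both arguments

; __stdcall: the callee removes them, so there is no add at the call site
push 2
push 1
call f_stdcall       ; f_stdcall ends with "ret 8" instead of "ret"
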
dan04
Hmm.. when I think of x86 competitors, I don't think of MIPS. ARM or PowerPC maybe....
Billy ONeal
@Billy: x86 has been around near forever. At one time MIPS was an x86 competitor. As I remember x86 had its work cut out to get to a level where it was competitive with MIPS. (Back when MIPS and SPARC were fighting it out in the workstation arena.)
Shannon Severance
@Shannon Severance: Just because something once was does not mean that it still is.
Billy ONeal
+1  A: 

I think you'll get to part of the answer if you ever try to write a compiler that targets x86, or if you write an x86 machine emulator, or even if you try to implement the ISA in a hardware design.

Although I understand the "x86 is ugly!" arguments, I still think it's more fun writing x86 assembly than MIPS (for example) - the latter is just plain tedious. It was always meant to be nice to compilers rather than to humans. I'm not sure a chip could be more hostile to compiler writers if it tried...

The ugliest part for me is the way (real-mode) segmentation works - that any physical address has 4096 segment:offset aliases. When did you last need that? Things would have been so much simpler if the segment part were strictly the higher-order bits of a 32-bit address.
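
For the record, the aliasing falls straight out of the real-mode address computation physical = segment * 16 + offset. For example, all of these name the same byte:

0x1234:0x0005  ->  0x12340 + 0x0005 = 0x12345
0x1000:0x2345  ->  0x10000 + 0x2345 = 0x12345
0x1233:0x0015  ->  0x12330 + 0x0015 = 0x12345

Any segment from 0x0235 through 0x1234 can reach 0x12345 with some 16-bit offset, which is where the figure of 4096 (0x1000) aliases comes from.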

Bernd Jendrissek
m68k is a lot more fun, and far nicer to humans than x86 (which can't seem so "human" to many m68k programmers), if the right PoV is how easily a human can write code in those assembly languages.
ShinTakezou
The segment:offset addressing was an attempt to stay compatible to some extent with the CP/M world. One of the worst decisions ever.
Turing Complete
+2  A: 
  1. x86 has a very, very limited set of general purpose registers

  2. it promotes a very inefficient style of development on the lowest level (CISC hell) instead of an efficient load / store methodology

  3. Intel made the horrifying decision to introduce the plainly stupid segment/offset memory addressing model to stay compatible with (at that time already!) outdated technology

  4. At a time when everyone was going 32-bit, the x86 held back the mainstream PC world by being a meager 16-bit CPU (most of them - the 8088 - even with only an 8-bit external data path, which is even scarier!)


For me (and I'm a DOS veteran who has seen each and every generation of PCs from a developer's perspective!) point 3 was the worst.

Imagine the following situation we had in the early 90s (mainstream!):

a) An operating system that had insane limitations for legacy reasons (640kB of easily accessible RAM) - DOS

b) An operating system extension (Windows) that could do more in terms of RAM, but was limited when it came to stuff like games, etc... and was not the most stable thing on Earth (luckily this changed later, but I'm talking about the early 90s here)

c) Most software was still DOS, and we often had to create boot disks for special software, because there was this EMM386.exe that some programs liked and others hated (gamers especially - and I was an AVID gamer at this time - know what I'm talking about here)

d) We were limited to MCGA 320x200x8 bits (ok, there was a bit more with special tricks, 360x480x8 was possible, but only without runtime library support), everything else was messy and horrible ("VESA" - lol)

e) But in terms of hardware we had 32 bit machines with quite a few megabytes of RAM and VGA cards with support of up to 1024x768

Reason for this bad situation?

A simple design decision by Intel: machine-instruction-level (NOT binary-level!) compatibility with something that was already dying - the 8080/8085, I think. The other, seemingly unrelated problems (graphics modes, etc...) were related for technical reasons and because of the very narrow-minded architecture the x86 platform brought with it.

Today, the situation is different, but ask any assembler developer or people who build compiler backends for the x86. The insanely low number of general purpose registers is nothing but a horrible performance killer.

Turing Complete