So I was working on some existing self-modifying code written in assembly. I was trying to reduce the amount of self-modifying code by replacing some self-modified immediates with register values, where registers were available to hold said values. To my surprise, after changing a shift instruction from an immediate count to a count in CL, the code got significantly slower.
Now it took me quite a while to actually discover this. I had changed the function somewhat to free up the CL register for the shift length, so at first I thought it was the combination of all the changes that had added up to the observed performance drop. But after more experimentation it became clear it was all due to the shift instruction.
WHAT THE FUCK, THE SHIFT INSTRUCTION IS FUCKING SLOW. After checking Agner Fog’s instruction tables I confirmed it. On Sandy Bridge and later the shift instruction with length in CL takes 2 CLOCK CYCLES. What the fuck were they thinking, crippling such a common instruction? This instruction runs at 1/4 the speed it does on Nehalem, and it is slower than on every single x86 cpu going back to the Pentium (except P4, but we don’t talk about that). It’s slower than fucking Bulldozer.
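For reference, here is roughly what the two forms look like from C. This is a minimal sketch, not the actual code in question, and the function names are made up for illustration. With a constant count, x86 compilers typically emit the immediate form; with a run-time count and no BMI2 they typically fall back to `shl reg, cl`. As far as I understand it, the CL form costs extra uops on Sandy Bridge and later because a count of zero has to leave the flags untouched, and BMI2's SHLX (Haswell and later) sidesteps that by not writing flags at all.

```c
#include <stdint.h>

/* Hypothetical example: shift count is a compile-time constant.
   On x86 this typically compiles to `shl reg, 5` (immediate form). */
uint32_t shift_imm(uint32_t x)
{
    return x << 5;
}

/* Hypothetical example: shift count is only known at run time.
   Without BMI2 this typically compiles to `mov ecx, n` / `shl reg, cl`.
   The mask keeps the count in 0..31, matching what the hardware does
   for 32-bit shifts and avoiding undefined behaviour in C. */
uint32_t shift_var(uint32_t x, unsigned n)
{
    return x << (n & 31u);
}
```

Compiling this with and without `-mbmi2` (GCC/Clang) and reading the assembly output shows which form you actually get.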
Intel are fucking idiots, they are incapable of making cpus with consistent and predictable performance. Why would they do this? All code written for previous sane cpus is now crippled on this fucking garbage. This is just one of the many layers of bullshit that is the Intel cpu. I won’t even start on the disaster that is AVX and the SSE mode switch issue, seriously WTF. I read that with their newer cpus it’s even worse, apparently the AVX penalty is now paid for every single SSE instruction executed. If the AVX-512 registers are dirty then the cpu permanently fucking downclocks, because each SSE instruction causes an implicit merge back into the full register.
This bullshit has gone too far, I demand Intel be nationalized and shut down. All executives and upper management shall be flayed alive. And any engineers responsible for this idiotic bullshit shall be burned at the stake. At the very least all their bullshit x86 patents should be invalidated so that some competent companies can step up and give us some cpus that aren’t fucking Shite. AMD is looking pretty good at the moment but it would be nice to have some competition.
Instruction sets should not be patentable, nothing in the x86 instruction set is new or novel. In fact it’s quite the opposite, most of the instructions Intel have come up with recently have been fucking awful garbage. But since so much software is using this crap you have to live with it. I think x86 went horribly wrong with most of SSE and it’s gotten so much worse since then.
On the subject of bad instruction set design, I think Intel’s first mistake was back in 1997 with the MMX instruction set. This had the potential to be incredible, we could have had general purpose 64bit integer registers. I have almost never needed to use vector processing, but having native support for 64bit integers would have been a massive improvement, especially if multiply and divide instructions were provided. Aside from the lack of 64bit integer instructions, MMX made another massive mistake, a mistake which in my opinion was the reason MMX failed to gain any real traction: MMX trashes the fpu state and requires a complete reset after use.
Imagine what could have been if MMX could coexist with x87 and not require that expensive EMMS reset instruction. The compiler would have had casual access to 64bit registers, if MMX had included some basic 64bit instructions such as add, sub, shifts, mul and div. That would have significantly sped up much code. As it is, using 64bit int on x86 is horrible, especially so if one requires multiply and divide. It would also have been possible to mix fpu and mmx simultaneously. It would have still been possible to share the same registers for both mmx and x87, just don’t mark the x87 registers as in-use when mmx is used on those registers.
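To make the "64bit int on x86 is horrible" point concrete, here is a rough sketch in C of what a 32-bit target has to do for a 64bit add and multiply when all it has is 32bit registers. The `u64_pair` type and the function names are made up for illustration; a real compiler does the equivalent internally with register pairs (ADD/ADC for the add, one widening MUL plus two IMULs for the multiply). Division is worse still: on 32-bit x86, GCC typically calls a library helper such as `__udivdi3` instead of emitting inline code.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit value as a 32-bit cpu sees it,
   split across two 32-bit registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* 64-bit add from 32-bit pieces: ADD on the low halves,
   then ADC (add with carry) on the high halves. */
u64_pair add64_via_32(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo); /* carry out of the low half */
    return r;
}

/* Low 64 bits of a 64x64 multiply from 32-bit pieces.
   Needs one widening 32x32->64 multiply (x86 MUL gives EDX:EAX)
   and two plain 32-bit multiplies for the cross terms. */
u64_pair mul64_via_32(u64_pair a, u64_pair b)
{
    uint64_t low = (uint64_t)a.lo * b.lo;        /* widening multiply */
    uint32_t cross = a.lo * b.hi + a.hi * b.lo;  /* only low 32 bits matter */
    u64_pair r;
    r.lo = (uint32_t)low;
    r.hi = (uint32_t)(low >> 32) + cross;
    return r;
}
```

Compare that with a single instruction per operation, which is all it would take with real 64bit integer registers.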