C++ lambdas suck

C++ lambdas are a killer feature, its just a shame that they are too bloated to ever be used. I had so much hope for this feature, I was crushed when I saw the generated assembly. Seriously WTF, every single captured variable is done so by way of a separate pointer. You could capture everything using just a single pointer to the parent scope. What a lazy good for nothing excuse for an implementation, I am not sure if something in the C++ standard has steered the implementation to be this way but every compiler does this as far as I can tell.

Here is a comparison between the generated assembly for GCC C nested functions vs C++ lambdas, I think you should clearly be able to see why I am so angry. I don’t understand why anyone would think that this bullshit code is acceptable, it makes the the whole feature completely worthless.

Lambda test code:
lambda test

C++ Lambda assembly:

GCC C Nested function equivalent:
nested function c


Posted in Uncategorized | Leave a comment

GCC gets worse each version: part 1

Each version of GCC is somehow worse than the last. The compiler does increasingly more retarded things each version. I have observed serious bugs which go unfixed for years, do they even test this shit?

Anyway onto today’s insanity. Current versions of GCC refuse to use XMM registers in 32bit for general purpose such as memory copy and filling. It used to work but in gcc8.1 for some inexplicable reason it stopped working. Using XMM registers for memcpy and memset is a huge improvement in both speed and code size, but for some reason apparently we can’t have that anymore in 32bits. FIX YOUR SHIT.

Posted in Uncategorized | Leave a comment

On the hidden improvements of NT6x: part 1

As a faithful user of NT5x it has become increasingly painful as much software has dropped support and requires vista or newer, hell even windows 7 support is waning now and I am still on xp/2k3. Anyway, over the years I have made various efforts to hack in newer apis such that I can run this newer software. I have never made much progress due to working on many projects at once and also the fact that I am a perfectionist. I can also never quite decided how to go about it so end up floundering around with a bunch of half implemented tests.

During my experiments I have discovered a number of ways in which NT5x is completely broken. The piece of brain damage I will discuss here today is one of several related to dynamically loaded libraries.

When one would like to load a library at run-time, one can call this function. The specified module and all its dependencies will be loaded and a corresponding call to FreeLibrary will unload this module and any dependencies will also be unloaded if no longer needed. All nice a simple, there is just one major flaw in this system.

Export forwarders:
So a dll is able to forward an export onto another dll. This export forwarding has a number of performance advantages: 1. there needs be no thunks, the function pointer is fetched directly from the target dll. 2. The target dll is loaded on demand, if none of the forwarded functions are used then the target dll need not be loaded. There is one major problem with this mechanism however, dlls loaded by way of a forwarded export can never be unloaded, windows does not track these loads and has no way to determine when these dlls should be unloaded.

NT6x fix:
When a dll is unloaded its import table is walked and each dependent dll has its reference count decremented. Those dlls are then also unloaded if their ref-count drops to zero. This mechanism is completely incompatible with unloading of dlls loaded by forwarding so in vista a new mechanism was added to track just forwarder loaded dlls.

1. a new field was added to LDR_DATA_TABLE_ENTRY: ForwarderLinks
2. when a dll is loaded by fowarder the function
LdrpRecordForwarder(PLDR_DATA_TABLE_ENTRY LdrEntry,  PLDR_DATA_TABLE_ENTRY ldrTarget)
is called, this function inserts the target dll into the ForwarderLinks list
3. The ForwarderLinks list structure
struct ForwarderLinks_t { LIST_ENTRY list;
    PLDR_DATA_TABLE_ENTRY* target; int refCount; };
For each time the target dll is loaded LdrpRecordForwarder is called
and the ForwarderLinks list is inserted the first time, susequent times the refCount is incremented
The ref-count field is not really needed, it could have been implemented without a ref-count
But they decided to leave the existing call to LdrpLoadDll which always increments the dll ref-count
Allot of the changes made in vista feel very tacked on.
4. The LdrUnloadDll has been altered as necessary to make use of this new linked list to correctly unload this dll which where loaded by forwarded and are no-longer needed. Pretty simple fix, why on earth was this allowed to be broken like this for so long. It took them more then 10years to fix this since the first windows NT release.


Posted in Uncategorized | Leave a comment

Beware the page heap

So I nearly pulled out all my hair today trying to figure out the most insane and bizarre problem. I thought my OS had gotten fucked up in some way, even worried that I had some kind of infection. There was however, nothing wrong, except my stupidity.

So it began when suddenly, and for no reason I could determine a piece of software I was working on suddenly got 10 times slower. It was a development tool I have produced which I make extensive use of and being 10 times slower was instantly noticed. At first I thought there was some kind of bug introduced by changing compiler flags which recently I had done. After changing them all back and recompiling everything there was no change.

At this point I was thoroughly confused, maybe the input data had changed somehow making the tool take more time to run, but even though the tool has sloppy n² algorithms 10 times slower is still a huge change when I did not recall changing anything at all. At some point I used my sampling profiler to determine where in the tool the slowdown occurred, that place was the symbol lookup function, it uses a crappy n² search array which was taking up all the time. That function performed some 40 million string comparisons during the execution of the program, that is quite allot, but it was taking 3000ms to run which is terrible.

At this point in desperation I pulled out an old binary and tried, that. The problem still persisted. I then coppied the code to another computer and ran the current version and there was no problem. I though maybe my cpu had down-clocked or broke in some way so I ran the tool in VM on same machine and it worked fine. I restarted in safe mode and there the tool ran at normal speed. At this point I was getting worried. What the fuck had happened to my computer as to make my tool run slow. Had my OS broken in some bizarre  way as to cause slowdown to piece of code which called no function called. Did I have some kind of infection. I was about ready to reinstall my whole computer, which would have been a serious ordeal.

Before reinstalling my computer I decided to do a few more experiments, I wanted to narrow down the exact cause of the slowness. I was thinking that some sequence of instructions was fucking up the cpu or maybe there was some problem with memory or something. Maybe there where some silent exceptions being thrown somewhere. Anyway after a little more experimentation a thought entered my brain, page heap.

The fucking page heap, I had recently enabled the page heap when trying to solve a bug in my tool, it turned out the bug was not memory corruption but I though not to disable the page heap because it had never caused me issues before.  I was so wrong to leave it enabled.

What does the page heap do?
The page heap alters the behavior of the memory allocator such that every allocation is given its own page. This is horrific in terms of memory usage but makes dealing with heap corruption much easier as now writing beyond the allocation can be detected by a page fault. Its not immediately obvious but this will also have a horrific effect on performance in some situations.

The cache.
The cache is arranged in such a way that only a certain number of items can share the same lower address bits. This type of cache is called Set Associative Cache. My cpu and pretty much every modern cpu has a cache which operates in this fashion, this limitation in not too bad in the normal case. However with page heap enabled pretty much all the small allocations end up with the same lower address bits completely filling up all usable slots in the cache resulting in thrashing and the massive 10x slowdown.

So in summary, always be aware of how the page heap can fuck you up. You shall disable it as soon as it is no-longer needed. It can come back to bite you later when you have forgotten that you had enabled it. But since you have read this warning you will now know where to look if your code gets slow for no reasons, you will not have to face the terror that I had this day.







Posted in Uncategorized | Leave a comment

The gcc optimizer is retarded

So it is usually the case that I compile my code with -Os, this results in nice and compact code, however gcc tries too hard to make the code as small as possible sometimes making it very slow. I was forced to switch to -O2 because of this. However I really do not like how -O2 behaves in regards to inlining. It inlines really quite large functions, now I was mostly putting up with this but then it came to virtual functions and I was triggered.

Virtual functions and inlining:
You know what insane thing gcc did?
It first compared the value in the vtable and then if it matched the class’s own implementation it branched to the inline code, else it calls the vtable function.
Seriously WTF, inlining is meant to speed things up but now you have introduced a conditional branch and all that extra code bloat. Function calls are not that expensive, especially if you are going to try all this bullshit to avoid them. Just call the damn virtual function and stop doing retarded things.

I am really starting to get sick of writing code for gcc, its a really shit compiler that produces very ugly assembly. Luckily I maintain my own fork of gcc so this is another thing I will look into fixing. I shall create a new optimization level which is partway between -Os and -O2. This new optimization level will be such that it will avoid code which increases size but will not try to reduce code size either.

Addendum: -fno-devirtualize-speculatively option can disable this particular piece of bullshit. At least that’s one less hack I need to make to the compiler.


Posted in Uncategorized | Leave a comment

gnu ld is a shit linker

So today I was trying to build a module where a few functions were placed at fixed addresses, well it turns out there is absolutely no way to do such a thing with ld. Sure you can place a section at a fixed address, but that is not what I want, I want to place a particular function at a fixed address, but I want this function to be in the same section as the rest of the code. Also I need the other sections to be fit in around these fixed allocations. Well that is not possible, ld does not perform any kind of free space allocation. There is no way to even make it do what I want as far as I can tell, the way its designed is just completely incompatible even with a hack solution. So of course now I have to write my own linker.


Posted in Uncategorized | Leave a comment

A simple solution to integer overflow in c/c++.

In c/c++ there is a major issue which is responsible for a significant proportion of all software security and stability problems. The bullshit integer model: for each operator of an expression its operands are first promoted to integer and then a common type is chosen by taking the larger of the the two operands, if the types differ by signed-ness then unsigned is selected. The resultant output is of this type also.

If for example one where to add two integers together then they would remain as integer type and the result type would be integer also. When one adds together two integers and attempts to force the result back into another integer then overflow is possible. Now one might think that this is reasonable, and it is if you perform a simple expression and assign the result back into an integer but for more complex situations then it becomes a problem.

int average(int a, int b) { return (a + b) / 2; }
s64 add64(s32 a, s32 b) { return a + b; }
Both of these will return the wrong results in some cases due to intermediate overflow. I consider this to be total horse shit. The c/c++ integer promotion rules make this even worse due to inconsistency. Why should s32 add32(s16 a, s16 b) { return a + b; } work when the above add64 does not, total bullshit.

Bounds checking:
The most significant area where the integer model really sucks is bounds checking. When parsing data structures loaded into memory its nearly impossible to correctly check for out of bound lengths or offsets. Its so difficult that in many cases one will not even bother, nobody wants to spend thrice as long writing bound checks as actually writing useful code. Even if you managed to write functional bounds checks they look so horrifically ugly and unmaintainable it makes you wish you had not even bothered.

Now of course one cannot go changing the integer model of the language that would probably break so many things (I would still consider doing so because the existing model sucks to bad). What we need is a way to tell the compiler to stop being retarded and do the right thing, perform the intermediate computation at significant width to get the mathematically correct result. This is really not that difficult for most expressions, double width would prevent overflow in most cases, and for the case of bounds checking, just checking the carry flag would suffice.

The solution:
I propose a new piece of syntax, a special pseudo function we shall call ‘X’ for now, any code placed as the argument to ‘X’ will be computed at a precision sufficient to return the mathematically correct result. With this new language feature performing bounds checking or other things which might overflow becomes trivial.

The previous examples would simply be rewritten as.
int average(int a, int b) { return X((a + b) / 2); }
s64 add64(s32 a, s32 b) { return X(a + b); }

Is it is really so much to ask to get a feature like this, it would be more useful than anything else they come up with in the last 10 years. But of course they would never consider actually adding something useful, they just endlessly masturbate over some weird useless ‘type safety’ bullshit that has no real world use.













Performing range checks is so profoundly difficult due to intermediate overflow that most programmers get it wrong, or do not even bother.





Posted in Uncategorized | Leave a comment

Why I no longer use fossil source control.

For many years I have used fossil as my source control of choice. It was my first source control and it was a revolution for me. So many projects I have completely fucked up by not using source control. Fossil is great, it solved all of the sticking points which caused me to resist using  other source controls such as git or whatever else was popular at the time. I really like its single file repository approach, all other source control have this massive folder of bullshit. I also love that the checkout is separate from the repository, seriously that is total bullshit. What if you want to work on two revisions at once, well you can’t.

Now fossil has one major issue which is not a problem at first but overtime it becomes a bigger and bigger problem. Absolutely no way to alter history at all. Now I am not a fan of re-basing or any other kind of altering history but fossil takes it to an extreme. Fuck up your last commit? Just realized one second after you hit enter? Well you are fucked that commit is there forever. Sure you could grovel through the undocumented database file and manually undo that commit but good luck with that.

Now lets talk about git. I really quite dislike git, its kind of shit. But git works and it has lots of momentum behind it. You also have places like github and gitlab where you can upload and share your repositories. Fossil basically has nothing, its too niche. Git has a terrible user interface and has no decent gui to speak-of. Even the web interface for github is absolute trash which is barely usable compared to the perfection that is the UI that comes built-in to fossil.

But even with all the advantages of fossil, I really just cannot use it anymore, not being able to undo your last commit is just too much to deal with. I am a scatterbrain, I use source control to help me organize my work and not completely destroy my project as has happened before. But not being able to undo a mistake I made just seconds ago is bullshit.

Another nice thing about git which I have come to realize is actually a good feature is the stage. Fossil does not have such a system, commit is just a single step in which you specify what files you want to commit and it is done. I have often forget to commit files and other times accidentally committed everything even stuff that I did not want to. The stage is really helpful in making sure you commit exactly what you intended to commit, and also allowed you to do so in stages and not just a single command which you can easily fuckup.

So in summary, I am sad that I have to stop using fossil its almost perfect but its issues are to large to deal with. I shall now enter under the suffering and bondage of that horrible master that is git. Fuck git.





Posted in Uncategorized | Leave a comment

Intel is fucking shit, absolute worthless garbage, fuck them and their worthless cpus.

So I was working on some existing code written in assembly and self-modifying. I was trying to reduce the amount of self-modifying code by replacing some self-modified immediates with register values where registers where available to hold said values. To my surprise after changing a shift instruction with immediate to CL, the code got significantly slower.

Now it took me quite a while to actually discover this, I had changed the function somewhat to free up the CL register for the shift length, I at first thought it was the combination of all the changes that had added together to result in the observed performance drop but after more experimentation it became clear is was all due to the shift instruction.

WHAT THE FUCK, THE SHIFT INSTRUCTION IS FUCKING SLOW. After checking Agner Fog’s instruction table I confirmed it. On sandy bridge and later the shift instruction with length in CL takes 2 CLOCK CYCLES. What the fuck where they thinking crippling such a common instruction, this instruction is 1/4 the speed of Nehalem, this is slower than every single x86 cpu going back to the Pentium (except P4, but we don’t talk about that). Its slower then fucking Bulldozer.

Intel are fucking retards, they are incapable of making cpus with consistent and predictable performance. Why would they do this, all code written for previous sane cpus is now crippled on this fucking garbage. This is just one of the many layers of bullshit that is the Intel cpu. I won’t even start on the disaster that is AVX and the SSE mode switch issue, seriously WTF. I read that with their newer cpus its even worse, apparently the AVX penalty is now payed for every single SSE instruction executed. If the AVX-512 resisters are dirty then cpu permanently fucking downclocks because each SSE instruction causes an implicit merge back into the full register.

This bullshit has gone too far, I demand Intel be nationalized and shut down. All executives and upper management shall be flayed alive. And any engineers responsible for this retarded bullshit shall be burn at the stake. At they very least all their bullshit x86 patents should be invalidated so that some competent companies can step up and give us some cpus that arn’t fucking Shite. AMD is looking pretty good at the moment but it would be nice to have some competition.

Instruction sets should not be patent-able, nothing in the x86 instruction set is new or novel. In fact its quite the opposite, most of the instructions Intel have come up with recently have been fucking awful garbage. But since so much software is using this crap you have to live with it. I think the x86 went horribly wrong with most of SSE and its gotten so much worse since then.

On the subject of bad instruction set design, I think Intel’s first mistake was back in 1997 with the MMX instruction set. This had the potential to be incredible, we could have had general purpose 64bit integer registers. I have almost never needed to use vector processing, but having native support for 64bit integers would have been a massive improvement especially if multiply and divide instructions where provided. Aside from the lack of 64bit integer instruction MMX made another massive mistake, a mistake which in my opinion was the reason MMX failed to gain real any real traction, MMX trashes the fpu state and required a complete reset after use.

Imagine what could have been if MMX could coexist with x87 and not require that expensive EMMS reset instruction. The compiler would have casual access to 64bit registers, if MMX had included some basic 64bit instructions such as add, sub, shifts, mul and div. That would have significantly sped up much code, as it is using 64bit int on x86 is horrible, especially so if one requires multiply and divide. It would also have been possible to mix fpu and mmx simultaneously. It would have still been possible to share the same registers for both mmx and x87, just don’t mark the x87 registers as in-use when mmx is used on those registers.











Posted in Uncategorized | Leave a comment

The exe-modifier.

For many years I have worked on the insanity which is my exe-modifier. This insane piece of software is basically a linker which can insert objects into an existing executable. This allows one to make extensive changes to an existing binary and implement whole new functionality and features. This new code can be written in c/c++ and closely integrated into the existing code.

There is little documentation on how to use this software and only a handful of example projects, the source code includes two examples, neither of which use even a fraction of all the features this software has haphazardly grown over the years. If anyone actually has any interest in using this abomination I am willing to assist them into getting started with the garbage. I will write some docs in the rare case that anyone takes interest.

Example, the music calculator.
This is a fun example which reveals the power of the exe-modifier.
I have taken the standard windows calculator and modified it to implement a rotating sine-bow behind the buttons and to play music from an embedded module file.  The code for this example can be found in the exe-modifier source code as “example2”. I have always been exceedingly pleased with this example, as well as being a great example for the exe modifier, its also the only demo like thing I have ever made.


exe modifier release on github

The music featured in the example:
happiness_island.mod, by Bernard Sumner, 1993


Posted in Uncategorized | Leave a comment