After some time running with win2k3 on the i7-4790k, I finally noticed that threads were not being correctly scheduled for hyper-threading. This caused a major performance hit when running 4 threads since the OS was making no effort to keep them on separate cores.
After a great deal of research and experimentation, I finally used the CPU enumeration API (which I had not previously known to exist) and discovered that the OS detected the CPU as a single core with 8 HT units instead of 4 cores each with 2 HT units (such insanity). On finding this thread it became obvious that the problem was cpuid related. It seems that Intel, in their infinite wisdom, decided to change the meaning of one of the fields in the cpuid data.
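For reference, here is a minimal sketch of pulling the field in question out of the cpuid data. It assumes the field is bits 16-23 of EBX for leaf 1, which Intel documents as the maximum number of addressable logical processor IDs in the package rather than the number actually present. The function name and the sample EBX value are made up for illustration and were not read from my machine.

```cpp
#include <cstdint>

// Extract bits 23:16 of EBX as returned by CPUID leaf 1.
// Intel documents this as the maximum number of addressable
// logical-processor IDs in the package, not the real count.
constexpr std::uint32_t logical_id_field(std::uint32_t ebx) {
    return (ebx >> 16) & 0xFFu;
}

// Hypothetical EBX value, for illustration only.
constexpr std::uint32_t kSampleEbx = 0x00100800u;
```

An OS that reads this field as "logical processors per core" would see 16 in the sample value, even though only 2 threads share each physical core.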
The only solution I could find was to patch the kernel to force a topology of two logical threads per core. I disassembled ntkrnlpa.exe and located every occurrence of the cpuid instruction (thank god for IDA Pro, without it I would be fucked). A single function turned out to retrieve the cpuid data, so I replaced that function and overrode the relevant field (bits 16-23 of EBX for leaf 1) with the desired value.
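extern "C" int __stdcall CPUID(int a1, int* a2, int* a3, int* a4, int* a5)
{
    int EAX, EBX, ECX, EDX;

    // Execute cpuid for the requested leaf (a1).
    asm("cpuid" : "=a"(EAX), "=b"(EBX), "=c"(ECX), "=d"(EDX) : "a"(a1));

    if(a1 == 1)
    {
        // Leaf 1: clear EBX bits 16-23 (logical processor count) and force them to 2.
        EBX = (EBX & 0xFF00FFFF) | 0x00020000;
    }

    // Hand the (possibly modified) registers back to the caller.
    *a2 = EAX;
    *a3 = EBX;
    *a4 = ECX;
    *a5 = EDX;

    // Assumption: the caller ignores the return value. 0 is arbitrary,
    // but a non-void function must not fall off the end.
    return 0;
}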
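To illustrate why changing that one field changes the detected topology, here is a rough model of how an older HT-aware OS might group logical processors. It is an assumption, not something confirmed from the win2k3 kernel: the count field gives the number of low APIC ID bits to ignore, and processors that match in the remaining bits are treated as HT siblings. All function names are hypothetical, and the APIC IDs 0-7 used in the usage note are assumed, not measured.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Same bit manipulation as the patch: replace EBX[23:16] with `count`.
constexpr std::uint32_t override_logical_count(std::uint32_t ebx,
                                               std::uint32_t count) {
    return (ebx & 0xFF00FFFFu) | ((count & 0xFFu) << 16);
}

// Number of low APIC ID bits needed to number `count` logical processors.
unsigned id_bits(std::uint32_t count) {
    unsigned bits = 0;
    while ((1u << bits) < count) ++bits;
    return bits;
}

// Assumed legacy grouping: logical processors whose APIC IDs are equal
// above the low id_bits(count) bits are treated as HT siblings.
std::map<std::uint32_t, std::vector<std::uint32_t>>
group_siblings(const std::vector<std::uint32_t>& apic_ids,
               std::uint32_t count) {
    std::map<std::uint32_t, std::vector<std::uint32_t>> groups;
    const unsigned shift = id_bits(count);
    for (std::uint32_t id : apic_ids) {
        groups[id >> shift].push_back(id);
    }
    return groups;
}
```

Under this model, with APIC IDs 0 through 7, a count of 16 puts all eight logical processors into one group (one "core" with 8 HT units), while a count of 2 yields four groups of two.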
This patch was applied using my exe modifier utility, and then the original kernel was replaced with my patched version. On reboot the machine still worked, and running the CPU enumeration again now correctly detected 4 cores with two threads each. When testing with 4 threads the correct scheduling was observed: each core was now consistently given a single thread.
ntoskrnl_HT_FIX.rar