@rom1v Very nice writeup!

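I think your benchmark is flawed because your CPU doesn’t really have 8 physical cores, only 4 with 2 hardware threads each. The two threads of a core share its execution units, so for CPU-bound work they add little beyond what the core already delivers. That’s why you stay close to a 4x speedup.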
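If you want to check this on your machine, here is a minimal sketch, assuming Linux and a `/proc/cpuinfo` that exposes `physical id` and `core id` fields (some architectures don't, in which case it reports "unknown"). The helper name `count_physical_cores` is mine, not something from the post. It compares the number of logical CPUs with the number of distinct physical cores:

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    physical_id = core_id = None
    # The trailing "" makes sure the last block is flushed too.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the block describing one logical CPU.
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


if __name__ == "__main__":
    logical = os.cpu_count()  # logical CPUs, i.e. hardware threads
    try:
        with open("/proc/cpuinfo") as f:
            physical = count_physical_cores(f.read())
    except OSError:
        physical = 0  # not Linux, or /proc is unavailable
    print(f"logical CPUs: {logical}, physical cores: {physical or 'unknown'}")
```

If the first number is twice the second, the benchmark is effectively running on half as many real cores as it has threads.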

