Torvalds blasts AWS engineer’s Linux kernel patch as “beyond stupid” for flushing L1D on context switches, reverts it
It seldom happens that code which made it to the final beta release is trashed, but who can argue with Linus Torvalds when it is Linux we are talking about? This happened with the Linux kernel code that was to be released as part of Linux 5.8. The culprit was a patch from AWS engineer Balbir Singh that was to provide “an opt-in (prctl driven) mechanism to flush the L1D cache on context switch.” The patch was supposed to be included in Linux 5.8, but Torvalds put his foot down and removed it.
Torvalds not only removed it but also added that the idea of including this patch was beyond stupid. Here is the commentary he posted to the kernel mailing list about this change in the x86/mm pull request:
Am I mis-reading this?
Because it looks to me like this basically exports cache flushing instructions to user space, and gives processes a way to just say “slow down anybody else I schedule with too”.
I don’t see a way for a system admin to say “this is stupid, don’t do it”.
In other words, from what I can tell, this takes the crazy “Intel ships buggy CPU’s and it causes problems for virtualization” code (which I didn’t much care about), and turns it into “anybody can opt in to this disease, and now it affects even people and CPU’s that don’t need it and configurations where it’s completely pointless”.
To make matters worse, it has that SW flushing fallback that isn’t even architectural from what I remember of the last time it was discussed, but most certainly will waste a lot of time going through the motions that may or may not flush the L1D after all.
I don’t want some application to go “Oh, I’m _soo_ special and pretty and such a delicate flower, that I want to flush the L1D on every task switch, regardless of what CPU I am on, and regardless of whether there are errata or not”.
Because that app isn’t just slowing down itself, it’s slowing down others too.
I have a hard time following whether this might all end up being predicated on the STIBP static branch conditionals and might thus at least be limited only to CPU’s that have the problem in the first place.
But I ended up unpulling it because I can’t figure that out, and the explanations in the commits don’t clarify (and do imply that it’s regardless of any other errata, since it’s for “undiscovered future errata”).
Because I don’t want a random “I can make the kernel do stupid things” flag for people to opt into. I think it needs a double opt-in.
At a _minimum_, SMT being enabled should disable this kind of crazy pseudo-security entirely, since it is completely pointless in that situation. Scheduling simply isn’t a synchronization point with SMT on, so saying “sure, I’ll flush the L1 at context switch” is beyond stupid.
I do not want the kernel to do things that seem to be “beyond stupid”.
Because I really think this is just PR and pseudo-security, and I think there’s a real cost in making people think “oh, I’m so special that I should enable this”.
I’m more than happy to be educated on why I’m wrong, but for now I’m unpulling it for lack of data.
Maybe it never happens on SMT because of all those subtle static branch rules, but I’d really like to that to be explained.
The patch by Balbir Singh was meant to mitigate the snoop-assisted L1 data sampling vulnerability, which could let an attacker's malware infer data by inspecting the cache. The patch offered an opt-in via new prctl options and was not enabled by default. It could have helped those concerned about snoop-assisted data sampling, cache leakage via side channels, and yet-to-be-uncovered CPU vulnerabilities. For the time being, however, Linux creator Linus Torvalds is not convinced of the patch's usefulness.
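To make the "opt-in via prctl" idea concrete, here is a minimal sketch of how a process might query and request such a flush. The article does not give the option names, so this assumes the speculation-control constants found in `<linux/prctl.h>` on later mainline kernels (`PR_SET_SPECULATION_CTRL` with `PR_SPEC_L1D_FLUSH`); the interface proposed for 5.8 may have differed. The helper names (`l1d_flush_status`, `request_l1d_flush`, `describe_status`) are hypothetical and exist only for illustration.

```python
import ctypes
import errno

# Assumption: values as defined in <linux/prctl.h> on later mainline kernels.
# The 5.8-era patch discussed in the article may have used other names/values.
PR_GET_SPECULATION_CTRL = 52
PR_SET_SPECULATION_CTRL = 53
PR_SPEC_L1D_FLUSH = 2           # selects the L1D-flush control

# Bits in the value returned by PR_GET_SPECULATION_CTRL.
PR_SPEC_PRCTL = 1 << 0          # per-task control via prctl is available
PR_SPEC_ENABLE = 1 << 1
PR_SPEC_DISABLE = 1 << 2
PR_SPEC_FORCE_DISABLE = 1 << 3


def _prctl(option, arg2, arg3):
    """Call prctl(2) through libc and return (result, errno)."""
    try:
        prctl = ctypes.CDLL(None, use_errno=True).prctl
    except (OSError, AttributeError, TypeError):
        return -1, errno.ENOSYS  # platform without prctl
    prctl.restype = ctypes.c_int
    ctypes.set_errno(0)
    res = prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                ctypes.c_ulong(arg3), ctypes.c_ulong(0), ctypes.c_ulong(0))
    err = (ctypes.get_errno() or errno.EINVAL) if res < 0 else 0
    return res, err


def l1d_flush_status():
    """Return the status bitmask, or a negative errno if the query fails."""
    res, err = _prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, 0)
    return res if res >= 0 else -err


def request_l1d_flush():
    """Ask the kernel to flush L1D when this task is switched out.

    Returns 0 on success or a negative errno if the kernel refuses.
    """
    res, err = _prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH,
                      PR_SPEC_ENABLE)
    return 0 if res == 0 else -err


def describe_status(status):
    """Turn a status value from l1d_flush_status() into a short label."""
    if status < 0:
        return "unsupported (%s)" % errno.errorcode.get(-status, "unknown error")
    if status & PR_SPEC_FORCE_DISABLE:
        return "force-disabled"
    if status & PR_SPEC_ENABLE:
        return "enabled"
    if status & PR_SPEC_DISABLE:
        return "disabled"
    return "not affected"


if __name__ == "__main__":
    # Only queries the current state; it does not opt in.
    print(describe_status(l1d_flush_status()))
```

The script only reads the current state; `request_l1d_flush()` has to be called explicitly, and on kernels or CPUs without this control it simply returns a negative errno. That per-process switch is exactly what Torvalds objects to above: one task opting in also imposes a cost on whatever runs after it.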
Singh can take comfort from the fact that Linus just made him world-famous. Also, the patch may still make it into the Linux kernel in the future.