In other words, rule 3 (the inline asm block must refine the story code) is violated.
The principled fix for this is to extend the C++ memory model (which is shared by Rust) with a notion of non-temporal stores so that one can reason about how they interact with everything else that can go on in a concurrent program.
-I am not aware of anyone having actually done this---extending or even just slightly adjusting the C++ memory model is an [enormous undertaking](https://plv.mpi-sws.org/scfix/paper.pdf).
+This is [possible](https://people.mpi-sws.org/~viktor/papers/oopsla2024-inline.pdf), but it requires re-proving compiler correctness results, and at least the approach taken in that paper is architecture-specific which does not scale to the large number of architectures Rust supports.
However, there is a simpler alternative: we can try coming up with a more complicated story such that rule 3 is not violated.
This is exactly what a bunch of folks did when the issue around non-temporal stores was discovered.
The story says that doing a non-temporal store corresponds to *spawning a thread* that will asynchronously perform the actual store,