avoid vague 'reasonably', mention memory corruption

diff --git a/personal/_posts/2025-07-24-memory-safety.md b/personal/_posts/2025-07-24-memory-safety.md
index 26a2427754f44102766a41798208992c6c62424b..ad38e76b43a210ae775675161ffa6fe30fe3a1bb 100644
--- a/personal/_posts/2025-07-24-memory-safety.md
+++ b/personal/_posts/2025-07-24-memory-safety.md
@@ -78,7 +78,7 @@ Every time `repeat_swap` stores a new value in `globalVar`, it just does two sep
  In `repeat_get`, there's thus a small chance that when we read `globalVar` *in between* those two stores, we get a mix of a pointer to an `Int` with the vtable for a `Ptr`.
  When that happens, we will run the `Ptr` version of `get`, which will dereference the `Int`'s `val` field as a pointer -- and hence the program accesses address 42, and crashes.
  
-One could construct a similar example using Go's slices, where the data pointer, length, and capacity of the slice are stored in separate words, and reading a half-updated value can lead to an out-of-bounds access.
+One could easily turn this example into a function that casts an integer to a pointer, and then cause arbitrary memory corruption.
  
  ## What about other languages?
  
@@ -92,22 +92,25 @@ In that sense, all Java programs are thread-safe.[^java-safe]
  [^java-safe]: Java programmers will sometimes use the terms "thread safe" and "memory safe" differently than C++ or Rust programmers would. From a Rust perspective, Java programs are memory- and thread-safe by construction. Java programmers take that so much for granted that they use the same term to refer to stronger properties, such as not having "unintended" data races or not having null pointer exceptions. However, such bugs cannot cause segfaults from invalid pointer uses, so these kinds of issues are qualitatively very different from the memory safety violation in my Go example. For the purpose of this blog post, I am using the low-level Rust and C++ meaning of these terms.
  
  Generally, there are two options a language can pursue to ensure that concurrency does not break basic invariants:
-- Ensure that arbitrary concurrent programs actually behave "reasonably" in some sense. This comes at a significant cost, restricting the language to never assume consistency of multi-word values and limiting which optimizations the compiler can perform. This is the route most languages take, from Java to C#, OCaml, JavaScript, and WebAssembly.
+- Ensure that arbitrary concurrent programs still uphold the typing discipline and key language invariants. This comes at a significant cost, restricting the language to never assume consistency of multi-word values and limiting which optimizations the compiler can perform. This is the route most languages take, from Java to C#, OCaml, JavaScript, and WebAssembly.[^multi-word]
  - Have a strong enough type system to fully rule out data races on most accesses, and pay the cost of having to safely deal with races for only a small subset of memory accesses. This is the approach that Rust first brought into practice, and that Swift is now also adopting with their ["strict concurrency"](https://developer.apple.com/documentation/swift/adoptingswift6).
  
+[^multi-word]: Some hardware supports larger-than-pointer-sized atomic accesses, which could be used to ensure consistency of multi-word values. However, Go slices are three pointers large, and as far as I know no hardware supports atomic accesses which are *that* big.
+
  Go, unfortunately, chose to do neither of these.
  This means it is, strictly speaking, not a memory safe language: the best the language can promise is that *if* a program has no data races (or more specifically, no data races on problematic values such as interfaces, slices, and maps), then its memory accesses will never go wrong.
  Now, to be fair, Go comes with out-of-the-box tooling to detect data races, which quickly finds the issue in my example.
  However, in a real program, that means you have to hope that your test suite covers all the situations your program might encounter in practice, which is *exactly* the sort of issue that a strong type system and static safety guarantees are intended to avoid.
  It is therefore not surprising that [data races are a huge problem in Go](https://arxiv.org/pdf/2204.00764),
-and there is at least [anecdotal evidence of actual memory safety violations](https://www.reddit.com/r/rust/comments/wbejky/comment/ii9piqe).
-
-I could accept Go's choice as an engineering trade-off, aimed at keeping the language simpler.
-However, putting Go into the same bucket as languages that actually *did* go through the effort of solving the problem with data races misrepresents the safety promises of the language.
+and there is at least [anecdotal evidence of actual memory safety violations](https://old.reddit.com/r/rust/comments/wbejky/a_succinct_comparison_of_memory_safety_in_rust_c/iid990t/?context=2).
  Even experienced Go programmers do not always realize that you can break memory safety without using any unsafe operations or exploiting any compiler or language bugs.
  Go is a language *designed* for concurrent programming, so people do not expect footguns of this sort.
  I think that is a problematic blind spot.
  
+Of course, as all things in language design, in the end this is a trade-off.
+Go made the simplest possible choice here, which is entirely in line with the general design of the language.
+There's nothing fundamentally wrong with that.
+However, putting Go into the [same bucket](https://www.memorysafety.org/docs/memory-safety/) as languages that actually *did* go through the effort of solving the problem with data races misrepresents the safety promises of the language.
  The [Go memory model documentation](https://go.dev/ref/mem) is not exactly upfront about this point either: the "Informal Overview" emphasizes that "most races have a limited number of outcomes" and remarks that Go is unlike "C and C++, where the meaning of any program with a race is entirely undefined".
  You could say that the use of "most" here is foreshadowing, but this section does not list any cases where the number of outcomes is unlimited, so this is easy to miss.
  They even go so far as to claim that Go is "more like Java or JavaScript", which I think is rather unfair, given the lengths to which those languages went to achieve the thread safety they have.
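
As an aside to the `multi-word` footnote in the hunk above: the claim that interface values and slices span several machine words can be checked with a small sketch. The helper names `interfaceWords` and `sliceWords` are hypothetical and not part of the post, and `unsafe.Sizeof` is used here only to inspect sizes, not to break safety. The layout is an implementation detail of the toolchain rather than something the language specification promises.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wordSize is the size of one pointer-sized machine word on this platform.
const wordSize = unsafe.Sizeof(uintptr(0))

// interfaceWords reports how many words an interface value occupies.
// (Hypothetical helper for illustration.)
func interfaceWords() uintptr {
	var v any
	return unsafe.Sizeof(v) / wordSize
}

// sliceWords reports how many words a slice header occupies:
// data pointer, length, and capacity. (Hypothetical helper for illustration.)
func sliceWords() uintptr {
	var s []int
	return unsafe.Sizeof(s) / wordSize
}

func main() {
	fmt.Println("words in an interface value:", interfaceWords())
	fmt.Println("words in a slice header:", sliceWords())
}
```

With the standard Go toolchain this should report two words for the interface and three for the slice. An unsynchronized store to such a value is therefore several separate word-sized writes, so a concurrent reader can observe a half-updated value.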