X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/0c29fd73b78684e99915dd47718c5475ff197f09..f97f7a70b9ff9851a1ea4b22581e565a5b2b06a7:/personal/_posts/2025-07-24-memory-safety.md diff --git a/personal/_posts/2025-07-24-memory-safety.md b/personal/_posts/2025-07-24-memory-safety.md index 3c1e48a..26a2427 100644 --- a/personal/_posts/2025-07-24-memory-safety.md +++ b/personal/_posts/2025-07-24-memory-safety.md @@ -89,7 +89,7 @@ They even developed the [first industrially deployed concurrency memory model](h The result of all of this work is that in a concurrent Java program, you might see unexpected outdated values for certain variables, such as a null pointer where you expected the reference to be properly initialized, but you will *never* be able to actually break the language and dereference an invalid dangling pointer and segfault at address `0x2a`. In that sense, all Java programs are thread-safe.[^java-safe] -[^java-safe]: Java programmers will sometimes use the terms "thread safe" and "memory safe" differently than C++ or Rust programmers would. From a Rust perspective, Java programs are memory- and thread-safe by construction. Java programmers take that so much for granted that they use the same term to refer to stronger properties, such as absence of null pointer exception or absence of unintended data races. However, Java null pointer exceptions and data races cannot cause segfaults from invalid pointer uses, so these kinds of issues are qualitatively very different from the memory safety violation in my Go example. For the purpose of this blog post, I am using the low-level Rust and C++ meaning of these terms. +[^java-safe]: Java programmers will sometimes use the terms "thread safe" and "memory safe" differently than C++ or Rust programmers would. From a Rust perspective, Java programs are memory- and thread-safe by construction. Java programmers take that so much for granted that they use the same term to refer to stronger properties, such as not having "unintended" data races or not having null pointer exceptions. However, such bugs cannot cause segfaults from invalid pointer uses, so these kinds of issues are qualitatively very different from the memory safety violation in my Go example. For the purpose of this blog post, I am using the low-level Rust and C++ meaning of these terms. Generally, there are two options a language can pursue to ensure that concurrency does not break basic invariants: - Ensure that arbitrary concurrent programs actually behave "reasonably" in some sense. This comes at a significant cost, restricting the language to never assume consistency of multi-word values and limiting which optimizations the compiler can perform. This is the route most languages take, from Java to C#, OCaml, JavaScript, and WebAssembly. @@ -99,11 +99,13 @@ Go, unfortunately, chose to do neither of these. This means it is, strictly speaking, not a memory safe language: the best the language can promise is that *if* a program has no data races (or more specifically, no data races on problematic values such as interfaces, slices, and maps), then its memory accesses will never go wrong. Now, to be fair, Go comes with out-of-the-box tooling to detect data races, which quickly finds the issue in my example. However, in a real program, that means you have to hope that your test suite covers all the situations your program might encounter in practice, which is *exactly* the sort of issue that a strong type system and static safety guarantees are intended to avoid. -It is therefore not surprising that [data races are a huge problem in Go](https://dl.acm.org/doi/pdf/10.1145/3519939.3523720). +It is therefore not surprising that [data races are a huge problem in Go](https://arxiv.org/pdf/2204.00764), +and there is at least [anecdotal evidence of actual memory safety violations](https://www.reddit.com/r/rust/comments/wbejky/comment/ii9piqe). I could accept Go's choice as an engineering trade-off, aimed at keeping the language simpler. However, putting Go into the same bucket as languages that actually *did* go through the effort of solving the problem with data races misrepresents the safety promises of the language. Even experienced Go programmers do not always realize that you can break memory safety without using any unsafe operations or exploiting any compiler or language bugs. +Go is a language *designed* for concurrent programming, so people do not expect footguns of this sort. I think that is a problematic blind spot. The [Go memory model documentation](https://go.dev/ref/mem) is not exactly upfront about this point either: the "Informal Overview" emphasizes that "most races have a limited number of outcomes" and remarks that Go is unlike "C and C++, where the meaning of any program with a race is entirely undefined". @@ -126,6 +128,7 @@ There's no meaningful sense in which this can be further subdivided into memory In practice, of course, safety is not binary, it is a spectrum, and on that spectrum Go is much closer to a typical safe language than to C. It is plausible that UB caused by data races is less useful for attackers than UB caused by direct out-of-bounds or use-after-free accesses. But at the same time I think it is important to understand which safety guarantees a language reliably provides, and where the fuzzy area of trade-offs begins. +I am in the business of [*proving*](https://research.ralfj.de/thesis.html) safety claims of languages, and for Go, there's not much one could actually prove. I hope this post helps you to better understand some the non-trivial consequences of the choices different languages have made.[^go] :) [^go]: In case you are wondering why I am focusing on Go so much here... well, I simply do not know of any other language that claims to be memory safe, but where memory safety can be violated with data races. I originally wanted to write this blog post years ago, when Swift was pretty much in the same camp as Go in this regard, but Swift has meanwhile introduced "strict concurrency" and joined Rust in the small club of languages that use fancy type system techniques to deal with concurrency issues. That's awesome! Unfortunately for Go, that means it is the only language left that I can use to make my point here. This post is not meant to bash Go, but it is meant to put a little-known weakness of the language into the spotlight, because I think it is an instructive weakness.