Surelock: Deadlock-Free Mutexes for Rust (notes.brooklynzelenka.com)

by codetheweb 84 comments 248 points

[−] jcalvinowens 34d ago
The Level<> abstraction is a really neat way to have your cake and eat it too: you only need a consistent arbitrary order to avoid deadlocks, but the order can have performance consequences when some locks are more coarse than others.

But the example seems backwards to me: unless every callsite that locks any item always locks the big global lock first (probably not true, because if you serialize all item access on a global lock then a per-item lock serves no purpose...), aren't you begging for priority inversions by acquiring the big global lock before you acquire the item lock?

My only gripe is missing the obvious opportunity for Ferengi memes ("rules of acquisition") :D :D

[−] vlovich123 34d ago
There’s no global lock. There’s a linear MutexKey that a lock of Level >= N has to be acquired with. Acquiring it consumes the MutexKey and hands you back a MutexKey where Level is the N of the lock you’re acquiring.

There’s no priority inversion possible because locks can only ever be held in decreasing orders of priority - you can’t acquire a low priority lock and then a high priority lock since your remaining MutexKey won’t have the right level.
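Roughly, the shape is something like this (hypothetical names, not the crate's actual API; the compile-time M > N bound is elided):

```rust
use std::marker::PhantomData;
use std::sync::{Mutex, MutexGuard};

// A Key<M> witnesses that the holder may still acquire locks at levels below M.
struct Key<const M: usize>(PhantomData<()>);

struct LeveledMutex<T, const N: usize> {
    inner: Mutex<T>,
}

impl<T, const N: usize> LeveledMutex<T, N> {
    // Locking consumes your Key<M> (moved, not borrowed) and hands back a
    // Key<N>, so the next lock you acquire must sit at a lower level than N.
    // A real implementation would also enforce M > N at compile time; that
    // const-generic bound needs extra machinery and is elided here.
    fn lock<const M: usize>(&self, _key: Key<M>) -> (MutexGuard<'_, T>, Key<N>) {
        (self.inner.lock().unwrap(), Key(PhantomData))
    }
}
```

Because the key is moved into every lock() call, there is no way to hold two keys at once, which is what rules out acquiring in increasing level order.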

[−] jcalvinowens 34d ago
In the example it seems pretty clear to me that:

    Mutex::new(AppConfig::default());
...is meant to be acquiring a mutex protecting some global config object, yes? That's what I'm calling a "global lock".

> There’s no priority inversion possible because locks can only ever be held in decreasing orders of priority

    T1               T2
    --               --
    small_lock();
                     big_lock();
                     small_lock(); <--- Spins waiting for T1
                 
...and now any other thread that needs big_lock() spins waiting for T2 to release it, but T2 is spinning waiting for T1 to release the (presumably less critical) small lock.

If small_lock is never ever acquired without acquiring big_lock first, small_lock serves no purpose and should be deleted from the program.

[−] vlovich123 34d ago
Mutex::new creates a lock, it doesn’t acquire one.

Look at the API - if big_lock and small_lock are at the same level, you would need to acquire both locks simultaneously, which is accomplished within the library by sorting* the locks and then acquiring. If you fail to acquire small_lock, big_lock isn’t held (it’s an all-or-nothing situation). This exact scenario is explained in the link, by the way. You can’t bypass the “acquire simultaneously” API because you only have a key for one level.

Your terminology is also off. A lock around a configuration is typically called a fine-grained lock unless you’re holding that lock for large swathes of the program. “Global”, in locking terms, doesn’t refer to the visibility of the lock or to the fact that it does mutual exclusion. For example, a lock on a database that only allows one thread into a hot-path operation at a time is a global lock.

* sorting is based on a global construction order assigned at construction - there’s a singleton atomic that hands out an ID for each mutex.
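The singleton-atomic ordering can be sketched like this (an illustrative sketch, not the crate's actual code; names are made up):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Mutex, MutexGuard};

// A singleton atomic hands out a globally unique ID to each mutex at
// construction, so the IDs define one total order across the program.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

struct OrderedMutex<T> {
    id: u64,
    inner: Mutex<T>,
}

impl<T> OrderedMutex<T> {
    fn new(value: T) -> Self {
        OrderedMutex {
            id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
            inner: Mutex::new(value),
        }
    }
}

// "Acquire simultaneously": always lock in ascending-ID order, so two threads
// grabbing the same pair can never each hold one lock while waiting on the other.
fn lock_pair<'a, T>(
    a: &'a OrderedMutex<T>,
    b: &'a OrderedMutex<T>,
) -> (MutexGuard<'a, T>, MutexGuard<'a, T>) {
    if a.id < b.id {
        let ga = a.inner.lock().unwrap();
        let gb = b.inner.lock().unwrap();
        (ga, gb)
    } else {
        let gb = b.inner.lock().unwrap();
        let ga = a.inner.lock().unwrap();
        (ga, gb)
    }
}
```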

[−] jcalvinowens 34d ago
No, the entire point of what I was saying is that big_lock and little_lock are at two different levels.
[−] vlovich123 33d ago
If big lock and little lock are at different levels you won’t have a key at the appropriate level to create an inversion by trying to acquire in the first place.

T2 might “spin” waiting for small_lock, but assuming small_lock is released at some point you’ve not got a deadlock (and by construction it’s impossible for small_lock’s release to be blocked on the acquisition of a lock that depends on big_lock).

That’s the whole point of having a level to the locks and to the key that you have to give up to acquire that lock.

Your terminology is also off. Mutexes are not implemented through spin locks. Acquisition is an atomic operation, and when it fails you call into the futex API (or whatever your OS provides) to have the thread put to sleep until the lock can be acquired.
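For illustration, the non-spinning shape can be sketched with a condvar (a real mutex uses a single atomic plus a futex call; this is just the structure, not how std implements it):

```rust
use std::sync::{Condvar, Mutex};

// A sleeping lock: a failed acquire waits on a condvar, so the kernel parks
// the thread instead of it burning CPU in a spin loop.
struct SleepLock {
    locked: Mutex<bool>,
    cv: Condvar,
}

impl SleepLock {
    fn new() -> Self {
        SleepLock { locked: Mutex::new(false), cv: Condvar::new() }
    }

    fn acquire(&self) {
        let mut locked = self.locked.lock().unwrap();
        while *locked {
            // No spinning: the thread sleeps until a release notifies it.
            locked = self.cv.wait(locked).unwrap();
        }
        *locked = true;
    }

    fn release(&self) {
        *self.locked.lock().unwrap() = false;
        self.cv.notify_one();
    }
}
```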

[−] yencabulator 31d ago
I think what they're trying to say is that sure it's deadlock-free but it might be sacrificing performance.

T2 sits there waiting for small_lock to be available while holding big_lock for a long time.

This bit:

> ...and now any other thread that needs big_lock() spins waiting for T2 to release it, but T2 is spinning waiting for T1 to release the (presumably less critical) small lock.

Which of course leads to conversations like can big_lock be an RWLock, ArcSwap or such.

[−] vlovich123 26d ago
I can’t tell what they’re trying to say and it seemed to primarily be about priority inversion which is precisely impossible in the scheme outlined. This isn’t sacrificing any performance vs any other locking mechanism.

> Which of course leads to conversations like can big_lock be an RWLock, ArcSwap or such.

I’m not sure what you’re trying to say. This blog post is about a mutex type that is guaranteed not to deadlock.

And again, OP is horribly wrong on the terminology - there’s no spinning in any sane system. You ask the kernel to acquire the mutex for you if you fail and the kernel just puts your thread to sleep until the lock can be acquired. So all threads are guaranteed to be making forward progress. The ideal granularity of the locks themselves is completely irrelevant - that’s a domain-specific decision.

[−] jcalvinowens 33d ago
This reply is word salad that completely fails to engage with anything I've actually said to you... please don't waste my time with more LLM generated comments.
[−] vlovich123 33d ago
None of it is LLM generated. You seem to fundamentally not understand how the system outlined in the blog post works and how it prevents deadlocks by construction (ie it’s impossible to write any program that deadlocks if the only mutexes used are from this library). You also seem to lack the appropriate terminology to describe what your concern is, and use terminology in a way that betrays either ignorance or fundamental misunderstanding of what words mean. So you lash out claiming my 100% human-written comment is LLM as a way to distract from said ignorance.

I’ve tried to illuminate your ignorance for you but unfortunately I can’t do your thinking for you.

What I can do is recommend you try to write out the scenario you believe can create a deadlock - maybe then you’ll understand why it’s not possible, and maybe my words will make a little more sense. If instead you succeed, you can open an issue on the author’s open source library and write a blog post explaining their mistake. But until then you’re just unhappy that you don’t understand, and are being willful about remaining uninformed.

[−] bonzini 34d ago
Usually a global lock is a lock that is taken outside all others and is taken for large parts of the runtime (or even, everywhere the thread isn't waiting on a condition variable, file descriptor and the like).

Mutex::new(AppConfig::default()) might very well be a small, leaf mutex.

[−] cryptonector 33d ago

> In the example it seems pretty clear to me that:

> Mutex::new(AppConfig::default());

> ...is meant to be acquiring a mutex protecting some global config object, yes? That's what I'm calling a "global lock".

You could certainly have a global lock at the top-most level, but you're not required to. The example is just an example.

[−] gpm 34d ago

> unless every callsite that locks any item always locks the big global lock first (probably not true, because if you serialize all item access on a global lock then a per-item lock serves no purpose...)

A pattern I've definitely both seen and used is

    let guard1 = datastructure_containing_the_whole_world.lock();
    let guard2 = guard1.subset_of_that_datastructure.lock();
    guard1.unlock();
    // Do expensive work
    guard2.unlock();
Which works to parallelize work so long as guard2 isn't contended... and at least ensures correctness and forward progress the rest of the time.
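In std Rust the explicit unlock() calls become drop(); assuming the inner lock lives behind an Arc so its guard isn't borrowed through the outer guard (the World/subset names here are made up), the same pattern compiles as:

```rust
use std::sync::{Arc, Mutex};

struct World {
    subset: Arc<Mutex<Vec<i32>>>,
}

// Lock outer, lock inner, release outer, then do the expensive work while
// holding only the inner lock.
fn update_subset(world: &Mutex<World>) {
    let guard1 = world.lock().unwrap();
    // Clone the Arc so guard2 isn't borrowed through guard1, which is what
    // lets us drop guard1 early.
    let subset = Arc::clone(&guard1.subset);
    let mut guard2 = subset.lock().unwrap();
    drop(guard1); // other threads can now reach the rest of the world
    guard2.push(4); // stand-in for the expensive work
}
```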
[−] jcalvinowens 33d ago
Agreed! But if you can reverse the acquire order, you can structure the function scopes such that you don't need the explicit unlock() calls, which is a bit nicer IMHO.
[−] EffCompute 34d ago
I really agree with jandrewrogers' point about the insularity of the database domain. While working on a custom C++ engine to handle 10M vectors in minimal RAM, I’ve noticed that many 'mainstream' concurrency patterns simply don't scale when cache-locality is your primary bottleneck.

In the DB world, we often trade complex locking for deterministic ordering or latch-free structures, but translating those to general-purpose app code (like what this Rust crate tries to do) is where the friction happens. It’s great to see more 'DB-style' rigour (like total ordering for locks) making its way into library design.

[−] senderista 33d ago
An example of this is Linux adopting wait-die locks:

https://docs.kernel.org/locking/ww-mutex-design.html

[−] vlovich123 34d ago
I feel like Fuchsia’s DAG approach can still be made deadlock-free at compile time, either by disallowing holding locks from different branches, or by requiring an ordering when that does happen to prevent cycles (ie you can’t acquire them independently, you have to acquire all independent branches as a single group).
[−] eru 34d ago
I agree with the author: it's a shame that TVars aren't catching on in more languages. They are a great idea from the database world, that we could use in the rest of computing, too.
[−] embedding-shape 34d ago
The entire programming (or even computing) ecosystem suffers from this issue where very useful ideas don't always propagate across domains even though they just make a whole lot of sense. I'm not sure if it's because they truly wouldn't work out in practice, or if it's just a discovery/communication thing.

One thing that I think does affect things is that language design discussions tend to be concentrated in their own communities, based on the programming language itself, rather than in one "programming language discussions" place where everyone could more easily cross-pollinate ideas across languages. Luckily, there are some individuals who move between communities without effort, which does lead to some ideas making it across, but it feels like we're missing out on so much evolution and so many ideas from the various languages across the ecosystem.

[−] eru 34d ago

> Luckily, there are some individuals who move between communities without effort, [...]

Oh, many of these travelers spend a lot of effort!

[−] 01HNNWZ0MV43FF 34d ago
It's discovery and communication. Public education for adults is way under-appreciated in many many scopes.
[−] jandrewrogers 34d ago
The cross-fertilization of ideas across computer science domains is more limited than I think people assume. Databases are just one area that contains a lot of good ideas that never seem to leak into other parts of the software world.

Supercomputing is another domain that has deep insights into scalable systems that is famously so insular that ideas rarely cross over into mainstream scalable systems. My detour through supercomputing probably added as much to my database design knowledge as anything I actually did in databases.

[−] ehtbanton 32d ago
I've had this thought myself too. Going off on a slight tangent: I think there's also loads of useful stuff in domains like either of these which maps amazingly well to AI agent system design, but there's such a huge discrepancy between the knowledge bases of the fields that no benefit ever really surfaces.

(Speaking from the perspective of someone who simultaneously loves high-performance compute and agentic AI haha)

[−] twoodfin 34d ago
The canonical industrial explanation “why not” is probably this 2010 piece from Joe Duffy @ Microsoft:

http://joeduffyblog.com/2010/01/03/a-brief-retrospective-on-...

[−] vlovich123 34d ago
I don’t think we read the same thing.

> Models can be pulled along other axes, however, such as whether memory locations must be tagged in order to be used in a transaction or not, etc. Haskell requires this tagging (via TVars) so that side-effects are evident in the type system as with any other kind of monad. We quickly settled on unbounded transactions.

Snip

> In hindsight, this was a critical decision that had far-reaching implications. And to be honest, I now frequently doubt that it was the right call. We had our hearts in the right places, and the entire industry was trekking down the same path at the same time (with the notable exception of Haskell)

So basically not that TM isn’t workable, but unbounded TM is likely a fool’s errand but Haskell’s is bounded TM that requires explicit annotation of memory that will participate in atomicity.

[−] senderista 34d ago
Having worked a bit on a hobby STM in C++ (spun out of a DB startup) I would have to agree. Fully transparent STM that depends on a "sufficiently smart compiler" for an imperative language with unrestricted side effects is hopeless. But I do think that a much humbler version of STM is feasible for C++ or Rust, requiring much more explicit cooperation from the programmer. I haven't worked on this for 3 years but hope to revisit it someday.
[−] vlovich123 33d ago
Haskell still needs TVar and it’s not an imperative language with unrestricted side effects. I think it’s bounded vs unbounded. Side effects make it more complicated perhaps but it sounds like even in a JIT language you could have done it.
[−] senderista 33d ago
It's possible (I've done it) in C++ but there are huge footguns that I'm still ambivalent about. I agree that the bounded/unbounded distinction is the key: data must explicitly be tagged as transactional. One of the benefits of bootstrapping an STM from an existing DB as I did is that this explicitness is already present.
[−] mrkeen 33d ago

> unbounded TM is likely a fool’s errand

It's the whole language, not just the TM code. Other languages have no way of opting out of the TM code, whereas Haskell does.

[−] eru 33d ago
Well, in a sense other languages opt-in to TM code by interfacing with a relational database. Similar to how in Haskell you put TVar in front of the relevant bits to opt-in.

In Haskell it's just more economical and better integrated with the rest of the language and its type system.

[−] imtringued 33d ago
I think it is pretty sad how there are so many modern programming languages coming out that fail to actually do something novel.

The primary advantage of a new programming language is that there is no legacy code to be compatible with. The software ergonomics space has been thoroughly explored, but restricting programs to subsets with useful properties is still an untapped field. It's seen that way because professional programmers are not used to dealing with restrictions to expressive freedom. New languages are supposed to increase freedom.

The vast majority of new programming languages would benefit from the main language being as restricted as possible and then require you to opt into the features selectively.

[−] eru 32d ago

> The primary advantage of a new programming language is that there is no legacy code to be compatible with.

Depends a lot on what you are doing. Clojure and Scala were new languages, but they had a lot of legacy code to stay compatible with. C++ falls in a similar camp.

[−] senderista 34d ago
Intel, MSFT, IBM spent billions from about 2005-2015 trying to make this happen and failed miserably.

https://dl.acm.org/doi/10.1145/1400214.1400228

[−] hackingonempty 34d ago
It is a big reason why I picked Scala3/Zio over Rust for my most recent project.
[−] mamcx 34d ago
Well, what does it mean to truly support TVars?

Is it easy, or hard?

Does it demand a new paradigm at large, or is it only an inconvenience in the few places it's used?

Because if the answer is "it turns the language into Haskell", then it's a big NOPE!

[−] cptroot 34d ago
I appreciate that this appears to be an incremental improvement on Fuchsia's tree_lock, with the sharp edges sanded off. Good work! I hope I won't have to use it :p
[−] lifis 34d ago
I can't understand why address instability is a problem: if a Mutex is moved, then it can't be locked (because you need to hold a borrow while locked, which impedes moving), so using addresses is perfectly fine and there is absolutely no need to use IDs.

Also the fact that it doesn't detect locking the same mutex twice makes no sense: a static order obviously detects that and when locking multiple mutexes at the same level all you need to do is check for equal consecutive addresses after sorting, which is trivial.
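A minimal sketch of that check (illustrative, not the crate's actual code):

```rust
use std::sync::Mutex;

// When acquiring several same-level mutexes in one call: sort by address,
// then any equal consecutive pair is the same mutex being locked twice.
fn has_duplicate<T>(mutexes: &mut [&Mutex<T>]) -> bool {
    mutexes.sort_by_key(|m| *m as *const Mutex<T> as usize);
    mutexes.windows(2).any(|w| std::ptr::eq(w[0], w[1]))
}
```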

Overall it seems like the authors are weirdly both quite competent and very incompetent. This is typical of LLMs, but it doesn't seem LLM-made.

[−] accelbred 34d ago
Most of the deadlocks I've faced are with different processes/devices both waiting on reads from each end of a socket/uart/etc. I've taken to putting timeouts on read calls, though then you have to deal with legitimate long request cycles timing out.
[−] forrestthewoods 34d ago
Hrm. I'm not immediately impressed by the "Level<>" construct. That feels like a lot of new cognitive burden. It's also not at all obvious to me that multiple levels of mutex is a common pattern? I'm not sure I've ever encountered a situation where locking Account also and always requires locking Config? Heaven help you if you have 3 or more levels.

I dunno. I appreciate the opposition to "just be careful". But this feels to me like it's inducing bad design patterns. So it feels like it's wandering down the wrong path.

[−] Groxx 34d ago

> Why a Total Order, Not a DAG?

> This is a deliberate design decision. lock_tree uses a DAG, which lets you declare that branches A and B are independent — neither needs to come before the other. Sounds great, but it has a subtle problem: if thread 1 acquires A then B, and thread 2 acquires B then A, and both orderings are valid in the DAG, you have a deadlock that the compiler happily approved.

Would it be possible to build one at compile time? Static levels seem like they won't let you share code without level-collaboration, so that might be kinda important for larger-scale use.

I don't know enough about Rust's type system to know if that's possible though. Feels like it's pushing into "maybe" territory, like maybe not with just linear types but what about proc macros?

I can definitely see why it's easier to build this way though, and for some contexts that limitation seems entirely fine. Neat library, and nice post :)

[−] electromech 34d ago
I'm intrigued! I was fighting deadlocks in some Java code this week, and I'm working on a Rust project to maybe replace some of that.

One thing I didn't see in the post or the repo: does this work with async code?

I couldn't find the "search" button on Codeberg, and tests/integration.rs didn't have any async.

For embedded, I have had my eye on https://github.com/embassy-rs/embassy (which has an async runtime for embedded) and would love a nice locking crate to go with it.

[−] FpUser 33d ago
I must've been a lucky one. I've been developing software since the 80s, going from directly entering machine code all the way up to enterprise middleware, backends, and various device-control and multimedia game-like systems. In all that time I've only had a single case of deadlock. But it cost me a more-than-24-hour no-sleep marathon trying to nail it down. It was related to communication between my custom DirectShow filters and threads in the main software.
[−] avadodin 33d ago
I'm not a rust expert, but couldn't this be solved like memory safety? A few checks here and there.
[−] 0x1ceb00da 34d ago
What is the "graph" view on the right side?
[−] rowanG077 34d ago
That's pretty awesome. Deadlocks are extremely tough to debug. There are even cases where I saw behavior in code that might have been a deadlock. I never found out though.
[−] airstrike 34d ago
[flagged]