“Tear”able Puns, and Worse Ideas: A Minimally Thread-Safe Cell

5 minute read

If you’ve ever thought “Using a Mutex<[f32; 3]> is pointless, what’s the worst thing that could happen here?”, I’ve written the Cell type that empowers you to find out the answer to that question yourself.

tearor::TearCell is a “thread safe” cell which downgrades data races into data corruption. Among other things, it’s an attempt to answer the question “How little can I care about thread safety while still calling my code thread-safe?”.

It does this by interpreting the data inside it’s inner UnsafeCell as a &[AtomicU8], and performing all operations on that instead of the actual value. That is, it performs multiple tearing reads/writes, each one itself is thread-safe.

This is pretty dodgy, and can go wrong all kinds of ways, but is still technically thread safe, as there are not really any data races happening. (It only looks like these are data races officer, the operations are perfectly well-sequenced, I swear)

Anyway, code implementing the core idea (stripped of optimizations and other complications) is below, I’ll go over it step-by-step, because it’s small, and I still haven’t decided what level of detail to go into in a post like this.

The implementation

Note that the real code is a bit more robust and complex, and will copy larger structures using larger atomic operations than u8 if they’re sufficiently aligned. (However, note that this means we really do need the full breadth of the restrictions imposed by Pod, as we’ll get to later)

So, in an ideal world we’d declare this as something like:

#[repr(transparent)]
pub struct TearCell<T> {
    buffer: [AtomicU8; size_of::<T>()],
    _boo: PhantomData<T>,
}

This would actually let us implement everything using safe code (aside from some operations provided by bytemuck, which use unsafe internally but are easily proven sound).

However, as it is, you’re not currently allowed to use size_of in this manner, and so we actually implement things by converting a &core::cell::UnsafeCell<T> to a &[AtomicU8] on every access.

use bytemuck::Pod;
use core::sync::atomic::{AtomicU8, Ordering};

#[repr(transparent)]
pub struct TearCell<T>(core::cell::UnsafeCell<T>);

impl<T: Pod> TearCell<T> {
    /// All (potentially) shared access to `self.0` goes through this function.
    #[inline]
    fn atom_slice(&self) -> &[AtomicU8] {
        // Note: the real code handles ZSTs, and avoids ever calling
        // (the equivalent to) this function for them.
        assert!(core::mem::size_of::<T>() != 0);
        // AtomicU8 isn't Copy, and thus bytemuck can't do this for us.
        // This is clearly sound though, as we never view the data
        // inside the UnsafeCell as anything other than `&[AtomicU8]`.
        unsafe {
            core::slice::from_raw_parts(
                self.0.get() as *const AtomicU8,
                core::mem::size_of::<T>()
            )
        }
    }
}

I’m using bytemuck because it meets the requirements of: being simple, available on play.rust-lang.org, and doing what I need. I also have contributed enough to it that it no longer sets off my internal NIH siren.

All operations are bounded on bytemuck::Pod. This is basically a much stronger cousin of Copy. Not only are all Pod types Copy, but (among other things) must:

have no padding bytes (no (u8, u16), for example).
have no uninitialized bytes (no internal use of MaybeUninit).
have no invalid bitpatterns (no bool, char, or &T for example).

Further, they also must be repr(C) or repr(transparent), as repr(Rust) structs can’t actually guarantee all these things.

(There are some other requirements too, feel free to read the docs if you’re interested. Also note that there’s effort to some standardize similar marker traits, but at the time of writing this, that has not gotten far enough to replace bytemuck here).

Strictly speaking, in the implementation provided here, I believe we almost could use a more relaxed bound than Pod. I believe internal padding bytes and other uninitialized fields, and repr(Rust) structs would be fine for us.

While we would be reading from an uninitialized byte sometimes, we’d then be writing it to one that’s expected to be uninitialized. The current language of the unsafe-code-guidelines seems to allow this implies that’s true, but I’m not fully certain. That said, it’s dodgy either way… and requiring Pod makes it easier to verify as sound.

Reading

Alright, now for load:

impl<T: Pod> TearCell<T> {
    #[inline]
    pub fn load(&self) -> T {
        // Note: Using MaybeUninit here would be better, but `Pod` gives us
        // access to `zeroed`, and `MaybeUninit` is still kind of a pain to
        // use for arrays/slices until more things stabilize.
        let mut dst = T::zeroed();
        let dst_bytes: &mut [u8] = bytemuck::bytes_of_mut(&mut dst);
        let src: &[AtomicU8] = self.atom_slice();
        // This can never fail, and makes `zip` produce better code...
        assert_eq!(dst_bytes.len(), src.len());

        for (d, s) in dst_bytes.iter_mut().zip(src.iter()) {
            *d = s.load(Ordering::Relaxed);
        }

        dst
    }
}

This is mostly what you’d expect:

Create destination T we’ll ultimately return,
Get it’s bytes as a &mut [u8].
Get the source bytes using atom_slice as shown above.
Iterate over both source slices, initializing each destination byte using a byte read from an atomic loads from the source byte.

One point to note is that the Ordering we use doesn’t matter, since fundamentally all access to TearCell is nonatomic, so we use the weakest we can.

Writing

All that’s left is store:

impl<T: Pod> TearCell<T> {
    #[inline]
    pub fn store(&self, value: T) {
        let src: &[u8] = bytemuck::bytes_of(&value);
        let dst: &[AtomicU8] = self.atom_slice();

        assert_eq!(dst_bytes.len(), src.len());

        for (d, s) in dst.iter().zip(src.iter()) {
            d.store(*s, Ordering::Relaxed);
        }
    }
}

This is essentially the opposite of load.

Get a slice containing the bytes of the provided value.
Get our atom_slice.
Copy the data into the atoms, using Relaxed stores (see above for why Relaxed is used).

Note that the operation bytemuck::bytes_of does here is essentially equivalent to what you see in the atom_slice above. (If it’s easier to follow I can avoid leveraging libraries in the future, let me know).

Wrapping up

And we’re done!

I don’t really think this will be useful for anybody, and it’s pretty dodgy code, but it’s certainly… interesting.

The real crate is more efficient than this, and slightly less dubious, but is essentially the same idea.

If you end up using a TearCell for anything (and you should really make sure you’re fully aware of why it might be a mistake…), then I’d be interested in your use case!

Anyway, it’s on crates.io, and github.

Twitter Facebook LinkedIn

Thom Chiovoloni

“Tear”able Puns, and Worse Ideas: A Minimally Thread-Safe Cell

The implementation

Reading

Writing

Wrapping up

You May Also Enjoy

I wish `#[non_exhaustive]` worked with struct update syntax

Why is there no realloc that takes the number of bytes to copy?

Allocator trait 1: Let’s talk about the Allocator trait

Miri Feature Wishlist

Thom Chiovoloni

The implementation

Reading

Writing

Wrapping up

You May Also Enjoy

I wish #[non_exhaustive] worked with struct update syntax

Why is there no realloc that takes the number of bytes to copy?

Allocator trait 1: Let’s talk about the Allocator trait

Miri Feature Wishlist

I wish `#[non_exhaustive]` worked with struct update syntax