“Tear”able Puns, and Worse Ideas: A Minimally Thread-Safe Cell
If you’ve ever thought “Using a Mutex<[f32; 3]> is pointless, what’s the worst thing that could happen here?”, I’ve written the Cell type that empowers you to find out the answer to that question yourself.
tearor::TearCell is a “thread safe” cell which downgrades data races into data corruption. Among other things, it’s an attempt to answer the question “How little can I care about thread safety while still calling my code thread-safe?”.
It does this by interpreting the data inside its inner UnsafeCell as a &[AtomicU8], and performing all operations on that instead of on the actual value. That is, it performs multiple tearing reads/writes, each of which is itself thread-safe.
This is pretty dodgy and can go wrong in all kinds of ways, but it is still technically thread safe, as there are not really any data races happening. (It only looks like these are data races, officer, the operations are perfectly well-sequenced, I swear)
Anyway, code implementing the core idea (stripped of optimizations and other complications) is below. I’ll go over it step-by-step, both because it’s small and because I still haven’t decided what level of detail to go into in a post like this.
The implementation
Note that the real code is a bit more robust and complex, and will copy larger structures using larger atomic operations than u8 if they’re sufficiently aligned. (However, note that this means we really do need the full breadth of the restrictions imposed by Pod, as we’ll get to later)
So, in an ideal world we’d declare this as something like:
#[repr(transparent)]
pub struct TearCell<T> {
    buffer: [AtomicU8; size_of::<T>()],
    _boo: PhantomData<T>,
}
This would actually let us implement everything using safe code (aside from some operations provided by bytemuck, which use unsafe internally but are easily proven sound).
However, as it is, you’re not currently allowed to use size_of in this manner, and so we actually implement things by converting a &core::cell::UnsafeCell<T> to a &[AtomicU8] on every access.
use bytemuck::Pod;
use core::sync::atomic::{AtomicU8, Ordering};
#[repr(transparent)]
pub struct TearCell<T>(core::cell::UnsafeCell<T>);
impl<T: Pod> TearCell<T> {
    /// All (potentially) shared access to `self.0` goes through this function.
    #[inline]
    fn atom_slice(&self) -> &[AtomicU8] {
        // Note: the real code handles ZSTs, and avoids ever calling
        // (the equivalent to) this function for them.
        assert!(core::mem::size_of::<T>() != 0);
        // AtomicU8 isn't Copy, and thus bytemuck can't do this for us.
        // This is clearly sound though, as we never view the data
        // inside the UnsafeCell as anything other than `&[AtomicU8]`.
        unsafe {
            core::slice::from_raw_parts(
                self.0.get() as *const AtomicU8,
                core::mem::size_of::<T>(),
            )
        }
    }
}
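One thing the snippet above glosses over: UnsafeCell<T> isn’t Sync, so without an explicit (and unsafe) Sync impl a TearCell couldn’t be shared between threads at all. The real crate takes care of this; a minimal sketch of what’s needed, plus a trivial constructor so the example at the end of the post has something to call, might look like:

// Sketch: not copied from the crate, just the minimum needed for the
// snippets in this post to actually be shareable across threads.
unsafe impl<T: Pod> Sync for TearCell<T> {}

impl<T: Pod> TearCell<T> {
    pub fn new(value: T) -> Self {
        Self(core::cell::UnsafeCell::new(value))
    }
}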
I’m using bytemuck because it meets the requirements of: being simple, available on play.rust-lang.org, and doing what I need. I also have contributed enough to it that it no longer sets off my internal NIH siren.
All operations are bounded by bytemuck::Pod. This is basically a much stronger cousin of Copy. Not only are all Pod types Copy, but (among other things) they must:
- have no padding bytes (no (u8, u16), for example).
- have no uninitialized bytes (no internal use of MaybeUninit).
- have no invalid bit patterns (no bool, char, or &T, for example).
Further, they also must be repr(C) or repr(transparent), as repr(Rust) structs can’t actually guarantee all these things.
(There are some other requirements too, feel free to read the docs if you’re interested. Also note that there’s some effort to standardize similar marker traits, but at the time of writing this, that has not gotten far enough to replace bytemuck here.)
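To make those requirements concrete, here’s a hypothetical example type (mine, not something from the crate) that would qualify:

use bytemuck::{Pod, Zeroable};

// repr(C), no padding (three f32s pack tightly), no uninitialized bytes,
// and every bit pattern is a valid value, so this can soundly be `Pod`.
// (bytemuck also has derive macros for these, behind its `derive` feature.)
#[derive(Copy, Clone)]
#[repr(C)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

unsafe impl Zeroable for Vec3 {}
unsafe impl Pod for Vec3 {}

// By contrast, a (u8, u16) has a padding byte, and a struct containing a
// bool has invalid bit patterns, so neither could implement Pod.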
Strictly speaking, in the implementation provided here, I believe we could almost use a more relaxed bound than Pod: internal padding bytes, other uninitialized fields, and repr(Rust) structs would probably be fine for us. While we would sometimes be reading from an uninitialized byte, we’d then be writing it to one that’s expected to be uninitialized. The current language of the unsafe-code-guidelines seems to imply this is allowed, but I’m not fully certain. That said, it’s dodgy either way… and requiring Pod makes it easier to verify as sound.
Reading
Alright, now for load:
impl<T: Pod> TearCell<T> {
    #[inline]
    pub fn load(&self) -> T {
        // Note: Using MaybeUninit here would be better, but `Pod` gives us
        // access to `zeroed`, and `MaybeUninit` is still kind of a pain to
        // use for arrays/slices until more things stabilize.
        let mut dst = T::zeroed();
        let dst_bytes: &mut [u8] = bytemuck::bytes_of_mut(&mut dst);
        let src: &[AtomicU8] = self.atom_slice();
        // This can never fail, and makes `zip` produce better code...
        assert_eq!(dst_bytes.len(), src.len());
        for (d, s) in dst_bytes.iter_mut().zip(src.iter()) {
            *d = s.load(Ordering::Relaxed);
        }
        dst
    }
}
This is mostly what you’d expect:
- Create the destination T we’ll ultimately return.
- Get its bytes as a &mut [u8].
- Get the source bytes using atom_slice as shown above.
- Iterate over both slices, initializing each destination byte with an atomic load from the corresponding source byte.
One point to note is that the Ordering we use doesn’t matter, since fundamentally all access to TearCell is nonatomic, so we use the weakest we can.
Writing
All that’s left is store:
impl<T: Pod> TearCell<T> {
    #[inline]
    pub fn store(&self, value: T) {
        let src: &[u8] = bytemuck::bytes_of(&value);
        let dst: &[AtomicU8] = self.atom_slice();
        assert_eq!(dst.len(), src.len());
        for (d, s) in dst.iter().zip(src.iter()) {
            d.store(*s, Ordering::Relaxed);
        }
    }
}
This is essentially the opposite of load.
- Get a slice containing the bytes of the provided value.
- Get our atom_slice.
- Copy the data into the atoms, using Relaxed stores (see above for why Relaxed is used).
Note that what bytemuck::bytes_of does here is essentially equivalent to what you see in atom_slice above. (If it’s easier to follow, I can avoid leveraging libraries in the future; let me know.)
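To make the tearing itself concrete, here’s a small, hypothetical demonstration (it assumes the new constructor sketched earlier) of what can happen when two threads hammer on the same cell using the simplified byte-at-a-time code from this post. (The real crate would copy an aligned u64 with one wider atomic operation, so it wouldn’t tear in this particular case.)

use std::sync::Arc;
use std::thread;

fn main() {
    let cell = Arc::new(TearCell::new(0u64));

    // Two writers repeatedly store all-zeros and all-ones.
    let writers: Vec<_> = [0u64, u64::MAX]
        .iter()
        .map(|&v| {
            let cell = Arc::clone(&cell);
            thread::spawn(move || {
                for _ in 0..1_000_000 {
                    cell.store(v);
                }
            })
        })
        .collect();

    // Meanwhile, a reader can observe values that neither writer ever
    // stored: some bytes from one store mixed with bytes from the other.
    for _ in 0..1_000_000 {
        let v = cell.load();
        if v != 0 && v != u64::MAX {
            println!("torn read: {:#018x}", v);
        }
    }

    for w in writers {
        w.join().unwrap();
    }
}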
Wrapping up
And we’re done!
I don’t really think this will be useful for anybody, and it’s pretty dodgy code, but it’s certainly… interesting.
The real crate is more efficient than this, and slightly less dubious, but is essentially the same idea.
If you end up using a TearCell for anything (and you should really make sure you’re fully aware of why it might be a mistake…), then I’d be interested in your use case!