8 minute read

Miri is an evaluator for rust that detects many kinds of undefined behavior. Here are some things it could (possibly) do that would catch bugs I’ve had/seen before.

If you write Rust, especially unsafe rust, and aren’t familiar with miri, you should change that. It’s easy to install and use, and extremely useful for finding bugs. See its rust-lang/miri github for more info, since for the most part, I’m just going to assume familiarity with the tool.

As some background for this post, it’s worth noting that miri “intercepts”1 and emulates the behavior of a subset of OS system API calls, intrinsics, and generally implements everything abstractly — simulating your code under a hypothetical “abstract machine” for Rust.

A very compelling use case for this is that this means it can simulate other OSes/architectures than the one you’re running on. This lets you test my code on, say big-endian PPC linux, just with cargo miri test --target=powerpc64-unknown-linux-gnu, even if I’m not on linux, or PPC (I do need the relevant target installed with rustup, of course). It’s a bit like running under qemu, but it’s not running machine code, just changing a few details of how it interprets things.

Anyway, emulation of other targets is not the only compelling use case for this “interception” (nor is it what I’m here to talk about). It can also be used to better test the handling of edge cases that wouldn’t come up if the real intrinsic or OS call were used.

For example, a while ago I added some functionality to miri so that AtomicFoo::compare_exhange_weak would randomly fail spuriously some percentage of the time. This is allowed by the API, but can only happen on certain systems, and is rare even then. However, now miri will cause it to randomly2 happen about 80% of the time (This percentage can be controlled via CLI flags).

While it can take writing a test case that specifically exercises the code for this, it has helped me gain a lot of confidence in some concurrent algorithms.

I think miri should add other stuff like this, where goes out of its way to better exercise edge cases that aren’t as likely to exist in real executions.

Here’s a few ideas for them, in blog post form, since it’s too much for a single github issue, I guess. Maybe I’ll add some of these when I find time, but if you find the idea compelling, please feel free to beat me to the punch.

(Also, if any of these already are implemented, oops! I don’t think so, but honestly I could be wrong!)

More Interruptions and Spurious Failures

There are a lot of APIs in the OS and standard library which are allowed to fail spuriously or be interrupted partway through, etc. Sometimes this only happens under load, or in weird edge cases, so it’s extremely common for this code to exhibit bugs.

Spurious Wakeups in Concurrency Primitives

A easy example here is wait/wake APIs. For example, Thread::park, or Condvar::wait, as well as the wait-side of futex-style APIs common on many systems. All of these are both allowed to spuriously wake up, and proper use will be in a loop.

This code usually doesn’t usually get sufficient testing, and I’ve seen serious bugs caused by incorrectly handling futex wakeups, and even seen Rust code that had a data race in the case of a spurious Condvar wakeup. This seems like a natural fit for miri.

Partial / Interrupted IO

Several functions in std::io are allowed to fail spuriously or succeed only partially. Notably, std::io::Read::read and std::io::Write::write both can fail with ErrorKind::Interrupted (which should be immediately retried), or succeed but only after performing a partial read/write.

This generally reflects a property of the underlying OS api, for example: on Unix write is allowed to fail with EINTR in which case the caller should retry. It may also succeed but write/read less than expected, and Windows has. These will generally only happen under high load or in rare timing-dependent cases, such as when the IO happens at the same time that a signal is delivered (even if the signal is something like WINCH, e.g. the terminal window was resized).

This seems like a good candidate for miri to bump up the probability of IO failure.

Between read/write, read seems more useful to test than writes, since it could fairly easily lead to use of uninitialized memory (if it passed in a uninitialized buffer, and assumed the whole thing was filled). Doing the same for write doesn’t seem as useful, since I suspect most code that doesn’t check the number of bytes written, also wouldn’t notice that the written file (or whatever) is incorrect. Also, you don’t often write files in test code.

Address Value Shenanigans

Minimal-alignment mode

Right now miri has two3 modes that control how it deals with about pointer alignments. The first is a “symbolic alignment mode”, which treats addresses very abstractly but will never accidentally allow a program with buggy alignment handling to pass.

It also has an “int mode” that just uses a real integer address, erroring when attempting to perform a load which where the real address is misaligned for the type. That is, it behaves the way you’d expect, although some code may just happen to get a sufficiently aligned pointer by luck.

Int mode is now the default, since the symbolic mode is… Well, it causes problems. This means you can’t test code that manually aligns pointers, and leads to situations that seem like they should be impossible — assert!(pointer_is_aligned(p)); let v = p.read() fails with an alignment error.

This is a step forward, but I think it could be better. I agree the “aligned correctly by luck” case is genuinely concerning, but I think miri can help detect it by modifying “int mode” (or adding a variant if thats not possible) so that allocations never have a higher alignment then requested4 — if you request an alignment of 4, you should not happen to get an 8-byte aligned pointer by luck.

This isn’t perfect, as it only applies to the pointer returned by the allocation. I guess there might be certain operations that could cause it to become misaligned again, which would mean symbolic mode still needs to exist. In practice though, I think this would catch the majority of the issues here.

For example, this program contains a loop where it allocates a Box<u8>, and prints its address out:

fn main() {
    for _ in 0..100 {
        let b = Box::new(0u8);
        println!("{:p}", a);
    }
}

Under miri, the first few lines of output look like:

0x267d9
0x2f14b
0x35a5b
0x3c2d5
0x42a16
0x4921f
0x4f81d
0x5600c
0x5c782
0x62fb0
...

Some of these addresses are “over-aligned”, specifically, core::mem::align_of::<u8>() is 1 (equivalently, 20), but they’re divisible by a higher power of two. In an “minimal-alignment” mode, these would all be odd numbers.

In an ideal world, this would apply to allocations that take place on the stack and in static memory as well. I want to be sure I don’t accidentally assume a [u8; 256] or whatever is aligned more than requested.

All this sounds doable (definitely for heap allocation, but I don’t k), but may increase memory usage requirements too much to be on by default, but if not IMO it should be the normal behavior of “int mode”.

Beyond Miri? (-Zrandomize-layout=min-alignment?)

Note that I think the previous idea can’t apply to struct fields, but it could be performed as an extension or variation of the -Zrandomize-layout flag (which randomizes the layout of structs and the like which don’t have an explicit #[repr]).

For example, it could decide that (usize, u8) must be laid out with size_of::<usize>() - 1 bytes of padding before the u8 (rather than after), so that code accidentally relying on a field being aligned could be detected.

This would be good, since code that needs these guarantees should be using an explicit repr. It might be hard to implement though, I don’t know. It also seems like it would not always be desirable — in some sense this is less random.

More Address Shenanigans

One thing I notice in the program above is that miri’s addresses are very low; low enough that casting to smaller-than-usize integers would be lossless, and that overly-presumptive pointer tagging might succeed.

Unfortunately, how many bits to use is tricky. I think on 32 bit systems, code that relies on any of the higher bits being unset is broken. So perhaps on those systems, miri’s heap would return allocations that have at least the top bit set, and all other bits are pseudorandom. Note that we need to be sure that no two pointers are ever separated by more than isize::MAX for compliance with other properties.

Arguably we could do this on 64 bit systems too, and I think we can, but perhaps not by default. In real world 64 bit systems, the top 16 bits (or top 8 bits if the OS is using 5 level page tables) of the pointer are typically free on these systems, and code often will use5 these bits if available. Maybe that’s fine and that code can set a flag though, I don’t know how common it is in Rust at the moment, but also miri could do this but just for 48 bits, perhaps.

This seems slightly harder to implement, but I could be mistaken.

Anyway that’s all I’ve got.

  1. That is, it conceptually intercepts them. This is not at all how it actually works in the implementation, though. 

  2. Note that miri also has flags to configure the randomness seed, in case that nondeterminism worries you. 

  3. Okay, it also has one for disabling this check entirely. 

  4. More precisely, the integer address of the allocation’s base pointer should always have the bit 1 << (requested_alignment.trailing_zeros() + 1) set. It should never be divisible by a higher power of two. This of course would only apply to the pointer at the “base” of the allocation. For example, the one returned directly by the heap allocator. 

  5. For example, several lock-free algorithms need to be able to pack a ABA counter in the top bits of a pointer (or to use 128 bit atomics, which are not available to Rust). Usually the alternative is a deferred reclamation scheme, which is much slower and more complex, and I suspect most of them would also include techniques that upset miri.