If you've been doing async for a while, you've probably heard someone say something like "the compiler takes an async function and converts it to a state machine." I want to dive into this more, since we can think of cancellation as making the state machine more complex. In this post, I'll show how to build a state machine for a simple async function. Then we'll we'll see how the state machine changes if we want to be able to run async code during cancellation. Finally, we'll explore some of the design space around cancellation, particularly what happens if a future that has been cancelled is cancelled again, and see how state machines can suggest several possibilities.

Let's use the program below as a running example. In real life, this would probably return a Result<DataTable>, but I want to avoid the extra complexity around additional early exits.

async fn load_data(file: AsyncFile) -> DataTable {
    let mut data = Vec::new();
    let result = file.read_to_end(&mut data).await;
    result.unwrap(); // We're ignoring proper error handling
    parse_data(data)
}

For an async function's state machine, states are made up of the the code between await points. Or alternatively, you can think of await points as edges between states. For this program, the state machine would look like this:

A basic state machine diagram for the preceding code fragment

You might notice I pulled a bit of a fast one on you. I said await points turn into edges in the state transition diagram, so we'd expect to see just one edge labeled await. Instead, we have two edges labelled await and one without a label. What's going on?

First, some conventions. I realized it's helpful to see some of the traditional control flow in addition to suspension or await points. I've represented these edges as a solid, unlabeled line. These mean that control transfers from the previous state immediately to the second state without any suspension. Our example is a relatively simple strait-line program so the actual control flow graph isn't particularly interesting but this will change a little when we look at cancellation. The other edge we have in this graph is the await edge. These edges are labeled await and are dotted lines to indicate that execution is interrupted---the future will suspend and give the executor the chance to switch to another future for a time. Finally, I've introduced a couple of special states that do not exactly correspond to any code the user wrote. These states are shown in orange.

Now let's turn our attention to why the diagram shows two await edges but the await keyword only shows up once in the program. Every async fn has an implicit suspend point that represents the time between when the function is called and when is first polled. In this diagram, I've represented this as an await edge going from the start state to the first line of the function. In general, you don't have to worry about this hidden initial suspend point too much because async function calls are almost always immediately awaited. In other words, it's more common to see foo().await instead of let future = foo(); /* do some other stuff */; future.await.

CancellationπŸ”—

The state machine we've looked at so far does not do a good job of representing cancellation. Let's try to extend it to do so.

Today in Rust cancellation simply means you stop polling the future, and instead it is dropped. When dropping something like a closure or a future returned by an async fn, Rust needs to recursively drop the values store in (in other words, captured by) the closure or future. Depending on what state the future is in when it is dropped, there are different values that need to be captured. In our example, if we drop the future before we pull it, we only need to drop the AsyncFile that was passed in as a parameter. On the other hand, if the future is dropped at the await point, we also need to drop the Vec that we read the file contents into.

We can add some extra states to our graph to illustrate this.

A state machine diagram showing cancellation as it exists today, where cancellation cannot be interrupted once started

I've specifically called out drop along the cancellation path, but Rust also drops values during the normal exit path. I've left the normal drops out for simplicity.

I like thinking of async functions this way because we can use it to make several observations about cancellation in Rust. Many of these seem rather obvious, but they raise important requirements for designing a system that can handle cancellation well.

Observation 1: Cancellation is a state change. When we cancel a future, it transitions from its normal running states to a cancellation path. Currently this happens implicitly when a future is dropped, but in the future we will probably want a way to explicitly transition a future to its cancellation path.

Observation 2: Async cancellation handlers1 require adding await points on the cancellation path. At the moment, cancelling futures is synchronous. This shows up in the async state graph in the fact that there are no await edges on the cancellation path. If we want to allow for cancellation handlers, we will need to add await points in the cancellation path. This may be obvious, but this also implies we need a way to make sure executors continue to poll futures that have been cancelled.

Observation 3: Cancellation is an alternate exit. An async function that has been cancelled does not exit through the normal return path. From the perspective of an async function author, this shows up as the function not continuing to execute past an await point. From a types standpoint, a function cannot exit normally in general because we may not yet have a value of the right type to return. In our example we can see that the type of the function does not allow it to exit at the await point, because at that point we have not created a DataTable to return. This observation has implications that will show up in the types of the API we eventually design for cancellation handlers.

Cancellation CancellationπŸ”—

Another thing we can explore with a state graph is what behaviors are possible if a cancelled future is cancelled again. One common way this could happen is if you have something like a race combinator that returns the value of the first future to complete and cancels the other one. If the race combinator is itself cancelled while it was cancelling the slower sub-future, the slower sub-future would be cancelled twice.

FIXME: write out and explain a code example of this case.s

Let's look at this in the abstract with state machines.

There are a couple of possibilities for how to handle cancellation of cancellation. I'll consider three of them, inspired by the zero one infinity rule.

0. Cancelling during cancellation is not allowedπŸ”—

Once we have support for cancellation handlers, it will definitely be possible to write code that leads to trying to cancel a cancellation. The race example we mentioned earlier is one example. So in this option, we would declare cancelling a cancellation to be an error. We have some flexibility on what mechanism we'd use exactly, but I think the best option would be to panic.

I think in practice this option is not feasible. Cancellation flows from top to bottom (e.g. an executor decides to terminate a task early and so runs the task's cancellation handler), but the higher levels do not know anything about the internal behavior of futures. An executor that is cancelling a task does not know if one of the task's subfutures is trying to cancel a future already.

1. Cancelling a cancellation is idempotentπŸ”—

In this version, cancelling an already-cancelled future is basically a no-op. In state machines, it would look something like this:

A possible model for cancelling a cancellation, where cancellation is idempotent

The key point here is that any of the cancel states have a cancellation edge that comes back to the same state. In other words, cancelling once your future has already been cancelled means you stay in the same state and continue executing the cancellation handler before.

What does this mean in practice? It essentially means you can trust that your cleanup code in a cancellation handler will run to completion. Admittedly, this might take additional rules, like we may want to declare it to be undefined behavior to not poll a cancelled future to completion2. Scoped tasks would likely need this guarantee, but we could consider weaker ones, like that a "well-behaved" executor will poll cancelled futures to completion. The "well-behaved" guarantee is roughly what we have today for Drop, so it might be similarly useful.

The downside is that this also means we can add cancellation behavior that can take arbitrarily or even infinitely long.3 We might decide instead that cancellation means something like "request graceful shutdown" but then forcibly terminate a future if it takes too long. For this we need recursive cancellation.

∞. Cancelling a cancellation is recursiveπŸ”—

In this version, canceling an already cancelled future would transfer us to a separate cancellation path. That cancellation path could also be cancelled, and its cancellation could be cancelled, and so on. In pictures, recursive cancellation looks like this:

An alternate approach to cancellation handlers that allows for recursive cancellation

While an infinite regress of cancellations might seem ridiculous, there are some cases where it might be useful. There's also a nice regularity to it.4 One class of problems where this might be useful are cases where you have optional cleanup work to do but you can cancel it if needed for a more prompt shutdown. Of course, I'm not sure this is really all that useful in practice, and if you need it there might be other ways to do it.

More importantly, there are many cases where you absolutely do not want to cancel the cancellation. For example, maybe you have a transaction future whose cancellation path rolls back the transaction. You do not want to stop the rollback before it's complete, or else you've completely defeated the purpose of transactions.

That said, recursive cancellation appears to be strictly more powerful than idempotent cancellation because if you have recursive cancellation you should be able to implement idempotent cancellation where needed (basically, you just ignore the subsequent cancellation signals and stay in the same state you were in).

Seen this way, recursive cancellation gives us a lot of flexibility. It means individual futures can implement either behavior, according to what best fits their needs. The main thing the Rust language would need to do is design reasonable defaults and set expectations so people authoring futures can encapsulate their specialized behavior.

ConclusionπŸ”—

We've long talked about async functions as state machines, so in this post we looked at how you might draw a state transition diagram for async functions. This gave us a way to play with cancellation and look at what various cancellation semantics might imply in terms of the shape of the state transition diagram. I've found it really helpful to think about async cancellation this way, so I hope others find it useful as well!

This post was originally part of a larger post about implementing a prototype of async cancellation handlers. The larger post was taking a long time and I felt like the content in this post was useful on its own so I wanted to go ahead and publish it. While I no longer like to promise that a followup post is coming soon5, I do have most of the longer post drafted so chances are good I will get it out soon. Plus, I did commit to discussing it at the WG Async Reading Club next week, so there is a little pressure on.

Anyway, please reach out if you have any thoughts or questions!


1

I'm using "cancellation handlers" refer broadly to mechanisms to allow running async code on the cancellation path. This would likely be async Drop, I want to use a more general term to emphasize there are multiple possibilities here.↩

2

This will require us to mark something unsafe somewhere.↩

3

This is true of drop already today. I can write fn drop(&mut self) { loop {} } and my program will hang when the destructor tries to run.↩

4

One of the things that bugs me about the idempotent version of cancellation is that you can call any future from either a normal execution or cancellation path, but in the cancellation path they effectively become uncancellable. It's not actually a problem, since not cancelling a future is always a choice your allowed to make, but the asymmetry still bothers me.↩

5

I'm sure you'll find plenty of examples on my blog of posts I said were coming soon that did not, in fact, come soon, if they ever came at all.↩