A Rose By Any Other Name
There are currently two competing designs for async iteration traits for Rust.
The first is poll_next
.
The second is async fn next
.
I see strengths to each design.
The poll_next
design seems stronger on technical concerns, such as performance and ease of implementation.
The async fn next
design seems better from an ergonomics and consistency perspective.
Unfortunately, the process of resolve this debate has been slow going.
One thing I've realized in the debate is that at this point the two designs are more similar than they may seem at first glance. In this post I'd like to show how a handful of minor tweaks to each design results in something that is the same modulo names.
A Concrete Performance Differenceπ
But first, we're going to go on a bit of a diversion about the performance of the two designs.
One of the early arguments in favor of poll_next
was that it was more efficient.
In most cases, we expect the compiler to be able to optimize away any of the overhead that async fn next
might introduce, but it seems better, all else being equal, to pick the design that doesn't require as much work from the compiler.
That said, we discovered one case where the compiler cannot optimize away the overhead of async fn next
, and that has to do with the using a dyn AsyncIterator
object.
To see the difference, let's consider pseudo-code for a for await
loop for the two designs.
We'll also desugar the .await
calls so we can see where the calls to poll
happen.
Let's consider a simple function that receives a dyn AsyncIterator
and iterates over it:
async fn sum(it: &Box<dyn AsyncIterator<Item = i32>>) -> i32 { let mut total = 0; for await i in it { total += i; } total }
Using the poll_next
design, this desugars to something like follows.
I've left out the context parameter to poll_next
for simplicity.
async fn sum(it: Box<dyn AsyncIterator<Item = i32>>) -> i32 { let mut total = 0; let it = Box::into_pin(it); loop { match it.poll_next() { // poll_next call is indirect Poll::Ready(Some(i)) => total += i, Poll::Ready(None) => break, Poll::Pending => yield Poll::Pending, } } total }
Using the async fn next
design, this desugars to something like below.
Note that the in this version AsyncIterator
is not object safe, so we're assuming we have support using something like dyn*
.
async fn sum(it: Box<dyn AsyncIterator<Item = i32>>) -> i32 { let mut total = 0; loop { let f: Pin<dyn* Future<Output = Option<i32>>> = pin!(it.next()); // next call is indirect let next = loop { match f.poll() { // poll call is indirect Poll::Ready(next) => break next, Poll::Pending => yield Poll::Pending, } }; match next { Some(i) => total += i, None => break, } } total }
The async fn next
version has two indirect calls in its desugaring.
First need to make an indirect call to next
to get the future to poll to get the actual next item.
The way async in trait objects works means that the resulting future will itself be a trait object, which means calling poll
on that future will be an indirect call too.
It's possible these calls don't matter. Indirect calls are normally only relevant in CPU-bound workloads, while async is most often used for IO-bound workloads. In wg-async, we've talked a lot about the this extra indirections but I think it's mostly because we can objectively count how many indirect calls there are, while we can't objectively things like design elegance or ergonomics.
There will be cases where this overhead does matter though, so it is desirable to not bake these calls in at the language level.
Can we tweak the design of async fn next
to remove most of this overhead?
Reducing the async fn next
overheadπ
The key idea here is to change the semantics of the future returned by async fn next
so that it's able to be polled again after completion.
When polled after completion, it essentially calls next
again but reuses the future.
This is, in fact, how the Next
future from the futures crate works.
It wraps poll_next
, so the future has the same semantics as poll_next
.
If we make this semantic change, then we can desugar the async fn next
version to something like below.
async fn sum(it: Box<dyn AsyncIterator<Item = i32>>) -> i32 { let mut total = 0; let mut f: Pin<dyn* Future<Output = Option<i32>>> = pin!(it.next()); // next call is indirect loop { let next = loop { match f.poll() { // poll call is indirect Poll::Ready(next) => break next, Poll::Pending => yield Poll::Pending, } }; match next { Some(i) => total += i, None => break, } // the next time around the loop we'll use `f` again } total }
We still have two distinct indirect calls, but the first one is only called once at the start of the loop rather than inside the loop for each iteration.
This code is now more convoluted than it needs to be. We can instead refactor it to only have one loop and one match expression:
async fn sum(it: Box<dyn AsyncIterator<Item = i32>>) -> i32 { let mut total = 0; let mut f: Pin<dyn* Future<Output = Option<i32>>> = pin!(it.next()); // next call is indirect loop { match f.poll() { // poll call is indirect Poll::Ready(Some(i)) => total += i, Poll::Ready(None) => break, Poll::Pending => yield Poll::Pending, } } total }
This version looks suspiciously like the poll_next
desugaring.1
Aside: This Doesn't Actually Workπ
This optimization sounds good, but it doesn't actually work. I want to briefly touch on why but then let's just pretend we didn't notice this because I'd like to consider the rest of the argument.
The problem is that async fn
s only complete once.
We might redefine the semantics of Future
to allow multi-completion futures, but unless we also change async fn
, the futures produced by async functions will still be single-completion.
The key benefit of the async fn next
is that users can hand-roll their own async iterators by writing async fn next
and that this should be simpler than writing two state machines at once with a hand-written poll_next
.
We could add a modified version of async fn
that lets you write a multi-completion future, but if we do this I'm pretty sure we'll end up with async gen fn
with some different names.
Or we could add a way to create a multi-completion future from an async closure, but if we make users go through all of this, we've given up the simplicity benefits of being able to just write async fn next
.
Anyway, let's ignore this for now and continue on.
Iterator Setupπ
So with the performance optimization, things are looking very similar.
Both versions start with a little bit of setup code.
The poll_next
version needs to pin the iterator since poll_next
takes a pinned argument.
The async fn next
version needs to call next
once to get the future to poll, and then pins that as well.
Then both versions go into a loop calling either poll_next
, or a poll
function that we've redefined to have the same semantics as poll_next
.
The poll_next
version does not have an analog to the next
call in the async fn next
version.
I think this does exist, it's just not shown here.
At some point, you need to set up your iterator state.
You can't just call poll_next
on a Vec
, partly because the Vec
would have to be pinned and that would be inconvenient, and also because it's not ideal to keep your iteration state attached to the Vec
.
So let's imagine we have some operation which sets up the state needed to run an async iterator.
In our poll_next
examples, we've been assuming the iterator is set up before the call to sum
.
In other words, it's called as sum(my_vec.async_iter())
and not sum(my_vec)
.
What if, instead, we created some generic way to make an async iterator for some collection and moved that into the sum
function?
We could call this trait IntoAsyncIterator
2, and it would look something like:
trait IntoAsyncIterator { type Item; type AsyncIter: AsyncIterator<Item = Self::Item>; fn into_async_iter(self) -> Self::AsyncIter; }
Then we could write the poll_next
version of sum
as:
async fn sum(it: Box<dyn IntoAsyncIterator<Item = i32>>) -> i32 { let mut total = 0; let mut f: Pin<dyn* AsyncIterator<Item = i32>> = pin!(it.into_async_iter()); // into_async_iter call is indirect loop { match f.poll_next() { // poll_next call is indirect Poll::Ready(Some(i)) => total += i, Poll::Ready(None) => break, Poll::Pending => yield Poll::Pending, } } total }
Now we basically have the same code as the async fn next
version.
There's a subtle difference, since async fn next
takes &mut self
while into_async_iter
takes self
, but we could design the "initialize async iterator state" operation to take &mut self
if we wanted.
Also, I should point out that self
methods are not allowed in object-safe traits, so as designed, we would not be able to have a dyn IntoAsyncIterator
.
What's in a name?π
If we make the change to async fn next
to allow the future to be polled again after completion and we add some kind of IntoAsyncIterator
, then the two designs are basically equivalent.
In both of them, the for await
loop desugars into a setup phase (similar to have synchronous for
loops call IntoIterator
), and then there is a loop can calls a method to advance the state of the iterator.
If we make these two changes, then IntoAsyncIterator
in the poll_next
design is essentially what AsyncIterator
is in the async fn next
design.
They are both the way to initialize iterator state.
Then the AsyncIterator
trait in the poll_next
design is equivalent to the modified Future
semantics in the async fn next
design.3
So to me it seems that if we apply the multi-completion optimization to async fn next
then the two designs have the same shape and most of the debate is around the boundaries and names of various subcomponents.
Conclusionπ
I wrote the first draft of this post about three months ago and then hesitated to publish it because in writing it I discovered some fatal flaws in the argument. However, I've shared the draft privately with a few people and we've been referring to it often in discussions so I felt like it was worth putting it out in public. Others have come up with similar arguments to this post so I consider this post to be my particular take on the argument rather than the definitive statement.
At the beginning I had hoped to say that with the optimizations we've added to the async fn next
design, the two designs are essentially the same.
The implication then would be that we should stop arguing about names, pick something, and stabilize it.
Having written the post, I think we instead see an irreconcilable difference in the designs.
While the multi-completion future optimization does lead to a design that's equivalent to poll_next
, it gives up the key characteristic of the async fn next
design, which is the ability to hand-write iterators with async fn next
.
It's not "here's a small tweak and now the designs are the same," it's "here's a small tweak and now async fn next
is a completely different design."
So in conclusion, the two designs are not secretly the same and we are going to have to actually make a decision on which one we want to have.
In fact, I copied the poll_next
desugaring and then made a few changes to get this version.β©
IntoAsyncIterator
may not be the best name, because it implies it consumes its source. We might want a non-destructive way to iterate over something asynchronously.β©
If we go this route, I'd suggest having AsyncIterator::next
return a subtrait of Future
called something like IterableFuture
or MultiCompletionFuture
to highlight multi-completion semantics. This is similar to how FusedFuture
adds additional semantics to the underlying Future
trait.β©