Async Rust: Panics vs. Cancellation

pornel · on Jan 31, 2022

I really like cancellation in Rust, and I don't mind that it's so forceful. Maybe it's because I've worked with Node.js which is all the way at the other end of the spectrum with no way of aborting promises externally. In Node if you want to abort an async operation, you need to thread a DIY signal/flag/callback throughout your entire program, or you'll be left with orphaned tasks still running. Compared to that, Rust's futures that just immediately stop when you abort them are a luxury.

I've written quite a bit of async Rust code, and I haven't hit the problem described in the blog post. I think there are a couple of reasons:

• Rust already has programming patterns (like Drop guards) that avoid leaving the program in an inconsistent state in case of panics, and this solves the problem for aborted futures too. If your task owns a TCP stream, then it will always close the stream when it's aborted.

• Aborting and error handling are usually tied together, e.g. `timeout(copy()).await?` aborts a future, but turns that into an error. Especially for network-related tasks it's natural to treat timeouts as yet another case of an I/O error. Each end needs to handle suddenly closed connections anyway, regardless of whether they're closed by Rust or by a network error along the way.

• Rust uses layered abstractions for async code, e.g. you have request/response servers or streams for protocols. This way you can't leave the network protocol in an inconsistent state, even if your request handling code is aborted. If you were writing to a multiplexed HTTP/2 stream, then the underlying protocol handler will send an end-of-stream packet for you when your higher-level response stream is dropped.
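A minimal sketch of the Drop-guard point in the list above, using only the standard library. `Connection`, `NeverReady`, `handle` and `cancel_demo` are made-up names for illustration (no real TCP stream or async runtime is involved); the future is polled by hand once and then dropped, which is all that "aborting" means here.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes, standing in for a stalled network read.
struct NeverReady;
impl Future for NeverReady {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Hypothetical stand-in for a resource such as a TCP stream.
struct Connection {
    closed: Arc<AtomicBool>,
}
impl Drop for Connection {
    fn drop(&mut self) {
        // Synchronous cleanup: runs on normal exit, on panic, and on cancellation.
        self.closed.store(true, Ordering::SeqCst);
    }
}

async fn handle(closed: Arc<AtomicBool>) {
    let _conn = Connection { closed };
    NeverReady.await; // the task is suspended here when it gets dropped
}

/// Polls the task once, then drops it.
/// Returns (closed before the drop, closed after the drop).
fn cancel_demo() -> (bool, bool) {
    let closed = Arc::new(AtomicBool::new(false));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut task = Box::pin(handle(closed.clone()));
    assert!(task.as_mut().poll(&mut cx).is_pending());
    let before = closed.load(Ordering::SeqCst);

    drop(task); // "cancellation": the future is simply dropped
    (before, closed.load(Ordering::SeqCst))
}
```

Calling `cancel_demo()` should show the connection still open while the task is suspended and closed once the task has been dropped, without the task's code ever resuming.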

wging · on Jan 31, 2022

One gap is that async drop isn't currently a thing: you cannot do any async work to clean up a resource. I imagine Niko will point that out in the next post. See also https://boats.gitlab.io/blog/post/poll-drop/

ithkuil · on Jan 31, 2022

I toyed a little bit with an approach to async drop; let me know what you think

https://crates.io/crates/arcy

leshow · on Jan 31, 2022

Not OP, async drop isn't a thing, but you can still have a drop guard in an async function. It's not as nice as having async drop obviously, because you can't run any await-able code inside Drop, but it still works for some situations:

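   struct Guard;

   async fn foo() {
      let _guard = Guard;
      // async stuff; if foo() is dropped while suspended here, Guard::drop still runs
   }

   impl Drop for Guard {
      fn drop(&mut self) {
         // do cleanup (synchronous only, no .await in here)
      }
   }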

pornel · on Jan 31, 2022

This is indeed a big gap for interacting with non-Rust async APIs where you can't dictate memory management strategy. However, within Rust, it rarely comes up. Probably because native Rust libraries/interfaces have no choice but to design around it.

If you really have to finish some work asynchronously on drop, you can spawn another task from your drop function. But most of the time I've found it's not even necessary, because you can restructure the code to separate the "abortable" part from the "must finish" part (e.g. some network protocol serializer must always terminate a message it writes, but generation of the message itself is typically done elsewhere, so you can set it up so that abort of the message-generating future doesn't abort the protocol-framing future).
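A rough sketch of that "abortable" vs. "must finish" split, using only the standard library. Everything here is an assumption made for illustration: `generate`, `frame`, the newline terminator and the use of a `std::sync::mpsc` channel are invented, and the framing side is a plain function rather than a separate task as it would be under a real runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes, standing in for slow message generation.
struct NeverReady;
impl Future for NeverReady {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Abortable part: produces message chunks and may be dropped at any await.
async fn generate(tx: Sender<Vec<u8>>) {
    tx.send(b"hello ".to_vec()).unwrap();
    NeverReady.await; // dropped here in the demo
    tx.send(b"world".to_vec()).unwrap();
}

/// Must-finish part: frames whatever arrived and always writes the terminator.
fn frame(rx: Receiver<Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    // The loop ends once every Sender has been dropped, including by cancellation.
    for chunk in rx {
        out.extend(chunk);
    }
    out.push(b'\n'); // hypothetical end-of-message marker
    out
}

/// Cancels the generator halfway and returns the bytes the framer emitted.
fn framing_demo() -> Vec<u8> {
    let (tx, rx) = channel();
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut producer = Box::pin(generate(tx));
    let _ = producer.as_mut().poll(&mut cx); // sends the first chunk, then suspends
    drop(producer); // abort: also drops the Sender held inside the future

    frame(rx)
}
```

The generator is cut off after its first chunk, yet the framer still terminates the message, because dropping the generator only closes the channel and the framing code is not part of the aborted future.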

codeflo · on Jan 31, 2022

Maybe the use of the word “cancel” can mislead some people, because the term means a much more controlled way of shutdown in other ecosystems. For example, you can’t really stop Tasks in C# unless you explicitly pass around a CancellationToken and check it at strategic points — very similar to the one in Tokio: https://docs.rs/tokio-util/latest/tokio_util/sync/struct.Can...

What dropping futures does in Rust is much more forceful, and possibly meant for a different use case, or as a lower-level primitive.
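For contrast, here is a minimal sketch of the cooperative style, where the running code checks a flag at points it chooses and decides how to stop. `CancelFlag`, `Outcome` and `run_steps` are hypothetical names built on an `AtomicBool`; this is not the tokio_util API linked above, only the same general idea.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical minimal cancellation flag (not tokio_util's CancellationToken).
#[derive(Clone, Default)]
struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Finished(u32),
    Cancelled { completed: u32 },
}

/// Runs `total` steps, checking the flag only between steps.
/// The worker itself decides where it is safe to stop.
fn run_steps(total: u32, flag: &CancelFlag, mut on_step: impl FnMut(u32)) -> Outcome {
    for step in 0..total {
        if flag.is_cancelled() {
            // Controlled shutdown: report how far we got instead of vanishing.
            return Outcome::Cancelled { completed: step };
        }
        on_step(step);
    }
    Outcome::Finished(total)
}
```

If something calls `cancel()` during step 1 of a five-step run, the worker finishes that step and returns `Outcome::Cancelled { completed: 2 }` at its next check. A dropped future gets no such chance to report anything.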

SigmundA · on Jan 31, 2022

Cancelling in Rust sounds a lot like Thread.Abort() in .NET [1], which just injects an exception arbitrarily into the thread. It is generally frowned upon because it can corrupt shared state, and it is no longer supported in later versions of .NET.

Unless you treat threads like full processes (no shared state), you need cooperative cancellation (CancellationToken), which generally works well. It's nice to have a commonly agreed-upon cancellation message. It is most commonly used in async I/O calls, such as cancelling a long-running DB query when the client disconnects because they navigated away.

If you have uncooperative code that may, say, go into an infinite loop, then you need to preempt it somehow if it runs too long. That leaves you with Thread.Abort or, really, Process.Kill, which is the only safe way to do this in .NET (run it in another process).

[1] https://docs.microsoft.com/en-us/dotnet/api/system.threading...

notriddle · on Jan 31, 2022

It only happens at `await` points. This means you don’t need to worry about it leaving an invalid BTreeMap or anything; only code that actually does async work needs to worry about being canceled.
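A small sketch of what "only at `await` points" means in practice, using only the standard library and a hand-polled future. `update`, `NeverReady` and `await_point_demo` are made-up names: the two inserts before the await always happen together, and the insert after it never happens if the future is dropped while suspended.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes, standing in for pending I/O.
struct NeverReady;
impl Future for NeverReady {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

type Shared = Rc<RefCell<BTreeMap<&'static str, u32>>>;

async fn update(map: Shared) {
    // No await between these two inserts, so cancellation cannot separate them,
    // and it cannot interrupt either insert halfway through.
    map.borrow_mut().insert("a", 1);
    map.borrow_mut().insert("b", 2);
    NeverReady.await; // the only place this function can be cancelled
    map.borrow_mut().insert("c", 3); // skipped if dropped at the await above
}

/// Polls `update` once, drops it, and returns the keys that made it into the map.
fn await_point_demo() -> Vec<&'static str> {
    let map: Shared = Rc::new(RefCell::new(BTreeMap::new()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut task = Box::pin(update(map.clone()));
    let _ = task.as_mut().poll(&mut cx); // runs up to the await, then suspends
    drop(task); // cancel while suspended

    let keys: Vec<&'static str> = map.borrow().keys().copied().collect();
    keys
}
```

The map itself is always structurally valid. Whether "a and b but not c" is a valid state for the application is a separate question that the code around the await has to answer.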

SigmundA · on Jan 31, 2022

Hmm, the article says:

> In effect, if you look at things from the “inside view” of the async fn, cancellation looks like the await call panicking – it unwinds the stack, running the destructors for all values. The analogy, of course, only goes so far: you can’t, for example, “catch” the unwinding from a cancellation. Also, panics arise from code that the thread executed, but cancellations are injected from the outside when the async fn’s result is no longer needed.

It also references Java's deprecated Thread.stop, which looks very similar to .NET's Thread.Abort.

maxwell86 · on Jan 31, 2022

> cancellation looks like *the await call* panicking

Emphasis mine. In Rust, "cancellation" happens at well defined "await points".

SigmundA · on Jan 31, 2022

Thanks for clearing that up, I missed that detail. So it seems similar to Tasks in .NET, which check the cancellation token before starting and will throw on await. However, in .NET you can check IsCancellationRequested in your code once the Task is running to decide how to cancel, or just call ThrowIfCancellationRequested(). Then you try/catch your awaits to handle cancellation (or not).

With Thread.Abort, the .NET runtime just injects a throw into the instructions at potentially any point in the thread's code, which is obviously problematic, although throwing on any await can still cause problems if you're not expecting it. There is no try/catch involved because you really can't catch ThreadAbortException; instead you just wait for the thread to stop with Thread.Join or whatever.

codeflo · on Jan 31, 2022

I don’t think it’s similar at all, that was my point. The .NET runtime itself will never check a CancellationToken for you, but an API you pass it to might, of course (and many standard APIs take one as a parameter). And even then, the result is an exception, which you can catch and keep going.

That’s not at all what happens when the Future is dropped in Rust: Yes, it only happens at .await, but when it does, execution is simply stopped dead, no chance to continue.

SigmundA · on Jan 31, 2022

With the await keyword in C#, a TaskCanceledException is thrown automatically whether or not you check the token yourself. It just won't stop your code once the task has started running: the token is checked before the task starts and again at the end, when the task is set to a canceled state, which leads to the exception.

Rust seems to just have exceptions you can't catch (panics); they are like ThreadAbortExceptions in .NET. That is, if a task threw ThreadAbortException instead of TaskCanceledException and you awaited it with an already cancelled token, you would get similar behavior, it seems.

codeflo · on Jan 31, 2022

I think that depends a bit on what you intended to do after the await. I’m certain there are uses of an async mutex (several libraries have one) where there’s indeed a broken invariant.

amalcon · on Jan 31, 2022

Java has Thread.stop(), which does a similar thing and is deprecated for similar reasons.

https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.h...

I remember trying to use it once, in my more reckless days and despite the warnings, thinking it's all fine if I just wrap all intermediate states in try/finally. It turns out that sometimes this will cause a throw in the finally clean-up, so you need to use a try/catch+rethrow and repeat yourself a bit. It's still not quite right, though, because you can still throw in the catch...

noitpmeder · on Jan 31, 2022

I find it really interesting that the author's example assumes that the read() and send() calls are the ones you need to worry about w/r/t exceptions. To me, the parse() call seems the most volatile -- what guarantees are there that the bytes you just read are parsable?

I usually code my programs assuming the system will work (reads/writes/sends). While I know this isn't guaranteed, it's a lot more likely my filesystem will work than that a file is assured to contain parsable data.

maxwell86 · on Jan 31, 2022

> To me, the parse() call seems the most volatile -- what guarantees are there that the bytes you just read are parsable?

None, which is why, in Rust, parse never throws an exception on a parsing error. Instead it returns a Result<T, E>, which is an ADT with either an Ok(T), which means parsing succeeded, or an Err(E), which means that an error happened and carries details about what went wrong.

See its documentation: https://doc.rust-lang.org/stable/std/primitive.str.html#meth...

The author isn't talking about this, probably because it's not the point of the article, but in Rust, you can't avoid handling parsing errors. If you want to actually get the value, you need to handle the possibility that an error happened. The programming model does not give the user a choice here.
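As a small illustration of that point, `str::parse` hands back a `Result`, and the value is only reachable by dealing with the error case in some way. `parse_port` and `describe` are made-up helper names; the parsing itself is the standard library's `str::parse::<u16>()`.

```rust
use std::num::ParseIntError;

/// Parses a port number; a failure is an ordinary return value, not a panic.
fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    input.trim().parse::<u16>()
}

/// To get at the number, the caller has to handle both variants.
fn describe(input: &str) -> String {
    match parse_port(input) {
        Ok(port) => format!("port {port}"),
        Err(err) => format!("invalid port: {err}"),
    }
}
```

A caller can still opt out with `.unwrap()`, which turns the error into a panic, but that is an explicit choice written in the code rather than something that happens invisibly.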

wallacoloo · on Jan 31, 2022

it’s a blog about async, and especially cancelation. parse presumably doesn’t do any I/O. read/send are the interesting bits because they hit the network, where latency is practically unbounded. a first approach at cancelation might be to make precisely those I/O routines cancelable via (say) error injection, in such a way that none of the non-I/O code ever needs to care specifically about cancelation. so read/send is where the “interesting” cancelation logic lives or hooks into.

berkes · on Jan 31, 2022

> assuming the system will work (reads/writes/sends)

I encounter these issues quite commonly. Permissions, being in the wrong dir, buggy setup that forgot to create directories or copy files etc.

Aren't `read()` and `send()` in Rust concerned with those less-exceptional exceptions as well?

noitpmeder · on Jan 31, 2022

I totally get these issues, and one definitely should be aware of them when designing a system. It just feels a bit weird to treat these as the points where exceptions emerge and not the parsing of input data.

Could totally be an artifact of the example that was chosen to prove a point, it's just what jumped out at me.

unrealhoang · on Jan 31, 2022

Usually in Rust, parsing would never cause a panic (or your parsing library is very wrong); the parser will just return an error for unparsable input.

berkes · on Feb 1, 2022

> It just feels a bit weird to treat these as the points where exceptions emerge and not the parsing of input data.

I understand this. It does make sense when seen as an artifact of the educational nature, though.

If the article had wanted to explain exceptions in the parser, the author would have had to explain the domain as well, whereas we are all already familiar with exceptions in the file I/O domain.

msopena · on Jan 31, 2022

> I find it really interesting that the author's example assumes that the read() and send() calls are the ones you need to worry about w/r/t exceptions.

I didn't read it that way. In my view, the article is explaining a possible abstract mental model in which async cancellation and panics are similar or related. He's using that snippet of code because, from an async perspective, one could reasonably expect that the file or network I/O is worth awaiting on but the parsing is not.

But I don't think the author assumes parsing couldn't raise an exception since he states at the beginning: "If the parse function or the send function were to throw an exception, whatever data had just been read (and maybe parsed) would be lost.".

arielb1 · on Jan 31, 2022

It's more that in Rust, async cancellation will occur exactly at await points, which are generally where you do I/O.

throw10920 · on Jan 31, 2022

> The reason is that long experience with exceptions has shown that exceptions work really well for propagating errors out, but they don’t work well for recovering from errors or handling them in a structured way.

That's why you use a condition system like Common Lisp[1] - conditions can be recovered from using restarts. The thrown condition signals the kind of error (or, even, exceptional non-error circumstance), while the defined restarts provide various error-recovery strategies, from which one can be chosen programmatically or manually.

I think that this part:

> In most programs, you have some kind of invariants that you are maintaining to ensure your data is in a valid state. It’s relatively straightforward to ensure that these invariants hold at the beginning of every operation and that they hold by the end of every operation. It’s really, really hard to ensure that those invariants hold all the time.

Can be fixed by adding immutability - instead of mutating your program's state in a low-level function, either generate a "transaction" object that is finished and applied to some other piece of state, or an exception is thrown and the whole transaction object is thrown away (or a particular loop could be restarted, or the program could be entirely restarted or quit, etc., depending on what restart you choose).
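A rough sketch of that "transaction object" idea, written in Rust with `Result` rather than Common Lisp conditions and restarts, so it only illustrates the build-then-commit part. `Account`, `withdraw` and `apply_all` are hypothetical names; the point is that the live state is replaced in one step at the end, so an error partway through leaves it untouched.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Account {
    balance: i64,
}

/// Computes a new state from the old one without mutating it.
fn withdraw(acct: &Account, amount: i64) -> Result<Account, String> {
    if amount > acct.balance {
        return Err(format!("insufficient funds for {amount}"));
    }
    Ok(Account { balance: acct.balance - amount })
}

/// Applies every withdrawal to a draft copy and commits only if all succeed.
fn apply_all(acct: &mut Account, amounts: &[i64]) -> Result<(), String> {
    let mut draft = acct.clone();
    for &amount in amounts {
        // On error the draft is thrown away and `acct` is never touched.
        draft = withdraw(&draft, amount)?;
    }
    *acct = draft; // commit: the only mutation of the real state
    Ok(())
}
```

Starting from a balance of 100, `apply_all(&mut acct, &[30, 50])` commits a balance of 20, while a following `apply_all(&mut acct, &[10, 50])` fails on the second withdrawal and leaves the balance at 20 rather than 10.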

This:

> The problem is that exceptions make errors invisible, which means that programmers don’t think about them.

Can be fixed using smarter tooling (or checked exceptions, but I don't think anyone wants those) that ensures that you handle exceptions/conditions, unless I'm mistaken.

[1] https://en.wikipedia.org/wiki/Common_Lisp#Condition_system

arielb1 · on Jan 31, 2022

Maybe that's the subject for the next blog post, but I think the main reason cancellation causes more trouble than ordinary I/O problems is that with ordinary errors you assume the resource that suffered the error is down and you don't care about its precise state, while with cancellation the resource is perfectly OK and you want to continue using it.
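A minimal sketch of that situation, using only the standard library and a hand-polled future. The queue standing in for the "resource", and the names `forward_one`, `NeverReady` and `lost_item_demo`, are assumptions made for illustration: an item is taken out of a shared queue, the task is cancelled before it can pass the item on, and the queue stays usable but is now missing that item.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes, standing in for a send that stalls.
struct NeverReady;
impl Future for NeverReady {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

type Queue = Rc<RefCell<VecDeque<u32>>>;
type Sent = Rc<RefCell<Vec<u32>>>;

async fn forward_one(queue: Queue, sent: Sent) {
    let item = queue.borrow_mut().pop_front(); // "read": the item leaves the queue
    NeverReady.await; // "send" stalls; the task is cancelled here
    if let Some(item) = item {
        sent.borrow_mut().push(item); // never reached in the demo
    }
}

/// Returns (what is left in the queue, what was delivered) after a cancellation.
fn lost_item_demo() -> (Vec<u32>, Vec<u32>) {
    let queue: Queue = Rc::new(RefCell::new(VecDeque::from(vec![1, 2, 3])));
    let sent: Sent = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut task = Box::pin(forward_one(queue.clone(), sent.clone()));
    let _ = task.as_mut().poll(&mut cx); // pops item 1, then suspends
    drop(task); // cancel: the local `item` is dropped along with the future

    let remaining: Vec<u32> = queue.borrow().iter().copied().collect();
    let delivered: Vec<u32> = sent.borrow().clone();
    (remaining, delivered)
}
```

After the cancellation the queue holds 2 and 3 and nothing was delivered, so item 1 has silently disappeared even though nothing failed and the queue can still be used.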