Rust Async Syntax
The key elements of asynchronous programming in Rust are futures and Rust’s async and await keywords. A future is a value which may not be ready now, but will become ready at some point in the future. Rust provides a Future trait as a building block so different async operations can be implemented with different data structures, but with a common interface. In Rust, we say that types which implement the Future trait are futures. Each type which implements Future holds its own information about the progress that has been made and what “ready” means.
The async keyword can be applied to blocks and functions to specify that they can be interrupted and resumed. Within an async block or async function, you can use the await keyword to wait for a future to become ready, called awaiting a future. Each place you await a future within an async block or function is a place that async block or function may get paused and resumed.
Some other languages also use async and await keywords for async programming. If you are familiar with those languages, you may notice some significant differences in how Rust does things, including how it handles the syntax.
Most of the time when writing async Rust, we use the async and await keywords. Rust compiles them into equivalent code using the Future trait, much like it compiles for loops into equivalent code using the Iterator trait. Because Rust provides the Future trait, though, you can also implement it for your own data types when you need to.
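To make "implementing Future for your own data type" concrete, here is a minimal sketch that uses only the standard library. The Countdown type, the NoopWaker helper, and the polls_until_ready function are hypothetical names invented for this illustration; they are not part of trpl. The future holds its own progress (a counter) and decides for itself what "ready" means.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future: reports `Pending` `remaining` times, then is ready.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("liftoff")
        } else {
            // Record progress, then ask to be polled again.
            self.remaining -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that does nothing; enough for a loop that polls repeatedly.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a `Countdown` until it is ready and reports how many polls it took.
fn polls_until_ready(start: u32) -> (u32, &'static str) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(Countdown { remaining: start });
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(message) = fut.as_mut().poll(&mut cx) {
            return (polls, message);
        }
    }
}

fn main() {
    let (polls, message) = polls_until_ready(3);
    println!("{message} after {polls} polls");
}
```

In everyday code you would rarely write poll by hand; async blocks and functions generate the equivalent for you.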
To keep this chapter focused on learning async, rather than juggling parts of the ecosystem, we have created the trpl crate (trpl is short for “The Rust Programming Language”). It re-exports all the types, traits, and functions you will need, primarily from the futures and tokio crates.
- The futures crate is an official home for Rust experimentation for async code, and is actually where the Future trait was originally designed.
- Tokio is the most widely used async runtime in Rust today, especially for web applications.
In some cases, trpl also renames or wraps the original APIs to let us stay focused on the details relevant to this chapter.
cargo add trpl
use trpl::Html;

async fn page_title(url: &str) -> Option<String> {
    let response = trpl::get(url).await;
    let response_text = response.text().await;
    Html::parse(&response_text)
        .select_first("title")
        .map(|title_element| title_element.inner_html())
}
We define a function named page_title, and we mark it with the async keyword. Then we use the trpl::get function to fetch whatever URL is passed in, and we await the response by using the await keyword. Then we get the text of the response by calling its text method and once again awaiting it with the await keyword. Both of these steps are asynchronous. For get, we need to wait for the server to send back the first part of its response, which will include HTTP headers, cookies, and so on. That part of the response can be delivered separately from the response body. Especially if the body is very large, it can take some time for it all to arrive. Thus, we have to wait for the entirety of the response to arrive, so the text method is also async.
We have to explicitly await both of these futures, because futures in Rust are lazy: they don’t do anything until you ask them to with await.
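The laziness of futures can be observed directly with a small standard-library sketch. The names mark_and_return, run_ready, NoopWaker, and laziness_demo are hypothetical helpers made up for this example. Calling the async function only creates the future; its body does not run until something polls it.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; sufficient for a future that finishes in one poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once and returns its output.
/// This sketch only supports futures that never return `Pending`.
fn run_ready<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let outcome = fut.as_mut().poll(&mut cx);
    match outcome {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("this sketch only drives futures that finish in one poll"),
    }
}

/// Sets the flag when its body actually executes.
async fn mark_and_return(ran: &Cell<bool>) -> u32 {
    ran.set(true);
    7
}

/// Returns (did the body run before polling?, the output, did it run afterwards?).
fn laziness_demo() -> (bool, u32, bool) {
    let ran = Cell::new(false);
    let fut = mark_and_return(&ran); // creates the future; the body has not run
    let ran_before_poll = ran.get();
    let value = run_ready(fut); // polling is what drives the body
    (ran_before_poll, value, ran.get())
}

fn main() {
    let (before, value, after) = laziness_demo();
    println!("ran before poll: {before}, value: {value}, ran after poll: {after}");
}
```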
Once we have response_text, we can then parse it into an instance of the Html type using Html::parse. Instead of a raw string, we now have a data type we can use to work with the HTML as a richer data structure. In particular, we can use the select_first method to find the first instance of a given CSS selector. By passing the string "title", we will get the first <title> element in the document, if there is one. Since there may not be any matching element, select_first returns an Option<ElementRef>. Finally, we use the Option::map method, which lets us work with the item in the Option if it is present, and do nothing if it is not.
In the body of the function we supply to map, we call inner_html on the title_element to get its content, which is a String. When all is said and done, we have an Option<String>.
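The same "find the first match, then transform it if present" shape can be tried without any HTML parsing. This is a small standard-library sketch; first_word_upper is a hypothetical function used only to illustrate Option::map.

```rust
/// Returns the first whitespace-separated word in upper case, if there is one.
fn first_word_upper(text: &str) -> Option<String> {
    text.split_whitespace() // an iterator over the words
        .next() // Option<&str>: the first word, or None
        .map(|word| word.to_uppercase()) // runs only when a word was found
}

fn main() {
    println!("{:?}", first_word_upper("hello world"));
    println!("{:?}", first_word_upper("   "));
}
```

As with select_first in page_title, the closure passed to map is skipped entirely when there is nothing to work with, and the None is passed through unchanged.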
Notice that Rust’s await keyword goes after the expression you are awaiting, not before it. That is, it is a postfix keyword. This may differ from what you are used to if you have used async in other languages. Rust chose this because it makes chains of methods much nicer to work with.
As a result, we can change the body of page_title to chain the trpl::get and text function calls together with await between them:
let response_text = trpl::get(url).await.text().await;
When Rust sees a block marked with the async keyword, it compiles it into a unique, anonymous data type which implements the Future trait. When Rust sees a function marked with async, it compiles it into a non-async function whose body is an async block. Thus, an async function’s return type is the type of the anonymous data type the compiler creates for that async block.
Thus, writing async fn is equivalent to writing a function which returns a future of the return type. When the compiler sees a function like async fn page_title it is equivalent to a non-async function defined like this:
use std::future::Future;
use trpl::Html;

fn page_title(url: &str) -> impl Future<Output = Option<String>> + '_ {
    async move {
        let text = trpl::get(url).await.text().await;
        Html::parse(&text)
            .select_first("title")
            .map(|title| title.inner_html())
    }
}
Let’s walk through each part of the transformed version:
- It uses the impl Trait syntax.
- The returned trait is a Future, with an associated type of Output. Notice that the Output type is Option<String>, which is the same as the original return type from the async fn version of page_title.
- All of the code called in the body of the original function is wrapped in an async move block. Remember that blocks are expressions. This whole block is the expression returned from the function.
- This async block produces a value with the type Option<String>, as described above. That value matches the Output type in the return type. This is just like other blocks you have seen.
- The new function body is an async move block because of how it uses the url argument.
- The new version of the function has a kind of lifetime we have not seen before in the output type: '_. Because the function returns a Future which refers to a reference—in this case, the reference from the url parameter—we need to tell Rust that we mean for that reference to be included. We do not have to name the lifetime here, because Rust is smart enough to know there is only one reference which could be involved, but we do have to be explicit that the resulting Future is bound by that lifetime.
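The equivalence described in this list can be checked with a pair of functions that need no network access. This is a sketch under stated assumptions: shout, shout_desugared, ThreadWaker, and block_on are hypothetical names, and block_on is a tiny stand-in executor written for this example rather than anything provided by trpl.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// The `async fn` form.
async fn shout(text: &str) -> String {
    text.to_uppercase()
}

/// The hand-written form: a plain function returning `impl Future`,
/// whose body is a single `async move` block that captures `text`.
fn shout_desugared(text: &str) -> impl Future<Output = String> + '_ {
    async move { text.to_uppercase() }
}

/// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal stand-in executor: polls one future to completion.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let a = block_on(shout("hello"));
    let b = block_on(shout_desugared("hello"));
    println!("{a} {b}");
}
```

Both functions borrow text for as long as the returned future lives, which is what the '_ bound in the second signature spells out.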
Note that the only place we can use the await keyword is in async functions or blocks, and Rust will not let us mark main as async. The reason is that async code needs a runtime: a Rust crate which manages the details of executing asynchronous code. A program’s main function can initialize a runtime, but it is not a runtime itself. Every async program in Rust has at least one place where it sets up a runtime and executes the futures. Most languages which support async bundle a runtime with the language. Rust does not. Instead, there are many different async runtimes available, each of which makes different tradeoffs suitable to the use case they target. For example, a high-throughput web server with many CPU cores and a large amount of RAM has very different needs than a microcontroller with a single core, a small amount of RAM, and no ability to do heap allocations. The crates which provide those runtimes also often supply async versions of common functionality like file or network I/O.
Here, and throughout the rest of this chapter, we will use the run function from the trpl crate, which takes a future as an argument and runs it to completion. Behind the scenes, calling run sets up a runtime to use to run the future passed in. Once the future completes, run returns whatever value the future produced.
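To give a feel for what "running a future to completion" involves, here is a minimal sketch of a single-future executor built only from the standard library. It is not how trpl::run is implemented, since that relies on a full runtime; block_on, ThreadWaker, and YieldOnce are hypothetical names used for illustration. The loop polls the future, sleeps while it is pending, and is woken when the future signals that it can make progress.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the thread that is blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Runs one future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            // The future finished: hand its output back to the caller.
            Poll::Ready(value) => return value,
            // Not ready yet: sleep until the waker unparks this thread.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical future that hands control back once before completing.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // request another poll
            Poll::Pending
        }
    }
}

fn main() {
    let answer = block_on(async {
        YieldOnce { yielded: false }.await; // one round trip through the executor
        21 * 2
    });
    println!("{answer}");
}
```

A real runtime does far more, such as scheduling many tasks and integrating with I/O and timers, but the poll-then-wait loop is the core idea.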
We could pass the future returned by page_title directly to run. Once it completed, we would be able to match on the resulting Option<String>:
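fn main() {
    // Collect the command-line arguments; args[1] is the URL to fetch.
    let args: Vec<String> = std::env::args().collect();

    trpl::run(async {
        let url = &args[1];
        match page_title(url).await {
            Some(title) => println!("The title for {url} was {title}"),
            None => println!("{url} had no title"),
        }
    })
}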
Each await point, that is, every place where the code uses the await keyword, represents a place where control gets handed back to the runtime. To make that work, Rust needs to keep track of the state involved in the async block, so that the runtime can kick off some other work and then come back when it is ready to try advancing this one again. This is an invisible state machine, as if you had written an enum like the following to save the current state at each await point:
enum PageTitleFuture<'a> {
    GetAwaitPoint {
        url: &'a str,
    },
    TextAwaitPoint {
        response: trpl::Response,
    },
}
Writing that out by hand would be tedious and error-prone, especially when making changes to code later. Instead, the Rust compiler creates and manages the state machine data structures for async code automatically.
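To see why, here is a hand-written state machine for a much smaller async function, using only the standard library. Everything in it (Delayed, SumFuture, NoopWaker, drive) is a hypothetical stand-in invented for this sketch, and the code the compiler actually generates differs in its details. The enum plays the role of the state saved at each await point, and poll moves from one state to the next.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical leaf future: pending on the first poll, ready on the second.
struct Delayed {
    value: u32,
    polled: bool,
}

impl Delayed {
    fn new(value: u32) -> Self {
        Delayed { value, polled: false }
    }
}

impl Future for Delayed {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled {
            Poll::Ready(self.value)
        } else {
            self.polled = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hand-written equivalent of:
/// `async { let a = Delayed::new(1).await; let b = Delayed::new(2).await; a + b }`
enum SumFuture {
    FirstAwaitPoint { first: Delayed },
    SecondAwaitPoint { a: u32, second: Delayed },
    Done,
}

impl SumFuture {
    fn new() -> Self {
        SumFuture::FirstAwaitPoint { first: Delayed::new(1) }
    }
}

impl Future for SumFuture {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            match &mut *self {
                SumFuture::FirstAwaitPoint { first } => match Pin::new(first).poll(cx) {
                    // First await finished: save `a` and move to the next state.
                    Poll::Ready(a) => {
                        *self = SumFuture::SecondAwaitPoint { a, second: Delayed::new(2) };
                    }
                    Poll::Pending => return Poll::Pending,
                },
                SumFuture::SecondAwaitPoint { a, second } => match Pin::new(second).poll(cx) {
                    Poll::Ready(b) => {
                        let total = *a + b;
                        *self = SumFuture::Done;
                        return Poll::Ready(total);
                    }
                    Poll::Pending => return Poll::Pending,
                },
                SumFuture::Done => panic!("SumFuture polled after completion"),
            }
        }
    }
}

/// A waker that does nothing; `drive` simply polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future to completion and counts the polls it took.
fn drive<F: Future>(fut: F) -> (u32, F::Output) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return (polls, value);
        }
    }
}

fn main() {
    let (polls, total) = drive(SumFuture::new());
    println!("total {total} after {polls} polls");
}
```

Even for two await points, the manual version is several times longer than the async block it replaces, and every change to the logic means revisiting the states and transitions.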
Ultimately, something has to execute that state machine. That something is a runtime. If main were an async function, something else would need to manage the state machine for whatever future main returned, but main is the starting point for the program! Instead, we use the trpl::run function, which sets up a runtime and runs the future returned by page_title until it returns Ready.
Let’s put these pieces together and see how we can write concurrent code, by calling page_title with two different URLs passed in from the command line and racing the resulting futures.
use trpl::{Either, Html};

fn main() {
    let args: Vec<String> = std::env::args().collect();

    trpl::run(async {
        let title_fut_1 = page_title(&args[1]);
        let title_fut_2 = page_title(&args[2]);

        let (url, maybe_title) =
            match trpl::race(title_fut_1, title_fut_2).await {
                Either::Left(left) => left,
                Either::Right(right) => right,
            };

        println!("{url} returned first");
        match maybe_title {
            Some(title) => println!("Its page title is: '{title}'"),
            None => println!("Its title could not be parsed."),
        }
    })
}

async fn page_title(url: &str) -> (&str, Option<String>) {
    let text = trpl::get(url).await.text().await;
    let title = Html::parse(&text)
        .select_first("title")
        .map(|title| title.inner_html());
    (url, title)
}
Either future can legitimately “win,” so it does not make sense to return a Result. Instead, race returns a type we have not seen before, trpl::Either. The Either type is somewhat like a Result, in that it has two cases. Unlike Result, though, there is no notion of success or failure baked into Either. Instead, it uses Left and Right to indicate “one or the other”.
enum Either<A, B> {
    Left(A),
    Right(B),
}
The race function returns Left if the first argument finishes first, with that future’s output, and Right with the second future argument’s output if that one finishes first.
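The behavior described here can be sketched with the standard library alone. The following is an illustration under stated assumptions, not the implementation behind trpl::race: Race, race, YieldOnce, after_yields, ThreadWaker, and block_on are hypothetical names, and this Either is a local copy of the enum shown above. The sketch polls the left future first on every turn, so a tie goes to Left.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

#[derive(Debug, PartialEq)]
enum Either<A, B> {
    Left(A),
    Right(B),
}

/// Holds both futures; each poll tries the left one, then the right one.
struct Race<F1, F2> {
    left: Pin<Box<F1>>,
    right: Pin<Box<F2>>,
}

impl<F1: Future, F2: Future> Future for Race<F1, F2> {
    type Output = Either<F1::Output, F2::Output>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if let Poll::Ready(value) = self.left.as_mut().poll(cx) {
            return Poll::Ready(Either::Left(value));
        }
        if let Poll::Ready(value) = self.right.as_mut().poll(cx) {
            return Poll::Ready(Either::Right(value));
        }
        Poll::Pending
    }
}

fn race<F1: Future, F2: Future>(left: F1, right: F2) -> Race<F1, F2> {
    Race { left: Box::pin(left), right: Box::pin(right) }
}

/// Hands control back to the executor once before completing.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Simulates work that needs `turns` extra polls before producing `label`.
async fn after_yields(turns: u32, label: &'static str) -> &'static str {
    for _ in 0..turns {
        YieldOnce { yielded: false }.await;
    }
    label
}

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal stand-in executor: polls one future to completion.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let winner = block_on(race(after_yields(3, "slow"), after_yields(1, "fast")));
    println!("{winner:?}");
}
```

Once one side is ready, the other future is simply dropped along with the Race value, which is why the loser's output never appears.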