[DRAFT] nvme_driver: fix overflow computing queue size and don't wait for bad devices forever #682

Draft · wants to merge 12 commits into base: main
4 changes: 4 additions & 0 deletions vm/devices/storage/disk_nvme/nvme_driver/Cargo.toml
@@ -6,6 +6,10 @@ name = "nvme_driver"
edition = "2021"
rust-version.workspace = true

[features]
# Reduced timeouts for faster fuzzing
fuzz_timeout = []

[dependencies]
inspect = { workspace = true, features = ["defer"] }
inspect_counters.workspace = true
2 changes: 1 addition & 1 deletion vm/devices/storage/disk_nvme/nvme_driver/fuzz/Cargo.toml
@@ -16,7 +16,7 @@ guestmem.workspace = true
guid.workspace = true
inspect.workspace = true
nvme.workspace = true
nvme_driver.workspace = true
nvme_driver = { workspace = true, features = ["fuzz_timeout"] }
nvme_spec.workspace = true
pal_async.workspace = true
parking_lot.workspace = true
4 changes: 2 additions & 2 deletions vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs
@@ -399,8 +399,8 @@ impl<T: DeviceBacking> NvmeDriver<T> {
let max_io_queues = allocated_io_queue_count.min(requested_io_queue_count);

let qsize = {
let io_cqsize = QueuePair::MAX_CQ_ENTRIES.min(worker.registers.cap.mqes_z() + 1);
let io_sqsize = QueuePair::MAX_SQ_ENTRIES.min(worker.registers.cap.mqes_z() + 1);
let io_cqsize = QueuePair::MAX_CQ_ENTRIES.min(worker.registers.cap.mqes_z().saturating_add(1));
let io_sqsize = QueuePair::MAX_SQ_ENTRIES.min(worker.registers.cap.mqes_z().saturating_add(1));

// Some hardware (such as ASAP) requires that the SQ and CQ have the same size.
io_cqsize.min(io_sqsize)
Expand Down
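The overflow this hunk fixes can be reproduced in isolation: CAP.MQES is a 16-bit, 0's-based field, so a buggy or fuzzed device reporting 0xFFFF makes the old `mqes_z() + 1` wrap to 0 (or panic in debug builds). A minimal sketch, with `queue_size` as a hypothetical stand-in for the driver's computation:

```rust
/// Hypothetical stand-in for the driver's queue-size computation.
/// `mqes_z` models `worker.registers.cap.mqes_z()`: the 0's-based
/// MQES field of the NVMe CAP register, a 16-bit value.
fn queue_size(max_entries: u16, mqes_z: u16) -> u16 {
    // `mqes_z + 1` would wrap to 0 when mqes_z == u16::MAX;
    // saturating_add clamps at u16::MAX instead.
    max_entries.min(mqes_z.saturating_add(1))
}
```

With a typical device (MQES = 63, i.e. 64 entries) this yields 64; with a pathological MQES of 0xFFFF it saturates and the driver's own maximum wins, instead of computing a zero-sized queue.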
13 changes: 13 additions & 0 deletions vm/devices/storage/disk_nvme/nvme_driver/src/registers.rs
@@ -8,6 +8,8 @@ use inspect::Inspect;
use pal_async::driver::Driver;
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::Relaxed;
use std::time::Duration;
use std::time::Instant;
use tracing::instrument;
use user_driver::backoff::Backoff;
use user_driver::DeviceBacking;
@@ -108,6 +110,14 @@ impl<T: DeviceRegisterIo + Inspect> Bar0<T> {
let cc = self.cc().with_en(false);
self.set_cc(cc);
let mut backoff = Backoff::new(driver);
// CAP.TO is the worst-case ready timeout, in 500 ms units.
#[cfg(not(feature = "fuzz_timeout"))]
let timeout_duration = self.cap().to() as u64 * 500;
// The fuzz build uses a much shorter timeout so runs don't stall on bad devices.
#[cfg(feature = "fuzz_timeout")]
let timeout_duration = 10;
let start = Instant::now();
let timeout = Duration::from_millis(timeout_duration);

loop {
let csts = self.csts();
if !csts.rdy() {
@@ -116,6 +126,9 @@ impl<T: DeviceRegisterIo + Inspect> Bar0<T> {
if u32::from(csts) == !0 {
break false;
}
if start.elapsed() >= timeout {
break false;
Member commented:

The caller seems to rely on this actually resetting the device before it frees buffers. With this change, you might now return with the device still referencing these buffers, and that can lead to memory corruption.

Member commented, suggesting a clarifying comment at this point:

            // Hold onto responses until the reset completes so that waiting IOs do
            // not think the memory is unaliased by the device.

}
backoff.back_off().await;
}
}
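The cfg-gated timeout pattern in this hunk can be sketched standalone. The helper names below are hypothetical, not the driver's actual API; per the NVMe spec, CAP.TO counts in 500 ms units:

```rust
use std::time::Duration;
use std::time::Instant;

// Hypothetical sketch: pick the reset deadline at compile time.
// CAP.TO is the device's worst-case timeout in 500 ms units.
fn reset_timeout(cap_to: u8) -> Duration {
    #[cfg(not(feature = "fuzz_timeout"))]
    let millis = cap_to as u64 * 500;
    // The fuzz build swaps in a tiny timeout so bad devices can't stall runs.
    #[cfg(feature = "fuzz_timeout")]
    let millis = 10;
    Duration::from_millis(millis)
}

// Hypothetical sketch of the wait loop: returns true if `ready()` reported
// the reset complete (CSTS.RDY clear) before the deadline expired.
fn wait_until(deadline: Duration, mut ready: impl FnMut() -> bool) -> bool {
    let start = Instant::now();
    loop {
        if ready() {
            return true;
        }
        if start.elapsed() >= deadline {
            return false; // give up on a device that never resets
        }
        std::hint::spin_loop(); // the real driver awaits an async Backoff here
    }
}
```

Note the trade-off the review comment above flags: a caller that previously assumed the wait only returns once the device is quiesced must now handle the timed-out case before freeing any memory the device may still reference.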