Adding cancelcontext to closing vmbus channels #734
shutdown_vf_devices(), the method where nic_shutdown() is called, doesn't have a CancelContext, so the wait for vmbus operations during unload_for_servicing() is unbounded. We have seen shutdown_mana() wait indefinitely.
shutdown_pci_devices() already returns a ShutdownError, and something similar could be put in place for shutdown_vf_devices().
Note: There may be a hang in the future if the vmbus stays unresponsive. At best this change allows us to return an error from shutdown_vf_devices() so we know where in shutdown_mana() the delay is coming from.
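To illustrate the idea of bounding the wait and surfacing an error, here is a minimal std-only sketch. It is not the code in this PR and does not use CancelContext: `VfShutdownError`, `shutdown_with_deadline` and `slow_close` are hypothetical names, and the blocking vmbus close is simulated with a sleep. It also shows the limitation from the note above: when the deadline passes, the caller gets an error back, but the underlying operation is not stopped.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for this sketch (not the project's ShutdownError).
#[derive(Debug, PartialEq)]
enum VfShutdownError {
    /// The deadline passed before the operation reported completion.
    DeadlineExceeded,
    /// The worker exited (for example, panicked) without reporting completion.
    WorkerLost,
}

/// Runs `op` on a helper thread and waits at most `deadline` for it to finish.
/// On timeout the helper thread keeps running; only the wait is bounded.
fn shutdown_with_deadline<F>(op: F, deadline: Duration) -> Result<(), VfShutdownError>
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        op();
        // The receiver may already be gone if the caller timed out.
        let _ = done_tx.send(());
    });
    match done_rx.recv_timeout(deadline) {
        Ok(()) => Ok(()),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(VfShutdownError::DeadlineExceeded),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(VfShutdownError::WorkerLost),
    }
}

/// Stand-in for a channel close that takes a while (assumed 200 ms here).
fn slow_close() {
    thread::sleep(Duration::from_millis(200));
}

fn main() {
    // A 1 ms deadline is shorter than the simulated close; a 1 s deadline is longer.
    println!("{:?}", shutdown_with_deadline(slow_close, Duration::from_millis(1)));
    println!("{:?}", shutdown_with_deadline(slow_close, Duration::from_secs(1)));
}
```

In this sketch the caller learns which step exceeded its deadline, which is the diagnostic benefit described above. Because the abandoned operation continues in the background, a timeout can still leave things in an inconsistent state.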
Testing with a 1 ms duration fails, but leaves the VM in a bad state
The VM sees a lot of disk errors on the vmbus, and Stop-Vm gets stuck.
Output from ohcldiag:
Output from kmsg log:
Output on the VM:
Testing with a 1 second duration succeeds normally
Output from ohcldiag:
Output from kmsg log: