Drive failure tests #69
Conversation
drivers/node/ssh/ssh.go
Outdated
}

driveID = strings.TrimRight(driveID, "\n")
driveNameToFail = strings.TrimRight(strings.TrimLeft(driveNameToFail, "/"), "/")
You could use just strings.Trim to delete the trailing and leading characters.
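A minimal sketch of what that suggestion amounts to, assuming driveNameToFail holds a value like "/dev/sdb/" (the sample value is made up for illustration):

package main

import (
	"fmt"
	"strings"
)

func main() {
	driveNameToFail := "/dev/sdb/"
	// strings.Trim removes the cutset from both ends in one call,
	// replacing the TrimLeft + TrimRight pair from the diff above.
	driveNameToFail = strings.Trim(driveNameToFail, "/")
	fmt.Println(driveNameToFail) // prints: dev/sdb
}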
}

// Enable the drive by rescanning
recoverCmd := "echo \" - - -\" > /sys/class/scsi_host/host" + driveUUIDToRecover + "/scan"
Should this have a '/' after "host" and before driveUUID?
No. It is not a subdirectory under host; the UUID is appended to "host" to build the correct directory.
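To illustrate, a standalone sketch of the path construction; the host number "0" is an assumed example value, not taken from the PR:

package main

import "fmt"

func main() {
	// driveUUIDToRecover is the SCSI host number (e.g. "0"); it is appended
	// straight onto "host", so the path becomes .../host0/scan, not .../host/0/scan.
	driveUUIDToRecover := "0"
	recoverCmd := "echo \" - - -\" > /sys/class/scsi_host/host" + driveUUIDToRecover + "/scan"
	fmt.Println(recoverCmd)
	// prints: echo " - - -" > /sys/class/scsi_host/host0/scan
}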
drivers/volume/portworx/portworx.go
Outdated
	resourcesKey = "Resources"
	pathKey = "path"
)
pxNode, err := d.getClusterManager().Inspect(n.Name)
You will have to use VolDriverNodeId instead of Name after my change, because Name is the scheduler ID/name for the node.
Good catch. I didn't run the tests after the master rebase.
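Roughly what the fix looks like against the snippet above (a non-runnable sketch; the field name VolDriverNodeId is taken from the comment and its exact casing is an assumption):

// Inspect the node by the volume driver's node ID instead of the scheduler name.
pxNode, err := d.getClusterManager().Inspect(n.VolDriverNodeId)
if err != nil {
	return nil, err
}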
drivers/volume/portworx/portworx.go
Outdated
)
pxNode, err := d.getClusterManager().Inspect(n.Name)
if err != nil {
	return []string{}, err
This could just be nil, right?
done
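For context, a nil slice is a valid empty return value in Go, so the error path does not need to allocate []string{}. A toy stand-in for the function under review (names and behavior here are illustrative only):

package main

import (
	"errors"
	"fmt"
)

// getStorageDevices is a made-up stand-in for the driver method; on failure
// it returns nil together with the error instead of []string{}.
func getStorageDevices(fail bool) ([]string, error) {
	if fail {
		return nil, errors.New("inspect failed")
	}
	return []string{"/dev/sdb"}, nil
}

func main() {
	devs, err := getStorageDevices(true)
	fmt.Println(devs == nil, err) // prints: true inspect failed
}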
drivers/volume/portworx/portworx.go
Outdated
resourcesMapIntf, ok := storageInfoMap[resourcesKey]
if !ok || resourcesMapIntf == nil {
	return []string{}, fmt.Errorf("Unable to find resource info for node: %v", n.Name)
Same here: can we send nil?
done
drivers/volume/portworx/portworx.go
Outdated
}
resourcesMap := resourcesMapIntf.(map[string]interface{})

for _, v := range resourcesMap {
Nit: can we move the devPaths slice from line 167 to just above this for loop? It seems out of place there.
Moved devPaths down in the function.
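A small self-contained sketch of what the nit asks for, with made-up map contents; devPaths is declared immediately before the loop that fills it:

package main

import "fmt"

func main() {
	// Sample data standing in for the "Resources" section of the node's storage info.
	resourcesMap := map[string]interface{}{
		"0": map[string]interface{}{"path": "/dev/sdb"},
		"1": map[string]interface{}{"path": "/dev/sdc"},
	}

	// Declared right before its only use instead of earlier in the function.
	var devPaths []string
	for _, v := range resourcesMap {
		resource := v.(map[string]interface{})
		if p, ok := resource["path"].(string); ok {
			devPaths = append(devPaths, p)
		}
	}
	fmt.Println(devPaths)
}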
}

var _ = Describe("Induce drive failure on of the nodes", func() {
	driveFailureTest("drivefailure")
Nit: instead of a separate method, we could just put the 'It' block in this 'Describe'.
In the reboot test it's done that way because two Describes use the same method with different params.
Removed the wrapper function.
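For reference, a minimal Ginkgo sketch of the shape being suggested: the It block sits directly inside the Describe, so no wrapper function is needed (the package name and It description are illustrative only):

package tests

import (
	. "github.com/onsi/ginkgo"
)

var _ = Describe("Induce drive failure on one of the nodes", func() {
	It("schedules apps, fails a drive, recovers it, and validates the apps", func() {
		// test body elided: schedule apps, yank the drive, recover it, validate
	})
})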
	InitInstance()
})

var _ = Describe("Induce drive failure on of the nodes", func() {
Typo in the description ("on of" should be "one of").
appNodes, err = Inst().S.GetNodesForApp(ctx)
Expect(err).NotTo(HaveOccurred())
Expect(appNodes).NotTo(BeEmpty())
nodeWithDrive = appNodes[0]
Instead of picking the first one, we could do this on a quorum of nodes.
I don't see a reason why we need to fail drives on more than one node.
What I can check for is whether the node whose drive I am failing has a replica of the volume being used.
Yeah. We could end up offlining drives on a node that doesn't have a replica of the volume.
Currently it assumes that the volume replica count is 3 and the cluster size is 3 (the default settings).
I have created an issue to add a new API to handle the generic case where the replica may not exist on all nodes - #83
drives, err = Inst().V.GetStorageDevices(nodeWithDrive)
Expect(err).NotTo(HaveOccurred())
Expect(drives).NotTo(BeEmpty())
driveToFail = drives[0]
Why just the first drive and not all?
Will add code to fail all the drives on one node.
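Roughly what that follow-up looks like against the snippet above (a non-runnable sketch; failDriveOnNode is a hypothetical helper standing in for the node driver's YankDrive call and its expectations, not part of the PR):

drives, err = Inst().V.GetStorageDevices(nodeWithDrive)
Expect(err).NotTo(HaveOccurred())
Expect(drives).NotTo(BeEmpty())

// Fail every drive on the chosen node instead of only drives[0].
for _, driveToFail := range drives {
	failDriveOnNode(nodeWithDrive, driveToFail) // hypothetical helper
}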
…ives.
- Add two new APIs in node driver - YankDrive and RecoverDrive
- Add implementations for ssh driver.
- Add two new APIs in volume driver - GetStorageDevices and RecoverDriver.
Force-pushed from dd1230f to 89657a4.
Add utilities in the node driver and volume driver to fail and recover drives.
Add a Ginkgo test to induce a drive failure and check apps.
Drive failure implementation for AWS is coming soon.