DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT!
NOTICE: Some ideas in this paper aren't yet well sorted, some aren't complete,
and some phrasings I'm not yet happy with myself. Some ideas need further
explanation. Most of the ideas presented are not final yet; it is mostly a
braindump.
And did I say yet that this is still a DRAFT!!!!!! ?
Title: The Executioner
Author: Lars Marowsky-Brée
Acknowledgements: David Brower, Oracle
Alan Robertson, IBM
1. Summary
Every node runs an instance of the fencing daemon ("executioner"). This daemon
knows which fencing devices are currently reachable and which nodes can be
fenced by them - i.e., the current topology of the fencing mechanisms - and
will execute fencing requests on behalf of the CRM, reporting success or
failure.
A successful fencing operation shall imply that the target node of the request
can no longer access any shared resources in the cluster until it has
"properly rejoined the cluster".
2. Fencing topology information
(Note: modelled after the STONITH model by alanr in heartbeat)
2.1. Static configuration data
The mechanisms available for fencing need to be configured on each node.
Provision should be made so that this configuration file can be the same on
all nodes (to ease configuration deployment). It shall be easy to configure a
device for a list of nodes or for all nodes.
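To make this concrete, a hypothetical configuration file might look like the
sketch below; the file name, syntax and device entries are invented purely for
illustration, the point being that one file can describe every device together
with the nodes it covers, so it can be deployed unchanged to all nodes.

    # fencing.conf (hypothetical syntax, for illustration only)
    device apc1   type=apcmaster  ip=10.0.0.5   nodes="node1 node2"
    device apc2   type=apcmaster  ip=10.0.0.6   nodes="node3 node4"
    device ipmi5  type=ipmilan    ip=10.0.1.5   nodes="node5"
    device manual type=meatware                 nodes="*"   # covers all nodes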
TODO: Can this configuration also be stored in the configuration section of
the CIB? This would collide with the concept that every node is the
authoritative source of information about itself.
2.2. Runtime topology
Every device needs to support a low-latency "ping" operation, which shall
verify whether it can currently be reached from the local node; this shall
preferably not be affected by concurrent access.
Devices can either autodiscover their targets (as STONITH devices can, for
example) or must provide a means of configuring this list; the list shall
only be queried from the device on explicit request by the CRM.
The CRM shall be allowed to assume that the same device can fence the same set
of nodes for all clients.
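One way to express these per-device operations is a small plugin interface,
loosely along the lines of the STONITH plugin model mentioned above; the
vtable below is only a sketch with invented names, not the actual STONITH API.

    /* Hypothetical per-device operations as seen by the Executioner. */
    typedef struct fencing_device_ops {
        /* Low-latency reachability check from the local node; should
         * tolerate concurrent checks from other nodes where the
         * hardware allows it. */
        int    (*ping)(void *device);

        /* Return a NULL-terminated list of node names this device can
         * fence, either autodiscovered or taken from configuration;
         * only called on explicit request by the CRM. */
        char **(*list_targets)(void *device);

        /* Fence the given node; blocking, exactly one attempt. */
        int    (*fence)(void *device, const char *node);
    } fencing_device_ops_t;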
3. Interaction with the CRM
3.1. Policies
The CRM will be responsible for retrying failed commands; the Executioner
shall make exactly one attempt. In particular, it shall not retry the request
on another device; it is permissible to retry the command on the same device
if the failure appears to be transient.
3.2. Queries/Commands issued
3.2.1. Device reachable
The Executioner shall verify whether the given device is still reachable from
the local node at this point in time.
The verification shall be low-latency and lightweight, and shall allow for
concurrent access from multiple nodes (where appropriate, e.g. for network
switches).
3.2.2. Targets fenceable via device Y
The Executioner shall contact the device and return the list of nodes that
the device can fence.
The CRM will ensure that no other node in the partition is accessing device Y
right now.
Results:
0 Success; list of targets included
1 Failed; device not reached
2 Failed; device failed to return list of targets
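Expressed in code, these results could be carried as a small enum; the names
below are invented for illustration only.

    /* Hypothetical result codes for the "targets fenceable via device Y"
     * query; illustration only. */
    typedef enum {
        LIST_TARGETS_OK          = 0, /* success; list of targets included */
        LIST_TARGETS_UNREACHABLE = 1, /* device not reached                */
        LIST_TARGETS_NO_LIST     = 2  /* device failed to return the list  */
    } list_targets_rc_t;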
3.2.3. Fencing request to fence node X via device Y
This is a blocking, synchronous call. The CRM will ensure that no other node
in the partition is accessing device Y right now.
(This is an issue for certain network power switches.)
The result code needs to distinguish between:
0 Fencing request succeeded
1 Failed: device could not be reached, potential network issue
2 Failed: device tried, but failed to acknowledge success
3 Failed: internal device failure
TODO: Is so much differentiation really necessary? Yes, it can help identify
quickly whether it is sensible to retry the fencing request via another node
to the same device, or whether the next fencing device should be tried
immediately; maybe that is overkill.
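To illustrate why the differentiation can pay off, the sketch below maps the
result codes of 3.2.3 onto a simple retry decision of the kind the TODO above
alludes to; the enum and function names are invented for illustration only.

    /* Hypothetical result codes for "fence node X via device Y". */
    typedef enum {
        FENCE_OK            = 0, /* fencing request succeeded                  */
        FENCE_UNREACHABLE   = 1, /* device not reached; possible network issue */
        FENCE_NOT_CONFIRMED = 2, /* device tried, but did not confirm success  */
        FENCE_DEVICE_FAILED = 3  /* internal device failure                    */
    } fence_rc_t;

    /* Sketch: retry the same device from another node only when the
     * device itself may still be healthy; otherwise move on to the
     * next fencing device. */
    static int retry_same_device_from_other_node(fence_rc_t rc)
    {
        switch (rc) {
        case FENCE_UNREACHABLE:
            return 1;  /* likely a local network problem */
        case FENCE_NOT_CONFIRMED:
        case FENCE_DEVICE_FAILED:
        default:
            return 0;  /* device is suspect; try the next device */
        }
    }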