Pilots 2.0: generic, configurable pilots
This is a proposal for refactoring the "pilots" code. Its primary goal is to make today's pilots easy to configure and easy to extend.
We start with a definition. A pilot is what makes it possible to run jobs on a worker node. A pilot can be sent as a script to be run, or it can be fetched. A pilot can run on every type of computing resource, e.g. a CREAM Computing Element, a DIRAC Computing Element, or a Virtual Machine (in the form of a contextualization script).
At a minimum, a pilot has to:
- install DIRAC
- configure DIRAC
- run the JobAgent
A pilot has to run on each and every computing resource type.
The current solution lacks extensibility and maintainability. Its building block is the script dirac-pilot.py, which uses the script dirac-install.py for the installation. Communities can define their own pilot scripts, but it is currently impossible to:
- add capabilities to the current pilot while inheriting from the base one
- define different pilots based on the type of the computing resource.
We propose a solution where:
- the pilot script is generated, at runtime, server-side
- a toolbox of pilot capabilities (that we will call "commands") is available for generating the pilot script
- each command implements a single, atomic function, e.g.:
  - run tests
  - install DIRAC
  - configure DIRAC
  - run JobAgent
  - run monitoring agent
  - report usage
  - ... and whatever else is needed
- VOs can easily extend the content of the toolbox, adding more commands
- different computing resource types can run different pilots
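The points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `TOOLBOX`, `PILOTS_BY_RESOURCE_TYPE`, `DEFAULT_COMMANDS` and `generate_pilot_script` are hypothetical and not part of DIRAC, and the choice of commands per resource type is invented for the example.

```python
# Illustrative sketch only: none of these names are part of DIRAC.

# Toolbox: command name -> description of its single, atomic function.
TOOLBOX = {
    "RunTests": "run tests",
    "InstallDIRAC": "install DIRAC",
    "ConfigureDIRAC": "configure DIRAC",
    "RunJobAgent": "run JobAgent",
    "ReportUsage": "report usage",
}

# Assumed mapping from computing resource type to an ordered list of commands.
PILOTS_BY_RESOURCE_TYPE = {
    "CREAM": ["InstallDIRAC", "ConfigureDIRAC", "RunJobAgent"],
    "VM": ["RunTests", "InstallDIRAC", "ConfigureDIRAC", "RunJobAgent", "ReportUsage"],
}

# Used when a resource type has no specific entry (the minimum a pilot must do).
DEFAULT_COMMANDS = ["InstallDIRAC", "ConfigureDIRAC", "RunJobAgent"]


def generate_pilot_script(resource_type):
    """Return the text of a pilot script for the given resource type."""
    names = PILOTS_BY_RESOURCE_TYPE.get(resource_type, DEFAULT_COMMANDS)
    unknown = [name for name in names if name not in TOOLBOX]
    if unknown:
        raise ValueError("commands not in the toolbox: %s" % ", ".join(unknown))
    lines = [
        "#!/usr/bin/env python",
        "# pilot generated for resource type: %s" % resource_type,
    ]
    # One line per command, in the configured order.
    for name in names:
        lines.append("%s().do()  # %s" % (name, TOOLBOX[name]))
    return "\n".join(lines) + "\n"
```

In this sketch a VO would extend the toolbox by adding entries to `TOOLBOX`, and would get a different pilot for a given resource type by editing the corresponding list, without touching the generator itself.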
The proposed solution requires that each command follow a coding convention and be located in a specific part of the code, e.g. "DIRAC.WorkloadManagementSystem.Command.MyCommand":
    class MyCommand(object):
        def do(self):
            """ Here, we specify what the command does """
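As an illustration of this convention, the sketch below shows a few commands, a VO-specific command that inherits from a base one, and a small runner that executes them in order. All class and function names (`CommandBase`, `InstallDIRAC`, `ConfigureDIRAC`, `MyVOConfigureDIRAC`, `run_commands`) are hypothetical, and a plain list stands in for real logging and real installation steps.

```python
class CommandBase(object):
    """Hypothetical base class: one command, one atomic function."""

    def __init__(self, log):
        # A shared list is used here in place of real logging.
        self.log = log

    def do(self):
        raise NotImplementedError("each command must implement do()")


class InstallDIRAC(CommandBase):
    def do(self):
        self.log.append("install DIRAC")


class ConfigureDIRAC(CommandBase):
    def do(self):
        self.log.append("configure DIRAC")


class MyVOConfigureDIRAC(ConfigureDIRAC):
    """A VO extension: inherit the base behaviour, then add to it."""

    def do(self):
        super(MyVOConfigureDIRAC, self).do()
        self.log.append("apply VO-specific settings")


def run_commands(command_classes):
    """Instantiate and run each command in order; return what was done."""
    log = []
    for command_class in command_classes:
        command_class(log).do()
    return log
```

Calling `run_commands([InstallDIRAC, MyVOConfigureDIRAC])` runs the base installation followed by the extended configuration, which is the kind of inheritance the current pilot scripts do not allow.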