From fa4b928a562a96cf2ae22002627703816efcac41 Mon Sep 17 00:00:00 2001
From: Andreas Hellander
Date: Thu, 4 Jul 2024 15:09:20 +0200
Subject: [PATCH] Improved quickstart

---
 docs/architecture.rst |  2 +
 docs/index.rst        |  4 +-
 docs/projects.rst     | 22 ++++++----
 docs/quickstart.rst   | 97 ++++++++++++++++++++++++++++---------------
 4 files changed, 80 insertions(+), 45 deletions(-)

diff --git a/docs/architecture.rst b/docs/architecture.rst
index a820e7e20..85e2430da 100644
--- a/docs/architecture.rst
+++ b/docs/architecture.rst
@@ -1,3 +1,5 @@
+.. _architecture-label:
+
 Architecture overview
 =====================
 
diff --git a/docs/index.rst b/docs/index.rst
index 04ff85b07..7cda27188 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -3,13 +3,13 @@
    :caption: Introduction
 
    introduction
+   quickstart
+   projects
 
 .. toctree::
    :maxdepth: 1
    :caption: Documentation
 
-   quickstart
-   projects
    studio
    apiclient
    architecture
diff --git a/docs/projects.rst b/docs/projects.rst
index 507b0c610..1c9a078e2 100644
--- a/docs/projects.rst
+++ b/docs/projects.rst
@@ -1,8 +1,11 @@
 .. _projects-label:
 
-Creating your own FEDn Projects
+Building your own projects
 ================================================
 
+This guide explains how a FEDn project is structured, and details how to develop your own
+projects for your own use-cases.
+
 A FEDn project is a convention for packaging/wrapping machine learning code to be used for federated learning with FEDn. At the core,
 a project is a directory of files (often a Git repository), containing your machine learning code, FEDn entry points, and a specification
 of the runtime environment (python environment or a Docker image). The FEDn API and command-line tools provides functionality
@@ -28,9 +31,9 @@ We recommend that projects have roughly the following folder and file structure:
 | └ Dockerfile / docker-compose.yaml
 |
 
-The "client" folder is referred to as the *compute package*. The file fedn.yaml is the FEDn Project File. It informs the FEDn Client of the code entry points to execute when computing model updates (local training) and validating models (optionally) .
-When deploying the project to FEDn, the client folder will be compressed as a .tgz bundle and uploaded to the FEDn controller. FEDn can then manage the distribution of the compute package to each client/data provider when they connect.
-Upon recipt of the bundle, the client will unpack it and stage it locally.
+The ``client`` folder is commonly referred to as the *compute package*. The file ``fedn.yaml`` is the FEDn Project File. It contains information about the ``entry points``. The entry points are used by the client to compute model updates (local training) and local validations (optional).
+To run a project in FEDn, the client folder is compressed as a .tgz bundle and pushed to the FEDn controller. FEDn then manages the distribution of the compute package to each client.
+Upon receipt of the package, a client will unpack it and stage it locally.
 
 .. image:: img/ComputePackageOverview.png
    :alt: Compute package overview
@@ -62,11 +65,12 @@ what environment to execute those entrypoints in.
 Environment
 ^^^^^^^^^^^
 
-
-The software environment to be used to exectute the entry points. This should specify all client side dependencies of the project.
-FEDn currently supports Virtualenv environments, with packages on PyPI. When a project specifies a **python_env**, the FEDn
-client will create an isolated virtual environment and install the project dependencies into it before starting up the client.
+It is assumed that all entry points are executable within the client runtime environment. As a user, you have two main options
+to specify the environment:
+
+ 1. Provide a ``python_env`` in the ``fedn.yaml`` file. In this case, FEDn will create an isolated virtual environment and install the project dependencies into it before starting up the client. FEDn currently supports Virtualenv environments, with packages on PyPI.
+ 2. Manage the environment manually. Here you have several options, such as managing your own virtualenv, running in a Docker container, etc. Remove the ``python_env`` tag from ``fedn.yaml`` to handle the environment manually.
 
 Entry Points
 ^^^^^^^^^^^^
@@ -75,7 +79,7 @@ There are up to four Entry Points to be specified.
 
 **Build Entrypoint (build, optional):**
 
-This entrypoint is usually called **once** for building artifacts such as initial seed models. However, it not limited to artifacts, and can be used for any kind of setup that needs to be done before the client starts up.
+This entrypoint is intended to be called **once** for building artifacts such as initial seed models. However, it is not limited to artifacts, and can be used for any kind of setup that needs to be done before the client starts up.
 
 **Startup Entrypoint (startup, optional):**
 
diff --git a/docs/quickstart.rst b/docs/quickstart.rst
index 65e22d8ad..1c880ece4 100644
--- a/docs/quickstart.rst
+++ b/docs/quickstart.rst
@@ -2,7 +2,7 @@ Getting started with FEDn
 =========================
 
 .. note::
-   This tutorial is a quickstart guide to FEDn based on a pre-made FEDn Project. It is designed to serve as a minimalistic starting point for developers.
+   This tutorial is a quickstart guide to FEDn based on a pre-made FEDn Project. It is designed to serve as a starting point for new developers.
    To learn about FEDn Projects in order to develop your own federated machine learning projects, see :ref:`projects-label`.
 
 **Prerequisites**
@@ -11,7 +11,7 @@ Getting started with FEDn
 
 - `A FEDn Studio account `__
 
-Set up a FEDn Studio Project
+Start a FEDn Studio Project
 ----------------------------
 
 Start by creating an account in Studio. Head over to `fedn.scaleoutsystems.com/signup `_ and sign up.
@@ -23,10 +23,13 @@ Logged into Studio, do:
 3. Enter the project name (mandatory). The project description is optional.
 4. Click the "Create" button to create the project.
 
-You Studio project provides all server side components. Next, you will set up your local machine / client and create a FEDn project.
+.. image:: img/studio_project_overview.png
 
-Install FEDn
-------------
+When these steps are complete, you will see a Studio project similar to the image above. The Studio project provides all server-side components of FEDn needed to manage
+federated training. We will use this project in a later stage to run the federated experiments. But first, we will set up the local client.
+
+Install FEDn on your client
+----------------------------
 
 **Using pip**
 
@@ -50,35 +53,45 @@ It is recommended to use a virtual environment when installing FEDn.
 
 .. _package-creation:
 
-Initialize FEDn with the client code bundle and seed model
-----------------------------------------------------------
-Next, we will prepare the client. The key part of a FEDn Project is the client definition -
-code that contains entrypoints for training and (optionally) validating a model update on the client.
+Create the compute package and seed model
+--------------------------------------------
+
+Next, we will prepare the client. For illustrative purposes, we use one of the pre-defined projects in the FEDn repository, ``mnist-pytorch``.
 
-Locate into ``examples/mnist-pytorch`` and familiarize yourself with the project structure. The dependencies needed in the client environment are specified
-in ``client/python_env.yaml``.
+In order to train a federated model using FEDn, your Studio project needs to be initialized with a ``compute package`` and a ``seed model``. The compute package is a code bundle containing the
+code used by the client to execute local training and local validation. The seed model is a first version of the global model.
 
-In order to train a federated model using FEDn, your Studio project needs to be initialized with a compute package and a seed model. The compute package is a bundle
-of the client specification, and the seed model is a first version of the global model.
+Navigate to the ``examples/mnist-pytorch`` folder in the cloned fedn repository. The compute package is located in the folder ``client``.
 
-Create a package of the fedn project (assumes your current working directory is in the root of the project /examples/mnist-pytorch):
+Create a package of the fedn project. From the ``examples/mnist-pytorch`` folder, run:
 
 .. code-block::
 
    fedn package create --path client
 
-This will create a package called 'package.tgz' in the root of the project.
+This will create a package called ``package.tgz`` in the root of the project.
 
-Next, run the build entrypoint defined in ``client/fedn.yaml`` to build the model artifact.
+Next, create the seed model:
 
 .. code-block::
 
    fedn run build --path client
 
-This will create a seed model called 'seed.npz' in the root of the project. We will now upload these to your Studio project using the FEDn APIClient.
+This will create a seed model called ``seed.npz`` in the root of the project. We will now upload these to your Studio project using the FEDn APIClient.
+
+For a detailed explanation of the FEDn Project with instructions for how to create your own project, see this guide: :ref:`projects-label`
+
+Initialize your FEDn Studio Project
+------------------------------------
+
+In the Studio UI, navigate to the project you created above and click on the "Sessions" tab. Click on the "New Session" button. Under the "Compute package" tab, select a name and upload the generated package file. Under the "Seed model" tab, upload the generated seed file:
 
-**Upload the package and seed model**
+.. image:: img/upload_package.png
+
+**Upload the package and seed model using the Python APIClient**
+
+It is also possible to upload a package and seed model using the Python APIClient.
 
 .. note::
    You need to create an API admin token and use the token to authenticate the APIClient.
@@ -100,9 +113,9 @@ Upload the package and seed model using the APIClient:
 Configure and attach clients
 ----------------------------
 
-Each local client needs an access token in order to connect. These tokens are issued from your Studio Project. Go to the 'Clients' tab and click 'Connect client'.
-Download a client configuration file and save it to the root of the examples/mnist-pytorch folder. Rename the file to 'client.yaml'.
-Then start the client by running the following command in the root of the project:
+Each local client needs an access token in order to connect. These tokens are issued from your Studio Project. Go to the 'Clients' tab and click 'Connect client'.
+Download a client configuration file and save it to the root of the ``examples/mnist-pytorch`` folder. Rename the file to 'client.yaml'.
+Then start the client by running the following command:
 
 .. code-block::
 
@@ -110,7 +123,7 @@ Then start the client by running the following command in the root of the projec
 
 Repeat the above for the number of clients you want to use. A normal laptop should be able to handle several clients for this example.
 
-**Modifying the data split:**
+**Modifying the data split (multiple-clients, optional):**
 
 The default traning and test data for this example (MNIST) is for convenience downloaded and split by the client when it starts up (see 'startup' entrypoint).
 The number of splits and which split is used by a client can be controlled via the environment variables ``FEDN_NUM_DATA_SPLITS`` and ``FEDN_DATA_PATH``.
@@ -138,7 +151,21 @@ For example, to split the data in 10 parts and start a client using the 8th part
 Start a training session
 ------------------------
 
-You are now ready to start training the model using the APIClient:
+In Studio, click on the "Sessions" link, then the "New session" button in the upper right corner. Click the "Start session" tab, enter your desired settings (or use the defaults) and hit the "Start run" button. In the terminal where you are running your client you should now see some activity. When the round is completed, you can see the results in the FEDn Studio UI on the "Models" page.
+
+**Watch the training progress**
+
+Once a training session is started, you can monitor the progress of the training by navigating to "Sessions" and clicking on the "Open" button of the active session. The session page will list the models as soon as they are generated. To get more information about a particular model, navigate to the model page by clicking the model name. From the model page you can download the model weights and get validation metrics.
+
+To get an overview of how the models have evolved over time, navigate to the "Models" tab in the sidebar. Here you can see a list of all models generated across sessions along with a graph showing some metrics of how the models are performing.
+
+.. image:: img/studio_model_overview.png
+
+.. _studio-api:
+
+**Control training sessions using the Python APIClient**
+
+You can also start training sessions using the APIClient:
 
 .. code:: python
 
@@ -153,12 +180,7 @@ You are now ready to start training the model using the APIClient:
 
    >>> validations = client.get_validations(model_id=model_id)
 
-Please see :py:mod:`fedn.network.api` for more details on the APIClient.
-
-.. note::
-
-   In FEDn Studio, you can start a training session by going to the 'Sessions' tab and click 'Start session'. See :ref:`studio` for a
-   step-by-step guide for how to control experiments using the UI.
+Please see :py:mod:`fedn.network.api` for more details on how to use the APIClient.
 
 Access model updates
 --------------------
@@ -167,7 +189,7 @@ Access model updates
 
 In FEDn Studio, you can access global model updates by going to the 'Models' or 'Sessions' tab. Here you can download model updates, metrics (as csv) and view the model trail.
 
-You can access global model updates via the APIClient:
+You can also access global model updates via the APIClient:
 
 .. code:: python
 
@@ -175,7 +197,8 @@ You can access global model updates via the APIClient:
 
    >>> client.download_model("", path="model.npz")
 
-**Connecting clients using Docker**
+Connecting clients using Docker
+--------------------------------
 
 You can also use Docker to containerize the client. For convenience, there is a Docker image hosted on ghrc.io with fedn preinstalled.
 
@@ -188,12 +211,18 @@ To start a client using Docker:
    -e FEDN_PACKAGE_EXTRACT_DIR=package \
    -e FEDN_NUM_DATA_SPLITS=2 \
    -e FEDN_DATA_PATH=/app/package/data/clients/1/mnist.pt \
-   ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml --force-ssl --secure=True
+   ghcr.io/scaleoutsystems/fedn/fedn:0.10.0 run client -in client.yaml --force-ssl --secure=True
 
-**Where to go from here?**
+Where to go from here?
+------------------------
 
-With you first FEDn federation set up, we suggest that you take a close look at how a FEDn project is structured
+With your first FEDn federated project set up, we suggest that you take a close look at how a FEDn project is structured
 and how you develop your own FEDn projects:
 
 - :ref:`projects-label`
+
+You can also dive into the architecture overview to learn more about how FEDn is designed and works under the hood:
+- :ref:`architecture-label`
+
+
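A note on the data-split variables used in the quickstart: the Docker command in this patch sets ``FEDN_NUM_DATA_SPLITS`` and ``FEDN_DATA_PATH`` by hand. The sketch below shows one way to generate those values for an arbitrary partition. It is illustrative only: ``client_env`` and ``docker_env_flags`` are hypothetical helper names that are not part of FEDn, and the path layout ``<package_dir>/data/clients/<partition>/mnist.pt`` is an assumption extrapolated from the single path shown in the Docker example.

```python
def client_env(num_splits, partition, package_dir="/app/package"):
    """Build the environment variables for one client's data partition.

    Assumption: partitions are numbered from 1 and stored at
    <package_dir>/data/clients/<partition>/mnist.pt, mirroring the one
    path that appears in the Docker command of this patch.
    """
    if not 1 <= partition <= num_splits:
        raise ValueError(
            f"partition must be in 1..{num_splits}, got {partition}"
        )
    return {
        "FEDN_NUM_DATA_SPLITS": str(num_splits),
        "FEDN_DATA_PATH": f"{package_dir}/data/clients/{partition}/mnist.pt",
    }


def docker_env_flags(env):
    """Render a dict of variables as `-e KEY=VALUE` arguments for `docker run`."""
    flags = []
    for key, value in env.items():
        flags.extend(["-e", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Example: describe the 8th of 10 partitions (hypothetical layout).
    env = client_env(num_splits=10, partition=8)
    print(" ".join(docker_env_flags(env)))
```

Under the assumed layout, ``client_env(2, 1)`` yields the same two values that the Docker command in the patch passes explicitly. Validating the partition index up front avoids starting a client that points at a split which was never created.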