Feature/retire dashboard #509

Merged
merged 8 commits into from
Feb 1, 2024
53 changes: 33 additions & 20 deletions README.rst
@@ -10,32 +10,35 @@
.. image:: https://readthedocs.org/projects/fedn/badge/?version=latest&style=flat
:target: https://fedn.readthedocs.io

FEDn is a modular and model agnostic framework for hierarchical
federated machine learning which scales from pseudo-distributed
development to real-world production networks in distributed,
heterogeneous environments. For more details see https://arxiv.org/abs/2103.00148.
FEDn is a modular and model agnostic framework for
federated machine learning. FEDn is designed to scale from pseudo-distributed
development on your laptop to real-world production setups in geographically distributed environments.

Core Features
=============

- **Scalable and resilient.** FEDn is highly scalable and resilient via a tiered
architecture where multiple aggregation servers (combiners) form a network to divide up the work to coordinate clients and aggregate models.
Recent benchmarks show high performance both for thousands of clients in a cross-device
setting and for large model updates (1GB) in a cross-silo setting.
FEDn has the ability to recover from failure in all critical components.

Benchmarks show high performance both for thousands of clients in a cross-device
setting and for large model updates in a cross-silo setting.
FEDn has the ability to recover from failure in all critical components.

- **Security**. A key feature is that
clients do not have to expose any ingress ports.

- **Track events and training progress in real-time**. FEDn tracks events for clients and aggregation servers, logging to MongoDB. This
helps developers monitor training progress in real time and troubleshoot the distributed computation.
Tracking and model validation data can easily be retrieved using the API enabling development of custom dashboards and visualizations.

- **Flexible handling of asynchronous clients**. FEDn supports flexible experimentation
with clients coming in and dropping out during training sessions. Extend aggregators to experiment
with different strategies to handle so-called stragglers.

- **ML-framework agnostic**. Model updates are treated as black-box
computations. This means that it is possible to support any
ML model type or framework. Support for Keras and PyTorch is
available out-of-the-box.

- **Security**. A key feature is that
clients do not have to expose any ingress ports.

- **Track events and training progress**. FEDn logs events in the federation and tracks both training and validation progress in real time. Data is logged as JSON to MongoDB and a user can easily make custom dashboards and visualizations.

- **UI.** A Flask UI lets users see client model validations in real time, as well as track client training time distributions and key performance metrics for clients and combiners.

Getting started
===============

@@ -55,23 +58,33 @@ Clone this repository, locate into it and start a pseudo-distributed FEDn networ

docker-compose up

Navigate to http://localhost:8090. You should see the FEDn UI, asking you to upload a compute package. The compute package is a tarball of a project. The project in turn implements the entrypoints used by clients to compute model updates and to validate a model.
This starts the required backend services (MongoDB and Minio), the API Server and one Combiner. You can verify the deployment using these URLs:

- API Server: localhost:8092
- Minio: localhost:9000
- Mongo Express: localhost:8081
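
One way to verify the deployment from a script is to test whether each port accepts TCP connections. The following is a minimal sketch using only the Python standard library; the function names and the ``SERVICES`` mapping are illustrative and not part of FEDn, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports taken from the list above; the names are only labels for printing.
SERVICES = {
    "API Server": 8092,
    "Minio": 9000,
    "Mongo Express": 8081,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service label to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

If a port is reported as not reachable, the output of ``docker-compose ps`` is usually the next place to look.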

Next, we will prepare the client. A key concept in FEDn is the compute package -
a code bundle that contains entrypoints for training and (optionally) validating a model update on the client.
The following steps use the compute package defined in the example project 'examples/mnist-pytorch'.

Locate into 'examples/mnist-pytorch'.
Locate into 'examples/mnist-pytorch' and familiarize yourself with the project structure. The entrypoints
are defined in 'client/entrypoint'. The dependencies needed in the client environment are specified in
'requirements.txt'. For convenience, we have provided utility scripts to set up a virtual environment.
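
To make the idea of a compute package concrete, the sketch below bundles a project's 'client' folder into a gzipped tarball with the Python standard library. It is only an illustration: ``build_package`` is a hypothetical helper, the archive layout is an assumption, and the example project's own ``bin/build.sh`` remains the way to produce the real package.

```python
import tarfile
from pathlib import Path


def build_package(project_dir: str, output: str = "package.tgz") -> list:
    """Bundle the 'client' folder of project_dir into a gzipped tarball.

    Returns the member names stored in the archive.
    Assumption: the entrypoints live under '<project_dir>/client'.
    """
    client_dir = Path(project_dir) / "client"
    if not client_dir.is_dir():
        raise FileNotFoundError(f"no 'client' directory in {project_dir}")

    with tarfile.open(output, "w:gz") as tar:
        # Store paths relative to the project root, e.g. 'client/entrypoint'.
        tar.add(str(client_dir), arcname="client")

    # Re-open the archive to report what was actually written.
    with tarfile.open(output, "r:gz") as tar:
        return tar.getnames()
```

Listing the members after writing is a cheap check that the entrypoint file ended up in the bundle.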

Start by initializing a virtual environment with all of the required dependencies for this project.

.. code-block::

bin/init_venv.sh

Now create the compute package and a seed model:
Next create the compute package and a seed model:

.. code-block::

bin/build.sh

Upload the generated files 'package.tar.gz' and 'seed.npz' in the FEDn UI.
Upload the generated files 'package.tgz' and 'seed.npz' using the API:
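
A minimal sketch of the upload step, assuming the ``APIClient`` class imported in the example notebooks exposes ``set_package(path, helper=...)`` and ``set_initial_model(path)``, and that the mnist-pytorch example uses the helper name ``numpyhelper``. Treat those method names and the helper as assumptions and check them against the installed FEDn version; ``upload_package_and_seed`` itself is a hypothetical wrapper.

```python
def upload_package_and_seed(client, package="package.tgz", seed="seed.npz",
                            helper="numpyhelper"):
    """Register the compute package, then the seed model, through `client`.

    `client` is expected to behave like fedn's APIClient (assumed methods:
    set_package and set_initial_model). Returns both responses as a tuple.
    """
    package_result = client.set_package(package, helper=helper)
    seed_result = client.set_initial_model(seed)
    return package_result, seed_result


# Intended use against the local deployment (needs the fedn package and
# the services started by docker-compose):
#
#   from fedn import APIClient
#   client = APIClient(host="localhost", port=8092)
#   upload_package_and_seed(client)
```

Passing the client in as an argument keeps the host and port in one place and lets the function be exercised with a stub when no server is running.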

The next step is to configure and attach clients. For this we download data and make data partitions:

@@ -82,7 +95,7 @@ Download the data:
bin/get_data


Split the data into 2 parts for the clients:
Split the data into 2 partitions:

.. code-block::

20 changes: 0 additions & 20 deletions docker-compose.yaml
@@ -58,26 +58,6 @@ services:
ports:
- 8081:8081

dashboard:
environment:
- GET_HOSTS_FROM=dns
- USER=test
- PROJECT=project
- FLASK_DEBUG=1
- STATESTORE_CONFIG=/app/config/settings-reducer.yaml
build:
context: .
args:
BASE_IMG: ${BASE_IMG:-python:3.10-slim}
working_dir: /app
volumes:
- ${HOST_REPO_DIR:-.}/fedn:/app/fedn
entrypoint: [ "sh", "-c" ]
command:
- "/venv/bin/pip install --no-cache-dir -e /app/fedn && /venv/bin/fedn run dashboard -n reducer --init=config/settings-reducer.yaml"
ports:
- 8090:8090

api-server:
environment:
- GET_HOSTS_FROM=dns
16 changes: 12 additions & 4 deletions examples/async-simulation/Experiment.ipynb
@@ -14,13 +14,12 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 6,
"id": "743dfe47",
"metadata": {},
"outputs": [],
"source": [
"from fedn import APIClient\n",
"from fedn.dashboard.plots import Plot\n",
"from fedn.network.clients.client import Client\n",
"import uuid\n",
"import json\n",
@@ -40,7 +39,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 7,
"id": "1061722d",
"metadata": {},
"outputs": [],
@@ -60,7 +59,7 @@
},
{
"cell_type": "code",
"execution_count": 48,
"execution_count": 8,
"id": "5107f6f9",
"metadata": {},
"outputs": [],
@@ -80,7 +79,11 @@
},
{
"cell_type": "code",
"execution_count": 9,
"id": "f0380d35",
"metadata": {},
"outputs": [],
@@ -171,6 +174,11 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "21345b455230dd04cf84c108e7c182ecfe8d1aa1242b8b64881a6d2c0a5951ac"
}
}
},
"nbformat": 4,
115 changes: 89 additions & 26 deletions examples/mnist-pytorch/API_Example.ipynb

Large diffs are not rendered by default.

100 changes: 2 additions & 98 deletions fedn/cli/run_cmd.py
@@ -5,11 +5,8 @@

from fedn.common.exceptions import InvalidClientConfig
from fedn.common.log_config import logger
from fedn.dashboard.dashboard import Dashboard
from fedn.dashboard.restservice import decode_auth_token, encode_auth_token
from fedn.network.clients.client import Client
from fedn.network.combiner.combiner import Combiner
from fedn.network.storage.statestore.mongostatestore import MongoStateStore

from .main import main

@@ -33,7 +30,7 @@ def check_helper_config_file(config):
try:
helper = control["helper"]
except KeyError:
print("--local-package was used, but no helper was found in --init settings file.", flush=True)
logger.error("--local-package was used, but no helper was found in --init settings file.")
exit(-1)
return helper

@@ -49,7 +46,7 @@ def apply_config(config):
try:
settings = dict(yaml.safe_load(file))
except Exception:
print('Failed to read config from settings file, exiting.', flush=True)
logger.error('Failed to read config from settings file, exiting.')
return

for key, val in settings.items():
@@ -79,8 +76,6 @@ def run_cmd(ctx):

:param ctx:
"""
# if daemon:
# print('{} NYI should run as daemon...'.format(__file__))
pass


@@ -147,97 +142,6 @@ def client_cmd(ctx, discoverhost, discoverport, token, name, client_id, local_pa
client.run()


@run_cmd.command('dashboard')
@click.option('-h', '--host', required=False)
@click.option('-p', '--port', required=False, default='8090', show_default=True)
@click.option('-k', '--secret-key', required=False, help='Set secret key to enable jwt token authentication.')
@click.option('-l', '--local-package', is_flag=True, help='Enable use of local compute package')
@click.option('-n', '--name', required=False, default="reducer" + str(uuid.uuid4())[:8], help='Set service name')
@click.option('-in', '--init', required=True, default=None,
help='Set to a filename to (re)init reducer state from file.')
@click.pass_context
def dashboard_cmd(ctx, host, port, secret_key, local_package, name, init):
""" Start the dashboard service.

:param ctx: Click context.
:param discoverhost: Hostname for discovery services (dashboard).
:param discoverport: Port for discovery services (dashboard).
:param secret_key: Set secret key to enable jwt token authentication.
:param local_package: Enable use of local compute package.
:param name: Set service name.
:param init: Set to a filename to (re)init config state from file.
"""
remote = False if local_package else True
config = {'host': host, 'port': port, 'secret_key': secret_key,
'name': name, 'remote_compute_package': remote, 'init': init}

# Read settings from config file
try:
fedn_config = get_statestore_config_from_file(config['init'])
except Exception as e:
print('Failed to read config from settings file, exiting.', flush=True)
print(e, flush=True)
exit(-1)

if not remote:
_ = check_helper_config_file(fedn_config)

try:
network_id = fedn_config['network_id']
except KeyError:
print("No network_id in config, please specify the control network id.", flush=True)
exit(-1)

# Obtain state from database, in case already initialized (service restart)
statestore_config = fedn_config['statestore']
if statestore_config['type'] == 'MongoDB':
statestore = MongoStateStore(
network_id, statestore_config['mongo_config'])
else:
print("Unsupported statestore type, exiting. ", flush=True)
exit(-1)

statestore.set_storage_backend(fedn_config['storage'])

# Enable JWT token authentication.
if config['secret_key']:
# If we already have a valid token in statestore config, use that one.
existing_config = statestore.get_reducer()
if existing_config:
try:
existing_config = statestore.get_reducer()
current_token = existing_config['token']
status = decode_auth_token(current_token, config['secret_key'])
if status != 'Success':
token = encode_auth_token(config['secret_key'])
config['token'] = token
except Exception:
raise

else:
token = encode_auth_token(config['secret_key'])
config['token'] = token
try:
statestore.set_reducer(config)
except Exception:
print("Failed to set reducer config in statestore, exiting.", flush=True)
exit(-1)

# Configure storage backend.
try:
statestore.set_storage_backend(fedn_config['storage'])
except KeyError:
print("storage configuration missing in statestore_config.", flush=True)
exit(-1)
except Exception:
print("Failed to set storage config in statestore, exiting.", flush=True)
exit(-1)

dashboard = Dashboard(statestore)
dashboard.run()
logger.warning("The Dashboard is deprecated and will be removed in a future release.")


@run_cmd.command('combiner')
@click.option('-d', '--discoverhost', required=False, help='Hostname for discovery services (reducer).')
@click.option('-p', '--discoverport', required=False, help='Port for discovery services (reducer).')
3 changes: 0 additions & 3 deletions fedn/fedn/dashboard/__init__.py

This file was deleted.
