Docs/SK-646 | Docs for aggregator and helper plugin #507

Closed
wants to merge 279 commits into from
279 commits
3b7d319
Improve logs
Jun 7, 2022
717541b
Always print logs
Jun 7, 2022
23ef432
Fix PyTorch example data mount path in compose file
Jun 7, 2022
82b74c0
Merge branch 'bugfix/torchexample' into feature/ci-matrix
Jun 7, 2022
04b78f3
mets many python versions
Jun 7, 2022
160ee02
quotes
Jun 7, 2022
e4ed2d4
Fix CI sleep time
Jun 8, 2022
d547b94
Add Python versiong
Jun 8, 2022
463d48a
don't fail fast
Jun 8, 2022
db384b4
remove python 3.10
Jun 8, 2022
9df12c3
remove python 3.10
Jun 8, 2022
6355ba8
fix numpy for py 3.7
Jun 8, 2022
239f82f
Merge branch 'feature/ci-matrix' into feature/inference
Jun 8, 2022
30370f8
Merge branch 'bugfix/ci-time' into feature/inference
Jun 8, 2022
cb8b362
Merge branch 'develop' into feature/inference
Jun 8, 2022
61e14d1
Merge branch 'develop' into feature/inference
Jun 9, 2022
1c556c0
Merge branch 'develop' into feature/inference
Jun 10, 2022
976e12d
Inference CI
Jun 10, 2022
18de033
minor
Jun 10, 2022
ca5f809
fix
Jun 10, 2022
3051550
fix
Jun 10, 2022
9fb4b01
fix
Jun 10, 2022
8f84f3c
fix
Jun 10, 2022
9dddc02
fix
Jun 10, 2022
98c5aba
fix
Jun 10, 2022
ed3b3bd
fix
Jun 10, 2022
7a457de
reduce CI time
Jun 13, 2022
f1be8c5
update and fix conflicts
Jun 15, 2022
d1904c1
fix conflict
Jun 15, 2022
04316a6
upstream updates
Jul 4, 2022
a9ea496
Initial implementation toggle ssl for REST service
Jul 6, 2022
5a20932
Removed unused reducer inference interface mockup
Jul 6, 2022
da9d64d
Removed geoip2 dependency
Jul 6, 2022
175e3d9
Dockerfile update, install developer tools
Jul 6, 2022
0b95ebf
Draft implementation
Jul 7, 2022
d2920bb
Update and fix conflicts
Jul 7, 2022
d45bed8
update and fix conflict
Jul 7, 2022
1b4eec8
Merge branch 'develop' into feature/toggle-ssl
Jul 7, 2022
7e954b0
Remove mocked inference endpoint in restservice
Jul 8, 2022
37e522d
Develop (#418)
ahellander Jul 11, 2022
59890e2
fix code-checks
Wrede Jul 11, 2022
97b556b
insecure mode in ci (http)
Wrede Jul 11, 2022
0e24072
secure option to package download and checksum
Wrede Jul 11, 2022
a802c4c
work in progress
Jul 11, 2022
9e3902b
merge conflicts
Jul 11, 2022
c831508
fix flake8 warning
Wrede Jul 12, 2022
3917779
Merge branch 'feature/toggle-ssl' of https://github.com/scaleoutsyste…
Jul 13, 2022
90396e0
Remove Talisman
Jul 13, 2022
3981f83
bugfix, combiner now correctly uses secure flag in connector
Jul 13, 2022
53ef114
Revert accidetal change to compose file
Jul 13, 2022
2c29009
sort import
Jul 13, 2022
3ebf8d9
Changed combiner ssl default config to False
Jul 13, 2022
b771ed2
Fixed code checks
Jul 14, 2022
be0a726
Code checks
Jul 14, 2022
eecc65f
Add docstings in connecy.py
Jul 14, 2022
c4555f1
Add docstings in certificatemanager
Jul 14, 2022
637a7be
Docstrings
Jul 28, 2022
48c8dea
Changed some parameter names in reducer CLI
Jul 28, 2022
884ef41
Default no-ssl for REST, ssl for gRPC
Aug 2, 2022
37b66f9
Fix code check
Aug 3, 2022
0741607
Harmoize option names between combiner and reducer
Aug 3, 2022
9f2983e
Add help text for combiner options
Aug 3, 2022
79c088a
Make --secure option flag
Aug 3, 2022
81ea77e
Works to disable secure grpc
Aug 10, 2022
9304725
Added back use of copy
Aug 12, 2022
0f0a3d7
Remove possibility to generate cert for reducer
Aug 15, 2022
09c568f
Default to insecure gRPC setting
Aug 15, 2022
91adabe
Fix code scanning alerts
Aug 15, 2022
e3252eb
Initial refactor
Aug 16, 2022
d22bbd4
Initial refactor reducer
Aug 18, 2022
5d7abee
Introduce base class for controller
Aug 18, 2022
612dd75
More refactoring and cleaning
Aug 26, 2022
1b8c0ab
refactored look-aside loadbalancer
Aug 29, 2022
2f639a1
Refactored load-balancer
Aug 30, 2022
fe61751
Fixed code checks
Aug 31, 2022
63fb1f1
latest
Aug 31, 2022
b32cd34
work in progress
Sep 1, 2022
33ea7a8
Resolved conflicts
Sep 5, 2022
a472bab
Fixed code checks
Sep 5, 2022
e4be8cb
Update control page
Sep 5, 2022
be12df7
added metadata field to modelupdaterequest
Sep 14, 2022
1c83ac2
Client passes on metadata dict with model update
Sep 15, 2022
b9e4980
Latest
Sep 16, 2022
e2295a8
Latest
Sep 19, 2022
47a0409
Merge branch 'develop' into feature/refactor-control
Oct 3, 2022
0941195
latest
Oct 3, 2022
e60ec65
Resolve conflict
Oct 3, 2022
5ead760
Refactor aggregation
Oct 17, 2022
cd28882
Fix
Oct 17, 2022
f9f4321
Merge branch 'bugfix/430' into feature/429
Oct 17, 2022
1be171a
Add docstring for load_model_update
Oct 17, 2022
b70347b
Extract model update metadata and make available in aggregator
Oct 17, 2022
aabaac3
Added some docstrings
Oct 17, 2022
85d58b3
More docstrings
Oct 17, 2022
e54024d
Renamed aggregator files and base class
Oct 17, 2022
4cba0e6
suppress LOG status messages in stdout
Oct 17, 2022
1bed1aa
Introduce policy for when to trigger aggregation at combiner
Oct 17, 2022
d61f256
Latest
Oct 23, 2022
cc71a7b
Merge branch 'develop' into feature/429
Oct 23, 2022
1e770a3
Added files
Oct 24, 2022
580cb4e
Fixes
Oct 30, 2022
83086d2
Fixed broken congig file generation.
Oct 31, 2022
897ea39
Added option to parse client name from config file
Oct 31, 2022
1036a49
Flattened client config file, generalized so that all settings can be…
Nov 1, 2022
7936c4e
Fixed file generation
Nov 1, 2022
513f010
Resolved conflict
Nov 1, 2022
b66d1d9
Latest
Nov 1, 2022
1fad8d0
Updated config template
Nov 1, 2022
aee5e2a
Merge branch 'feature/438' into feature/429
Nov 2, 2022
8b1a595
Resolved conflict
Jan 25, 2023
1ae52ff
Removed mongotracing in control, will refactor to have all tracing da…
Jan 26, 2023
ab9cec9
Refactored combiner job submit
Jan 26, 2023
3a867a2
Remove psutil tracing
Jan 26, 2023
2b3098c
Refactor tracer
Jan 26, 2023
016275c
cleaning
Jan 26, 2023
758c551
get latest round refactored
Jan 26, 2023
1c6319c
Enable early termination by default
Jan 27, 2023
13379b9
Removed unused round_config object
Jan 29, 2023
e0ff053
Remove printout of sensitive information
Jan 30, 2023
858ccce
Remove old control, make new version default
Jan 30, 2023
61da4ea
Remove unused code
Jan 30, 2023
112de4d
Changed default name for fedn network in config template
Jan 30, 2023
82a8ee8
Cleaning, docstrings
Feb 8, 2023
9130b8e
bugfix
Feb 8, 2023
9f01ca0
Variable name changes
Feb 8, 2023
aee9cef
Removed old combine models implementation
Feb 8, 2023
981703d
bugfix
Feb 8, 2023
85bca23
Add a hook to validate the model update before putting it on the aggr…
Feb 8, 2023
2718797
Validate metadata on model 'update
Feb 8, 2023
88ce477
Validate metadata on model 'update
Feb 8, 2023
c970571
incremental weighted average in new style aggregator
Feb 11, 2023
27362da
small cleaning in control form
Feb 11, 2023
b8058dc
Added instructions in controller form, rearranged menu items
Feb 11, 2023
4733d34
Merge pull request #1 from scaleoutsystems/feature/inference
ahellander Feb 16, 2023
1ce1094
latest
Feb 16, 2023
cb386a8
started mergin
Feb 16, 2023
745a7d8
Resolve merge conflicts
Feb 16, 2023
47c0497
Added back accidentally removed file
Feb 17, 2023
c03146b
Conflict resolution
Feb 17, 2023
678716f
Remove unused readme file
Feb 17, 2023
de43d59
More merging
Feb 17, 2023
f2eaf58
latest
Feb 21, 2023
087fb63
Fixed round_config regression
Feb 23, 2023
e2bd997
Controller polls db instead of combiners
Feb 23, 2023
746475e
More api docs
Feb 27, 2023
b096aff
Add infer_instruct
Feb 27, 2023
b427d5b
Cleaning
Feb 27, 2023
2d1f213
Added training metadata for keras example
Mar 6, 2023
7470f04
work in progress db cleanup
Mar 6, 2023
834a342
Refactor
Mar 7, 2023
4f80eef
More refactoring in db backend
Mar 13, 2023
dd14149
Remove 'control' setting from reducer config file
Mar 13, 2023
148e98c
Flatten combiner config
Mar 13, 2023
2cdf437
Flatten combiner config
Mar 13, 2023
25d3149
Flatten combiner config
Mar 13, 2023
f31ab71
Harmonize CLI option names
Mar 13, 2023
d7eeb62
Refactor helpers
Mar 13, 2023
199de56
Refactor helpers
Mar 14, 2023
5d0f125
Merge branch 'master' into feature/refactor
Mar 14, 2023
81138d7
Refactor helpers
Mar 22, 2023
b3cd90e
Refactor helpers
Mar 22, 2023
3ee493b
Refactor helpers
Mar 23, 2023
a2f0e96
Plugin arch for helpers
Mar 27, 2023
b3fed84
Updated UI config
Mar 27, 2023
a59215d
Raise exception if misconfigured helper
Mar 28, 2023
59fdb38
Added tracing of sessions in the db
Mar 28, 2023
209c4dc
Update version to 0.5-dev
Mar 31, 2023
b6f3879
Merge branch 'develop' into feature/refactor
ahellander Apr 9, 2023
dc265c7
Updated torch version
Apr 11, 2023
287b63c
resolved conflict
Apr 11, 2023
62426d3
Updated torch version
Apr 11, 2023
e020bbf
bugfix
Apr 14, 2023
d475f2f
Skip osx tests
Apr 16, 2023
8416030
latest
May 9, 2023
cf96344
change helper name
Wrede May 9, 2023
f719e83
fix formatting and syntax
Wrede May 9, 2023
25800ab
fix formatting and syntax errors
Wrede May 9, 2023
3db2cca
Resolved conflicts
May 15, 2023
d8a7b16
Merge branch 'develop' of github.com:scaleoutsystems/fedn into featur…
Wrede May 15, 2023
5ea7e74
update ci new db
Wrede May 16, 2023
5521e20
Merge branch 'feature/refactor' of https://github.com/scaleoutsystems…
May 17, 2023
44fc3a3
fix round_id key and equal weight to reduce models
Wrede May 17, 2023
5d86b11
save helper for metrics and metadata
Wrede May 17, 2023
9a96996
merge conflict
May 17, 2023
2c4475b
improve readability and add test for fedavg
Wrede May 17, 2023
7e77ad9
update doc strings for client and combiner
Wrede May 19, 2023
857b80a
Merge branch 'feature/refactor' of https://github.com/scaleoutsystems…
May 23, 2023
8f99e44
Resolve conflict
May 23, 2023
cf1e8bf
formatting
May 25, 2023
b3c6316
add id to logging
Wrede May 29, 2023
3554c1f
Merge branch 'feature/refactor' of github.com:scaleoutsystems/fedn in…
Wrede May 29, 2023
a45a722
extra logging and doc strings
Wrede May 29, 2023
942e1e0
work in progress
Aug 16, 2023
d4734d9
Refactor of controller
Aug 18, 2023
3775021
Refactor of controller
Aug 19, 2023
d1c0474
Refactor polling in control
Aug 21, 2023
6fe98b2
Refactor polling in control
Aug 21, 2023
3fd0b29
Refactor polling in control
Aug 22, 2023
c5d5df3
Merge conflicts
Aug 30, 2023
c5ef76c
Functioning
Aug 30, 2023
e7894bc
start on new simulation example
Sep 25, 2023
a0f8218
Merge branch 'master' of https://github.com/scaleoutsystems/fedn
Sep 25, 2023
30f3da0
Merge branch 'develop' into feature/SK-521
Oct 16, 2023
7847486
update
Oct 17, 2023
a8ffd6c
Updated test
Oct 19, 2023
f70e1d3
Fix typos
Oct 19, 2023
4b7c28e
Removed accidentally committed files
Oct 19, 2023
26fe052
update api
Oct 19, 2023
566fdc4
added new async-simulation example
Oct 21, 2023
e62709a
rename example
Oct 21, 2023
4262dea
latest
Oct 25, 2023
06cd806
Updates after code review
Oct 30, 2023
375b8c1
Merge branch 'feature/SK-521' into feature/cross-device-simulation
Oct 30, 2023
4484b67
Resolve conflict
Oct 30, 2023
ad0a7b0
Merge branch 'develop' into feature/cross-device-simulation
Oct 30, 2023
ab6f95f
Resolved merge conflicts
Nov 2, 2023
9a21972
Resolved merge conflicts
Nov 2, 2023
adca222
Updated docstrings
Nov 2, 2023
0bd0a5d
Fixed docstrings
Nov 4, 2023
142df57
Fixes
Nov 4, 2023
aa6ab19
Fixed code check
Nov 4, 2023
5494b1c
use setter
Nov 5, 2023
7d51935
Merge branch 'feature/SK-521' into feature/cross-device-simulation
Nov 5, 2023
b8444c6
latest
Nov 8, 2023
6c2a1e1
removed script for combiners
Nov 8, 2023
6709019
Merge branch 'master' of https://github.com/scaleoutsystems/fedn
Nov 14, 2023
1e51843
Merge branch 'master' of https://github.com/scaleoutsystems/fedn
Nov 21, 2023
423de0d
Merge branch 'master' into feature/cross-device-simulation
Nov 21, 2023
cb5c660
Fix numpyarrayhelper
Nov 24, 2023
d2ec572
Resolved conflict
Jan 26, 2024
56bd0d9
work in progress
Jan 28, 2024
aa5bd0c
Use latest mongodb and bump version number
Jan 28, 2024
06d8b3b
Merge branch 'hotfix/broken-mongo' into feature/cross-device-simulation
Jan 28, 2024
0830f46
Fixed bug in client
Jan 28, 2024
166e460
Client sends model only once, combiner deletes staged model after tra…
Jan 28, 2024
215f748
Merge branch 'bugfix/SK-649' into feature/cross-device-simulation
Jan 29, 2024
09bc999
Cleaned up new example/test
Jan 29, 2024
2c17646
Change naming of temp storage class member in modelservice, for clarity
Jan 29, 2024
147c6ad
Make detach() public
Jan 29, 2024
3b7c8fb
Renamed some methods in client for clarity
Jan 29, 2024
170888f
refactored set_model to avoide code duplication on client
Jan 29, 2024
1bbe7af
Refactored modelservice for code reuse
Jan 29, 2024
8abce77
Fix dashboard package upload
Jan 29, 2024
ee2e442
Fix default helper in session
Jan 29, 2024
2ae055d
Delete combiner level model from minio after reduce
Jan 29, 2024
fa9f0a0
resolved conflicts
Jan 29, 2024
a25afcb
delete combiner models from minio by default
Jan 29, 2024
53ef5c8
code checks
Jan 29, 2024
341d2f7
update docs
Wrede Jan 30, 2024
c60e2ed
Merge branch 'master' into feature/SK-646
Wrede Jan 30, 2024
38 changes: 38 additions & 0 deletions docs/aggregators.rst
@@ -0,0 +1,38 @@
Aggregators
===========

Aggregators combine the model updates received by a combiner into a combiner-level global model.
During a training session, each combiner instantiates an Aggregator and uses it to process the incoming model updates from clients.

The figure above illustrates the overall flow. When a client completes a model update, the model parameters are streamed to the combiner and a model update message is sent. The combiner writes the model parameters to disk at a configurable storage location, which avoids exhausting the combiner's RAM, and passes the model update message to a callback function, ``on_model_update``. The callback validates the model update and, if validation succeeds, puts the update message on an aggregation queue. As multiple clients send updates, the aggregation queue builds up. When a certain criterion is met, another method, ``combine_models``, starts processing the queue, aggregating models according to the specifics of the scheme (FedAvg, FedAdam, etc.).
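
The callback-and-queue part of this flow can be sketched as follows. This is a minimal illustration, not FEDn's implementation: the class ``ToyCombiner``, its ``validate`` check, and the ``model_path`` field are hypothetical names chosen for the example.

```python
import queue


class ToyCombiner:
    """Hypothetical stand-in for a combiner; not the FEDn API."""

    def __init__(self):
        # The queue holds update messages only; parameters stay on disk.
        self.model_updates = queue.Queue()

    def validate(self, update):
        # Assumed check: the message must point to a parameter file.
        return bool(update.get("model_path"))

    def on_model_update(self, update):
        # Callback invoked when a client reports a finished model update.
        if self.validate(update):
            self.model_updates.put(update)
            return True
        return False


combiner = ToyCombiner()
accepted = combiner.on_model_update({"client": "c1", "model_path": "/tmp/c1.npz"})
rejected = combiner.on_model_update({"client": "c2"})
queued = combiner.model_updates.qsize()
```

Only the validated message ends up on the queue, so the aggregation step never sees an update that failed the check.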

The user can configure several parameters that guide the general behavior of the aggregation flow:

- Round timeout: the maximum time the combiner waits before processing the update queue.
- Buffer size: the maximum allowed length of the queue before it is processed.
- Retention of model update files: whether to retain or delete the files after they have been processed (the default is to delete them).
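
One way to read the first two parameters is as a trigger condition: the queue is processed once the buffer is full or the round has timed out. The sketch below encodes that reading. The function ``should_aggregate`` and its arguments are hypothetical, and the exact semantics in FEDn may differ.

```python
import time


def should_aggregate(queue_length, round_started_at, buffer_size, round_timeout, now=None):
    """Return True once the buffer is full or the round has timed out.

    Times are in seconds on the time.monotonic() clock.
    """
    now = time.monotonic() if now is None else now
    buffer_full = queue_length >= buffer_size
    timed_out = (now - round_started_at) >= round_timeout
    return buffer_full or timed_out
```

Passing ``now`` explicitly keeps the function deterministic in tests; in a running combiner it would be left at its default.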



A developer can extend FEDn with their own Aggregator(s) by implementing the interface specified in
:py:class:`fedn.network.combiner.aggregators.aggregatorbase.AggregatorBase`. The developer implements the following two methods:

- ``on_model_update`` (optional)
- ``combine_models``

on_model_update
----------------

The ``on_model_update`` callback has access to the complete model update, including the metadata passed on by the clients (as specified in the training entrypoint; see the compute package). The base class implements a default callback that checks that all metadata required by the aggregation algorithms FedAvg and FedAdam is present. The callback can also be used to implement custom preprocessing and additional checks, including strategies to filter out updates that are suspected to be corrupted or malicious.
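
As an illustration of such a check, the sketch below accepts an update only if its metadata carries a positive example count. The function ``validate_update`` and the key ``num_examples`` are assumptions made for this example; consult the base class for the metadata it actually requires.

```python
def validate_update(metadata):
    """Accept an update only if it reports a positive integer example count."""
    n = metadata.get("num_examples")  # assumed metadata key
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(n, int) and not isinstance(n, bool) and n > 0
```

A custom callback could extend this with further checks, for example rejecting updates whose reported example count is implausibly large.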

combine_models
--------------

This method is responsible for processing the model update queue and producing an aggregated model. It is the main extension point, where the numerical details of the aggregation scheme are implemented. The best way to understand how to implement this method is to study the already implemented algorithms:

- :py:class:`fedn.network.combiner.aggregators.fedavg.FedAvg`
- :py:class:`fedn.network.combiner.aggregators.fedopt.FedOpt`
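
To give a feel for the numerical core of such a method, here is a minimal sketch of an incremental weighted average, where each update is weighted by the number of examples it was trained on. It is not the FedAvg code shipped with FEDn: the function name is hypothetical and, for simplicity, each model is assumed to be a flat list of floats.

```python
def incremental_weighted_average(updates):
    """Fold (parameters, num_examples) pairs into a running weighted average.

    Each parameters value is a flat list of floats in this sketch.
    """
    model = None
    total_examples = 0
    for parameters, num_examples in updates:
        total_examples += num_examples
        if model is None:
            # The first update initializes the running average.
            model = list(parameters)
        else:
            # Move the average towards the new update by its share of the data.
            weight = num_examples / total_examples
            model = [m + weight * (p - m) for m, p in zip(model, parameters)]
    return model


averaged = incremental_weighted_average([([1.0, 1.0], 1), ([4.0, 4.0], 2)])
```

Processing updates one at a time means only the running average and the current update need to be in memory, which fits the queue-based flow described above.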

To add an aggregator plugin ``myaggregator``, the developer implements the interface and places a file called ``myaggregator.py`` in the folder ``fedn.network.combiner.aggregators``.


28 changes: 28 additions & 0 deletions docs/helpers.rst
@@ -0,0 +1,28 @@
Model Serialization/Deserialization - Helpers
=============================================

In federated learning, model updates need to be serialized and deserialized in order to be
transferred between clients and the server/combiner. Models also need to be written to and
loaded from disk, for example to transiently store updates during training rounds.
Furthermore, aggregation algorithms need to perform a range of numerical operations on the
model updates (addition, multiplication, etc.). Since different ML frameworks (TensorFlow,
PyTorch, etc.) represent model parameters differently, FEDn needs to be told how to handle
models of a given type. In FEDn, this compatibility layer is the
task of Helpers.

A helper is defined by the interface in :py:class:`fedn.utils.helpers.HelperBase`.
By implementing a helper plugin, a developer can extend FEDn with support for new ML
frameworks and numerical operations.

FEDn ships with a default helper implementation, ``numpyhelper``.
This helper assumes that the model update is made up of parameters
represented by a list of :py:class:`numpy.ndarray` arrays. Since most ML frameworks have
good NumPy support, this helper should be sufficient in most cases.
Both TF/Keras and PyTorch models can be readily serialized in this way.
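
The idea behind such a helper can be sketched with plain NumPy calls. The class ``ToyNumpyHelper`` and its method names are hypothetical and chosen for illustration; they are not the shipped ``numpyhelper``, whose interface is defined by the helper base class.

```python
import os
import tempfile

import numpy as np


class ToyNumpyHelper:
    """Hypothetical helper for models stored as a list of numpy arrays."""

    def save(self, weights, path):
        # np.savez stores positional arrays under the keys arr_0, arr_1, ...
        np.savez(path, *weights)
        return path

    def load(self, path):
        # Read the arrays back in their original order.
        with np.load(path) as data:
            return [data[f"arr_{i}"] for i in range(len(data.files))]

    def increment_average(self, model, update, weight):
        # Move the running average towards the new update, layer by layer.
        return [m + weight * (u - m) for m, u in zip(model, update)]


helper = ToyNumpyHelper()
weights = [np.ones((2, 2)), np.zeros(3)]
with tempfile.TemporaryDirectory() as tmp:
    path = helper.save(weights, os.path.join(tmp, "model.npz"))
    restored = helper.load(path)
```

Keeping serialization and arithmetic behind one small interface is what lets the same aggregation code work across ML frameworks.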

To add a helper plugin ``myhelper``, you implement the interface and place a
file called ``myhelper.py`` in the folder ``fedn.utils.helpers.plugins``.

See the Keras and PyTorch quickstart examples and :py:mod:`fedn.utils.helpers.plugins.numpyhelper`
for further details.

2 changes: 2 additions & 0 deletions docs/index.rst
@@ -7,6 +7,8 @@
architecture
deployment
interfaces
aggregators
helpers
tutorial
faq
modules