From 121cc09bd8247ab995ab1eb5601c00afdcfd8c88 Mon Sep 17 00:00:00 2001
From: root
Date: Fri, 14 Jun 2024 15:19:10 +0000
Subject: [PATCH 01/15] preparing a short readme to push create a PR for the MonAI tutorial.
---
 examples/monai-2D-mednist/README_MonAI_Tutorial.rst | 13 +++++++++++++
 examples/monai-2D-mednist/client/python_env.yaml | 2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 examples/monai-2D-mednist/README_MonAI_Tutorial.rst

diff --git a/examples/monai-2D-mednist/README_MonAI_Tutorial.rst b/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
new file mode 100644
index 000000000..a316e7940
--- /dev/null
+++ b/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
@@ -0,0 +1,13 @@
+
+Implementing 2D Classification Model with MedNIST Using FEDn
+------------------------------------------------------
+
+This example provides a step-by-step guide to deploying and running a 2D classification model using the MedNIST dataset in a federated environment with the `FEDn framework ` developed by `Scaleout Systems `. This example builds on the centralized example from the `MonAI project ` and adapts it for federated learning settings, utilizing the same code for ease of transition.
+
+The FEDn framework supports researchers with its robust `open-source SDK ` and a `public SaaS platform `, enabling scalable and efficient federated learning use cases.
+
+Getting Started
+---------------
+
+For a step-by-step example guide, click `here `.
+
diff --git a/examples/monai-2D-mednist/client/python_env.yaml b/examples/monai-2D-mednist/client/python_env.yaml
index 7580ffb76..f7660f304 100644
--- a/examples/monai-2D-mednist/client/python_env.yaml
+++ b/examples/monai-2D-mednist/client/python_env.yaml
@@ -2,7 +2,7 @@ name: monai-2d-mdnist
 build_dependencies:
   - pip
   - setuptools
-  - wheel==0.37.1
+  - wheel
 dependencies:
   - torch==2.2.1
   - torchvision==0.17.1

From b16692ad35b2b23ed6b6d8c78a46170525e6e7af Mon Sep 17 00:00:00 2001
From: root
Date: Fri, 14 Jun 2024 15:23:46 +0000
Subject: [PATCH 02/15] preparing a short readme to push create a PR for the MonAI tutorial.
---
 examples/monai-2D-mednist/README_MonAI_Tutorial.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/monai-2D-mednist/README_MonAI_Tutorial.rst b/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
index a316e7940..bdde57c44 100644
--- a/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
+++ b/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
@@ -2,12 +2,12 @@
 Implementing 2D Classification Model with MedNIST Using FEDn
 ------------------------------------------------------

-This example provides a step-by-step guide to deploying and running a 2D classification model using the MedNIST dataset in a federated environment with the `FEDn framework ` developed by `Scaleout Systems `. This example builds on the centralized example from the `MonAI project ` and adapts it for federated learning settings, utilizing the same code for ease of transition.
+This example provides a step-by-step guide to deploying and running a 2D classification model using the MedNIST dataset in a federated environment with the `FEDn framework `__ developed by `Scaleout Systems `__ . This example builds on the centralized example from the `MonAI project `__ and adapts it for federated learning settings, utilizing the same code for ease of transition.
-The FEDn framework supports researchers with its robust `open-source SDK ` and a `public SaaS platform `, enabling scalable and efficient federated learning use cases.
+The FEDn framework supports researchers with its robust `open-source SDK `__ and a `public SaaS platform `__ , enabling scalable and efficient federated learning use cases.

 Getting Started
 ---------------

-For a step-by-step example guide, click `here `.
+For a step-by-step example guide, click `here `__ .

From aa8d4ced3050fb3e25626884f2790f64c069be5f Mon Sep 17 00:00:00 2001
From: root
Date: Fri, 14 Jun 2024 15:26:45 +0000
Subject: [PATCH 03/15] preparing a short readme to push create a PR for the MonAI tutorial.
---
 examples/monai-2D-mednist/README_MonAI_Tutorial.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/monai-2D-mednist/README_MonAI_Tutorial.rst b/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
index bdde57c44..81f520a80 100644
--- a/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
+++ b/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
@@ -2,7 +2,7 @@
 Implementing 2D Classification Model with MedNIST Using FEDn
 ------------------------------------------------------

-This example provides a step-by-step guide to deploying and running a 2D classification model using the MedNIST dataset in a federated environment with the `FEDn framework `__ developed by `Scaleout Systems `__ . This example builds on the centralized example from the `MonAI project `__ and adapts it for federated learning settings, utilizing the same code for ease of transition.
+This example provides a step-by-step guide to deploying and running a 2D classification model using the MedNIST dataset in a federated environment with the `FEDn framework `__ developed by `Scaleout Systems `__ . This example builds on the centralized `example `__ from the MonAI project and adapts it for federated learning settings, utilizing the same code for ease of transition.
 The FEDn framework supports researchers with its robust `open-source SDK `__ and a `public SaaS platform `__ , enabling scalable and efficient federated learning use cases.

From 2ecf1e5bc2af55bc9c56e54670a5d79929a6f893 Mon Sep 17 00:00:00 2001
From: root
Date: Fri, 14 Jun 2024 15:32:52 +0000
Subject: [PATCH 04/15] preparing a short readme to push create a PR for the MonAI tutorial.
---
 examples/monai-2D-mednist/README_MonAI_Tutorial.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/monai-2D-mednist/README_MonAI_Tutorial.rst b/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
index 81f520a80..83d253f03 100644
--- a/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
+++ b/examples/monai-2D-mednist/README_MonAI_Tutorial.rst
@@ -1,6 +1,6 @@

-Implementing 2D Classification Model with MedNIST Using FEDn
-------------------------------------------------------
+Implementing 2D Classification Model with MedNIST Dataset Using FEDn
+--------------------------------------------------------------------

 This example provides a step-by-step guide to deploying and running a 2D classification model using the MedNIST dataset in a federated environment with the `FEDn framework `__ developed by `Scaleout Systems `__ . This example builds on the centralized `example `__ from the MonAI project and adapts it for federated learning settings, utilizing the same code for ease of transition.

From 718a37d9c7b9f00d4ad508fd04053830f5f7437e Mon Sep 17 00:00:00 2001
From: root
Date: Tue, 18 Jun 2024 13:56:51 +0000
Subject: [PATCH 05/15] need numpy version lower than 2.0.
---
 examples/monai-2D-mednist/client/python_env.yaml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/monai-2D-mednist/client/python_env.yaml b/examples/monai-2D-mednist/client/python_env.yaml
index f7660f304..ec39b5084 100644
--- a/examples/monai-2D-mednist/client/python_env.yaml
+++ b/examples/monai-2D-mednist/client/python_env.yaml
@@ -6,7 +6,7 @@ build_dependencies:
 dependencies:
   - torch==2.2.1
   - torchvision==0.17.1
-  - fedn==0.9.0
+  - fedn
   - monai-weekly[pillow, tqdm]
-  - scikit-learn
-  - tensorboard
+  - numpy==1.26.4
+  - scikit-learn

From 18101ecdf80334e38c9a9a6ca9afc42c41840c4f Mon Sep 17 00:00:00 2001
From: mattiasakesson
Date: Tue, 18 Jun 2024 18:56:27 +0200
Subject: [PATCH 06/15] not complete yet
---
 examples/monai-2D-mednist/README.rst | 16 ++++++
 examples/monai-2D-mednist/client/data.py | 7 +--
 examples/monai-2D-mednist/client/train.py | 12 +++--
 examples/monai-2D-mednist/prepare_data.py | 66 +++++++++++++++++++++++
 4 files changed, 93 insertions(+), 8 deletions(-)
 create mode 100644 examples/monai-2D-mednist/prepare_data.py

diff --git a/examples/monai-2D-mednist/README.rst b/examples/monai-2D-mednist/README.rst
index c2c536f27..75f3852d6 100644
--- a/examples/monai-2D-mednist/README.rst
+++ b/examples/monai-2D-mednist/README.rst
@@ -24,6 +24,22 @@ If using pseudo-distributed mode with docker-compose:
 - `Docker `__
 - `Docker Compose `__

+Download and Prepare the data
+-------------------------------------------
+
+Install monai
+
+.. code-block::
+
+   pip install monai
+
+Download and divide the data into parts. Set the number of
+data parts as an arguments python prepare_data.py NR-OF-DATAPARTS. In the
+below command we divide the dataset into 10 parts.
+.. code-block::
+
+   python prepare_data.py 10
+
 Creating the compute package and seed model
 -------------------------------------------

diff --git a/examples/monai-2D-mednist/client/data.py b/examples/monai-2D-mednist/client/data.py
index 0a8b5c306..6f3d06fca 100644
--- a/examples/monai-2D-mednist/client/data.py
+++ b/examples/monai-2D-mednist/client/data.py
@@ -1,6 +1,6 @@
 import os
 import random
-
+import sys
 import numpy as np
 import PIL
 import torch
@@ -33,7 +33,7 @@ def split_data(data_path="data/MedNIST", splits=100, validation_split=0.9):
         yaml.dump(clients, file, default_flow_style=False)


-def get_data(out_dir="data"):
+def get_data(out_dir="data", data_splits=10):
     """Get data from the external repository.

     :param out_dir: Path to data directory. If doesn't
@@ -58,7 +58,7 @@ def get_data(out_dir="data"):
     else:
         print("files already exist.")

-    split_data()
+    split_data(splits=data_splits)


 def get_classes(data_path):
@@ -145,6 +145,7 @@ def __len__(self):
         return len(self.image_files)

     def __getitem__(self, index):
+        print("__getitem__ path: ", os.path.join(self.data_path, self.image_files[index]))
         return (self.transforms(os.path.join(self.data_path, self.image_files[index])), DATA_CLASSES[os.path.dirname(self.image_files[index])])


diff --git a/examples/monai-2D-mednist/client/train.py b/examples/monai-2D-mednist/client/train.py
index e3cb235c0..63e881893 100644
--- a/examples/monai-2D-mednist/client/train.py
+++ b/examples/monai-2D-mednist/client/train.py
@@ -65,19 +65,21 @@ def train(in_model_path, out_model_path, data_path=None, client_settings_path=No
     batch_size = client_settings["batch_size"]
     max_epochs = client_settings["local_epochs"]
     num_workers = client_settings["num_workers"]
-    split_index = client_settings["split_index"]
+    split_index = os.environ.get("FEDN_DATA_SPLIT_INDEX")#client_settings["split_index"]
+    print("split index: ", split_index)
     lr = client_settings["lr"]

     if data_path is None:
         data_path = os.environ.get("FEDN_DATA_PATH")
-
+    print("os.path.join(os.path.dirname(data_path), data_splits.yaml: ", os.path.join(os.path.dirname(data_path), "data_splits.yaml"))
     with open(os.path.join(os.path.dirname(data_path), "data_splits.yaml"), "r") as file:
         clients = yaml.safe_load(file)

     image_list = clients["client " + str(split_index)]["train"]
-
-    train_ds = MedNISTDataset(data_path="data/MedNIST", transforms=train_transforms, image_files=image_list)
-
+    print("image_list len: ", len(image_list))
+    train_ds = MedNISTDataset(data_path="app/data/MedNIST", transforms=train_transforms, image_files=image_list)
+    print("train_ds len: ", len(train_ds))
+    print("batch_size: ", batch_size, ", num_workers: ", num_workers)
     train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=num_workers)

     # Load parmeters and initialize model
diff --git a/examples/monai-2D-mednist/prepare_data.py b/examples/monai-2D-mednist/prepare_data.py
new file mode 100644
index 000000000..80c083549
--- /dev/null
+++ b/examples/monai-2D-mednist/prepare_data.py
@@ -0,0 +1,66 @@
+import os
+import sys
+import numpy as np
+
+import yaml
+from monai.apps import download_and_extract
+
+
+def split_data(data_path="data/MedNIST", splits=100, validation_split=0.9):
+    # create clients
+    clients = {"client " + str(i): {"train": [], "validation": []} for i in range(splits)}
+    print("splits: ", splits)
+    for class_ in os.listdir(data_path):
+        if os.path.isdir(os.path.join(data_path, class_)):
+            patients_in_class = [os.path.join(class_, patient) for patient in os.listdir(os.path.join(data_path, class_))]
+            np.random.shuffle(patients_in_class)
+            chops = np.int32(np.linspace(0, len(patients_in_class), splits + 1))
+            for split in range(splits):
+                p = patients_in_class[chops[split] : chops[split + 1]]
+
+                valsplit = np.int32(len(p) * validation_split)
+
+                clients["client " + str(split)]["train"] += p[:valsplit]
+                clients["client " + str(split)]["validation"] += p[valsplit:]
+
+                if split == 0:
+                    print("len p: ", len(p))
+                    print("valsplit: ", valsplit)
+                    print("p[:valsplit]: ", p[:valsplit])
+                    print("p[valsplit:]: ", p[valsplit:])
+
+    with open(os.path.join(os.path.dirname(data_path), "data_splits.yaml"), "w") as file:
+        yaml.dump(clients, file, default_flow_style=False)
+
+
+def get_data(out_dir="data", data_splits=10):
+    """Get data from the external repository.
+
+    :param out_dir: Path to data directory. If doesn't
+    :type data_dir: str
+    """
+    # Make dir if necessary
+    if not os.path.exists(out_dir):
+        os.mkdir(out_dir)
+
+    resource = "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz"
+    md5 = "0bc7306e7427e00ad1c5526a6677552d"
+
+    compressed_file = os.path.join(out_dir, "MedNIST.tar.gz")
+
+    data_dir = os.path.abspath(out_dir)
+    print("data_dir:", data_dir)
+    if os.path.exists(data_dir):
+        print("path exist.")
+    if not os.path.exists(compressed_file):
+        print("compressed file does not exist, downloading and extracting data.")
+        download_and_extract(resource, compressed_file, data_dir, md5)
+    else:
+        print("files already exist.")
+
+    split_data(splits=data_splits)
+
+
+if __name__ == "__main__":
+    # Prepare data if not already done
+    get_data(data_splits=int(sys.argv[1]))

From 367a7d87508117ff51fabf83c6c94bbb3a0de4cc Mon Sep 17 00:00:00 2001
From: mattiasakesson
Date: Tue, 18 Jun 2024 19:18:37 +0200
Subject: [PATCH 07/15] compose override
---
 .../monai-2D-mednist/docker-compose.override.yaml | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/examples/monai-2D-mednist/docker-compose.override.yaml b/examples/monai-2D-mednist/docker-compose.override.yaml
index afeaf1437..88fda24d8 100644
--- a/examples/monai-2D-mednist/docker-compose.override.yaml
+++ b/examples/monai-2D-mednist/docker-compose.override.yaml
@@ -15,13 +15,15 @@ services:
       service: client
     environment:
       <<: *defaults
-      FEDN_DATA_PATH: /app/package/client/data/MedNIST
-      FEDN_CLIENT_SETTINGS_PATH: /app/client_settings.yaml
+      FEDN_DATA_PATH: /app/data/MedNIST
+      FEDN_CLIENT_SETTINGS_PATH: /app/client_settings.yaml
+      FEDN_DATA_SPLIT_INDEX: 0
     deploy:
       replicas: 1
     volumes:
       - ${HOST_REPO_DIR:-.}/fedn:/app/fedn
-      - ${HOST_REPO_DIR:-.}/examples/monai-2D-mednist/client_settings.yaml:/app/client_settings.yaml
+      - ${HOST_REPO_DIR:-.}/examples/monai-2D-mednist/client_settings.yaml:/app/client_settings.yaml
+      - ${HOST_REPO_DIR:-.}/examples/monai-2D-mednist/data:/app/data

   client2:
     extends:
@@ -29,8 +31,12 @@ services:
       service: client
     environment:
       <<: *defaults
-      FEDN_DATA_PATH: /app/package/client/data/MedNIST
+      FEDN_DATA_PATH: /app/data/MedNIST
+      FEDN_CLIENT_SETTINGS_PATH: /app/client_settings.yaml
+      FEDN_DATA_SPLIT_INDEX: 1
     deploy:
       replicas: 1
     volumes:
       - ${HOST_REPO_DIR:-.}/fedn:/app/fedn
+      - ${HOST_REPO_DIR:-.}/examples/monai-2D-mednist/client_settings.yaml:/app/client_settings.yaml
+      - ${HOST_REPO_DIR:-.}/examples/monai-2D-mednist/data:/app/data

From 7607b676b8a048986b03733e7cfe0be0c3893ee4 Mon Sep 17 00:00:00 2001
From: mattiasakesson
Date: Wed, 19 Jun 2024 11:13:13 +0200
Subject: [PATCH 08/15] working
---
 examples/monai-2D-mednist/README.rst | 33 +++++++++++---------
 examples/monai-2D-mednist/client/data.py | 1 -
 examples/monai-2D-mednist/client/train.py | 14 +++------
 examples/monai-2D-mednist/client/validate.py | 2 +-
 4 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/examples/monai-2D-mednist/README.rst b/examples/monai-2D-mednist/README.rst
index 75f3852d6..96e498ef6 100644
--- a/examples/monai-2D-mednist/README.rst
+++ b/examples/monai-2D-mednist/README.rst
@@ -24,21 +24,6 @@ If using pseudo-distributed mode with docker-compose:
 - `Docker `__
 - `Docker Compose `__

-Download and Prepare the data
--------------------------------------------
-
-Install monai
-
-.. code-block::
-
-   pip install monai
-
-Download and divide the data into parts. Set the number of
-data parts as an arguments python prepare_data.py NR-OF-DATAPARTS. In the
-below command we divide the dataset into 10 parts.
-.. code-block::
-
-   python prepare_data.py 10

 Creating the compute package and seed model
 -------------------------------------------
@@ -72,6 +57,24 @@ Next, generate a seed model (the first model in a global model trail):

 This will create a seed model called 'seed.npz' in the root of the project. This step will take a few minutes, depending on hardware and internet connection (builds a virtualenv).

+Download and Prepare the data
+-------------------------------------------
+
+Install monai
+
+.. code-block::
+
+   pip install monai
+
+Download and divide the data into parts. Set the number of
+data parts as an arguments python prepare_data.py NR-OF-DATAPARTS. In the
+below command we divide the dataset into 10 parts.
+.. code-block::
+
+   python prepare_data.py 10
+
+
+
 Using FEDn Studio
 -----------------

diff --git a/examples/monai-2D-mednist/client/data.py b/examples/monai-2D-mednist/client/data.py
index 6f3d06fca..fa6200589 100644
--- a/examples/monai-2D-mednist/client/data.py
+++ b/examples/monai-2D-mednist/client/data.py
@@ -145,7 +145,6 @@ def __len__(self):
         return len(self.image_files)

     def __getitem__(self, index):
-        print("__getitem__ path: ", os.path.join(self.data_path, self.image_files[index]))
         return (self.transforms(os.path.join(self.data_path, self.image_files[index])), DATA_CLASSES[os.path.dirname(self.image_files[index])])


diff --git a/examples/monai-2D-mednist/client/train.py b/examples/monai-2D-mednist/client/train.py
index 63e881893..8fc4b05b7 100644
--- a/examples/monai-2D-mednist/client/train.py
+++ b/examples/monai-2D-mednist/client/train.py
@@ -22,7 +22,6 @@
 dir_path = os.path.dirname(os.path.realpath(__file__))
 sys.path.append(os.path.abspath(dir_path))

-
 train_transforms = Compose(
     [
         LoadImage(image_only=True),
@@ -54,32 +53,27 @@ def train(in_model_path, out_model_path, data_path=None, client_settings_path=No
     if client_settings_path is None:
         client_settings_path = os.environ.get("FEDN_CLIENT_SETTINGS_PATH", dir_path + "/client_settings.yaml")
-    print("client_settings_path: ", client_settings_path)
     with open(client_settings_path, "r") as fh:  # Used by CJG for local training
         try:
             client_settings = dict(yaml.safe_load(fh))
         except yaml.YAMLError:
             raise

-    print("client settings: ", client_settings)
     batch_size = client_settings["batch_size"]
     max_epochs = client_settings["local_epochs"]
     num_workers = client_settings["num_workers"]
-    split_index = os.environ.get("FEDN_DATA_SPLIT_INDEX")#client_settings["split_index"]
-    print("split index: ", split_index)
+    split_index = os.environ.get("FEDN_DATA_SPLIT_INDEX") #client_settings["split_index"]
     lr = client_settings["lr"]

     if data_path is None:
         data_path = os.environ.get("FEDN_DATA_PATH")
-    print("os.path.join(os.path.dirname(data_path), data_splits.yaml: ", os.path.join(os.path.dirname(data_path), "data_splits.yaml"))
+
     with open(os.path.join(os.path.dirname(data_path), "data_splits.yaml"), "r") as file:
         clients = yaml.safe_load(file)

     image_list = clients["client " + str(split_index)]["train"]
-    print("image_list len: ", len(image_list))
-    train_ds = MedNISTDataset(data_path="app/data/MedNIST", transforms=train_transforms, image_files=image_list)
-    print("train_ds len: ", len(train_ds))
-    print("batch_size: ", batch_size, ", num_workers: ", num_workers)
+
+    train_ds = MedNISTDataset(data_path=data_path, transforms=train_transforms, image_files=image_list)
     train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=num_workers)

     # Load parmeters and initialize model
diff --git a/examples/monai-2D-mednist/client/validate.py b/examples/monai-2D-mednist/client/validate.py
index 74292c34f..a3053c119 100644
--- a/examples/monai-2D-mednist/client/validate.py
+++ b/examples/monai-2D-mednist/client/validate.py
@@ -45,7 +45,7 @@ def validate(in_model_path, out_json_path, data_path=None, client_settings_path=

     num_workers = client_settings["num_workers"]
     batch_size = client_settings["batch_size"]
-    split_index = client_settings["split_index"]
+    split_index = os.environ.get("FEDN_DATA_SPLIT_INDEX") # client_settings["split_index"]

     if data_path is None:
         data_path = os.environ.get("FEDN_DATA_PATH")

From 73659c06d586ff2b821b6fa6d2c16b48db831e6a Mon Sep 17 00:00:00 2001
From: mattiasakesson
Date: Wed, 19 Jun 2024 12:16:11 +0200
Subject: [PATCH 09/15] add requirements
---
 examples/monai-2D-mednist/README.rst | 85 +++++++++-------------
 examples/monai-2D-mednist/requirements.txt | 3 +
 2 files changed, 36 insertions(+), 52 deletions(-)
 create mode 100644 examples/monai-2D-mednist/requirements.txt

diff --git a/examples/monai-2D-mednist/README.rst b/examples/monai-2D-mednist/README.rst
index 96e498ef6..07f88e370 100644
--- a/examples/monai-2D-mednist/README.rst
+++ b/examples/monai-2D-mednist/README.rst
@@ -1,15 +1,13 @@
-FEDn Project: MonAI 2D Classification with the MedNIST Dataset (PyTorch)
-------------------------------------------------------------------------
+FEDn Project: MNIST (PyTorch)
+-----------------------------

-This is an example FEDn Project based on the MonAI 2D Classification with the MedNIST Dataset.
+This is an example FEDn Project based on the classic hand-written text recognition dataset MNIST.
 The example is intented as a minimalistic quickstart and automates the handling of training data
-by letting the client download and create its partition of the dataset as it starts up.
+by letting the client download and create its partition of the dataset as it starts up.

-Links:
-
-- MonAI: https://monai.io/
-- Base example notebook: https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/mednist_tutorial.ipynb
-- MedNIST dataset: https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz
+   **Note: These instructions are geared towards users seeking to learn how to work
+   with FEDn in local development mode using Docker/docker-compose. We recommend all new users
+   to start by following the Quickstart Tutorial: https://fedn.readthedocs.io/en/stable/quickstart.html**

 Prerequisites
 -------------
@@ -17,18 +15,17 @@ Prerequisites
 Using FEDn Studio:

 - `Python 3.8, 3.9, 3.10 or 3.11 `__
-- `A FEDn Studio account `__
+- `A FEDn Studio account `__

 If using pseudo-distributed mode with docker-compose:

 - `Docker `__
 - `Docker Compose `__

-
 Creating the compute package and seed model
 -------------------------------------------

-Install fedn:
+Install fedn:

 .. code-block::

@@ -39,7 +36,7 @@ Clone this repository, then locate into this directory:
 .. code-block::

    git clone https://github.com/scaleoutsystems/fedn.git
-   cd fedn/examples/monai-2D-mednist
+   cd fedn/examples/mnist-pytorch

 Create the compute package:

@@ -55,63 +52,47 @@ Next, generate a seed model (the first model in a global model trail):

    fedn run build --path client

-This will create a seed model called 'seed.npz' in the root of the project. This step will take a few minutes, depending on hardware and internet connection (builds a virtualenv).
-
-Download and Prepare the data
--------------------------------------------
-
-Install monai
-
-.. code-block::
-
-   pip install monai
-
-Download and divide the data into parts. Set the number of
-data parts as an arguments python prepare_data.py NR-OF-DATAPARTS. In the
-below command we divide the dataset into 10 parts.
-.. code-block::
-
-   python prepare_data.py 10
-
-
+This will create a seed model called 'seed.npz' in the root of the project. This step will take a few minutes, depending on hardware and internet connection (builds a virtualenv).

 Using FEDn Studio
 -----------------

 Follow the guide here to set up your FEDn Studio project and learn how to connect clients (using token authentication): `Studio guide `__.
-On the step "Upload Files", upload 'package.tgz' and 'seed.npz' created above.
+On the step "Upload Files", upload 'package.tgz' and 'seed.npz' created above.
+ -Connecting clients: -=================== +Modifing the data split: +======================== -**NOTE: In case a different data path needs to be set, use the env variable FEDN_DATA_PATH.** +The default traning and test data for this example is downloaded and split direcly by the client when it starts up (see 'startup' entrypoint). +The number of splits and which split used by a client can be controlled via the environment variables ``FEDN_NUM_DATA_SPLITS`` and ``FEDN_DATA_PATH``. +For example, to split the data in 10 parts and start a client using the 8th partiton: .. code-block:: export FEDN_PACKAGE_EXTRACT_DIR=package - export FEDN_DATA_PATH=./data/ - export FEDN_CLIENT_SETTINGS_PATH=/client_settings.yaml + export FEDN_NUM_DATA_SPLITS=10 + export FEDN_DATA_PATH=./data/clients/8/mnist.pt fedn client start -in client.yaml --secure=True --force-ssl +The default is to split the data into 2 partitions and use the first partition. + + Connecting clients using Docker: ================================ -For convenience, there is a Docker image hosted on ghrc.io with fedn preinstalled. To start a client using Docker: +For convenience, there is a Docker image hosted on ghrc.io with fedn preinstalled. To start a client using Docker: .. 
code-block:: docker run \ -v $PWD/client.yaml:/app/client.yaml \ - -v $PWD/client_settings.yaml:/app/client_settings.yaml \ -e FEDN_PACKAGE_EXTRACT_DIR=package \ - -e FEDN_DATA_PATH=./data/ \ - -e FEDN_CLIENT_SETTINGS_PATH=/app/client_settings.yaml \ + -e FEDN_NUM_DATA_SPLITS=2 \ + -e FEDN_DATA_PATH=/app/package/data/clients/1/mnist.pt \ ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml --force-ssl --secure=True -**NOTE: The following instructions are only for SDK-based client communication and for local development environments using Docker.** - - Local development mode using Docker/docker compose -------------------------------------------------- @@ -126,8 +107,8 @@ Start a pseudo-distributed FEDn network using docker-compose: -f docker-compose.override.yaml \ up -This starts up local services for MongoDB, Minio, the API Server, one Combiner and two clients. -You can verify the deployment using these urls: +This starts up local services for MongoDB, Minio, the API Server, one Combiner and two clients. +You can verify the deployment using these urls: - API Server: http://localhost:8092/get_controller_status - Minio: http://localhost:9000 @@ -142,18 +123,18 @@ Upload the package and seed model to FEDn controller using the APIClient. In Pyt client.set_active_package("package.tgz", helper="numpyhelper") client.set_active_model("seed.npz") -You can now start a training session with 5 rounds (default): +You can now start a training session with 5 rounds (default): .. code-block:: client.start_session() -Automate experimentation with several clients +Automate experimentation with several clients ============================================= -If you want to scale the number of clients, you can do so by modifying ``docker-compose.override.yaml``. 
For example, -in order to run with 3 clients, change the environment variable ``FEDN_NUM_DATA_SPLITS`` to 3, and add one more client -by copying ``client1`` and setting ``FEDN_DATA_PATH`` to ``/app/package/data3/`` +If you want to scale the number of clients, you can do so by modifying ``docker-compose.override.yaml``. For example, +in order to run with 3 clients, change the environment variable ``FEDN_NUM_DATA_SPLITS`` to 3, and add one more client +by copying ``client1`` and setting ``FEDN_DATA_PATH`` to ``/app/package/data/clients/3/mnist.pt`` Access message logs and validation data from MongoDB diff --git a/examples/monai-2D-mednist/requirements.txt b/examples/monai-2D-mednist/requirements.txt new file mode 100644 index 000000000..0e2857824 --- /dev/null +++ b/examples/monai-2D-mednist/requirements.txt @@ -0,0 +1,3 @@ +monai +PyYAML +numpy==1.26.4 \ No newline at end of file From d9872d2cb5a4c43659ffd821d861b4fa2e015c7b Mon Sep 17 00:00:00 2001 From: mattiasakesson Date: Wed, 19 Jun 2024 12:16:52 +0200 Subject: [PATCH 10/15] fix data --- examples/monai-2D-mednist/client/data.py | 51 ------------------------ 1 file changed, 51 deletions(-) diff --git a/examples/monai-2D-mednist/client/data.py b/examples/monai-2D-mednist/client/data.py index fa6200589..de7acc8a2 100644 --- a/examples/monai-2D-mednist/client/data.py +++ b/examples/monai-2D-mednist/client/data.py @@ -1,11 +1,8 @@ import os import random -import sys import numpy as np import PIL import torch -import yaml -from monai.apps import download_and_extract dir_path = os.path.dirname(os.path.realpath(__file__)) abs_path = os.path.abspath(dir_path) @@ -13,54 +10,6 @@ DATA_CLASSES = {"AbdomenCT": 0, "BreastMRI": 1, "CXR": 2, "ChestCT": 3, "Hand": 4, "HeadCT": 5} -def split_data(data_path="data/MedNIST", splits=100, validation_split=0.9): - # create clients - clients = {"client " + str(i): {"train": [], "validation": []} for i in range(splits)} - - for class_ in os.listdir(data_path): - if 
os.path.isdir(os.path.join(data_path, class_)): - patients_in_class = [os.path.join(class_, patient) for patient in os.listdir(os.path.join(data_path, class_))] - np.random.shuffle(patients_in_class) - chops = np.int32(np.linspace(0, len(patients_in_class), splits + 1)) - for split in range(splits): - p = patients_in_class[chops[split] : chops[split + 1]] - valsplit = np.int32(len(p) * validation_split) - - clients["client " + str(split)]["train"] += p[:valsplit] - clients["client " + str(split)]["validation"] += p[valsplit:] - - with open(os.path.join(os.path.dirname(data_path), "data_splits.yaml"), "w") as file: - yaml.dump(clients, file, default_flow_style=False) - - -def get_data(out_dir="data", data_splits=10): - """Get data from the external repository. - - :param out_dir: Path to data directory. If doesn't - :type data_dir: str - """ - # Make dir if necessary - if not os.path.exists(out_dir): - os.mkdir(out_dir) - - resource = "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz" - md5 = "0bc7306e7427e00ad1c5526a6677552d" - - compressed_file = os.path.join(out_dir, "MedNIST.tar.gz") - - data_dir = os.path.abspath(out_dir) - print("data_dir:", data_dir) - if os.path.exists(data_dir): - print("path exist.") - if not os.path.exists(compressed_file): - print("compressed file does not exist, downloading and extracting data.") - download_and_extract(resource, compressed_file, data_dir, md5) - else: - print("files already exist.") - - split_data(splits=data_splits) - - def get_classes(data_path): """Get a list of classes from the dataset From 23c5f2a35124a2bc9b634ebfcbe48009e8560eb2 Mon Sep 17 00:00:00 2001 From: mattiasakesson Date: Wed, 19 Jun 2024 12:17:49 +0200 Subject: [PATCH 11/15] fix data.py --- examples/monai-2D-mednist/client/data.py | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/examples/monai-2D-mednist/client/data.py b/examples/monai-2D-mednist/client/data.py index de7acc8a2..c8a8a4e0b 
100644 --- a/examples/monai-2D-mednist/client/data.py +++ b/examples/monai-2D-mednist/client/data.py @@ -97,6 +97,5 @@ def __getitem__(self, index): return (self.transforms(os.path.join(self.data_path, self.image_files[index])), DATA_CLASSES[os.path.dirname(self.image_files[index])]) -if __name__ == "__main__": - # Prepare data if not already done - get_data() + + From 5f324b42edb0cd5890c198338951da0df8ac3ce6 Mon Sep 17 00:00:00 2001 From: mattiasakesson Date: Wed, 19 Jun 2024 14:02:52 +0200 Subject: [PATCH 12/15] correct readme --- examples/monai-2D-mednist/README.rst | 63 ++++++++++++++++++---------- 1 file changed, 42 insertions(+), 21 deletions(-) diff --git a/examples/monai-2D-mednist/README.rst b/examples/monai-2D-mednist/README.rst index 07f88e370..f88f3d31d 100644 --- a/examples/monai-2D-mednist/README.rst +++ b/examples/monai-2D-mednist/README.rst @@ -1,13 +1,15 @@ -FEDn Project: MNIST (PyTorch) ------------------------------ +FEDn Project: MonAI 2D Classification with the MedNIST Dataset (PyTorch) +------------------------------------------------------------------------ -This is an example FEDn Project based on the classic hand-written text recognition dataset MNIST. +This is an example FEDn Project based on the MonAI 2D Classification with the MedNIST Dataset. The example is intented as a minimalistic quickstart and automates the handling of training data by letting the client download and create its partition of the dataset as it starts up. - **Note: These instructions are geared towards users seeking to learn how to work - with FEDn in local development mode using Docker/docker-compose. 
We recommend all new users
-   to start by following the Quickstart Tutorial: https://fedn.readthedocs.io/en/stable/quickstart.html**
+Links:
+
+- MonAI: https://monai.io/
+- Base example notebook: https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/mednist_tutorial.ipynb
+- MedNIST dataset: https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz

 Prerequisites
 -------------
@@ -22,6 +24,7 @@ If using pseudo-distributed mode with docker-compose:
 - `Docker `__
 - `Docker Compose `__

+
 Creating the compute package and seed model
 -------------------------------------------

@@ -36,7 +39,7 @@ Clone this repository, then locate into this directory:
 .. code-block::

    git clone https://github.com/scaleoutsystems/fedn.git
-   cd fedn/examples/mnist-pytorch
+   cd fedn/examples/monai-2D-mednist

 Create the compute package:

@@ -54,29 +57,43 @@ Next, generate a seed model (the first model in a global model trail):

 This will create a seed model called 'seed.npz' in the root of the project. This step will take a few minutes, depending on hardware and internet connection (builds a virtualenv).

+Download and Prepare the data
+-------------------------------------------
+
+Install monai
+
+.. code-block::
+
+   pip install -r requirements.txt
+
+Download and divide the data into parts. Set the number of
+data parts as an arguments python prepare_data.py NR-OF-DATAPARTS. In the
+below command we divide the dataset into 10 parts.
+.. code-block::
+
+   python prepare_data.py 10
+
+
+
 Using FEDn Studio
 -----------------

 Follow the guide here to set up your FEDn Studio project and learn how to connect clients (using token authentication): `Studio guide `__.
 On the step "Upload Files", upload 'package.tgz' and 'seed.npz' created above.
+Connecting clients:
+===================

-Modifing the data split:
-========================
-
-The default traning and test data for this example is downloaded and split direcly by the client when it starts up (see 'startup' entrypoint).
-The number of splits and which split used by a client can be controlled via the environment variables ``FEDN_NUM_DATA_SPLITS`` and ``FEDN_DATA_PATH``.
-For example, to split the data in 10 parts and start a client using the 8th partiton:
+**NOTE: In case a different data path needs to be set, use the env variable FEDN_DATA_PATH.**

 .. code-block::

    export FEDN_PACKAGE_EXTRACT_DIR=package
-   export FEDN_NUM_DATA_SPLITS=10
-   export FEDN_DATA_PATH=./data/clients/8/mnist.pt
-   fedn client start -in client.yaml --secure=True --force-ssl
-
-The default is to split the data into 2 partitions and use the first partition.
+   export FEDN_DATA_PATH=/data/
+   export FEDN_CLIENT_SETTINGS_PATH=/client_settings.yaml
+   export export FEDN_DATA_SPLIT_INDEX=0

+   fedn client start -in client.yaml --secure=True --force-ssl

 Connecting clients using Docker:
 ================================
@@ -87,12 +104,16 @@ For convenience, there is a Docker image hosted on ghrc.io with fedn preinstalle

    docker run \
      -v $PWD/client.yaml:/app/client.yaml \
+     -v $PWD/client_settings.yaml:/app/client_settings.yaml \
      -e FEDN_PACKAGE_EXTRACT_DIR=package \
-     -e FEDN_NUM_DATA_SPLITS=2 \
-     -e FEDN_DATA_PATH=/app/package/data/clients/1/mnist.pt \
+     -e FEDN_DATA_PATH=./data/ \
+     -e FEDN_CLIENT_SETTINGS_PATH=/app/client_settings.yaml \
      ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml --force-ssl --secure=True


+**NOTE: The following instructions are only for SDK-based client communication and for local development environments using Docker.**
+
+
 Local development mode using Docker/docker compose
 --------------------------------------------------

@@ -134,7 +155,7 @@ Automate experimentation with several clients

 If you want to scale the number of clients, you can do
so by modifying ``docker-compose.override.yaml``. For example,
 in order to run with 3 clients, change the environment variable ``FEDN_NUM_DATA_SPLITS`` to 3, and add one more client
-by copying ``client1`` and setting ``FEDN_DATA_PATH`` to ``/app/package/data/clients/3/mnist.pt``
+by copying ``client1`` and setting ``FEDN_DATA_PATH`` to ``/app/package/data3/``


 Access message logs and validation data from MongoDB

From 79b91dc5cc1c76189fb926535221b4414d4a2c1d Mon Sep 17 00:00:00 2001
From: mattiasakesson
Date: Wed, 19 Jun 2024 14:04:57 +0200
Subject: [PATCH 13/15] remove split_index from client_settings.yaml

---
 examples/monai-2D-mednist/client_settings.yaml | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/examples/monai-2D-mednist/client_settings.yaml b/examples/monai-2D-mednist/client_settings.yaml
index f7bccb303..468c78802 100644
--- a/examples/monai-2D-mednist/client_settings.yaml
+++ b/examples/monai-2D-mednist/client_settings.yaml
@@ -1,6 +1,5 @@
 lr: 0.01
-batch_size: 32
-local_epochs: 10
+batch_size: 8
+local_epochs: 1
 num_workers: 1
 sample_size: 30
-split_index: 4

From 42de9586265417aa43a32d8a0690dfaf9f895641 Mon Sep 17 00:00:00 2001
From: root
Date: Wed, 19 Jun 2024 14:26:23 +0000
Subject: [PATCH 14/15] We have now changed the way dataset is created for the example and also update the README file.
---
 examples/monai-2D-mednist/README.rst | 6 ++++--
 examples/monai-2D-mednist/client/train.py | 2 +-
 examples/monai-2D-mednist/client/validate.py | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/examples/monai-2D-mednist/README.rst b/examples/monai-2D-mednist/README.rst
index f88f3d31d..6b3d91e26 100644
--- a/examples/monai-2D-mednist/README.rst
+++ b/examples/monai-2D-mednist/README.rst
@@ -91,7 +91,7 @@ Connecting clients:
    export FEDN_PACKAGE_EXTRACT_DIR=package
    export FEDN_DATA_PATH=/data/
    export FEDN_CLIENT_SETTINGS_PATH=/client_settings.yaml
-   export export FEDN_DATA_SPLIT_INDEX=0
+   export FEDN_DATA_SPLIT_INDEX=0

    fedn client start -in client.yaml --secure=True --force-ssl

@@ -105,9 +105,11 @@ For convenience, there is a Docker image hosted on ghrc.io with fedn preinstalle
    docker run \
      -v $PWD/client.yaml:/app/client.yaml \
      -v $PWD/client_settings.yaml:/app/client_settings.yaml \
+     -v $PWD/data:/app/data \
      -e FEDN_PACKAGE_EXTRACT_DIR=package \
-     -e FEDN_DATA_PATH=./data/ \
+     -e FEDN_DATA_PATH=/app/data/ \
      -e FEDN_CLIENT_SETTINGS_PATH=/app/client_settings.yaml \
+     -e FEDN_DATA_SPLIT_INDEX=0 \
      ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml --force-ssl --secure=True


diff --git a/examples/monai-2D-mednist/client/train.py b/examples/monai-2D-mednist/client/train.py
index 8fc4b05b7..f590fd4f7 100644
--- a/examples/monai-2D-mednist/client/train.py
+++ b/examples/monai-2D-mednist/client/train.py
@@ -73,7 +73,7 @@ def train(in_model_path, out_model_path, data_path=None, client_settings_path=No

     image_list = clients["client " + str(split_index)]["train"]

-    train_ds = MedNISTDataset(data_path=data_path, transforms=train_transforms, image_files=image_list)
+    train_ds = MedNISTDataset(data_path=data_path+'/MedNIST/', transforms=train_transforms, image_files=image_list)
     train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=num_workers)

     # Load parmeters and initialize model
diff --git
a/examples/monai-2D-mednist/client/validate.py b/examples/monai-2D-mednist/client/validate.py
index a3053c119..61684867c 100644
--- a/examples/monai-2D-mednist/client/validate.py
+++ b/examples/monai-2D-mednist/client/validate.py
@@ -55,7 +55,7 @@ def validate(in_model_path, out_json_path, data_path=None, client_settings_path=

     image_list = clients["client " + str(split_index)]["validation"]

-    val_ds = MedNISTDataset(data_path="data/MedNIST", transforms=val_transforms, image_files=image_list)
+    val_ds = MedNISTDataset(data_path=data_path+"/MedNIST/", transforms=val_transforms, image_files=image_list)
     val_loader = DataLoader(val_ds, batch_size=batch_size, shuffle=True, num_workers=num_workers)


From b775342a8bbf9faf1c45c3ad46c0f2dd631e1f6c Mon Sep 17 00:00:00 2001
From: mattiasakesson <33224977+mattiasakesson@users.noreply.github.com>
Date: Thu, 20 Jun 2024 10:30:05 +0200
Subject: [PATCH 15/15] Update README.rst

fix incorrect instruction regarding scaling up to 3 clients.
---
 examples/monai-2D-mednist/README.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/monai-2D-mednist/README.rst b/examples/monai-2D-mednist/README.rst
index 6b3d91e26..cb46047ed 100644
--- a/examples/monai-2D-mednist/README.rst
+++ b/examples/monai-2D-mednist/README.rst
@@ -60,7 +60,7 @@ This will create a seed model called 'seed.npz' in the root of the project. This
 Download and Prepare the data
 -------------------------------------------

-Install monai
+Install requirements:

 .. code-block::

@@ -157,7 +157,7 @@ Automate experimentation with several clients

 If you want to scale the number of clients, you can do so by modifying ``docker-compose.override.yaml``. For example,
 in order to run with 3 clients, change the environment variable ``FEDN_NUM_DATA_SPLITS`` to 3, and add one more client
-by copying ``client1`` and setting ``FEDN_DATA_PATH`` to ``/app/package/data3/``
+by copying ``client1``.


 Access message logs and validation data from MongoDB