Deep Learning System for Cancer Classification Experiments Using RNA-Seq Data or Whole Slide Images or Both (Masters thesis)

peterdonnelly1/pipeline

(Note: use 'source' as shown, so that the shell scripts will execute in the current shell session)


1 Install the NVIDIA Docker Container Runtime:

For GPU support, the NVIDIA Container Runtime is required on the system running Docker.
Note: there is no NVIDIA Container Runtime for Windows.

  source ./SCRIPT_TO_INSTALL_NVIDIA_CONTAINER_RUNTIME.sh
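To confirm the install took effect, one option is to check whether Docker now lists an 'nvidia' runtime. This is only a sketch: has_nvidia_runtime is a hypothetical helper (not part of CLASSI), and it assumes 'docker info' prints a 'Runtimes:' line, which recent Docker versions do. Prefix 'docker' with 'sudo' if your user is not in the docker group.

```shell
# Sketch: report whether Docker lists an "nvidia" runtime.
# has_nvidia_runtime is a hypothetical helper, not part of CLASSI.
has_nvidia_runtime() {
    # Reads `docker info` text on stdin; succeeds only if the
    # "Runtimes:" line mentions nvidia.
    grep -i 'runtimes' | grep -qi 'nvidia'
}

if command -v docker >/dev/null 2>&1 && docker info 2>/dev/null | has_nvidia_runtime; then
    echo "NVIDIA runtime is registered with Docker"
else
    echo "NVIDIA runtime not found (or Docker is not running)"
fi
```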

2 Build and run the CLASSI Docker image:

  source ./____BUILD_AND_RUN_THE_CLASSI_DOCKER_ENVIRONMENT.sh  
  
You will be left inside the CLASSI container, in directory 'pipeline'.

3 Use gdc-fetch ('d' option to begin with) to get some TCGA data:

eg: fetch and pre-process TCGA Sarcoma RNA-Seq data (1.1GB - <60min):  

  python ./gdc-fetch.py --debug=9 --dataset sarc --case_filter="filters/TCGA-SARC_case_filter"  --file_filter="filters/GLOBAL_file_filter_UQ" --max_cases=5000 --max_files=10  --output_dir=source_data/sarc  

eg: fetch and pre-process TCGA stomach cancer dataset (image and RNA-Seq) (~650GB - allow 24hrs+):   

  python ./gdc-fetch.py --debug=9 --dataset stad --case_filter="filters/TCGA-STAD_case_filter"  --file_filter="filters/GLOBAL_file_filter_UQ"  --max_cases=5000 --max_files=10  --output_dir=source_data/stad  
  python ./gdc-fetch.py --debug=9 --dataset stad --case_filter="filters/TCGA-STAD_case_filter"  --file_filter="filters/GLOBAL_file_filter_SVS" --max_cases=5000 --max_files=10  --output_dir=source_data/stad  
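The fetch commands above differ only in the dataset name and the file filter, so they can be generated rather than typed. A minimal sketch follows; fetch_cmd is a hypothetical helper, and the assumption that every dataset has a case filter named filters/TCGA-<DATASET>_case_filter is taken from the sarc and stad examples only. It prints the commands without running them.

```shell
# Sketch: print (not run) the gdc-fetch command for a dataset and file filter.
# fetch_cmd is a hypothetical helper; the case filter naming pattern for
# datasets other than sarc and stad is an assumption.
fetch_cmd() {
    dataset=$1      # e.g. sarc or stad
    file_filter=$2  # e.g. GLOBAL_file_filter_UQ or GLOBAL_file_filter_SVS
    upper=$(printf '%s' "$dataset" | tr '[:lower:]' '[:upper:]')
    printf 'python ./gdc-fetch.py --debug=9 --dataset %s --case_filter="filters/TCGA-%s_case_filter" --file_filter="filters/%s" --max_cases=5000 --max_files=10 --output_dir=source_data/%s\n' \
        "$dataset" "$upper" "$file_filter" "$dataset"
}

fetch_cmd sarc GLOBAL_file_filter_UQ
fetch_cmd stad GLOBAL_file_filter_UQ
fetch_cmd stad GLOBAL_file_filter_SVS
```

Check the printed lines against the examples above before pasting them into the container.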

4 Run an experiment. These ones are fully scripted:

 ./do_all_RUN_ME_TO_SEE_RNASEQ_PROCESSING.sh                     <<< requires sarc rna-seq       dataset
 ./do_all_RUN_ME_TO_SEE_IMAGE_PROCESSING.sh                      <<< requires stad image         dataset
 ./do_all_RUN_ME_TO_SEE_CLUSTERING_USING_SCIKIT_SPECTRAL.sh      <<< requires kidn rna-seq       dataset
 ./do_all_RUN_ME_TO_SEE_MULTIMODAL_IMAGE_PLUS_RNA_PROCESSING.sh  <<< requires stad image and rna datasets
 ./experiment_1.sh                                               <<< requires lots of rna-seq    datasets

Once you're comfortable, use CLASSI commands to run experiments:

./do_all.sh -d sarc -i rna   -z DENSE     -b 22 -o 50  -H 2200  -7 0.2                -c UNIMODE_CASE -A 3  -r True
./do_all.sh -d sarc -i rna   -z DENSE     -b 22 -o 50  -H 2200  -7 0.2                -c UNIMODE_CASE -A 3  -R 8
./do_all.sh -d stad -i rna   -z DENSE     -b 88 -o 100 -H "1000 2000"   -7 ".18 .20"  -c UNIMODE_CASE -A 2
./do_all.sh -d stad -i image -a VGG11     -b 64 -o 4  -1 .2  -L .0003 -f 100  -T  64  -c UNIMODE_CASE -A 4
./do_all.sh -d stad -i image -a RESNET152 -b 64 -o 4  -1 .2  -L .0003 -f 9    -T 256  -c UNIMODE_CASE -A 4  -r True
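Experiments can run for a long time, so it may help to keep a copy of the console output for each run. The following is a sketch only: run_logged and the console_logs directory are hypothetical and not part of CLASSI, and the do_all.sh options are copied unchanged from the first example above.

```shell
# Sketch: run a command and keep its console output in a timestamped file.
# run_logged and the console_logs directory are hypothetical, not part of CLASSI.
run_logged() {
    log_dir=$1; shift
    mkdir -p "$log_dir"
    log_file="$log_dir/$(date +%y%m%d_%H%M%S).log"
    # Record the exact command line, then append everything it prints.
    echo "running: $*" | tee -a "$log_file"
    "$@" >>"$log_file" 2>&1
    status=$?
    echo "exit status: $status" | tee -a "$log_file"
    return "$status"
}

# Only runs when do_all.sh is present, i.e. inside the 'pipeline' directory.
if [ -x ./do_all.sh ]; then
    run_logged console_logs ./do_all.sh -d sarc -i rna -z DENSE -b 22 -o 50 -H 2200 -7 0.2 -c UNIMODE_CASE -A 3 -r True
fi
```

Because output is redirected to the file, follow a run with 'tail -f' on the newest file in console_logs if you still want to watch it live.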

To monitor experiments and see results:

during an experiment:
   monitor progress via container console output
   observe learning curves with any browser pointing to http://localhost:6006
   
after an experiment has completed:
   run 'gimp' inside the container to view images produced by CLASSI.
    eg. cd logs; gimp 230102_0247__01 ... bar_chart_AL.png &
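To find the most recent images without typing the full file name, something like the following may be convenient. It is a sketch: newest_pngs is a hypothetical helper, and it assumes the images sit directly in the 'logs' directory as in the example above.

```shell
# Sketch: list the newest PNG files in a log directory, newest first.
# newest_pngs is a hypothetical helper, not part of CLASSI.
newest_pngs() {
    dir=${1:-logs}
    count=${2:-5}
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/*.png 2>/dev/null | head -n "$count"
}

newest_pngs logs 5
```

Any path it prints can then be passed to gimp as shown above.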

To edit configuration files:

geany                   > /dev/null 2>&1 &
geany do_all.sh         > /dev/null 2>&1 &
geany conf/variables.sh > /dev/null 2>&1 &

To enter the running classi container with a bash shell:

sudo docker exec -it classi bash

To stop/start classi container:

sudo docker stop  classi
sudo docker start classi
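If you are not sure which of these commands applies, the container's state can be queried first. A sketch, assuming the container is named 'classi' as above; next_action is a hypothetical helper, and the status strings are the ones 'docker inspect' reports. Add 'sudo' in front of 'docker' if your setup needs it.

```shell
# Sketch: suggest the next command from the state of the classi container.
# next_action is a hypothetical helper, not part of CLASSI.
next_action() {
    case $1 in
        running)        echo "docker exec -it classi bash" ;;
        exited|created) echo "docker start classi" ;;
        "")             echo "source ./____BUILD_AND_RUN_THE_CLASSI_DOCKER_ENVIRONMENT.sh" ;;
        *)              echo "docker inspect classi" ;;
    esac
}

state=""
if command -v docker >/dev/null 2>&1; then
    # An empty state means the container does not exist (or Docker is unreachable).
    state=$(docker inspect -f '{{.State.Status}}' classi 2>/dev/null) || state=""
fi
echo "suggested: $(next_action "$state")"
```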

To delete the classi container:

sudo docker rm classi

This does not delete the classi image.

To delete the classi image:

sudo docker rmi -f classi

You will have to build it again if you do this.
Building should be fast, since Docker will have cached everything that did not change since the last build.

I gratefully acknowledge the authors of the following software used in CLASSI:

dpcca
The core learning engine of 'CLASSI' builds on Gregory Gundersen's 'dpcca'. An attractive feature of this software is that it permits neural network models to be dynamically selected at run-time.
Code: https://github.com/gwgundersen/dpcca
Paper: "End-to-end Training of Deep Probabilistic CCA on Paired Biomedical Observations"
Paper: http://auai.org/uai2019/proceedings/papers/340.pdf

SPCN
GPU version of 'Structure Preserving Color Normalisation' by D. Anand, G. Ramakrishnan and A. Sethi
Code: https://github.com/goutham7r/spcn
Paper: "Fast GPU-Enabled Color Normalization for Digital Pathology", International Conference on Systems, Signals and Image Processing, Osijek, Croatia (2019), pp. 219-224
Paper: https://ieeexplore.ieee.org/document/8787328/

Reinhard Stain Colour Normalisation:
Takumi Ando, University of Tokyo, Tokyo, Japan
Code: https://github.com/tand826


Each portion of the code is governed by its owner's respective licence(s).

All of our code portions are governed by the CC BY-NC 4.0 licence, as follows: Copyright © 2020-2023, Peter Donnelly. This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/4.0/
