Skip to content

hainesmichaelc/cs598-final-project

Repository files navigation

CS598 Deep Learning for Healthcare - Final Project Submission

Author: Michael Haines | mhaines2@illinois.edu

Citation to Original Paper

This notebook is an attempt to recreate a model similar to CEHR-GAN-BERT from a paper titled "Few-Shot Learning with Semi-Supervised Transformers for Electronic Health Records".

Runtime Dependencies

The runtime dependencies can be found in the requirements.txt file, and can be installed using pip install -r requirements.txt.

Data Download Instructions

The code uses the ICU module of the MIMIC-IV dataset. This dataset is protected, and you must request access from PhysioNet before downloading. After downloading the data, upload the data into your personal Google Drive in a ./data/mimiciv/ directory. When properly configured, the ICU module should be available from the ./data/mimiciv/icu/ sub-directory of your Google Drive root directory.

Pre-processing Notebook

The Pre-processing.ipynb notebook contains the end-to-end pre-processing required to produce the tokenized datasets. This creates more detailed cohort data, visualizations, and summary statistics along the way. Note that only the tokenized datasets, vocabularies, and charts are published to source control; this notebook will create:

  • Pre-processed cohort of patient visits used for model training in ./data/cohort/ directory
  • Summary statistics for features in ./data/summary/ directory

Model Training Notebook

The ModelTraining.ipynb notebook trains all BERT-based models used in the analysis. Given that this code requires a GPU to run efficiently, and still takes several hours to complete, I have comitted the results of my experiments to source control in the ./saved_models/ directory. Please use Google Colab if you wish to re-train the models.

Evaluation Notebook

The Evaluation.ipynb notebook takes the results of the models that I trained and performs the evaluations.

BONUS - Descriptive Notebook

I am turning in the evaluation notebook as my descriptive notebook. However, if that one is not descriptive enough, further description can be found in the Pre-processing.ipynb and ModelTraining.ipynb notebooks.

References

To get started, I leaned on the following code bases:

Thank you to the original authors for providing a great starting point for the analysis.

About

Final project for CS 598 - Deep Learning for Healthcare

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •