We can take existing CT datasets, run VISTA3D inference on it and use the results to generate training conversation for M3.
export PYTHONPATH=${PWD}/..
ROOT_DIR=../../data/experts/vista3d/inference_results
OUT_FILEPREFIX="../../data/experts/vista3d/llama_gen_expert_data_vista3d_what"
python expert_train_data_cxr.py --in_datapath ${IN_DATAPATH} --root_dir ${ROOT_DIR} --out_fileprefix ${OUT_FILEPREFIX}
Here, we take a brain MRI dataset, run the BRATS segmentation model inference on it. The results can be used to generate training conversation for M3.
In our work, we used the MRI data from the BRATS 2018 challenge.
- Target: 3 tumor subregions
- Task: Segmentation
- Modality: MRI
- Size: 285 3D volumes (4 channels each)
First, you need to run model inference on your dataset. Please follow the instructions from the MONAI bundle.
Next, extract 2D training slices from the MRI images and inference results
export PYTHONPATH=${PWD}/..
DATALIST=./brats_mri_segmentation/datalist.json
INPUT_IMAGE_DIR=/path/to/BRATS2018
INPUT_LABEL_DIR=/path/to/brats_mri_segmentation_eval
OUTPUT_DIR=../../data/experts/brats/slices
python expert_extract_slices_brats.py --datalist_filename ${DATALIST} --input_image_dir ${INPUT_IMAGE_DIR} --input_label_dir ${INPUT_LABEL_DIR} --output_dir ${OUTPUT_DIR}
Finally, we combine the example images to generate a training conversation for M3.
export PYTHONPATH=${PWD}/..
META_DATA=../../data/experts/brats/slices/extracted_slices_meta.json
ROOT_DIR=../../data/experts/brats/slices
OUT_FILEPREFIX=../../data/experts/brats/llama_gen_expert_data_brats
python expert_train_data_brats.py --in_meta_data ${META_DATA} --images_root ${ROOT_DIR} --out_fileprefix ${OUT_FILEPREFIX}
For details on how to prepare training & evaluation data with an TorchXRayVision expert model ensemble, see here.