R version > 3.6.1
data.table 1.14.2
Boruta 7.0.0
ggplot2 3.3.5
ggpmisc 0.4.5
rpart 4.1-15
randomForest 4.6-14
rminer 1.4.6
reshape2 1.4.4
gridExtra 2.3
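To compare an existing R installation against the versions listed above, something like the following sketch may help. It assumes Rscript is on the PATH. The function name check_r_packages is hypothetical and not part of this repository. It only prints what is installed and leaves the comparison with the listed versions to the reader.

```shell
# Print the installed version of each R package listed above.
# check_r_packages is an illustrative helper, not a script shipped with this pipeline.
check_r_packages() {
    for pkg in data.table Boruta ggplot2 ggpmisc rpart randomForest rminer reshape2 gridExtra; do
        # packageVersion() raises an error when the package is missing,
        # which makes Rscript exit with a non-zero status.
        if version=$(Rscript -e "cat(as.character(packageVersion('$pkg')))" 2>/dev/null); then
            echo "$pkg $version"
        else
            echo "$pkg NOT INSTALLED"
        fi
    done
}
```

Calling check_r_packages prints one line per package.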
TrainingX.csv: genus abundance of the training data. Each row represents one sample, and each column represents one genus.
TrainingY.csv: gene abundance of the training data. Each row represents one sample, and each column represents one gene.
TrainingX.csv and TrainingY.csv must have the same number of rows.
RealdataX.csv: genus abundance of the testing data. Each row represents one sample, and each column represents one genus.
RealdataY.csv: gene abundance of the testing data. Each row represents one sample, and each column represents one gene.
RealdataX.csv and RealdataY.csv must have the same number of rows.
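The row-count requirement can be checked before running the pipeline. The sketch below is one possible way to do it. The helper names count_rows and check_same_rows are hypothetical, and it assumes that no CSV field contains an embedded newline, so that one line corresponds to one row (plus any header line).

```shell
# Count the lines in a file (a header line, if present, is included).
count_rows() {
    # awk also counts a final line that lacks a trailing newline
    awk 'END { print NR }' "$1"
}

# Succeed only when both files have the same number of lines.
check_same_rows() {
    x_rows=$(count_rows "$1")
    y_rows=$(count_rows "$2")
    if [ "$x_rows" -ne "$y_rows" ]; then
        echo "Row mismatch: $1 has $x_rows lines, $2 has $y_rows lines" >&2
        return 1
    fi
    echo "OK: $1 and $2 both have $x_rows lines"
}
```

For example: check_same_rows TrainingX.csv TrainingY.csv && check_same_rows RealdataX.csv RealdataY.csv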
Step1: Select important variables
perl Model-step01.pl -TrainingX TrainingX.csv -TrainingY TrainingY.csv > step1.sh
sh step1.sh
Step2: Construct the models and evaluate the performance using real data
perl Model-step02.pl -TrainingX TrainingX.csv -TrainingY TrainingY.csv -RealdataX RealdataX.csv -RealdataY RealdataY.csv > step2.sh
sh step2.sh
Step3: Collect the model performance parameters and predicted values. "Model-performance-list.txt" is generated by the step2 script
perl Model-step03.pl Model-performance-list.txt Model-Detail.csv
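The three steps can also be chained so that a failure in one step stops the run. The following is a minimal sketch that reuses the exact commands shown above. The function name run_pipeline is hypothetical, and the check for Model-performance-list.txt is an added safeguard based on the note that step2 generates that file.

```shell
# Run Step1 to Step3 in order and stop at the first failure.
# run_pipeline is an illustrative wrapper, not a script shipped with this pipeline.
run_pipeline() {
    perl Model-step01.pl -TrainingX TrainingX.csv -TrainingY TrainingY.csv > step1.sh || return 1
    sh step1.sh || return 1

    perl Model-step02.pl -TrainingX TrainingX.csv -TrainingY TrainingY.csv \
        -RealdataX RealdataX.csv -RealdataY RealdataY.csv > step2.sh || return 1
    sh step2.sh || return 1

    # Step3 reads the list written by step2, so make sure it exists and is not empty.
    if [ ! -s Model-performance-list.txt ]; then
        echo "Model-performance-list.txt is missing or empty; did step2 finish?" >&2
        return 1
    fi
    perl Model-step03.pl Model-performance-list.txt Model-Detail.csv
}
```

Run it from the directory that holds the Perl scripts and the four CSV files by calling run_pipeline.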