
Inland aquatic resistome genes abundance prediction from genera community by machine learning methods


orctyr/InlandML


This pipeline requires R and the R packages listed below.

R package requirements:

R environment > 3.6.1
data.table 1.14.2
Boruta 7.0.0
ggplot2 3.3.5
ggpmisc 0.4.5
rpart 4.1-15
randomForest 4.6-14 
rminer 1.4.6
reshape2 1.4.4
gridExtra 2.3

Input file description

TrainingX.csv: genus abundance of the training data. Each row represents one sample, and each column represents one genus.

TrainingY.csv: gene abundance of the training data. Each row represents one sample, and each column represents one gene.

TrainingX.csv and TrainingY.csv should have the same number of rows.

RealdataX.csv: genus abundance of the testing data. Each row represents one sample, and each column represents one genus.

RealdataY.csv: gene abundance of the testing data. Each row represents one sample, and each column represents one gene.

RealdataX.csv and RealdataY.csv should have the same number of rows.
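
The row-count requirement for both file pairs can be checked before running the pipeline. The following is a minimal sketch, not part of the repository: the helper names count_rows and check_pair are hypothetical, and it assumes each sample occupies exactly one line (no line breaks inside quoted fields) and that the X and Y files follow the same header convention.

```shell
#!/bin/sh
# Hypothetical helper (not part of InlandML): print the number of lines in a CSV file.
# Assumption: one sample per line, no embedded newlines in quoted fields.
count_rows() {
    awk 'END { print NR }' "$1"
}

# Hypothetical helper: return 0 only if both files have the same number of lines.
check_pair() {
    x_rows=$(count_rows "$1")
    y_rows=$(count_rows "$2")
    if [ "$x_rows" -ne "$y_rows" ]; then
        echo "Row mismatch: $1 has $x_rows rows, $2 has $y_rows rows" >&2
        return 1
    fi
    echo "OK: $1 and $2 both have $x_rows rows"
}
```

After sourcing these definitions, a call such as `check_pair TrainingX.csv TrainingY.csv && check_pair RealdataX.csv RealdataY.csv` would stop at the first mismatched pair. The check compares line counts only; it does not verify that the samples appear in the same order in both files.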

Pipeline usage

Step1: Select important variables

perl Model-step01.pl -TrainingX TrainingX.csv -TrainingY TrainingY.csv > step1.sh
sh step1.sh

Step2: Construct the models and evaluate the performance using real data

perl Model-step02.pl -TrainingX TrainingX.csv -TrainingY TrainingY.csv -RealdataX RealdataX.csv -RealdataY RealdataY.csv > step2.sh
sh step2.sh

Step3: Collect the model performance parameters and predicted values. "Model-performance-list.txt" is generated by the step2 script.

perl Model-step03.pl Model-performance-list.txt Model-Detail.csv
