Adding Support for Metadata #4

Merged
merged 13 commits into from
Jun 12, 2024

emarinier
Member

This is a basic first pass of adding support for metadata.

There are other things that need to be updated in the pipeline, but this PR tackles only the metadata-related changes and touches other things only if they're very important (although those related things might need another PR).

Command

rm -rf results; nextflow run main.nf -profile docker --input tests/data/samplesheets/samplesheet.csv --outdir results -params-file assets/parameters.yaml

Input

samplesheet.csv:

sample,mlst_alleles,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
S1,https://raw.githubusercontent.com/phac-nml/clustersplitter/dev/tests/data/profiles/S1.mlst.json,1,"Escherichia coli","EHEC/STEC","Canada","O157:H7",21,"2024/05/30","beef"
S2,https://raw.githubusercontent.com/phac-nml/clustersplitter/dev/tests/data/profiles/S2.mlst.json,1,"Escherichia coli","EHEC/STEC","The United States","O157:H7",55,"2024/05/21","milk"
S3,https://raw.githubusercontent.com/phac-nml/clustersplitter/dev/tests/data/profiles/S3.mlst.json,2,"Escherichia coli","EPEC","France","O125",14,"2024/04/30","cheese"
S4,https://raw.githubusercontent.com/phac-nml/clustersplitter/dev/tests/data/profiles/S4.mlst.json,2,"Escherichia coli","EPEC","France","O125",35,"2024/04/22","cheese"
S5,https://raw.githubusercontent.com/phac-nml/clustersplitter/dev/tests/data/profiles/S5.mlst.json,3,"Escherichia coli","EAEC","Canada","O126:H27",61,"2012/09/01","milk"
S6,https://raw.githubusercontent.com/phac-nml/clustersplitter/dev/tests/data/profiles/S6.mlst.json,unassociated,"Escherichia coli","EAEC","Canada","O111:H21",43,"2011/12/25","fruit"

config.json

    {
        "outlier_thresh": "25",
        "clustering_method": "average",
        "clustering_threshold": "500,100,75,50,25,15,10,5,2,1,0",
        "min_cluster_members": 2,
        "partition_column_name": "outbreak",
        "id_column_name": "sample_id",
        "only_report_labeled_columns": "False",
        "skip_qa": "False",
        
        "grouped_metadata_columns":{ 
            "outbreak":{"data_type": "None","label":"National Outbreak Code","default":"","display":"True"},
            "organism":{"data_type": "None","label":"Organism","default":"","display":"True"},
            "subtype":{"data_type": "None","label":"Subtype","default":"","display":"True"},
            "country":{"data_type": "categorical","label":"Country of Collection","default":"","display":"True"},
            "serovar":{"data_type": "categorical","label":"Serovar","default":"","display":"True"},
            "age":{"data_type": "desc_stats","label":"Patient Age (years)","default":"","display":"True"},
            "date":{"data_type": "min_max","label":"Date","default":"","display":"True"},
            "source":{"data_type": "categorical","label":"Source Type","default":"","display":"True"}
        },

        "linelist_columns":{
            "sample":{"data_type": "None","label":"Sample","default":"","display":"True"},
            "outbreak":{"data_type": "None","label":"National Outbreak Code","default":"","display":"True"},
            "organism":{"data_type": "None","label":"Organism","default":"","display":"True"},
            "subtype":{"data_type": "None","label":"Subtype","default":"","display":"True"},
            "country":{"data_type": "categorical","label":"Country of Collection","default":"","display":"True"},
            "serovar":{"data_type": "categorical","label":"Serovar","default":"","display":"True"},
            "age":{"data_type": "desc_stats","label":"Patient Age (years)","default":"","display":"True"},
            "date":{"data_type": "min_max","label":"Date","default":"","display":"True"},
            "source":{"data_type": "categorical","label":"Source Type","default":"","display":"True"}
        }    
    }

parameters.yaml

partition_column: "outbreak"
metadata_1_header : "outbreak"
metadata_2_header : "organism"
metadata_3_header : "subtype"
metadata_4_header : "country"
metadata_5_header : "serovar"
metadata_6_header : "age"
metadata_7_header : "date"
metadata_8_header : "source"
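
As an illustration of how the `metadata_N_header` parameters relate the generic samplesheet columns to the named columns in `metadata.included.tsv`, here is a minimal Python sketch. It is not the pipeline's implementation (the pipeline itself is Nextflow); `rename_metadata_columns` is a hypothetical helper, and only the S1 row is reproduced.

```python
import csv
import io

# Values copied from parameters.yaml above.
PARAMS = {
    "metadata_1_header": "outbreak",
    "metadata_2_header": "organism",
    "metadata_3_header": "subtype",
    "metadata_4_header": "country",
    "metadata_5_header": "serovar",
    "metadata_6_header": "age",
    "metadata_7_header": "date",
    "metadata_8_header": "source",
}

# Header row and the S1 row from samplesheet.csv above.
SAMPLESHEET = (
    "sample,mlst_alleles,metadata_1,metadata_2,metadata_3,metadata_4,"
    "metadata_5,metadata_6,metadata_7,metadata_8\n"
    "S1,https://raw.githubusercontent.com/phac-nml/clustersplitter/dev/"
    "tests/data/profiles/S1.mlst.json,"
    '1,"Escherichia coli","EHEC/STEC","Canada","O157:H7",21,"2024/05/30","beef"\n'
)


def rename_metadata_columns(samplesheet_text, params, n_columns=8):
    """Hypothetical helper: relabel metadata_N columns with their configured headers."""
    rows = []
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        renamed = {"sample": row["sample"]}
        for i in range(1, n_columns + 1):
            renamed[params[f"metadata_{i}_header"]] = row[f"metadata_{i}"]
        rows.append(renamed)
    return rows


rows = rename_metadata_columns(SAMPLESHEET, PARAMS)
print("\t".join(rows[0].keys()))
print("\t".join(rows[0].values()))
```

The `mlst_alleles` column is dropped in this sketch because it does not appear in `metadata.included.tsv`.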

Output

cluster_summary.tsv

National Outbreak Code	Organism	Subtype	Country of Collection	Serovar	Patient Age (years)	Date	Source Type	date_max_value	count_age_43	count_age_21	count_serovar_O125	count_serovar_O157:H7	count_age_55	date_min_value	count_country_The United States	max_dist	count_age_14	count_outliers	count_source_fruit	count_subtype_EAEC	min_dist	age_max_value	count_source_beef	count_serovar_O126:H27	median_dist	count_subtype_EHEC/STEC	age_min_value	count_country_France	age_mean_value	count_serovar_O111:H21	mean_dist	count_organism_Escherichia coli	count_source_milk	count_subtype_EPEC	count_members	count_age_35	count_age_61	age_median_value	count_source_cheese	count_country_Canada
1	Escherichia coli	EHEC/STEC	Canada,The United States	O157:H7	21,55	2024/05/21,2024/05/30	beef,milk	55.0	0	0	0	2	0	21.0	1	3.0	0	0	0	0	3.0	55.0	1	0	3.0	0	21.0	0	38.0	0	3.0	0	1	0	2	0	0	38.0	0	1
2	Escherichia coli	EPEC	France	O125	14,35	2024/04/22,2024/04/30	cheese	35.0	0	0	2	0	0	14.0	0	2.0	0	0	0	0	2.0	35.0	0	0	2.0	0	14.0	2	24.5	0	2.0	0	0	0	2	0	0	24.5	2	0
3	Escherichia coli	EAEC	Canada	O126:H27	61	2012/09/01	milk	61.0	0	0	0	0	0	61.0	0	0	0	0	0	0	0	61.0	0	1	0	0	61.0	0	61.0	0	0	0	1	0	1	0	0	61.0	0	1
unassociated	Escherichia coli	EAEC	Canada	O111:H21	43	2011/12/25	fruit	43.0	0	0	0	0	0	43.0	0	0	0	0	1	0	0	43.0	0	0	0	0	43.0	0	43.0	1	0	0	0	0	1	0	0	43.0	0	1
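
To make the per-partition summary easier to follow, the sketch below recomputes a few of the values for the `outbreak` partitions from the sample metadata. It assumes that `categorical` yields per-value counts, `desc_stats` yields min/max/mean/median, and `min_max` yields the smallest and largest value, as the column names in `cluster_summary.tsv` suggest. `summarize` is a hypothetical function, not the tool's actual code, and the output key names are chosen for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# A subset of columns from the six samples in the samplesheet.
SAMPLES = [
    {"outbreak": "1", "country": "Canada", "age": 21, "date": "2024/05/30"},
    {"outbreak": "1", "country": "The United States", "age": 55, "date": "2024/05/21"},
    {"outbreak": "2", "country": "France", "age": 14, "date": "2024/04/30"},
    {"outbreak": "2", "country": "France", "age": 35, "date": "2024/04/22"},
    {"outbreak": "3", "country": "Canada", "age": 61, "date": "2012/09/01"},
    {"outbreak": "unassociated", "country": "Canada", "age": 43, "date": "2011/12/25"},
]


def summarize(samples, partition_column="outbreak"):
    """Hypothetical sketch: group samples by partition and aggregate each column."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[partition_column]].append(sample)

    summary = {}
    for key, members in groups.items():
        ages = [m["age"] for m in members]
        dates = [m["date"] for m in members]
        summary[key] = {
            "count_members": len(members),
            # "categorical": count of each distinct value
            "count_country": dict(Counter(m["country"] for m in members)),
            # "desc_stats": descriptive statistics on a numeric column
            "age_min_value": float(min(ages)),
            "age_max_value": float(max(ages)),
            "age_mean_value": float(mean(ages)),
            "age_median_value": float(median(ages)),
            # "min_max": YYYY/MM/DD strings sort chronologically as text
            "date_min_value": min(dates),
            "date_max_value": max(dates),
        }
    return summary


summary = summarize(SAMPLES)
for key, values in summary.items():
    print(key, values)
```

Under these assumptions, partition `1` has two members with a mean age of 38.0 and partition `2` has a median age of 24.5, which agrees with the `age_mean_value` and `age_median_value` figures reported for those rows.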

metadata.included.tsv

sample	outbreak	organism	subtype	country	serovar	age	date	source
S1	1	Escherichia coli	EHEC/STEC	Canada	O157:H7	21	2024/05/30	beef
S2	1	Escherichia coli	EHEC/STEC	The United States	O157:H7	55	2024/05/21	milk
S3	2	Escherichia coli	EPEC	France	O125	14	2024/04/30	cheese
S4	2	Escherichia coli	EPEC	France	O125	35	2024/04/22	cheese
S5	3	Escherichia coli	EAEC	Canada	O126:H27	61	2012/09/01	milk
S6	unassociated	Escherichia coli	EAEC	Canada	O111:H21	43	2011/12/25	fruit

@emarinier emarinier self-assigned this May 31, 2024

github-actions bot commented May 31, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 0b818ab

✅ 147 tests passed
❔ 27 tests were ignored
❗ 6 tests had warnings

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 0.1.0
  • schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/clustersplitter/master/nextflow_schema.json
    Found https://raw.githubusercontent.com/phac-nml/iridanextexample/main/nextflow_schema.json
  • schema_description - No description provided in schema for parameter: av_html
  • schema_description - No description provided in schema for parameter: ar_config
  • schema_description - No description provided in schema for parameter: ar_thresholds
  • nfcore_yml - nf-core version not set in .nf-core.yml

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-clustersplitter_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-clustersplitter_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-clustersplitter_logo_dark.png
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: lib/Utils.groovy
  • files_exist - File is ignored: lib/WorkflowMain.groovy
  • files_exist - File is ignored: lib/NfcoreTemplate.groovy
  • files_exist - File is ignored: lib/WorkflowSnvphylnfc.groovy
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-clustersplitter_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-clustersplitter_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-clustersplitter_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/clustersplitter/clustersplitter/.github/workflows/awstest.yml
  • actions_awsfulltest - actions_awsfulltest
  • pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-06-10 17:51:41

@emarinier emarinier requested review from mattheww95 and apetkau May 31, 2024 19:46
Collaborator

@mattheww95 mattheww95 left a comment

Looks good to me!

@emarinier emarinier requested a review from kylacochrane June 4, 2024 16:14
Contributor

@kylacochrane kylacochrane left a comment

Everything looks good, Eric 👍

Member

@apetkau apetkau left a comment

Thanks so much @emarinier. This is great 😄. Comments below.

Member

@apetkau apetkau left a comment

This looks great to me. Thanks so much @emarinier 😄

@emarinier emarinier merged commit 2ed8f1d into dev Jun 12, 2024
4 checks passed
@apetkau apetkau deleted the metadata branch August 19, 2024 17:15
4 participants