Funannotate pipeline steps Confusion #1096

Open
ufaroooq opened this issue Feb 27, 2025 · 0 comments

Hi,

I am following the Funannotate pipeline to annotate a fungal genome for which I also have RNA-Seq data. I followed the tutorial in the Funannotate docs.

I am confused about some steps because certain tools come up repeatedly, and I cannot tell whether I should run them again.

This is the pipeline I followed:

  1. Funannotate clean: to remove contigs < 1000 bp

  2. Funannotate sort: to sort contigs (big -> small)

  3. Repeat annotation using RepeatModeler/RepeatMasker

  4. Funannotate-Train

funannotate train \
    --input $assembly_in --out $Fanno_out/$prefix/ --left $f1_32931_R1 --right $f1_32931_R2 \
    --jaccard_clip --no_trimmomatic --memory 200G --species "Fo" --isolate 33921 --cpus 94

  5. Funannotate predict

funannotate predict --input $assembly_in -o $Fanno_out/$prefix/ \
    --species "Fo" --isolate 33921 --augustus_species fusarium --busco_seed_species fusarium --cpus 94

  6. Funannotate update

funannotate update --input $Fanno_out/$prefix/ --jaccard_clip --no_trimmomatic \
    --species "Fo" --isolate 33921 --memory 200G --cpus 94
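
Steps 1 and 2 can be sanity-checked before training. The sketch below is a minimal example, assuming a plain (possibly line-wrapped) FASTA file; the function name `count_short_contigs` is hypothetical and not part of funannotate. It counts contigs shorter than a given length, so a result of 0 on the cleaned assembly would suggest the 1000 bp filter took effect.

```shell
# Hypothetical helper (not part of funannotate): count contigs shorter than a
# minimum length in a FASTA file. Sequence lines may be wrapped.
count_short_contigs() {
    local fasta=$1 min_len=${2:-1000}
    awk -v min="$min_len" '
        # Score the record read so far, if there is one.
        function tally() { if (seen && len < min) n_short++ }
        /^>/ { tally(); seen = 1; len = 0; next }   # header: close the previous record
        { len += length($0) }                        # accumulate wrapped sequence lines
        END { tally(); print n_short + 0 }           # close the last record
    ' "$fasta"
}
```

For example, `count_short_contigs "$assembly_in"` uses the default threshold of 1000.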

This is where I am confused.

  7. InterProScan:
    At this point the tutorial says to run funannotate iprscan, which generated an empty file for me without any log to identify the issue, as described in my earlier unanswered issue "Interproscan creates empty iprscan.xml file" (#987).

So I installed Interproscan-v5.73-104.0 locally, along with SignalP-4.1, Phobius-1.0.1 and TMHMM-2.0c, because they were showing up as deprecated analyses and I wanted to include them.

interproscan \
    --input $Fanno_out/update_results/proteins.fa \
    --output-file-base $Fanno_out/interproscan_results/proteins.fa.interproscan \
    --cpu 90 --disable-precalc --goterms --iprlookup --pathways --seqtype p \
    --formats XML,TSV,GFF3 \
    --excl-applications SignalP_GRAM_NEGATIVE,SignalP_GRAM_POSITIVE \
    --tempdir $Fanno_out/interproscan_results/ --verbose

#Command:
grep -v "#" proteins.fa.interproscan.gff3 | awk '{print $2}' | sort | uniq

#Results: (the types of features predicted in the InterProScan output file)
CDD, Coils, FunFam, Gene3D, Hamap, MobiDBLite, NCBIfam, PANTHER, Pfam, Phobius, PIRSF, PIRSR, PRINTS, ProSitePatterns, ProSiteProfiles, SFLD, SignalP_EUK, SMART, SUPERFAMILY, TMHMM
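
The same check can also report how many features each analysis produced. This is a minimal sketch, assuming a tab-delimited GFF3 whose second column is the source; `count_sources` is a hypothetical helper name. Unlike `grep -v "#"`, it skips only lines that start with `#`, and it stops at a `##FASTA` section if one is present.

```shell
# Hypothetical helper: count GFF3 features per source (column 2).
count_sources() {
    awk -F'\t' '
        /^##FASTA/ { exit }            # stop at the embedded sequence section, if any
        /^#/ || NF < 9 { next }        # skip comments and lines that are not feature rows
        { n[$2]++ }
        END { for (s in n) printf "%s\t%d\n", s, n[s] }
    ' "$1" | sort
}
```

For example, `count_sources proteins.fa.interproscan.gff3` prints one line per analysis with its feature count.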

  8. antiSMASH fungi:
    I tried to run antiSMASH via funannotate remote -i fun -m antismash -e your-email@domain.edu, but it gave errors because of the presence of multiple CDS with similar coordinates.

So I ran **antiSMASH-v7.1.0**, which I also installed locally, with the following command:

antismash \
    --taxon fungi --cpus 94 --verbose --debug --genefinding-tool none --no-abort-on-invalid-records \
    --fullhmmer --cassis --clusterhmmer --tigrfam --asf --cc-mibig --cb-general --cb-subclusters --cb-knownclusters --pfam2go --rre --smcog-trees --tfbs \
    --output-basename antismash --output-dir $Fanno_out/$prefix/antismash_results \
    $Fanno_out/$prefix/update_results/resulting.gbk

This also generated similar errors, but running it with --no-abort-on-invalid-records skipped the scaffolds with multiple CDS. (I will come back to this to find a proper solution, but for now it worked.)
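
To locate the records antiSMASH objects to, one option is to list CDS intervals that occur more than once in the annotation. This is only a sketch, assuming the error refers to CDS features with exactly identical coordinates on the same strand and that a GFF3 export of the annotation is available; `duplicate_cds` is a hypothetical helper name. Alternative transcripts that share exons would also be reported.

```shell
# Hypothetical helper: print CDS intervals (seqid, start, end, strand) that
# appear more than once in a GFF3 file, followed by the number of occurrences.
duplicate_cds() {
    awk -F'\t' '
        /^##FASTA/ { exit }            # ignore any embedded sequence section
        /^#/ || NF < 9 { next }        # skip comments and malformed rows
        $3 == "CDS" { n[$1 "\t" $4 "\t" $5 "\t" $7]++ }
        END { for (k in n) if (n[k] > 1) printf "%s\t%d\n", k, n[k] }
    ' "$1" | sort
}
```

An empty result would suggest that the coordinates overlap rather than match exactly, in which case this check is not sufficient.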

  9. Phobius:
    Now the tutorial suggests running Phobius, and I am confused about whether I should run it, since it already ran during the InterProScan step.

phobius -png -gp -plp \
    $Fanno_out/proteins.fa > $Fanno_out/phobius_results/phobius.txt

## This is still running, and I am not sure whether it is making progress or stuck.
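
One rough way to tell whether a long-running job is still producing output is to check whether its output file grows over a short interval. This is a minimal sketch with hypothetical helper names (`size_of`, `is_growing`); a program that buffers its output may show no growth while still working, so CPU usage in `top` or `ps` is worth checking as well.

```shell
# Print the size of a file in bytes (0 if it does not exist yet).
size_of() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Return 0 if the file grew during the wait (default 60 s), 1 otherwise.
is_growing() {
    local file=$1 wait_s=${2:-60} before after
    before=$(size_of "$file")
    sleep "$wait_s"
    after=$(size_of "$file")
    [ "$after" -gt "$before" ]
}
```

For example, `is_growing "$Fanno_out/phobius_results/phobius.txt" 300 && echo "still writing"` waits five minutes between the two measurements.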

  10. SignalP:
    Should I also run SignalP separately and pass the resulting file to the funannotate annotate step, or let the annotate step run SignalP itself? It is installed.

  11. Funannotate annotate:
    Finally, I ran it without passing the Phobius results:

funannotate annotate \
        --input $Fanno_out/ \
        --antismash $Fanno_out/antismash_results/antismash.gbk \
        --iprscan $Fanno_out/interproscan_results/proteins.fa.xml \
        --cpus 94
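
Given the earlier problem with an empty iprscan.xml, it may be worth confirming that each precomputed result exists and is non-empty before passing it to the annotate step. A minimal sketch, with `require_nonempty` as a hypothetical helper name:

```shell
# Hypothetical helper: return non-zero if any given file is missing or empty,
# and report each offending path on stderr.
require_nonempty() {
    local f status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling it with the paths given to --antismash and --iprscan, and running funannotate annotate only if it succeeds, would catch an empty or misnamed file before the long annotation run starts.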

Here are the results from funannotate check --show-versions:

funannotate_dependencies.txt

Questions:

  1. Should I run SignalP separately?
  2. Should I run Phobius separately, or is the run that was part of InterProScan enough?
  3. Is this overall approach to annotating assemblies correct, or am I missing anything?

Thank you.
