Quartz sync: Jan 5, 2024, 5:23 PM
dubssieg committed Jan 5, 2024
1 parent f1cc15b commit ef1219a
Showing 19 changed files with 323 additions and 33 deletions.
7 changes: 7 additions & 0 deletions content/Building a graph/minigraph-cactus.md
@@ -52,6 +52,13 @@ The reference will satisfy the following properties:
+ Be used to divide the graph into chromosomes
One can define multiple references, but this does not help with clipping (though it may with filtering?), cyclicity, or keeping nodes in forward orientation.

> [!WARNING] Warning
> minigraph-cactus is **NOT RECOMMENDED** (see [this discussion](https://github.com/orgs/ComparativeGenomicsToolkit/discussions/1252)) for genomes whose mash distance from the reference is higher than 0.02; it may [yield a warning](https://github.com/ComparativeGenomicsToolkit/cactus/blob/v2.7.0/src/cactus/refmap/cactus_minigraph.py#L288-L291), but the alignment may not be done properly.
> Possible solutions:
> + Align with a tree-guided aligner such as **Progressive Cactus** (`mashtree` can be useful to build the tree)
> + Cut down sequences to match the threshold
> + Try PGGB
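
Before running the pipeline, the 0.02 threshold from the warning above can be checked on the table printed by `mash dist`. The following is a minimal sketch, assuming the usual five-column tab-separated output (reference, query, distance, p-value, shared hashes); the file names and values in the toy table are made up for illustration.

```python
import csv
import io

MASH_THRESHOLD = 0.02  # threshold quoted in the warning above


def too_divergent(mash_dist_output, threshold=MASH_THRESHOLD):
    """Return (query, distance) pairs whose distance exceeds the threshold.

    Expects the tab-separated table printed by `mash dist`:
    reference-ID, query-ID, distance, p-value, shared-hashes.
    """
    flagged = []
    for row in csv.reader(io.StringIO(mash_dist_output), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query, distance = row[1], float(row[2])
        if distance > threshold:
            flagged.append((query, distance))
    return flagged


# Toy table (hypothetical file names and values) in the `mash dist` layout.
example = (
    "ref.fa\tsampleA.fa\t0.0051\t0\t812/1000\n"
    "ref.fa\tsampleB.fa\t0.0347\t0\t310/1000\n"
)
for query, distance in too_divergent(example):
    print(f"{query}: mash distance {distance} exceeds {MASH_THRESHOLD}")
```

Genomes reported by this check are candidates for one of the solutions listed above.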
### Control input sequence order

To build a graph with sequences in an order you control, set `minigraphSortInput="none"` in the cactus config file; this disables the default sorting by mash distance.
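
As an illustration, the attribute can be switched programmatically. This is only a sketch: it assumes the cactus config is an XML file in which `minigraphSortInput` appears as an attribute of some element, and the element names and the `"mash"` value in the toy config below are placeholders, not the real cactus layout.

```python
import xml.etree.ElementTree as ET


def disable_input_sorting(config_xml):
    """Set minigraphSortInput="none" on every element that already carries it."""
    root = ET.fromstring(config_xml)
    for element in root.iter():
        if "minigraphSortInput" in element.attrib:
            element.set("minigraphSortInput", "none")
    return ET.tostring(root, encoding="unicode")


# Toy config: element names and the "mash" value are hypothetical placeholders.
toy_config = '<config><section minigraphSortInput="mash" other="1"/></config>'
print(disable_input_sorting(toy_config))
```

Note that `ElementTree` drops XML comments on such a round trip, so editing the attribute by hand may be preferable for a heavily commented config file.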
9 changes: 7 additions & 2 deletions content/Building a graph/minigraph.md
@@ -14,8 +14,13 @@ You can specify any number of `.fasta`/`.fa` files, as well as `.gfa` graph file
+ `c` enables base-level alignment
+ `x` specifies a preset, here `ggs`, which is a simple algorithm for incremental graph generation

> [!IMPORTANT] Publication and availability
> Publication is [available](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02168-z), and source code is available [here](https://github.com/lh3/minigraph)
The output is in **rGFA** format, a subtype of GFA1 that adds information about positions in the graph but drops the information about each genome's origin. rGFA has no W-lines or P-lines, which are the lines that tell which fragment belongs to which genome.
This is a [development choice](https://github.com/lh3/minigraph/issues/27) that was made [in the formalism of rGFA](https://github.com/lh3/minigraph/issues/26), because H. Li sees his tool as a way to [embed multiple genomes on a reference](https://github.com/nf-core/pangenome/issues/20) rather than as a reference-free method.

A pull request opened in 2022 added [P-lines support to minigraph](https://github.com/lh3/minigraph/pull/77), but it was never accepted. However, one can still get this version by checking out the associated commit ID.

> [!WARNING] Warning
> minigraph outputs nodes prefixed with `s`; this may crash some tools (such as odgi). To convert these rGFA files to standard GFA files, [you can use gfautil](https://github.com/vgteam/vg/issues/3129).
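
To show what the `s` prefix looks like and how it could be dropped, here is a minimal sketch that renames segments on S-lines and L-lines. It is not what gfautil does internally; it only assumes that numeric names are what the downstream tool expects, and it does not add any P-lines or W-lines.

```python
def strip_s_prefix(name):
    """Turn 's12' into '12'; leave any other name untouched."""
    if name.startswith("s") and name[1:].isdigit():
        return name[1:]
    return name


def renumber_rgfa(lines):
    """Yield GFA lines whose segment names no longer carry the `s` prefix."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) > 1:
            fields[1] = strip_s_prefix(fields[1])
        elif fields[0] == "L" and len(fields) > 3:
            # An L-line names its two segments in columns 2 and 4.
            fields[1] = strip_s_prefix(fields[1])
            fields[3] = strip_s_prefix(fields[3])
        yield "\t".join(fields)


# Toy rGFA fragment (made-up sequences and tag values).
rgfa = [
    "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tTT\tSN:Z:chr1\tSO:i:4\tSR:i:0",
    "L\ts1\t+\ts2\t+\t0M\tSR:i:0",
]
for converted in renumber_rgfa(rgfa):
    print(converted)
```

The rGFA tags (`SN`, `SO`, `SR`) are left in place, since optional tags are allowed on GFA1 lines.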
248 changes: 248 additions & 0 deletions content/Publications/Publications.canvas

Large diffs are not rendered by default.

7 changes: 5 additions & 2 deletions content/Working with graphs/Tools/bubblegun.md
@@ -9,7 +9,10 @@ Tool for detecting bubbles and superbubbles in De-bruijn/variation graphs. Sever
- Extracting two random paths from each bubble chain for haplotyping
- Extracting information from long reads aligned to bubble chains

> [!IMPORTANT] Publication and availability
> Publication available [here](https://academic.oup.com/bioinformatics/article/38/17/4217/6633304), source code available [here](https://github.com/fawaz-dabbaghieh/bubble_gun).
> [!WARNING] Warning
> The function `bfs` in the package enters an infinite loop if the target node is at an end of the graph.
The tool, written in **Python**, can be used both from the command line and as an import in other Python scripts/programs.
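
If the `bfs` issue mentioned in the warning gets in the way, a small stand-alone search can be used instead. The sketch below is not BubbleGun's `bfs` (its signature and the cause of the loop are not described here); it is a generic breadth-first search over a plain adjacency dictionary, where the `visited` set guarantees termination even when the target sits at an end of the graph.

```python
from collections import deque


def bfs_path_exists(adjacency, source, target):
    """Return True if `target` can be reached from `source`."""
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                # Mark on enqueue so no node is ever queued twice.
                visited.add(neighbour)
                queue.append(neighbour)
    return False


# Toy linear graph a - b - c, with the target at one end.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(bfs_path_exists(toy_graph, "a", "c"))
```

Each node is enqueued at most once, so the loop ends after at most as many iterations as there are nodes.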
34 changes: 34 additions & 0 deletions content/Working with graphs/Tools/gfaffix.md
@@ -0,0 +1,34 @@
---
title: GFAffix
---
GFAffix aims to collapse shared sequences that are spread over several paths at a fork that should not exist (for instance, two nodes that could be merged without any consequence for the information held by the graph).

![[gfaffix-illustration.png]]

> [!IMPORTANT] Publication and availability
> GFAffix appears to be **unpublished as of December 2023**. A preprint is being written (see [this issue](https://github.com/marschall-lab/GFAffix/issues/9) of GFAffix), but it was delayed. Source code is available [here](https://github.com/marschall-lab/GFAffix).
# Installation
GFAffix requires **Rust** and is available through conda.

```bash
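# Create a dedicated environment in a local folder, then activate it
conda create -p ./.env-gfaffix
conda activate ./.env-gfaffix

# Rust is required by GFAffix
conda install -c conda-forge rust
conda install -c bioconda gfaffix

conda deactivate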
```

To run GFAffix, the command is: `gfaffix <input_gfa> -o <output_gfa>`.

> [!NOTE] Note
> The last step of [[pggb]] applies GFAffix (taken from the docs: "Finally, we apply gfaffix to remove forks where both alternatives have the same sequence.") and [[minigraph-cactus]] applies it in its last step (`cactus-graphmap-join`); however, while applying GFAffix to a PGGB graph returns the same graph, this is not the case for a minigraph-cactus graph. We can suppose that GFAffix is not the very last step of `cactus-graphmap-join`, or that it is run with exclusion patterns.
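
A quick way to check whether GFAffix changed a graph is to compare a coarse signature of the input and output files. The sketch below is an assumption-laden shortcut: since node identifiers may be renumbered, it compares only the multiset of segment sequences and the number of links, which is a necessary but not sufficient condition for two graphs to be identical.

```python
from collections import Counter


def graph_signature(gfa_lines):
    """Return (multiset of segment sequences, number of links) of a GFA."""
    sequences = Counter()
    link_count = 0
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) > 2:
            sequences[fields[2]] += 1
        elif fields[0] == "L":
            link_count += 1
    return sequences, link_count


def looks_unchanged(before, after):
    """True when both graphs share the same coarse signature."""
    return graph_signature(before) == graph_signature(after)


# Toy graphs: a fork whose two branches carry the same sequence, then merged.
before = ["S\t1\tG", "S\t2\tAC", "S\t3\tAC", "L\t1\t+\t2\t+\t0M", "L\t1\t+\t3\t+\t0M"]
after = ["S\t1\tG", "S\t2\tAC", "L\t1\t+\t2\t+\t0M"]
print(looks_unchanged(before, before), looks_unchanged(before, after))
```

When the signatures differ, the tool did modify the graph; when they match, a finer comparison is still needed to conclude that the graphs are the same.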
# GFAffix and [[editions]]

Using the definition of [[editions]] I came up with, I wanted to see how GFAffix affects the resulting graph and its distance to other graphs. Unsurprisingly, since the tool is part of both pipelines, the impact of running GFAffix is marginal.

![[gfaffix_clustering.png]]

However, on graphs built solely with seqwish, the impact of GFAffix is not marginal: 55 editions for a graph with 820 nodes and two haplotypes.
29 changes: 0 additions & 29 deletions content/Working with graphs/Tools/gfafix.md

This file was deleted.

18 changes: 18 additions & 0 deletions content/Working with graphs/catalog.md
@@ -0,0 +1,18 @@
---
title: Cataloging pangenomic tools
---
> [!NOTE] Information
> With hundreds of tools labelled as 'pangenome graph' or 'variation graph' on GitHub, it is technically impossible to have a complete and comprehensive catalog of tools.
This section will try to cover as many tools as it can, pointing to existing catalogs and to more in-depth descriptions of tools when I have used them.

Known catalogs or blogs:
+ [Catalog](https://pangenome.github.io/) from the PGGB team

Tools:
+ [[bubblegun]], a bubble and superbubble caller
+ [[gfaffix]], a tool to simplify graphs
+ [[gfapy]], a Python library to handle GFA format
+ [[odgi]], a toolkit for pangenomes
+ [[gfagraphs]] (own work) a library to handle GFA format
+ [[pancat]] (own work) a small toolkit for pangenomes
4 changes: 4 additions & 0 deletions content/Working with graphs/editions.md
@@ -1,3 +1,7 @@
---
title: Compare pangenome graphs
---
> [!NOTE] Information
> I am the author of `pancat`, so I only describe its principle here; keep this in mind while reading. The method was first presented [here](https://hal.science/hal-04320771v1) and is currently **not** published.
To assess how one graph differs from another, the idea was to compare the segmentation of the two graphs.
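
One possible reading of "comparing segmentation" can be sketched as follows. This is a simplified illustration and not pancat's implementation: it assumes that the same genome is spelled by a list of consecutive segments in each graph, and that differences are counted as the boundaries to remove (merges) or to add (splits) to go from the first segmentation to the second.

```python
from itertools import accumulate


def breakpoints(segment_lengths):
    """Internal cut positions of a genome spelled by consecutive segments."""
    return set(accumulate(segment_lengths[:-1]))


def count_editions(lengths_a, lengths_b):
    """Return (merges, splits) needed to turn segmentation A into B."""
    if sum(lengths_a) != sum(lengths_b):
        raise ValueError("both segmentations must spell a genome of the same length")
    cuts_a, cuts_b = breakpoints(lengths_a), breakpoints(lengths_b)
    merges = len(cuts_a - cuts_b)  # boundaries present only in A
    splits = len(cuts_b - cuts_a)  # boundaries present only in B
    return merges, splits


# Toy genome of length 10, segmented differently in two graphs.
print(count_editions([4, 6], [4, 3, 3]))
print(count_editions([2, 8], [5, 5]))
```

In the first toy case only one boundary has to be added; in the second, one boundary is removed and another one is added.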
Binary file added content/_imgs/bar_algorithm.jpg
Binary file added content/_imgs/cactus_graph.jpg
Binary file added content/_imgs/cactus_tree.jpg
Binary file added content/_imgs/caf_algorithm.jpg
Binary file added content/_imgs/gfaffix-illustration.png
Binary file added content/_imgs/injection.png
Binary file added content/_imgs/net_chains_threads.jpg
Binary file added content/_imgs/progressive_cactus.png
Binary file added content/_imgs/snarl_decomposition.jpg
Binary file added content/_imgs/superbubbles.jpg
Binary file added content/_imgs/types_of_graphs.jpg
