diff --git a/content/Building a graph/minigraph.md b/content/Building a graph/minigraph.md index ca018f6ff2a2e..3654892ee1bef 100644 --- a/content/Building a graph/minigraph.md +++ b/content/Building a graph/minigraph.md @@ -23,4 +23,6 @@ It's a [development choice](https://github.com/lh3/minigraph/issues/27) that was A pull request was made in 2022, adding [P-lines support to minigraph](https://github.com/lh3/minigraph/pull/77) but was never accepted. However, one can get this version by getting the associated commit ID. > [!WARNING] Warning -> minigraph outputs nodes prefixed with `s` ; with some tools (such as odgi) it may cause crashes. To convert those rGFA's to standard GFA files, [you can use gfautil](https://github.com/vgteam/vg/issues/3129) \ No newline at end of file +> minigraph outputs nodes prefixed with `s` ; with some tools (such as odgi) it may cause crashes. To convert those rGFA's to standard GFA files, [you can use gfautil](https://github.com/vgteam/vg/issues/3129) + +It may be possible to get some kind of paths in an rGFA using `vg convert`, according to [this answer](https://github.com/pangenome/odgi/issues/546#issuecomment-1893382366) \ No newline at end of file diff --git a/content/Pancat and GfaGraphs/GfaGraphs/gfagraphs.md b/content/Pancat and GfaGraphs/GfaGraphs/gfagraphs.md index 0302de3325901..c78189a6df929 100644 --- a/content/Pancat and GfaGraphs/GfaGraphs/gfagraphs.md +++ b/content/Pancat and GfaGraphs/GfaGraphs/gfagraphs.md @@ -1,4 +1,7 @@ --- title: "GfaGraphs: Python abstraction layer for GFA graph format" --- -![[library_flowchart.png]] \ No newline at end of file +![[library_flowchart.png]] +Known limitations: ++ Currently does not scale well in memory for huge graphs (such as the full HPRC): 256 GB of RAM is not sufficient to load PGGB and MGC graphs in memory at the same time ++ Takes a long time to load huge graphs (many hours for HPRC as well) \ No newline at end of file diff --git a/content/Publications/Generalities.canvas 
b/content/Publications/Generalities.canvas deleted file mode 100644 index 78a171aff1989..0000000000000 --- a/content/Publications/Generalities.canvas +++ /dev/null @@ -1,7 +0,0 @@ -{ - "nodes":[ - {"id":"8057802cee776f46","x":-80,"y":-380,"width":305,"height":90,"type":"text","text":"Alignement sur pangénome de référence pour identification rapide et efficace de souches"}, - {"id":"7199ad6e27a9f6b8","x":-80,"y":-260,"width":305,"height":100,"type":"text","text":""} - ], - "edges":[] -} \ No newline at end of file diff --git a/content/Useful commands/sequences.md b/content/Useful commands/sequences.md index 9d2dbb5c8812b..c63ea711cbb75 100644 --- a/content/Useful commands/sequences.md +++ b/content/Useful commands/sequences.md @@ -4,4 +4,10 @@ title: Interact with sequences Get statistics on sequences: + awk command to get the size of all lectures in a file : `awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta |paste - -` + samtools command to index a file (useful for pggb): `samtools faidx myfile.fasta` -+ Replace string in file: `"s/thing_to_replace/thing_replacing/g" file > out` \ No newline at end of file ++ Replace string in file: `"s/thing_to_replace/thing_replacing/g" file > out` + +Get fast stats on a GFA file: ++ Print all number of lines types: ` $OUTPUT +# INPUT is a .og file +# OUTPUT is a new .gfa file +# -g stands for "convert to gfa" +``` +# Python bindings + +> [!WARNING] Warning +> There is an older implementation of the bindings, which is the one referenced on readthedocs.io. However, it is not [the one which should be used](https://github.com/pangenome/odgi/blob/master/test/python/odgi_ffi.md), for [performance reasons](https://github.com/pangenome/odgi/blob/master/test/python/odgi_performance.md) as well as stability issues. + +According to the documentation, `odgi_ffi` is meant to be used as a tool to build a Python library rather than as the Python library itself. 
+ +> Note that odgi also has an older high-level Python API `import odgi` that is somewhat obsolete. Instead you should probably use below `import odgi_ffi` lower level API to construct your own library. + +In order to fix segfaults, set `LD_PRELOAD=libjemalloc.so.2` before running Python scripts. However, I could not get it to work: at the time of writing I can import the bindings, but loading a graph fails with `RuntimeError: Error rewinding to load non-magic-prefixed SerializableHandleGraph`. Given that the maintainers themselves indicate that the [library is not fully stable](https://github.com/pangenome/odgi/issues/425#issuecomment-1305566300), I won't rely on it in the future. \ No newline at end of file diff --git a/content/Working with graphs/Tools/vg.md b/content/Working with graphs/Tools/vg.md new file mode 100644 index 0000000000000..114354288b1f3 --- /dev/null +++ b/content/Working with graphs/Tools/vg.md @@ -0,0 +1,16 @@ +--- +title: VG toolkit +--- +> [!WARNING] Warning +> vg commands on graphs that are compressed **do not work**. They raise an 'invalid graph type' error. +## Convert from GFA1.1 to GFA1 + +`vg convert in.gfa -W -f > out.gfa` ++ `-W` suppresses W-lines (paths are written as P-lines instead) ++ `-f` selects GFA as the output format +## Convert from vg, json to GFA +`vg view [-J|-V|-F] input_graph -g > out.gfa` + +## Call bubbles on graph to get variants +`vg deconstruct -p ref graph.gfa > variants.vcf` ++ `-p [STR]` stands for the path to use as reference to call variants \ No newline at end of file diff --git a/content/Working with graphs/visualize.md b/content/Working with graphs/visualize.md index a8f4b0b1665b2..d9e388fb05339 100644 --- a/content/Working with graphs/visualize.md +++ b/content/Working with graphs/visualize.md @@ -4,8 +4,9 @@ title: How to visualize a pangenome ## Dynamic representations + [gfaviz](https://github.com/ggonnella/gfaviz) by ggonnella. 
Supports GFA1 and GFA2 formats -+ [pancat](https://github.com/Tharos-ux/pancat) *own work*. Supports theorically all GFA types, feel free to open issues! ++ [pancat](https://github.com/Tharos-ux/pancat) (*own work*). Theoretically supports all GFA types, feel free to open issues! + [bandage](https://rrwick.github.io/Bandage/) by R. Wick. ++ [gfaestus](https://github.com/chfi/gfaestus) by C. Fischer. ## Static representations diff --git a/content/_imgs/Pasted image 20240115144532.png b/content/_imgs/Pasted image 20240115144532.png new file mode 100644 index 0000000000000..13700a4edbfc5 Binary files /dev/null and b/content/_imgs/Pasted image 20240115144532.png differ diff --git a/content/_notes/feedback_pangenomes.md b/content/_notes/feedback_pangenomes.md new file mode 100644 index 0000000000000..aa4d937d3c8e4 --- /dev/null +++ b/content/_notes/feedback_pangenomes.md @@ -0,0 +1,49 @@ +--- +title: Feedback on pangenome graph construction +--- +Our object: a 'variation graph' (which is not a De Bruijn graph) which contains nodes with labels. + +To construct: ++ pairwise alignment + + with software designed for full-genome alignment + + MSA (multiple sequence alignment) ++ graph construction + + create nodes and edges + + save paths ++ post-process (optional) + + pruning + + `gfaffix` at some point + + topological simplification + + compression + + ... 
+ +Tools used today: ++ Variation Graph (VG) ++ Minigraph (MG) + + From an alignment `minigraph --ggen -L -c ` + + The graph is relative to the reference: if we can't align a sequence on it, we don't put it in the graph + + Lowering the L parameter makes minigraph much slower and yields issues + + Higher L parameters can help align more diverging sequences ++ Minigraph-Cactus (MGC) + + It is possible to give a guide tree + + High level SV graph from MG + + This graph is used as backbone + + Put something as 'reference': this sequence won't be clipped nor cycled ++ PanGenome Graph Builder (PGGB) + + Curate data before to disassemble chromosomes (tutorials available, where?) + + Huge possibilities: how to cluster chromosomes that are close together? + + Use of `wfmash` for pairwise all-vs-all alignment + + For graph induction: `seqwish` + + Smoothing with `smoothxg` + + May add paths that are not even describing a genome? + + Notion of consensus path elaborated [here](https://github.com/pangenome/smoothxg/issues/37) + + Keeps a consensus and destroys some paths that do not follow it + + From the author: + + Many things should be removed + + As of now, they don't even use it internally + + Output of seqwish: should be default output but very large file + + Problem: multi-threaded algorithms like stochastic gradient descent imply that 'seeds' are not fixed: we can get different graphs from the same data + ![[Pasted image 20240115144532.png]] + + Post process with `gfaffix` and `odgi` + +Cycles are a problem for future usage of graphs. Implement a tool to 'linearize' a graph? 
\ No newline at end of file diff --git a/content/_publications/A draft human pangenome reference.md b/content/_publications/A draft human pangenome reference.md new file mode 100644 index 0000000000000..a20aca16f538e --- /dev/null +++ b/content/_publications/A draft human pangenome reference.md @@ -0,0 +1,3 @@ +URL: https://www.nature.com/articles/s41586-023-05896-x + +Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample. \ No newline at end of file diff --git a/content/_publications/Building pangenome graphs.md b/content/_publications/Building pangenome graphs.md new file mode 100644 index 0000000000000..13b4f1f8c6135 --- /dev/null +++ b/content/_publications/Building pangenome graphs.md @@ -0,0 +1,3 @@ +URL: https://www.biorxiv.org/content/10.1101/2023.04.05.535718v1 + +Pangenome graphs can represent all variation between multiple genomes, but existing methods for constructing them are biased due to reference-guided approaches. 
In response, we have developed PanGenome Graph Builder (PGGB), a reference-free pipeline for constructing unbiased pangenome graphs. PGGB uses all-to-all whole-genome alignments and learned graph embeddings to build and iteratively refine a model in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships. \ No newline at end of file diff --git a/content/_publications/Cactus - Algorithms for genome multiple sequence alignment.md b/content/_publications/Cactus - Algorithms for genome multiple sequence alignment.md new file mode 100644 index 0000000000000..a449f24fe1193 --- /dev/null +++ b/content/_publications/Cactus - Algorithms for genome multiple sequence alignment.md @@ -0,0 +1,3 @@ +DOI : https://doi.org/10.1101%2Fgr.123356.111 + +Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new “Cactus” alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates. 
\ No newline at end of file diff --git a/content/_publications/Cactus Graphs for Genome Comparisons.md b/content/_publications/Cactus Graphs for Genome Comparisons.md new file mode 100644 index 0000000000000..bf6305f114411 --- /dev/null +++ b/content/_publications/Cactus Graphs for Genome Comparisons.md @@ -0,0 +1,8 @@ +DOI: 10.1089/cmb.2010.0252 + +We introduce a data structure, analysis, and visualization scheme called a cactus graph for +comparing sets of related genomes. In common with multi-break point graphs and A-Bruijn +graphs, cactus graphs can represent duplications and general genomic rearrangements, but +additionally, they naturally decompose the common substructures in a set of related genomes +into a hierarchy of chains that can be visualized as two-dimensional multiple alignments and +nets that can be visualized in circular genome plots. \ No newline at end of file diff --git a/content/_publications/Construction and representation of human pangenome graphs.md b/content/_publications/Construction and representation of human pangenome graphs.md new file mode 100644 index 0000000000000..61948a4bd6b2c --- /dev/null +++ b/content/_publications/Construction and representation of human pangenome graphs.md @@ -0,0 +1,3 @@ +URL: https://pasteur.hal.science/pasteur-04126278/ + +As a single reference genome cannot possibly represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity within a wide range of genomic analyses. Several data structures have been proposed for representing collections of genomes as pangenomes, in particular graphs. In this work we collect all publicly available high-quality human haplotypes and constructed the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). 
We build variation graphs and de Bruijn graphs of this collection using five of the state-of-the-art tools: Bifrost , mdbg , Minigraph , Minigraph-Cactus and pggb . We examine differences in the way each of these tools represents variations between input sequences, both in terms of overall graph structure and representation of specific genetic loci. This work sheds light on key differences between pangenome graph representations, informing end-users on how to select the most appropriate graph type for their application. \ No newline at end of file diff --git a/content/_publications/Distance indexing and seed clustering in sequence graphs.md b/content/_publications/Distance indexing and seed clustering in sequence graphs.md new file mode 100644 index 0000000000000..7298b55a22b01 --- /dev/null +++ b/content/_publications/Distance indexing and seed clustering in sequence graphs.md @@ -0,0 +1,12 @@ +URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355256/pdf/btaa446.pdf + +Motivation: Graph representations of genomes are capable of expressing more genetic variation and can therefore +better represent a population than standard linear genomes. However, due to the greater complexity of genome +graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in +genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a +graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed align- +ments could belong to the same mapping. +Results: We have developed an algorithm for quickly calculating the minimum distance between positions on a se- +quence graph using a minimum distance index. We have also developed an algorithm that uses the distance index +to cluster seeds on a graph. 
We demonstrate that our implementations of these algorithms are efficient and practical +to use for a new generation of mapping algorithms based upon genome graphs. \ No newline at end of file diff --git a/content/_publications/GBZ file format for pangenome graphs.md b/content/_publications/GBZ file format for pangenome graphs.md new file mode 100644 index 0000000000000..aedc7690bbbaf --- /dev/null +++ b/content/_publications/GBZ file format for pangenome graphs.md @@ -0,0 +1,4 @@ +URL: https://pubmed.ncbi.nlm.nih.gov/36179091/ + +**Motivation:** Pangenome graphs representing aligned genome assemblies are being shared in the text-based Graphical Fragment Assembly format. As the number of assemblies grows, there is a need for a file format that can store the highly repetitive data space efficiently. +**Results:** We propose the GBZ file format based on data structures used in the Giraffe short-read aligner. The format provides good compression, and the files can be efficiently loaded into in-memory data structures. We provide compression and decompression tools and libraries for using GBZ graphs, and we show that they can be efficiently used on a variety of systems. \ No newline at end of file diff --git a/content/_publications/Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs.md b/content/_publications/Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs.md new file mode 100644 index 0000000000000..2dcf4080f91af --- /dev/null +++ b/content/_publications/Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs.md @@ -0,0 +1,3 @@ +URL: https://www.liebertpub.com/doi/10.1089/cmb.2023.0186 + +A pangenome graph can serve as a better reference for genomic studies because it allows a compact representation of multiple genomes within a species. Aligning sequences to a graph is critical for pangenome-based resequencing. The seed-chain-extend heuristic works by finding short exact matches between a sequence and a graph. 
In this heuristic, colinear chaining helps identify a good cluster of exact matches that can be combined to form an alignment. Colinear chaining algorithms have been extensively studied for aligning two sequences with various gap costs, including linear, concave, and convex cost functions. However, extending these algorithms for sequence-to-graph alignment presents significant challenges. Recently, Makinen et al. introduced a sparse dynamic programming framework that exploits the small path cover property of acyclic pangenome graphs, enabling efficient chaining. However, this framework does not consider gap costs, limiting its practical effectiveness. We address this limitation by developing novel problem formulations and provably good chaining algorithms that support a variety of gap cost functions. These functions are carefully designed to enable fast chaining algorithms whose time requirements are parameterized in terms of the size of the minimum path cover. Through an empirical evaluation, we demonstrate the superior performance of our algorithm compared with existing aligners. When mapping simulated long reads to a pangenome graph comprising 95 human haplotypes, we achieved 98.7% precision while leaving <2% of reads unmapped. \ No newline at end of file diff --git a/content/_publications/Movi - a fast and cache-efficient full-text pangenome index.md b/content/_publications/Movi - a fast and cache-efficient full-text pangenome index.md new file mode 100644 index 0000000000000..d4112e1db0552 --- /dev/null +++ b/content/_publications/Movi - a fast and cache-efficient full-text pangenome index.md @@ -0,0 +1,3 @@ +URL : https://www.biorxiv.org/content/10.1101/2023.11.04.565615v1.full + +Efficient pangenome indexes are promising tools for many applications, including rapid classification of nanopore sequencing reads. 
Recently, a compressed-index data structure called the “move structure” was proposed as an alternative to other BWT-based indexes like the FM index and r-index. The move structure uniquely achieves both O(r) space and O(1)-time queries, where r is the number of runs in the pangenome BWT. We implemented Movi, an efficient tool for building and querying move-structure pangenome indexes. While the size of the Movi’s index is larger than the r-index, it scales at a smaller rate for pangenome references, as its size is exactly proportional to r, the number of runs in the BWT of the reference. Movi can compute sophisticated matching queries needed for classification – such as pseudo-matching lengths – at least ten times faster than the fastest available methods. Movi achieves this speed by leveraging the move structure’s strong locality of reference, incurring close to the minimum possible number of cache misses for queries against large pangenomes. Movi’s fast constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made in a small and predictable time interval. \ No newline at end of file diff --git a/content/_publications/Pangenome graph construction from genome alignments with Minigraph-Cactus.md b/content/_publications/Pangenome graph construction from genome alignments with Minigraph-Cactus.md new file mode 100644 index 0000000000000..66c19e92eb0f4 --- /dev/null +++ b/content/_publications/Pangenome graph construction from genome alignments with Minigraph-Cactus.md @@ -0,0 +1,3 @@ +URL: https://www.nature.com/articles/s41587-023-01793-w + +Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. 
Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph’s ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a _Drosophila melanogaster_ pangenome. \ No newline at end of file diff --git a/content/_publications/Progressive Cactus is a multiple-genome aligner for the thousand-genome era.md b/content/_publications/Progressive Cactus is a multiple-genome aligner for the thousand-genome era.md new file mode 100644 index 0000000000000..55207b862b9fb --- /dev/null +++ b/content/_publications/Progressive Cactus is a multiple-genome aligner for the thousand-genome era.md @@ -0,0 +1,3 @@ +DOI: https://www.nature.com/articles/s41586-020-2871-y + +New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies[1](https://www.nature.com/articles/s41586-020-2871-y#ref-CR1 "Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009)."),[2](https://www.nature.com/articles/s41586-020-2871-y#ref-CR2 "Weisenfeld, N. I., Kumar, V., Shah, P., Church, D. M. & Jaffe, D. B. Direct determination of diploid genome sequences. Genome Res. 27, 757–767 (2017)."),[3](https://www.nature.com/articles/s41586-020-2871-y#ref-CR3 "Jain, M., Olsen, H. E., Paten, B. & Akeson, M. 
The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016)."). For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database[4](https://www.nature.com/articles/s41586-020-2871-y#ref-CR4 "Kitts, P. A. et al. Assembly: a resource for assembled genomes at NCBI. Nucleic Acids Res. 44 (D1), D73–D80 (2016).") increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies[5](https://www.nature.com/articles/s41586-020-2871-y#ref-CR5 "Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).") are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus[6](https://www.nature.com/articles/s41586-020-2871-y#ref-CR6 "Paten, B. et al. Cactus: algorithms for genome multiple sequence alignment. Genome Res. 21, 1512–1528 (2011)."), a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far. 
\ No newline at end of file diff --git a/content/Publications/Publications.canvas b/content/_publications/Publications.canvas similarity index 70% rename from content/Publications/Publications.canvas rename to content/_publications/Publications.canvas index e49bc630fcb22..d3626c3825a9d 100644 --- a/content/Publications/Publications.canvas +++ b/content/_publications/Publications.canvas @@ -2,7 +2,6 @@ "nodes":[ {"id":"9ee4e8e1010c6b03","type":"text","text":"Permet de subdiviser le problème en sous-problèmes indépendants","x":1400,"y":-140,"width":400,"height":70}, {"id":"167906313c410a7c","type":"text","text":"Depuis n'importe quel noeud du graphe, cette propriété permet de définir une structure d'arbre hiérarchique récursive.","x":1400,"y":-380,"width":400,"height":100}, - {"id":"58bad2b9c05a7f76","type":"text","text":"# Cactus Graphs for Genome Comparisons\nDOI: 10.1089/cmb.2010.0252\n\nWe introduce a data structure, analysis, and visualization scheme called a cactus graph for\ncomparing sets of related genomes. 
In common with multi-break point graphs and A-Bruijn\ngraphs, cactus graphs can represent duplications and general genomic rearrangements, but\nadditionally, they naturally decompose the common substructures in a set of related genomes\ninto a hierarchy of chains that can be visualized as two-dimensional multiple alignments and\nnets that can be visualized in circular genome plots.","x":-870,"y":-1380,"width":780,"height":300,"color":"4"}, {"id":"919383e2a232b1fe","type":"text","text":"Break-point graphs, multi-breakpoint graphs, A-Brujin graphs","x":680,"y":-1500,"width":555,"height":60}, {"id":"a6826d2c452750ad","type":"text","text":"Souvent NP difficile pour 3 génomes ou plus","x":680,"y":-1410,"width":555,"height":50}, {"id":"884872dc5076dca6","type":"text","text":"Préalablement, la notion d'intervalles conservés au sein de set de permutations signées a été montrée, et il a été montré que ces intervalles pouvaient à la fois être imbriqués et organisés en séquences. Cela permettait à la structure d'avoir une forme semblable à un arbre, efficace pour le calcul comme le stockage.","x":80,"y":-1290,"width":440,"height":210}, @@ -16,7 +15,6 @@ {"id":"66a9ce421a4c5ec9","type":"text","text":"Alignement multi-séquence peut être représenté sous forme de matrice ou de DAGs, mais les réarrangements à grande échelle mettent à mal ces approches.","x":80,"y":-1440,"width":435,"height":120}, {"id":"acbc449c2d39d522","type":"text","text":"Un *thread* est un chemin d'*adjacencies* et segments alternés connectés par des *caps* qui est encadré par des *adjacencies* connectées à des *caps*","x":604,"y":-1050,"width":460,"height":120}, {"id":"1e1e30ad4bcc8914","type":"text","text":"Un *net* est un graphe où tous les noeuds sont des *end* et chaque arête représente un set d'adjacences entre les deux *caps* qu'il connecte","x":1064,"y":-900,"width":405,"height":120}, - {"id":"53c74e047c34032d","type":"text","text":"Papier très clair sur les définitions, nécessaire pour comprendre le 
suivant","x":-900,"y":-1440,"width":370,"height":80,"color":"3"}, {"id":"3aba27866e336e46","type":"text","text":"Dense et difficile à interpréter, les arêtes sont des géodésiques qui traversent les noeuds","x":1592,"y":-610,"width":360,"height":110}, {"id":"1041d25cf9b9250b","type":"text","text":"Peuvent être décomposés en composantes plus petites","x":1592,"y":-490,"width":360,"height":60}, {"id":"c8d5f84f9751bd4c","type":"file","file":"_imgs/net_chains_threads.jpg","x":1500,"y":-1026,"width":400,"height":391}, @@ -69,12 +67,7 @@ {"id":"c7e28ab1b6d1dfff","type":"text","text":"Une chaîne (cyclique) est une séquence cyclique de *chain pairs* dans le même cycle dans le graphe cactus et ordonnés dans ce cycle","x":-1337,"y":2560,"width":425,"height":140}, {"id":"c54f8b5a1531dfe0","type":"text","text":"Une séquence maximum de *bridge pairs* connectés par des noeuds incidents de degré 2 est une *(acyclic) chain*","x":-1712,"y":2668,"width":335,"height":120}, {"id":"cdfed778307857a7","type":"text","text":"Une paire distincte de noeuds dans le graphe bidirigé sont une *bridge pair* si ils projettent sur le même noeud et que leurs deux arêtes noires incidentes sont des *bridges*","x":-1712,"y":2478,"width":335,"height":164}, - {"id":"bd2ac0b855e308f1","type":"text","text":"# Progressive Cactus is a multiple-genome aligner for the thousand-genome era\nDOI: https://www.nature.com/articles/s41586-020-2871-y\n\nNew genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies[1](https://www.nature.com/articles/s41586-020-2871-y#ref-CR1 \"Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).\"),[2](https://www.nature.com/articles/s41586-020-2871-y#ref-CR2 \"Weisenfeld, N. I., Kumar, V., Shah, P., Church, D. M. & Jaffe, D. B. Direct determination of diploid genome sequences. Genome Res. 
27, 757–767 (2017).\"),[3](https://www.nature.com/articles/s41586-020-2871-y#ref-CR3 \"Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).\"). For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database[4](https://www.nature.com/articles/s41586-020-2871-y#ref-CR4 \"Kitts, P. A. et al. Assembly: a resource for assembled genomes at NCBI. Nucleic Acids Res. 44 (D1), D73–D80 (2016).\") increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies[5](https://www.nature.com/articles/s41586-020-2871-y#ref-CR5 \"Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).\") are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus[6](https://www.nature.com/articles/s41586-020-2871-y#ref-CR6 \"Paten, B. et al. Cactus: algorithms for genome multiple sequence alignment. Genome Res. 21, 1512–1528 (2011).\"), a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. 
We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.","x":-3180,"y":2920,"width":1188,"height":417,"color":"4"}, {"id":"800d68620586fe43","type":"text","text":"Ajout d'un *input guide tree* qui permet de divisier le problème général en une série de sous-problèmes","x":-2450,"y":3680,"width":425,"height":87,"color":"1"}, - {"id":"464d3a859a82198b","type":"text","text":"# Cactus: Algorithms for genome multiple sequence alignment\nDOI : https://doi.org/10.1101%2Fgr.123356.111\n\nMuch attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new “Cactus” alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.","x":-1151,"y":-277,"width":920,"height":360,"color":"4"}, - {"id":"c17ec7ef81f35b4b","type":"text","text":"# Superbubbles, ultrabubbles and cacti\nURL : https://pubmed.ncbi.nlm.nih.gov/29461862/\n\nA superbubble is a type of directed acyclic subgraph with single distinct source and sink vertices. In genome assembly and genetics, the possible paths through a superbubble can be considered to represent the set of possible sequences at a location in a genome. 
Bidirected and biedged graphs are a generalization of digraphs that are increasingly being used to more fully represent genome assembly and variation problems. In this study, we define snarls and ultrabubbles, generalizations of superbubbles for bidirected and biedged graphs, and give an efficient algorithm for the detection of these more general structures. Key to this algorithm is the cactus graph, which, we show, encodes the nested decomposition of a graph into snarls and ultrabubbles within its structure. We propose and demonstrate empirically that this decomposition on bidirected and biedged graphs solves a fundamental problem by defining genetic sites for any collection of genomic variations, including complex structural variations, without need for any single reference genome coordinate system. Further, the nesting of the decomposition gives a natural way to describe and model variations contained within large variations, a case not currently dealt with by existing formats (e.g., variant cell format (VCF)).","x":-1420,"y":1314,"width":1220,"height":360,"color":"4"}, - {"id":"8392d7487ecced04","type":"text","text":"cacti jamais défini ?","x":-441,"y":1284,"width":250,"height":60,"color":"3"}, - {"id":"8ff05f3c7e0b7038","type":"text","text":"# Pangenome graph construction from genome alignments with Minigraph-Cactus\nURL: https://www.nature.com/articles/s41587-023-01793-w\n\nPangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph’s ability to represent variation at different scales. 
Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a _Drosophila melanogaster_ pangenome.","x":-5520,"y":4600,"width":1200,"height":360,"color":"4"}, {"id":"799f68e892314b6b","type":"text","text":"Utilise des assemblies ancestrales reconstruites afin de combiner les sous-alignements","x":-1932,"y":3707,"width":290,"height":120}, {"id":"a4cd608c024b8594","type":"text","text":"2 à 5 génomes par sous-alignement","x":-1747,"y":3919,"width":250,"height":60}, {"id":"6dc45fab5aead6a1","type":"text","text":"découpage récursif selon le guide tree","x":-2066,"y":3919,"width":250,"height":60}, @@ -86,13 +79,11 @@ {"id":"0607cb8ec901ea4e","type":"text","text":"**Impact de l'input guide tree**","x":-2620,"y":4069,"width":275,"height":60}, {"id":"f4d55f5538722fc0","type":"text","text":"Non négligeable si l'arbre est incorrect ou inconnu","x":-2780,"y":4270,"width":263,"height":100}, {"id":"b67aca38385cc0e2","type":"text","text":"Effet réduit par l'ajout de la notion d'extragroupe, voire de plusieurs extragroupes","x":-2482,"y":4270,"width":337,"height":100}, - {"id":"15e35fa2d6d51a3d","type":"text","text":"# Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs\nURL: https://www.liebertpub.com/doi/10.1089/cmb.2023.0186\n\nA pangenome graph can serve as a better reference for genomic studies because it allows a compact representation of multiple genomes within a species. 
Aligning sequences to a graph is critical for pangenome-based resequencing. The seed-chain-extend heuristic works by finding short exact matches between a sequence and a graph. In this heuristic, colinear chaining helps identify a good cluster of exact matches that can be combined to form an alignment. Colinear chaining algorithms have been extensively studied for aligning two sequences with various gap costs, including linear, concave, and convex cost functions. However, extending these algorithms for sequence-to-graph alignment presents significant challenges. Recently, Makinen et al. introduced a sparse dynamic programming framework that exploits the small path cover property of acyclic pangenome graphs, enabling efficient chaining. However, this framework does not consider gap costs, limiting its practical effectiveness. We address this limitation by developing novel problem formulations and provably good chaining algorithms that support a variety of gap cost functions. These functions are carefully designed to enable fast chaining algorithms whose time requirements are parameterized in terms of the size of the minimum path cover. Through an empirical evaluation, we demonstrate the superior performance of our algorithm compared with existing aligners. When mapping simulated long reads to a pangenome graph comprising 95 human haplotypes, we achieved 98.7% precision while leaving <2% of reads unmapped.","x":-4080,"y":4230,"width":1240,"height":420,"color":"4"}, {"id":"ccad44d8e99b297f","type":"text","text":"Comparaison contre :\n+ minigraph\n+ graphaligner\n+ graphchainer\n+ (minimap2)","x":-3818,"y":4820,"width":240,"height":180}, {"id":"473c33069dc9b5e1","type":"text","text":"Prend en compte des fonctions de gap lors de l'alignement de séquences sur un pangénome au format d'un DAG","x":-3460,"y":4820,"width":422,"height":100,"color":"1"}, {"id":"b857d70c30b2a368","type":"text","text":"Du coup, applications ? 
Parce que les graphes MGC ou PGGB ne sont pas des DAG","x":-3464,"y":5000,"width":430,"height":80,"color":"2"}, {"id":"4b073f56cef38de7","type":"text","text":"Peut-être nécessité d'un outil qui transforme des pangenome graphs en DAGs ?","x":-2988,"y":4980,"width":297,"height":120,"color":"5"}, {"id":"2cd0774814970b9a","type":"text","text":"binary tree qui n'a pas besoin d'être complètement résolu","x":-2362,"y":3949,"width":250,"height":102}, - {"id":"8eb4486aa40f0ee0","type":"text","text":"Nouveau papier en préparation \"Haplotype-aware Sequence-to-Graph Alignment\"","x":-3265,"y":4160,"width":454,"height":90,"color":"3"}, {"id":"f51cc8b5120d57bd","type":"text","text":"Des *chain pairs* contigues dans une *chain* partagent deux côtés opposés d'une même arête noire","x":-1294,"y":2738,"width":340,"height":100}, {"id":"654e9a95fdb293f5","type":"file","file":"_imgs/superbubbles.jpg","x":-540,"y":1724,"width":400,"height":179}, {"id":"837c14414c238af7","type":"text","text":"Strictement disjointes","x":-644,"y":2180,"width":250,"height":60}, @@ -100,7 +91,6 @@ {"id":"129f89296035d1b3","type":"text","text":"Strictement imbriquées","x":-641,"y":2000,"width":245,"height":65}, {"id":"e4eeaa710c4ce695","type":"text","text":"Propriétés d'imbrication strictes","x":-679,"y":2090,"width":320,"height":60}, {"id":"9e629313e68e4600","type":"text","text":"Format HAL et HAL toolkit :\n+ format d'alignement gardant cette notion d'arbre\n+ format qui peut être modifié ; ajout ou retrait de génomes sans avoir à tout recalculer depuis zéro","x":-1377,"y":2876,"width":356,"height":209}, - {"id":"af918cdf44aa146d","type":"text","text":"# Distance indexing and seed clustering in sequence graphs\nURL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355256/pdf/btaa446.pdf\n\nMotivation: Graph representations of genomes are capable of expressing more genetic variation and can therefore\nbetter represent a population than standard linear genomes. 
However, due to the greater complexity of genome\ngraphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in\ngenome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a\ngraph context. In read mapping algorithms such distance calculations are fundamental to determining if seed align-\nments could belong to the same mapping.\nResults: We have developed an algorithm for quickly calculating the minimum distance between positions on a se-\nquence graph using a minimum distance index. We have also developed an algorithm that uses the distance index\nto cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical\nto use for a new generation of mapping algorithms based upon genome graphs.","x":-1171,"y":3140,"width":1060,"height":400,"color":"4"}, {"id":"c4e7f42fad8cc6dd","type":"text","text":"### Ultrabubble\n\n+ Une superbubble dans un digraphe est une ultrabubble dans le *biedged graph* équivalent. 
\n+ Une ultrabubble est un snarl si sa composante séparée est acyclique et ne contient pas de *tips*.\n","x":-418,"y":2784,"width":615,"height":185,"color":"1"}, {"id":"a76a161d695d3749","type":"text","text":"### Superbubble\nN'importe quelle paire de noeuds distincts (x,y) dans le graphe forment une superbubble ssi :\n+ y est atteignable depuis x\n+ les noeuds atteignables depuis x sans dépasser y sont identiques aux noeuds atteignables depuis y sans dépasser x\n+ Le sous-graphe obtenu par ce set est acyclique\n+ aucun autre noeud du set ne forme une paire avec x, y ou les 2 qui remplit tous les critères précédents","x":-260,"y":2033,"width":482,"height":370,"color":"1"}, {"id":"3bac4b15a3b76c6d","type":"text","text":"### Snarl\nGénéralisation aux *biedged graphs* : les *snarls* sont des sous-graphes minimaux dont tous les noeus sont connectés au plus par deux arêtes au reste du graphe (2-BEC) \nUn *snarl* dans un biedged graph a ces propriétés :\n+ le retrait des arêtes noires entrant dans x et y déconnectent le graphe, formant une composante séparée\n+ il n'existe pas au sein de l'ensemble de noeuds un noued z tel que {x,z} ou {z,y} satisfait le critère précédent","x":349,"y":2292,"width":595,"height":320,"color":"1"}, @@ -116,11 +106,6 @@ {"id":"b3ab8ceb14f25665","type":"text","text":"Objectif : à partir du snarl tree, trouver une distance minimale entre deux points dans le graphe","x":-573,"y":3670,"width":321,"height":115,"color":"1"}, {"id":"79d85b0031c6ce3f","type":"text","text":"Initialisation avec chaque position dans un cluster séparé, puis aggrégation progressive en suivant le *snarl tree*","x":222,"y":4202,"width":360,"height":120}, {"id":"24a49c27acfb97ef","type":"text","text":"A chaque étape, annotation du cluster avec deux *bondary distance* : les plus courtes distances depuis n'importe laquelle des positions jusqu'aux *boundaries* de la structure","x":174,"y":4396,"width":456,"height":120}, - {"id":"c6fb2f397f3cf773","type":"text","text":"# The 
design and construction of reference pangenome graphs with minigraph\nURL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02168-z\n\nThe recent advances in sequencing technologies enable the assembly of individual genomes to the quality of the reference genome. How to integrate multiple genomes from the same species and make the integrated representation accessible to biologists remains an open challenge. Here, we propose a graph-based data model and associated formats to represent multiple genomes while preserving the coordinate of the linear reference genome. We implement our ideas in the minigraph toolkit and demonstrate that we can efficiently construct a pangenome graph and compactly encode tens of thousands of structural variants missing from the current reference genome.","x":-6520,"y":3120,"width":1140,"height":260,"color":"4"}, - {"id":"b6da29614bf625f9","type":"text","text":"# Construction and representation of human pangenome graphs\nURL: https://pasteur.hal.science/pasteur-04126278/\n\nAs a single reference genome cannot possibly represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity within a wide range of genomic analyses. Several data structures have been proposed for representing collections of genomes as pangenomes, in particular graphs. In this work we collect all publicly available high-quality human haplotypes and constructed the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). We build variation graphs and de Bruijn graphs of this collection using five of the state-of-the-art tools: Bifrost , mdbg , Minigraph , Minigraph-Cactus and pggb . We examine differences in the way each of these tools represents variations between input sequences, both in terms of overall graph structure and representation of specific genetic loci. 
This work sheds light on key differences between pangenome graph representations, informing end-users on how to select the most appropriate graph type for their application.","x":-3631,"y":5510,"width":1170,"height":335,"color":"4"}, - {"id":"959c6871b98e5f24","type":"text","text":"Travail comparatif sur :\n+ Bifrost\n+ mdbg\n+ Minigraph\n+ Minigraph-cactus\n+ PGGB","x":-2345,"y":5480,"width":250,"height":230}, - {"id":"2ad06502c64301be","type":"text","text":"Colored compacted de Brujin graphs (ccdbg)","x":-2002,"y":5535,"width":250,"height":60}, - {"id":"c1747efd88b1391c","type":"text","text":"Graphes de variation","x":-2002,"y":5618,"width":250,"height":60}, {"id":"a07c9f18c29efd3c","type":"text","text":"Continuation de cette définition de **snarls** pour définir la variation","x":-4520,"y":5200,"width":320,"height":80}, {"id":"395da1694bf23728","type":"text","text":"Construction d'un SV graph avec **minigraph**","x":-5280,"y":5240,"width":250,"height":80,"color":"3"}, {"id":"565c2f071796974b","type":"text","text":"Enlever les alignements incomplets et fallacieux correspondant à de coutes chaînes visitées par un grand nombre de séquences","x":-5030,"y":5715,"width":400,"height":130}, @@ -141,17 +126,38 @@ {"id":"992f3cf1e697d0f7","type":"text","text":"Analyse de permutations entre P et Q","x":-4880,"y":1780,"width":260,"height":90}, {"id":"e275bd2463802d6a","type":"text","text":"Décomposition successive des parallélogrammes de projection en modifiant des bits dans P et Q","x":-4880,"y":1895,"width":260,"height":165}, {"id":"416157ccf3ea4181","type":"text","text":"Tant que toutes les Q-runs n'ont pas un poids inférieur à 2d :\n+ On prend la run de poids maximal\n+ On la splitte\n+ On recalcule","x":-4936,"y":2240,"width":372,"height":180}, - {"id":"436955ae04bc294f","type":"text","text":"Méthode pour remplacer la BWT sur des pangénomes","x":-5120,"y":1300,"width":280,"height":80}, {"id":"2e27c8d0f6788a54","type":"text","text":"Ne concerne pas directement les 
graphes de variation","x":-5105,"y":1417,"width":305,"height":83}, {"id":"496273d96b37d9da","type":"text","text":"BWT indexes : basés sur les suffixes/préfixes des BWT successives à travers une fenêtre de lecture, et les overlaps","x":-5105,"y":1540,"width":340,"height":120}, - {"id":"44132eb33e08274c","type":"text","text":"# Movi: a fast and cache-efficient full-text pangenome index\nURL : https://www.biorxiv.org/content/10.1101/2023.11.04.565615v1.full\n\nEfficient pangenome indexes are promising tools for many applications, including rapid classification of nanopore sequencing reads. Recently, a compressed-index data structure called the “move structure” was proposed as an alternative to other BWT-based indexes like the FM index and r-index. The move structure uniquely achieves both O(r) space and O(1)-time queries, where r is the number of runs in the pangenome BWT. We implemented Movi, an efficient tool for building and querying move-structure pangenome indexes. While the size of the Movi’s index is larger than the r-index, it scales at a smaller rate for pangenome references, as its size is exactly proportional to r, the number of runs in the BWT of the reference. Movi can compute sophisticated matching queries needed for classification – such as pseudo-matching lengths – at least ten times faster than the fastest available methods. Movi achieves this speed by leveraging the move structure’s strong locality of reference, incurring close to the minimum possible number of cache misses for queries against large pangenomes. 
Movi’s fast constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made in a small and predictable time interval.","x":-6300,"y":1417,"width":890,"height":466,"color":"4"}, {"id":"c39f8c609227c924","type":"text","text":"Cela garantit qu'au max 2d scans sont requis pour interroger la structure","x":-4385,"y":2380,"width":370,"height":80}, {"id":"cae77ee78a51c888","type":"text","text":"Quand on le parcourt, on ne saute que quelques lignes","x":-5369,"y":2240,"width":289,"height":80}, - {"id":"40cbd71b2e996b34","type":"text","text":"A discuter le 23 janv.","x":-1440,"y":-1260,"width":250,"height":60} + {"id":"40cbd71b2e996b34","type":"text","text":"A discuter le 23 janv.","x":-1440,"y":-1260,"width":250,"height":60}, + {"id":"b2bd2d86c5838364","type":"file","file":"_publications/Cactus - Algorithms for genome multiple sequence alignment.md","x":-1151,"y":-277,"width":920,"height":360,"color":"4"}, + {"id":"a884c10e4fe8529b","type":"text","text":"> An associated article to this work demonstrated that the pangenomes\npresented here can be losslessly stored using a compressed, binary\nrepresentation of GFA in just 3–6 GB despite representing more than\n282 billion bases of individual sequence, with strongly sublinear scaling\nas new genomes are added","x":-3920,"y":-80,"width":640,"height":160}, + {"id":"3f80ff8608c11f00","type":"file","file":"_publications/Cactus Graphs for Genome Comparisons.md","x":-870,"y":-1380,"width":780,"height":300,"color":"4"}, + {"id":"53c74e047c34032d","type":"text","text":"Papier très clair sur les définitions, nécessaire pour comprendre le suivant","x":-460,"y":-1474,"width":370,"height":80,"color":"3"}, + {"id":"8392d7487ecced04","type":"text","text":"cacti jamais défini ?","x":-450,"y":1240,"width":250,"height":50,"color":"3"}, + {"id":"b24cb341ab94c72b","type":"file","file":"_publications/Superbubbles, ultrabubbles and 
cacti.md","x":-1420,"y":1314,"width":1220,"height":360,"color":"4"}, + {"id":"8c61cb4632b4c27a","type":"file","file":"_publications/Distance indexing and seed clustering in sequence graphs.md","x":-1171,"y":3140,"width":1060,"height":400,"color":"4"}, + {"id":"d2ffb7b6f7de3712","type":"file","file":"_publications/Progressive Cactus is a multiple-genome aligner for the thousand-genome era.md","x":-3180,"y":2920,"width":1188,"height":417,"color":"4"}, + {"id":"8eb4486aa40f0ee0","type":"text","text":"Nouveau papier en préparation \"Haplotype-aware Sequence-to-Graph Alignment\"","x":-3294,"y":4129,"width":454,"height":90,"color":"3"}, + {"id":"e4df005023f988fe","type":"file","file":"_publications/Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs.md","x":-4080,"y":4230,"width":1240,"height":420,"color":"4"}, + {"id":"114973e1f73f1e7d","type":"file","file":"_publications/Pangenome graph construction from genome alignments with Minigraph-Cactus.md","x":-5520,"y":4600,"width":1200,"height":360,"color":"4"}, + {"id":"0f72a37b84a47211","type":"file","file":"_publications/The design and construction of reference pangenome graphs with minigraph.md","x":-6520,"y":3120,"width":1140,"height":260,"color":"4"}, + {"id":"2ad06502c64301be","type":"text","text":"Colored compacted de Bruijn graphs (ccdbg)","x":-4076,"y":713,"width":250,"height":60}, + {"id":"c1747efd88b1391c","type":"text","text":"Graphes de variation","x":-4076,"y":796,"width":250,"height":60}, + {"id":"a757fd0e1fc5f59e","type":"file","file":"_publications/Movi - a fast and cache-efficient full-text pangenome index.md","x":-6300,"y":1417,"width":890,"height":466,"color":"4"}, + {"id":"d7ea1f197b165c65","type":"file","file":"_publications/Construction and representation of human pangenome graphs.md","x":-5705,"y":688,"width":1170,"height":335,"color":"4"}, + {"id":"436955ae04bc294f","type":"text","text":"Méthode pour remplacer la BWT sur des 
pangénomes","x":-5220,"y":1710,"width":280,"height":80,"color":"1"}, + {"id":"753b3485207254d3","type":"file","file":"_publications/A draft human pangenome reference.md","x":-5160,"y":-180,"width":1021,"height":360,"color":"4"}, + {"id":"959c6871b98e5f24","type":"text","text":"Travail comparatif sur :\n+ Bifrost\n+ mdbg\n+ Minigraph\n+ Minigraph-cactus\n+ PGGB","x":-4460,"y":741,"width":250,"height":230,"color":"1"}, + {"id":"18ab04ccd8deb09f","type":"file","file":"_publications/GBZ file format for pangenome graphs.md","x":-2860,"y":-160,"width":800,"height":320,"color":"4"}, + {"id":"15e57fcfe67aadc7","type":"file","file":"_publications/Unbiased pangenome graphs.md","x":-9360,"y":1380,"width":1120,"height":380,"color":"4"}, + {"id":"2afff5da76c99aa2","type":"file","file":"_publications/Building pangenome graphs.md","x":-9200,"y":2140,"width":820,"height":400,"color":"4"}, + {"id":"1bd14c47bb5791d4","type":"text","text":"Explications sur l'algorithme `seqwish` qui sert à l'induction du graphe pour PGGB","x":-8080,"y":1440,"width":380,"height":80,"color":"1"} ], "edges":[ {"id":"ebe3edacf1266866","fromNode":"8995e775a8dbba70","fromSide":"right","toNode":"f48a4c7882b18109","toSide":"left"}, - {"id":"e116c9f5f8df0bd3","fromNode":"464d3a859a82198b","fromSide":"right","toNode":"8995e775a8dbba70","toSide":"left"}, + {"id":"e116c9f5f8df0bd3","fromNode":"b2bd2d86c5838364","fromSide":"right","toNode":"8995e775a8dbba70","toSide":"left"}, {"id":"e324d44edbab4147","fromNode":"8995e775a8dbba70","fromSide":"right","toNode":"5a563da47acfed7a","toSide":"left"}, {"id":"308a282ebfe97dfd","fromNode":"5a563da47acfed7a","fromSide":"right","toNode":"167906313c410a7c","toSide":"left"}, {"id":"8446850d8fb10db8","fromNode":"5a563da47acfed7a","fromSide":"right","toNode":"9ee4e8e1010c6b03","toSide":"left","label":"usage en genome MSA"}, @@ -174,19 +180,19 @@ {"id":"bdb6b187915a84b5","fromNode":"a25245bd984a162e","fromSide":"bottom","toNode":"a3f8d96e7aceab99","toSide":"top"}, 
{"id":"be9cfe85a6403cae","fromNode":"9b12143cbb4118a9","fromSide":"bottom","toNode":"a3f8d96e7aceab99","toSide":"top"}, {"id":"04310ed63502617e","fromNode":"572e1342e4d6b1c1","fromSide":"bottom","toNode":"a3f8d96e7aceab99","toSide":"top"}, - {"id":"1df5f821a8c82727","fromNode":"58bad2b9c05a7f76","fromSide":"bottom","toNode":"464d3a859a82198b","toSide":"top","color":"4"}, + {"id":"1df5f821a8c82727","fromNode":"3f80ff8608c11f00","fromSide":"bottom","toNode":"b2bd2d86c5838364","toSide":"top","color":"4"}, {"id":"ba8b3cf16e17d306","fromNode":"a3f8d96e7aceab99","fromSide":"bottom","toNode":"26b0fd371f64a696","toSide":"top","label":"algorithme CAF"}, {"id":"54bcf6f2d1a8269f","fromNode":"26b0fd371f64a696","fromSide":"bottom","toNode":"48faed08f7460406","toSide":"top","label":"algorithme BAR"}, - {"id":"7018b930bdc7a00c","fromNode":"58bad2b9c05a7f76","fromSide":"right","toNode":"66a9ce421a4c5ec9","toSide":"left"}, + {"id":"7018b930bdc7a00c","fromNode":"3f80ff8608c11f00","fromSide":"right","toNode":"66a9ce421a4c5ec9","toSide":"left"}, {"id":"23fa7e35e87f2d46","fromNode":"66a9ce421a4c5ec9","fromSide":"right","toNode":"919383e2a232b1fe","toSide":"left"}, {"id":"517da59aadc7171a","fromNode":"66a9ce421a4c5ec9","fromSide":"right","toNode":"a6826d2c452750ad","toSide":"left"}, {"id":"61839395d8e382ef","fromNode":"498a6976afcf8582","fromSide":"right","toNode":"9f6fc359db1439ff","toSide":"left"}, {"id":"9ac70617c3094f73","fromNode":"498a6976afcf8582","fromSide":"right","toNode":"3fce6a85e9ce9de2","toSide":"left"}, {"id":"f06a3d85df960a81","fromNode":"884872dc5076dca6","fromSide":"right","toNode":"498a6976afcf8582","toSide":"left"}, - {"id":"14395d7c84d6ce19","fromNode":"58bad2b9c05a7f76","fromSide":"right","toNode":"884872dc5076dca6","toSide":"left"}, - {"id":"893320a8783badc3","fromNode":"58bad2b9c05a7f76","fromSide":"right","toNode":"c73aa01ef3b7754c","toSide":"left"}, - 
{"id":"18446ccc67e981c7","fromNode":"58bad2b9c05a7f76","fromSide":"right","toNode":"4402546a20b4c968","toSide":"left"}, - {"id":"51047abffbe67d81","fromNode":"58bad2b9c05a7f76","fromSide":"right","toNode":"c5bed1188d635294","toSide":"left"}, + {"id":"14395d7c84d6ce19","fromNode":"3f80ff8608c11f00","fromSide":"right","toNode":"884872dc5076dca6","toSide":"left"}, + {"id":"893320a8783badc3","fromNode":"3f80ff8608c11f00","fromSide":"right","toNode":"c73aa01ef3b7754c","toSide":"left"}, + {"id":"18446ccc67e981c7","fromNode":"3f80ff8608c11f00","fromSide":"right","toNode":"4402546a20b4c968","toSide":"left"}, + {"id":"51047abffbe67d81","fromNode":"3f80ff8608c11f00","fromSide":"right","toNode":"c5bed1188d635294","toSide":"left"}, {"id":"1d4e0409dceb0b87","fromNode":"4402546a20b4c968","fromSide":"right","toNode":"acbc449c2d39d522","toSide":"left"}, {"id":"365e46f84db50434","fromNode":"c73aa01ef3b7754c","fromSide":"right","toNode":"acbc449c2d39d522","toSide":"left"}, {"id":"fbaab3197d5013bb","fromNode":"c5bed1188d635294","fromSide":"right","toNode":"acbc449c2d39d522","toSide":"left"}, @@ -197,21 +203,21 @@ {"id":"cbca79064ccf7eaf","fromNode":"bc6ec2440d65a561","fromSide":"right","toNode":"3aba27866e336e46","toSide":"left"}, {"id":"02d175fedb4107bf","fromNode":"bc6ec2440d65a561","fromSide":"right","toNode":"1041d25cf9b9250b","toSide":"left"}, {"id":"4de01744f4501c27","fromNode":"529fdb5950adb38d","fromSide":"right","toNode":"742daf28a78c57f6","toSide":"left"}, - {"id":"6a2c50bf5897173a","fromNode":"58bad2b9c05a7f76","fromSide":"right","toNode":"529fdb5950adb38d","toSide":"left"}, - {"id":"9a525a749a8f5be9","fromNode":"464d3a859a82198b","fromSide":"right","toNode":"739fec564ad1be89","toSide":"top"}, - {"id":"7abe58ad486ef183","fromNode":"464d3a859a82198b","fromSide":"right","toNode":"1daf2c11f1ccebf0","toSide":"top"}, - {"id":"86a80c283fdcec66","fromNode":"c6fb2f397f3cf773","fromSide":"bottom","toNode":"8ff05f3c7e0b7038","toSide":"top","color":"4"}, - 
{"id":"ccbf97af47d03a2c","fromNode":"bd2ac0b855e308f1","fromSide":"bottom","toNode":"8ff05f3c7e0b7038","toSide":"top","color":"4"}, - {"id":"54e0ce13d6b7d718","fromNode":"c17ec7ef81f35b4b","fromSide":"right","toNode":"a0820e26c5132c08","toSide":"left"}, + {"id":"6a2c50bf5897173a","fromNode":"3f80ff8608c11f00","fromSide":"right","toNode":"529fdb5950adb38d","toSide":"left"}, + {"id":"9a525a749a8f5be9","fromNode":"b2bd2d86c5838364","fromSide":"right","toNode":"739fec564ad1be89","toSide":"top"}, + {"id":"7abe58ad486ef183","fromNode":"b2bd2d86c5838364","fromSide":"right","toNode":"1daf2c11f1ccebf0","toSide":"top"}, + {"id":"86a80c283fdcec66","fromNode":"0f72a37b84a47211","fromSide":"bottom","toNode":"114973e1f73f1e7d","toSide":"top","color":"4"}, + {"id":"ccbf97af47d03a2c","fromNode":"d2ffb7b6f7de3712","fromSide":"bottom","toNode":"114973e1f73f1e7d","toSide":"top","color":"4"}, + {"id":"54e0ce13d6b7d718","fromNode":"b24cb341ab94c72b","fromSide":"right","toNode":"a0820e26c5132c08","toSide":"left"}, {"id":"2889826312059da4","fromNode":"a0820e26c5132c08","fromSide":"right","toNode":"c6d832679d9cf6c2","toSide":"left"}, {"id":"e3c31dadb0a910dd","fromNode":"a0820e26c5132c08","fromSide":"right","toNode":"99944f7644f8775f","toSide":"left"}, {"id":"2b92a6afb54af1ed","fromNode":"a0820e26c5132c08","fromSide":"bottom","toNode":"6a40b6d09c5dcd37","toSide":"left"}, - {"id":"2688c91e67850790","fromNode":"c17ec7ef81f35b4b","fromSide":"right","toNode":"0d67d9ec7a8efcab","toSide":"left"}, + {"id":"2688c91e67850790","fromNode":"b24cb341ab94c72b","fromSide":"right","toNode":"0d67d9ec7a8efcab","toSide":"left"}, {"id":"9e8a908012a6bdd3","fromNode":"0d67d9ec7a8efcab","fromSide":"right","toNode":"b09229bf8670d741","toSide":"left"}, - {"id":"9fc422907c5f9eb0","fromNode":"c17ec7ef81f35b4b","fromSide":"right","toNode":"0ffdd366b904f707","toSide":"left"}, - {"id":"0b947d400cd98ec2","fromNode":"c17ec7ef81f35b4b","fromSide":"right","toNode":"3a1f12b39c04853b","toSide":"left"}, + 
{"id":"9fc422907c5f9eb0","fromNode":"b24cb341ab94c72b","fromSide":"right","toNode":"0ffdd366b904f707","toSide":"left"}, + {"id":"0b947d400cd98ec2","fromNode":"b24cb341ab94c72b","fromSide":"right","toNode":"3a1f12b39c04853b","toSide":"left"}, {"id":"c346218255b123fb","fromNode":"3a1f12b39c04853b","fromSide":"right","toNode":"9707d09e99a83455","toSide":"left"}, - {"id":"ef8658dfc14ed38f","fromNode":"c17ec7ef81f35b4b","fromSide":"right","toNode":"a76a161d695d3749","toSide":"top"}, + {"id":"ef8658dfc14ed38f","fromNode":"b24cb341ab94c72b","fromSide":"right","toNode":"a76a161d695d3749","toSide":"top"}, {"id":"a0d22ab405871e8d","fromNode":"0ffdd366b904f707","fromSide":"bottom","toNode":"3bac4b15a3b76c6d","toSide":"top"}, {"id":"ffb00171e2e70810","fromNode":"a76a161d695d3749","fromSide":"right","toNode":"3bac4b15a3b76c6d","toSide":"top","color":"1","label":"adaptation aux biedged graphs"}, {"id":"d0d999112b20f6ac","fromNode":"9187b5a93d2e77d8","fromSide":"bottom","toNode":"c7e28ab1b6d1dfff","toSide":"top"}, @@ -231,11 +237,11 @@ {"id":"c6d923407d8fda25","fromNode":"c4e7f42fad8cc6dd","fromSide":"top","toNode":"3bac4b15a3b76c6d","toSide":"bottom","color":"1","label":"généralisation"}, {"id":"cb9f40234840aec2","fromNode":"a76a161d695d3749","fromSide":"bottom","toNode":"c4e7f42fad8cc6dd","toSide":"top","color":"1","label":"extension aux graphes bidirigés"}, {"id":"9b74b92bba69ce36","fromNode":"a76a161d695d3749","fromSide":"left","toNode":"35c597f2a9236a4f","toSide":"right"}, - {"id":"fde3f265977a65ff","fromNode":"c17ec7ef81f35b4b","fromSide":"bottom","toNode":"d1fb5f4183f8fbc0","toSide":"top"}, - {"id":"06c24211732c06b6","fromNode":"bd2ac0b855e308f1","fromSide":"right","toNode":"93fe12ee89f385ed","toSide":"left"}, - {"id":"bd2247f4d68e28c0","fromNode":"bd2ac0b855e308f1","fromSide":"bottom","toNode":"800d68620586fe43","toSide":"left"}, - {"id":"97a7d3b740d57216","fromNode":"464d3a859a82198b","fromSide":"bottom","toNode":"c17ec7ef81f35b4b","toSide":"top","color":"4"}, - 
{"id":"89a30d2f3147142b","fromNode":"464d3a859a82198b","fromSide":"bottom","toNode":"bd2ac0b855e308f1","toSide":"top","color":"4"}, + {"id":"fde3f265977a65ff","fromNode":"b24cb341ab94c72b","fromSide":"bottom","toNode":"d1fb5f4183f8fbc0","toSide":"top"}, + {"id":"06c24211732c06b6","fromNode":"d2ffb7b6f7de3712","fromSide":"right","toNode":"93fe12ee89f385ed","toSide":"left"}, + {"id":"bd2247f4d68e28c0","fromNode":"d2ffb7b6f7de3712","fromSide":"bottom","toNode":"800d68620586fe43","toSide":"left"}, + {"id":"97a7d3b740d57216","fromNode":"b2bd2d86c5838364","fromSide":"bottom","toNode":"b24cb341ab94c72b","toSide":"top","color":"4"}, + {"id":"89a30d2f3147142b","fromNode":"b2bd2d86c5838364","fromSide":"bottom","toNode":"d2ffb7b6f7de3712","toSide":"top","color":"4"}, {"id":"9d9d59fdf4eb4e8f","fromNode":"800d68620586fe43","fromSide":"right","toNode":"799f68e892314b6b","toSide":"left"}, {"id":"c2830ac1178daa97","fromNode":"799f68e892314b6b","fromSide":"bottom","toNode":"a4cd608c024b8594","toSide":"top"}, {"id":"2c5c354e0741048a","fromNode":"799f68e892314b6b","fromSide":"bottom","toNode":"6dc45fab5aead6a1","toSide":"top"}, @@ -245,34 +251,34 @@ {"id":"0b057cb2697bdeb8","fromNode":"958344047a413d04","fromSide":"right","toNode":"363a63c9986152c9","toSide":"top","label":"assembly ancestrale"}, {"id":"b9c3076fa93276be","fromNode":"a4cd608c024b8594","fromSide":"bottom","toNode":"363a63c9986152c9","toSide":"top","label":"assemblies originales"}, {"id":"078adaa5637ef6b9","fromNode":"958344047a413d04","fromSide":"right","toNode":"6d2888fd017db5c0","toSide":"bottom"}, - {"id":"e0b543b48f85d0f2","fromNode":"bd2ac0b855e308f1","fromSide":"right","toNode":"9e629313e68e4600","toSide":"left"}, + {"id":"e0b543b48f85d0f2","fromNode":"d2ffb7b6f7de3712","fromSide":"right","toNode":"9e629313e68e4600","toSide":"left"}, {"id":"a28c36841d31a3ba","fromNode":"800d68620586fe43","fromSide":"bottom","toNode":"0607cb8ec901ea4e","toSide":"top"}, 
{"id":"7aaf6003ad328bcc","fromNode":"0607cb8ec901ea4e","fromSide":"bottom","toNode":"b67aca38385cc0e2","toSide":"top"}, {"id":"f89a0b3771dd7f12","fromNode":"0607cb8ec901ea4e","fromSide":"bottom","toNode":"f4d55f5538722fc0","toSide":"top"}, - {"id":"6153bd15466017cb","fromNode":"15e35fa2d6d51a3d","fromSide":"bottom","toNode":"ccad44d8e99b297f","toSide":"top"}, - {"id":"3fc4574ac8d88568","fromNode":"15e35fa2d6d51a3d","fromSide":"bottom","toNode":"473c33069dc9b5e1","toSide":"top"}, + {"id":"6153bd15466017cb","fromNode":"e4df005023f988fe","fromSide":"bottom","toNode":"ccad44d8e99b297f","toSide":"top"}, + {"id":"3fc4574ac8d88568","fromNode":"e4df005023f988fe","fromSide":"bottom","toNode":"473c33069dc9b5e1","toSide":"top"}, {"id":"72cce186facc20fc","fromNode":"473c33069dc9b5e1","fromSide":"bottom","toNode":"b857d70c30b2a368","toSide":"top"}, {"id":"0d2b8af12a718c47","fromNode":"473c33069dc9b5e1","fromSide":"right","toNode":"4b073f56cef38de7","toSide":"top"}, - {"id":"091618dd7a3964e1","fromNode":"b6da29614bf625f9","fromSide":"right","toNode":"959c6871b98e5f24","toSide":"left"}, + {"id":"091618dd7a3964e1","fromNode":"d7ea1f197b165c65","fromSide":"right","toNode":"959c6871b98e5f24","toSide":"left"}, {"id":"b74d4005748a7017","fromNode":"959c6871b98e5f24","fromSide":"right","toNode":"2ad06502c64301be","toSide":"left"}, {"id":"6d1715eb9a5a9c5a","fromNode":"959c6871b98e5f24","fromSide":"right","toNode":"c1747efd88b1391c","toSide":"left"}, - {"id":"9b1ee7d7288087fa","fromNode":"c17ec7ef81f35b4b","fromSide":"bottom","toNode":"af918cdf44aa146d","toSide":"top","color":"4"}, + {"id":"9b1ee7d7288087fa","fromNode":"b24cb341ab94c72b","fromSide":"bottom","toNode":"8c61cb4632b4c27a","toSide":"top","color":"4"}, {"id":"dc5b669fe605a811","fromNode":"c7e28ab1b6d1dfff","fromSide":"bottom","toNode":"f51cc8b5120d57bd","toSide":"top"}, - {"id":"601b64253090e1e8","fromNode":"af918cdf44aa146d","fromSide":"right","toNode":"cafa15b1c72270f0","toSide":"left"}, - 
{"id":"c5e98cf9cfc6f4c6","fromNode":"af918cdf44aa146d","fromSide":"right","toNode":"69cb9f0bc27775ce","toSide":"left"}, + {"id":"601b64253090e1e8","fromNode":"8c61cb4632b4c27a","fromSide":"right","toNode":"cafa15b1c72270f0","toSide":"left"}, + {"id":"c5e98cf9cfc6f4c6","fromNode":"8c61cb4632b4c27a","fromSide":"right","toNode":"69cb9f0bc27775ce","toSide":"left"}, {"id":"7606d6cdc7a0385f","fromNode":"3bac4b15a3b76c6d","fromSide":"bottom","toNode":"69cb9f0bc27775ce","toSide":"top"}, {"id":"6608b9ba90ddbe0b","fromNode":"69cb9f0bc27775ce","fromSide":"bottom","toNode":"9927540a8f87d695","toSide":"left"}, {"id":"a183b11ae8fa1345","fromNode":"69cb9f0bc27775ce","fromSide":"bottom","toNode":"8ec43f4cf38a035f","toSide":"left"}, {"id":"199c78ed0baf991f","fromNode":"8ec43f4cf38a035f","fromSide":"right","toNode":"9b29d6bc4bf05e07","toSide":"left"}, - {"id":"16fd9d6161412c6f","fromNode":"af918cdf44aa146d","fromSide":"bottom","toNode":"b3ab8ceb14f25665","toSide":"top"}, + {"id":"16fd9d6161412c6f","fromNode":"8c61cb4632b4c27a","fromSide":"bottom","toNode":"b3ab8ceb14f25665","toSide":"top"}, {"id":"bd115360c5acb561","fromNode":"b3ab8ceb14f25665","fromSide":"bottom","toNode":"c00023f9483a734f","toSide":"top"}, {"id":"1dc61bbce548a31c","fromNode":"b3ab8ceb14f25665","fromSide":"bottom","toNode":"41575dca10c5c02d","toSide":"top"}, {"id":"bf01cb52323b6ff1","fromNode":"c00023f9483a734f","fromSide":"bottom","toNode":"012805a4b06cc5b4","toSide":"top"}, {"id":"4e0e2554d28232b3","fromNode":"41575dca10c5c02d","fromSide":"bottom","toNode":"012805a4b06cc5b4","toSide":"top"}, {"id":"69656e75ed4cba68","fromNode":"79d85b0031c6ce3f","fromSide":"bottom","toNode":"24a49c27acfb97ef","toSide":"top"}, {"id":"3748c68fbbc6330c","fromNode":"012805a4b06cc5b4","fromSide":"right","toNode":"79d85b0031c6ce3f","toSide":"left"}, - {"id":"c10eb4c71860abd8","fromNode":"8ff05f3c7e0b7038","fromSide":"bottom","toNode":"a07c9f18c29efd3c","toSide":"left"}, - 
{"id":"aa4f4021e880fb10","fromNode":"8ff05f3c7e0b7038","fromSide":"bottom","toNode":"395da1694bf23728","toSide":"top"}, + {"id":"c10eb4c71860abd8","fromNode":"114973e1f73f1e7d","fromSide":"bottom","toNode":"a07c9f18c29efd3c","toSide":"left"}, + {"id":"aa4f4021e880fb10","fromNode":"114973e1f73f1e7d","fromSide":"bottom","toNode":"395da1694bf23728","toSide":"top"}, {"id":"b6e51a72ccc8f98c","fromNode":"395da1694bf23728","fromSide":"bottom","toNode":"a126e0f80a9ae345","toSide":"top","color":"3"}, {"id":"38102332928a2203","fromNode":"a126e0f80a9ae345","fromSide":"right","toNode":"565c2f071796974b","toSide":"top"}, {"id":"a341a564be10134c","fromNode":"565c2f071796974b","fromSide":"bottom","toNode":"de1f18f12d0c57b5","toSide":"right"}, @@ -280,9 +286,9 @@ {"id":"55175decf88aca1b","fromNode":"56c7e177b0c92f4b","fromSide":"bottom","toNode":"a126e0f80a9ae345","toSide":"right"}, {"id":"e1f5e3d22abc686a","fromNode":"395da1694bf23728","fromSide":"right","toNode":"56c7e177b0c92f4b","toSide":"top"}, {"id":"76408abc1e2abaa4","fromNode":"a126e0f80a9ae345","fromSide":"bottom","toNode":"502d8f0c671f3d9f","toSide":"right"}, - {"id":"df6322ae76ff3790","fromNode":"44132eb33e08274c","fromSide":"right","toNode":"436955ae04bc294f","toSide":"left"}, - {"id":"9104bae3b43f4220","fromNode":"44132eb33e08274c","fromSide":"right","toNode":"2e27c8d0f6788a54","toSide":"left"}, - {"id":"70bf3f35a35a512a","fromNode":"44132eb33e08274c","fromSide":"right","toNode":"496273d96b37d9da","toSide":"left"}, + {"id":"df6322ae76ff3790","fromNode":"a757fd0e1fc5f59e","fromSide":"right","toNode":"436955ae04bc294f","toSide":"left"}, + {"id":"9104bae3b43f4220","fromNode":"a757fd0e1fc5f59e","fromSide":"right","toNode":"2e27c8d0f6788a54","toSide":"left"}, + {"id":"70bf3f35a35a512a","fromNode":"a757fd0e1fc5f59e","fromSide":"right","toNode":"496273d96b37d9da","toSide":"left"}, {"id":"58d3d1fb918d9674","fromNode":"496273d96b37d9da","fromSide":"right","toNode":"6da72b1e869ca4ff","toSide":"left"}, 
{"id":"ea2094ca6ebf6252","fromNode":"6da72b1e869ca4ff","fromSide":"right","toNode":"5c4b56ef797d1072","toSide":"left"}, {"id":"151a32d98ee2b804","fromNode":"6da72b1e869ca4ff","fromSide":"right","toNode":"21cd3ec799bf1599","toSide":"left"}, @@ -295,13 +301,17 @@ {"id":"95b7510ec5b62b15","fromNode":"5c4b56ef797d1072","fromSide":"bottom","toNode":"78cc6582b1da88ac","toSide":"right"}, {"id":"a5145eec17c6554c","fromNode":"6da72b1e869ca4ff","fromSide":"bottom","toNode":"78cc6582b1da88ac","toSide":"top"}, {"id":"067f3ec5bdf00b38","fromNode":"5c4b56ef797d1072","fromSide":"bottom","toNode":"922d801361911683","toSide":"top"}, - {"id":"f9b21930c32275eb","fromNode":"44132eb33e08274c","fromSide":"right","toNode":"bfe61575cd63f0bd","toSide":"left"}, + {"id":"f9b21930c32275eb","fromNode":"a757fd0e1fc5f59e","fromSide":"right","toNode":"bfe61575cd63f0bd","toSide":"left"}, {"id":"b646a5da52e1dae9","fromNode":"bfe61575cd63f0bd","fromSide":"right","toNode":"992f3cf1e697d0f7","toSide":"left"}, {"id":"28e086a04f2c42a3","fromNode":"bfe61575cd63f0bd","fromSide":"bottom","toNode":"dd5ad775188134db","toSide":"top"}, {"id":"1f6bc561407363d1","fromNode":"bfe61575cd63f0bd","fromSide":"right","toNode":"e275bd2463802d6a","toSide":"left"}, {"id":"ae4c3bcfb1f2451a","fromNode":"e275bd2463802d6a","fromSide":"bottom","toNode":"416157ccf3ea4181","toSide":"top"}, {"id":"7ed5373c3616fb83","fromNode":"416157ccf3ea4181","fromSide":"right","toNode":"c39f8c609227c924","toSide":"left"}, {"id":"19c47ea14ed92136","fromNode":"dd5ad775188134db","fromSide":"bottom","toNode":"cae77ee78a51c888","toSide":"top"}, - {"id":"9ae3d63809280de1","fromNode":"40cbd71b2e996b34","fromSide":"right","toNode":"58bad2b9c05a7f76","toSide":"left"} + {"id":"9ae3d63809280de1","fromNode":"40cbd71b2e996b34","fromSide":"right","toNode":"3f80ff8608c11f00","toSide":"left"}, + {"id":"95926d3dedce5639","fromNode":"753b3485207254d3","fromSide":"right","toNode":"a884c10e4fe8529b","toSide":"left","color":"4"}, + 
{"id":"51b8ebcc3fafea38","fromNode":"a884c10e4fe8529b","fromSide":"right","toNode":"18ab04ccd8deb09f","toSide":"left","color":"4","label":"l'article en question"}, + {"id":"eece43191eed580d","fromNode":"15e57fcfe67aadc7","fromSide":"bottom","toNode":"2afff5da76c99aa2","toSide":"top","color":"4"}, + {"id":"6ed113686d0c2fff","fromNode":"15e57fcfe67aadc7","fromSide":"right","toNode":"1bd14c47bb5791d4","toSide":"left"} ] } \ No newline at end of file diff --git a/content/_publications/Superbubbles, ultrabubbles and cacti.md b/content/_publications/Superbubbles, ultrabubbles and cacti.md new file mode 100644 index 0000000000000..e0dc61febeb6c --- /dev/null +++ b/content/_publications/Superbubbles, ultrabubbles and cacti.md @@ -0,0 +1,3 @@ +URL : https://pubmed.ncbi.nlm.nih.gov/29461862/ + +A superbubble is a type of directed acyclic subgraph with single distinct source and sink vertices. In genome assembly and genetics, the possible paths through a superbubble can be considered to represent the set of possible sequences at a location in a genome. Bidirected and biedged graphs are a generalization of digraphs that are increasingly being used to more fully represent genome assembly and variation problems. In this study, we define snarls and ultrabubbles, generalizations of superbubbles for bidirected and biedged graphs, and give an efficient algorithm for the detection of these more general structures. Key to this algorithm is the cactus graph, which, we show, encodes the nested decomposition of a graph into snarls and ultrabubbles within its structure. We propose and demonstrate empirically that this decomposition on bidirected and biedged graphs solves a fundamental problem by defining genetic sites for any collection of genomic variations, including complex structural variations, without need for any single reference genome coordinate system. 
Further, the nesting of the decomposition gives a natural way to describe and model variations contained within large variations, a case not currently dealt with by existing formats (e.g., variant call format (VCF)). \ No newline at end of file diff --git a/content/_publications/The design and construction of reference pangenome graphs with minigraph.md b/content/_publications/The design and construction of reference pangenome graphs with minigraph.md new file mode 100644 index 0000000000000..bb48bbe14a523 --- /dev/null +++ b/content/_publications/The design and construction of reference pangenome graphs with minigraph.md @@ -0,0 +1,3 @@ +URL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02168-z + +The recent advances in sequencing technologies enable the assembly of individual genomes to the quality of the reference genome. How to integrate multiple genomes from the same species and make the integrated representation accessible to biologists remains an open challenge. Here, we propose a graph-based data model and associated formats to represent multiple genomes while preserving the coordinate of the linear reference genome. We implement our ideas in the minigraph toolkit and demonstrate that we can efficiently construct a pangenome graph and compactly encode tens of thousands of structural variants missing from the current reference genome. \ No newline at end of file diff --git a/content/_publications/Unbiased pangenome graphs.md b/content/_publications/Unbiased pangenome graphs.md new file mode 100644 index 0000000000000..4af2125bbbf9d --- /dev/null +++ b/content/_publications/Unbiased pangenome graphs.md @@ -0,0 +1,4 @@ +URL: https://academic.oup.com/bioinformatics/article/39/1/btac743/6854971 + +Motivation: pangenome variation graphs model the mutual alignment of collections of DNA sequences. A set of pairwise alignments implies a variation graph, but there are no scalable methods to generate such a graph from these alignments.
Existing related approaches depend on a single reference, a specific ordering of genomes or a _de Bruijn_ model based on a fixed _k_-mer length. A scalable, self-contained method to build pangenome graphs without such limitations would be a key step in pangenome construction and manipulation pipelines. +Results: we design the _seqwish_ algorithm, which builds a variation graph from a set of sequences and alignments between them. We first transform the alignment set into an implicit interval tree. To build up the variation graph, we query this tree-based representation of the alignments to reduce transitive matches into single DNA segments in a sequence graph. By recording the mapping from input sequence to output graph, we can trace the original paths through this graph, yielding a pangenome variation graph. We present an implementation that operates in external memory, using disk-backed data structures and lock-free parallel methods to drive the core graph induction step. We demonstrate that our method scales to very large graph induction problems by applying it to build pangenome graphs for several species. \ No newline at end of file