Quartz sync: Feb 1, 2024, 11:23 AM

dubssieg · Feb 1, 2024 · cf1f4af
1 parent b491e23
commit cf1f4af
Showing 24 changed files with 227 additions and 67 deletions.
diff --git a/content/Building a graph/minigraph.md b/content/Building a graph/minigraph.md
@@ -23,4 +23,6 @@ It's a [development choice](https://github.com/lh3/minigraph/issues/27) that was
 A pull request was made in 2022, adding [P-lines support to minigraph](https://github.com/lh3/minigraph/pull/77) but was never accepted. However, one can get this version by getting the associated commit ID.
 
 > [!WARNING] Warning
-> minigraph outputs nodes prefixed with `s` ; with some tools (such as odgi) it may cause crashes. To convert those rGFA's to standard GFA files, [you can use gfautil](https://github.com/vgteam/vg/issues/3129)
+> minigraph prefixes node names with `s`, which may crash some tools (such as odgi). To convert these rGFA files to standard GFA files, [you can use gfautil](https://github.com/vgteam/vg/issues/3129)
+
+According to [this answer](https://github.com/pangenome/odgi/issues/546#issuecomment-1893382366), it may be possible to recover some kind of paths from an rGFA file using `vg convert`.
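
The `s` prefix mentioned in the warning above can also be stripped with a few lines of Python. This is only a minimal sketch and not what gfautil does: it assumes segment names of the form `s<digits>` and rewrites S and L lines only. The names `strip_s_prefix` and `_rename` are hypothetical.

```python
import re


def _rename(name: str) -> str:
    """Drop the leading 's' from names like 's12'; leave other names untouched."""
    return name[1:] if re.fullmatch(r"s\d+", name) else name


def strip_s_prefix(line: str) -> str:
    """Rename segment IDs on one GFA line (assumption: only S and L lines carry them)."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] == "S" and len(fields) > 1:
        fields[1] = _rename(fields[1])
    elif fields[0] == "L" and len(fields) > 3:
        fields[1] = _rename(fields[1])  # source segment
        fields[3] = _rename(fields[3])  # target segment
    return "\t".join(fields)


if __name__ == "__main__":
    for raw in ["S\ts1\tACGT\tSN:Z:chr1", "L\ts1\t+\ts2\t-\t0M"]:
        print(strip_s_prefix(raw))
```

Tags such as `SN:Z:` are left as they are, so the result is still rGFA-flavoured; only the node names become numeric.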
diff --git a/content/Pancat and GfaGraphs/GfaGraphs/gfagraphs.md b/content/Pancat and GfaGraphs/GfaGraphs/gfagraphs.md
@@ -1,4 +1,7 @@
 ---
 title: "GfaGraphs: Python abstraction layer for GFA graph format"
 ---
-![[library_flowchart.png]]
+![[library_flowchart.png]]
+Known limitations:
++ Memory use does not yet scale well to huge graphs (such as the full HPRC): 256 GB of RAM is not enough to load the PGGB and MGC graphs in memory at the same time
++ Loading huge graphs takes a long time (many hours for HPRC as well)
diff --git a/content/Publications/Generalities.canvas b/content/Publications/Generalities.canvas
diff --git a/content/Useful commands/sequences.md b/content/Useful commands/sequences.md
@@ -4,4 +4,10 @@ title: Interact with sequences
 Get statistics on sequences:
 + awk command to get the length of every read in a file: `awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta |paste - -`
 + samtools command to index a file (useful for pggb): `samtools faidx myfile.fasta`
-+ Replace string in file: `"s/thing_to_replace/thing_replacing/g" file > out`
++ Replace string in file: `sed "s/thing_to_replace/thing_replacing/g" file > out`
+
+Get fast stats on a GFA file:
++ Count the lines of each type: `<file.txt sed 's/^\(.\).*/\1/' | sort | uniq -c`
+
+Schedule jobs on a SLURM cluster:
++ See [here](https://stackoverflow.com/questions/60583279/how-to-make-sbatch-job-run-after-a-previous-one-has-completed) to chain jobs
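
The per-type line count from the `sed | sort | uniq -c` one-liner in the hunk above can be sketched in Python as well. This is an illustrative equivalent and not part of the notes; `count_line_types` is a hypothetical name, and unlike the shell version it skips blank lines.

```python
from collections import Counter
from typing import Iterable


def count_line_types(lines: Iterable[str]) -> Counter:
    """Count GFA records by their first character (H, S, L, P, W, ...)."""
    return Counter(line[0] for line in lines if line.strip())


if __name__ == "__main__":
    # Tiny in-memory example; a real file can be streamed the same way.
    example = [
        "H\tVN:Z:1.0\n",
        "S\t1\tACGT\n",
        "S\t2\tTT\n",
        "L\t1\t+\t2\t+\t0M\n",
    ]
    for record_type, n in sorted(count_line_types(example).items()):
        print(record_type, n)
```

Because the function accepts any iterable of lines, passing an open file handle (`with open("graph.gfa") as fh: count_line_types(fh)`) streams the file without holding it in memory.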
diff --git a/content/Working with graphs/Tools/odgi.md b/content/Working with graphs/Tools/odgi.md
@@ -1,4 +1,26 @@
 ---
 title: odgi
 ---
-List of [commands](https://odgi.readthedocs.io/en/latest/rst/commands/odgi.html) is available here.
+List of [commands](https://odgi.readthedocs.io/en/latest/rst/commands/odgi.html) is available here.
+# General commands
+## Convert graph formats
+
+To convert from the odgi (.og) format to another format (GFA, for instance), use `odgi view`.
+
+```bash
+# Convert an odgi graph to GFA
+# INPUT is a .og file, OUTPUT is the new .gfa file
+# -i gives the input graph, -g stands for "convert to GFA"
+odgi view -g -i "$INPUT" > "$OUTPUT"
+```
+# Python bindings
+
+> [!WARNING] Warning
+> An older implementation of the bindings exists, and it is the one referenced on Read the Docs. However, it is not [the one that should be used](https://github.com/pangenome/odgi/blob/master/test/python/odgi_ffi.md), for [performance reasons](https://github.com/pangenome/odgi/blob/master/test/python/odgi_performance.md) as well as stability issues.
+
+According to the documentation, `odgi_ffi` is meant more as a building block for writing a Python library than as the Python library itself.
+
+> Note that odgi also has an older high-level Python API `import odgi` that is somewhat obsolete. Instead you should probably use below `import odgi_ffi` lower level API to construct your own library.
+
+To fix segfaults, set `LD_PRELOAD=libjemalloc.so.2` before running Python scripts. However, I could not get the bindings to work: at the time of writing I can import them, but loading a graph fails with `RuntimeError: Error rewinding to load non-magic-prefixed SerializableHandleGraph`. Given that the maintainers indicate the [library is not fully stable](https://github.com/pangenome/odgi/issues/425#issuecomment-1305566300), I won't rely on it for future work.
diff --git a/content/Working with graphs/Tools/vg.md b/content/Working with graphs/Tools/vg.md
@@ -0,0 +1,16 @@
+---
+title: VG toolkit
+---
+> [!WARNING] Warning
+> vg commands **do not work** on compressed graphs. They raise an 'invalid graph type' error.
+## Convert from GFA1.1 to GFA1
+
+`vg convert in.gfa -W -f > out.gfa`
++ `-W` suppresses W-lines (paths are written as P-lines instead)
++ `-f` selects GFA as the output format
+## Convert from vg, json to GFA
+`vg view [-J|-V|-F] input_graph -g > out.gfa`
+
+## Call bubbles on graph to get variants
+`vg deconstruct -p ref graph.gfa > variants.vcf`
++ `-p [STR]` stands for the path to use as reference to call variants
diff --git a/content/Working with graphs/visualize.md b/content/Working with graphs/visualize.md
@@ -4,8 +4,9 @@ title: How to visualize a pangenome
 ## Dynamic representations
 
 + [gfaviz](https://github.com/ggonnella/gfaviz) by ggonnella. Supports GFA1 and GFA2 formats
-+ [pancat](https://github.com/Tharos-ux/pancat) *own work*. Supports theorically all GFA types, feel free to open issues!
++ [pancat](https://github.com/Tharos-ux/pancat) (*own work*). Supports, in theory, all GFA types; feel free to open issues!
 + [bandage](https://rrwick.github.io/Bandage/) by R. Wick.
++ [gfaestus](https://github.com/chfi/gfaestus) by C. Fischer.
 
 ## Static representations
 

diff --git a/content/_imgs/Pasted image 20240115144532.png b/content/_imgs/Pasted image 20240115144532.png
diff --git a/content/_notes/feedback_pangenomes.md b/content/_notes/feedback_pangenomes.md
@@ -0,0 +1,49 @@
+---
+title: Feedback on pangenome graph construction
+---
+Our object: a 'variation graph' (which is not a De Bruijn graph) whose nodes carry labels.
+
+To construct: 
++ pairwise alignment
+	+ with software designed for full-genome alignment
+	+ MSA (multiple sequence alignment)
++ graph construction
+	+ create nodes and edges
+	+ save paths
++ post-process (optional)
+	+ pruning
+	+ `gfaffix` at some point
+	+ topological simplification
+	+ compression
+	+ ...
+
+Tools in use today:
++ Variation Graph (VG)
++ Minigraph (MG)
+	+ From an alignment `minigraph --ggen -L <min_size_of_variants> -c <genomes>`
+	+ The graph is relative to the reference: sequence that cannot be aligned to it is left out of the graph
+	+ Lowering the L parameter makes minigraph much slower and causes issues
+	+ Higher values of the L parameter can help align more divergent sequences
++ Minigraph-Cactus (MGC)
+	+ It is possible to give a guide tree
+	+ High level SV graph from MG
+	+ This graph is used as backbone
+	+ Designate one sequence as the 'reference': it will be neither clipped nor made cyclic
++ PanGenome Graph Builder (PGGB)
+	+ Curate the data beforehand to separate chromosomes (tutorials available, where?)
+		+ Huge possibilities: how to cluster chromosomes that are close together?
+	+ Use of `wfmash` for pairwise all-vs-all alignment 
+	+ For graph induction: `seqwish`
+	+ Smoothing with `smoothxg`
+		+ May add paths that are not even describing a genome?
+		+ Notion of consensus path elaborated [here](https://github.com/pangenome/smoothxg/issues/37)
+		+ Keeps a consensus and destroys some paths that do not follow it
+		+ From the author:
+			+ Many things should be removed
+			+ As of now, they don't even use it internally
+			+ Output of seqwish: should be default output but very large file
+			+ Problem: multithreaded algorithms such as stochastic gradient descent mean that 'seeds' are not fixed, so the same data can yield different graphs
+	![[Pasted image 20240115144532.png]]
+	+ Post process with `gfaffix` and `odgi`
+
+Cycles are a problem for future usage of graphs. Implement a tool to 'linearize' a graph?
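
Before linearizing anything, one first needs to know whether a graph contains cycles at all. The following is a minimal sketch of such a check, under stated assumptions: only GFA L-lines are read, each oriented segment `(name, orientation)` is treated as a vertex, and every link also contributes its reverse-complement edge. The function names are hypothetical, and this is not an existing tool from the notes.

```python
from collections import defaultdict

FLIP = {"+": "-", "-": "+"}


def build_oriented_graph(gfa_lines):
    """Adjacency over (segment, orientation) vertices, built from L-lines only."""
    adj = defaultdict(set)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "L" or len(fields) < 5:
            continue
        src, src_o, dst, dst_o = fields[1:5]
        adj[(src, src_o)].add((dst, dst_o))
        # A link can also be walked on the opposite strand.
        adj[(dst, FLIP[dst_o])].add((src, FLIP[src_o]))
    return adj


def has_cycle(adj):
    """Iterative three-colour depth-first search for a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every vertex starts WHITE
    for start in list(adj):
        if color[start] != WHITE:
            continue
        color[start] = GREY
        stack = [(start, iter(adj.get(start, ())))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if color[nxt] == GREY:
                    return True  # back edge: nxt is on the current DFS path
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(adj.get(nxt, ()))))
                    break
            else:
                color[node] = BLACK  # all neighbours explored
                stack.pop()
    return False
```

The search is iterative rather than recursive so that long chains of nodes, which are common in pangenome graphs, do not hit Python's recursion limit.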
diff --git a/content/_publications/A draft human pangenome reference.md b/content/_publications/A draft human pangenome reference.md
@@ -0,0 +1,3 @@
+URL: https://www.nature.com/articles/s41586-023-05896-x
+
+Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
diff --git a/content/_publications/Building pangenome graphs.md b/content/_publications/Building pangenome graphs.md
@@ -0,0 +1,3 @@
+URL: https://www.biorxiv.org/content/10.1101/2023.04.05.535718v1
+
+Pangenome graphs can represent all variation between multiple genomes, but existing methods for constructing them are biased due to reference-guided approaches. In response, we have developed PanGenome Graph Builder (PGGB), a reference-free pipeline for constructing unbiased pangenome graphs. PGGB uses all-to-all whole-genome alignments and learned graph embeddings to build and iteratively refine a model in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.
diff --git a/...ent/_publications/Cactus - Algorithms for genome multiple sequence alignment.md b/...ent/_publications/Cactus - Algorithms for genome multiple sequence alignment.md
@@ -0,0 +1,3 @@
+DOI: https://doi.org/10.1101/gr.123356.111
+
+Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new “Cactus” alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.
diff --git a/content/_publications/Cactus Graphs for Genome Comparisons.md b/content/_publications/Cactus Graphs for Genome Comparisons.md
@@ -0,0 +1,8 @@
+DOI: 10.1089/cmb.2010.0252
+
+We introduce a data structure, analysis, and visualization scheme called a cactus graph for
+comparing sets of related genomes. In common with multi-break point graphs and A-Bruijn
+graphs, cactus graphs can represent duplications and general genomic rearrangements, but
+additionally, they naturally decompose the common substructures in a set of related genomes
+into a hierarchy of chains that can be visualized as two-dimensional multiple alignments and
+nets that can be visualized in circular genome plots.
diff --git a/content/_publications/Construction and representation of human pangenome graphs.md b/content/_publications/Construction and representation of human pangenome graphs.md
@@ -0,0 +1,3 @@
+URL: https://pasteur.hal.science/pasteur-04126278/
+
+As a single reference genome cannot possibly represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity within a wide range of genomic analyses. Several data structures have been proposed for representing collections of genomes as pangenomes, in particular graphs. In this work we collect all publicly available high-quality human haplotypes and constructed the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). We build variation graphs and de Bruijn graphs of this collection using five of the state-of-the-art tools: Bifrost, mdbg, Minigraph, Minigraph-Cactus and pggb. We examine differences in the way each of these tools represents variations between input sequences, both in terms of overall graph structure and representation of specific genetic loci. This work sheds light on key differences between pangenome graph representations, informing end-users on how to select the most appropriate graph type for their application.
diff --git a/content/_publications/Distance indexing and seed clustering in sequence graphs.md b/content/_publications/Distance indexing and seed clustering in sequence graphs.md
@@ -0,0 +1,12 @@
+URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355256/pdf/btaa446.pdf
+
+Motivation: Graph representations of genomes are capable of expressing more genetic variation and can therefore
+better represent a population than standard linear genomes. However, due to the greater complexity of genome
+graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in
+genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a
+graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed
+alignments could belong to the same mapping.
+Results: We have developed an algorithm for quickly calculating the minimum distance between positions on a
+sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index
+to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical
+to use for a new generation of mapping algorithms based upon genome graphs.
diff --git a/content/_publications/GBZ file format for pangenome graphs.md b/content/_publications/GBZ file format for pangenome graphs.md
@@ -0,0 +1,4 @@
+URL: https://pubmed.ncbi.nlm.nih.gov/36179091/
+
+**Motivation:** Pangenome graphs representing aligned genome assemblies are being shared in the text-based Graphical Fragment Assembly format. As the number of assemblies grows, there is a need for a file format that can store the highly repetitive data space efficiently.
+**Results:** We propose the GBZ file format based on data structures used in the Giraffe short-read aligner. The format provides good compression, and the files can be efficiently loaded into in-memory data structures. We provide compression and decompression tools and libraries for using GBZ graphs, and we show that they can be efficiently used on a variety of systems.
diff --git a/...ions/Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs.md b/...ions/Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs.md
@@ -0,0 +1,3 @@
+URL: https://www.liebertpub.com/doi/10.1089/cmb.2023.0186
+
+A pangenome graph can serve as a better reference for genomic studies because it allows a compact representation of multiple genomes within a species. Aligning sequences to a graph is critical for pangenome-based resequencing. The seed-chain-extend heuristic works by finding short exact matches between a sequence and a graph. In this heuristic, colinear chaining helps identify a good cluster of exact matches that can be combined to form an alignment. Colinear chaining algorithms have been extensively studied for aligning two sequences with various gap costs, including linear, concave, and convex cost functions. However, extending these algorithms for sequence-to-graph alignment presents significant challenges. Recently, Makinen et al. introduced a sparse dynamic programming framework that exploits the small path cover property of acyclic pangenome graphs, enabling efficient chaining. However, this framework does not consider gap costs, limiting its practical effectiveness. We address this limitation by developing novel problem formulations and provably good chaining algorithms that support a variety of gap cost functions. These functions are carefully designed to enable fast chaining algorithms whose time requirements are parameterized in terms of the size of the minimum path cover. Through an empirical evaluation, we demonstrate the superior performance of our algorithm compared with existing aligners. When mapping simulated long reads to a pangenome graph comprising 95 human haplotypes, we achieved 98.7% precision while leaving <2% of reads unmapped.
diff --git a/...nt/_publications/Movi - a fast and cache-efficient full-text pangenome index.md b/...nt/_publications/Movi - a fast and cache-efficient full-text pangenome index.md
@@ -0,0 +1,3 @@
+URL: https://www.biorxiv.org/content/10.1101/2023.11.04.565615v1.full
+
+Efficient pangenome indexes are promising tools for many applications, including rapid classification of nanopore sequencing reads. Recently, a compressed-index data structure called the “move structure” was proposed as an alternative to other BWT-based indexes like the FM index and r-index. The move structure uniquely achieves both O(r) space and O(1)-time queries, where r is the number of runs in the pangenome BWT. We implemented Movi, an efficient tool for building and querying move-structure pangenome indexes. While the size of the Movi’s index is larger than the r-index, it scales at a smaller rate for pangenome references, as its size is exactly proportional to r, the number of runs in the BWT of the reference. Movi can compute sophisticated matching queries needed for classification – such as pseudo-matching lengths – at least ten times faster than the fastest available methods. Movi achieves this speed by leveraging the move structure’s strong locality of reference, incurring close to the minimum possible number of cache misses for queries against large pangenomes. Movi’s fast constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made in a small and predictable time interval.
diff --git a/...ns/Pangenome graph construction from genome alignments with Minigraph-Cactus.md b/...ns/Pangenome graph construction from genome alignments with Minigraph-Cactus.md
@@ -0,0 +1,3 @@
+URL: https://www.nature.com/articles/s41587-023-01793-w
+
+Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph’s ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a _Drosophila melanogaster_ pangenome.