Adding discussion points to labs, small fixes to plots etc.
asabjorklund committed Feb 7, 2025
1 parent 6578e32 commit 6f26989
Showing 102 changed files with 4,811 additions and 2,228 deletions.
401 changes: 0 additions & 401 deletions compiled/labs/bioc/bioc_05_dge_scoreMarkers.qmd

This file was deleted.

4,082 changes: 2,871 additions & 1,211 deletions compiled/labs/scanpy/scanpy_08_spatial.ipynb

Large diffs are not rendered by default.

44 changes: 35 additions & 9 deletions docs/labs/bioc/bioc_01_qc.html
@@ -207,7 +207,7 @@ <h1 class="title"><i class="fa-solid fa-clipboard-list" aria-label="clipboard-li
<div>
<div class="quarto-title-meta-heading">Published</div>
<div class="quarto-title-meta-contents">
<p class="date">28-Jan-2025</p>
<p class="date">07-Feb-2025</p>
</div>
</div>

@@ -361,8 +361,8 @@ <h2 data-number="2" class="anchored" data-anchor-id="meta-qc_collate"><span clas
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="fu">gc</span>()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 11942136 637.8 17360325 927.2 NA 17177925 917.5
Vcells 47518126 362.6 119306430 910.3 65536 119037699 908.2</code></pre>
Ncells 11942215 637.8 17360540 927.2 NA 17177386 917.4
Vcells 47518322 362.6 119307269 910.3 65536 119037895 908.2</code></pre>
</div>
</div>
<p>Here is what the count matrix and the metadata look like for every cell.</p>
@@ -489,6 +489,19 @@ <h2 data-number="4" class="anchored" data-anchor-id="meta-qc_plotqc"><span class
</div>
</div>
</div>
<div class="callout callout-style-default callout-note callout-titled" title="Discuss">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Discuss
</div>
</div>
<div class="callout-body-container callout-body">
<p>Looking at the violin plots, what do you think are appropriate cutoffs for filtering these samples?</p>
</div>
</div>
<p>As you can see, quality differs considerably between these samples; for instance, the covid_15 and covid_16 samples have cells with fewer detected genes and more mitochondrial content. Because the ribosomal protein genes are highly expressed, they make up a larger proportion of the transcriptional landscape when fewer of the lowly expressed genes are detected. We can also plot the different QC measures as scatter plots.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="fu">plotColData</span>(sce, <span class="at">x =</span> <span class="st">"total"</span>, <span class="at">y =</span> <span class="st">"detected"</span>, <span class="at">colour_by =</span> <span class="st">"sample"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -638,7 +651,7 @@ <h2 data-number="6" class="anchored" data-anchor-id="meta-qc_sex"><span class="h
<p>When working with human or animal samples, you should ideally constrain your experiments to a single sex to avoid including sex bias in the conclusions. However, this may not always be possible. By looking at reads from chromosome Y (males) and at the expression of XIST (X-inactive specific transcript; mainly female), it is quite easy to determine the sex of each sample. This is also a good way to detect mislabelling, in which case the sex in the sample metadata does not agree with the computational prediction.</p>
<p>To get chromosome information for all genes, you should ideally parse it from the GTF file that you used in the mapping pipeline, as it has exactly the same annotation version and gene naming. However, it may not always be available, as in this case where we have downloaded public data. The R package biomaRt can be used to fetch annotation information, and the code to run it is provided. As the BioMart instances are quite often unresponsive, we will download and use a file that was created in advance.</p>
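The idea of combining chromosome Y reads with XIST expression can be illustrated with a small base R sketch. This is not the lab's code: the count matrix, the `chrom` annotation vector, the `sample_id` labels and the very simple calling rule are all invented for illustration, and real data would need thresholds chosen from the plots.

```r
# Toy count matrix: rows are genes, columns are cells. All values are invented.
counts <- matrix(
  c(5, 0, 3, 0,   # geneA (chr 1)
    2, 4, 1, 6,   # geneB (chr 2)
    0, 7, 0, 9,   # XIST  (chr X)
    3, 0, 4, 0),  # geneY (chr Y)
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "XIST", "geneY"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# Hypothetical annotation: chromosome of each gene, named by gene.
chrom <- c(geneA = "1", geneB = "2", XIST = "X", geneY = "Y")

# Hypothetical sample label for each cell, in column order.
sample_id <- c("s1", "s2", "s1", "s2")

# Percentage of each cell's counts that come from chromosome Y genes.
chrY_genes <- names(chrom)[chrom == "Y"]
pct_chrY <- colSums(counts[chrY_genes, , drop = FALSE]) / colSums(counts) * 100

# Average both signals per sample.
per_sample <- data.frame(
  pct_chrY = sapply(split(pct_chrY, sample_id), mean),
  xist     = sapply(split(counts["XIST", ], sample_id), mean)
)

# Deliberately naive rule for the toy data: chrY signal without XIST suggests
# male, XIST without chrY signal suggests female, anything else is unclear.
per_sample$predicted_sex <- ifelse(
  per_sample$pct_chrY > 0 & per_sample$xist == 0, "male",
  ifelse(per_sample$pct_chrY == 0 & per_sample$xist > 0, "female", "unclear")
)
per_sample
```

Comparing `predicted_sex` with the sex recorded in the sample metadata is then a quick check for mislabelled samples.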
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center" data-bs-toggle="collapse" data-bs-target=".callout-3-contents" aria-controls="callout-3" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-header d-flex align-content-center" data-bs-toggle="collapse" data-bs-target=".callout-4-contents" aria-controls="callout-4" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
@@ -647,7 +660,7 @@ <h2 data-number="6" class="anchored" data-anchor-id="meta-qc_sex"><span class="h
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-3" class="callout-3-contents callout-collapse collapse">
<div id="callout-4" class="callout-4-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>Here is the code to download annotation data from Ensembl using biomaRt. We will not run this now and instead use a pre-computed file in the step below.</p>
<div class="cell">
@@ -778,7 +791,7 @@ <h2 data-number="7" class="anchored" data-anchor-id="meta-qc_cellcycle"><span cl
<div class="cell-output cell-output-stdout">
<pre><code>
G1 G2M S
4331 833 1747 </code></pre>
4330 838 1743 </code></pre>
</div>
</div>
<p>We can now create a violin plot for the cell cycle scores as well.</p>
@@ -864,7 +877,7 @@ <h2 data-number="8" class="anchored" data-anchor-id="meta-qc_doublet"><span clas
<div class="sourceCode cell-code" id="cb50"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb50-1"><a href="#cb50-1" aria-hidden="true" tabindex="-1"></a>sce.filt <span class="ot">&lt;-</span> sce.filt[, sce.filt<span class="sc">$</span>scDblFinder.class <span class="sc">==</span> <span class="st">"singlet"</span>]</span>
<span id="cb50-2"><a href="#cb50-2" aria-hidden="true" tabindex="-1"></a><span class="fu">dim</span>(sce.filt)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 18854 6830</code></pre>
<pre><code>[1] 18854 6711</code></pre>
</div>
</div>
<p>To summarize, let's check how many cells we have removed per sample; we started with 1500 cells per sample. Looking back at the initial QC plots, does it make sense that some samples have far fewer cells now?</p>
@@ -879,7 +892,20 @@ <h2 data-number="8" class="anchored" data-anchor-id="meta-qc_doublet"><span clas
<div class="cell-output cell-output-stdout">
<pre><code>
cov.1 cov.15 cov.16 cov.17 ctrl.13 ctrl.14 ctrl.19 ctrl.5
825 529 352 1047 1102 933 1072 970 </code></pre>
808 514 347 1025 1094 906 1075 942 </code></pre>
</div>
</div>
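To make the per-sample loss concrete, here is a small base R sketch that turns the updated counts reported above (the row starting with 808) into the number and percentage of cells removed, using the stated starting point of 1500 cells per sample. The variable names are illustrative and not part of the lab code.

```r
# Cells remaining per sample, copied from the updated table above.
remaining <- c(cov.1 = 808, cov.15 = 514, cov.16 = 347, cov.17 = 1025,
               ctrl.13 = 1094, ctrl.14 = 906, ctrl.19 = 1075, ctrl.5 = 942)

# Cells per sample before any filtering, as stated in the text.
start <- 1500

removed <- start - remaining
pct_removed <- round(removed / start * 100, 1)

# Sort so that the samples that lost the most cells come first.
summary_tab <- data.frame(remaining, removed, pct_removed)
summary_tab[order(-summary_tab$pct_removed), ]

# Sanity check: the per-sample counts should add up to the number of
# columns reported by dim(sce.filt) after doublet removal.
sum(remaining)
```

The total agrees with the 6711 cells shown in the updated `dim(sce.filt)` output.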
<div class="callout callout-style-default callout-note callout-titled" title="Discuss">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Discuss
</div>
</div>
<div class="callout-body-container callout-body">
<p>In this case we ran doublet detection with all samples together since we have very small subsampled datasets. In a real scenario, it should be run one sample at a time. Why do you think this is important?</p>
</div>
</div>
</section>
@@ -1258,7 +1284,7 @@ <h2 data-number="10" class="anchored" data-anchor-id="meta-session"><span class=
</div>
</div>
</footer>
<script>var lightboxQuarto = GLightbox({"openEffect":"zoom","selector":".lightbox","loop":true,"closeEffect":"zoom","descPosition":"bottom"});</script>
<script>var lightboxQuarto = GLightbox({"descPosition":"bottom","loop":true,"closeEffect":"zoom","selector":".lightbox","openEffect":"zoom"});</script>



Binary file modified docs/labs/bioc/bioc_01_qc_files/figure-html/cc-plot-1.png
Binary file modified docs/labs/bioc/bioc_01_qc_files/figure-html/doublet-plot-1.png
Binary file modified docs/labs/bioc/bioc_01_qc_files/figure-html/doublet-vln-1.png
18 changes: 15 additions & 3 deletions docs/labs/bioc/bioc_02_dimred.html
@@ -207,7 +207,7 @@ <h1 class="title"><i class="fa-brands fa-hubspot" aria-label="hubspot"></i> Dime
<div>
<div class="quarto-title-meta-heading">Published</div>
<div class="quarto-title-meta-contents">
<p class="date">28-Jan-2025</p>
<p class="date">07-Feb-2025</p>
</div>
</div>

@@ -412,7 +412,19 @@ <h2 data-number="4" class="anchored" data-anchor-id="meta-dimred_pca"><span clas
</div>
</div>
</div>
<p>Clearly, the sample id contributes to many of the PCs and PC7 but you can also see that many PCs are affected by different QC parameters.</p>
<div class="callout callout-style-default callout-note callout-titled" title="Discuss">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Discuss
</div>
</div>
<div class="callout-body-container callout-body">
<p>Have a look at the plot from <code>plotExplanatoryPCs</code> and the gene loadings plots. Do you think the top components are biologically relevant or more driven by technical noise?</p>
</div>
</div>
<p>We can also plot the amount of variance explained by each PC.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(<span class="fu">attr</span>(<span class="fu">reducedDim</span>(sce, <span class="st">"PCA"</span>), <span class="st">"percentVar"</span>)[<span class="dv">1</span><span class="sc">:</span><span class="dv">50</span>] <span class="sc">*</span> <span class="dv">100</span>, <span class="at">type =</span> <span class="st">"l"</span>, <span class="at">ylab =</span> <span class="st">"% variance"</span>, <span class="at">xlab =</span> <span class="st">"Principal component #"</span>)</span>
@@ -966,7 +978,7 @@ <h2 data-number="10" class="anchored" data-anchor-id="meta-session"><span class=
</div>
</div>
</footer>
<script>var lightboxQuarto = GLightbox({"loop":true,"selector":".lightbox","closeEffect":"zoom","descPosition":"bottom","openEffect":"zoom"});</script>
<script>var lightboxQuarto = GLightbox({"loop":true,"closeEffect":"zoom","selector":".lightbox","descPosition":"bottom","openEffect":"zoom"});</script>



Binary file modified docs/labs/bioc/bioc_02_dimred_files/figure-html/pca-elbow-1.png
Binary file modified docs/labs/bioc/bioc_02_dimred_files/figure-html/pca-plot-1.png
Binary file modified docs/labs/bioc/bioc_02_dimred_files/figure-html/plot-hvg-1.png
Binary file modified docs/labs/bioc/bioc_02_dimred_files/figure-html/plot-qc-1.png
Binary file modified docs/labs/bioc/bioc_02_dimred_files/figure-html/plot-tsne-1.png
Binary file modified docs/labs/bioc/bioc_02_dimred_files/figure-html/plot-umap-1.png
75 changes: 44 additions & 31 deletions docs/labs/bioc/bioc_03_integration.html
@@ -207,7 +207,7 @@ <h1 class="title"><i class="fa-solid fa-diagram-project" aria-label="diagram-pro
<div>
<div class="quarto-title-meta-heading">Published</div>
<div class="quarto-title-meta-contents">
<p class="date">28-Jan-2025</p>
<p class="date">07-Feb-2025</p>
</div>
</div>

@@ -361,6 +361,19 @@ <h2 data-number="1" class="anchored" data-anchor-id="meta-int_prep"><span class=
</div>
<p>As you can see, there are a lot of genes that are variable in just one dataset. There are also some genes in the gene set that was selected using all the data without blocking samples, that are not variable in any of the individual datasets. These are most likely genes driven by batch effects.</p>
<p>The best way to select features for integration is to combine the information on variable genes across the datasets. This is what we have in the <code>all</code> section, where the information on variable features in the different datasets is combined.</p>
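One simple way to picture combining variable gene information across datasets is to count in how many datasets each gene was selected. The base R sketch below does this on invented gene sets; it is only an illustration of the idea and not necessarily how the lab's own code combines the per-dataset statistics. The list `hvgs_per_dataset`, the gene names and the threshold of two datasets are all assumptions.

```r
# Hypothetical variable gene sets selected separately in three datasets.
hvgs_per_dataset <- list(
  d1 = c("G1", "G2", "G3", "G4"),
  d2 = c("G2", "G3", "G5"),
  d3 = c("G2", "G4", "G6")
)

# Count in how many datasets each gene was selected as variable.
n_datasets <- table(unlist(hvgs_per_dataset))

# Keep genes that are variable in at least two datasets
# (an arbitrary threshold chosen for this toy example).
shared_hvgs <- names(n_datasets)[n_datasets >= 2]
shared_hvgs
```

Genes that are variable in only one dataset, such as `G1` here, are dropped, which is the behaviour one wants when a gene's variability is specific to a single sample.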
<div class="callout callout-style-default callout-note callout-titled" title="Discuss">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Discuss
</div>
</div>
<div class="callout-body-container callout-body">
<p>Did you understand the difference between running variable gene selection per dataset and combining the results, versus running it on all samples together? Can you think of a situation where it would be best to run it on all samples, and a situation where it should be done by batch?</p>
</div>
</div>
<p>For all downstream integration we will use this set of genes so that the results are comparable across the methods. We already used this set of genes in the dimensionality reduction exercise to run scaling and PCA.</p>
<p>We also store the variable gene information in the object for use further down the line.</p>
<div class="cell">
@@ -531,28 +544,28 @@ <h2 data-number="4" class="anchored" data-anchor-id="meta-dimred_scanorama"><spa
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a><span class="fu">lapply</span>(scelist, dim)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[[1]]
[1] 825 2000
[1] 808 2000

[[2]]
[1] 529 2000
[1] 514 2000

[[3]]
[1] 352 2000
[1] 347 2000

[[4]]
[1] 1047 2000
[1] 1025 2000

[[5]]
[1] 970 2000
[1] 942 2000

[[6]]
[1] 1102 2000
[1] 1094 2000

[[7]]
[1] 933 2000
[1] 906 2000

[[8]]
[1] 1072 2000</code></pre>
[1] 1075 2000</code></pre>
</div>
</div>
<p>Scanorama is implemented in Python, but through reticulate we can load Python packages and run Python functions. In this case we also use the <code>basilisk</code> package for a cleaner activation of the Python environment.</p>
@@ -567,20 +580,20 @@ <h2 data-number="4" class="anchored" data-anchor-id="meta-dimred_scanorama"><spa
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a>}, <span class="at">datas =</span> scelist, <span class="at">genes =</span> genelist, <span class="at">testload=</span><span class="st">"scanorama"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Found 2000 genes among all datasets
[[0. 0.60113422 0.40625 0.30563515 0.50206186 0.32727273
0.31030303 0.1613806 ]
[0. 0. 0.76748582 0.30245747 0.40515464 0.18903592
0.16068053 0.10775047]
[0. 0. 0. 0.23579545 0.45170455 0.43465909
0.29829545 0.26988636]
[0. 0. 0. 0. 0.25051546 0.06017192
0.11747851 0.20708955]
[0. 0. 0. 0. 0. 0.85154639
0.65979381 0.19962687]
[[0. 0.56420233 0.40057637 0.33365854 0.46287129 0.34777228
0.29331683 0.13861386]
[0. 0. 0.74319066 0.26653696 0.36836518 0.20622568
0.1848249 0.12062257]
[0. 0. 0. 0.26224784 0.44092219 0.4870317
0.28818444 0.23054755]
[0. 0. 0. 0. 0.23673036 0.05268293
0.11414634 0.20930233]
[0. 0. 0. 0. 0. 0.85881104
0.61252654 0.21302326]
[0. 0. 0. 0. 0. 0.
0.78221416 0.47574627]
0.79981718 0.44930233]
[0. 0. 0. 0. 0. 0.
0. 0.68283582]
0. 0.6744186 ]
[0. 0. 0. 0. 0. 0.
0. 0. ]]
Processing datasets (4, 5)
@@ -589,27 +602,27 @@ <h2 data-number="4" class="anchored" data-anchor-id="meta-dimred_scanorama"><spa
Processing datasets (6, 7)
Processing datasets (4, 6)
Processing datasets (0, 1)
Processing datasets (2, 5)
Processing datasets (0, 4)
Processing datasets (5, 7)
Processing datasets (2, 4)
Processing datasets (2, 5)
Processing datasets (0, 2)
Processing datasets (1, 4)
Processing datasets (0, 5)
Processing datasets (0, 6)
Processing datasets (0, 3)
Processing datasets (1, 3)
Processing datasets (0, 6)
Processing datasets (2, 6)
Processing datasets (2, 7)
Processing datasets (3, 4)
Processing datasets (1, 3)
Processing datasets (2, 3)
Processing datasets (3, 7)
Processing datasets (3, 4)
Processing datasets (2, 7)
Processing datasets (4, 7)
Processing datasets (3, 7)
Processing datasets (1, 5)
Processing datasets (0, 7)
Processing datasets (1, 6)
Processing datasets (3, 6)
Processing datasets (1, 7)</code></pre>
Processing datasets (0, 7)
Processing datasets (1, 7)
Processing datasets (3, 6)</code></pre>
</div>
<div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a>intdimred <span class="ot">&lt;-</span> <span class="fu">do.call</span>(rbind, integrated.data[[<span class="dv">1</span>]])</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(intdimred) <span class="ot">&lt;-</span> <span class="fu">paste0</span>(<span class="st">"PC_"</span>, <span class="dv">1</span><span class="sc">:</span><span class="dv">100</span>)</span>
@@ -1005,7 +1018,7 @@ <h2 data-number="6" class="anchored" data-anchor-id="meta-session"><span class="
</div>
</div>
</footer>
<script>var lightboxQuarto = GLightbox({"descPosition":"bottom","closeEffect":"zoom","loop":true,"openEffect":"zoom","selector":".lightbox"});</script>
<script>var lightboxQuarto = GLightbox({"closeEffect":"zoom","selector":".lightbox","descPosition":"bottom","loop":true,"openEffect":"zoom"});</script>


