
CRISPR gRNA Target Design

Tessa Alexanian edited this page Jun 4, 2015 · 7 revisions

Much of the excitement around CRISPR systems has centred on their potential for genome editing. However, the use of CRISPR immune systems in eukaryotes is also being explored. A recent study by Seeger & Sohn (2014) targeted Hepatitis B virus (HBV) in a human cell line. They found that most gRNA targets reduced the number of HBV-positive cells five-fold or more, but the mechanism of immunity was not entirely clear.

For most gRNA designs, the Cas9 breaks induced short insertions and deletions (indels), likely caused by error-prone non-homologous end joining (NHEJ) of the cleaved ends, and the number of indels correlated with the reduction in virulence. One design, however, left 55% of cells wild-type while achieving similar reductions. The dynamics of repair and deletion after double-stranded cleavage by Cas9 are a major focus of the mathematical modelling this year.

However, we didn't have time to finish those models before ordering our gRNA constructs, so we designed targets for several possible scenarios.

I. Targeting P1, P3, P5, P6

We chose to target these genes because they have well-established functions related to the dynamics of CaMV inside the plant cell. P2 is not targeted because it is involved in interactions with the aphid vector, and P4 has a number of isomorphisms that make it somewhat impractical to target. You can see more about the functions of each of these genes on the Cauliflower Mosaic Virus wiki page.

II. Multiple targets on P6

P6 trans-activates the 35S mRNA and is responsible for many of the viral defences against the plant cell, making it the obvious choice if only a single gene is targeted. By placing a number of targets on P6, we hope to disrupt its activity more quickly and to induce large deletions when multiple cuts occur at once.

III. Target non-coding locations around promoter

This design lets us examine whether large deletions can be created by flanking a region with Cas9 sites, and whether double-stranded breaks alone are destructive. If CRISPR immunity is mainly conferred by frameshift mutations or small deletions caused by NHEJ, we wouldn't expect this design to offer much reduction in virulence. However, if CRISPR immunity is mainly conferred by degradation of cleaved DNA or by large deletions between neighbouring cut sites, we would still expect a significant reduction in virulence. If we see a significant reduction in virulence with this design, we'll have to examine the DNA structure to see whether there are large deletions.
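To estimate how much sequence a pair of flanking cuts could remove, it helps to know where each cut falls. The sketch below is a minimal illustration, assuming SpCas9's blunt cut about 3 bp upstream (5') of an NGG PAM and a 20-nt protospacer; the function names are hypothetical, coordinates are 0-based on the + strand, and it ignores wrap-around on the circular CaMV genome.

```python
def cut_site(pam_start: int, strand: str) -> int:
    """Return the 0-based boundary index where SpCas9 is expected to cut.

    '+' strand: the PAM (NGG) occupies [pam_start, pam_start + 3) and the
    cut falls 3 bp 5' of it, i.e. before index pam_start - 3.
    '-' strand: pam_start is the index of the 'CCN' seen on the + strand;
    the protospacer starts at pam_start + 3 and the cut is 3 bp into it.
    """
    if strand == "+":
        return pam_start - 3
    if strand == "-":
        return pam_start + 6
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def excised_fragment(genome: str, cut_a: int, cut_b: int) -> str:
    """Sequence lost if the two outer ends are rejoined with no extra indels."""
    left, right = sorted((cut_a, cut_b))
    return genome[left:right]


# Toy usage: two cuts flanking a 4-bp region.
print(excised_fragment("AAAACCCCGGGG", 8, 4))  # CCCC
```

Real NHEJ repair will often add or remove a few extra bases at the junction, so the excised length is a best-case estimate rather than a prediction.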

IV. Reduce mRNA transcription using dCas9

Perhaps NHEJ is not error-prone enough to eliminate the viruses at the pace we need, or the non-destructive mutations introduced by NHEJ will change the DNA sequence enough to prevent Cas9 from recognizing the site for a second attempt. In these cases, suppressing the 35S or 19S promoter with dCas9 might do more to prevent virulence than cleavage with Cas9.

gRNA Selection Criteria

To select the gRNA targets, we ran the CaMV genomic sequence through the Benchling CRISPR design tool, which identified 682 possible targets with NGG PAM sites in the CaMV genome. To narrow down the list, we considered four factors: genome position, conservation, off-targeting and efficiency.
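Benchling did the actual target search; the sketch below is only a stand-in that shows the idea of enumerating 20-nt protospacers followed by an NGG PAM on both strands. The function names are hypothetical, it does not compute Benchling's scores, it ignores wrap-around on the circular genome, and its counts need not match Benchling's 682.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_ngg_targets(genome: str, guide_len: int = 20) -> list[tuple[int, str, str]]:
    """List (pam_start, strand, protospacer) for every NGG PAM with room for a guide.

    '+' hits: PAM = genome[i:i+3] matches NGG, protospacer = genome[i-guide_len:i].
    '-' hits: 'CCN' at genome[j:j+3] is an NGG on the - strand, and the
    protospacer is the reverse complement of genome[j+3:j+3+guide_len].
    """
    genome = genome.upper()
    hits = []
    # Zero-width lookaheads so overlapping PAMs (e.g. 'AGGG') are all found.
    for m in re.finditer(r"(?=[ACGT]GG)", genome):
        i = m.start()
        if i >= guide_len:
            hits.append((i, "+", genome[i - guide_len:i]))
    for m in re.finditer(r"(?=CC[ACGT])", genome):
        j = m.start()
        if j + 3 + guide_len <= len(genome):
            hits.append((j, "-", reverse_complement(genome[j + 3:j + 3 + guide_len])))
    return hits


print(find_ngg_targets("ACGTACGTACGTACGTACGT" + "TGG"))
```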

Genome Position

The goal with many of the designs is to target a specific gene or genes and induce frameshift mutations. Previous research (Doench et al. 2014) suggests that a frameshift within the first 50% of a gene should be deleterious enough to prevent it from producing a functional protein. For Design I we used a stricter 40% cutoff after looking at the functional regions of the CaMV transcripts. The targets for Designs II-IV were filtered by location (the correct protein for II and IV, non-coding regions for III) but did not all have to be near the beginning of the gene.
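The 40% cutoff for Design I amounts to asking how far into the coding sequence each cut falls. A minimal sketch, assuming 0-based, half-open gene coordinates on the + strand of the genome and a cut position already computed for each target; the function names and the example coordinates are hypothetical.

```python
def relative_position(cut: int, gene_start: int, gene_end: int, gene_strand: str = "+") -> float:
    """Fraction of the way through the coding sequence at which the cut falls.

    For a gene on the - strand the coding sequence runs from gene_end back
    towards gene_start, so the fraction is measured from gene_end instead.
    """
    if gene_end <= gene_start:
        raise ValueError("gene_end must be greater than gene_start")
    if not gene_start <= cut <= gene_end:
        raise ValueError("cut site lies outside the gene")
    if gene_strand == "+":
        offset = cut - gene_start
    elif gene_strand == "-":
        offset = gene_end - cut
    else:
        raise ValueError(f"gene_strand must be '+' or '-', got {gene_strand!r}")
    return offset / (gene_end - gene_start)


def in_early_region(cut: int, gene_start: int, gene_end: int,
                    gene_strand: str = "+", cutoff: float = 0.4) -> bool:
    """True if the cut lies within the first `cutoff` fraction of the gene."""
    return relative_position(cut, gene_start, gene_end, gene_strand) <= cutoff


print(in_early_region(130, 100, 200))  # True: 30% of the way into the gene
```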

Conservation

We want to target areas of the genome that are functionally important and assumed this would correspond well to areas that are conserved (i.e. unchanging) between CaMV and closely related viruses. Targeting conserved areas should also make our gRNA targets more robust, since those areas are unlikely to contain many mutations or isomorphisms among different strains of CaMV.

Off-Targeting

Since we want to provide extra protection for Arabidopsis plants against invading viruses, a major design priority was to avoid cutting apart the Arabidopsis genome by accident. The scoring algorithm

Many regions of the Arabidopsis genome are non-coding, so we also ran BLAST searches on all the off-target matches identified by Benchling. For example, we considered an off-target match in a T-DNA insertion site or a putative protein to be less important than one in a known mRNA or protein.
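The triage of off-target hits can be pictured as ranking each guide by its most worrying hit. This is a minimal sketch, assuming every hit has already been labelled with a coarse annotation category from the BLAST results; the category names and their ordering are hypothetical, chosen only to mirror the example above.

```python
# Hypothetical priority order for off-target hits (lower = less worrying).
ANNOTATION_PRIORITY = {
    "non-coding": 0,
    "T-DNA insertion site": 1,
    "putative protein": 2,
    "known mRNA": 3,
    "known protein": 3,
}


def worst_off_target(annotations: list[str]) -> int:
    """Return the most concerning priority among a guide's off-target hits.

    Unrecognised annotations are treated as the most concerning category so
    that nothing slips through unexamined; a guide with no hits scores 0.
    """
    worst_known = max(ANNOTATION_PRIORITY.values())
    return max((ANNOTATION_PRIORITY.get(a, worst_known) for a in annotations), default=0)


def rank_guides(off_targets: dict[str, list[str]]) -> list[str]:
    """Order guide IDs from least to most concerning off-target profile."""
    return sorted(off_targets, key=lambda guide: worst_off_target(off_targets[guide]))


print(rank_guides({"g1": ["known mRNA"], "g2": ["T-DNA insertion site"], "g3": []}))
```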

Efficiency

This section still needs to be written after reading Doench et al. (2014). We created four designs after learning about NHEJ.

Non-overlapping

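Non-overlapping targets matter for both Cas9 (a mutation in the shared region would disable both overlapping targets) and dCas9 (overlapping guides could interfere with each other).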
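One simple way to enforce this is a greedy pass over a ranked list of targets. A minimal sketch, assuming targets are supplied best-first (e.g. by score) and that each footprint is a 0-based, half-open interval covering the protospacer plus PAM; the function name is hypothetical.

```python
def select_non_overlapping(targets: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Greedily keep targets whose footprints do not overlap.

    Each target is (name, start, end). Because targets are visited in order
    of preference, a target is kept only if it overlaps none already kept.
    """
    kept: list[tuple[str, int, int]] = []
    for name, start, end in targets:
        # Two half-open intervals are disjoint if one ends before the other starts.
        if all(end <= s or start >= e for _, s, e in kept):
            kept.append((name, start, end))
    return kept


print(select_non_overlapping([("a", 0, 23), ("b", 10, 33), ("c", 23, 46)]))
```

A greedy pass keeps the best-ranked target in each cluster of overlapping candidates, which matches the intent of preferring high-scoring guides, though it does not maximise the total number of targets kept.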

Protocol

Still to include: links to the data alongside the steps followed, for reproducibility.

Protocol followed to generate gRNA sequences

  1. Downloaded all genome sequences in the Caulimovirus genus (which includes CaMV) from NCBI. These sequences are in the file caulimovirus_sequence.fasta.
  2. Uploaded that FASTA file to Guidance v2 with default parameters to generate a multiple sequence alignment.
  3. Masked the multiple sequence alignment, keeping only base pairs with an identity of at least 0.93 (93%) across all Caulimovirus sequences. We also tested alignments across the entire Caulimoviridae family, but the sequence homology was so low that there were very few continuous conserved regions.
  4. Uploaded the masked sequences for CaMV to Benchling, which removed all gaps: read-only link.
  5. Ran Benchling CRISPR design, using parameters:
    • entire masked sequence set from selection
    • A. thaliana genome
    • Wild-Type Cas9 NGG PAM
    • 20 bp guide length
    • Genome region None
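The masking in step 3 (and the gap removal in step 4) can be sketched as a column-by-column pass over the alignment. This is a minimal illustration only: Guidance v2 and Benchling did the real work, and the sketch assumes "identity" means the fraction of aligned sequences that carry the same base as CaMV in a column, with gaps counted as mismatches. The function name is hypothetical.

```python
def mask_reference(alignment: dict[str, str], reference: str, threshold: float = 0.93) -> str:
    """Mask poorly conserved columns of the reference with 'N' and drop gaps.

    `alignment` maps sequence IDs to equal-length aligned sequences ('-' = gap).
    A column is kept if at least `threshold` of all sequences share the
    reference base there; otherwise the reference base becomes 'N'.
    """
    seqs = [s.upper() for s in alignment.values()]
    ref = alignment[reference].upper()
    if any(len(s) != len(ref) for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for col, ref_base in enumerate(ref):
        if ref_base == "-":
            continue  # gap in the reference: nothing to keep for CaMV
        identity = sum(s[col] == ref_base for s in seqs) / len(seqs)
        out.append(ref_base if identity >= threshold else "N")
    return "".join(out)


toy = {"CaMV": "ACG-T", "v2": "ACGAT", "v3": "AGG-A"}
print(mask_reference(toy, "CaMV"))  # ANGN
```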

At this point there were 275 possible targets. Looking at the functions of the various genes in the CaMV genome, we decided that we were only interested in targets within gp2, gp4, gp6 or gp7. We excluded gp1 because no function has been identified for it, gp3 because it is related to interactions with aphid vectors, and gp5 because it has a number of isomorphisms that make it difficult to target reliably.

After removing PAMs that were partially or wholly masked in the conservation sequence (i.e. that contained 'N' values) and keeping only PAMs within the genes of interest, we were left with 196 possible gRNA targets.
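Both filters are simple predicates on each candidate. A minimal sketch, assuming each Benchling target has been read into a record with a gene annotation and its actual bases; the `Target` fields are hypothetical, and treating a target as masked if its protospacer or its PAM contains any 'N' is our assumption.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    gene: str         # gene annotation the target falls in, e.g. "gp6"; "" if none
    protospacer: str  # 20 bases taken from the masked sequence
    pam: str          # the actual three PAM bases from the masked sequence


GENES_OF_INTEREST = {"gp2", "gp4", "gp6", "gp7"}


def keep_target(t: Target) -> bool:
    """True if the target contains no masked ('N') bases and lies in a gene of interest."""
    unmasked = "N" not in (t.protospacer + t.pam).upper()
    return unmasked and t.gene in GENES_OF_INTEREST


candidates = [
    Target("t1", "gp6", "ACGTACGTACGTACGTACGT", "AGG"),
    Target("t2", "gp3", "ACGTACGTACGTACGTACGT", "TGG"),
]
print([t.name for t in candidates if keep_target(t)])  # ['t1']
```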

  6. Kept targets with an efficiency score > 0.6 (calculated by Benchling based on Doench et al.) and a specificity score > 0.98 (calculated by Benchling based on Hsu et al.).
  7. Exported the targets as primers and sanity-checked them against their positions in the CaMV genome.
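The last two steps can also be sketched in code: a threshold filter on the two scores, and a check that each exported guide really sits at its reported position next to an NGG PAM. This is a minimal sketch, assuming the scores are on the same scale as the thresholds above, and that `start` is the 0-based index of the protospacer's leftmost base on the + strand; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def passes_scores(efficiency: float, specificity: float) -> bool:
    """Thresholds used above: efficiency > 0.6 and specificity > 0.98."""
    return efficiency > 0.6 and specificity > 0.98


def check_position(genome: str, protospacer: str, start: int, strand: str) -> bool:
    """Confirm that a guide sits at `start` with an NGG PAM on its own strand.

    '+': genome[start:start+n] is the protospacer and is followed by NGG.
    '-': genome[start:start+n] is its reverse complement, preceded by CCN.
    """
    genome, protospacer = genome.upper(), protospacer.upper()
    n = len(protospacer)
    if start < 0:
        return False
    if strand == "+":
        return (genome[start:start + n] == protospacer
                and genome[start + n + 1:start + n + 3] == "GG")
    if strand == "-":
        return (start >= 3
                and genome[start:start + n] == reverse_complement(protospacer)
                and genome[start - 3:start - 1] == "CC")
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


genome = "TTT" + "ACGTACGTACGTACGTACGA" + "AGG" + "TTT"
print(check_position(genome, "ACGTACGTACGTACGTACGA", 3, "+"))  # True
```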

References

  1. J.G. Doench, E. Hartenian, D.B. Graham, Z. Tothova, M. Hegde, I. Smith et al. (2014). Rational design of highly active gRNAs for CRISPR-Cas9–mediated gene inactivation. Nature Biotechnology, 32, 1262–1267. doi:10.1038/nbt.3026
  2. C. Seeger and J.A. Sohn (2014). Targeting Hepatitis B Virus With CRISPR/Cas9. Molecular Therapy - Nucleic Acids, 3, e216. doi:10.1038/mtna.2014.68
  3. P.D. Hsu, D.A. Scott, J.A. Weinstein, F.A. Ran, S. Konermann, V. Agarwala et al. (2013). DNA targeting specificity of RNA-guided Cas9 nucleases. Nature Biotechnology, 31, 827–832. doi:10.1038/nbt.2647