CRISPR gRNA Target Design

Much of the excitement around CRISPR systems has centred on their potential for genome editing. However, the use of CRISPR immune systems in eukaryotes is also being explored. A recent study by Seeger & Sohn (2014) targeted Hepatitis B virus (HBV) in a human cell line. They found that most gRNA targets reduced the number of HBV-positive cells five-fold or more, but the mechanism of immunity was not entirely clear.

For most gRNA designs, the Cas9 breaks induced short insertions and deletions (indels), likely caused by error-prone Non-Homologous End Joining (NHEJ) of the cleaved ends, and the number of indels correlated with the reduction in virulence. One design, however, left 55% of cells wild-type while achieving similar reductions. The dynamics of repair and deletion after double-stranded cleavage by Cas9 are a major focus of the mathematical modelling this year.

However, we didn't have time to finish those models before ordering our gRNA constructs, so we decided to design targets for several possible situations.

I. Targeting P1, P3, P5, P6

We chose to target these genes because they have well-established functions related to the dynamics of CaMV inside the plant cell. P2 is not targeted because it is related to interactions with the aphid vector and P4 has a number of isomorphisms that make it somewhat impractical for targeting. You can see more about the functions of each of these genes on the Cauliflower Mosaic Virus wiki page.

II. Multiple targets on P6

P6 trans-activates the 35S mRNA and is responsible for many of the viral defences against the plant cell. This makes it the obvious choice if only a single gene is targeted. By placing a number of targets on P6, we hope to disrupt its activity more quickly and to induce large deletions when multiple cuts occur at once.

III. Target non-coding locations around promoter

This design lets us examine two questions: whether large deletions can be created by flanking a region with Cas9 sites, and whether double-stranded breaks alone are destructive. If CRISPR immunity is mainly conferred by frameshift mutations or small deletions caused by NHEJ, we wouldn't expect this design to offer much reduction in virulence. However, if CRISPR immunity is mainly conferred by degradation of cleaved DNA or by large deletions between neighbouring cut sites, we would still expect a significant reduction in virulence. If we do see a significant reduction in virulence with this design, we will have to examine the DNA structure to check for large deletions.

IV. Reduce mRNA transcription using dCas9

NHEJ may be insufficiently error-prone to eliminate the viruses at the pace we need. Alternatively, non-destructive mutations introduced by NHEJ may change the DNA sequence enough to prevent Cas9 from recognizing the site for a second attempt. In either case, suppressing the 35S or 19S promoter using dCas9 might do more to prevent virulence than cleavage using Cas9.

gRNA Selection Criteria

To select the gRNA targets, we ran the CaMV genomic sequence through the Benchling CRISPR design tool, which identified 682 possible targets with NGG PAM sites in the CaMV genome. To narrow down the list, we considered five factors: genome position, conservation, off-targeting, efficiency and overlap between targets.
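The design tool's first step can be illustrated with a short sketch. It enumerates every 20-nt protospacer that sits immediately before an NGG PAM, on both strands. This is not Benchling's implementation, and the function names are hypothetical. The sketch also treats the genome as linear, so any site spanning the origin of the circular CaMV genome would be missed.

```python
import re

# Complement table; 'N' (masked/unknown) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ngg_targets(genome, guide_len=20):
    """List (strand, start, protospacer, pam) for every site followed by NGG.

    `start` is the 0-based forward-strand coordinate of the leftmost base of
    the protospacer, for both '+' and '-' hits. The genome is treated as
    linear (hypothetical simplification; CaMV is circular).
    """
    genome = genome.upper()
    n = len(genome)
    # Lookahead keeps overlapping sites (e.g. runs of G) from being skipped.
    pattern = re.compile(r"(?=([ACGTN]{%d})([ACGTN]GG))" % guide_len)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for m in pattern.finditer(seq):
            pos = m.start()
            if strand == "+":
                start = pos
            else:
                # Map the reverse-complement index back to forward coordinates.
                start = n - pos - guide_len
            hits.append((strand, start, m.group(1), m.group(2)))
    return hits
```

For example, running `find_ngg_targets(camv_seq)` on the NC_001497 sequence (`camv_seq` is a placeholder variable) should give a candidate list comparable to Benchling's 682 hits, though not necessarily identical to it. The lookahead regex is used so that overlapping PAMs such as `TGGG` each produce their own candidate.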

Genome Position

The goal of many of the designs is to target a specific gene or genes and induce frameshift mutations. Previous research (Doench et al. 2014) suggests that any frameshift within the first 50% of a gene should be deleterious enough to prevent it from producing a functional protein. For Design I we used a 40% cutoff after looking at the functional regions of the CaMV transcripts. The targets for Designs II-IV were subset by location (the correct protein for II and IV, non-coding regions for III) but did not all have to be near the beginning of the gene.
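The position cutoff can be written as a minimal sketch. It assumes each target is described by its start coordinate, as in the protocol below, and that each gene is given by start and end coordinates on the same axis, reading in the direction of increasing coordinates. The helper names are hypothetical. The default cutoff of 0.40 matches Design I.

```python
def relative_position(target_start, gene_start, gene_end):
    """Return how far into the gene a target starts, as a fraction (0.0 to 1.0)."""
    if gene_end <= gene_start:
        raise ValueError("gene_end must be greater than gene_start")
    if not gene_start <= target_start <= gene_end:
        raise ValueError("target starts outside the gene")
    return (target_start - gene_start) / (gene_end - gene_start)


def in_early_region(target_start, gene_start, gene_end, cutoff=0.40):
    """True if the target starts within the first `cutoff` fraction of the gene."""
    return relative_position(target_start, gene_start, gene_end) <= cutoff
```

With a hypothetical gene spanning nt 1000-2000, a target starting at nt 1400 passes the 40% cutoff and one starting at nt 1401 does not.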

Conservation

We wanted to target areas of the genome that are functionally important, and we assumed these would correspond well to areas that are conserved (i.e. unchanging) between CaMV and closely related viruses. Targeting conserved areas should also make our gRNA targets more robust, since those areas are unlikely to contain many mutations or polymorphisms among different strains of CaMV.
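A simple way to turn an alignment into a conservation mask is to do the following for each column: measure the fraction of aligned sequences that share the CaMV base, and replace poorly conserved positions with 'N'. The sketch below is an illustrative stand-in and does not reproduce the confidence score that Guidance reports. The 0.93 threshold mirrors the protocol below, and the function name is hypothetical.

```python
def mask_unconserved(alignment, ref_index=0, threshold=0.93):
    """Mask columns where too few sequences match the reference base.

    alignment: list of equal-length aligned sequences (gaps as '-').
    ref_index: which row is the reference (e.g. CaMV).
    Returns the reference row with unconserved bases replaced by 'N'.
    """
    ref = alignment[ref_index].upper()
    n_seqs = len(alignment)
    masked = []
    for col, base in enumerate(ref):
        if base == "-":
            masked.append("-")  # gap in the reference; stripped later
            continue
        matches = sum(1 for seq in alignment if seq[col].upper() == base)
        masked.append(base if matches / n_seqs >= threshold else "N")
    return "".join(masked)
```

Removing the remaining dashes (for example with `.replace("-", "")`) gives an ungapped masked sequence. Benchling performs this step automatically in the protocol below.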

Off-Targeting

Since we want to provide extra protection for Arabidopsis plants against invading viruses, a major design priority was to avoid accidentally cutting the Arabidopsis genome. All the gRNA candidates were given off-target scores according to the algorithm used by the MIT CRISPR design tool (as implemented by Benchling). Screens by Hsu et al. (2013) showed that a mismatch at each position in the gRNA affects binding efficiency differently, and this is accounted for in the off-target score.
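The per-site part of the MIT score can be sketched as below. The position weights and the three-term formula follow public reimplementations of the Hsu et al. (2013) scoring scheme. Treat them as an assumption, since Benchling's implementation may differ in detail. Each off-target site receives a hit score from 0 to 100, where higher means the site is more likely to be cut. The guide's overall score is 100 × 100 / (100 + sum of hit scores), with the perfect on-target match excluded from the sum.

```python
# Position-specific mismatch weights, index 0 = PAM-distal, index 19 = PAM-proximal.
# Assumption: values as reproduced in public reimplementations of the MIT score.
MIT_WEIGHTS = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
               0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583]


def mit_hit_score(guide, off_target):
    """Score one 20-nt off-target site against a guide (0-100, higher = riskier)."""
    if len(guide) != 20 or len(off_target) != 20:
        raise ValueError("expected 20-nt protospacers without the PAM")
    mismatches = [i for i, (a, b) in enumerate(zip(guide.upper(), off_target.upper()))
                  if a != b]
    # Term 1: each mismatch lowers the score by its position weight
    # (PAM-proximal mismatches generally have larger weights).
    term1 = 1.0
    for i in mismatches:
        term1 *= 1 - MIT_WEIGHTS[i]
    # Term 2: closely spaced mismatches lower the score more than spread-out ones.
    if len(mismatches) < 2:
        term2 = 1.0
    else:
        gaps = [b - a for a, b in zip(mismatches, mismatches[1:])]
        mean_dist = sum(gaps) / len(gaps)
        term2 = 1.0 / (((19 - mean_dist) / 19) * 4 + 1)
    # Term 3: dampen by the square of the number of mismatches.
    term3 = 1.0 if not mismatches else 1.0 / len(mismatches) ** 2
    return term1 * term2 * term3 * 100


def mit_guide_score(hit_scores):
    """Aggregate off-target hit scores (on-target excluded) into a 0-100 score."""
    return 100 * 100 / (100 + sum(hit_scores))
```

A guide with no off-target hits scores 100. A single mismatch at a PAM-proximal position with a large weight contributes far less to the penalty than a mismatch at a PAM-distal position with zero weight.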

Many regions of the Arabidopsis genome are non-coding, so we also ran BLAST searches on all the off-target matches identified by Benchling. For example, we considered an off-target match in a T-DNA insertion site or a putative protein to be less important than one in a known mRNA or protein.

Efficiency

Benchling, the tool used to find candidate gRNA targets, also includes an efficiency score for how well the gRNA is expected to knock out the target. This score comes from Doench et al. (2014). Doench described the algorithm as follows on the Addgene blog:

We examined sequence features that enhance on-target activity of sgRNAs by creating all possible sgRNAs for a panel of genes and assessing, by flow cytometry, which sequences led to complete protein knockout. By examining the nucleotide features of the most-active sgRNAs from a set of 1,841 sgRNAs, we derived scoring rules.
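The Rule Set 1 score from Doench et al. (2014) is usually described as a logistic regression, so its output can be read as a probability-like value between 0 and 1. Schematically, it takes the form below. Here each $x_k$ is a binary sequence feature, for example whether a given nucleotide or dinucleotide occurs at a given position of the 30-nt target context. A GC-content penalty can be treated as one more $x_k$, and $\beta_0, \beta_k$ are the fitted weights. The published feature set and weights are not reproduced here.

```latex
s = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{k} \beta_k \, x_k\right)\right)}
```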

Non-overlapping

In several designs we were interested in using multiple gRNAs to target the same area (for example, for design II we chose four targets within P6). However, we didn't want to have any overlaps in our target sequence, since that would make our system less robust in the face of random mutations in a particular CaMV genome and would potentially cause interference between (d)Cas9 proteins vying to bind the same region. For this reason, only the single best gRNA in a region with many PAM sites could be chosen.
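Keeping only the single best gRNA in a crowded region amounts to a greedy selection. First sort the candidates by score, then keep each one only if its footprint does not overlap any guide already kept. This is a minimal sketch. It assumes each candidate is a (start, score) pair on forward-strand coordinates and that the footprint is just the 20-nt protospacer. The function name is hypothetical.

```python
def select_non_overlapping(candidates, guide_len=20):
    """Greedily keep the best-scoring guides whose footprints do not overlap.

    candidates: iterable of (start, score) pairs, 0-based forward-strand starts.
    Footprints are half-open intervals [start, start + guide_len).
    Returns the kept candidates sorted by start coordinate.
    """
    kept = []
    for start, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        end = start + guide_len
        # Keep only if this footprint lies entirely left or right of every kept one.
        if all(end <= s or start >= s + guide_len for s, _ in kept):
            kept.append((start, score))
    return sorted(kept)
```

In this sketch, the higher-scoring of two overlapping candidates always wins. That matches the "single best gRNA in a region" rule described above.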

Protocol

The specific steps we followed to generate our gRNA sequences were as follows:

Downloaded all genome sequences in the Caulimovirus genus from NCBI (including CaMV). These sequences are in the file caulimovirus_sequence.fasta.
That FASTA was uploaded to Guidance v2 with default parameters to generate a multiple sequence alignment.
Masked the multiple sequence alignment to keep only base pairs with at least 0.93 (93%) identity across the Caulimovirus genus. Note that we also tested alignments across the entire Caulimoviridae family, but the sequence homology was so low that there were very few continuous conserved regions.
Copied the masked sequence for CaMV to Benchling, which removed all gaps (represented as dashes) automatically. The masked sequence can be seen at this read-only link.
We then created a CRISPR design on both the masked genome sequence from Guidance and unmasked CaMV genome, which was imported into Benchling using its NCBI Accession Number (NC_001497). The CRISPR design was run using the following parameters:
- Start: 0, End: 8024 (entire genome)
- PAM: NGG
- Guide length: 20 nt
- Design type: single guide
- Genome for off-targets: TAIR10 (A. thaliana)
- Genome region to exclude for off-targets: none
Benchling identified 682 targets in the unmasked genome and 275 in the masked genome. These targets were exported to Excel using Benchling's export tool. Some gRNAs identified in the masked genome contained 'N' nucleotides (i.e. they were adjacent to a PAM site but partially masked), so there were only 235 true gRNA candidates in the masked genome. Note that Benchling has since updated its algorithms so that such partly masked candidates are not assigned scores.
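Once the export is saved as CSV, the partly masked candidates can be dropped in a few lines. The column name `Sequence` is a placeholder and should be checked against the real export header. The function is a hypothetical helper. It takes any iterable of text lines, such as an open file.

```python
import csv


def drop_partially_masked(lines, seq_column="Sequence"):
    """Drop candidate guides whose sequence contains masked ('N') bases.

    lines: iterable of CSV text lines with a header row (e.g. an open file).
    seq_column: header of the guide-sequence column (placeholder name).
    Returns (kept_rows, number_dropped).
    """
    rows = list(csv.DictReader(lines))
    kept = [r for r in rows if "N" not in r[seq_column].upper()]
    return kept, len(rows) - len(kept)
```

Applied to the masked-genome export, this filter corresponds to the step described above that reduced 275 hits to 235 true candidates.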

The gRNA candidates from Benchling can be found in this Excel file on our GitHub. The gRNAs found in the masked genome are in one sheet and those found in the unmasked genome are in another. Each off-target score in the Benchling interface is a link that brings up the matches found:

Benchling screenshot showing matches that lead to an off-target score.

For every gene accession number provided, we used NCBI Nucleotide to judge which off-targets were most important. As mentioned above, off-targets in e.g. T-DNA insertion sites are less concerning than off-targets in genes. We didn't want to repeat this search for all 682 gRNAs, so we subset them based on the criteria of Designs I-IV before looking up the off-target matches:

Design I: To induce early frameshift mutations, the masked candidate gRNA targets were subset to those with start coordinates within the first 40% of each gene, i.e. nt 364-757 (P1), nt 1830-1985 (P3), nt 3633-4440 (P5) and nt 5756-6380 (P6). No targets in this range were found for P3 in the conservation-masked genome, so gRNA targets from the unmasked sequence were used instead. The top five candidates for each gene were identified based on their position and (mostly) their off-target score. One gRNA for each gene was then selected based on BLAST results.
Design II: Four sets of 5 gRNA candidates were chosen for further inspection: the 5 P6 gRNA from Design I, gRNA near the beginning of P6 found in the unmasked genome, gRNA in the middle of P6 and gRNA near the end. One from each set was selected based on BLAST results.
Design III: The noncoding region stretches from nt 7339-7366. Four gRNA near the beginning and five gRNA near the end of the region were found in the unmasked genome (none were found in that region in the masked genome, which is perhaps unsurprising as a non-coding region is less likely to be conserved). Two gRNA were chosen near the beginning and two near the end based on BLAST results.
Design IV: For this design, we wanted to stop expression of the two mRNA transcripts. However, we were using the CaMV 35S promoter elsewhere in our design, so we couldn't target it directly. Instead, we looked at the first 100 nucleotides of each transcript (i.e. as close to the promoter as possible) and chose a few candidates to BLAST, eventually choosing one gRNA target for each transcript.
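The Design I shortlisting step can be sketched as follows. Keep the candidates whose start falls in the early-gene window, then take the top five by off-target score. The windows are copied from Design I above, and other windows can be added the same way. The candidate tuples are hypothetical. The sketch assumes the MIT/Benchling convention that a higher off-target score means fewer predicted off-target cuts.

```python
# Design I windows (nt, inclusive) covering the first 40% of each gene, from the text.
DESIGN_I_WINDOWS = {"P1": (364, 757), "P5": (3633, 4440), "P6": (5756, 6380)}


def top_candidates(candidates, window, k=5):
    """Return the k candidates starting inside `window` with the best off-target score.

    candidates: iterable of (name, start, off_target_score) tuples.
    window: (first_nt, last_nt), inclusive.
    """
    lo, hi = window
    in_window = [c for c in candidates if lo <= c[1] <= hi]
    return sorted(in_window, key=lambda c: c[2], reverse=True)[:k]
```

The final choice among the shortlisted guides was made from BLAST results, which this sketch does not attempt to model.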

The top sets of candidates for each design can be found in this Excel file on our GitHub, with the chosen gRNAs highlighted in green.

References

J.G. Doench, E. Hartenian, D.B. Graham, Z. Tothova, M. Hegde, I. Smith, et al. (2014). Rational design of highly active sgRNAs for CRISPR-Cas9–mediated gene inactivation. Nature Biotechnology, 32, 1262–1267. doi:10.1038/nbt.3026
C. Seeger and J.A. Sohn (2014). Targeting Hepatitis B Virus With CRISPR/Cas9. Molecular Therapy - Nucleic Acids, 3, e216. doi:10.1038/mtna.2014.68
P.D. Hsu, D.A. Scott, J.A. Weinstein, F.A. Ran, S. Konermann, V. Agarwala, et al. (2013). DNA targeting specificity of RNA-guided Cas9 nucleases. Nature Biotechnology, 31, 827–832. doi:10.1038/nbt.2647
