CRISPR-Cas Basics

In this section, I introduce the concepts underlying CRISPR-Cas systems and explain how the technology was first developed. This background is a prerequisite for understanding the many applications of these systems and the recent advances in using them. If you only need a quick refresher on CRISPR basics, use the table of contents below to jump straight to DNA Cleavage & CRISPR-Cas Summary.

Last update: Sep 2020

CRISPR Array

"CRISPR" stands for "clustered regularly interspaced short palindromic repeats." These repeats were first described as an unusual feature in the DNA sequence of Escherichia coli at the end of the 1980s. A CRISPR array consists of nucleotide repeats separated by spacers (Fig.1).

In the middle of the figure, you can see the nucleotide sequences of the repeat unit in E. coli and H. mediterranei. Note that the sequences under the black arrows are complementary to each other. Therefore, the RNA transcript of the repeat unit can bind to itself, creating short double-stranded RNA (dsRNA) regions (bottom of Fig.1). The purpose of this structural characteristic is explained in the following paragraphs.

Figure 1. CRISPR array structure
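The repeat-spacer layout described above can be sketched in a few lines of Python. This is only an illustration: the leader, repeat, and spacer sequences are made up for the example (they are not the E. coli or H. mediterranei repeats from Fig.1), and the function names are hypothetical. The sketch splits a toy array into its spacers and counts how many bases at the two ends of the repeat can pair with each other, which is the property that lets the transcript fold back on itself.

```python
# Toy sequences, invented for this example only.
LEADER = "ATATTA"
REPEAT = "GAGTTCCTTATGGAACTC"   # arms GAGTTCC / GGAACTC are reverse complements
SPACER_1 = "ACGTACGGTCAATGCCTAGA"
SPACER_2 = "TTGACCGATAGGCTAACGTC"
ARRAY = LEADER + REPEAT + SPACER_1 + REPEAT + SPACER_2 + REPEAT


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna))


def split_array(array: str, repeat: str) -> list[str]:
    """Return the spacers that sit between consecutive copies of the repeat."""
    pieces = array.split(repeat)
    # pieces[0] is the leader, pieces[-1] is whatever follows the last repeat.
    return pieces[1:-1]


def stem_length(repeat: str) -> int:
    """Count bases at the 5' end that can pair with the 3' end of the repeat."""
    rc = reverse_complement(repeat)
    n = 0
    while n < len(repeat) // 2 and repeat[n] == rc[n]:
        n += 1
    return n


print(split_array(ARRAY, REPEAT))
print(stem_length(REPEAT))
```

In this toy repeat, seven bases on each end can pair, leaving a four-base loop in the middle. Real repeats differ between species, so the numbers here carry no biological meaning.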

Spacers in the CRISPR array are taken from viruses that have attacked the bacteria before. They act as a memory stamp: it was later shown that bacteria use them to "remember" past infections and then identify and cleave matching viral DNA during later infections. In that sense, spacers, and CRISPR arrays as a whole, are an important part of the bacterial adaptive immune system. Indeed, in 2007, Rodolphe Barrangou and his team showed that after viral infection of Streptococcus thermophilus, new spacers were added to the CRISPR array. What is more, the DNA sequences of those spacers were identical to parts of the viral genome.

When viral infection occurs again, part of the CRISPR array is transcribed into CRISPR RNA, crRNA for short. A crRNA consists of a region derived from the spacer and a region derived from the nucleotide repeat.

The spacer-derived part of crRNA is used to identify the viral sequence. The repeat-derived part forms a short double-stranded RNA and is used to bind crRNA to the Cas nuclease. Depending on the type of Cas nuclease, either crRNA alone or crRNA bound to trans-activating CRISPR RNA (tracrRNA) makes up the guide RNA (gRNA) used by Cas. The Cas nuclease uses the spacer-derived part of gRNA to locate the complementary viral DNA and cleaves the foreign sequence to fight the infection.
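To make the two parts of crRNA concrete, here is a minimal sketch, assuming toy sequences and hypothetical function names. It "transcribes" a spacer and a repeat by swapping T for U, joins them into a crRNA, and then looks for the matching site in a piece of viral DNA on either strand. The `repeat_first` switch is an assumption of this sketch, included because the order of the two parts differs between systems. Real crRNA maturation involves trimming steps that are not modeled here.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna))


def transcribe(dna: str) -> str:
    """Write a DNA sequence as RNA (same strand, T replaced by U)."""
    return dna.replace("T", "U")


def make_crrna(spacer_dna: str, repeat_dna: str, repeat_first: bool = False) -> str:
    """Join the spacer-derived and repeat-derived parts into one crRNA.

    repeat_first is a switch invented for this sketch; it only controls
    which part is written at the 5' end.
    """
    spacer_rna = transcribe(spacer_dna)
    repeat_rna = transcribe(repeat_dna)
    return repeat_rna + spacer_rna if repeat_first else spacer_rna + repeat_rna


def find_target(spacer_dna: str, viral_dna: str):
    """Return (strand, index) of the first site matching the spacer, or None."""
    index = viral_dna.find(spacer_dna)
    if index != -1:
        return ("+", index)
    index = reverse_complement(viral_dna).find(spacer_dna)
    if index != -1:
        return ("-", index)  # index counted on the reverse-complement strand
    return None


# Toy sequences, invented for this example only.
SPACER = "ACGTACGGTCAATGCCTAGA"
VIRAL = "GGATCC" + SPACER + "TGGAAT"

print(make_crrna("ACGT", "GATC"))
print(find_target(SPACER, VIRAL))
```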

Cas Nucleases in Genome Editing

Cas nucleases work in conjunction with CRISPR array transcripts to make double-strand breaks (DSBs) in DNA. This is what makes the CRISPR-Cas system usable for genome editing. The key points in genome editing are the following:

1. The current genome editing technology takes advantage of the intrinsic cellular genomic DNA repair mechanisms, usually homology‐directed repair (HDR) or end‐joining pathways.

2. DSBs greatly increase the probability of incorporating a DNA sequence at or near the break site.

3. From 1. and 2., it follows that targeted nucleases, which are proteins that cut DNA at specific sites, can direct the DNA repair machinery to the target regions in order to perform genome editing.

As shown in Fig.2 below, we can use sequence-specific nucleases designed to recognize and cut double-stranded DNA (dsDNA) at a chosen location. Examples of such nucleases are meganucleases, zinc-finger nucleases (ZFNs), TAL effector nucleases (TALENs), and Cas9 nucleases. Cas9 is one of several nucleases that can be used in the aforementioned CRISPR-Cas system.

The DSB made by the nuclease is fixed by one of the cell's repair mechanisms. Non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ) often result in insertion/deletion (indel) mutations at the DSB site. Another option is to design a repair template with regions of homology to both sides of the DNA break, prompting the HDR pathway to produce a sequence change or insertion at the target site.

While TALENs, ZFNs, and meganucleases need to be redesigned for each target DNA sequence, Cas9 nuclease relies on the gRNA to find and cut the target sequence. Therefore, only the gRNA, and not the Cas nuclease, needs to be re-engineered to target another sequence. This proved to be an extremely useful and time-saving feature, which ultimately made the CRISPR-Cas system a superior tool in gene editing and led to its wide use in research.

Figure 2. Genome editing using targeted nucleases

Cas proteins and Cas nucleases

Further investigation of CRISPR-Cas systems revealed extraordinarily high diversity, and Cas9 is just one of many nucleases that can be used in this system. There are many types of Cas proteins, and among them several types of Cas nucleases. The structure and mode of action of the same Cas protein, e.g., Cas9 nuclease, differ between species of bacteria and archaea. Therefore, the exact characteristics of a particular Cas protein depend on the species from which it is isolated. To date, the thousands of known CRISPR-Cas systems have been classified into six types based on the Cas proteins present, sequence similarity, and the organization of the cas genes.

Notably, the naming of Cas proteins can be confusing. A Cas protein is not necessarily an enzyme capable of making DSBs. For example, Cas1 is an enzyme involved in the adaptation phase of CRISPR-Cas-mediated immunity. On the other hand, the effector nucleases that cut foreign nucleic acids are Cas9, Cas12, Cas3, Cas8, Cas10, and Cas13. It is important to distinguish between Cas proteins and Cas nucleases: not all Cas proteins are Cas nucleases, and only Cas nucleases are used in CRISPR-Cas systems to make DSBs. CRISPR-Cas9 and CRISPR-Cas12a are especially popular systems, so for now this website focuses mostly on those two. You can explore CRISPR systems employing other types of Cas nucleases in the last section of this topic (Alternative CRISPR-Cas Systems).

Cas9 and Cas12a owe their popularity to several factors. CRISPR-Cas systems have various structures, but generally, systems that use a single protein rather than a complex to target and cut dsDNA are easier to handle and to modify for use in organisms other than bacteria. CRISPR-Cas9 and CRISPR-Cas12a are such single-protein systems.

Cas9 belongs to CRISPR-Cas Type II, and Cas12a belongs to Type V. It is also possible to use Type I multicomponent systems (Fig.3) for more sophisticated experiments, but this approach is not yet widely used. Type II systems such as CRISPR-Cas9 use a dual guide RNA made up of crRNA and tracrRNA. crRNA is required for identifying the target sequence, and tracrRNA is needed for maturation of precursor crRNA and for interference with invading sequences. crRNA and tracrRNA can be fused into a single gRNA, as shown by the dashed line in Fig.4 (bottom left). Type V systems such as CRISPR-Cas12a use a short single gRNA made up of crRNA alone.

Figure 3. Types of CRISPR-Cas Systems

In the case of both Cas9 and Cas12a, the gRNA contains a region that is specific to the Cas nuclease. This characteristic marks a distinction between Cas9 and Cas12a, despite their similar function and mechanism of action. The different gRNAs used by CRISPR-Cas9 and CRISPR-Cas12a, and their binding to the target sequence, are shown in Fig.4. When gRNA binds to the Cas nuclease, the complex is ready to begin the search for the sequence complementary to the target-specific part of gRNA. Note the orange region called the PAM sequence, which is discussed in the next section. The genomic DNA strand at the bottom, which is shown binding to gRNA, is called the target strand, or the non-PAM strand. The strand that has been shifted up by the gRNA-Cas complex is called the non-target strand, the PAM strand, or the displaced strand.

Figure 4. gRNAs used by Cas9 and Cas12a

PAM sequence and DNA surveillance

PAM stands for protospacer adjacent motif. It is a short signature sequence consisting of several base pairs. The sequence itself varies between CRISPR-Cas systems and organisms. The PAM is located adjacent to the target DNA sequence that is to be cut by the CRISPR-Cas system.

To understand why PAMs exist, it is crucial to remember the origin of CRISPR-Cas technology: the bacterial immune system. Bacteria keep fragments identical to the genome of a virus that previously attacked them in their own DNA, as a memory stamp used to identify and fight the virus during future infections. When the virus attacks again, this DNA region is transcribed, and gRNA is loaded into the Cas nuclease, which starts to scan DNA in the cell and cleave the viral sequences. However, when the Cas nuclease scans the CRISPR array in the bacterial genome, it inevitably encounters a sequence matching the gRNA it carries. If the nuclease made a DSB there, it would damage the bacterial DNA and destroy the immune memory. Therefore, bacterial CRISPR-Cas systems need to be able to differentiate between self and non-self DNA. PAM sequences are a way to achieve this. PAMs flank the target sequence in virus-derived DNA, but they are absent next to the spacers inside the CRISPR array. Cas nucleases recognize PAMs, and DSBs are made only in target sequences adjacent to a PAM.
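The self versus non-self logic can be illustrated with a small sketch. It assumes an SpCas9-style NGG PAM located directly after the matched sequence, uses made-up sequences, and the function names are hypothetical. The same spacer sequence is present in both the viral DNA and the CRISPR array, but only the viral copy is followed by a PAM, so only that copy is reported as a site that would be cut.

```python
def has_pam_downstream(dna: str, site_end: int) -> bool:
    """True if the three bases right after the matched site read NGG."""
    pam = dna[site_end:site_end + 3]
    return len(pam) == 3 and pam[1:] == "GG"


def cleavable_sites(dna: str, spacer: str) -> list[int]:
    """Start positions where the spacer matches AND an NGG PAM follows."""
    hits = []
    start = dna.find(spacer)
    while start != -1:
        if has_pam_downstream(dna, start + len(spacer)):
            hits.append(start)
        start = dna.find(spacer, start + 1)
    return hits


# Toy sequences, invented for this example only.
SPACER = "ACGTACGGTCAATGCCTAGA"
REPEAT = "GTTCCATTAGGAAC"                     # starts with GTT, so no NGG here
VIRAL_DNA = "CATG" + SPACER + "TGG" + "ACCT"  # protospacer followed by a PAM
CRISPR_ARRAY = REPEAT + SPACER + REPEAT       # spacer followed by a repeat

print(cleavable_sites(VIRAL_DNA, SPACER))     # the viral copy is a valid target
print(cleavable_sites(CRISPR_ARRAY, SPACER))  # the array copy is left alone
```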

Notably, the PAM sequence is located on the non-target strand. It is also not complementary to gRNA. The PAM sequence for SpCas9 is 5′-NGG-3′ (downstream of the target sequence), and for AspCas12a and LbaCas12a, it is 5′-TTTV-3′ (upstream of the target sequence). Here, N stands for any base and V for A, C, or G. The letters in front of Cas indicate the organism from which the nuclease originates. When deciding which Cas nuclease variant to use in a genome editing experiment, it is essential to investigate the nucleotide sequences flanking the gene or sequence of interest. For example, if the target sequence is followed by a TGG or CGG sequence, SpCas9 can be used.
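Checking the flanking sequence for a usable PAM is easy to automate. The sketch below is a simplified illustration and not a guide design tool: it scans one strand only, the 20-nt protospacer length is an assumption, and the function names are hypothetical. To scan the other strand, the reverse complement of the sequence can be passed in. PAMs are written with the IUPAC letters N (any base) and V (A, C, or G).

```python
import re

# Subset of IUPAC nucleotide codes needed for the PAMs mentioned in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "V": "[ACG]"}


def pam_to_regex(pam: str) -> str:
    """Translate a PAM such as NGG or TTTV into a regular expression."""
    return "".join(IUPAC[letter] for letter in pam)


def find_sites(seq: str, pam: str, pam_side: str, length: int = 20):
    """List (protospacer_start, protospacer, pam) tuples found on one strand.

    pam_side="3prime": protospacer then PAM (SpCas9-style, e.g. NGG).
    pam_side="5prime": PAM then protospacer (Cas12a-style, e.g. TTTV).
    """
    pam_re = pam_to_regex(pam)
    if pam_side == "3prime":
        pattern = rf"(?=(?P<proto>[ACGT]{{{length}}})(?P<pam>{pam_re}))"
    elif pam_side == "5prime":
        pattern = rf"(?=(?P<pam>{pam_re})(?P<proto>[ACGT]{{{length}}}))"
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    # The lookahead (?=...) lets overlapping candidate sites be reported.
    return [
        (m.start("proto"), m.group("proto"), m.group("pam"))
        for m in re.finditer(pattern, seq)
    ]


# Toy protospacer, invented for this example only.
PROTO = "ACGTACGTTCAATACCTATA"

print(find_sites(PROTO + "TGG", "NGG", "3prime"))
print(find_sites("TTTC" + PROTO, "TTTV", "5prime"))
print(find_sites("TTTT" + PROTO, "TTTV", "5prime"))  # V excludes T, so no site
```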

When the Cas nuclease interacts with gRNA, the protein changes its conformation. As shown in Fig.5 (gRNA binding/PAM binding), the target-specific part of gRNA is positioned such that it can interact with the DNA strands that the complex is scanning. Cas nucleases "surveil" genomic DNA, looking for the short PAM sequence shown in orange. When the Cas nuclease binds to a PAM sequence, the DNA double strand is separated into single strands, and gRNA is matched to the strand opposite the PAM sequence. If the sequences are complementary enough, the RNA-DNA hybrid called the R-loop is extended, and the Cas nuclease cleaves the DNA at the sites indicated by the black triangles.

Figure 5. CRISPR-Cas9 and CRISPR-Cas12a modes of action
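The idea of gRNA being "complementary enough" to the target strand can be sketched as a simple mismatch count. This is a toy model: the mismatch threshold is a made-up parameter, the function names are hypothetical, and real nucleases weigh mismatches differently depending on their position. The guide is read 5′ to 3′ and paired against the target strand read 3′ to 5′, because the two strands of an RNA-DNA hybrid run antiparallel.

```python
# Which DNA base pairs with each RNA base in an RNA-DNA hybrid.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def count_mismatches(guide_rna: str, target_strand: str) -> int:
    """Count positions where the guide cannot pair with the target strand.

    Both sequences are given 5' to 3' and must have the same length.
    """
    if len(guide_rna) != len(target_strand):
        raise ValueError("guide and target must have the same length")
    target_3_to_5 = target_strand[::-1]  # antiparallel pairing
    return sum(
        1
        for rna_base, dna_base in zip(guide_rna, target_3_to_5)
        if RNA_TO_DNA_PAIR[rna_base] != dna_base
    )


def pairs_well_enough(guide_rna: str, target_strand: str, max_mismatches: int = 1) -> bool:
    """Toy decision rule; max_mismatches is an assumption of this sketch."""
    return count_mismatches(guide_rna, target_strand) <= max_mismatches


# Short toy sequences, invented for this example only.
GUIDE = "GACUUAGC"
PERFECT_TARGET = "GCTAAGTC"   # reverse complement of the guide, as DNA
ONE_OFF_TARGET = "GCTAAGTT"   # same, with the last base changed

print(count_mismatches(GUIDE, PERFECT_TARGET))
print(count_mismatches(GUIDE, ONE_OFF_TARGET))
```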

DNA cleavage & CRISPR-Cas Summary

Cas nucleases have specific domains that are used for making DSBs. Cas9 has two separate nuclease domains. The RuvC domain cuts the non-target strand, where the PAM sequence is located, while the HNH domain cleaves the strand complementary to gRNA (the target strand). In the case of SpCas9, the double-strand break is produced 3 bp upstream of the NGG PAM (Fig.6.A, red triangles), at the same position on both strands, resulting in blunt-ended DNA fragments.

Cas12a, unlike Cas9, has a single split RuvC domain, which makes the DSB and produces staggered, or "sticky", ends. It cleaves around 18 bp from the PAM on the non-target strand and around 24 bp from the PAM on the target strand.
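The cut positions described above can be turned into simple coordinate arithmetic. The following is a minimal sketch with hypothetical names. Positions are indices on the PAM (non-target) strand, a cut at index i falls between bases i-1 and i, and the Cas12a offsets are the approximate values quoted in the text, exposed as parameters because the exact positions vary.

```python
from dataclasses import dataclass


@dataclass
class Cut:
    """Cut positions in PAM-strand coordinates (cut falls before the index)."""
    non_target: int  # cut on the PAM (non-target) strand
    target: int      # cut on the target strand, same coordinate system

    @property
    def overhang(self) -> int:
        """Length of the single-stranded overhang; 0 means blunt ends."""
        return abs(self.non_target - self.target)


def cas9_cut(pam_start: int) -> Cut:
    """SpCas9: the PAM follows the protospacer; both strands are cut
    3 bp upstream of the PAM, giving blunt ends."""
    position = pam_start - 3
    return Cut(non_target=position, target=position)


def cas12a_cut(pam_end: int, non_target_offset: int = 18, target_offset: int = 24) -> Cut:
    """Cas12a: the PAM precedes the protospacer; offsets are counted from
    the end of the PAM. Default offsets are the approximate values from
    the text, not exact constants."""
    return Cut(non_target=pam_end + non_target_offset, target=pam_end + target_offset)


# Toy site: a 20-nt protospacer at index 0, so the NGG PAM starts at index 20.
PROTO = "ACGTACGTTCAATACCTATA"
SITE = PROTO + "TGG" + "ACCT"
cut = cas9_cut(pam_start=20)
print(SITE[:cut.non_target], SITE[cut.non_target:])  # the two blunt fragments
print(cas12a_cut(pam_end=4).overhang)
```

With these defaults, the Cas12a sketch reports a staggered cut because the two strands are cut at different positions, while the Cas9 sketch always reports an overhang of zero.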

After the Cas nuclease makes a DSB, the cellular repair machinery is activated. When there is no repair template with sequences homologous to those around the DSB, the break is repaired by non-homologous end joining (NHEJ) or microhomology-mediated end joining (MMEJ), both of which are error-prone pathways. They often produce small indels, which can disrupt gene regulatory elements or, if their length is not a multiple of 3, shift the whole open reading frame, resulting in a different, probably faulty protein. This effectively results in a gene knockout. Using a pair of Cas nucleases with spaced targets, it is also possible to induce large deletions or genomic rearrangements, such as inversions or translocations.
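The "multiple of 3" rule is simple enough to check in code. This minimal sketch, with hypothetical function names and a made-up coding sequence, compares the length of a sequence before and after repair and reports whether the indel keeps or shifts the reading frame. It assumes the sequence given is the coding sequence, starting at the first base of a codon.

```python
def classify_indel(reference: str, edited: str) -> str:
    """Describe the length change between the reference and edited sequence."""
    delta = len(edited) - len(reference)
    if delta == 0:
        return "no length change"
    kind = "insertion" if delta > 0 else "deletion"
    # A length change that is a multiple of 3 keeps downstream codons in frame.
    frame = "in-frame" if delta % 3 == 0 else "frameshift"
    return f"{abs(delta)} bp {kind}, {frame}"


def codons(seq: str) -> list[str]:
    """Split a coding sequence into triplets, starting at the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


# Toy coding sequence, invented for this example only.
REFERENCE = "ATGGCTGAAACCTGA"
ONE_BP_DELETION = "ATGGCGAAACCTGA"    # the T at index 5 is lost
THREE_BP_DELETION = "ATGGAAACCTGA"    # the whole GCT codon is lost

print(codons(REFERENCE))
print(codons(ONE_BP_DELETION), classify_indel(REFERENCE, ONE_BP_DELETION))
print(codons(THREE_BP_DELETION), classify_indel(REFERENCE, THREE_BP_DELETION))
```

After the 1 bp deletion, every codon downstream of the break reads differently, whereas the 3 bp deletion removes one codon and leaves the rest intact.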

Figure 6. Overview of CRISPR-Cas9 and CRISPR-Cas12a

However, if we provide a homologous repair template, more precise DSB repair becomes possible. We can design an artificial template containing the sequence we want to insert, flanked by sequences homologous to both sides of the target locus. The HDR pathway uses this template for repair, introducing the desired genetic edit. HDR can be used to insert long sequences too. Examples include inserting GFP or another reporter gene near a regulatory element, or inserting an entire genetic circuit.
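The layout of such a repair template can be sketched as plain string slicing. This is an illustration under stated assumptions: the function names are hypothetical, the arm length is an arbitrary parameter rather than a recommended value, and the sequences are made up. The template is built as left homology arm, insert, right homology arm, with both arms copied from the sequence on either side of the cut.

```python
def build_donor(genome: str, cut: int, insert: str, arm_len: int = 40) -> str:
    """Return left arm + insert + right arm for a cut before index `cut`.

    arm_len is an arbitrary choice for this sketch, not a recommendation.
    """
    if cut - arm_len < 0 or cut + arm_len > len(genome):
        raise ValueError("not enough sequence around the cut for the arms")
    left_arm = genome[cut - arm_len:cut]
    right_arm = genome[cut:cut + arm_len]
    return left_arm + insert + right_arm


def expected_edit(genome: str, cut: int, insert: str) -> str:
    """Sequence expected if the insert is placed exactly at the cut."""
    return genome[:cut] + insert + genome[cut:]


# Toy locus and insert, invented for this example only.
LOCUS = "A" * 10 + "C" * 10
INSERT = "GGG"

donor = build_donor(LOCUS, cut=10, insert=INSERT, arm_len=5)
print(donor)
print(expected_edit(LOCUS, cut=10, insert=INSERT))
```

In practice the arms are much longer than in this toy case, and the template is often designed so that the edited locus no longer contains the original target site, but those design choices are outside the scope of this sketch.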

Overall, NHEJ is useful for knocking out existing genes, and HDR can be used for precise insertions or replacement of DNA. One thing to note here is that HDR activity in eukaryotic cells depends on the position in the cell cycle and is most active during S-phase.

Sources

Figure 1. Ishino, Y., Krupovic, M., & Forterre, P. (2018). History of CRISPR-Cas from Encounter with a Mysterious Repeated Sequence to Genome Editing Technology. Journal of Bacteriology, 200(7), e00580-17. https://doi.org/10.1128/JB.00580-17

Figures 2-5. Robb, G. B. (2019). Genome Editing with CRISPR-Cas: An Overview. Current Protocols in Essential Laboratory Techniques, 19(1). https://doi.org/10.1002/cpet.36

Figure 6. Adiego-Pérez, B., Randazzo, P., Daran, J.-M., Verwaal, R., Roubos, J., Daran-Lapujade, P., & van der Oost, J. (2019). Multiplex genome editing of microorganisms using CRISPR-Cas. FEMS Microbiology Letters, 366. https://doi.org/10.1093/femsle/fnz086