Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

Part 3: DNA Design Challenge

3.1. Choose your protein.

I chose PETase, an enzyme from the bacterium Ideonella sakaiensis (strain 201-F6; the UniProt entry below lists it under the name Piscinibacter sakaiensis). I chose this protein because it was discovered to break down poly(ethylene terephthalate) (PET) plastic. This plastic-degrading activity makes it useful for bioremediation, reducing plastic pollution and promoting a circular economy.

FASTA sequence of PETase:

>sp|A0A0K8P6T7|PETH_PISS1 Poly(ethylene terephthalate) hydrolase OS=Piscinibacter sakaiensis OX=1547922 GN=ISF6_4831 PE=1 SV=1
MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRP
SGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQP
SSRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAA
PQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCA
NSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

Reverse translation of PETase:

Reverse Translate Results for 290 residue sequence “Untitled” starting “MNFPRASRLM”

reverse translation of Untitled to a 870 base sequence of most likely codons.

atgaactttccgcgcgcgagccgcctgatgcaggcggcggtgctgggcggcctgatggcg
gtgagcgcggcggcgaccgcgcagaccaacccgtatgcgcgcggcccgaacccgaccgcg
gcgagcctggaagcgagcgcgggcccgtttaccgtgcgcagctttaccgtgagccgcccg
agcggctatggcgcgggcaccgtgtattatccgaccaacgcgggcggcaccgtgggcgcg
attgcgattgtgccgggctataccgcgcgccagagcagcattaaatggtggggcccgcgc
ctggcgagccatggctttgtggtgattaccattgataccaacagcaccctggatcagccg
agcagccgcagcagccagcagatggcggcgctgcgccaggtggcgagcctgaacggcacc
agcagcagcccgatttatggcaaagtggataccgcgcgcatgggcgtgatgggctggagc
atgggcggcggcggcagcctgattagcgcggcgaacaacccgagcctgaaagcggcggcg
ccgcaggcgccgtgggatagcagcaccaactttagcagcgtgaccgtgccgaccctgatt
tttgcgtgcgaaaacgatagcattgcgccggtgaacagcagcgcgctgccgatttatgat
agcatgagccgcaacgcgaaacagtttctggaaattaacggcggcagccatagctgcgcg
aacagcggcaacagcaaccaggcgctgattggcaaaaaaggcgtggcgtggatgaaacgc
tttatggataacgatacccgctatagcacctttgcgtgcgaaaacccgaacagcacccgc
gtgagcgattttcgcaccgcgaactgcagc
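As a rough illustration of what the reverse-translation step does, the sketch below maps every amino acid to a single codon. The codon choices are simply read off the tool output above; the names `MOST_LIKELY_CODON` and `reverse_translate` are made up for this example, and a real tool would work from a full codon-usage table rather than a hard-coded dictionary.

```python
# One codon per amino acid, copied from the reverse-translation output above.
# (Assumption: this hard-coded table stands in for the tool's codon-usage table.)
MOST_LIKELY_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}


def reverse_translate(protein: str) -> str:
    """Return one DNA codon per residue (hypothetical helper)."""
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein.upper())


# First ten residues of PETase.
dna = reverse_translate("MNFPRASRLM")
print(dna)
print(len(dna), "bases for 10 residues")
```

Because every residue becomes exactly three bases, the 290-residue protein gives the 870-base sequence reported by the tool.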

3.3 Codon optimization

Codon optimization is important for the speed and efficiency of protein production, and therefore for getting the maximum amount of protein from the cell. Codons that are rare in the host have few matching tRNAs, so the ribosome, which is responsible for protein synthesis, can stall while it waits for the right tRNA.

If the ribosome stalls, this can lead to truncated or misfolded proteins that lose their proper three-dimensional shape and, with it, their activity.

I chose E. coli because its genome is well documented in the literature, it grows very fast, and it is very frequently used as a host for recombinant DNA technology. Moreover, the DNA instructions can be changed to match the codon preferences of the E. coli translation machinery, allowing researchers to produce PETase more quickly and in higher amounts.

Codon optimised sequence for PETase:

ATGAACTTTCCACGTGCCTCCCGTCTGATGCAGGCAGCTGTGCTGGGTGGCCTGATGGCG
GTTAGCGCCGCAGCAACTGCTCAGACCAATCCGTACGCGCGTGGCCCGAACCCGACTGCC
GCGAGCCTGGAGGCCAGCGCAGGTCCGTTCACCGTACGCAGCTTTACCGTGAGCCGTCCG
AGCGGCTACGGCGCAGGTACCGTGTATTACCCGACCAACGCAGGCGGTACCGTAGGCGCA
ATTGCGATTGTGCCGGGCTATACCGCACGCCAGAGCTCCATCAAGTGGTGGGGCCCGCGT
CTGGCCAGCCACGGTTTCGTAGTGATCACCATCGATACCAACAGCACCCTGGATCAGCCG
AGCTCCCGTAGCTCCCAGCAGATGGCAGCACTGCGTCAGGTGGCATCCCTGAACGGTACC
AGCTCCAGCCCGATCTACGGCAAGGTAGATACCGCACGTATGGGCGTGATGGGTTGGAGC
ATGGGCGGTGGCGGCAGCCTGATCTCCGCAGCAAACAACCCGAGCCTGAAAGCAGCAGCA
CCGCAGGCACCGTGGGATAGCTCCACCAACTTCAGCTCCGTGACCGTGCCGACCCTGATC
TTCGCATGCGAAAACGATAGCATCGCACCGGTGAACAGCTCCGCACTGCCGATCTACGAT
AGCATGAGCCGTAACGCGAAACAGTTCCTGGAAATCAACGGTGGCTCCCACAGCTGCGCC
AACAGCGGCAACAGCAACCAGGCATTGATCGGCAAGAAAGGTGTGGCCTGGATGAAACGT
TTCATGGATAACGATACCCGCTACTCCACCTTCGCATGCGAAAACCCGAACAGCACCCGT
GTGAGCGATTTCCGCACCGCAAACTGCAGC
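A quick sanity check on codon optimisation is to confirm that the optimised DNA still encodes the same protein. The sketch below does this for the first 60 bases of the two sequences above, using the standard genetic code; the helper name `translate` is made up for this example, and the check would work the same way on the full 870-base sequences.

```python
from itertools import product

# Standard genetic code; codons are enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon (hypothetical helper)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 60 bases of the reverse-translated and codon-optimised sequences.
reverse_translated = "atgaactttccgcgcgcgagccgcctgatgcaggcggcggtgctgggcggcctgatggcg"
codon_optimised = "ATGAACTTTCCACGTGCCTCCCGTCTGATGCAGGCAGCTGTGCTGGGTGGCCTGATGGCG"

same_protein = translate(reverse_translated) == translate(codon_optimised)
print(translate(codon_optimised))
print("same protein:", same_protein)
```

Only synonymous codons differ between the two sequences, so the translated protein is unchanged.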

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Firstly, I would use a cell-dependent protein production method: recombinant DNA technology. The optimised DNA is inserted into a plasmid, using restriction enzymes to cut the DNA and DNA ligase to join it. The plasmid is then transformed into an E. coli culture, the bacteria are grown, and IPTG is added to induce recombinant protein expression. Inside the cells, RNA polymerase transcribes the gene into mRNA, and ribosomes translate the mRNA into the PETase protein. Finally, the cells are lysed and the PETase protein is isolated and purified (Schütz et al., 2023).

Another technology I could use is cell-free protein production. Cells are grown to a high density and burst open using high pressure, and centrifugation is then used to isolate an extract containing the protein synthesis machinery, such as ribosomes and enzymes. Finally, the optimised DNA is added to this extract (the supernatant), where it is transcribed and translated to produce PETase.

Part 4: Prepare a Twist DNA Synthesis Order

https://benchling.com/s/seq-oY3lJJH8RajCqnHCK4RG?m=slm-GAs1mnavmWg4pBOHzVnp

.fasta file for TA review :)

>PETase_2
TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGAACTTTCCGCGCGCGAGCC
GCCTGATGCAGGCGGCGGTGCTGGGCGGCCTGATGGCGGTGAGCGCGGCGGCGACCGCGCAGACCAACCCGTATGCGCG
CGGCCCGAACCCGACCGCGGCGAGCCTGGAAGCGAGCGCGGGCCCGTTTACCGTGCGCAGCTTTACCGTGAGCCGCCCG
AGCGGCTATGGCGCGGGCACCGTGTATTATCCGACCAACGCGGGCGGCACCGTGGGCGCGATTGCGATTGTGCCGGGCT
ATACCGCGCGCCAGAGCAGCATTAAATGGTGGGGCCCGCGCCTGGCGAGCCATGGCTTTGTGGTGATTACCATTGATAC
CAACAGCACCCTGGATCAGCCGAGCAGCCGCAGCAGCCAGCAGATGGCGGCGCTGCGCCAGGTGGCGAGCCTGAACGGC
ACCAGCAGCAGCCCGATTTATGGCAAAGTGGATACCGCGCGCATGGGCGTGATGGGCTGGAGCATGGGCGGCGGCGGCA
GCCTGATTAGCGCGGCGAACAACCCGAGCCTGAAAGCGGCGGCGCCGCAGGCGCCGTGGGATAGCAGCACCAACTTTAG
CAGCGTGACCGTGCCGACCCTGATTTTTGCGTGCGAAAACGATAGCATTGCGCCGGTGAACAGCAGCGCGCTGCCGATT
TATGATAGCATGAGCCGCAACGCGAAACAGTTTCTGGAAATTAACGGCGGCAGCCATAGCTGCGCGAACAGCGGCAACA
GCAACCAGGCGCTGATTGGCAAAAAAGGCGTGGCGTGGATGAAACGCTTTATGGATAACGATACCCGCTATAGCACCTT
TGCGTGCGAAAACCCGAACAGCACCCGCGTGAGCGATTTTCGCACCGCGAACTGCAGCCATCACCATCACCATCATCAC
TAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGC
TCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATAATA

4.6. Choose Your Vector

https://benchling.com/s/seq-qZFBnXDT1X5Wyl0Mw1YJ?m=slm-FoxAu2Oeo4nJoQJoTZMP

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I would want to sequence the gene for the primary surface protein (the capsid protein, which determines immunogenicity) of a common virus, such as a locally circulating variant of norovirus.

Why I would sequence it:

If we sequence the gene for the major capsid protein, VP1, which is encoded by Open Reading Frame 2 (ORF2), from environmental samples (such as water bodies), we can detect the virus in a city before it spreads widely. This acts as an early warning system.

Viruses mutate quickly, so sequencing allows us to monitor how the virus evolves over time and to see whether it is becoming more transmissible or has developed resistance to antiviral medications.

To design a vaccine or treatment, scientists need the genetic code of the rapidly mutating viral surface protein that triggers the immune response. Sequencing tells us which amino acids the virus uses to build that protein, so that a copy can be made to trigger an immune response.

Sequencing can also distinguish between variants, for example a normal strain from a mutated strain, so changes are detected quickly enough to support early treatment of patients.
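To make the idea of telling a mutated strain from a reference strain concrete, here is a minimal sketch that lists amino-acid substitutions between two aligned protein sequences. The function name `list_substitutions` and both sequences are made up for illustration (they are not real VP1 sequences), and the sketch assumes the sequences are already aligned and of equal length.

```python
def list_substitutions(reference: str, sample: str) -> list[str]:
    """Report substitutions in the usual 'I6V' style (1-based positions)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_aa}{position}{sample_aa}"
        for position, (ref_aa, sample_aa) in enumerate(zip(reference, sample), start=1)
        if ref_aa != sample_aa
    ]


# Toy sequences, not real viral proteins.
reference_strain = "MKTAYIAK"
mutated_strain = "MKTAYVAK"
print(list_substitutions(reference_strain, mutated_strain))
```

Tracking which substitutions appear, and how often, across samples collected over time is one simple way to follow viral evolution.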

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Also answer the following questions:

Is your method first-, second- or third-generation or other? How so? What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? What is the output of your chosen sequencing technology?

I would choose nanopore sequencing, which is a third-generation method: it reads single DNA molecules directly and can sequence extremely long fragments. Data are analysed in real time as each strand passes through the pore, giving actionable insights for tracking viral mutations and for pathogen surveillance. Because it reads single molecules directly, it does not require PCR amplification of the DNA (MacKenzie and Argyropoulos, 2023).

My input will be genomic DNA or RNA taken from a viral sample. Norovirus has an RNA genome, so in that case the extracted RNA would typically be reverse-transcribed into cDNA before library preparation.

Preparation of input: The DNA can be fragmented into the lengths required, although nanopore sequencing can also handle long, unfragmented molecules.

The ends of the DNA are repaired to make them blunt.

Sequencing adapters are ligated to the ends of the DNA; these help pull the DNA into the nanopore.

A tether molecule is added to help the DNA find the pore on the sensor.

Nanopore sequencing is different in that it uses electricity instead of chemicals or light to detect bases.

The DNA is passed through a nanopore embedded in an electrically resistant membrane while a constant ionic current flows through the pore. As the DNA strand moves through the pore, the bases A, T, C and G each block the opening in a different way.

Because each base has a different shape and size, it causes a characteristic disruption in the electrical current. Deep learning models analyse these shifts in current over time to determine the base sequence.

The raw output is an electrical signal file (the "squiggle"), commonly in .fast5 or .pod5 format.

The processed output, after base calling the squiggles, is a FASTQ file. It contains the base sequence (A, T, C, G) and a quality score recording how confident the machine was about each base it read (MacKenzie and Argyropoulos, 2023).
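The per-base confidence in a FASTQ file is stored as one character per base. A minimal sketch of decoding it is shown below, assuming the common Phred+33 encoding, where the quality score Q is the character code minus 33 and the estimated error probability is 10^(-Q/10). The function name `parse_fastq_record` and the example read are made up for illustration.

```python
def parse_fastq_record(lines):
    """Parse one 4-line FASTQ record (hypothetical helper, Phred+33 assumed)."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    phred = [ord(char) - 33 for char in quality]          # quality score per base
    error_probability = [10 ** (-q / 10) for q in phred]  # chance the call is wrong
    return header[1:], sequence, phred, error_probability


# Toy record: four bases with decreasing confidence.
record = ["@read1", "ACGT", "+", "I5+!"]
name, bases, scores, errors = parse_fastq_record(record)
for base, score, error in zip(bases, scores, errors):
    print(base, score, error)
```

A score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.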

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I would like to synthesise the DNA for a combinatorial nanobody display library (VHH domains). In drug discovery, antibodies have unparalleled specificity and high binding strength for their targets, and they can modulate disease-related proteins, which positions them as targeted, efficacious therapies in cancer medicine, immunology and infectious diseases. Viruses carry epitopes: distinct, localised parts of a foreign molecule that are directly recognised and bound by antibodies, B cells or T cells of the immune system. Conventional antibodies, however, are large, which makes it difficult for them to reach some of these epitopes and neutralise the viral particle or cancer cell. Nanobodies are small and can be highly resistant to heat, extreme pH and protein-cleaving enzymes, so they can reach concealed epitopes on viruses or cancer cells that larger human antibodies cannot reach (Muyldermans, 2021).

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

What are the essential steps of your chosen sequencing methods? What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

Answer: I plan to use trinucleotide-directed mutagenesis to produce antibody diversity, creating mutations in the complementarity-determining regions (CDRs) whilst avoiding stop codons and frameshift variants.

We will use PCR to generate diversity in CDR3, a very diverse and flexible region that contacts the virus epitope, by building in different random combinations of codons to create a large number of nanobody genes.

Phages will be used to display the nanobodies encoded by this DNA on their surface. The disease-related, viral or other infectious target proteins are then mixed with the phages, and only phages displaying nanobodies with specificity and affinity for the target remain bound.

We will then use next-generation sequencing and artificial intelligence to analyse the remaining phages and find better versions iteratively.

Essential steps:

1. We divide the nanobody gene into its constituent parts: CDR1, CDR2, CDR3 and the framework regions.
2. We randomise the CDR3 part: PCR primers are used to insert random DNA bases at the antigen-binding sites.
3. The DNA parts have overlapping ends, so we can stitch the fragments together in the correct order.
4. We use DNA polymerase to amplify the assembled DNA, creating a complete, double-stranded nanobody gene.
5. The DNA sequence must be read efficiently by the organism it is expressed in, so codon optimisation for the host organism is important.
6. We create mutations only in selected regions, in a way that does not create stop codons or frameshift mutations (Webmaster, 2024).
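The point about avoiding stop codons can be checked by counting. The sketch below compares three ways of randomising one codon position: fully random bases (NNN), the common NNK degenerate scheme (K = G or T), and a mix of pre-made trinucleotides with one codon per amino acid. The specific trimer list is an assumption made for this example, since real trinucleotide mixes are defined by the supplier, and the helper names are made up.

```python
from itertools import product

# Standard genetic code; codons are enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand(first: str, second: str, third: str) -> list[str]:
    """List every codon allowed by the bases permitted at each position."""
    return ["".join(codon) for codon in product(first, second, third)]


def summarise(codons: list[str]) -> dict[str, int]:
    encoded = [CODON_TABLE[codon] for codon in codons]
    return {
        "codons": len(codons),
        "stop_codons": encoded.count("*"),
        "amino_acids": len(set(encoded) - {"*"}),
    }


nnn = expand("ACGT", "ACGT", "ACGT")
nnk = expand("ACGT", "ACGT", "GT")
# Hypothetical trimer mix: exactly one codon per amino acid, no stop codons.
trimer_mix = ["GCG", "TGC", "GAT", "GAA", "TTT", "GGC", "CAT", "ATT", "AAA", "CTG",
              "ATG", "AAC", "CCG", "CAG", "CGC", "AGC", "ACC", "GTG", "TGG", "TAT"]

for name, codons in [("NNN", nnn), ("NNK", nnk), ("trimer mix", trimer_mix)]:
    print(name, summarise(codons))
```

Whole-codon mixes also keep the reading frame intact by construction, since every randomised position is inserted as a complete triplet.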

The Inputs:

DNA template: VHH gene.

Primers: Short, bespoke DNA strands. Some will match the VHH gene scaffold, whilst others will contain random mutations.

Enzymes: DNA Polymerase for building DNA and Restriction Enzymes to cut the DNA for plasmid insertion.

Plasmid vectors: the plasmid vector will carry the new library into cells.

E. coli cells: will help to replicate the nanobody gene library.

Limitations of the Method
1. E. coli transformation efficiency: this limits how much VHH DNA can be taken up by the cells, and therefore the size of the library.
2. Primer bias: the randomised DNA may be less diverse than intended because the same bases can be chosen again and again.
3. PCR amplification may introduce errors whilst the DNA is being copied.
4. Randomising DNA may introduce stop codons. If one occurs in the middle of the nanobody gene, translation stops halfway through and we end up with a truncated, non-functional fragment.
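The first limitation can be put into rough numbers. The sketch below compares the theoretical diversity of a randomised CDR3 with the number of transformants, and estimates the fraction of variants sampled at least once using a simple Poisson approximation that assumes every variant is equally likely. The function names, the choice of 7 randomised positions and the figure of 1e9 transformants are all illustrative assumptions, not values from the protocol.

```python
import math


def theoretical_diversity(randomised_positions: int, residues_per_position: int = 20) -> int:
    """Number of distinct protein variants if each position can be any residue."""
    return residues_per_position ** randomised_positions


def expected_coverage(diversity: int, transformants: float) -> float:
    """Fraction of variants seen at least once (Poisson approximation)."""
    return 1 - math.exp(-transformants / diversity)


# Illustrative numbers only.
diversity = theoretical_diversity(randomised_positions=7)
coverage = expected_coverage(diversity, transformants=1e9)
print(f"theoretical diversity: {diversity:.2e}")
print(f"expected coverage: {coverage:.1%}")
```

Each extra randomised position multiplies the theoretical diversity by 20, so transformation efficiency quickly becomes the bottleneck on how much of the design space the library actually contains.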

5.3 DNA Edit

(i) What DNA would you want to edit and why?

In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I would choose to edit out the highly conserved long terminal repeats (LTRs) of HIV. The LTRs flank the viral genome and act as promoter and regulatory sequences for viral transcription and replication. The viral genome integrates into the host’s CD4 T helper cells, which help the adaptive arm of the immune system to recognise and destroy infected or cancerous cells, control immune responses and provide long-lasting immunity (Hu & Hughes, 2012). HIV affects millions of people across all countries, and neither antiretroviral therapy nor the immune system can easily clear the virus, because it remains dormant in the host T cells. Furthermore, antiretroviral resistance may develop over time, and patients must adhere strictly to their medication to prevent cycles of viral replication (Hu and Hughes, 2012).

(ii) What technology or technologies would you use to perform these DNA edits and why?

I would use the CRISPR-Cas9 gene editing system to cut within the long terminal repeats and remove HIV from the CD4 T cells. Because the two LTRs are identical, we can use one sgRNA to target these highly conserved regions at both ends of the integrated virus. This would stop the dormant, latent virus reservoir from growing (Hu & Hughes, 2012; Wang et al., 2018).

Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 works as follows:

1 - We make sgRNAs whose sequence is complementary to the LTR target sequence (at the 5’ and 3’ LTR ends of the integrated HIV genome).
2 - The sgRNA binds the Cas9 nuclease and directs it to the matching DNA sequences within the HIV-1 LTR, where Cas9 produces double-strand breaks.
3 - The breaks trigger the cell’s repair mechanisms (non-homologous end joining), which introduce indel mutations. These inactivate the virus and disrupt viral transcription at the promoter region, which in turn prevents HIV replication (Asmamaw & Zawdie, 2021).

What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

Preparation and design.

Firstly, we must find an LTR region that is conserved across different HIV strains to target with our sgRNAs, while avoiding targeting and cleavage of the host’s own DNA.

Secondly, the LTR target sequence can mutate rapidly in HIV, so we should use a database of known mutant sequences and design our sgRNAs to cover different mutant strains.

Thirdly, we will perform off-target screening. To prevent damage to the host’s own DNA, we must make sure the guide sequence does not match a human gene, particularly near the PAM, which Cas9 requires in order to cleave the target DNA and which is found 3-4 nucleotides downstream of the cut site.
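A very simple form of off-target screening is to count mismatches between the guide and a candidate genomic site, paying special attention to the PAM-proximal end. The sketch below does this for sequences written 5’ to 3’ with the PAM following the 3’ end. The function name `mismatch_profile`, the 10-nt window and the sequences are all assumptions for illustration; real screening tools search the whole genome and use more detailed scoring.

```python
def mismatch_profile(guide: str, site: str, pam_proximal_length: int = 10) -> tuple[int, int]:
    """Return (total mismatches, mismatches in the PAM-proximal window).

    Both sequences are written 5'->3' with the PAM after the 3' end, so the
    PAM-proximal window is the last `pam_proximal_length` bases (an assumed size).
    """
    guide, site = guide.upper(), site.upper()
    if len(guide) != len(site):
        raise ValueError("guide and site must be the same length")
    mismatched = [i for i, (g, s) in enumerate(zip(guide, site)) if g != s]
    window_start = len(guide) - pam_proximal_length
    near_pam = [i for i in mismatched if i >= window_start]
    return len(mismatched), len(near_pam)


# Toy 20-nt guide and two toy candidate sites (not real HIV or human sequences).
guide = "GATTACAGATTACAGATTAC"
pam_distal_mismatch = "GCTTACAGATTACAGATTAC"    # differs at position 2
pam_proximal_mismatch = "GATTACAGATTACAGATTAG"  # differs at position 20

print(mismatch_profile(guide, pam_distal_mismatch))
print(mismatch_profile(guide, pam_proximal_mismatch))
```

Sites in the human genome with few mismatches, especially few near the PAM, are the ones most worth avoiding when choosing a guide.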

We also have to look within the LTR, which is composed of three subsegments (U3, R and U5), for areas that are important for viral transcription and replication, and therefore survival. Previous research suggests that the NF-κB binding sites (NF-κB is an important transcription factor that regulates cell growth and plays a role in cell development), the TATA box (a highly conserved promoter element that is bound by the TATA-binding protein to initiate transcription) and TAR (which works with the viral Tat protein to activate viral transcription) may be important (Wang et al., 2018).

We then must look for a PAM next to the chosen target sequence.
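Looking for a PAM can be sketched as a simple scan. The example below assumes the widely used SpCas9 enzyme, whose PAM is 5’-NGG-3’ immediately after a 20-nt target, and which cuts 3 bases upstream of the PAM. It searches the forward strand only; the function name `find_spcas9_sites` and the toy sequence are made up for illustration.

```python
def find_spcas9_sites(dna: str, protospacer_length: int = 20) -> list[dict]:
    """Find forward-strand targets followed by an NGG PAM (SpCas9 assumed)."""
    dna = dna.upper()
    sites = []
    for start in range(len(dna) - protospacer_length - 2):
        pam = dna[start + protospacer_length:start + protospacer_length + 3]
        if pam[1:] == "GG":
            sites.append({
                "protospacer": dna[start:start + protospacer_length],
                "pam": pam,
                # 0-based index of the first base after the cut (3 nt upstream of the PAM)
                "cut_index": start + protospacer_length - 3,
            })
    return sites


# Toy sequence: flank + 20-nt target + AGG PAM + flank (not a real LTR sequence).
target = "GATTACAGATTACAGATTAC"
toy_dna = "TT" + target + "AGG" + "TT"

for site in find_spcas9_sites(toy_dna):
    print(site)
```

A fuller search would also scan the reverse complement, since Cas9 can target either strand.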

Hairpin structures within the sgRNA may interfere with its binding to the Cas nuclease, so we can use in silico tools like Mfold to check whether our sgRNA folds on itself (Asmamaw & Zawdie, 2021).
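As a crude stand-in for a folding tool, the sketch below flags a guide if a short stretch of it could base-pair with a reverse-complementary stretch further along, separated by a small loop. This is not how Mfold works (Mfold computes folding energies), and the stem length, loop size and sequences are arbitrary assumptions; the sketch also uses DNA letters and ignores G-U pairing.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_simple_hairpin(seq: str, stem: int = 4, min_loop: int = 3) -> bool:
    """True if any `stem`-nt stretch can pair with a downstream stretch.

    Crude heuristic only: it looks for an exact reverse complement located at
    least `min_loop` bases after the first stretch.
    """
    seq = seq.upper()
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        partner = reverse_complement(seq[i:i + stem])
        if seq.find(partner, i + stem + min_loop) != -1:
            return True
    return False


# Toy sequences: the first can fold back on itself, the second cannot.
print(has_simple_hairpin("GGGCAAAAGCCC"))
print(has_simple_hairpin("AAAAAAAAAAAA"))
```

Guides flagged by a quick check like this would still need to be examined with a proper folding tool before being ruled in or out.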

Physical inputs, including components to order or create:

Custom sgRNA oligos with a 20-nt target sequence, ordered from a vendor who can create an RNA or DNA template for them.
A plasmid for use in cell culture (pLentiCRISPR v2), into which the guide sequence is cloned next to the sgRNA scaffold and which also carries the Cas9 nuclease.
A pair of forward and reverse primers, which will be positioned outside the target site.
After editing, we will do PCR. One primer will be placed before the 5’ LTR and one after the 3’ LTR, so PCR will amplify millions of copies of solely the DNA between those two primers.
We will use gel electrophoresis to check the result. The long, unedited HIV DNA control moves slowly and stays near the top of the gel, whereas a successfully edited genome gives a new, shorter band positioned much lower down on the gel (Asmamaw & Zawdie, 2021).
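The expected band sizes in this PCR check follow from simple arithmetic: the unedited product spans everything between the primers, while the edited product is shorter by the length of DNA removed between the two cut sites. The sketch below works this out for made-up coordinates; the function name `expected_amplicons` and every position are hypothetical, chosen only so that the two cuts sit at the same offset in each LTR.

```python
def expected_amplicons(forward_primer_start: int, reverse_primer_end: int,
                       cut_in_5_ltr: int, cut_in_3_ltr: int) -> tuple[int, int]:
    """Return (unedited length, edited length) in base pairs.

    Assumes a clean excision of everything between the two cut sites.
    """
    unedited = reverse_primer_end - forward_primer_start
    excised = cut_in_3_ltr - cut_in_5_ltr
    return unedited, unedited - excised


# Hypothetical coordinates on the host chromosome (illustrative only).
unedited_bp, edited_bp = expected_amplicons(
    forward_primer_start=800,
    reverse_primer_end=10_900,
    cut_in_5_ltr=1_100,
    cut_in_3_ltr=10_166,
)
print("unedited band:", unedited_bp, "bp")
print("edited band:", edited_bp, "bp")
```

Shorter DNA fragments travel further through the gel, so a successful excision shows up as a band well below the unedited control.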

Efficiency versus Precision

Efficiency: CRISPR-Cas9 editing may be successful in cell culture in a lab environment, but the living body is far more complex. Getting the Cas9 enzyme to cleave LTRs in tissues such as the brain or bone marrow is still in its infancy, and off-target cleavage and DNA damage could have dangerous consequences.

Precision: Using an engineered Cas9 nuclease with increased on-target efficiency and minimal off-target toxicity, designed to cut only when the guide and target match essentially perfectly, significantly reduces the risk of host DNA damage in the human genome.

Since we have to target the 5’ and 3’ regions, it is important to check that the guide sequence is present at both ends in order to excise the integrated HIV genome from the host CD4 T cells or other immune cells like macrophages (Hunt et al., 2023).

What are the limitations of your editing methods (if any) in terms of efficiency or precision?

Mutations may occur in the seed region, the part of the target nearest the 3’ end that is important for Cas9 recognition and binding to the target DNA. Such mutations could prevent cleavage by the Cas9/gRNA complex and therefore prevent suppression of viral replication.

References

Asmamaw, M., & Zawdie, B. (2021). Mechanism and applications of CRISPR/Cas-9-mediated genome editing. Biologics: Targets & Therapy, 15, 353–361. https://doi.org/10.2147/BTT.S326422

Wang, G., Zhao, N., Berkhout, B., & Das, A. T. (2018). CRISPR-Cas based antiviral strategies against HIV-1. Virus Research, 244, 321–332. https://doi.org/10.1016/j.virusres.2017.07.020

Muyldermans, S. (2021). A guide to: Generation and design of nanobodies. The FEBS Journal, 288(7), 2084–2102. https://doi.org/10.1111/febs.15515

Webmaster, I. (2024, November 18). Synthetic biology technologies for antibody discovery. Isogenica. https://isogenica.com/synthetic-biology-technologies-for-antibody-discovery/

MacKenzie, M., & Argyropoulos, C. (2023). An introduction to nanopore sequencing: Past, present, and future considerations. Micromachines, 14(2), 459. https://doi.org/10.3390/mi14020459

Hu, W.-S., & Hughes, S. H. (2012). HIV-1 reverse transcription. Cold Spring Harbor Perspectives in Medicine, 2(10), a006882. https://doi.org/10.1101/cshperspect.a006882

Schütz, A., Bernhard, F., Berrow, N., Buyel, J. F., Ferreira-da-Silva, F., Haustraete, J., van den Heuvel, J., Hoffmann, J.-E., de Marco, A., Peleg, Y., Suppmann, S., Unger, T., Vanhoucke, M., Witt, S., & Remans, K. (2023). A concise guide to choosing suitable gene expression systems for recombinant protein production. STAR Protocols, 4(4), 102572. https://doi.org/10.1016/j.xpro.2023.102572

Hunt, J. M. T., Samson, C. A., Rand, A. du, & Sheppard, H. M. (2023). Unintended CRISPR-Cas9 editing outcomes: A review of the detection and prevalence of structural variants generated by gene-editing in human cells. Human Genetics, 142(6), 705–720. https://doi.org/10.1007/s00439-023-02561-1