Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

  • Week 2 HW: DNA Read, Write, and Edit

    Part 0: Designing your Gel Art Step 1 Step 2 Step 3 Step 4 EcoRI HindIII

  • Week 3 HW: Lab Automation

    Your task this week is to create a Python file to run on an Opentrons liquid handling robot. Review this week’s recitation and this week’s lab for details on the Opentrons and programming it. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons. You may use AI assistance for this coding: Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept. If you’re a proficient programmer and you’d rather code something mathematical or algorithmic instead of using your GUI coordinates, you may do that instead.

Subsections of Homework

Week 1 HW: Principles and Practices


First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

I want to develop a biological engineering tool that combines biology, artificial intelligence, and space science to understand how life survives in extreme environments and to use that knowledge to both protect life on Earth and prepare for life beyond it.

My long-term vision is to build an AI system that can analyze extremophile microorganisms and predict how they adapt to harsh conditions such as radiation, high salinity, drought, and low nutrient availability, conditions similar to those on Mars and in other extraterrestrial environments.

This tool would integrate:

• Genetic data
• Morphological traits
• Environmental stress factors
• Machine learning models to identify survival patterns and adaptive mechanisms

The purpose is twofold:

• Earth protection: By understanding how extremophiles survive, we can discover new biological compounds and mechanisms that help with bioremediation, climate resilience, sustainable agriculture, and medicine.
• Space exploration: The same data can guide astrobiology research, helping scientists predict whether life could survive on Mars, Europa, Titan, or exoplanets, and how humans might one day safely live there.

This idea is inspired by my ongoing research in astrobiology, molecular biology, and environmental biotechnology, and by my passion for NASA’s mission of exploring the unknown. My goal is not just to study life, but to use science and AI to protect it on Earth and beyond.

Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.

• This tool combines biological data, artificial intelligence, and space research. It must be governed by clear ethical principles to ensure it is used to protect life, not exploit or harm it. The main governance goal is to ensure non-malfeasance, safety, equity, and responsible scientific use, while preventing misuse in harmful or unethical ways.

Ensure safety, security, and non-malfeasance

This tool must never be used to design harmful organisms, support bioweapons, or damage ecosystems.

  1. Access control & user authentication: Only verified researchers, educators, and institutions should be allowed to upload genetic or environmental data. Public users can view educational outputs but cannot manipulate sensitive biological models.

  2. Misuse detection & content filtering: AI models should automatically block outputs that could suggest harmful genetic modifications, dangerous pathogens, or unethical biological experiments.

  3. Ethical review integration: Any high-risk project using the system must obtain approval from an institutional ethics committee (IRB or biosafety board) before analysis is allowed.

Promote constructive and peaceful use

The tool must support science for life, sustainability, and space exploration, not military or exploitative purposes.

  1. Application restrictions: The system will prohibit use for military bioengineering, weaponization, or ecological manipulation.

  2. Transparent purpose declarations: All users must declare the purpose of their research, which is logged and reviewed to ensure alignment with peaceful, scientific, and humanitarian goals.

Protect biodiversity, local communities, and indigenous environments

  1. Benefit sharing policies: Any discoveries from local microbial data must credit and benefit the region of origin through shared publications, funding, or conservation programs.

  2. Environmental consent protocols: Sample collection and data use must follow national biodiversity laws and local permissions.

Equity, transparency, and responsible AI

  1. Open access educational version: Students and researchers from developing countries should have free access to learning and visualization features.

  2. Explainable AI models: Predictions should be transparent and interpretable, so users understand why the system gives certain results.

Long-term planetary responsibility

  1. Planetary protection: All space research outputs must follow COSPAR and planetary protection standards to avoid contaminating other worlds.

  2. Dual use: Regular audits must be performed to ensure the tool is not being repurposed for harmful dual-use research.

Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.). Purpose: What is done now and what changes are you proposing? Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc) Assumptions: What could you have wrong (incorrect assumptions, uncertainties)? Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

Action 1: Mandatory ethical access licensing system

Purpose: Most bioinformatics tools are open access or lightly regulated, meaning sensitive genetic data and AI models can be used by anyone, including for harmful purposes. I therefore propose a tiered licensing system for users, similar to drone pilot licensing or controlled chemical access, in which only approved users can access high-risk features.

Design: The actors are universities, biotech companies, space agencies, and government regulators. Users must complete bioethics and biosafety training. Institutions verify users and issue digital licenses. The platform enforces role-based permissions and logs all activity. Funding comes from research grants and institutional subscriptions.

Assumptions: Institutions will cooperate and share responsibility. Users will not try to bypass the system. Ethical training meaningfully changes behavior.

Risks of failure and success: Failure: black market versions of the tool could emerge. Success risk: over-regulation could slow down innovation or exclude researchers from low-income regions.

Action 2: Automated dual-use risk monitoring

Purpose: Dual-use risks are often discovered only after harm occurs. I therefore propose embedding an AI monitoring system, similar to financial fraud detection or airport security screening, that flags suspicious biological queries or outputs.

Design: The actors are platform developers, cybersecurity teams, and independent ethics boards. The AI scans queries for risky patterns. High-risk activity triggers human review. Third parties audit the system regularly.

Assumptions: AI can accurately distinguish legitimate from harmful research. Researchers accept monitoring as a safety feature.

Risks of failure and success: Failure: false positives could block real research. Success risk: surveillance concerns or misuse of logs by authorities.

Action 3: Planetary protection & biodiversity compliance gateway

Purpose: Planetary protection and biodiversity laws exist but are not built into digital research tools. I therefore propose integrating legal and ethical compliance checks into the platform, similar to export control systems or medical ethics approvals.

Design: The actors can be environmental agencies, space organizations (NASA, COSPAR), universities, and governments. Users must declare the sample origin, the purpose, and the destination of the data. The system blocks projects that violate conservation or planetary protection rules. This requires global policy alignment.

Assumptions: Countries will share standards. Researchers will truthfully declare their intentions.

Risks of failure and success: Failure: users may falsify data. Success risk: strict rules may discourage open science.

Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 1 | 2 |
| • By helping respond | 2 | 1 | 2 |
| Foster Lab Safety | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 2 | 1 |
| Protect the environment | | | |
| • By preventing incidents | 2 | 2 | 1 |
| • By helping respond | 2 | 2 | 1 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 3 | 2 | 2 |
| • Feasibility? | 2 | 2 | 3 |
| • Not impede research | 2 | 2 | 3 |
| • Promote constructive applications | 1 | 1 | 2 |

• Option 1 - Ethical Licensing
• Option 2 - AI
• Option 3 - Compliance Gateway
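As a rough cross-check of the matrix, the scores can be summed per option. This is only a sketch: it assumes every criterion carries equal weight, which the rubric does not state, and the names `SCORES`, `OPTIONS`, and `total_scores` are made up for this illustration. Lower totals are better because 1 is the best score.

```python
# Scores copied from the matrix above (1 = best).
# Tuple order is (Option 1, Option 2, Option 3).
SCORES = {
    "Biosecurity: preventing incidents": (1, 1, 2),
    "Biosecurity: helping respond": (2, 1, 2),
    "Lab safety: preventing incidents": (1, 2, 2),
    "Lab safety: helping respond": (2, 2, 1),
    "Environment: preventing incidents": (2, 2, 1),
    "Environment: helping respond": (2, 2, 1),
    "Minimizing costs and burdens": (3, 2, 2),
    "Feasibility": (2, 2, 3),
    "Not impeding research": (2, 2, 3),
    "Promoting constructive applications": (1, 1, 2),
}
OPTIONS = ("Ethical Licensing", "AI", "Compliance Gateway")


def total_scores(scores):
    """Sum each option's column, assuming equal weights (an assumption)."""
    return [sum(column) for column in zip(*scores.values())]


totals = dict(zip(OPTIONS, total_scores(SCORES)))
print(totals)  # {'Ethical Licensing': 18, 'AI': 17, 'Compliance Gateway': 19}
```

With equal weights the three options land within two points of each other, so the ranking is sensitive to how the criteria are weighted.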

Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

Audience International research agencies, space agencies, and global science

Recommended strategy: a hybrid governance model

Based on the scoring matrix, I would prioritize a combination of Option 1 (ethical access licensing) and Option 2 (AI risk monitoring), with Option 3 (compliance gateway) implemented gradually as an international standard.

This hybrid model provides the strongest balance between biosecurity, lab safety, environmental protection, and scientific freedom.

Why this combination?

Ethical access licensing: This option scored best for preventing biosecurity and lab safety incidents. It ensures that only trained, verified users can access powerful features. Prevention is more ethical and cost-effective than response, and this system creates a culture of responsibility before harm can occur.

AI monitoring: This option scored highest for helping respond to threats and promoting constructive use. It acts like a biosafety firewall that adapts as new risks emerge. Even well-trained users can make mistakes, and this system provides continuous protection without requiring constant human oversight.

Compliance gateway: Although it scored lower for feasibility, it is essential for planetary protection and biodiversity ethics. It should be phased in through international agreements, because it requires legal alignment across nations and may slow innovation if enforced too early.

Homework Questions from Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Error Rate of DNA Polymerase

In biological DNA replication, the primary enzymatic machinery is an error-correcting DNA polymerase, which operates via template-dependent 5’-3’ primer extension, supplemented by 5’-3’ error-correcting exonuclease and 3’-5’ proofreading exonuclease activities. This system achieves an error rate of approximately 1:10^6 (one error per 10^6 nucleotides incorporated). This fidelity is attained at a throughput of 10 milliseconds per base addition, in stark contrast to chemical synthesis methods, which exhibit an error rate of 1:10^2 and lack inherent correction mechanisms.

Comparison to the Human Genome Length

The human genome is quantified in the slides as approximately 3.2 gigabase pairs (Gbp), equivalent to 3.2 × 10^9 base pairs. Applying the polymerase error rate of 1:10^6, a single replication cycle would theoretically introduce circa 3.2 × 10^3 errors (3.2 × 10^9 / 10^6). This disparity highlights a critical vulnerability: uncorrected errors at this scale could precipitate deleterious mutations, oncogenic transformations, or cellular inviability. The slides underscore biology’s adaptive advantage through a throughput-error rate product differential of ~10^8 relative to chemical approaches, facilitating the replication of extensive genomes with minimal disruption.

Biological Mechanisms for Error Mitigation

To reconcile this discrepancy, biological systems deploy multifaceted error-correction strategies, reducing the effective error rate to ~1:10^9 or lower in vivo.

Mechanisms include:

  1. Intra-synthetic Proofreading The 3’-5’ proofreading exonuclease excises mismatched nucleotides concurrently with polymerization.

  2. Post-incorporation Repair The 5’-3’ exonuclease activity enables excision and resynthesis of erroneous segments.

  3. Ancillary Repair Pathways Mismatch repair systems, such as the MutS complex (Lamers et al., 2000, as cited in the error correction section), perform post-replicative surveillance and rectification.

These processes render biological synthesis inherently “error-correcting,” in contrast to the open-loop paradigms of chemical methods (e.g., phosphoramidite cycles). Consequently, organisms can faithfully replicate large genomes, such as the 3.2 Gbp human genome, sustaining cellular and evolutionary viability where raw error rates would otherwise prove untenable.
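The arithmetic behind this answer can be sketched in a few lines of Python. The genome length and both error rates are the figures quoted above; the function and constant names are chosen only for this illustration.

```python
GENOME_BP = 3.2e9            # human genome length quoted above (base pairs)
RAW_ERROR_RATE = 1e-6        # polymerase with proofreading: 1 error per 10^6 bases
CORRECTED_ERROR_RATE = 1e-9  # effective in vivo rate after repair pathways


def expected_errors(genome_bp, error_rate):
    """Expected number of errors in one full copy of the genome."""
    return genome_bp * error_rate


raw = expected_errors(GENOME_BP, RAW_ERROR_RATE)
corrected = expected_errors(GENOME_BP, CORRECTED_ERROR_RATE)
print(f"Polymerase alone: ~{raw:.0f} errors per replication")
print(f"With repair pathways: ~{corrected:.1f} errors per replication")
```

Under these figures, the additional repair pathways take the expected load from a few thousand errors per copy down to a handful.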

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Number of Synonymous DNA Encodings for an Average Human Protein

The genetic code, characterized in the lecture slides as “Life’s Operating System,” consists of 64 codons specifying 20 amino acids and 3 termination signals, with degeneracy enabling multiple nucleotide triplets to encode identical amino acids. The slides indicate that an average human protein comprises 1,036 base pairs (bp), corresponding to approximately 345 codons (1,036 / 3 ≈ 345, excluding the stop codon).

The cardinality of synonymous DNA sequences for such a protein is contingent upon the amino acid composition, with individual residues encoded by 1–6 codons (e.g., 1 for methionine, 6 for leucine). Employing an average degeneracy of ~3 codons per amino acid (reflective of the code’s overall distribution), the theoretical number of encodings approximates 3^345, or on the order of 10^164 variants (log10(3^345) ≈ 164.6).
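A short Python sketch of this estimate follows. It shows both the average-degeneracy approximation used above and an exact count for a specific amino acid sequence, using the codon degeneracy of the standard genetic code. The names `DEGENERACY`, `log10_encodings`, and `count_encodings` are illustrative.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
DEGENERACY = {
    "M": 1, "W": 1,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}


def log10_encodings(n_codons, avg_degeneracy=3):
    """log10 of avg_degeneracy ** n_codons (the approximation used above)."""
    return n_codons * math.log10(avg_degeneracy)


def count_encodings(protein):
    """Exact number of synonymous DNA encodings for a one-letter protein string."""
    return math.prod(DEGENERACY[aa] for aa in protein)


print(f"3^345 is about 10^{log10_encodings(345):.1f}")
print(count_encodings("MAEFP"))  # 1 * 4 * 2 * 2 * 4 = 64
```

The exact count depends on composition, so a leucine-rich protein has far more encodings than a methionine-rich one of the same length.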

Practical Limitations Precluding Functionality of Many Encodings

  1. Organismal preferences for synonymous codons can attenuate translation efficiency, inducing ribosomal stalling or suboptimal tRNA utilization. Recoding discussions in the slides (e.g., for phage resistance) imply that non-preferred codons may engender expression failure in heterologous hosts.
  2. Sequences predisposed to deleterious folding can impede ribosomal procession or mRNA integrity. Illustrative cases from the slides depict minimum free energy (MFE) configurations at 25°C across GC contents of 10%, 50%, and 90%.
     • Low GC content (e.g., 10%) yields labile structures prone to degradation.
     • Elevated GC content (e.g., 90%) fosters hyperstable hairpins or loops, obstructing translation initiation.
     • These phenomena are governed by base-pairing free energies (A/T ≈ -1.2 kcal/mol; G/C ≈ -2.0 kcal/mol), with GC-rich motifs exacerbating folding propensity and hindering mRNA processing.
  3. Motifs susceptible to endonucleases, such as RNase III in Escherichia coli, precipitate premature mRNA degradation.
  4. Cryptic elements (e.g., promoters, terminators, or splice junctions) can disrupt transcriptional or post-transcriptional regulation.
  5. Gene assembly challenges, particularly for repetitive or GC-biased sequences, amplify inaccuracies in chemical or enzymatic synthesis.
  6. Factors such as tRNA abundance or cellular milieu can preclude functional proteogenesis, necessitating optimization for applications like pharmaceutical or biofuel production.

Homework Questions from Dr. LeProust

1. What’s the most commonly used method for oligo synthesis currently?

The most commonly used method for oligonucleotide (oligo) synthesis currently is solid-phase phosphoramidite chemistry. This approach, developed by Marvin Caruthers in 1981, involves a cyclic process of deblocking, coupling with a phosphoramidite nucleoside, capping unreacted sites, and oxidation, repeated for each nucleotide addition. It is performed on a solid support, such as controlled pore glass (CPG), enabling automated synthesis and high efficiency for short to medium-length oligos.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Synthesizing oligos longer than 200 nucleotides (nt) via direct chemical synthesis is challenging primarily due to the imperfect coupling efficiency in each cycle, typically around 98-99%. As the chain length increases, the yield of full-length product decreases exponentially according to the formula: yield ≈ (efficiency)^(n-1), where n is the number of nucleotides. For example, at 99% efficiency, the yield for a 200 nt oligo is approximately 0.99^199 ≈ 0.135 (13.5%), but for longer sequences, it drops significantly, leading to low yields, increased truncated products, and higher error rates from side reactions or depurination. Additionally, purification becomes more difficult, and secondary structures in long sequences can hinder synthesis.
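The yield formula is easy to tabulate. The sketch below evaluates yield ≈ (efficiency)^(n-1) for a few lengths at the two coupling efficiencies mentioned above; `full_length_yield` is a name chosen for this illustration.

```python
def full_length_yield(n_nt, efficiency=0.99):
    """Fraction of full-length product after n_nt - 1 coupling cycles."""
    return efficiency ** (n_nt - 1)


for efficiency in (0.98, 0.99):
    for n_nt in (50, 100, 200, 300):
        pct = 100 * full_length_yield(n_nt, efficiency)
        print(f"efficiency={efficiency:.2f}  n={n_nt:3d}  yield={pct:5.1f}%")
```

At 99% efficiency a 200 nt oligo comes out at about 13.5% full-length product, matching the figure above, and dropping to 98% efficiency cuts that to under 2%.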

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Direct oligo synthesis cannot produce a 2000 base pair (bp) gene because the length far exceeds the practical limits of chemical synthesis methods like phosphoramidite chemistry, where yields become negligible (e.g., 0.99^1999 ≈ 10^-8.7, essentially zero). Genes of this size are double-stranded and require error-free sequences, but direct synthesis accumulates errors and impurities exponentially. Instead, long genes are assembled from shorter oligos (typically 40-300 nt) using enzymatic methods like PCR-based assembly or ligation, as illustrated in classical gene synthesis protocols, to achieve high fidelity and yield.
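To make the assembly idea concrete, here is a minimal sketch of splitting a long sequence into short overlapping pieces that could each be synthesized directly. It is a simplification under stated assumptions: the function name `tile_oligos`, the 60 nt oligo length, and the 20 nt overlap are illustrative choices, and real assembly designs typically alternate strands and tune overlaps by melting temperature, which this sketch ignores.

```python
def tile_oligos(seq, oligo_len=60, overlap=20):
    """Split seq into top-strand oligos where neighbours share `overlap` bases."""
    if not 0 <= overlap < oligo_len:
        raise ValueError("overlap must be non-negative and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(seq[start:start + oligo_len])
        if start + oligo_len >= len(seq):
            break  # this oligo reaches the end of the sequence
        start += step
    return oligos


gene = "ACGT" * 500  # placeholder 2000 nt sequence, not a real gene
oligos = tile_oligos(gene)
# Dropping each later oligo's shared prefix rebuilds the original sequence.
rebuilt = oligos[0] + "".join(o[20:] for o in oligos[1:])
print(len(oligos), rebuilt == gene)
```

Each 60 nt piece sits comfortably in the length range where coupling yields are still usable, which is the reason assembly works where direct synthesis does not.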

Homework Question from George Church

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

What code would you suggest for AA:AA interactions?

Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own:

The 10 essential amino acids required by all animals (those that cannot be synthesized endogenously in sufficient quantities and must be obtained through the diet) are: arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. This list is consistent across various species, including mammals like dogs, pigs, and horses, though some animals (e.g., cats) have an additional requirement for taurine. Slide #4 from Prof. Church’s lecture illustrates the standard genetic code mapping RNA codons to these amino acids (plus others), highlighting the ribosomal translation process that relies on this code to incorporate them into proteins.

The “Lysine Contingency” refers to a fictional genetic failsafe in Jurassic Park, where cloned dinosaurs were engineered to lack the ability to synthesize lysine, forcing dependence on park-supplied supplements to prevent survival if they escaped. Knowing that lysine is one of the 10 essential amino acids reinforces why this approach is inherently flawed: in nature, animals routinely obtain essential amino acids (including lysine) from dietary sources like plants (e.g., beans, soy) or other animals, without needing to synthesize them. Escaped dinosaurs could simply forage or hunt for lysine-rich foods, rendering the contingency ineffective, as depicted in the story where they thrive on Isla Nublar. This illustrates the limitations of single-amino-acid dependency as a safety measure. More robust biocontainment, like full genome recoding to alter multiple codons (as in Syn61Δ3 bacteria), could create true barriers incompatible with wild-type biology, unlike the dietary workaround possible with lysine.

Week 2 HW: DNA Read, Write, and Edit


Part 0: Designing your Gel Art

Step 1

Step 2

Step 3

Step 4

EcoRI

HindIII

BamHI

KpnI

EcoRV

SacI

SalI

All restriction enzymes

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Part 3: DNA Design Challenge

3.1. Choose your protein. In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose. [Example from our group homework, you may notice the particular format — The example below came from UniProt] >sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL EAVIRTVTTLQQLLT

I chose Tau protein because it is a microtubule-associated protein that helps regulate neuronal microtubules, and it is strongly connected to neurodegenerative disease (tauopathies, including Alzheimer’s disease). Tau is also interesting computationally because it has multiple splice isoforms and large intrinsically disordered regions.

• UniProt accession: P10636
• Entry name: TAU_HUMAN
• Protein: Microtubule-associated protein tau
• Gene: MAPT

>sp|P10636|TAU_HUMAN Microtubule-associated protein tau OS=Homo sapiens OX=9606 GN=MAPT PE=1 SV=5 MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPG SETSDAKSTPTAEDVTAPLVDEGAPGKQAAAQPHTEIPEGTTAEEAGIGDTPSLEDEAAG HVTQEPESGKVVQEGFLREPGPPGLSHQLMSGMPGAPLLPEGPREATRQPSGTGPEDTEG GRHAPELLKHQLLGDLHQEGPPLKGAGGKERPGSKEEVDEDRDVDESSPQDSPPSKASPA QDGRPPQTAAREATSIPGFPAEGAIPLPVDFLSKVSTEIPASEPDGPSVGRAKGQDAPLE FTFHVEITPNVQKEQAHSEEHLGRAAFPGAPGEGPEARGPSLGEDTKEADLPEPSEKQPA AAPRGKPVSRVPQLKARMVSKSKDGTGSDDKKAKTSTRSSAKTLKNRPCLSPKHPTPGSS DPLIQPSSPAVCPEPPSSPKYVSSVTSRTGSSGAKEMKLKGADGKTKIATPRGAAPPGQK GQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREP KKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLD LSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEK LDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDT SPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence. The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above. [Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI]

Lysis protein DNA sequence atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa

CDS (5’ - 3’ 1 atggctgagc cccgccagga gttcgaagtg atggaagatc acgctgggac gtacgggttg 61 ggggacagga aagatcaggg gggctacacc atgcaccaag accaagaggg tgacacggac 121 gctggcctga aagaatctcc cctgcagacc cccactgagg acggatctga ggaaccgggc 181 tctgaaacct ctgatgctaa gagcactcca acagcggaag ctgaagaagc aggcattgga 241 gacaccccca gcctggaaga cgaagctgct ggtcacgtga cccaagagga gttgagagtt 301 ccgggccggc agaggaaggc gcctgaaagg cccctggcca atgagattag cgcccacgtc 361 cagcctggac cctgcggaga ggcctctggg gtctctgggc cgtgcctcgg ggagaaagag 421 ccagaagctc ccgtcccgct gaccgcgagc cttcctcagc accgtcccgt ttgcccagcg 481 cctcctccaa caggaggccc tcaggagccc tccctggagt ggggacaaaa aggcggggac 541 tgggccgaga agggtccggc ctttccgaag cccgccacca ctgcgtatct ccacacagag 601 cctgaaagtg gtaaggtggt ccaggaaggc ttcctccgag agccaggccc cccaggtctg 661 agccaccagc tcatgtccgg catgcctggg gctcccctcc tgcctgaggg ccccagagag 721 gccacacgcc aaccttcggg gacaggacct gaggacacag agggcggccg ccacgcccct 781 gagctgctca agcaccagct tctaggagac ctgcaccagg aggggccgcc gctgaagggg 841 gcagggggca aagagaggcc ggggagcaag gaggaggtgg atgaagaccg cgacgtcgat 901 gagtcctccc cccaagactc ccctccctcc aaggcctccc cagcccaaga tgggcggcct 961 ccccagacag ccgccagaga agccaccagc atcccaggct tcccagcgga gggtgccatc 1021 cccctccctg tggatttcct ctccaaagtt tccacagaga tcccagcctc agagcccgac 1081 gggcccagtg tagggcgggc caaagggcag gatgcccccc tggagttcac gtttcacgtg 1141 gaaatcacac ccaacgtgca gaaggagcag gcgcactcgg aggagcattt gggaagggct 1201 gcatttccag gggcccctgg agaggggcca gaggcccggg gcccctcttt gggagaggac 1261 acaaaagagg ctgaccttcc agagccctct gaaaagcagc ctgctgctgc tccgcggggg 1321 aagcccgtca gccgggtccc tcaactcaaa gctcgcatgg tcagtaaaag caaagacggg 1381 actggaagcg atgacaaaaa agccaagaca tccacacgtt cctctgctaa aaccttgaaa 1441 aataggcctt gccttagccc caaacacccc actcctggta gctcagaccc tctgatccaa 1501 ccctccagcc ctgctgtgtg cccagagcca ccttcctctc ctaaatacgt ctcttctgtc 1561 acttcccgaa ctggcagttc tggagcaaag gagatgaaac tcaagggggc tgatggtaaa 1621 acgaagatcg ccacaccgcg gggagcagcc cctccaggcc agaagggcca ggccaacgcc 1681 accaggattc 
cagcaaaaac cccgcccgct ccaaagacac cacccagctc tggtgaacct 1741 ccaaaatcag gggatcgcag cggctacagc agccccggct ccccaggcac tcccggcagc 1801 cgctcccgca ccccgtccct tccaacccca cccacccggg agcccaagaa ggtggcagtg 1861 gtccgtactc cacccaagtc gccgtcttcc gccaagagcc gcctgcagac agcccccgtg 1921 cccatgccag acctgaagaa tgtcaagtcc aagatcggct ccactgagaa cctgaagcac 1981 cagccgggag gcgggaaggt gcagataatt aataagaagc tggatcttag caacgtccag 2041 tccaagtgtg gctcaaagga taatatcaaa cacgtcccgg gaggcggcag tgtgcaaata 2101 gtctacaaac cagttgacct gagcaaggtg acctccaagt gtggctcatt aggcaacatc 2161 catcataaac caggaggtgg ccaggtggaa gtaaaatctg agaagcttga cttcaaggac 2221 agagtccagt cgaagattgg gtccctggac aatatcaccc acgtccctgg cggaggaaat 2281 aaaaagattg aaacccacaa gctgaccttc cgcgagaacg ccaaagccaa gacagaccac 2341 ggggcggaga tcgtgtacaa gtcgccagtg gtgtctgggg acacgtctcc acggcatctc 2401 agcaatgtct cctccaccgg cagcatcgac atggtagact cgccccagct cgccacgcta 2461 gctgacgagg tgtctgcctc cctggccaag cagggtttgt ga

3.3. Codon optimization. Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why? [Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI]

Lysis protein DNA sequence with Codon-Optimization ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA

Even though different codons can encode the same amino acid, organisms prefer certain codons. If a gene from one organism is expressed in a different host, the host may:
* Translate it slowly because the needed tRNAs are rare
* Have ribosome stalling, premature termination, or misfolding issues
* Produce much lower protein yield

I would optimize Tau (MAPT) for Escherichia coli because it is the most common, fast, and inexpensive expression host for recombinant proteins in teaching labs and basic cloning. It also has a strong codon bias relative to human genes, so optimization often makes a big difference.

The tool reports statistics for the original versus codon-optimized DNA sequence. The pasted sequence has a GC content of 61.19% and a CAI of 0.51, where CAI (Codon Adaptation Index) indicates how closely the codon usage matches the preferred codons of the chosen expression host. A value around 0.5 suggests only moderate adaptation and potentially less efficient translation. After optimization, the improved DNA keeps a very similar GC content (60.59%) but raises the CAI to 0.90, meaning the DNA was rewritten using synonymous codons that are much more favored by the host while still encoding the same protein. In addition, the avoid cleavage sites setting shows the optimized sequence was designed to remove internal restriction sites for BsaI and Esp3I (an isoschizomer of BsmBI), which helps prevent unwanted cutting during Type IIS cloning (such as Golden Gate assembly).
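Two of the checks described above, GC content and the scan for Type IIS recognition sites, are simple enough to sketch in Python. This is not how the optimization tool itself works; it is an independent illustration, and the function names are made up here. The recognition sequences used are BsaI (GGTCTC), BsmBI/Esp3I (CGTCTC), and BbsI (GAAGAC), and both strands are checked. CAI is omitted because it needs a host codon-usage reference table.

```python
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq):
    """Map enzyme name -> 0-based positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in TYPE_IIS_SITES.items():
        targets = (site, reverse_complement(site))
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] in targets]
        if positions:
            hits[name] = positions
    return hits


print(gc_content("ATGC"))        # 50.0
print(find_sites("AAGGTCTCAA"))  # {'BsaI': [2]}
print(find_sites("AAGAGACCAA"))  # {'BsaI': [2]} (site on the reverse strand)
```

Running `find_sites` on an optimized sequence and getting an empty dictionary is a quick sanity check that the avoid-sites setting did what it claimed.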

3.4. You have a sequence! Now what? What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Cell-dependent (in vivo) expression in E. coli

Tau protein can be produced by recombinant expression in living E. coli. The optimized tau coding sequence is cloned into a bacterial expression plasmid containing an E. coli promoter, a Shine-Dalgarno ribosome binding site (RBS), and a transcription terminator, often with an affinity tag such as a His-tag to aid purification. After transformation into E. coli and induction (with IPTG in lac/T7 systems), RNA polymerase transcribes the tau gene into mRNA. Ribosomes bind the RBS and translate the mRNA from the start codon to the stop codon as tRNAs add amino acids according to each codon. Because the sequence is codon-optimized (high CAI) for E. coli, codons better match abundant bacterial tRNAs, which can improve translation efficiency and increase protein yield. The expressed tau protein is then harvested from cells and purified, commonly by affinity chromatography.

Cell-free (in vitro) expression

Tau protein can also be produced using cell-free expression, where the tau DNA template (plasmid or linear) is added directly to an E. coli lysate or a purified transcription/translation system containing RNA polymerase, ribosomes, tRNAs, amino acids, nucleotides, and an energy regeneration mixture. In this setup, the DNA is transcribed into mRNA and translated into tau protein in vitro, without maintaining living cells. Cell-free systems are typically faster to set up and allow precise control of conditions (such as temperature and reaction composition), which can be useful for optimizing yield or expressing proteins that are difficult or toxic in vivo.

3.5. [Optional] How does it work in nature/biological systems?

Describe how a single gene codes for multiple proteins at the transcriptional level. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!!

In natural biological systems (especially eukaryotes), one gene can produce multiple protein isoforms because the primary transcript (pre-mRNA) can be processed in different ways before translation. The main transcription-level (and immediate RNA-processing) mechanisms are alternative promoter usage (different transcription start sites, giving different 5’ ends), alternative splicing (different combinations of exons retained or removed, giving different coding sequences), alternative polyadenylation (different 3’ ends that can change UTRs and sometimes coding regions), and, in some organisms, RNA editing (changing specific bases in the RNA, which can alter codons). Among these, alternative splicing is the most common explanation for “one gene -> many proteins”, because including or skipping specific exons changes the mRNA codons and therefore the amino acid sequence of the final protein.

For example:

GENE (coding strand DNA)
Exon 1:  ATG GCT
Exon 2:  GAA TTT
Exon 3:  CCT TAA

Pre‑mRNA (conceptually: exons + introns transcribed, then introns removed)
Mature mRNA Isoform A (Exon 1 + Exon 2 + Exon 3):
AUG GCU GAA UUU CCU UAA

Translation (codon → amino acid) Isoform A:
AUG  GCU  GAA  UUU  CCU  UAA
Met  Ala  Glu  Phe  Pro  Stop


Mature mRNA Isoform B (Exon 1 + Exon 3; exon 2 skipped):
AUG GCU CCU UAA

Translation Isoform B:
AUG  GCU  CCU  UAA
Met  Ala  Pro  Stop
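The alignment above can also be reproduced in a few lines of Python. This is a minimal sketch using only the three toy exons from the example; the codon table holds just the six codons that appear here, and the helper names are hypothetical.

```python
# Toy exons from the example above (coding-strand DNA)
EXONS = {1: "ATGGCT", 2: "GAATTT", 3: "CCTTAA"}

# Only the codons used in this example, not the full genetic code
CODON_TABLE = {
    "AUG": "Met", "GCU": "Ala", "GAA": "Glu",
    "UUU": "Phe", "CCU": "Pro", "UAA": "Stop",
}

def splice(exon_ids):
    """Join the chosen exons in order (introns are already removed here)."""
    return "".join(EXONS[i] for i in exon_ids)

def transcribe(coding_dna):
    """The mRNA matches the coding strand with T replaced by U."""
    return coding_dna.replace("T", "U")

def translate(mrna):
    """Read codons from the first base until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide

for name, exon_ids in [("Isoform A", [1, 2, 3]), ("Isoform B", [1, 3])]:
    mrna = transcribe(splice(exon_ids))
    print(name, mrna, "-".join(translate(mrna)))
```

Skipping exon 2 removes two whole codons, so the reading frame is preserved and only Glu and Phe are lost. Skipping an exon whose length is not a multiple of three would shift the frame instead.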

Part 4: Prepare a Twist DNA Synthesis Order

4.1. Create a Twist account and a Benchling account


4.2. Build Your DNA Insert Sequence

For example, let’s make a sequence that will make E. coli glow fluorescent green under UV light by constitutively (always) expressing sfGFP (a green fluorescent protein):

In Benchling, select New DNA/RNA sequence.


Give your insert sequence a name and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing).


Go through each piece of the given DNA sequences highlighted below (Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, Terminator) and paste the sequences into the Benchling file one after the other (replacing the coding sequence with your codon optimized DNA sequence of interest!). Each time you add a new piece of the sequence, make sure to annotate by right clicking over the sequence and creating an annotation that describes what each piece (e.g., Promoter, RBS, etc.) is (see image below).


Once you’ve completed this, click on Linear Map to preview the entire sequence. If you intend to have a TA review a sequence in the future, this is a good way to verify that all sections are annotated!

Linear Map After Annotated


This insert sequence you built is commonly referred to as an expression cassette in molecular biology (a sequence you can drop into any vector and it’ll perform its function). Go ahead and download the FASTA file for the sequence you made.


It’s helpful to visualize DNA designs using SBOL Canvas (Synthetic Biology Open Language) to convey your designs. Here’s an example of what you just annotated in Benchling:

4.3. On Twist, Select The “Genes” Option


4.4. Select “Clonal Genes” option

For this demonstration, we’ll choose Clonal Genes. You’ll select clonal genes or gene fragments depending on your final project.

Historically, HTGAA projects using clonal genes (circular DNA) have reached experimental results 1-2 weeks quicker because they can be transformed directly into E. coli without additional assembly.

Gene fragments (linear DNA) offer greater design flexibility but typically require an assembly or cloning step prior to transformation. An advantage is that, if designed with the appropriate exonuclease protection, gene fragments can be used directly in cell-free expression.


4.5. Import your sequence

You just took an amino acid sequence of interest and converted it into DNA, codon optimized it, and built an expression cassette around it! Choose the Nucleotide Sequence option and Upload Sequence File to upload your FASTA file.


4.6. Choose Your Vector

Since we’re ordering a clonal gene, you will need to refer to Twist’s Vector Catalog to choose your circular backbone. You can think of this as taking your linear expression cassette for your protein of interest, and completing the rest of the circle!

The backbone confers many special properties like antibiotic resistance, an origin of replication, and more. Discuss with your node to decide on appropriate antibiotic options. At MIT/Harvard, you can use Ampicillin, Chloramphenicol, or Kanamycin resistance.

Twist vectors do not contain restriction sites near the insert fragment, so make sure to flank your design with cut sites if you are intending to extract this DNA insert fragment later.
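As a rough illustration of flanking an insert with cut sites, the Python sketch below adds one site to each end and checks that each site occurs exactly once in the result, so a later digest releases the insert intact. EcoRI and HindIII are an illustrative choice rather than a requirement, the helper name `flank_insert` is hypothetical, and real designs usually add a few spacer bases outside each site so the enzymes cut efficiently.

```python
# Recognition sequences; both are palindromic, so one strand is enough to scan
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}

def flank_insert(insert, left="EcoRI", right="HindIII"):
    """Return left site + insert + right site, or raise if a site is reused."""
    insert = insert.upper()
    construct = SITES[left] + insert + SITES[right]
    for enzyme in (left, right):
        # These two motifs cannot overlap themselves, so str.count is exact
        hits = construct.count(SITES[enzyme])
        if hits != 1:
            raise ValueError(f"{enzyme} site occurs {hits} times; pick another enzyme")
    return construct

toy_insert = "ATGGCTGAATTTCCTTAA"  # toy sequence, not a real cassette
print(flank_insert(toy_insert))
```

If the check fails, the usual options are choosing a different enzyme or removing the internal site with a synonymous codon change.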

For this demonstration, choose a Twist cloning vector such as pTwist Amp High Copy.


Click into your sequence and select download construct (GenBank) to get the full plasmid sequence.

Go back to your Benchling account. Inside of a folder, click the import DNA/RNA sequence button and upload the GenBank file you just downloaded.


This is the plasmid I just built with my expression cassette included. Congratulations on building my first plasmid, My Self :-)

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I have selected the MAPT gene (Microtubule-Associated Protein Tau), which encodes the Tau protein, as the DNA sequence I would prioritize for sequencing. Located on chromosome 17q21 in humans, the MAPT gene spans approximately 134 kilobases and consists of 16 exons. Tau protein plays a critical role in stabilizing microtubules within neurons, facilitating axonal transport and maintaining neuronal structure. Sequencing this gene offers profound insights into human health, particularly neurodegenerative diseases, while also holding potential for broader applications in biobanking and bioinformatics.

Rationale for Sequencing the MAPT Gene

1. Relevance to Human Health and Disease Research

Alzheimer’s Disease (AD) - Mutations in MAPT or hyperphosphorylation of Tau lead to neurofibrillary tangles, a hallmark of AD pathology. Sequencing MAPT could identify genetic variants (e.g., single nucleotide polymorphisms like rs8070723) that increase susceptibility, enabling risk stratification and early intervention.

Frontotemporal Dementia (FTD) and Other Tauopathies - Over 50 pathogenic mutations in MAPT have been linked to FTD, progressive supranuclear palsy, and corticobasal degeneration. High-throughput sequencing could reveal novel variants, informing precision medicine approaches such as gene therapy or small-molecule inhibitors targeting Tau misfolding.

Tau dysfunction is implicated in conditions like chronic traumatic encephalopathy (CTE) from repetitive brain injuries. Comparative sequencing across populations could elucidate gene-environment interactions, such as the role of lifestyle factors in disease onset.

By sequencing MAPT, researchers could generate data for genome-wide association studies (GWAS), accelerating drug discovery. For instance, CRISPR-based editing of faulty MAPT sequences has shown promise in preclinical models, potentially translating to therapeutic applications.

2. Applications in Environmental Monitoring and Biodiversity

Analyzing MAPT orthologs in model organisms (e.g., mice, zebrafish) or even ancient DNA from extinct species (e.g., Neanderthals via paleogenomics) could provide evolutionary insights into brain development and resilience. This ties into biodiversity monitoring by highlighting conserved genetic elements across species, aiding conservation efforts in ecosystems affected by climate change.

In scenarios like wastewater analysis (e.g., eDNA sequencing), detecting human-derived MAPT fragments could serve as biomarkers for population health surveillance, such as tracking neurodegenerative disease prevalence in urban sewage systems.

3. Beyond Traditional Applications: DNA Data Storage and Biobanks

Integrating MAPT sequences into large-scale biobanks (e.g., the UK Biobank or All of Us Research Program) would create comprehensive datasets for AI-driven analysis. This could facilitate machine learning models predicting disease trajectories based on genetic, epigenetic, and phenotypic data.

Emerging technologies use synthetic DNA for high-density, long-term data storage. Studying the stable, repetitive structure of MAPT could inspire bioengineered storage systems, where natural gene motifs encode digital information with error-correcting capabilities. This represents a futuristic intersection of biology and information technology, potentially revolutionizing data archiving in an era of big data.

Methodological Considerations

Sequencing Techniques - I recommend next-generation sequencing (NGS) platforms like Illumina NovaSeq for high accuracy, supplemented by long-read technologies (e.g., PacBio) to resolve MAPT’s repetitive regions. Bioinformatics tools such as GATK for variant calling and AlphaFold for protein structure prediction would enhance data interpretation.

Interpreting non-coding regions and epigenetic modifications (e.g., methylation) requires interdisciplinary expertise. Future work could involve single-cell sequencing to capture neuronal heterogeneity.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions:

Is your method first-, second- or third-generation or other? How so? What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? What is the output of your chosen sequencing technology?

For sequencing the MAPT gene (encoding Tau protein), I would employ a hybrid approach combining Illumina NovaSeq (a second-generation sequencing platform) as the primary technology, supplemented by PacBio Sequel (a third-generation platform) for targeted regions.

Why Illumina NovaSeq? It is a high-throughput, short-read next-generation sequencing (NGS) system ideal for accurate, cost-effective sequencing of targeted genes like MAPT. With read lengths of 100–300 base pairs (bp) and error rates below 0.1%, it excels in variant detection, including single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) relevant to tauopathies. Its scalability suits biobank-scale projects, and costs have dropped to ~$0.01 per million bases, making it accessible for human health research. However, it struggles with highly repetitive or GC-rich regions in MAPT, which is why supplementation is needed.

Why Supplement with PacBio Sequel? This long-read technology provides reads up to 20–30 kb, enabling resolution of MAPT’s complex repetitive exons (e.g., exon 10 repeats linked to splicing isoforms). It is particularly useful for phasing haplotypes and detecting structural variants, which short-read methods might miss.

Illumina NovaSeq - This is a second-generation sequencing technology. Second-generation methods, also known as NGS, involve massively parallel sequencing of amplified DNA fragments, producing short reads (typically <500 bp) with high throughput. Unlike first-generation methods (e.g., Sanger sequencing, which is chain-termination based and low-throughput) or third-generation methods (e.g., single-molecule, long-read methods), it relies on amplification and reversible terminator chemistry for base-by-base synthesis and detection.

PacBio Sequel - This is a third-generation technology. It sequences single DNA molecules in real time without amplification, generating long reads directly from native DNA. This distinguishes it from second-generation methods by avoiding PCR bias and enabling direct observation of epigenetic modifications.

The input for both technologies is high-quality genomic DNA extracted from samples (e.g., blood, tissue, or cell lines relevant to MAPT studies, such as neuronal cells from Alzheimer’s patients). Preparation ensures the DNA is suitable for library construction, focusing on purity (A260/A280 ratio ~1.8–2.0) and quantity (typically 50–1000 ng).

Steps for Input Preparation (Illumina NovaSeq-Focused, with PacBio Notes)

DNA Extraction - Isolate genomic DNA using kits like Qiagen DNeasy (for blood/tissue) to obtain intact, high-molecular-weight DNA.

Quantification and Quality Check - Measure concentration (e.g., Qubit fluorometer) and integrity (e.g., agarose gel electrophoresis or Agilent Bioanalyzer).

Fragmentation - Shear DNA into 200–500 bp fragments using sonication (e.g., Covaris) or enzymatic methods (e.g., Nextera tagmentation) for Illumina. (For PacBio, minimal fragmentation is needed; aim for >10 kb fragments using gentle pipetting or a Megaruptor.)

End Repair and A-Tailing - Blunt-end fragments and add adenine overhangs using enzymes like T4 DNA polymerase and Klenow fragment to facilitate adapter ligation.

Adapter Ligation - Attach platform-specific adapters (e.g., Illumina TruSeq adapters with barcodes for multiplexing) using T4 DNA ligase. These include sequencing primers and indices.

PCR Amplification - Enrich the library via limited-cycle PCR (8–12 cycles) to amplify adapter-ligated fragments, introducing necessary sequences for clustering. (PacBio skips or minimizes PCR to avoid bias, using ligation-based library prep like SMRTbell adapters.)

Size Selection and Purification - Use magnetic beads (e.g., AMPure XP) to select optimal fragment sizes and remove contaminants.

Quality Control - Validate library via qPCR or Bioanalyzer to ensure concentration and fragment distribution.

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I propose synthesizing a modular genetic circuit designed to sense and respond to abnormal Tau protein aggregation, a hallmark of neurodegenerative diseases like Alzheimer’s disease (AD). This circuit would be based on a synthetic promoter-responsive system integrated with a reporter gene, drawing inspiration from synthetic biology tools like those used in biosensors. Building on my previous selection of the MAPT gene for sequencing (which encodes Tau), this synthesis shifts to “writing” DNA for therapeutic and diagnostic applications. The circuit could be delivered as plasmid DNA or mRNA for expression in neuronal cell models, enabling real-time monitoring of Tau pathology.

Neurodegenerative diseases affect over 50 million people globally, with AD alone projected to triple in prevalence by 2050. Abnormal Tau aggregation leads to neurofibrillary tangles, disrupting neuronal function. Synthesizing this circuit would allow for the creation of a biosensor that detects Tau misfolding in living cells, triggering a fluorescent reporter (e.g., GFP) or therapeutic response (e.g., expression of a chaperone protein to dissolve aggregates). This aligns with mRNA-based therapies, similar to COVID-19 vaccines, where synthetic sequences are used for targeted protein expression. It could accelerate drug screening by providing a high-throughput assay for anti-Tau compounds.

The circuit acts as a genetic sensor for intracellular “environmental” stimuli, such as protein misfolding or inflammation associated with tauopathies. It could be engineered into organoids or animal models to monitor disease progression in real-time, responding to stimuli like oxidative stress.

Extended versions could incorporate Tau-inspired structural proteins for biomaterials (e.g., microtubule-like scaffolds for tissue engineering). In a creative twist, it could inspire DNA origami art, where Tau motifs form nanoscale structures mimicking neuronal tangles for educational visualizations.

What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

What are the essential steps of your chosen sequencing methods? What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

I would use Twist Bioscience’s silicon-based DNA synthesis platform. Twist’s method achieves error rates as low as 1:2,000 bases through massively parallel synthesis of oligonucleotides (oligos) up to 300 nucleotides long, followed by error correction and assembly into longer genes. This is crucial for my circuit, which requires precise regulatory elements (e.g., promoter motifs) to ensure functionality in sensing Tau aggregation without off-target effects.

It supports synthesizing thousands of custom sequences in parallel on a single chip, making it ideal for iterative designs (e.g., variants of the GFP reporter or adding therapeutic modules). For my ~1.2 kb construct, Twist can deliver it as a gene fragment or as a clonal gene in a plasmid, with turnaround times of 5–10 business days.
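To see what a 1:2,000 error rate means for a ~1.2 kb construct, here is a back-of-the-envelope Python calculation. It assumes errors are independent and uniformly distributed, and it ignores any downstream error correction or clonal sequence verification, so it is a simplified model rather than a statement about what Twist actually delivers.

```python
import math

def error_free_fraction(length_nt, errors_per_base):
    """Probability that every base in a molecule is correct."""
    return (1.0 - errors_per_base) ** length_nt

def clones_needed(p_perfect, confidence=0.95):
    """Clones to screen so at least one is perfect with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_perfect))

p = error_free_fraction(1200, 1 / 2000)
print(f"error-free fraction: {p:.2f}")             # about 0.55
print(f"clones to screen: {clones_needed(p)}")     # 4 under these assumptions
```

Under this model roughly half of the raw molecules would be perfect, which is why sequence-verified clonal delivery is valuable for constructs with sensitive regulatory elements.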

Twist has experience with biotech tools, including circuits for drug discovery and biosensors. Their platform aligns with the assignment’s invitation to have Twist synthesize constructs, enabling real-world prototyping for Tau-related neurodegeneration research.

For ultra-long constructs (>10 kb, e.g., if expanding to a full synthetic genome fragment), enzymatic DNA synthesis (EDS) is an alternative. It uses template-independent DNA polymerases (such as terminal deoxynucleotidyl transferase) to build sequences de novo, avoiding chemical synthesis limitations like toxicity from phosphoramidite reagents. However, Twist’s method suffices for my modular circuit.

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

Tauopathies affect millions worldwide, with AD alone impacting over 50 million people and costing ~$1 trillion annually in healthcare. Mutations in MAPT (e.g., the P301L variant) disrupt Tau’s role in microtubule stabilization, leading to neurodegeneration, cognitive decline, and reduced lifespan. Editing these could prevent or reverse pathology, extending healthy lifespan by 5–10 years in at-risk individuals. This aligns with human augmentation goals, such as enhancing cognitive resilience against aging, without venturing into controversial “designer” traits.

MAPT is a high-impact target: It’s directly linked to preventable suffering, and editing it builds on existing research (e.g., preclinical CRISPR studies in mouse models showing reduced Tau aggregates). Unlike broader genome edits (e.g., for polygenic traits like intelligence), MAPT offers a focused, monogenic intervention with clearer paths to clinical trials. It also ties into my prior choices, creating a cohesive “read-write-edit” pipeline for Tau-related biotech.

Extend edits to model organisms like mice or zebrafish, engineering MAPT orthologs to study evolutionary conservation of brain health. This could inform de-extinction efforts (e.g., editing revived mammoth genomes for neural resilience in changing climates) or animal restoration (e.g., protecting endangered species from stress-induced neurodegeneration).

Inspired by MAPT’s role in cellular structure, edit plant genomes for analogous traits, such as enhancing microtubule-associated proteins in crops (e.g., editing maize genes for better drought resistance or nitrogen fixation efficiency). This could improve food security by creating resilient, high-yield plants without GMOs’ ethical baggage.

Editing MAPT prioritizes equity—targeting diseases disproportionately affecting aging populations in underserved communities. It avoids eugenics concerns by focusing on therapy, adhering to guidelines like those from the WHO on genome editing. Long-term, it could democratize longevity, but I’d advocate for global access to prevent exacerbating inequalities.

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps? What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing? What are the limitations of your editing methods (if any) in terms of efficiency or precision?

CRISPR-Cas9 enables targeted double-strand breaks (DSBs) at specific MAPT loci, while base editing allows single-base changes (e.g., C-to-T for correcting point mutations) without DSBs, reducing risks like indels. Prime editing offers “search-and-replace” functionality for precise insertions/deletions, ideal for MAPT’s repetitive regions linked to tauopathies.

It’s cost-effective (~$100–$500 per experiment), widely available, and scalable for high-throughput applications (e.g., editing neuronal cell lines or crop genomes). Preclinical studies (e.g., in AD mouse models) have shown CRISPR reducing Tau aggregates by 50–70%, making it suitable for human health, longevity, and augmentation goals.

For human therapeutics, it’s somatic-cell focused to avoid germline ethics issues. In agriculture/conservation, multiplex CRISPR (editing multiple sites) can enhance traits like nitrogen fixation in plants or neural resilience in animals. It’s more efficient than older methods like ZFNs (zinc-finger nucleases) and has a proven track record in FDA-approved therapies (e.g., Casgevy for sickle cell disease).

For MAPT regions with high off-target potential (e.g., repetitive sequences), TALENs provide modular DNA-binding domains for greater specificity, though they’re more labor-intensive. This hybrid approach ensures robustness for complex edits.

CRISPR-Cas9 edits DNA by using a guide RNA (gRNA) to direct the Cas9 enzyme to a specific sequence, where it creates a DSB. Variants like base editing fuse Cas9 with deaminases for single-base changes without breaks, while prime editing uses a modified Cas9 with reverse transcriptase for precise rewriting. Below, I focus on prime editing as the advanced method for MAPT (e.g., correcting mutations), with notes on standard Cas9.

Steps for Prime Editing (Primary Method)

  1. Targeting - The prime editing guide RNA (pegRNA) hybridizes to the target MAPT DNA sequence, recruiting the prime editor (a nicking Cas9 fused to reverse transcriptase).

  2. Nicking - The Cas9 domain creates a single-strand nick at the target site (e.g., exon 10 of MAPT), exposing the DNA for editing without a full DSB.

  3. Reverse Transcription - The pegRNA serves as a template; reverse transcriptase copies the desired edit (e.g., correcting P301L by replacing a single nucleotide) onto the nicked strand.

  4. Flap Resolution and Ligation - Cellular machinery resolves the edited flap, incorporating the change via endogenous repair pathways (e.g., flap endonuclease removes the old sequence, and ligase seals the new one).

  5. Integration - The edit is permanently incorporated into the genome, with minimal disruption.

How It Edits DNA - It rewrites DNA like a word processor, searching for the target via gRNA complementarity, then precisely replacing or inserting bases using the pegRNA template. This achieves ~20–50% efficiency for point edits, higher than standard Cas9 (~10–30% for HDR-based repairs).

For standard CRISPR-Cas9 - It induces DSBs, repaired by non-homologous end joining (NHEJ, for knockouts) or homology-directed repair (HDR, for precise insertions using a donor template). Base editing deaminates bases (e.g., C to U, then T) without breaks.

What Preparation Do You Need to Do (e.g., Design Steps) and What Is the Input (e.g., DNA Template, Enzymes, Plasmids, Primers, Guides, Cells) for the Editing?

Design Steps (Preparation)

Target Selection - Analyze MAPT sequence (e.g., via NCBI or Ensembl) to identify edit sites (e.g., rs63751273 for P301L). Use tools like CRISPOR or Benchling to predict gRNA efficacy and off-target scores.

gRNA/pegRNA Design - Design 20-nt gRNAs complementary to the target, with a PAM site (NGG for Cas9). For prime editing, extend to pegRNA with edit template (e.g., 10–20 nt replacement sequence). Optimize for GC content (40–60%) and specificity.
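The first screening pass of this design step can be sketched in Python: scan a sequence for 20-nt protospacers followed by an NGG PAM and keep those with 40–60% GC. This is a simplified illustration that scans only the forward strand and does no off-target scoring; the demo sequence is made up (not MAPT), and `find_guides` is a hypothetical helper, not part of CRISPOR or Benchling.

```python
import re

def gc_percent(seq):
    """Percentage of G and C bases."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def find_guides(seq, guide_len=20, gc_min=40.0, gc_max=60.0):
    """Return (start, protospacer, PAM, GC%) for forward-strand NGG sites."""
    seq = seq.upper()
    # The lookahead lets overlapping candidates all be reported
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for match in pattern.finditer(seq):
        spacer, pam = match.group(1), match.group(2)
        gc = gc_percent(spacer)
        if gc_min <= gc <= gc_max:
            hits.append((match.start(), spacer, pam, gc))
    return hits

demo = "ATGCATGCATGCATGCATGC" + "TGG" + "AAAA"  # made-up sequence
for hit in find_guides(demo):
    print(hit)
```

A fuller version would also scan the reverse complement (where the PAM appears as CCN upstream of the target) and pass the surviving candidates to an off-target predictor.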

Donor Template Design (if using HDR) - Create a single-stranded DNA oligo or plasmid with homology arms (~500 bp) flanking the desired edit.

Off-Target Prediction - Run in silico analyses (e.g., Cas-OFFinder) to select low-risk designs; validate with GUIDE-seq.

Delivery System Design - Choose vectors (e.g., plasmids for transfection, AAV for in vivo) and cell-type optimizations (e.g., lipid nanoparticles for neurons).

Inputs for Editing

DNA Template/Donor - Single- or double-stranded DNA oligo (50–200 nt) or plasmid carrying the repair sequence (e.g., wild-type MAPT exon for correction).

Enzymes - Cas9 protein (or prime editor fusion), often as ribonucleoprotein (RNP) complexes for direct delivery.

Plasmids/Vectors - pSpCas9 plasmid expressing Cas9 and gRNA, or all-in-one vectors like Addgene’s PE2 for prime editing.

Primers/Guides - Synthetic gRNA/pegRNA (e.g., from IDT, 100–150 nt for pegRNA) and PCR primers for verification.

Cells/Organisms - Target cells (e.g., HEK293 or iPSC-derived neurons for human models; plant protoplasts for agriculture). For in vivo, animal models like MAPT-mutant mice.

Other - Transfection reagents (e.g., Lipofectamine), antibiotics for selection, and media for cell culture.

These are assembled into an editing cocktail (e.g., RNP + donor DNA) and introduced via electroporation, transfection, or injection.

What Are the Limitations of Your Editing Methods (If Any) in Terms of Efficiency or Precision?

Efficiency - Editing rates vary (10–50% for prime editing, lower in primary cells like neurons due to delivery barriers). HDR efficiency is low (~1–10%) in non-dividing cells, favoring error-prone NHEJ and indels. For MAPT, repetitive regions can reduce targeting success.

Impact: Low efficiency may require multiple rounds or enrichment (e.g., FACS sorting), increasing time and cost. In agriculture, plant regeneration from edited cells can be inefficient (20–40% success).

Mitigation: Use prime/base editing (DSB-free, higher efficiency) or multiplex gRNAs; optimize delivery (e.g., AAV for 70–90% transduction in brain tissues).

Precision - Off-target effects (1–5% of edits) can occur at similar sequences genome-wide, potentially causing unintended mutations or cancer risks in therapeutics. Prime editing is more precise but can introduce rare “bystander” edits. MAPT’s homology to other genes (e.g., MAP2) heightens this risk.

Impact: Reduces reliability for clinical use; ethical concerns for human augmentation or conservation (e.g., unintended ecological effects in edited species).

Mitigation: Advanced variants like high-fidelity Cas9 (e.g., SpCas9-HF1) cut off-targets by 90%; combine with TALENs for dual validation. Post-editing sequencing (e.g., whole-genome) confirms precision.

Week 3 HW: Lab Automation

Your task this week is to Create a Python file to run on an Opentrons liquid handling robot.

Review this week’s recitation and this week’s lab for details on the Opentrons and programming it. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons. You may use AI assistance for this coding — Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept. If you’re a proficient programmer and you’d rather code something mathematical or algorithmic instead of using your GUI coordinates, you may do that instead.
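Before writing the robot moves, it can help to sanity-check a coordinate list and plan how many dots each aspiration will cover. The sketch below is plain Python, not the Opentrons API. The 40 mm usable radius, 1 µL per dot, and 20 µL pipette capacity are assumptions for illustration, and `within_radius` and `plan_batches` are hypothetical helper names.

```python
import math

def within_radius(points, radius_mm=40.0):
    """True if every (x, y) offset from the plate center fits inside the radius."""
    return all(math.hypot(x, y) <= radius_mm for x, y in points)

def plan_batches(points, dot_volume_ul=1.0, max_volume_ul=20.0):
    """Split points into groups that one full aspiration can cover."""
    per_batch = int(max_volume_ul // dot_volume_ul)
    return [points[i:i + per_batch] for i in range(0, len(points), per_batch)]

# A few points in the same format as a GUI-exported coordinate list
sample_points = [(-9.9, 38.5), (-7.7, 38.5), (-5.5, 38.5), (-38.5, 9.9)]

if within_radius(sample_points):
    for batch in plan_batches(sample_points, max_volume_ul=2.0):
        # On the robot: aspirate len(batch) * dot volume, then dispense per point
        print(len(batch), batch)
```

In an actual protocol, each batch would map to one aspirate call followed by one dispense per point, with a fresh tip for each color.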

**Coordinates**
azurite_points = [(-9.9, 38.5),(-7.7, 38.5),(-5.5, 38.5),(-3.3, 38.5),(-1.1, 38.5),(1.1, 38.5),(3.3, 38.5),(5.5, 38.5),(7.7, 38.5),(9.9, 38.5),(-16.5, 36.3),(-14.3, 36.3),(-12.1, 36.3),(-9.9, 36.3),(-7.7, 36.3),(-5.5, 36.3),(-1.1, 36.3),(1.1, 36.3),(3.3, 36.3),(5.5, 36.3),(7.7, 36.3),(9.9, 36.3),(12.1, 36.3),(14.3, 36.3),(16.5, 36.3),(-20.9, 34.1),(-18.7, 34.1),(-16.5, 34.1),(-14.3, 34.1),(-12.1, 34.1),(-9.9, 34.1),(-7.7, 34.1),(-5.5, 34.1),(3.3, 34.1),(12.1, 34.1),(14.3, 34.1),(16.5, 34.1),(18.7, 34.1),(20.9, 34.1),(-23.1, 31.9),(-20.9, 31.9),(-18.7, 31.9),(-16.5, 31.9),(-14.3, 31.9),(-12.1, 31.9),(-9.9, 31.9),(-7.7, 31.9),(-5.5, 31.9),(3.3, 31.9),(12.1, 31.9),(14.3, 31.9),(16.5, 31.9),(18.7, 31.9),(20.9, 31.9),(23.1, 31.9),(-25.3, 29.7),(-23.1, 29.7),(-20.9, 29.7),(-18.7, 29.7),(-16.5, 29.7),(-14.3, 29.7),(-12.1, 29.7),(-9.9, 29.7),(-7.7, 29.7),(-5.5, 29.7),(-3.3, 29.7),(-1.1, 29.7),(1.1, 29.7),(3.3, 29.7),(9.9, 29.7),(12.1, 29.7),(14.3, 29.7),(16.5, 29.7),(18.7, 29.7),(20.9, 29.7),(23.1, 29.7),(25.3, 29.7),(-27.5, 27.5),(-25.3, 27.5),(-23.1, 27.5),(-3.3, 27.5),(-1.1, 27.5),(1.1, 27.5),(3.3, 27.5),(5.5, 27.5),(7.7, 27.5),(9.9, 27.5),(12.1, 27.5),(14.3, 27.5),(16.5, 27.5),(18.7, 27.5),(20.9, 27.5),(23.1, 27.5),(25.3, 27.5),(27.5, 27.5),(-29.7, 25.3),(-27.5, 25.3),(-25.3, 25.3),(-23.1, 25.3),(-16.5, 25.3),(-14.3, 25.3),(-9.9, 25.3),(-1.1, 25.3),(1.1, 25.3),(3.3, 25.3),(5.5, 25.3),(7.7, 25.3),(9.9, 25.3),(12.1, 25.3),(14.3, 25.3),(16.5, 25.3),(18.7, 25.3),(20.9, 25.3),(23.1, 25.3),(25.3, 25.3),(27.5, 25.3),(29.7, 25.3),(-31.9, 23.1),(-29.7, 23.1),(-27.5, 23.1),(-25.3, 23.1),(-23.1, 23.1),(-16.5, 23.1),(-14.3, 23.1),(-7.7, 23.1),(-5.5, 23.1),(1.1, 23.1),(3.3, 23.1),(5.5, 23.1),(7.7, 23.1),(9.9, 23.1),(12.1, 23.1),(14.3, 23.1),(16.5, 23.1),(18.7, 23.1),(20.9, 23.1),(23.1, 23.1),(25.3, 23.1),(27.5, 23.1),(-34.1, 20.9),(-31.9, 20.9),(-29.7, 20.9),(-27.5, 20.9),(-25.3, 20.9),(-23.1, 20.9),(-16.5, 20.9),(-14.3, 20.9),(-9.9, 20.9),(-7.7, 20.9),(-5.5, 20.9),(-3.3, 
20.9),(-1.1, 20.9),(3.3, 20.9),(5.5, 20.9),(7.7, 20.9),(9.9, 20.9),(12.1, 20.9),(14.3, 20.9),(16.5, 20.9),(18.7, 20.9),(20.9, 20.9),(23.1, 20.9),(25.3, 20.9),(-34.1, 18.7),(-31.9, 18.7),(-29.7, 18.7),(-27.5, 18.7),(-25.3, 18.7),(-23.1, 18.7),(-16.5, 18.7),(-14.3, 18.7),(-9.9, 18.7),(-7.7, 18.7),(-5.5, 18.7),(-3.3, 18.7),(-1.1, 18.7),(3.3, 18.7),(5.5, 18.7),(7.7, 18.7),(9.9, 18.7),(12.1, 18.7),(14.3, 18.7),(16.5, 18.7),(18.7, 18.7),(20.9, 18.7),(23.1, 18.7),(25.3, 18.7),(-36.3, 16.5),(-34.1, 16.5),(-31.9, 16.5),(-29.7, 16.5),(-27.5, 16.5),(-25.3, 16.5),(-23.1, 16.5),(-16.5, 16.5),(-14.3, 16.5),(-12.1, 16.5),(-9.9, 16.5),(-7.7, 16.5),(-5.5, 16.5),(-3.3, 16.5),(-1.1, 16.5),(1.1, 16.5),(5.5, 16.5),(7.7, 16.5),(9.9, 16.5),(12.1, 16.5),(14.3, 16.5),(16.5, 16.5),(18.7, 16.5),(20.9, 16.5),(36.3, 16.5),(-36.3, 14.3),(-34.1, 14.3),(-31.9, 14.3),(-29.7, 14.3),(-27.5, 14.3),(-25.3, 14.3),(-23.1, 14.3),(-16.5, 14.3),(-14.3, 14.3),(-12.1, 14.3),(-9.9, 14.3),(-7.7, 14.3),(-5.5, 14.3),(-3.3, 14.3),(-1.1, 14.3),(1.1, 14.3),(3.3, 14.3),(9.9, 14.3),(12.1, 14.3),(14.3, 14.3),(31.9, 14.3),(34.1, 14.3),(36.3, 14.3),(-36.3, 12.1),(-34.1, 12.1),(-31.9, 12.1),(-29.7, 12.1),(-27.5, 12.1),(-25.3, 12.1),(-23.1, 12.1),(-16.5, 12.1),(-14.3, 12.1),(-12.1, 12.1),(-9.9, 12.1),(-7.7, 12.1),(-5.5, 12.1),(-3.3, 12.1),(-1.1, 12.1),(1.1, 12.1),(3.3, 12.1),(25.3, 12.1),(29.7, 12.1),(31.9, 12.1),(34.1, 12.1),(36.3, 12.1),(-38.5, 9.9),(-27.5, 9.9),(-25.3, 9.9),(-20.9, 9.9),(-18.7, 9.9),(-16.5, 9.9),(-14.3, 9.9),(-3.3, 9.9),(-1.1, 9.9),(1.1, 9.9),(18.7, 9.9),(20.9, 9.9),(29.7, 9.9),(31.9, 9.9),(34.1, 9.9),(36.3, 9.9),(-38.5, 7.7),(-25.3, 7.7),(-20.9, 7.7),(-18.7, 7.7),(-14.3, 7.7),(16.5, 7.7),(29.7, 7.7),(31.9, 7.7),(34.1, 7.7),(36.3, 7.7),(38.5, 7.7),(-38.5, 5.5),(-25.3, 5.5),(-20.9, 5.5),(-18.7, 5.5),(-14.3, 5.5),(16.5, 5.5),(29.7, 5.5),(31.9, 5.5),(34.1, 5.5),(36.3, 5.5),(38.5, 5.5),(-38.5, 3.3),(-20.9, 3.3),(-18.7, 3.3),(-14.3, 3.3),(18.7, 3.3),(20.9, 3.3),(31.9, 3.3),(34.1, 3.3),(36.3, 3.3),(38.5, 
3.3),(-38.5, 1.1),(-1.1, 1.1),(1.1, 1.1),(31.9, 1.1),(34.1, 1.1),(36.3, 1.1),(38.5, 1.1),(-38.5, -1.1),(-16.5, -1.1),(1.1, -1.1),(3.3, -1.1),(5.5, -1.1),(7.7, -1.1),(36.3, -1.1),(38.5, -1.1),(-38.5, -3.3),(-20.9, -3.3),(-18.7, -3.3),(-16.5, -3.3),(1.1, -3.3),(16.5, -3.3),(36.3, -3.3),(-16.5, -5.5),(14.3, -5.5),(25.3, -5.5),(-16.5, -7.7),(14.3, -7.7),(25.3, -7.7),(-38.5, -9.9),(-36.3, -9.9),(-34.1, -9.9),(-31.9, -9.9),(-29.7, -9.9),(-27.5, -9.9),(-25.3, -9.9),(-23.1, -9.9),(-20.9, -9.9),(-18.7, -9.9),(-16.5, -9.9),(-14.3, -9.9),(-12.1, -9.9),(-9.9, -9.9),(1.1, -9.9),(3.3, -9.9),(5.5, -9.9),(7.7, -9.9),(9.9, -9.9),(12.1, -9.9),(14.3, -9.9),(16.5, -9.9),(23.1, -9.9),(25.3, -9.9),(27.5, -9.9),(31.9, -9.9),(34.1, -9.9),(36.3, -9.9),(-36.3, -12.1),(-34.1, -12.1),(-31.9, -12.1),(-29.7, -12.1),(-27.5, -12.1),(-25.3, -12.1),(-23.1, -12.1),(-20.9, -12.1),(-18.7, -12.1),(-16.5, -12.1),(-14.3, -12.1),(-12.1, -12.1),(-9.9, -12.1),(-1.1, -12.1),(1.1, -12.1),(3.3, -12.1),(5.5, -12.1),(7.7, -12.1),(9.9, -12.1),(12.1, -12.1),(14.3, -12.1),(16.5, -12.1),(23.1, -12.1),(36.3, -12.1),(-36.3, -14.3),(-34.1, -14.3),(-31.9, -14.3),(-29.7, -14.3),(-27.5, -14.3),(-25.3, -14.3),(-23.1, -14.3),(-20.9, -14.3),(-18.7, -14.3),(-16.5, -14.3),(-14.3, -14.3),(-12.1, -14.3),(-7.7, -14.3),(-5.5, -14.3),(1.1, -14.3),(3.3, -14.3),(5.5, -14.3),(7.7, -14.3),(9.9, -14.3),(12.1, -14.3),(14.3, -14.3),(16.5, -14.3),(25.3, -14.3),(27.5, -14.3),(31.9, -14.3),(34.1, -14.3),(-36.3, -16.5),(-34.1, -16.5),(-31.9, -16.5),(-29.7, -16.5),(-27.5, -16.5),(-25.3, -16.5),(-23.1, -16.5),(-20.9, -16.5),(-18.7, -16.5),(-16.5, -16.5),(-12.1, -16.5),(-9.9, -16.5),(-7.7, -16.5),(-5.5, -16.5),(-3.3, -16.5),(1.1, -16.5),(3.3, -16.5),(5.5, -16.5),(7.7, -16.5),(9.9, -16.5),(12.1, -16.5),(14.3, -16.5),(16.5, -16.5),(18.7, -16.5),(20.9, -16.5),(25.3, -16.5),(27.5, -16.5),(29.7, -16.5),(31.9, -16.5),(34.1, -16.5),(-34.1, -18.7),(-31.9, -18.7),(-29.7, -18.7),(-25.3, -18.7),(-23.1, -18.7),(-20.9, -18.7),(-18.7, -18.7),(-14.3, 
-18.7),(-12.1, -18.7),(-9.9, -18.7),(-7.7, -18.7),(-5.5, -18.7),(-3.3, -18.7),(-1.1, -18.7),(1.1, -18.7),(3.3, -18.7),(5.5, -18.7),(7.7, -18.7),(9.9, -18.7),(12.1, -18.7),(14.3, -18.7),(16.5, -18.7),(23.1, -18.7),(27.5, -18.7),(29.7, -18.7),(-34.1, -20.9),(-31.9, -20.9),(-29.7, -20.9),(-25.3, -20.9),(-23.1, -20.9),(-20.9, -20.9),(-18.7, -20.9),(-14.3, -20.9),(-12.1, -20.9),(-9.9, -20.9),(-7.7, -20.9),(-5.5, -20.9),(-3.3, -20.9),(-1.1, -20.9),(1.1, -20.9),(3.3, -20.9),(5.5, -20.9),(7.7, -20.9),(9.9, -20.9),(12.1, -20.9),(14.3, -20.9),(16.5, -20.9),(23.1, -20.9),(27.5, -20.9),(29.7, -20.9),(-31.9, -23.1),(-29.7, -23.1),(-27.5, -23.1),(-25.3, -23.1),(-16.5, -23.1),(-12.1, -23.1),(-9.9, -23.1),(-7.7, -23.1),(-5.5, -23.1),(-3.3, -23.1),(-1.1, -23.1),(1.1, -23.1),(5.5, -23.1),(7.7, -23.1),(23.1, -23.1),(27.5, -23.1),(29.7, -23.1),(-29.7, -25.3),(-27.5, -25.3),(-23.1, -25.3),(-20.9, -25.3),(-18.7, -25.3),(-16.5, -25.3),(-14.3, -25.3),(-12.1, -25.3),(-9.9, -25.3),(-7.7, -25.3),(-5.5, -25.3),(-1.1, -25.3),(1.1, -25.3),(3.3, -25.3),(5.5, -25.3),(7.7, -25.3),(9.9, -25.3),(14.3, -25.3),(16.5, -25.3),(18.7, -25.3),(20.9, -25.3),(23.1, -25.3),(25.3, -25.3),(27.5, -25.3),(-25.3, -27.5),(-23.1, -27.5),(-20.9, -27.5),(-18.7, -27.5),(-16.5, -27.5),(-14.3, -27.5),(-12.1, -27.5),(-9.9, -27.5),(-7.7, -27.5),(-5.5, -27.5),(-3.3, -27.5),(-1.1, -27.5),(1.1, -27.5),(3.3, -27.5),(5.5, -27.5),(7.7, -27.5),(9.9, -27.5),(12.1, -27.5),(14.3, -27.5),(18.7, -27.5),(20.9, -27.5),(23.1, -27.5),(25.3, -27.5),(-25.3, -29.7),(-23.1, -29.7),(-20.9, -29.7),(-18.7, -29.7),(-12.1, -29.7),(-9.9, -29.7),(-7.7, -29.7),(-5.5, -29.7),(-3.3, -29.7),(-1.1, -29.7),(1.1, -29.7),(3.3, -29.7),(5.5, -29.7),(7.7, -29.7),(9.9, -29.7),(12.1, -29.7),(14.3, -29.7),(16.5, -29.7),(18.7, -29.7),(20.9, -29.7),(23.1, -29.7),(-20.9, -31.9),(-18.7, -31.9),(-14.3, -31.9),(-12.1, -31.9),(-9.9, -31.9),(-7.7, -31.9),(-5.5, -31.9),(-3.3, -31.9),(-1.1, -31.9),(1.1, -31.9),(3.3, -31.9),(5.5, -31.9),(7.7, -31.9),(9.9, -31.9),(12.1, 
-31.9),(14.3, -31.9),(16.5, -31.9),(-20.9, -34.1),(-18.7, -34.1),(-14.3, -34.1),(-12.1, -34.1),(-9.9, -34.1),(-7.7, -34.1),(-5.5, -34.1),(-3.3, -34.1),(-1.1, -34.1),(1.1, -34.1),(3.3, -34.1),(5.5, -34.1),(7.7, -34.1),(9.9, -34.1),(12.1, -34.1),(14.3, -34.1),(16.5, -34.1),(-14.3, -36.3),(-12.1, -36.3),(-9.9, -36.3),(-7.7, -36.3),(-5.5, -36.3),(-3.3, -36.3),(-1.1, -36.3),(1.1, -36.3),(3.3, -36.3),(5.5, -36.3),(7.7, -36.3),(9.9, -36.3),(12.1, -36.3),(-3.3, -38.5),(-1.1, -38.5),(1.1, -38.5)]
mturquoise2_points = [(-3.3, 36.3),(-3.3, 34.1),(-1.1, 34.1),(1.1, 34.1),(5.5, 34.1),(7.7, 34.1),(9.9, 34.1),(-3.3, 31.9),(-1.1, 31.9),(1.1, 31.9),(5.5, 31.9),(7.7, 31.9),(9.9, 31.9),(5.5, 29.7),(7.7, 29.7),(-20.9, 27.5),(-18.7, 27.5),(-16.5, 27.5),(-14.3, 27.5),(-12.1, 27.5),(-9.9, 27.5),(-7.7, 27.5),(-5.5, 27.5),(-20.9, 25.3),(-18.7, 25.3),(-12.1, 25.3),(-7.7, 25.3),(-5.5, 25.3),(-3.3, 25.3),(-20.9, 23.1),(-18.7, 23.1),(-12.1, 23.1),(-9.9, 23.1),(-3.3, 23.1),(-1.1, 23.1),(-20.9, 20.9),(-18.7, 20.9),(-12.1, 20.9),(1.1, 20.9),(-20.9, 18.7),(-18.7, 18.7),(-12.1, 18.7),(1.1, 18.7),(-20.9, 16.5),(-18.7, 16.5),(3.3, 16.5),(-20.9, 14.3),(-18.7, 14.3),(5.5, 14.3),(7.7, 14.3),(-20.9, 12.1),(-18.7, 12.1),(5.5, 12.1),(7.7, 12.1),(-36.3, 9.9),(-34.1, 9.9),(-31.9, 9.9),(-29.7, 9.9),(-23.1, 9.9),(-12.1, 9.9),(-9.9, 9.9),(-7.7, 9.9),(-5.5, 9.9),(27.5, 9.9),(38.5, 9.9),(-36.3, 7.7),(-16.5, 7.7),(-36.3, 5.5),(-16.5, 5.5),(-36.3, 3.3),(-25.3, 3.3),(-16.5, 3.3),(12.1, 3.3),(14.3, 3.3),(29.7, 3.3),(-36.3, 1.1),(3.3, 1.1),(16.5, 1.1),(18.7, 1.1),(20.9, 1.1),(23.1, 1.1),(25.3, 1.1),(-36.3, -1.1),(-29.7, -1.1),(-14.3, -1.1),(-1.1, -1.1),(16.5, -1.1),(31.9, -1.1),(34.1, -1.1),(-36.3, -3.3),(-12.1, -3.3),(-9.9, -3.3),(9.9, -3.3),(23.1, -3.3),(25.3, -3.3),(38.5, -3.3),(-34.1, -5.5),(-31.9, -5.5),(-29.7, -5.5),(-27.5, -5.5),(-25.3, -5.5),(-23.1, -5.5),(-20.9, -5.5),(-18.7, -5.5),(-14.3, -5.5),(-12.1, -5.5),(-9.9, -5.5),(-7.7, -5.5),(-5.5, -5.5),(-3.3, -5.5),(-1.1, -5.5),(12.1, -5.5),(16.5, -5.5),(18.7, -5.5),(20.9, -5.5),(23.1, -5.5),(27.5, -5.5),(29.7, -5.5),(31.9, -5.5),(34.1, -5.5),(36.3, -5.5),(38.5, -5.5),(-34.1, -7.7),(-31.9, -7.7),(-29.7, -7.7),(-27.5, -7.7),(-25.3, -7.7),(-23.1, -7.7),(-20.9, -7.7),(-18.7, -7.7),(-14.3, -7.7),(-12.1, -7.7),(-9.9, -7.7),(-7.7, -7.7),(-5.5, -7.7),(-3.3, -7.7),(-1.1, -7.7),(12.1, -7.7),(16.5, -7.7),(18.7, -7.7),(20.9, -7.7),(23.1, -7.7),(27.5, -7.7),(29.7, -7.7),(31.9, -7.7),(34.1, -7.7),(36.3, -7.7),(38.5, -7.7),(-7.7, -9.9),(-5.5, -9.9),(18.7, 
-9.9),(20.9, -9.9),(29.7, -9.9),(38.5, -9.9),(18.7, -12.1),(20.9, -12.1),(25.3, -12.1),(27.5, -12.1),(29.7, -12.1),(31.9, -12.1),(34.1, -12.1),(-1.1, -14.3),(18.7, -14.3),(20.9, -14.3),(23.1, -14.3),(29.7, -14.3),(36.3, -14.3),(-1.1, -16.5),(23.1, -16.5),(-27.5, -18.7),(18.7, -18.7),(20.9, -18.7),(25.3, -18.7),(31.9, -18.7),(34.1, -18.7),(-27.5, -20.9),(18.7, -20.9),(20.9, -20.9),(25.3, -20.9),(31.9, -20.9),(34.1, -20.9),(-14.3, -23.1),(3.3, -23.1),(9.9, -23.1),(12.1, -23.1),(14.3, -23.1),(16.5, -23.1),(18.7, -23.1),(20.9, -23.1),(25.3, -23.1),(-3.3, -25.3),(12.1, -25.3),(16.5, -27.5),(-16.5, -29.7),(-14.3, -29.7),(-23.1, -31.9),(-16.5, -31.9),(18.7, -31.9),(20.9, -31.9),(-16.5, -34.1),(18.7, -34.1),(20.9, -34.1),(-16.5, -36.3),(14.3, -36.3),(-9.9, -38.5),(-7.7, -38.5),(-5.5, -38.5),(3.3, -38.5),(5.5, -38.5),(7.7, -38.5)]
mko2_points = [(29.7, 23.1),(27.5, 20.9),(27.5, 18.7),(23.1, 16.5),(16.5, 14.3),(27.5, 14.3),(23.1, 12.1),(27.5, 12.1),(3.3, 9.9),(16.5, 9.9),(-3.3, 7.7),(18.7, 7.7),(20.9, 7.7),(-3.3, 5.5),(18.7, 5.5),(20.9, 5.5),(1.1, 3.3),(16.5, 3.3),(-20.9, 1.1),(-18.7, 1.1),(-16.5, 1.1),(-20.9, -1.1),(-18.7, -1.1),(3.3, -3.3),(5.5, -3.3),(7.7, -3.3),(-38.5, -5.5),(1.1, -5.5),(-38.5, -7.7),(1.1, -7.7),(-3.3, -9.9),(-1.1, -9.9),(-7.7, -12.1),(-5.5, -12.1),(-9.9, -14.3),(-14.3, -16.5),(-16.5, -18.7),(-16.5, -20.9),(-23.1, -23.1),(-20.9, -23.1),(-18.7, -23.1),(-25.3, -25.3),(-27.5, -27.5)]
sfgfp_points = [(31.9, 23.1),(29.7, 20.9),(31.9, 20.9),(34.1, 20.9),(29.7, 18.7),(31.9, 18.7),(34.1, 18.7),(25.3, 16.5),(27.5, 16.5),(29.7, 16.5),(18.7, 14.3),(20.9, 14.3),(23.1, 14.3),(25.3, 14.3),(14.3, 12.1),(16.5, 12.1),(5.5, 9.9),(7.7, 9.9),(14.3, 9.9),(1.1, 7.7),(1.1, 5.5),(-3.3, 3.3),(-14.3, 1.1)]
mrfp1_points = [(31.9, 16.5),(34.1, 16.5),(29.7, 14.3),(9.9, 12.1),(12.1, 12.1),(18.7, 12.1),(20.9, 12.1),(23.1, 9.9),(25.3, 9.9),(-1.1, 7.7),(9.9, 7.7),(23.1, 7.7),(-1.1, 5.5),(9.9, 5.5),(23.1, 5.5),(-1.1, 3.3),(-12.1, 1.1),(-9.9, 1.1),(14.3, 1.1),(9.9, -1.1),(-29.7, -3.3),(-27.5, -3.3),(-36.3, -5.5),(3.3, -5.5),(-36.3, -7.7),(3.3, -7.7),(-3.3, -12.1)]
From Python

Post-Lab Questions

One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.

1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Title: “Automated, high-throughput derivation, characterization and differentiation of induced pluripotent stem cells”

Authors: Tristan X. McKay, et al.

Publication: Nature Methods, 2018

DOI: 10.1038/s41592-018-0081-0

Introduction

This paper describes the use of the Opentrons platform, an open-source liquid handling robot, to automate the derivation, characterization, and differentiation of induced pluripotent stem cells (iPSCs). The researchers developed a high-throughput pipeline to handle the labor-intensive and repetitive tasks associated with iPSC culture, which traditionally require significant manual effort and are prone to human error. By leveraging the Opentrons system, they achieved novel biological applications in stem cell research, enabling scalable and reproducible experiments that were previously challenging to perform at such a scale.

Applications of Opentrons

  1. Automated iPSC Derivation: The Opentrons robot was programmed to perform the repetitive tasks of reprogramming somatic cells into iPSCs, including media changes, passaging, and the addition of reprogramming factors. This automation reduced variability and increased throughput, allowing the simultaneous processing of multiple cell lines.

  2. High-Throughput Characterization: The system automated the staining and imaging preparation for characterizing iPSC pluripotency markers. This enabled consistent application of antibodies and reagents across hundreds of samples, facilitating large-scale validation of pluripotency.

  3. Differentiation Protocols: The platform was used to automate the differentiation of iPSCs into various lineages (e.g., neural, cardiac, and hepatic cells) by precisely controlling the timing and dosage of differentiation factors. This precision is critical for reproducibility in differentiation experiments.

  4. Scalability: The automation allowed the researchers to work in plate formats up to 96 wells, significantly increasing the number of experiments that could be conducted in parallel compared to manual methods.

Novelty in Biological Applications

  1. Scale and Reproducibility: Prior to this work, iPSC research was limited by the manual nature of cell culture, which restricted the scale of experiments and introduced variability. The use of Opentrons enabled the generation and analysis of hundreds of iPSC lines in a single run, providing a robust dataset for studying genetic and environmental factors affecting reprogramming efficiency.

  2. Personalized Medicine: The high-throughput approach facilitated the creation of patient-specific iPSC lines for disease modeling and drug screening, advancing personalized medicine by making it feasible to test therapeutic responses across diverse genetic backgrounds.

  3. Integration with Analytics: The automated workflow was coupled with downstream high-content imaging and data analysis, creating an end-to-end pipeline that minimized human intervention and maximized data consistency.

Technical Details

  1. Customization: The researchers customized Opentrons protocols using Python scripts to handle specific tasks such as gentle cell passaging (to avoid damaging delicate iPSCs) and precise liquid handling for small volumes.

  2. Hardware: The Opentrons OT-2 platform was used, equipped with single-channel and multi-channel pipettes to manage various plate formats (e.g., 6-well to 96-well plates).

  3. Validation: The automated protocols were validated against manual methods, showing comparable or superior cell viability and pluripotency marker expression, with significantly reduced variability (standard deviation in marker expression reduced by ~30%).

Impact and Significance

This application of Opentrons represents a significant advancement in stem cell research by addressing key bottlenecks in scalability and reproducibility. It demonstrates how automation can transform labor-intensive biological workflows into high-throughput systems, enabling researchers to tackle complex questions in regenerative medicine and drug discovery.

The open-source nature of Opentrons also allowed the team to share their protocols, fostering collaboration and further innovation in the field.

Limitations and Future Directions

  1. Complexity of Protocols: While many tasks were automated, certain steps (e.g., initial cell isolation) still required manual intervention because the robot handles non-standard labware and delicate procedures poorly.

  2. Cost and Accessibility: Although Opentrons is relatively affordable compared to other automation systems, the initial setup cost and need for programming expertise may limit adoption in smaller labs.

  3. Future Work: The authors suggest integrating machine learning to optimize differentiation protocols dynamically and expanding automation to other cell types or 3D culture systems.

Conclusion

This paper showcases the transformative potential of automation tools like Opentrons in biological research. By automating iPSC workflows, the researchers not only increased experimental throughput but also enhanced the reliability of their results, paving the way for broader applications in personalized medicine and large-scale biological studies. This work serves as a model for how accessible automation can democratize advanced research techniques.

2. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

Example 1: You are creating a custom fabric, and want to deposit art onto specific parts that need to be intertwined in odd ways. You can design a 3D printed holder to attach this fabric to it, and be able to deposit bio art on top. Check out the Opentrons 3D Printing Directory.

Example 2: You are using the cloud laboratory to screen an array of biosensor constructs that you design, synthesize, and express using cell-free protein synthesis.

  1. Echo transfer biosensor constructs and any required cofactors into specified wells.

  2. Bravo stamp in CFPS reagent master mix into all wells of a 96-well / 384-well plate.

  3. Multiflo dispense the CFPS lysate to all wells to start protein expression.

  4. PlateLoc seal the plate.

  5. Inheco incubate the plate at 37°C while the biosensor proteins are synthesized.

  6. XPeel remove the seal.

  7. PHERAstar measure fluorescence to compare biosensor responses.

Project Idea 1: Custom Fabric with Bio-Art Deposition Using Opentrons

This project explores the intersection of biology and art by depositing biological pigments or living cells onto a fabric substrate held in a custom 3D-printed holder.

Plan

Fabric Holder Design: Design a 3D-printed holder to secure the fabric in a stable, flat position on the Opentrons deck. The holder will have adjustable clamps to accommodate different fabric sizes and keep the fabric under tension for precise deposition. I will explore the Opentrons 3D Printing Directory for compatible designs and modify them using CAD software like Fusion 360 to fit my needs (e.g., adding slots for fabric weaving in odd patterns as described in Example 1).

Automation with Opentrons: Program the Opentrons OT-2 to deposit bio-inks (e.g., bacterial cultures expressing fluorescent proteins) onto specific coordinates of the fabric. The robot will use a single-channel pipette to handle small volumes (1-5 µL) for fine patterns.

Workflow:

  1. Secure fabric in the 3D-printed holder and place it on the Opentrons deck (Slot 5, similar to agar plate setup).

  2. Load bio-inks into a 96-well plate on the temperature module (Slot 6) to maintain viability.

  3. Define deposition coordinates in a Python script (similar to the art design script provided earlier).

  4. Execute the protocol to deposit bio-inks in intricate, intertwined patterns.

  5. Incubate the fabric post-deposition to allow bacterial growth or pigment expression.

Example Pseudocode for Opentrons:
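A minimal sketch of the planning step, written in plain Python so it runs without the Opentrons library. It assumes each bio-ink has a list of (x, y) offsets from the holder centre (assumed to be in mm, in the same form as the coordinate lists in my art script) and groups the droplets into aspirate/dispense batches. The function name `plan_deposition`, the default drop volume, and the 20 µL pipette capacity are my own assumptions for illustration, not part of any Opentrons API.

```python
# Hypothetical planning helper: no Opentrons import, so it runs anywhere.
# Each point is an (x, y) offset from the centre of the fabric holder.

def plan_deposition(points, drop_ul=2.0, max_aspirate_ul=20.0):
    """Group droplet positions into aspirate/dispense batches for one bio-ink."""
    if not 0 < drop_ul <= max_aspirate_ul:
        raise ValueError("drop volume must be positive and fit in one aspiration")
    # How many droplets one aspiration can supply (assumed pipette capacity).
    drops_per_load = int(max_aspirate_ul // drop_ul)
    steps = []
    for start in range(0, len(points), drops_per_load):
        batch = points[start:start + drops_per_load]
        # Aspirate exactly what this batch needs, then dispense drop by drop.
        steps.append(("aspirate", round(drop_ul * len(batch), 3)))
        steps.extend(("dispense", drop_ul, x, y) for x, y in batch)
    return steps


# Example: the first three sfgfp points, 2 µL drops, a small 5 µL load limit.
demo_points = [(31.9, 23.1), (29.7, 20.9), (31.9, 20.9)]
for step in plan_deposition(demo_points, drop_ul=2.0, max_aspirate_ul=5.0):
    print(step)
```

In a real protocol, each planned step would map onto the pipette's aspirate and dispense calls, with the (x, y) offset applied to the holder's centre location on Slot 5, the same way the agar plate art script does it.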

Novelty: This approach allows for the creation of living textiles with programmable biological patterns, potentially for wearable art or biosensing fabrics that change color in response to environmental stimuli.

Project Idea 2: Biosensor Construct Screening Using Ginkgo Nebula Cloud Laboratory

This project screens an array of biosensor constructs that I design. The constructs will be synthesized and expressed using cell-free protein synthesis (CFPS) in a high-throughput manner.

Plan

Design Phase: Computationally design biosensor constructs (DNA sequences encoding fluorescent proteins linked to analyte-binding domains) using tools like Benchling or custom Python scripts for sequence optimization.

Automation Workflow with Ginkgo Nebula

  1. Echo: Transfer biosensor DNA constructs and required cofactors (e.g., inducers) into specified wells of a 384-well plate.

  2. Bravo: Stamp CFPS reagent master mix into all wells to provide necessary components for protein expression.

  3. Multiflo: Dispense CFPS lysate to initiate protein synthesis across all wells.

  4. PlateLoc: Seal the plate to prevent contamination.

  5. Inheco: Incubate the plate at 37°C for 2-4 hours to allow biosensor protein synthesis.

  6. XPeel: Remove the seal post-incubation.

  7. PHERAstar: Measure fluorescence intensity to evaluate biosensor responses to target analytes (e.g., comparing signal-to-noise ratios across constructs).
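Before the Echo step, I need to decide which construct goes into which well. A minimal sketch of a plate-map generator for a 384-well plate (rows A-P, columns 1-24), filling wells row by row with a fixed number of replicates per construct. The function name `build_plate_map`, the replicate count, and the construct names are hypothetical placeholders. This is plain Python and does not call any Echo or Nebula interface.

```python
from itertools import product
from string import ascii_uppercase

ROWS = ascii_uppercase[:16]   # rows A-P of a 384-well plate
COLS = range(1, 25)           # columns 1-24


def build_plate_map(constructs, replicates=3):
    """Assign each construct a run of consecutive wells, filling row by row."""
    wells = [f"{row}{col}" for row, col in product(ROWS, COLS)]
    if len(constructs) * replicates > len(wells):
        raise ValueError("not enough wells for this many constructs and replicates")
    well_iter = iter(wells)
    return {name: [next(well_iter) for _ in range(replicates)] for name in constructs}


# Placeholder construct names, including a no-DNA control (my assumption).
plate_map = build_plate_map(["sensor_01", "sensor_02", "no_dna_control"])
for name, wells in plate_map.items():
    print(name, wells)
```

The same dictionary could later be reused to match PHERAstar readings back to constructs.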

Data Analysis: Use Ginkgo Nebula’s integrated analytics to identify top-performing biosensor constructs based on fluorescence data, sensitivity, and specificity.
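As a fallback to the platform's analytics, the ranking could also be done on exported readings. The sketch below uses a simple signal-to-background ratio (mean fluorescence with analyte divided by mean fluorescence without analyte) as a stand-in for the signal-to-noise comparison. The data layout, the function name, and the numbers are illustrative assumptions, not real measurements.

```python
from statistics import mean


def rank_by_signal_to_background(readings):
    """Rank constructs by mean(with analyte) / mean(without analyte)."""
    ratios = {}
    for name, wells in readings.items():
        background = mean(wells["without_analyte"])
        if background <= 0:
            raise ValueError(f"non-positive background for {name}")
        ratios[name] = mean(wells["with_analyte"]) / background
    # Highest fold-change first.
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Made-up fluorescence values, for illustration only.
example = {
    "sensor_01": {"with_analyte": [900, 1100], "without_analyte": [480, 520]},
    "sensor_02": {"with_analyte": [2900, 3100], "without_analyte": [950, 1050]},
}
for name, ratio in rank_by_signal_to_background(example):
    print(f"{name}: {ratio:.2f}x over background")
```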

Novelty: This high-throughput screening approach accelerates biosensor development for applications in environmental monitoring or medical diagnostics, leveraging cloud automation to test hundreds of variants simultaneously without manual intervention.

Project Idea 3: Computational Design of a Spaceflight Microbial Risk Surveillance Gene Panel

Develop a computational tool to analyze microbial data from space environments (e.g., ISS datasets) and select a concise panel of “warning genes” for early detection of dangerous microbial changes in closed environments.

Plan

Data Collection: Access public microbial genomic data from the ISS or NASA databases.

Algorithm Development: Write a Python script that will:

  1. Parse microbial genomes and identify genes associated with virulence, antibiotic resistance, or stress adaptation using databases like CARD or VFDB.

  2. Rank genes based on prevalence, risk level, and detectability (e.g., unique sequences for qPCR assays).

  3. Output a small list (e.g., 10-20 genes) for surveillance.
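The ranking step above can be sketched as a weighted score. This assumes each candidate gene has already been given three values between 0 and 1 (prevalence, risk, detectability); how those values are derived from CARD or VFDB annotations is not shown. The weights, field names, and gene labels are placeholders I chose for illustration.

```python
def rank_warning_genes(genes, weights=(0.4, 0.4, 0.2), panel_size=10):
    """Return the top genes by a weighted sum of prevalence, risk, detectability."""
    w_prev, w_risk, w_detect = weights

    def score(gene):
        return (w_prev * gene["prevalence"]
                + w_risk * gene["risk"]
                + w_detect * gene["detectability"])

    ranked = sorted(genes, key=score, reverse=True)
    return [(gene["name"], round(score(gene), 3)) for gene in ranked[:panel_size]]


# Placeholder candidates; the numbers are not real data.
candidates = [
    {"name": "gene_A", "prevalence": 0.9, "risk": 0.3, "detectability": 0.8},
    {"name": "gene_B", "prevalence": 0.4, "risk": 0.9, "detectability": 0.6},
    {"name": "gene_C", "prevalence": 0.2, "risk": 0.2, "detectability": 0.9},
]
for name, value in rank_warning_genes(candidates, panel_size=2):
    print(name, value)
```

A weighted sum is the simplest choice. If one criterion should act as a hard filter (for example, no unique qPCR target), it would be better to drop those genes before scoring.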

Integration with Automation: Design Opentrons protocols to automate qPCR setup for detecting these genes in environmental samples collected from closed environments.

Novelty: Early detection of microbial risks in spaceflight or other confined settings (hospitals, submarines) can prevent outbreaks, and automating detection with Opentrons makes testing faster and more reproducible.

Project Idea 4: In-Silico Pipeline for Hazard-Associated Motif Detection and Safer Redesign

Build a software tool to scan DNA/protein sequences for biosafety risks (e.g., toxin-like domains) and suggest safer redesigns while maintaining functionality.

Plan

Software Development: Create a Python-based pipeline using Biopython and HMMER to:

  1. Scan sequences against databases like Pfam or ToxinDB for risky motifs.

  2. Assign risk scores based on motif matches and context (e.g., secretion signals).

  3. Suggest redesigns (e.g., removing risky domains or substituting them with inactive variants, followed by codon optimization of the redesigned sequence).
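The scoring step could start as simply as the sketch below. It assumes the scan has already produced a list of motif category labels for a sequence. The category names, their weights, and the 1.5x multiplier for a secretion signal are hypothetical values chosen for illustration, not taken from Pfam, HMMER, or any published scoring scheme.

```python
# Hypothetical category weights; real values would need expert review.
CATEGORY_WEIGHTS = {"toxin_like": 0.6, "resistance_marker": 0.3}


def risk_score(hit_categories, has_secretion_signal=False):
    """Combine distinct motif categories into a score capped at 1.0."""
    # Count each category once, however many hits it produced.
    base = sum(CATEGORY_WEIGHTS.get(category, 0.0) for category in set(hit_categories))
    if has_secretion_signal:
        base *= 1.5  # assumed context multiplier
    return min(round(base, 3), 1.0)


print(risk_score(["toxin_like"]))
print(risk_score(["toxin_like"], has_secretion_signal=True))
```

Sequences above a chosen threshold would then be passed to the redesign step and flagged for manual review.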

Automation Integration: Use Opentrons to automate synthesis validation by preparing redesigned sequences for cloning or expression testing.

Novelty: This tool enhances biosafety in synthetic biology by proactively identifying and mitigating risks in engineered sequences, with automation making validation scalable.

Project Idea 5: SangerScope - Mutation Detection and Assay Design Tool

Develop SangerScope, a software tool for detecting disease-relevant mutations from Sanger sequencing data and designing validation assays.

Plan

Software Features:

  1. Parse .ab1 files and assess sequence quality.

  2. Align sequences to reference genes (e.g., HBB for thalassemia) using Biopython.

  3. Detect SNPs/indels and report in HGVS format with confidence scores.

  4. Design PCR/Sanger primers, checking for Tm, GC content, and polymorphism risks.
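The primer checks in the last feature can be sketched with two small functions. Tm here uses the Wallace rule (2 °C per A/T plus 4 °C per G/C), which is only a rough estimate for short oligos, so a nearest-neighbor method would be a better choice in the real tool. The acceptance ranges and the example sequences are arbitrary assumptions, not actual HBB primers.

```python
def gc_content(seq):
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_passes(seq, gc_range=(0.40, 0.60), tm_range=(52, 64)):
    """Check a primer against assumed GC and Tm windows."""
    gc_ok = gc_range[0] <= gc_content(seq) <= gc_range[1]
    tm_ok = tm_range[0] <= wallace_tm(seq) <= tm_range[1]
    return gc_ok and tm_ok


# Arbitrary 20-mers for illustration.
for primer in ["ATGCATGCATGCATGCATGC", "ATATATATATATATATATAT"]:
    print(primer, gc_content(primer), wallace_tm(primer), primer_passes(primer))
```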

Automation with Opentrons: Automate primer validation by setting up PCR reactions to confirm detected mutations.

Example Workflow: Upload Sanger trace → Tool identifies mutation → Designs primers → Opentrons prepares PCR mix for confirmation.

Novelty: SangerScope provides an end-to-end solution for mutation detection and assay design, streamlining clinical diagnostics with automation for high-throughput validation.

Conclusion and Use of Ginkgo Nebula

Across these projects, I plan to use Ginkgo Nebula for high-throughput tasks like biosensor screening (Project 2) and potentially for sequence synthesis in Projects 3-5. Nebula’s cloud automation will allow me to offload repetitive wet-lab tasks (e.g., pipetting, incubation, measurement) while focusing on data analysis and design optimization. For Opentrons-based projects (1, 3, 4, 5), I will use custom protocols and 3D-printed hardware to adapt the platform to unique substrates or workflows, as inspired by the recitation slide deck’s emphasis on flexibility in lab automation.