Homework

Weekly homework submissions:

  • Week 1 HW.1: Class assignment

    1. Describe an application. Identify a biological engineering tool or application you wish to develop and explain your motivation. I would like to develop a way to make plants grow 100x faster. I find this a very interesting and ambitious question. Perhaps you could reverse-engineer the genome, morphological development and constraints, and the proteins/enzymes/catalysts for growth. Perhaps you could design a separate organism (two bacteria?) which produces biomass: a combination of a carbon sequesterer and a cellulose printer. Perhaps you could attempt to design a minimal artificial cell, like a Xenobot / JCVI minimal cells: using new AI design software, you create a minimal genome/DNA, design your own morphological topology through simulation, and compile it down to gene regulatory networks (GRNs), transcription factors/thresholds, and DNA.
  • Week 1 HW.2: Lecture prep for W2

    Answer prep questions from three faculty members: Homework Questions from Professor Jacobson: Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy? Error rate refers to errors per nucleotide added per replication. An error could be a misincorporation (the wrong base inserted opposite a template base), for example.

  • Week 1 HW.3: Setup your website

    CHECK IT OUT https://pages.htgaa.org/2026a/liam-edwards-playne/

  • Week 2 HW.1: Benchling & In-silico Gel Art

    Make a free account at benchling.com, Import the Lambda DNA. Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI HindIII BamHI KpnI EcoRV SacI SalI Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. Benchling screenshots. Experimental design for Gel art.

  • Week 2 HW.2: Gel Art - Restriction Digests and Gel Electrophoresis

    In the wet-lab perform the lab experiment you designed in Part 1 and outlined in this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis”. N/A - no access to BioClub Tokyo Lab.

  • Week 2 HW.3: DNA Design Challenge

    3.1. Choose your protein. In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose. Miraculin - https://rest.uniprot.org/uniprotkb/P13087.fasta https://rest.uniprot.org/uniprotkb/P13087.txt

    >sp|P13087|MIRA_SYNDU Miraculin OS=Synsepalum dulcificum OX=3743 PE=1 SV=3
    MKELTMLSLSFFFVSALLAAAANPLLSAADSAPNPVLDIDGEKLRTGTNYYIVPVLRDHG
    GGLTVSATTPNGTFVCPPRVVQTRKEVDHDRPLAFFPENPKEDVVRVSTDLNINFSAFMP
    CRWTSSTVWRLDKYDESTGQYFVTIGGVKGNPGPETISSWFKIEEFCGSGFYKLVFCPTV
    CGSCKVKCGDVGIYIDQKGRRRLALSDKPFAFEFNKTVYF

    3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence. Using https://www.bioinformatics.org/sms2/rev_trans.html:

  • Week 2 HW.4: Twist DNA Synthesis Order

    Steps to build a plasmid: Import DNA into Benchling. Add promoter, RBS, start/stop codons, 7x His Tag, and terminator. Export .fasta and import into Twist. Order Twist clonal gene, using pTwist Amp High Copy vector. Export .gb (genbank) file for plasmid. Import plasmid .gb file into Benchling, open Info>Topology and set Circular.

  • Week 2 HW.5: DNA Read/Write/Edit

    DNA Read (i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank). No idea. Possibly my basil plant.

  • Week 3 HW.1: Python Script for Opentrons Artwork

    Review recitation materials and lab documentation. Design artwork using the GUI at opentrons-art.rcdonovan.com. Write a Python script using coordinates from the GUI via the “HTGAA26 Opentrons Colab”. Sign up for a robot time slot and run the script on the Opentrons robot. Submit Python file via provided form. Artwork Design Python Script

  • Week 3 HW.2: Post-Lab Reflection

    2.1. Find and describe a published paper utilizing Opentrons or similar liquid handling automation tools. The paper I have found: Slowpoke: An Automated Golden Gate Cloning Workflow for Opentrons OT‑2 and Flex. Slowpoke is a tool which generates Opentrons protocols for DNA assembly. DNA assembly is used to build longer DNA sequences than can be synthesised in one go, by joining together smaller DNA parts.

  • Week 3 HW.3: Final Project Ideas

    Submit 1–3 slides with three individual project concept ideas.

  • Week 4 HW

    Part A. Conceptual Questions Answer any NINE of the following questions from Shuguang Zhang:

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Meat is roughly 20% protein by weight, so 0.2 × 500 g = 100 g of protein. Protein is essentially the only source of amino acids in meat, as carbs are sugars, and fats are triglycerides (fatty acids + glycerol).
  • Week 5 HW

    Part A. SOD1 Binder Peptide Design Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

  • Week 6 HW

    What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? A Phusion HF PCR Master Mix is a pre-combined PCR reaction system optimised for a specific engineered DNA polymerase. Phusion DNA polymerase: provides the catalytic activity which synthesises DNA, and includes 3’→5’ exonuclease proofreading to reduce errors. Reaction buffer. MgCl₂: magnesium ions. dNTPs: deoxynucleotide triphosphates (dATP, dCTP, dGTP, and dTTP). Stabilizers/additives. Water. A typical setup only requires after adding:

Subsections of Homework

Week 1 HW.1: Class assignment

1. Describe an application

Identify a biological engineering tool or application you wish to develop and explain your motivation.

I would like to develop a way to make plants grow 100x faster. I find this a very interesting and ambitious question. Perhaps you could reverse-engineer the genome, morphological development and constraints, and the proteins/enzymes/catalysts for growth. Perhaps you could design a separate organism (two bacteria?) which produces biomass: a combination of a carbon sequesterer and a cellulose printer. Perhaps you could attempt to design a minimal artificial cell, like a Xenobot / JCVI minimal cells: using new AI design software, you create a minimal genome/DNA, design your own morphological topology through simulation, and compile it down to gene regulatory networks (GRNs), transcription factors/thresholds, and DNA.

Why? Because trees and plants are great. They are calming, they look beautiful, they are functionally useful. Originally I wanted to build my own house, and was wondering - why is wood so expensive? If we could grow wood more quickly and effectively, that would be useful. It would also be fun to rapidly green certain areas of the world to produce arable land - the Australian desert, for example.

2. Establish governance goals

Describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

  • Enhance biosecurity (prevent misuse and uncontrolled spread)

    • Prevent incidents

      • Restrict access to engineered strains, protocols, and enabling tools
      • Use genetic containment (kill-switches, auxotrophy, sterility)
      • Avoid traits that increase invasiveness or persistence outside intended settings
    • Help respond

      • Establish monitoring and reporting systems for unexpected dissemination
      • Maintain traceability (registries, audit logs, chain-of-custody)
  • Foster lab safety (reduce accidents during development)

    • Prevent incidents

      • Standard biosafety training and conservative organism/chassis selection
      • Physical containment and phased testing (lab → greenhouse → controlled trials)
      • Explicit evaluation of failure modes in growth and developmental pathways
    • Help respond

      • Clear spill/escape response protocols and emergency shutdown procedures
      • Regular safety reviews and independent oversight
  • Protect the environment (minimize ecological externalities)

    • Prevent incidents

      • Ecological risk assessment: gene flow, non-target effects, ecosystem disruption
      • Prohibit open release until long-term impacts are understood
      • Prefer reversible or self-limiting designs over permanent alterations
    • Help respond

      • Post-deployment surveillance and remediation plans
      • Defined liability and responsibility for environmental harms
  • Equity, autonomy, and constructive use (ensure benefits are fairly distributed)

    • Minimizing burdens to stakeholders

      • Community consultation for land-use and deployment decisions
      • Avoid shifting risks onto local ecosystems or vulnerable populations
    • Feasibility without blocking research

      • Clear regulatory pathways that enable safe experimentation
      • Transparency and documentation to support responsible scaling
    • Promote beneficial applications

      • Prioritize reforestation, sustainable materials, and climate-positive outcomes
      • Discourage purely extractive or destabilizing commercial deployment

3. Design governance actions

Describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

  1. Purpose: What is done now and what changes are you proposing?
  2. Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
  3. Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
  4. Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

  1. Containment-by-design + staged release
    • Actors: Institutional Biosafety Committees (IBC), national GMO regulators (e.g., OGTR/USDA), lab leads, funders
    • Design: engineered sterility/kill-switches, greenhouse-only trials, stepwise permits before field testing
    • Assumptions: containment works reliably; lab phenotypes predict outdoor behavior
    • Risks: safeguard failure, gene flow, invasive advantage, unexpected ecosystem effects
  2. Access control + biosecurity screening
    • Actors: DNA synthesis firms, biosecurity agencies, research institutions, grant/journal oversight
    • Design: sequence screening, restricted strain distribution, dual-use review processes
    • Assumptions: misuse is limited by controlling access to key materials/information
    • Risks: leakage, uneven enforcement globally, slowing benign research
  3. Environmental monitoring + liability framework
    • Actors: environmental agencies, local governments/landholders, independent ecologists, insurers/courts
    • Design: required impact studies, long-term surveillance, clear remediation liability
    • Assumptions: harms are detectable early and manageable with monitoring
    • Risks: underfunded surveillance, delayed ecological damage, liability discouraging deployment

4. Score against rubric

Evaluate each action against objectives including:

  • Biosecurity enhancement
  • Lab safety
  • Environmental protection
  • Cost/burden minimization
  • Feasibility and research impact

Does the option:                                Option 1   Option 2   Option 3
Enhance Biosecurity                                3          3          2
  • By preventing incidents                        3          3          2
  • By helping respond                             2          2          3
Foster Lab Safety                                  3          2          1
  • By preventing incidents                        3          2          1
  • By helping respond                             2          2          2
Protect the environment                            3          2          3
  • By preventing incidents                        3          2          2
  • By helping respond                             2          1          3
Other considerations
  • Minimizing costs and burdens to stakeholders   2          2          1
  • Feasibility                                    2          3          1
  • Not impede research                            1          2          1
  • Promote constructive applications              3          2          3

Option 1: Containment-by-design + staged release. Option 2: Access control + biosecurity screening. Option 3: Environmental monitoring + liability framework.
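
Drawing on the scoring table, a rough way to compare the options is to total each column. This is a minimal sketch, assuming every criterion carries equal weight and that a higher score means a better fit (the table does not state its scale). The aggregate rows (e.g. "Enhance Biosecurity") are left out so nothing is counted twice.

```python
# Scores copied from the table: (Option 1, Option 2, Option 3).
# Only sub-criteria are listed; aggregate rows are omitted to avoid double counting.
SCORES = {
    "Biosecurity: preventing incidents": (3, 3, 2),
    "Biosecurity: helping respond": (2, 2, 3),
    "Lab safety: preventing incidents": (3, 2, 1),
    "Lab safety: helping respond": (2, 2, 2),
    "Environment: preventing incidents": (3, 2, 2),
    "Environment: helping respond": (2, 1, 3),
    "Minimizing costs and burdens": (2, 2, 1),
    "Feasibility": (2, 3, 1),
    "Not impede research": (1, 2, 1),
    "Promote constructive applications": (3, 2, 3),
}
OPTIONS = [
    "Containment-by-design + staged release",
    "Access control + biosecurity screening",
    "Environmental monitoring + liability framework",
]


def total_scores(scores):
    """Sum each option's column (assumption: equal weight for every criterion)."""
    totals = [0] * len(OPTIONS)
    for row in scores.values():
        for i, value in enumerate(row):
            totals[i] += value
    return totals


if __name__ == "__main__":
    for name, total in zip(OPTIONS, total_scores(SCORES)):
        print(f"{total:>3}  {name}")
```

Under these assumptions the totals come out as 23, 21 and 19 for Options 1 to 3. Weighting criteria differently would mean multiplying each row by its weight before summing.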

5. Prioritize options

Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

I would prioritise Containment-by-design + staged release. Given that there is immense uncertainty in how this project could be achieved, it is a waste of resources to consider other governance actions for now. Rapid iteration to reduce uncertainty is the path towards achievement. As part of this - a scalable safety protocol throughout this process facilitates rapid experimentation without risk of ruin, until the project can achieve milestones necessary for unlocking funding and revenue.

Week 1 HW.2: Lecture prep for W2

Answer prep questions from three faculty members:

Homework Questions from Professor Jacobson:

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Error rate refers to errors per nucleotide added per replication. An error could be a misincorporation (the wrong base inserted opposite a template base), for example.

Error rate of polymerase synthesis is 1/1e7 (1:10^7).

Human genome has 3.1-3.2 Gbp or 3e9 base pairs.

The expected number of errors when polymerase copies the human genome is therefore 1/1e7 × 3e9 = 300 per copy, which is far from zero.
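
The same numbers can be pushed one step further to show how unlikely an error-free copy is. This is a minimal sketch, assuming errors occur independently (so the count per genome copy is roughly Poisson-distributed) and using the 1/1e7 rate and 3e9 bp length from above. It does not model the mitigation layers listed below.

```python
import math

ERROR_RATE = 1e-7       # errors per nucleotide per replication (value used above)
GENOME_LENGTH = 3e9     # base pairs in one (haploid) copy of the human genome

expected_errors = ERROR_RATE * GENOME_LENGTH
# Assumption: independent errors, so the number per copy is ~Poisson(expected_errors)
# and P(zero errors) = exp(-expected_errors).
p_error_free = math.exp(-expected_errors)

print(f"Expected errors per genome copy: {expected_errors:.0f}")
print(f"Probability of an error-free copy: {p_error_free:.1e}")
```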

Biology deals with the likely error through multiple levels of mitigation:

  • Proofreading during synthesis corrects errors
  • Mismatch repair after synthesis repairs errors
  • Redundancy and selection at multiple levels - DNA is double-stranded, cells exist in huge populations, misfolded proteins get degraded, defective RNAs are destroyed, faulty cells undergo apoptosis
  • Damage repair system

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Our assumptions:

  • Average human protein: ~1036 bp of coding sequence, i.e. roughly 1036 / 3 ≈ 345 amino acids.
  • ~30,000 proteins observed in the mammalian genome.
  • A protein of L amino acids is encoded by 3L nucleotides (bases) plus a 3-nucleotide stop codon in the genome.

Coding is the process by which DNA is transcribed into mRNA (triplets / codons), and mRNA (codons) is translated into a linear chain of amino acids (polypeptides), which folds into 3D protein structures.

How many different ways are there to code for an average human protein, meaning how many different DNA encodings would compile (be transcribed and translated) down to the same protein (chain of amino acids) encoded by ~1036 bp?


Codons are 3 nucleotides long, and each nucleotide carries one of four bases (A, C, G, T in DNA; A, C, G, U in mRNA). There are 4^3 = 64 possible triplet combinations (codons). 61 of them encode amino acids and the remaining 3 are stop codons. An amino acid can be encoded by multiple codons. For instance, codons GAA and GAG both specify glutamic acid. This redundancy is referred to as degeneracy.

The degeneracy of an amino acid refers to the number of codons which encode it, i.e. d(Leu) = 6, meaning leucine has 6 codons which encode it.

Average codon degeneracy across amino acids is roughly 3 (61 sense codons / 20 amino acids ≈ 3.05).

So to calculate the number of possible encodings for a protein, we look up the degeneracy of each amino acid and take their product. For example, for a protein of length L = 5 amino acids with average degeneracy d(*) = 3: num_encodings = d(*) × d(*) × d(*) × d(*) × d(*) = d(*)^L = 3^5 = 243.

So for an average human protein of ~1036 bp, i.e. L ≈ 345 amino acids, the number of possible encodings is roughly 3^345 ≈ 10^165.

There is an intractable number of possible encodings. However, functional “good” encodings are a tiny subset constrained by expression, folding, RNA processing, regulation, and host biology.
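
The degeneracy product can also be computed exactly for a real sequence instead of using the average d ≈ 3. This is a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and counting only synonymous codon choices. It says nothing about which of those encodings actually express well.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code: codons enumerated in U, C, A, G order; '*' marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degeneracy d(aa): how many codons encode each amino acid (and '*' for stop).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein: str, include_stop: bool = True) -> int:
    """Exact number of distinct coding sequences: the product of per-residue degeneracies."""
    n = 1
    for aa in protein:
        n *= DEGENERACY[aa]
    if include_stop:
        n *= DEGENERACY["*"]  # any of the 3 stop codons can end the ORF
    return n


def log10_encodings(protein: str, include_stop: bool = True) -> float:
    """The same count on a log10 scale, which stays readable for long proteins."""
    total = sum(math.log10(DEGENERACY[aa]) for aa in protein)
    if include_stop:
        total += math.log10(DEGENERACY["*"])
    return total


if __name__ == "__main__":
    print(count_encodings("MKELT"))  # first 5 residues of miraculin → 288
    print(f"first 20 residues: ~10^{log10_encodings('MKELTMLSLSFFFVSALLAA'):.1f} encodings")
```

For "MKELT" this gives 1 × 2 × 2 × 6 × 4 × 3 = 288. For full-length proteins the log10 version is more practical, since the exact integer becomes very large.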


Homework Questions from Dr. LeProust:

What’s the most commonly used method for oligo synthesis currently?

Solid-phase chemical synthesis using phosphoramidite chemistry.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

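Because direct phosphoramidite synthesis has a per-step (coupling) yield below 1.0, errors compound exponentially with length. With a per-step error rate of e ≈ 0.01, P(success) = (1 − e)^200 ≈ 0.13, so only about 13% of 200-nt strands come out full-length and correct; the rest are truncated or contain errors.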

Why can’t you make a 2000bp gene via direct oligo synthesis?

(1 − e)^2000 ≈ 2 × 10^-9 for e = 0.01, so essentially no strand would be full-length and correct, due to errors accumulating from each synthetic cycle/step.

  • The expected number of depurination (chain-cleavage) events scales ~linearly with cycle count and purine content
  • Misincorporations accumulate (wrong base addition)
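
The compounding can be tabulated directly. This is a minimal sketch, assuming every coupling step fails independently with the same probability e. The 0.01 value comes from the text; 0.005 is only an illustrative comparison.

```python
def full_length_yield(length_nt: int, error_per_step: float) -> float:
    """Fraction of strands where all `length_nt` coupling steps succeeded.

    Assumes each step fails independently with probability `error_per_step`.
    """
    return (1.0 - error_per_step) ** length_nt


if __name__ == "__main__":
    for length in (60, 200, 2000):
        for e in (0.005, 0.01):  # 0.01 from the text; 0.005 is an illustrative assumption
            print(f"{length:>4} nt, e = {e}: {full_length_yield(length, e):.2e}")
```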

Homework Question from George Church:

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?

[(Advanced students)] Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own:

https://arpa-h.gov/explore-funding/programs/boss

https://www.darpa.mil/research/programs/smart-rbc

https://www.darpa.mil/research/programs/go

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine (+ arginine conditional).

Out of the 20 amino acids needed, the human body synthesizes 11–12, while the remaining 8–9, known as essential amino acids, must be obtained through diet.

This list does not seem to hold for all animals, though. Counterexample: cats. Cats also require taurine in their diet (an amino sulfonic acid, not one of the 20 protein-building amino acids).

The Lysine Contingency was a genetic alteration Henry Wu performed in the dinosaur genome. The modification knocked out the ability of the dinosaurs to produce the amino acid lysine.

This forced the dinosaurs to depend on lysine supplements provided by the park’s veterinary staff. In this way, dinosaurs could never escape from the park because they would never survive long without the food supplements.

Haha, I have to rewatch this film.

The way I would hack around this would be to feed the dinosaurs a culture of the rumen microbes that cows host (and ultimately digest). These microbes synthesise amino acids, including lysine, from simple nitrogen sources such as ammonia. The dinosaurs would then no longer depend on the park's supplements, instead forming a symbiotic relationship with the microbes in their gut.

I don’t know what this question means, but it reminds me also of Liebig’s law - would the restriction of one amino acid necessarily debilitate the dinosaurs so they can’t escape, or is nature more nonlinear and complex than that?

LLM prompts used:

  • 10 essential amino acids in all animals?
  • across all animals?
  • cows can synthesise most of their needed amino acids? how many which ones
  • how long can you survive without just one of the amnio acids ?

Week 2 HW.1: Benchling & In-silico Gel Art

  • Make a free account at benchling.com, Import the Lambda DNA.
  • Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI HindIII BamHI KpnI EcoRV SacI SalI
  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
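
Benchling runs the digest simulation for us, but the underlying idea is simple: find each enzyme's recognition site, cut at a fixed offset, and measure the pieces. This is a minimal sketch, assuming a linear molecule such as the 48,502 bp Lambda genome and ignoring the cos overhangs, methylation sensitivity and star activity. The Lambda sequence itself would come from the Benchling export and is not included here.

```python
# Recognition site and top-strand cut offset for each enzyme in the assignment.
# All seven sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "EcoRV": ("GATATC", 3),    # GAT^ATC (blunt)
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "SalI": ("GTCGAC", 1),     # G^TCGAC
}


def cut_positions(seq: str, enzymes: list[str]) -> list[int]:
    """0-based top-strand cut positions (a cut at p falls between seq[p-1] and seq[p])."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    return sorted(cuts)


def linear_fragments(seq: str, enzymes: list[str]) -> list[int]:
    """Fragment lengths, in order along a linear molecule."""
    bounds = [0, *cut_positions(seq, enzymes), len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAAGAATTCAAAAGGATCCAAA"  # toy sequence, not Lambda
    lane = sorted(linear_fragments(toy, ["EcoRI", "BamHI"]), reverse=True)
    print(lane)  # → [10, 8, 4]
```

Sorting fragments largest-first gives the order of bands from the well downward, which is what a gel-art pattern is built from.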

Benchling screenshots.


Experimental design for Gel art.

Week 2 HW.2: Gel Art - Restriction Digests and Gel Electrophoresis

In the wet-lab perform the lab experiment you designed in Part 1 and outlined in this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis”.

N/A - no access to BioClub Tokyo Lab.

Week 2 HW.3: DNA Design Challenge

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

Miraculin - https://rest.uniprot.org/uniprotkb/P13087.fasta https://rest.uniprot.org/uniprotkb/P13087.txt

>sp|P13087|MIRA_SYNDU Miraculin OS=Synsepalum dulcificum OX=3743 PE=1 SV=3
MKELTMLSLSFFFVSALLAAAANPLLSAADSAPNPVLDIDGEKLRTGTNYYIVPVLRDHG
GGLTVSATTPNGTFVCPPRVVQTRKEVDHDRPLAFFPENPKEDVVRVSTDLNINFSAFMP
CRWTSSTVWRLDKYDESTGQYFVTIGGVKGNPGPETISSWFKIEEFCGSGFYKLVFCPTV
CGSCKVKCGDVGIYIDQKGRRRLALSDKPFAFEFNKTVYF

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

Using https://www.bioinformatics.org/sms2/rev_trans.html:

atgaaagaactgaccatgctgagcctgagctttttttttgtgagcgcgctgctggcggcg
gcggcgaacccgctgctgagcgcggcggatagcgcgccgaacccggtgctggatattgat
ggcgaaaaactgcgcaccggcaccaactattatattgtgccggtgctgcgcgatcatggc
ggcggcctgaccgtgagcgcgaccaccccgaacggcacctttgtgtgcccgccgcgcgtg
gtgcagacccgcaaagaagtggatcatgatcgcccgctggcgttttttccggaaaacccg
aaagaagatgtggtgcgcgtgagcaccgatctgaacattaactttagcgcgtttatgccg
tgccgctggaccagcagcaccgtgtggcgcctggataaatatgatgaaagcaccggccag
tattttgtgaccattggcggcgtgaaaggcaacccgggcccggaaaccattagcagctgg
tttaaaattgaagaattttgcggcagcggcttttataaactggtgttttgcccgaccgtg
tgcggcagctgcaaagtgaaatgcggcgatgtgggcatttatattgatcagaaaggccgc
cgccgcctggcgctgagcgataaaccgtttgcgtttgaatttaacaaaaccgtgtatttt
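
The reverse translation above uses a single codon per amino acid, so it can be reproduced with a lookup table. This is a minimal sketch: the codon choices below were read off the SMS2 output itself (assuming the tool picks one fixed codon per residue), not taken from the tool's documentation.

```python
# One codon per amino acid, as read off the SMS2 output (assumption: fixed choice per residue).
PREFERRED_CODON = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}

MIRACULIN = (
    "MKELTMLSLSFFFVSALLAAAANPLLSAADSAPNPVLDIDGEKLRTGTNYYIVPVLRDHG"
    "GGLTVSATTPNGTFVCPPRVVQTRKEVDHDRPLAFFPENPKEDVVRVSTDLNINFSAFMP"
    "CRWTSSTVWRLDKYDESTGQYFVTIGGVKGNPGPETISSWFKIEEFCGSGFYKLVFCPTV"
    "CGSCKVKCGDVGIYIDQKGRRRLALSDKPFAFEFNKTVYF"
)


def reverse_translate(protein: str) -> str:
    """Map each residue to its preferred codon (no stop codon is appended)."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


def wrap(seq: str, width: int = 60) -> str:
    """Split a sequence into fixed-width lines, like the output above."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


if __name__ == "__main__":
    print(wrap(reverse_translate(MIRACULIN)))
```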

3.3. Codon optimization.

Describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

Proteins are translated from mRNA by ribosomes, with tRNAs delivering the amino acids. Each tRNA pairs with a codon on the mRNA through its anticodon. A codon is a 3-base sequence which maps onto a single amino acid (or a stop signal). As we covered last week, there are 64 different codons (all combinations of three nucleotide bases) but only 20 amino acids. This degeneracy means we can swap synonymous codons in the DNA/mRNA and still express the same amino acid sequence, i.e. the same protein. Why would we do this? Because mRNA codons are decoded by the tRNAs available in the organism. Each tRNA matches a codon (or several synonymous codons; see wobble pairing at the 3rd base), and tRNA concentrations are not uniform across codons. So some codons are translated more efficiently than others, because more of the matching tRNA is available.

To restate:

  1. DNA encodes triplet codons.
  2. mRNA is transcribed from DNA.
  3. Ribosomes read mRNA in triplets.
  4. tRNAs carrying amino acids base-pair with codons (binding with the tRNA’s complementary anticodon)
  5. Translation speed at each codon depends in part on the local abundance of the matching charged tRNA, as well as on ribosome processivity.

Multiple codons encode the same amino acid, yet different organisms use these synonymous codons at different frequencies (codon usage bias). If a gene from organism A is expressed in organism B without modification, the codon distribution may not match the tRNA pool of B.

You need to optimize codon usage in order to achieve (good) yields from your biomanufacturing process.

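I chose Escherichia coli (E. coli) as the target host for optimization:

  • Takes less time
    • Cell division is fast (around 20 minutes per doubling under optimal conditions)
  • Well-established protocols for transformation and plasmid isolation
    • Each cell has a single circular chromosome, and plasmids are maintained separately as small circular DNA molecules
    • Plasmids are copied and passed on to daughter cells (kept under antibiotic selection)
  • Easy, well-characterized method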

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Recombinant expression in a host organism like E. coli.

  1. Clone the coding sequence into an expression vector (a plasmid).
    • Promoter - T7 under lac control: binds the RNA polymerase
    • Ribosome binding site - Shine–Dalgarno AGGAGG: recruits ribosome
    • Coding sequence - see Miraculin DNA sequence above.
    • Terminator - hairpin-forming sequence: stops transcription
    • Antibiotic resistance gene - ampR: for selection of culture
  2. Transform into E. coli (transform the plasmid into host cells).
    1. Bacteria are given a heat shock so they take up the plasmid.
    2. Plate on ampicillin → only plasmid-containing cells survive.
    3. Colonies grow.
    4. Pick single colonies and inoculate liquid cultures.
  3. Induce expression (e.g., add IPTG if T7/lac system).
    • T7 RNA polymerase binds the promoter
    • DNA is transcribed into mRNA
    • Ribosome binds the RBS on the mRNA
    • The ribosome translates the mRNA into protein and stops at the stop codon (the terminator ends transcription, not translation)
      • tRNAs decode codons
      • Amino acids polymerize into a polypeptide
  4. Harvest. Cells are lysed. Protein is purified.
    1. Lyse cells (sonication or chemical lysis).
    2. Purify protein (e.g., His-tag + Ni-NTA affinity column).
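
The transcription and translation steps can be mimicked on strings to check that a coding sequence reads correctly. This is a minimal sketch, assuming the standard genetic code and a simplified start rule: translation begins at the first AUG, whereas a real ribosome uses the start codon positioned by the RBS.

```python
from itertools import product

# Standard genetic code: codons enumerated in U, C, A, G order; '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """mRNA matches the coding (sense) strand of the DNA, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until the first in-frame stop codon.

    Simplification (assumption): start at the first AUG rather than modelling the RBS.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the polypeptide is released
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Toy construct: Shine-Dalgarno (AGGAGG), short spacer, first 5 codons of the CDS, TAA stop.
    dna = "AGGAGG" + "AAAA" + "ATGAAAGAACTGACC" + "TAA"
    print(translate(transcribe(dna)))  # → MKELT
```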

Apparently E. coli is possible but non-ideal for a cysteine-rich, glycosylated plant secreted protein like miraculin.

3.5. [Optional] How does it work in nature/biological systems?

Describe how a single gene codes for multiple proteins at the transcriptional level. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below.

Week 2 HW.4: Twist DNA Synthesis Order

Steps to build a plasmid:

  • Import DNA into Benchling.
  • Add promoter, RBS, start/stop codons, 7x His Tag, and terminator
  • Export .fasta and import into Twist.
  • Order Twist clonal gene, using pTwist Amp High Copy vector.
  • Export .gb (genbank) file for plasmid.
  • Import plasmid .gb file into Benchling, open Info>Topology and set Circular.
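
Before exporting to Twist, it is worth checking that adding the start/stop codons and the His tag kept the reading frame intact. This is a minimal sketch with hypothetical helpers (`build_orf`, `check_orf`). It assumes a C-terminal tag by default and a TAA stop, both illustrative choices, and it does not model the promoter, RBS or terminator, which are added as separate parts in Benchling.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS_TAG_7X = "CAT" * 7  # 7 histidines; CAT and CAC both encode His


def build_orf(cds: str, his_tag: str = HIS_TAG_7X, c_terminal: bool = True) -> str:
    """Hypothetical helper: return ATG + CDS with a His tag + TAA."""
    cds = cds.upper()
    if cds.startswith("ATG"):
        cds = cds[3:]          # start codon is re-added below
    if cds[-3:] in STOP_CODONS:
        cds = cds[:-3]         # stop codon is re-added after the tag
    body = cds + his_tag if c_terminal else his_tag + cds
    return "ATG" + body + "TAA"


def check_orf(orf: str) -> list[str]:
    """List basic reading-frame problems (an empty list means none were found)."""
    problems = []
    if len(orf) % 3:
        problems.append("length is not a multiple of 3")
    if not orf.startswith("ATG"):
        problems.append("does not start with ATG")
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon index {internal}")
    return problems


if __name__ == "__main__":
    orf = build_orf("atgaaagaactgacc")  # first 5 codons of the reverse-translated CDS
    print(orf, check_orf(orf) or "OK")
```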

Week 2 HW.5: DNA Read/Write/Edit

DNA Read

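(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

No idea. Possibly my basil plant.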

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

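I would use long-read sequencing (1–100+ kb). Even though it is more expensive, it would give a more accurate and more contiguous assembly, because long reads can span repetitive regions that short reads cannot place.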

The way DNA sequencing currently works is by extracting the DNA (lysing the cells), breaking it into fragments, sequencing those fragments, and then reassembling or mapping the reads computationally using probabilistic approaches. The “read length” refers to how long these fragments are, in base pairs. A read of length 1 bp would be nearly useless, since there is no way to “place” it within the greater genome. Reads of 150 bp map well because most of the human genome is unique at that scale, although repetitive regions remain hard to place.

Short-read sequencing is a read of 50–600 bp. Long-read sequencing is 1-100 kb.
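
The “placement” argument can be illustrated by counting how many reads of a given length match exactly one position in a sequence. This is a minimal sketch on a toy genome (an assumption, not real data): random sequence with one 300 bp repeat inserted twice. Reads shorter than the repeat that fall inside it cannot be placed uniquely, while reads longer than the repeat always include unique flanking sequence.

```python
import random
from collections import Counter


def fraction_unique_kmers(seq: str, k: int) -> float:
    """Fraction of length-k windows whose sequence occurs exactly once in seq.

    A read of length k can only be placed unambiguously if its k-mer is unique.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


if __name__ == "__main__":
    random.seed(0)

    def random_dna(n: int) -> str:
        return "".join(random.choice("ACGT") for _ in range(n))

    # Toy genome (assumption): unique random sequence plus a 300 bp repeat present twice.
    repeat = random_dna(300)
    genome = random_dna(2000) + repeat + random_dna(2000) + repeat + random_dna(1000)
    for k in (4, 8, 16, 150, 400):
        print(f"read length {k:>3}: {fraction_unique_kmers(genome, k):.3f} uniquely placeable")
```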

Technologies:

  • Polymerase-based sequencing
  • Enzymatic digest sequencing
  • Nanopore sequencing
  • DNA microarrays

DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I have no idea.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

  • Recombinant DNA synthesis
  • Oligonucleotide synthesis: can make complex motifs in short pieces (up to ~200 nt), which can then be assembled into much larger DNA molecules (1 kbp+)

DNA Edit

(i) What DNA would you want to edit and why?

I have no idea. Potentially plant DNA. I don’t know anything about what DNA plants have. I would like to figure out how to increase the growth speed, change the bark texture. Or even doing experiments on yeast. Perhaps I could figure out the enzymes/proteins and what DNA/genes code for it, and then edit that.

(ii) What technology or technologies would you use to perform these DNA edits and why?

CRISPR-Cas9

Week 3 HW.1: Python Script for Opentrons Artwork

  • Review recitation materials and lab documentation.
  • Design artwork using the GUI at opentrons-art.rcdonovan.com.
  • Write a Python script using coordinates from the GUI via the “HTGAA26 Opentrons Colab”.
  • Sign up for a robot time slot and run the script on the Opentrons robot.
  • Submit Python file via provided form.

Artwork Design

Python Script

Week 3 HW.2: Post-Lab Reflection

2.1. Find and describe a published paper utilizing Opentrons or similar liquid handling automation tools.

The paper I have found: Slowpoke: An Automated Golden Gate Cloning Workflow for Opentrons OT‑2 and Flex

Slowpoke is a tool which generates Opentrons protocols for DNA assembly. DNA assembly is used to build longer DNA sequences than can be synthesised in one go, by joining together smaller DNA parts.

It provides facilities to automate:

  1. Golden Gate Cloning - automates the DNA assembly reaction setup, E. coli transformation, and plating.
  2. Colony PCR - automates colony PCR screening of resulting transformants.

Users provide:

  1. Golden Gate Cloning

    1. Genetic toolkit map - e.g. MoClo YTK, STK plate layout
    2. Custom parts map
    3. Combination file
  2. Colony PCR

    1. Colony template positions
    2. PCR deck maps
    3. Reaction recipes.

The robot protocol automates the full pipeline of assembly, transformation, plating and colony PCR:

  1. DNA and enzyme buffer extraction.
  2. Golden gate reaction.
  3. Transformation.
  4. Plating.
  5. Colony PCR.

It is compatible with multiple MoClo/Golden Gate toolkits (YTK, STK, and extensible to others).

Manual steps still required:

  • Colony picking - most labour-intensive step.
  • Sealing PCR plates in OT-2 thermocycler module.
  • Transferring PCR tubes to benchtop thermocycler.
  • Incubation, strain storage, and plasmid purification - still accounts for a lot of time.

https://github.com/Tom-Ellis-Lab/Slowpoke

2.2. Describe your intended automation use for your final project, including pseudocode, scripts, or implementation plans.

I intend to use a cloud lab platform to screen an array of biosensor constructs that I have designed, synthesised, and expressed using cell-free protein synthesis (CFPS).
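
As a starting point for such a screen, here is a hypothetical sketch of the plate-layout step: assign each construct (plus a negative control) to replicate wells on a 96-well plate. The construct names, the triplicate default and the no-DNA control are all assumptions for illustration, and the column-by-column well order (A1, B1, ... H1, A2) is an arbitrary choice here.

```python
ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)


def wells_column_major() -> list[str]:
    """96-well names, column by column: A1, B1, ..., H1, A2, ..."""
    return [f"{row}{col}" for col in COLUMNS for row in ROWS]


def plate_layout(constructs, replicates=3, controls=("no-DNA control",)):
    """Map wells to reactions: each construct and control gets `replicates` wells."""
    samples = [name for name in [*constructs, *controls] for _ in range(replicates)]
    wells = wells_column_major()
    if len(samples) > len(wells):
        raise ValueError(f"{len(samples)} reactions do not fit on one 96-well plate")
    return dict(zip(wells, samples))


if __name__ == "__main__":
    # Hypothetical construct names, for illustration only.
    layout = plate_layout(["biosensor_v1", "biosensor_v2", "biosensor_v3"], replicates=3)
    for well, sample in layout.items():
        print(well, sample)
```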

Week 3 HW.3: Final Project Ideas

  • Submit 1–3 slides with three individual project concept ideas.

Week 4 HW

Part A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang:

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

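Meat is roughly 20% protein by weight, so 0.2 × 500 g = 100 g of protein. Protein is essentially the only source of amino acids in meat: carbs are sugars, and fats are triglycerides (fatty acids + glycerol).

1 g ≈ 6.022 × 10^23 daltons (Avogadro's number)

100 g ≈ 6.022 × 10^25 daltons

Dividing by ~100 Da per amino acid gives 6.022 × 10^25 / 100 ≈ 6 × 10^23 amino acid molecules, i.e. about one mole.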
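
The same arithmetic, carried through to the number of molecules. This is a minimal sketch using the 20% protein and ~100 Da values from the answer, with 1 g ≈ N_A daltons and N_A = 6.02214076 × 10^23.

```python
AVOGADRO = 6.02214076e23   # daltons per gram (1 g ≈ N_A Da)
meat_g = 500
protein_fraction = 0.20    # assumption used above: meat is ~20% protein
mean_aa_mass_da = 100      # given in the question (~100 Da per amino acid)

protein_g = meat_g * protein_fraction
protein_da = protein_g * AVOGADRO
n_amino_acids = protein_da / mean_aa_mass_da

print(f"{protein_g:.0f} g protein = {protein_da:.2e} Da -> {n_amino_acids:.2e} amino acids")
```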

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

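Humans eat beef but don’t become a cow because digestion (in the stomach and small intestine) breaks complex proteins and cells down into amino acids and other small molecules. Our cells then reassemble those building blocks into human proteins, following the instructions in our own DNA, not the cow's.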

3. Why are there only 20 natural amino acids?

For evolutionary reasons, probably. Much like how human language has a finite set of phonemes which let us express an unlimited number of higher-level syllables, words, concepts and sentences, biology has a base grammar of 20 units. This has proven to be enough: 64 codons map onto 20 amino acids plus stop signals. There may once have been other candidates, but this set has evidently been sufficient and stable over evolutionary time.

4. Can you make other non-natural amino acids? Design some new amino acids.

β-amino acids are interesting: usually the amino group is attached to the α-carbon, but here it is attached to the β-carbon. Because of this, proteases (enzymes which break down proteins during digestion) are largely ineffective against β-peptides.

Others I googled:

  • Fluoroleucine — leucine with fluorine substituted in; more hydrophobic and metabolically stable
  • Azidohomoalanine — methionine analog with an azide group, useful for click chemistry bioconjugation

5. Where did amino acids come from before enzymes that make them, and before life started?

  1. Meteorites that naturally carry amino acids
  2. The Miller-Urey experiment showed that amino acids can form spontaneously from simple molecules (water, methane, ammonia, hydrogen) exposed to electric sparks

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Left-handed. Natural α-helices are right-handed because they’re built from L-amino acids. D-amino acids are the mirror image, so the resulting helix is the mirror image too — left-handed.

7. Can you discover additional helices in proteins?

Yes — beyond the common α-helix, proteins also contain 3₁₀-helices (3 residues per turn, tighter) and π-helices (4.4 residues per turn, rarer and wider). These are already known but underappreciated. Computational analysis of PDB structures keeps surfacing edge cases and unusual conformations that don’t fit neatly into existing categories.

8. Why are most molecular helices right-handed?

Because natural amino acids are L-enantiomers. The geometry of the L-α-carbon makes right-handed coiling energetically favorable — the side chains point outward without steric clashes in a right-handed helix. A left-handed helix built from L-amino acids would force side chains into the backbone, creating strain.

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

β-sheets have “sticky edges” — the backbone NH and C=O groups along the edge strands are unsatisfied hydrogen bond donors/acceptors. These can pair with the edge of another β-sheet. The driving forces are hydrogen bonding along the backbone and hydrophobic stacking between sheet faces. This makes lateral growth into large, ordered aggregates thermodynamically favorable.

10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

When proteins misfold or partially unfold, they expose their backbone, which can then hydrogen-bond with other misfolded proteins into cross-β structure (hydrogen bonds running perpendicular to the fibril axis). This structure is extremely stable — often more so than the native fold — so once nucleation starts, it propagates. Many proteins will form amyloid under the right conditions; some just do so more readily due to sequence composition or environmental stress.

Yes, amyloid fibrils can be used as materials — they’re stiff, stable, and self-assembling. Researchers have used them as scaffolds for nanomaterials, hydrogels, and functional coatings.

11. Design a β-sheet motif that forms a well-ordered structure.

Part B. Protein Analysis and Visualization

(1) Briefly describe the protein you selected and why you selected it.

Insulin. A 51-amino acid peptide hormone secreted by pancreatic β-cells that regulates blood glucose by signaling cells to take up glucose. I chose it because it’s small and well-studied, historically significant (first recombinantly produced therapeutic protein), and I’m curious how something so tiny has such a large physiological effect.

(2) Identify the amino acid sequence of your protein. How long is it? What is the most frequent amino acid? How many protein sequence homologs are there? Does your protein belong to any protein family?

51 amino acids total in two chains, A (21 aa) and B (30 aa), linked by two interchain disulfide bonds (plus one intrachain disulfide within the A chain). Most frequent: leucine (L) and cysteine (C), both with 6 occurrences. Many homologs: the insulin/IGF/relaxin superfamily includes IGF-1, IGF-2, relaxin, and insulin-like peptides across many organisms. Belongs to the insulin family (InterPro: Insulin/IGF/relaxin superfamily).
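The composition numbers can be double-checked with a few lines of Python. This sketch takes the A and B chains from the 4INS sequence listed in Part C and counts residues with `collections.Counter`:

```python
from collections import Counter

A_CHAIN = "GIVEQCCTSICSLYQLENYCN"           # 21 aa
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # 30 aa

counts = Counter(A_CHAIN + B_CHAIN)
max_n = max(counts.values())
# Collect every residue tied for the top count (here there is a tie).
most_frequent = sorted(aa for aa, n in counts.items() if n == max_n)

print(f"length = {sum(counts.values())}")
print(f"most frequent: {', '.join(most_frequent)} ({max_n} each)")
print(counts.most_common())
```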

(3) Identify the structure page of your protein in RCSB. When was the structure solved? Is it a good quality structure? Are there any other molecules in the solved structure apart from protein? Does your protein belong to any structure classification family?

PDB: 4INS — human insulin hexamer, solved in 1989 at 1.5 Å resolution. Good quality structure. In addition to protein, the hexamer contains two zinc ions (Zn²⁺) coordinated by His B10 residues at the center, plus water molecules.

Why extra zinc ions? Insulin is stored as a zinc-stabilized hexamer in β-cells. Once secreted, the hexamer dissociates into active monomers and the zinc is released, so zinc is needed for storage and secretion but not for receptor binding. Zinc deficiency is linked to impaired insulin secretion and increased type 2 diabetes risk.

In structural classification, insulin belongs to the “Insulin-like” fold under the all-α class.

(4) Open the structure in 3D visualization software. Visualize as “cartoon”, “ribbon”, and “ball and stick”. Color by secondary structure — does it have more helices or sheets? Color by residue type — what can you tell about hydrophobic vs hydrophilic distribution? Visualize the surface — does it have any binding pockets?

  • Color by secondary structure — does it have more helices or sheets

It has mostly helices.

Red spirals = α-helices

Yellow flat arrow shapes = β-sheets

  • Color by residue type: what can you tell about the hydrophobic vs hydrophilic distribution?

PyMOL: util.cbag

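Green here marks carbon atoms: util.cbag colors atoms by element with green carbons, so this view shows atom types rather than helices or hydrophobicity. I don't see any β-sheets.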
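As a complement to the visual check, this sketch classifies the 4INS residues as hydrophobic or hydrophilic using the sign of the Kyte-Doolittle hydropathy value (a simplifying assumption; other scales draw the line differently) and prints a PyMOL `select` command that could be used to color the hydrophobic residues:

```python
# Kyte-Doolittle hydropathy values; residues with a positive value are treated as hydrophobic.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}
THREE_LETTER = {"I": "ILE", "V": "VAL", "L": "LEU", "F": "PHE", "C": "CYS", "M": "MET", "A": "ALA"}

SEQUENCE = "GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # 4INS A + B chains

hydrophobic = [aa for aa in SEQUENCE if KYTE_DOOLITTLE[aa] > 0]
fraction = len(hydrophobic) / len(SEQUENCE)
print(f"{len(hydrophobic)}/{len(SEQUENCE)} residues hydrophobic ({fraction:.0%})")

# Selection string to paste into PyMOL, then e.g.: color orange, hydrophobic
resn = "+".join(sorted(THREE_LETTER[aa] for aa in set(hydrophobic)))
print(f"select hydrophobic, resn {resn}")
```

Combined with `show surface`, coloring this selection would show whether the hydrophobic residues cluster on one face (for insulin, the faces buried in the dimer and hexamer interfaces would be the natural places to look).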

  • Visualize the surface — does it have any binding pockets?

Red is helix, Yellow is sheet, Green is loop.

PyMOL: show surface


The surface doesn’t show a deep binding pocket — the receptor-binding interface is relatively flat.

Part C. Using ML-Based Protein Design Tools

4INS : GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT

C1. Protein Language Modeling

https://colab.research.google.com/drive/1Pu0Nmmpn-OjL_UDqrjnAhowjz1ZP1hKJ?usp=sharing

Deep Mutational Scans

Use ESM2 to generate an unsupervised deep mutational scan. Explain any particular pattern (choose a residue and mutation that stands out). (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
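A deep mutational scan enumerates every single-residue substitution. This sketch lists all 19 × 51 substitutions of the 4INS sequence and shows one common way of scoring them with a masked language model: the log-likelihood ratio log p(mutant) − log p(wild type) at the mutated position. The per-position log-probabilities would come from ESM2 (e.g. via the Colab above); `log_probs` and the helper names are hypothetical, and the Colab may score mutations differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SEQUENCE = "GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # 4INS A + B chains

def single_mutants(seq):
    """Labels like 'G1A' (wild type, 1-based position, mutant) for every substitution."""
    return [f"{wt}{i}{mt}" for i, wt in enumerate(seq, start=1)
            for mt in AMINO_ACIDS if mt != wt]

def llr_score(label, log_probs):
    """Masked-marginal style score: log p(mutant) - log p(wild type) at that position.

    log_probs: hypothetical mapping {1-based position: {amino acid: log probability}},
    e.g. from ESM2 logits with that position masked. Negative means the model
    prefers the wild-type residue.
    """
    wt, pos, mt = label[0], int(label[1:-1]), label[-1]
    return log_probs[pos][mt] - log_probs[pos][wt]

mutants = single_mutants(SEQUENCE)
print(len(mutants), "single mutants, e.g.", mutants[:3])
```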

Latent Space Analysis

Use the provided sequence dataset to embed proteins in reduced dimensionality. Analyze the different formed neighborhoods. Place your protein in the resulting map and explain its position and similarity to its neighbors.


Explain its position and similarity to its neighbors


!!! TODO - I don’t know enough to describe it.

C2. Protein Folding

Fold your protein with ESMFold. Do the predicted coordinates match your original structure? Try changing the sequence — first some mutations, then large segments. Is your protein structure resilient to mutations?

!!! TODO - Cannot see the coordinates. Structure looks interesting.
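One way to get at the coordinates: a PDB file is plain text, and the Cα coordinates sit in fixed columns of the ATOM records. This sketch reads them with the standard library and compares two structures by distance-matrix RMSD (dRMSD), which needs no superposition step. It assumes both structures list the same residues in the same order, e.g. chains A and B of one 4INS monomer (an assumption about that entry's chain naming) versus the 51-residue ESMFold model; `ca_coords` and `drmsd` are illustrative names, not a library API.

```python
import math
from itertools import combinations

def ca_coords(pdb_text, chains=None):
    """Return (x, y, z) for each C-alpha in ATOM records, using fixed PDB columns.

    Keeps only the first alternate location (blank or 'A') and, optionally,
    only the given chain IDs.
    """
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue
        if chains is not None and line[21] not in chains:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords

def drmsd(a, b):
    """RMS difference between all pairwise C-alpha distances of two structures."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length coordinate lists with >= 2 atoms")
    diffs = [(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
             for i, j in combinations(range(len(a)), 2)]
    return math.sqrt(sum(diffs) / len(diffs))
```

For example, read the ESMFold file with `ca_coords(text)` and 4INS with `ca_coords(text, chains="AB")`, then call `drmsd(original, predicted)`; smaller values mean more similar internal geometry.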

Predicted folded:


ptm0.413_r3_default.pdb - PDB file.

Original 4INS Ribbon Diagram:

C3. Protein Generation

Use ProteinMPNN to inverse-fold your protein backbone. Analyze the predicted sequence probabilities and compare to the original. Input the predicted sequence into ESMFold and compare the predicted structure to your original.

Generating sequences...
>tmp, score=2.0974, fixed_chains=[], designed_chains=['A'], model_name=v_48_020
GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT
>T=0.1, sample=0, score=1.1989, seq_recovery=0.3137
SIIEKCCYHNCTEEELLKYCPEENKKKCLKNLIKELKKKCGPKCYVKIPKP
Generating sequences...
>tmp, score=2.0873, fixed_chains=[], designed_chains=['A'], model_name=v_48_020
GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT
>T=0.1, sample=0, score=1.2054, seq_recovery=0.3137
GLYEKCCYSNCTLAEIAKYCPKKNKKKCVKSKIKELKKKCGPKCWVYVPPP

New sequence: GLYEKCCYSNCTLAEIAKYCPKKNKKKCVKSKIKELKKKCGPKCWVYVPPP

Comparing:

GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT
GLYEKCCYSNCTLAEIAKYCPKKNKKKCVKSKIKELKKKCGPKCWVYVPPP
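ProteinMPNN's `seq_recovery` is the fraction of positions where the designed residue matches the original. A short sketch recomputes it for the pair above and lists the conserved positions:

```python
ORIGINAL = "GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT"
DESIGNED = "GLYEKCCYSNCTLAEIAKYCPKKNKKKCVKSKIKELKKKCGPKCWVYVPPP"

def identical_positions(a, b):
    """1-based positions where two equal-length (already aligned) sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and equal length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x == y]

same = identical_positions(ORIGINAL, DESIGNED)
recovery = len(same) / len(ORIGINAL)
print(f"recovery = {len(same)}/{len(ORIGINAL)} = {recovery:.4f}")
print("conserved:", " ".join(f"{ORIGINAL[i - 1]}{i}" for i in same))
```

This reproduces the reported seq_recovery of 0.3137 (16 of 51 positions), and all six cysteines are among the conserved positions.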

ESMFold predicted structure vs. original:

Predicted:

Original:

Part D. Group Brainstorm on Bacteriophage Engineering

NA - Sick

Week 5 HW

Part A. SOD1 Binder Peptide Design

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Challenge: Design short peptides that bind mutant SOD1, then decide which ones are worth advancing toward therapy.

Models used:

  • PepMLM: target sequence-conditioned peptide generation via masked language modeling
  • PeptiVerse: therapeutic property prediction
  • moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

Retrieve the human SOD1 sequence from UniProt (P00441), introduce the A4V mutation, and use the PepMLM Colab to generate four peptides of length 12 amino acids conditioned on the mutant sequence. Add the known binder FLYRWLPSRRGG for comparison. Record perplexity scores.

A4V mutant SOD1 sequence (deleted M at position 1, changed A→V at position 4):

ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Generated peptides:

# | Binder | Pseudo-perplexity
1 | WRSPAVAVAHWE | 7.77
2 | WRVGWVGVELKE | 24.21
3 | WRSPAAXIEHKX | 11.24
4 | WRVYAAXIEWGK | 20.45
Known | FLYRWLPSRRGG | 22.53

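A note on perplexity: a lower (pseudo-)perplexity means the model finds the peptide more likely given the target sequence, i.e. higher model confidence that it resembles a binder. It is a model-confidence proxy, not a measured binding affinity.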
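For masked language models, pseudo-perplexity is commonly defined as the exponential of the average negative log-probability of each residue when that residue is masked: PPPL = exp(−(1/L) Σᵢ log p(xᵢ | rest of sequence)). This sketch implements only that formula; getting the per-residue log-probabilities out of PepMLM is assumed and not shown, and PepMLM's exact computation may differ.

```python
import math

def pseudo_perplexity(residue_log_probs):
    """exp of the mean negative log-probability per residue (natural log)."""
    if not residue_log_probs:
        raise ValueError("need at least one residue")
    mean_nll = -sum(residue_log_probs) / len(residue_log_probs)
    return math.exp(mean_nll)

# If every residue had probability 0.5, the model is effectively choosing
# between 2 options per position, so the pseudo-perplexity is about 2.
print(pseudo_perplexity([math.log(0.5)] * 12))
```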

Part 2: Evaluate Binders with AlphaFold3

Submit each peptide + mutant SOD1 as separate chains to the AlphaFold Server. Record the ipTM score and describe where each peptide appears to bind — does it localize near the N-terminus (A4V site), the β-barrel, or the dimer interface? Is it surface-bound or partially buried? In a short paragraph, describe the ipTM values and whether any PepMLM-generated peptide matches or exceeds the known binder.

Peptide | Binding location | ipTM score
WRSPAVAVAHWE | None | 0.28
WRVGWVGVELKE | None | 0.35
WRSPAAXIEHKX | None | 0.33
WRVYAAXIEWGK | None | 0.34

Part 3: Evaluate Properties in the PeptiVerse

Using PeptiVerse, evaluate the therapeutic properties of each peptide against the A4V mutant SOD1 sequence. Check: predicted binding affinity, solubility, hemolysis probability, net charge (pH 7), and molecular weight.

Peptide | Solubility | Hemolysis | Binding affinity | MW (Da) | Net charge (pH 7)
WRSPAVAVAHWE | 1.0 | 0.044 (Non) | 5.361 (Weak) | 1408.6 | -0.14
WRVGWVGVELKE | 1.0 | 0.117 (Non) | 7.089 (Medium) | 1457.7 | -0.23
WRSPAAXIEHKX | 1.0 | 0.011 (Non) | 4.645 (Weak) | 1158.5 | 0.85
WRVYAAXIEWGK | 1.0 | 0.043 (Non) | 6.724 (Weak) | 1360.7 | 0.76
FLYRWLPSRRGG (known) | 1.0 | 0.047 (Non) | 5.962 (Weak) | 1507.7 | 2.76
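The net charge column can be approximated with the Henderson–Hasselbalch equation, summing the fractional charge of every ionizable group. The pKa values below are an assumed, commonly used set; tools differ here, so this sketch will not reproduce the PeptiVerse numbers exactly, and unknown residues (X) are simply ignored.

```python
# Assumed pKa values for ionizable groups (other tools use slightly different sets).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6

def net_charge(peptide, ph=7.0):
    """Approximate net charge of a linear peptide via Henderson-Hasselbalch."""
    # Fraction of each basic group that is protonated (+1).
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    positive += sum(1 / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
                    for aa in peptide if aa in PKA_POSITIVE)
    # Fraction of each acidic group that is deprotonated (-1).
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    negative += sum(1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
                    for aa in peptide if aa in PKA_NEGATIVE)
    return positive - negative

for pep in ["WRSPAVAVAHWE", "WRVGWVGVELKE", "WRSPAAXIEHKX", "WRVYAAXIEWGK", "FLYRWLPSRRGG"]:
    print(pep, round(net_charge(pep), 2))
```

With these values the ordering of the five peptides matches the table (the known binder is by far the most positive), even though the absolute numbers differ.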

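The best peptide to advance for wet-lab validation would be WRVGWVGVELKE, due to its relatively high predicted binding affinity (7.089, Medium, the highest in the table). It also had the highest ipTM (0.35), although all ipTM values were below 0.4.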

Part 4: Generate Optimized Peptides with moPPIt

Using the moPPIt Colab: paste your A4V mutant SOD1 sequence, choose specific residue indices to target (e.g. near position 4, the dimer interface, or another surface patch), set peptide length to 12 aa, and enable motif + affinity guidance. Briefly describe how the moPPIt peptides differ from your PepMLM peptides. How would you evaluate these before advancing to clinical studies?

Binder | Hemolysis | Solubility | Affinity | Motif
SVKTKCCTTYQS | 0.964 | 0.917 | 6.576 | 0.890
DDTKKCSCIQTH | 0.975 | 0.917 | 6.314 | 0.915
ENGETFQCTKKV | 0.970 | 0.833 | 6.044 | 0.935
KKSKKAFVCCVC | 0.963 | 0.667 | 8.172 | 0.614

Given the long execution time and computational resources it requires, moPPIt's main advantage over PepMLM in this context is the motif score; PeptiVerse had no option to check motif specificity. The other properties of the PepMLM-generated peptides were comparable to those of the moPPIt peptides.

Part B. BRD4 Drug Discovery Platform Tutorial

(Optional — skipped)

Part C. Final Project: L-Protein Mutants

High-level summary: the objective of this assignment is to improve the stability and auto-folding of the lysis (L) protein of the MS2 phage. This mechanism is key to understanding how phages could potentially help address antibiotic resistance.

Full instructions: Google Doc

Chose option 3: generating random mutations in the lysis protein while avoiding loss-of-function or nonsense codons. A Python script (Colab) was used to load active mutations from experimental data and apply them randomly to unique positions.

Generated sequences:

0. METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT (Original)
1. METRFPQQSQQTPASTNRRRPFKHEDYPCQRQQRSSTLYVLIFLAFFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
2. METRFPQQSQQTLAATNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
3. METRFPQQSQQTPASTNRRRPFKHGGYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
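A minimal sketch of the option 3 approach (not the actual Colab script): mutations are written as labels like `P13L`, sampled at distinct positions, checked against the wild-type residue, and mutations to a stop (`*`) are skipped. The example pool is just the six substitutions that appear in sequences 1–3 above; a real pool would come from the experimental data.

```python
import random

WT_L = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"

# Illustrative pool: the substitutions present in sequences 1-3 above.
POOL = ["R30Q", "I46F", "P13L", "S15A", "E25G", "D26G"]

def parse(label):
    """'P13L' -> ('P', 13, 'L'): wild type, 1-based position, mutant."""
    return label[0], int(label[1:-1]), label[-1]

def sample_mutations(pool, k, rng):
    """Pick up to k mutations at distinct positions, skipping nonsense ('*') mutations."""
    candidates = list(pool)
    rng.shuffle(candidates)
    chosen, used_positions = [], set()
    for label in candidates:
        _, pos, mt = parse(label)
        if mt == "*" or pos in used_positions:
            continue
        chosen.append(label)
        used_positions.add(pos)
        if len(chosen) == k:
            break
    return chosen

def apply_mutations(seq, labels):
    """Apply labeled substitutions, refusing any whose wild-type residue doesn't match."""
    residues = list(seq)
    for label in labels:
        wt, pos, mt = parse(label)
        if residues[pos - 1] != wt:
            raise ValueError(f"{label}: wild type at {pos} is {residues[pos - 1]}")
        residues[pos - 1] = mt
    return "".join(residues)

rng = random.Random(0)  # fixed seed so the example is reproducible
picked = sample_mutations(POOL, 2, rng)
print(picked, apply_mutations(WT_L, picked))
```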

AF2 Multimer was used to co-fold mutant sequence 1 with DnaJ. The pLDDT score indicates low model confidence in the predicted fold of the mutant L protein. Overall, the random mutation approach is very time-consuming for obtaining leads.

Week 6 HW

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

A Phusion HF PCR Master Mix is a pre-combined PCR reaction system optimised for a specific engineered DNA polymerase.

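  • Phusion DNA polymerase: provides the catalytic activity that synthesises DNA, and has 3'→5' exonuclease (proofreading) activity that reduces the error rate
  • Reaction buffer: maintains the pH and ionic conditions the polymerase needs
  • MgCl₂: magnesium ions, an essential cofactor for polymerase activity
  • dNTPs: deoxynucleotide triphosphates (dATP, dCTP, dGTP, and dTTP), the building blocks of the new strands
  • Stabilizers/additives: keep the enzyme stable and active in the premixed solution
  • Water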

A typical setup then only requires adding:

  • Forward primer
  • Reverse primer
  • Template DNA
  • Additional water to reach final volume

What are some factors that determine primer annealing temperature during PCR?

Factors:

  • Primer melting temperature — dominant factor
  • GC content of primer
  • Primer length
  • Sequence features
  • Salt concentration in the reaction buffer
  • Template–primer mismatch tolerance
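Two rule-of-thumb Tm estimates show how primer length and GC content enter. Both ignore salt and primer concentration, so they are rough; in practice nearest-neighbor calculators are more accurate, and for Phusion, NEB provides its own Tm calculator.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def tm_wallace(primer):
    """Wallace rule, for short oligos (< ~14 nt): 2 degC per A/T, 4 degC per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def tm_basic(primer):
    """Basic GC formula for longer oligos: Tm = 64.9 + 41 * (nGC - 16.4) / N."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(primer)

primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% GC
print(gc_fraction(primer), round(tm_basic(primer), 1))
```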

There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR creates DNA fragments by enzymatic replication using primers that define the fragment boundaries.

Protocol: The reaction contains template DNA, forward and reverse primers, dNTPs, buffer, Mg²⁺, and a thermostable DNA polymerase (e.g. Phusion or Taq). The protocol cycles temperature: denaturation (~95 °C) separates strands, annealing (~50–65 °C) allows primers to bind, and extension (~72 °C) synthesizes new DNA. After ~25–35 cycles, the region between the primers is exponentially amplified, producing many linear copies of a precisely defined sequence.
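The exponential amplification can be made concrete with N = N₀(1 + E)ⁿ, where E is the per-cycle efficiency (1.0 means perfect doubling) and n the number of cycles. A sketch; real reactions plateau as primers and dNTPs are used up, so late-cycle numbers are upper bounds.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Ideal PCR yield: copies double each cycle when efficiency = 1.0."""
    return initial_copies * (1 + efficiency) ** cycles

for e in (1.0, 0.9):
    print(f"efficiency {e:.0%}: {pcr_copies(1, 30, e):.2e} copies after 30 cycles")
```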

Restriction enzyme digestion produces linear fragments by cutting DNA at specific recognition sequences using restriction endonucleases.

Protocol: The protocol involves incubating DNA with one or more enzymes in the appropriate buffer (often ~37 °C) for a set time. The enzyme recognizes a short sequence (typically 4–8 bp) and cleaves the phosphodiester backbone, generating fragments with defined ends (blunt or sticky). The resulting fragment sizes depend entirely on where those recognition sites exist in the DNA.
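Because fragment sizes follow directly from site positions, a digest is easy to simulate. This sketch finds EcoRI sites (G^AATTC, cut after the first base on the top strand) in a linear sequence and returns fragment lengths. EcoRI's site is palindromic, so searching one strand is enough; the sketch ignores the 4-nt sticky overhang and circular templates, and the toy sequence is made up.

```python
def cut_positions(seq, site, offset):
    """Top-strand cut positions: start index of each site occurrence plus the cut offset."""
    seq, site = seq.upper(), site.upper()
    cuts, start = [], seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts

def linear_fragments(seq_len, cuts):
    """Fragment lengths of a linear molecule cut at the given positions."""
    bounds = [0] + sorted(set(cuts)) + [seq_len]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

ECORI = ("GAATTC", 1)            # recognition site, cut after G on the top strand
dna = "AAAGAATTCTTTTTGAATTCAA"   # toy sequence with two EcoRI sites
cuts = cut_positions(dna, *ECORI)
print(cuts, linear_fragments(len(dna), cuts))
```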

Conceptually, PCR synthesizes a fragment by copying between two designed boundaries, whereas restriction digestion extracts a fragment by cutting an existing molecule at predetermined sequence motifs.

To understand when both are useful, consider an objective: engineer E. coli to produce human insulin, which requires building a plasmid containing the insulin gene under a bacterial promoter.

Three different scenarios for obtaining the insulin gene:

  1. DNA comes from a biological sample (e.g. human genomic DNA, or in practice cDNA, since the genomic insulin gene contains introns that E. coli cannot splice). The insulin gene is buried among a vast amount of unrelated sequence, so PCR is used to isolate and amplify only that specific region using primers that define its boundaries. PCR is therefore used when the goal is to retrieve a specific gene from a complex DNA mixture.

  2. DNA already exists in a plasmid (e.g. moving GFP from plasmid A into plasmid B). The fragment is already isolated, so restriction enzymes are used to cut DNA at specific recognition sequences, allowing the gene to be excised and inserted into another vector. Restriction digestion is therefore used when the task is to cut and rearrange existing DNA molecules.

  3. DNA is chemically synthesized because the sequence is already known. The synthesized fragment may still be PCR-amplified if more copies are needed, and restriction enzymes (or similar assembly methods) are used to insert it into plasmids. In practice, PCR isolates or amplifies sequences, while restriction enzymes cut DNA molecules so fragments can be inserted, removed, or reorganized.

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Desiderata: Linear DNA fragments whose terminal 20–40 bp regions are perfectly homologous to the neighboring fragment, unique, structurally stable, and present in a clean preparation so the Gibson enzymes can expose the overlaps, allow annealing, fill gaps, and ligate the final construct.
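One of those desiderata, that adjacent fragments share a 20–40 bp terminal overlap, can be checked in silico before ordering primers. This sketch (hypothetical helper names) reports the overlap length at each junction of a circular assembly, or 0 if a junction has no overlap in that range; it does not check overlap uniqueness, secondary structure, or melting temperature.

```python
def overlap_length(left, right, min_len=20, max_len=40):
    """Longest k in [min_len, max_len] where left's last k bases equal right's first k bases."""
    for k in range(min(max_len, len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0

def junction_overlaps(fragments, circular=True):
    """Overlap length at each junction, including last -> first for a circular plasmid."""
    pairs = list(zip(fragments, fragments[1:]))
    if circular:
        pairs.append((fragments[-1], fragments[0]))
    return [overlap_length(a, b) for a, b in pairs]

# Toy three-part circular assembly with 20 bp overlaps (sequences are made up).
ov1, ov2, ov3 = "ACGT" * 5, "TGCA" * 5, "GATC" * 5
parts = [ov3 + "A" * 10 + ov1, ov1 + "C" * 10 + ov2, ov2 + "G" * 10 + ov3]
print(junction_overlaps(parts))
```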

How does the plasmid DNA enter the E. coli cells during transformation?

An E. coli cell is bounded by lipid membranes (an inner and an outer membrane, with a thin peptidoglycan layer between them). A membrane is not a solid wall but a dense molecular fluid: each phospholipid is held in place only by weak interactions (hydrophobic forces, van der Waals forces). Much as an electron has no fixed static surface, only the field it creates, the membrane has no fixed solid boundary.

Lipids constantly fluctuate, so small gaps transiently appear and disappear.

During transformation, the cells are given a brief heat shock (~42 °C for ~30–60 s). A commonly proposed explanation is that the rapid temperature change increases lipid kinetic energy and transiently disorders phospholipid packing, opening transient aqueous pores in the bilayer through which plasmid DNA can enter; the exact mechanism is still debated.

This is paired with a prior incubation of the cells in cold calcium chloride (CaCl₂). Ca²⁺ ions neutralise negative charges on the DNA phosphate backbone and on the membrane surface, reducing electrostatic repulsion between the DNA and the cell envelope.

Describe another assembly method in detail (such as Golden Gate Assembly)

Explain the other method in 5–7 sentences plus diagrams (either handmade or online).

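Golden Gate Assembly uses Type IIS restriction enzymes (e.g. BsaI), which cut outside their recognition site. Each fragment can therefore be designed to leave a custom 4-bp overhang, and the recognition sites are removed from the final product.

  1. Design fragments flanked by Type IIS sites that produce specific, unique 4-bp overhangs at each junction.
  2. PCR amplify or synthesize the fragments with those flanking sites.
  3. Mix the fragments, plasmid backbone, Type IIS enzyme, ligase, and buffer in one tube.
  4. Run digestion–ligation thermal cycles; correctly ligated junctions no longer contain a recognition site, so they are not re-cut and the assembled product accumulates.
  5. Transform the assembled plasmid into bacteria.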
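Overhang design is where Golden Gate most often goes wrong, so here is a sketch of two checks: extracting the 4-nt overhang produced by a forward-oriented BsaI site (GGTCTC, cutting 1 nt after the site on the top strand and 5 nt after it on the bottom strand), and flagging overhang sets with duplicates, palindromes, or reverse-complement pairs, any of which can cause mis-ligation. Function names and example overhangs are illustrative.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def bsai_overhangs(seq):
    """4-nt overhangs left by forward-oriented BsaI sites (GGTCTC N^NNNN); reverse sites ignored."""
    seq = seq.upper()
    overhangs, i = [], seq.find("GGTCTC")
    while i != -1:
        overhangs.append(seq[i + 7:i + 11])
        i = seq.find("GGTCTC", i + 1)
    return overhangs

def overhang_problems(overhangs):
    """List issues that would let ends ligate in the wrong order or orientation."""
    ohs = [oh.upper() for oh in overhangs]
    problems = [f"{oh}: not 4 nt" for oh in ohs if len(oh) != 4]
    problems += [f"{oh}: palindromic (can self-ligate)" for oh in ohs if oh == revcomp(oh)]
    for a, b in combinations(ohs, 2):
        if a == b:
            problems.append(f"{a}: used twice")
        elif a == revcomp(b):
            problems.append(f"{a}/{b}: reverse complements")
    return problems

print(bsai_overhangs("GGTCTCAAATGCCC"), overhang_problems(["AATG", "GCTT", "CGCT"]))
```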

Model this assembly method with Benchling or Asimov Kernel!