Homework
I have included my OpenTron work, answers to post-lab questions and 3 early stage project ideas in the Week 3 lab section.

First, describe a biological engineering application or tool you want to develop and why.
I want to develop a closed-loop pipeline for peptide engineering that uses Feynman–Kac steering to control diffusion-based protein generation at inference time. The goal is to go beyond zero-shot prediction and instead build an automated engineering cycle that repeatedly generates candidate peptides, tests them in the wet lab, and feeds the measured results back to steer the next round of generation.
This is inspired by the FK steering approach, which wraps a diffusion protein generator in a sampling scheme that continuously reweights trajectories toward user-defined rewards. In this case, the reward is the experimental readout.
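The particle-resampling idea behind FK steering can be illustrated with a toy sketch. Several partial samples ("particles") are advanced one generation step at a time. Each particle is weighted by how much its reward improved, using a difference-style potential exp(λ·Δreward), and the population is then resampled so that high-weight particles are duplicated and low-weight ones are dropped. Everything here is a hypothetical stand-in. `toy_denoise_step` just fills in one random masked position and is not a real diffusion model. `toy_reward` (fraction of charged residues, a crude solubility proxy) takes the place of the experimental readout.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "_"  # placeholder for a not-yet-generated position


def toy_denoise_step(seq: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one reverse-diffusion step:
    fill one randomly chosen masked position with a random residue."""
    masked = [i for i, c in enumerate(seq) if c == MASK]
    if not masked:
        return seq
    i = rng.choice(masked)
    return seq[:i] + rng.choice(AMINO_ACIDS) + seq[i + 1:]


def toy_reward(seq: str) -> float:
    """Hypothetical reward: fraction of charged residues (crude solubility proxy).
    In the real loop this would come from the experimental readout."""
    return sum(c in "DEKR" for c in seq) / len(seq)


def fk_steer(length=12, n_particles=64, lam=10.0, seed=0):
    rng = random.Random(seed)
    particles = [MASK * length] * n_particles
    prev_rewards = [toy_reward(p) for p in particles]
    for _ in range(length):
        # Propose: advance every particle by one generation step.
        particles = [toy_denoise_step(p, rng) for p in particles]
        rewards = [toy_reward(p) for p in particles]
        # Weight: difference potential exp(lam * (r_t - r_{t-1})).
        weights = [math.exp(lam * (r - r0)) for r, r0 in zip(rewards, prev_rewards)]
        # Resample: copy high-weight particles, drop low-weight ones.
        idx = rng.choices(range(n_particles), weights=weights, k=n_particles)
        particles = [particles[i] for i in idx]
        prev_rewards = [rewards[i] for i in idx]
    return max(particles, key=toy_reward)


print(fk_steer())
```

In the closed loop, the expensive part is that the reward comes from the lab. A practical version would likely score particles with a model fitted to past readouts rather than run an experiment at every generation step.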
Peptides are a good choice for this project because they are often fast to synthesize and test, which makes them compatible with iterative lab loops. At the same time, many of the peptide properties we care about (solubility, stability, expression, off-target behavior) are hard to optimize from prediction alone, so a wet-lab loop is attractive. Functionally, peptides can serve as binders, inhibitors, diagnostic reagents, or modular parts in synthetic biology pipelines.
As a concrete MVP within this class, I hope to learn how to perform the wet-lab experiments associated with this project and finish at least one cycle. In the medium term, I would like to compare different computational approaches, such as simple fine-tuning or reinforcement learning (RL). In the long term, I would like to use this method to discover therapeutic proteins.
Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.
Closed-loop design could be repurposed to create harmful biomolecules. Governance should reduce the probability of both deliberate misuse and the accidental creation of dangerous functions. Thus, one major goal is to prevent misuse, with the following sub-goals:
Ensure the system does not optimize toward harmful or restricted targets/functions.
Reduce the chance that hazardous sequences are synthesized without review.
Ensure that there are audit trails and responsible-use norms.
Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
I propose three governance actions spanning institutional review, synthesis controls, and a logging infrastructure.
Option 1: Institutional Review
Option 2: Synthesis Controls
Option 3: Logging Infrastructure
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | |||
| • By preventing incidents | 2 | 1 | 2 |
| • By helping respond | 1 | 2 | 1 |
| Foster Lab Safety | |||
| • By preventing incidents | 1 | 2 | 3 |
| • By helping respond | 1 | 2 | 1 |
| Protect the environment | |||
| • By preventing incidents | 2 | 2 | 3 |
| • By helping respond | 2 | 2 | 1 |
| Other considerations | |||
| • Minimizing costs and burdens to stakeholders | 2 | 2 | 2 |
| • Feasibility? | 1 | 2 | 3 |
| • Not impede research | 1 | 2 | 1 |
| • Promote constructive applications | 1 | 2 | 2 |
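To see what the scoring implies at a glance, the table can be tallied per option. This is a minimal sketch. It assumes the course convention that 1 is the best score, so a lower total indicates a stronger option overall. It also treats every criterion as equally important, which is itself an assumption: an unweighted sum can hide a poor score on a criterion that matters most. The row labels are shortened versions of the table rows.

```python
# Scores transcribed from the table above: (Option 1, Option 2, Option 3).
SCORES = {
    "Biosecurity: preventing incidents": (2, 1, 2),
    "Biosecurity: helping respond": (1, 2, 1),
    "Lab safety: preventing incidents": (1, 2, 3),
    "Lab safety: helping respond": (1, 2, 1),
    "Environment: preventing incidents": (2, 2, 3),
    "Environment: helping respond": (2, 2, 1),
    "Minimizing costs and burdens": (2, 2, 2),
    "Feasibility": (1, 2, 3),
    "Not impeding research": (1, 2, 1),
    "Promoting constructive applications": (1, 2, 2),
}


def totals(scores):
    """Unweighted column sums: one total per option."""
    n_options = len(next(iter(scores.values())))
    return [sum(row[i] for row in scores.values()) for i in range(n_options)]


for option, total in enumerate(totals(SCORES), start=1):
    print(f"Option {option}: {total}")  # Option 1: 14, Option 2: 19, Option 3: 19
```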
Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.
In order of priority:
Tradeoffs:
Key Uncertainties:
Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues. This should be included on your class page for this week.
Unfortunately, I was ill this week, so I was not able to attend class.
Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks
I created an image of Mount Fuji with clouds in the sky, and inverted the image so it is easier to see.

Note: Since we worked in groups during lab this week, we created a different design than the one shown above for the lab activity.
Choose your protein.
RES-701-3 is a small natural peptide made by soil bacteria (Streptomyces). It belongs to a family called lasso peptides, named because their structure looks like a lasso or slipknot: the tail of the peptide threads through a ring, creating a knot that is extremely hard to unravel.
This knotted shape makes lasso peptides unusually tough. They resist being broken down by digestive enzymes, heat, and harsh chemical environments. These are properties that most proteins lack, and that make them attractive as potential drugs.
RES-701-3 blocks a receptor on the surface of blood vessel cells called the endothelin type B receptor (ETB). The endothelin system controls blood vessel tightening and relaxation, and it becomes dysregulated with age, contributing to high blood pressure and vascular disease. RES-701-3 acts as an inverse agonist, meaning it blocks the receptor and pushes it toward a less active state than its resting baseline.
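In nature, the bacterium makes this peptide as a single precursor with two parts, an N-terminal leader followed by the core:
MSDITLTPMDLLDLDELAAGGGRSTAREGNWHEPEIDGWNPHGW
An enzyme cuts the core away from the leader, and the core is then closed into its lasso shape, which makes it active.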
Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
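Reverse translation picks one codon for each amino acid and appends a stop codon. The minimal sketch that follows uses an assumed lookup table for illustration. The table simply reuses the single codon per residue that appears in the reverse-translated sequence in this section, so it is not a Streptomyces-optimized table. It covers only the residues present in RES-701-3 and ends with TAA, as that sequence does. A small `translate` helper checks the round trip.

```python
# Hypothetical one-codon-per-residue table: the codon choices visible in the
# reverse-translated sequence in this section, NOT a Streptomyces-optimized table.
# Only residues present in the RES-701-3 precursor are covered.
CODON_FOR = {
    "M": "ATG", "S": "AGC", "D": "GAT", "I": "ATT", "T": "ACC",
    "L": "CTG", "P": "CCG", "E": "GAA", "A": "GCT", "G": "GGT",
    "R": "CGT", "N": "AAC", "W": "TGG", "H": "CAT",
}
STOP = "TAA"

PRECURSOR = "MSDITLTPMDLLDLDELAAGGGRSTAREGNWHEPEIDGWNPHGW"  # leader + core


def reverse_translate(protein: str) -> str:
    """Map each amino acid to one codon and append the stop codon."""
    try:
        return "".join(CODON_FOR[aa] for aa in protein) + STOP
    except KeyError as err:
        raise ValueError(f"no codon chosen for residue {err.args[0]!r}") from None


def translate(dna: str) -> str:
    """Round-trip check using the same table; stops at the stop codon."""
    residue_for = {codon: aa for aa, codon in CODON_FOR.items()}
    protein = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        if codon == STOP:
            break
        protein.append(residue_for[codon])
    return "".join(protein)


dna = reverse_translate(PRECURSOR)
assert translate(dna) == PRECURSOR
print(len(dna))  # 135: 44 codons plus the stop codon
```

Online reverse-translation tools work the same way in principle, but they pick each codon from an organism-specific usage table. That is exactly what codon optimization changes.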
The nucleotide sequence encoding the leader followed by the core is shown below.
ATGAGCGATATTACCCTGACCCCGATGGATCTGCTGGATCTGGATGAACTGGCTGCTGGTGGTGGTCGTAGCACCGCTCGTGAAGGTAACTGGCATGAACCGGAAATTGATGGTTGGAACCCGCATGGTTGGTAA
Codon optimization.
Over evolution, each species has come to use some codons frequently, with abundant matching transfer RNAs (tRNAs) for them, and others rarely, with few matching tRNAs. RES-701-3 comes from Streptomyces, whose GC-rich genome strongly favors codons loaded with G and C. Twist offers Streptomyces coelicolor as an organism option for codon optimization.
However, a 2025 paper by Shihoya et al. used Streptomyces venezuelae as the expression host and achieved the highest reported yields. In a real drug development setting, I might go with that host instead.
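One way to make "prefers codons loaded with G and C" measurable is GC3: the GC fraction at the third (wobble) codon position, where synonymous codons mostly differ. This is a minimal sketch, and the function names are my own. Comparing `gc3_content` on a naive reverse translation with the same value for a codon-optimized variant is one way to check how strongly the optimization shifted codons toward G and C.

```python
def gc_content(dna: str) -> float:
    """Fraction of all bases that are G or C."""
    dna = dna.upper()
    return sum(base in "GC" for base in dna) / len(dna)


def gc3_content(cds: str) -> float:
    """Fraction of codons whose third (wobble) base is G or C.
    Synonymous codons mostly differ at this position, so a GC-rich host's
    codon preference shows up most clearly here."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    return sum(base in "GC" for base in thirds) / len(thirds)
```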
Here is the codon-optimized variant for the leader and core together:
You have a sequence! Now what?
I have listed the Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, and Terminator below, as well as the reagents needed.
Promoter: The ermE*p promoter is reportedly the most widely used promoter for gene expression in Streptomyces.
Ribosome Binding Site: We’re using the Shine-Dalgarno (SD) sequence AAGGAG, which is reportedly a good RBS in Streptomyces for transcripts with leaders. It should sit 6 to 10 nucleotides upstream of the start codon, and we are aiming for 7 nucleotides. We’re putting two spacers around the SD sequence: CGACG before it and ACAC after it.
Start Codon: This is just going to be the usual ATG.
Coding Sequence: The leader and core peptide sequences go here together, as one continuous open reading frame.
His tag: This is a short string of six histidine amino acids added to the protein so you can fish it out of a mixture using a nickel column. The histidines stick to nickel, letting you pull your protein out of everything else the cell makes. In practice, however, a His tag is apparently not a good idea for RES-701-3, because it would interfere with binding to the ETB receptor.
Stop Codon: TGA tells the ribosome to stop building the protein here. TGA is the preferred stop codon in Streptomyces because it is, relatively speaking, GC-rich, matching the organism’s DNA preferences discussed above. By contrast, TAA is the most common stop codon in E. coli.
Terminator: Tells the cell’s RNA-copying machinery (RNA polymerase) to stop making mRNA. Without it, the polymerase would keep reading past your gene into neighboring DNA. We’re using the fd terminator, from bacteriophage fd, which is commonly used in Streptomyces expression vectors.
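The parts above can be assembled in order with simple string concatenation. This is a minimal sketch. The promoter and terminator strings are hypothetical placeholders, since their sequences are not listed here, and the His tag is left out as discussed. The helper `sd_to_start_gap` measures the gap between the end of the SD sequence and the ATG, which is the spacing the RBS description is concerned with.

```python
# Hypothetical placeholders: the actual ermE*p and fd terminator sequences
# are not listed here and would need to be pasted in.
PROMOTER = "<ermE*p>"
TERMINATOR = "<fd terminator>"

SPACER_5 = "CGACG"  # spacer before the SD sequence
SD = "AAGGAG"       # Shine-Dalgarno sequence (RBS)
SPACER_3 = "ACAC"   # spacer between the SD sequence and the start codon
START = "ATG"
STOP = "TGA"        # GC-richer stop codon preferred in Streptomyces


def build_cassette(orf: str) -> str:
    """Concatenate the parts in order; `orf` is leader + core starting with ATG
    and without a stop codon. No His tag is added."""
    if not orf.startswith(START):
        raise ValueError("ORF should begin with the ATG start codon")
    return PROMOTER + SPACER_5 + SD + SPACER_3 + orf + STOP + TERMINATOR


def sd_to_start_gap(cassette: str) -> int:
    """Number of nucleotides between the end of the SD sequence and the first ATG after it."""
    sd_end = cassette.index(SD) + len(SD)
    return cassette.index(START, sd_end) - sd_end
```

For example, passing the codon-optimized ORF to `build_cassette` would give the full insert. With ACAC as the downstream spacer, `sd_to_start_gap` returns 4. That is shorter than the 7-nucleotide spacing stated above, so the spacer, or the way the spacing is counted, is worth double-checking before ordering.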
Reagents
To produce this peptide, we also need three enzymes as reagents: LasB1, LasB2, and LasC. For this lasso peptide, LasB1 binds the leader and delivers the whole precursor to LasB2, which cuts the leader off; LasC then closes the ring on the core. These enzymes do not seem easy to order, so this peptide may not be a great choice for the class. In addition, the best yields come from Streptomyces venezuelae, which is also not a common host.
I prepared the lasso peptide order. Below is a picture of the expression cassette in Benchling.

Instead of a clonal gene, I used gene fragments, because they work better when the host organism is Streptomyces rather than E. coli, the organism that standard cloning vectors are built for.

What DNA would you want to sequence (e.g., read) and why?
I would want to sequence the whole genomes of all ~6,000 mammalian species. The largest current collection of mammalian genomes is the Zoonomia project, which contains around 250 whole genomes along with known maximum lifespan data for most of these species. Expanding this to cover all mammals, paired with their maximum lifespan records, would allow us to train computational models that identify DNA patterns predicting how long a species can live. In short, more genomes would mean better predictions about which parts of DNA are linked to longevity.
In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
Illumina short-read sequencing (second-generation): This produces highly accurate short reads (~150–300 base pairs) and is great for spotting small genetic differences between species.
Is your method first-, second-, or third-generation?
I am using second-generation Illumina sequencing. First-generation refers to the older Sanger method, which reads one fragment at a time and is too slow and expensive for whole genomes. Second-generation technologies sequence millions of short fragments in parallel, making them fast and cheap.
What is your input? How do you prepare your input?
The input is genomic DNA extracted from tissue or blood samples of each mammalian species. The essential preparation steps are:
What are the essential steps of your chosen sequencing technology? How does it decode bases (base calling)?
Fragmented DNA is attached to the glass surface of a flow cell, amplified into clusters, and then sequenced one base at a time. In each cycle, fluorescently labeled nucleotides are added and one is incorporated per strand. Each of the four bases carries a different color, so a camera captures which color lights up at each cluster, and the machine records the corresponding base. This process repeats hundreds of times to read out each fragment.
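A toy version of four-color base calling works as follows. For each cluster and cycle, pick the brightest channel and turn its share of the total signal into a Phred-scaled quality score, Q = -10·log10(P_error). The channel order, the intensity-share error model, and the Q40 cap are assumptions for illustration. Real Illumina software also corrects for channel cross-talk and phasing, and it uses empirically calibrated quality tables.

```python
import math

BASES = "ACGT"  # assumed channel order for the four intensities


def call_base(intensities):
    """Toy base call for one cluster in one cycle.
    Returns (base, phred_quality); 'N' with Q0 if there is no signal."""
    total = sum(intensities)
    if total == 0:
        return "N", 0
    best = max(range(4), key=lambda i: intensities[i])
    p_correct = intensities[best] / total
    p_error = max(1.0 - p_correct, 1e-4)  # cap so quality stays finite (Q40)
    return BASES[best], round(-10 * math.log10(p_error))


def call_read(cycles):
    """cycles: one (A, C, G, T) intensity tuple per sequencing cycle."""
    calls = [call_base(c) for c in cycles]
    return "".join(b for b, _ in calls), [q for _, q in calls]


print(call_read([(900, 50, 30, 20), (0, 0, 1000, 0)]))  # ('AG', [10, 40])
```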
What is the output?
The output is digital sequence files, typically in FASTQ format, containing millions of reads (short strings of A, T, C, and G) along with quality scores indicating how confident the machine is about each base call. These reads are then assembled and aligned computationally to reconstruct each species’ genome.
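Reading that output back is straightforward. Each FASTQ record is four lines: a header starting with @, the sequence, a + separator, and a quality string. With the common Phred+33 encoding, each quality character maps to Q = ord(char) - 33, which corresponds to an error probability of 10^(-Q/10). This is a minimal sketch that assumes unwrapped four-line records and Phred+33 qualities.

```python
def parse_fastq(text: str):
    """Yield (header, sequence, phred_scores) from unwrapped 4-line FASTQ records.
    Assumes Phred+33 quality encoding."""
    lines = text.splitlines()
    while lines and not lines[-1]:  # ignore trailing blank lines
        lines.pop()
    if len(lines) % 4:
        raise ValueError("expected 4 lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


def error_probability(q: int) -> float:
    """Phred score to probability that the base call is wrong."""
    return 10 ** (-q / 10)


for name, seq, quals in parse_fastq("@read1\nACGT\n+\nII#5\n"):
    print(name, seq, quals)  # read1 ACGT [40, 40, 2, 20]
```

Grouping by line count instead of looking for "@" matters here, because "@" is also a valid quality character.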
What DNA would you want to synthesize (e.g., write) and why?
Based on the sequencing data above, I would use trained computational models to predict specific DNA sequences associated with high maximum lifespan. I would then synthesize these predicted longevity-linked sequences—for example, specific gene variants or regulatory elements found in long-lived species like bowhead whales or naked mole-rats—so they can be tested in cell cultures or animal models. The goal is to move from computational prediction to experimental validation: do these DNA sequences actually promote cellular health and longevity?
What technology or technologies would you use to perform this DNA synthesis and why?
What are the essential steps of your chosen synthesis method?
What are the limitations of your synthesis method in terms of speed, accuracy, and scalability?
What DNA would you want to edit and why?
I would want to edit specific genes in model organisms (such as mice) to replace their native sequences with the longevity-associated sequences identified from the analysis above. For example, if the computational model predicts that a certain variant of a DNA repair gene is linked to longer lifespan in mammals, I would edit a mouse’s genome to carry that variant. This would let us test whether swapping in these predicted “long-life” DNA variants actually extends lifespan or improves age-related health outcomes like cancer resistance or cellular repair.
What technology or technologies would you use to perform these DNA edits and why?
I would use CRISPR-Cas9 gene editing, because it is one of the most versatile and widely used genome editing tools available. It can make targeted changes at specific locations in the genome of living cells and organisms, and it works well in mammalian systems, including mice.
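The targeting step can be illustrated with a minimal sketch that scans a DNA sequence for candidate SpCas9 sites: a 20-nt protospacer immediately followed by an NGG PAM, on either strand. It assumes SpCas9, the most commonly used Cas9, and only lists candidate sites. It does not score off-targets or guide efficiency. Installing a specific variant would also need a repair template for homology-directed repair. The function names are my own.

```python
def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_sites(dna: str, spacer_len: int = 20):
    """Return (strand, start, protospacer, pam) for every protospacer followed
    by an NGG PAM. `start` is 0-based on the strand it was found on."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for i in range(len(seq) - spacer_len - 2):
            pam = seq[i + spacer_len:i + spacer_len + 3]
            if pam[1:] == "GG":  # N-G-G
                hits.append((strand, i, seq[i:i + spacer_len], pam))
    return hits
```

For example, `find_spcas9_sites("A" * 20 + "TGG")` finds a single site on the + strand with protospacer `"A" * 20` and PAM `TGG`.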
How does your technology edit DNA? What are the essential steps?
What preparation do you need to do, and what is the input?
What are the limitations of your editing method in terms of efficiency or precision?
A beta-strand forms when a protein’s backbone (the repeating NH–Cα–CO chain shared by every amino acid) stretches out into a nearly flat zigzag. When two or more of these strands line up next to each other and link through hydrogen bonds (an N–H on one strand bonding to a C=O on its neighbor), you get a beta-sheet. The strands on the outer edges still have a row of exposed N–H and C=O groups, so another strand can add on, and so on.
The hydrophobic effect is the biggest one. In a beta-strand, side chains point alternately above and below the plane of the sheet, so one face of a sheet can be mostly hydrophobic. Two sheets then stack so that their greasy faces are buried in the interior, away from water.
Hydrogen bonding gives the structure its regularity. Each new strand that joins the sheet edge contributes roughly one H-bond per amino acid along its length. Individually, H-bonds in water are not enormously strong because breaking one with a neighbor just lets you form one with a water molecule instead, but across a strand of ten or more residues, they add up meaningfully.
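Van der Waals packing stabilizes sheets that have stacked together. Van der Waals forces are much weaker and shorter-range than hydrogen bonds. They arise from temporary, fluctuating dipoles, so they only add up when side chains from the two sheets pack tightly against each other.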
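To put "shorter-range" in numbers, van der Waals interactions are commonly modeled with the Lennard-Jones potential. This is a standard textbook model, not something specific to beta-sheets. Here r is the distance between two atoms, ε is the depth of the energy well, and σ is the separation at which the potential crosses zero:

```latex
V(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]
```

The attractive term falls off as r^-6, so doubling the distance between two atoms weakens the attraction by a factor of 2^6 = 64. This is why van der Waals contacts matter mainly for tightly interdigitated side chains.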
I selected a macrocyclic peptide for the following reasons: