Homework

Weekly homework submissions:

Week 01 HW
HW.1: Class assignment 1. Describe an application Identify a biological engineering tool or application you wish to develop and explain your motivation. I would like to develop a way to make plants grow 100x faster. I find this a very interesting and ambitious question. Perhaps you reverse-engineer the genome, morphological development and constraints, proteins/enzymes/catalysts for growth. Perhaps you design a separate organism (two bacterium?) which produces biomass - a combination of a carbon sequester and a cellulose printer. Perhaps you attempt to design a minimal artificial cell, like a Xenobot / JCVI minimal cells - using new AI design software, you create a minimal genome/DNA, design your own morphological topology through simulation, which is compiled down to gene regulatory networks (GRN’s), transcription factors/thresholds, and DNA.
Week 02 HW
HW.1: Benchling & In-silico Gel Art Make a free account at benchling.com, Import the Lambda DNA. Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI HindIII BamHI KpnI EcoRV SacI SalI Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. Benchling screenshots. Experimental design for Gel art. HW.2: Gel Art - Restriction Digests and Gel Electrophoresis In the wet-lab perform the lab experiment you designed in Part 1 and outlined in this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis”.
Week 03 HW
https://2026a.htgaa.org/2026a/course-pages/weeks/week-03/index.html HW.1: Python Script for Opentrons Artwork Review recitation materials and lab documentation. Design artwork using the GUI at opentrons-art.rcdonovan.com. Write a Python script using coordinates from the GUI via the “HTGAA26 Opentrons Colab”. Sign up for a robot time slot and run the script on the Opentrons robot. Submit Python file via provided form. HW.2: Post-Lab Reflection 2.1. Find and describe a published paper utilizing Opentrons or similar liquid handling automation tools. The paper I have found: Slowpoke: An Automated Golden Gate Cloning Workflow for Opentrons OT‑2 and Flex
Week 04 HW
Part A. Conceptual Questions Answer any NINE of the following questions from Shuguang Zhang:

How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Meat is roughly 20% protein by weight, so 0.2*500g=100g of protein. This is the only amino acid in meat, as carbs are sugars, and fats are triglycerides (fatty acids + glycerol).

Week 05 HW
Part A. SOD1 Binder Peptide Design Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.
Week 06 HW
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? A Phusion HF PCR Master Mix is a pre-combined PCR reaction system optimised for a specific engineered DNA polymerase. Phusion DNA polymerase — provides the catalytic activity which synthesises DNA, and includes a 3’→5’ exonuclease proofreading to reduce error Reaction buffer MgCl₂ — magnesium ions dNTPs — deoxynucelotide triphosphates: dATP, dCTP, dGTP, and dTTP Stabilizers/additives Water A typical setup only requires after adding:
Week 07 HW
https://2026a.htgaa.org/2026a/course-pages/weeks/week-07/index.html Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? They can compute continuous values, not just discrete ones. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal. A feedback-response mechanism, for example how the pancreas runs a sort of PID controller / control system for managing glucose levels in the body - through two hormones that move glucose levels in opposite directions - insulin and glucagon.
Week 08 HW
https://2026a.htgaa.org/2026a/course-pages/weeks/week-08/index.html There was no week 8 - GOTCHA On to week 9
Week 09 HW
https://2026a.htgaa.org/2026a/course-pages/weeks/week-09/index.html General homework questions Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Cell-free protein synthesis is technology where proteins are produced using lysed cell machinery (ribosomes, enzymes) rather than living cells.
Week 10 HW
https://2026a.htgaa.org/2026a/course-pages/weeks/week-10/index.html Homework: Final Project Identify at least one aspect of your project to measure (protein mass, sequence, biomarker presence/quantity, etc.). For this homework, let’s say my project is to produce human insulin (INS_HUMAN) via recombinant synthesis (cell-based system - E. Coli). The aspect I want to measure is purification/presence of the insulin protein.
Week 11 HW
https://2026a.htgaa.org/2026a/course-pages/weeks/week-11/index.html Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork Contribute at least one pixel to the global artwork experiment before editing ends. Discuss on the Discourse forum. I missed the deadline for this, sorry! Part B: Cell-Free Protein Synthesis | Cell-Free Reagents Provide 1–2 sentence descriptions of each component’s role in the cell-free reaction: E. coli Lysate: BL21 (DE3) Star Lysate (includes T7 RNA Polymerase) Provides the core machinery for translation and transcription of proteins.
Week 12 HW
https://2026a.htgaa.org/2026a/course-pages/weeks/week-12/index.html No HW this week.
Week 13 HW
https://2026a.htgaa.org/2026a/course-pages/weeks/week-13/index.html Work on final project
Week 14 HW
https://2026a.htgaa.org/2026a/course-pages/weeks/week-14/index.html Finish and present final project

Week 01 HW

HW.1: Class assignment

1. Describe an application

Identify a biological engineering tool or application you wish to develop and explain your motivation.

I would like to develop a way to make plants grow 100x faster. I find this a very interesting and ambitious question. Perhaps you reverse-engineer the genome, morphological development and constraints, proteins/enzymes/catalysts for growth. Perhaps you design a separate organism (two bacterium?) which produces biomass - a combination of a carbon sequester and a cellulose printer. Perhaps you attempt to design a minimal artificial cell, like a Xenobot / JCVI minimal cells - using new AI design software, you create a minimal genome/DNA, design your own morphological topology through simulation, which is compiled down to gene regulatory networks (GRN’s), transcription factors/thresholds, and DNA.

Why? Because trees and plants are great. They are calming, they look beautiful, they are functionally useful. Originally I wanted to build my own house, and was wondering - why is wood so expensive? If we could grow wood more quickly and effectively, that would be useful. It would also be fun to rapidly green certain areas of the world to produce arable land - the Australian desert, for example.

2. Establish governance goals

Describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

Enhance biosecurity (prevent misuse and uncontrolled spread)
- Prevent incidents
  - Restrict access to engineered strains, protocols, and enabling tools
  - Use genetic containment (kill-switches, auxotrophy, sterility)
  - Avoid traits that increase invasiveness or persistence outside intended settings
- Help respond
  - Establish monitoring and reporting systems for unexpected dissemination
  - Maintain traceability (registries, audit logs, chain-of-custody)
Foster lab safety (reduce accidents during development)
- Prevent incidents
  - Standard biosafety training and conservative organism/chassis selection
  - Physical containment and phased testing (lab → greenhouse → controlled trials)
  - Explicit evaluation of failure modes in growth and developmental pathways
- Help respond
  - Clear spill/escape response protocols and emergency shutdown procedures
  - Regular safety reviews and independent oversight
Protect the environment (minimize ecological externalities)
- Prevent incidents
  - Ecological risk assessment: gene flow, non-target effects, ecosystem disruption
  - Prohibit open release until long-term impacts are understood
  - Prefer reversible or self-limiting designs over permanent alterations
- Help respond
  - Post-deployment surveillance and remediation plans
  - Defined liability and responsibility for environmental harms
Equity, autonomy, and constructive use (ensure benefits are fairly distributed)
- Minimizing burdens to stakeholders
  - Community consultation for land-use and deployment decisions
  - Avoid shifting risks onto local ecosystems or vulnerable populations
- Feasibility without blocking research
  - Clear regulatory pathways that enable safe experimentation
  - Transparency and documentation to support responsible scaling
- Promote beneficial applications
  - Prioritize reforestation, sustainable materials, and climate-positive outcomes
  - Discourage purely extractive or destabilizing commercial deployment

3. Design governance actions

Describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)
Purpose: What is done now and what changes are you proposing?
Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

Containment-by-design + staged release

Actors: Institutional Biosafety Committees (IBC), national GMO regulators (e.g., OGTR/USDA), lab leads, funders
Design: engineered sterility/kill-switches, greenhouse-only trials, stepwise permits before field testing
Assumptions: containment works reliably; lab phenotypes predict outdoor behavior
Risks: safeguard failure, gene flow, invasive advantage, unexpected ecosystem effects

Access control + biosecurity screening

Actors: DNA synthesis firms, biosecurity agencies, research institutions, grant/journal oversight
Design: sequence screening, restricted strain distribution, dual-use review processes
Assumptions: misuse is limited by controlling access to key materials/information
Risks: leakage, uneven enforcement globally, slowing benign research

Environmental monitoring + liability framework

Actors: environmental agencies, local governments/landholders, independent ecologists, insurers/courts
Design: required impact studies, long-term surveillance, clear remediation liability
Assumptions: harms are detectable early and manageable with monitoring
Risks: underfunded surveillance, delayed ecological damage, liability discouraging deployment

4. Score against rubric

Evaluate each action against objectives including:
Biosecurity enhancement
Lab safety
Environmental protection
Cost/burden minimization
Feasibility and research impact

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity	3	3	2
• By preventing incidents	3	3	2
• By helping respond	2	2	3
Foster Lab Safety	3	2	1
• By preventing incident	3	2	1
• By helping respond	2	2	2
Protect the environment	3	2	3
• By preventing incidents	3	2	2
• By helping respond	2	1	3
Other considerations
• Minimizing costs and burdens to stakeholders	2	2	1
• Feasibility?	2	3	1
• Not impede research	1	2	1
• Promote constructive applications	3	2	3

5. Prioritize options

Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.
For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

I would prioritise Containment-by-design + staged release. Given that there is immense uncertainty in how this project could be achieved, it is a waste of resources to consider other governance actions for now. Rapid iteration to reduce uncertainty is the path towards achievement. As part of this - a scalable safety protocol throughout this process facilitates rapid experimentation without risk of ruin, until the project can achieve milestones necessary for unlocking funding and revenue.

HW.2: Lecture prep for W2

Answer prep questions from three faculty members:

Homework Questions from Professor Jacobson:

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

Error rate refers to errors per nucleotide added per replication. An error could be a misincorporation (wrong base expressed for a pair), for example.

Error rate of polymerase synthesis is 1/1e7 (1:10^7).

Human genome has 3.1-3.2 Gbp or 3e9 base pairs.

The rate of errors in polymerase copying the human genome’s DNA is 1/1e7 * 3e9, which is nonzero.

Biology deals with the likely error through multiple levels of mitigation:

Proofreading during synthesis corrects errors
Mismatch repair after synthesis repairs errors
Redundancy and selection at multiple levels - DNA is double-stranded, cells exist in huge populations, misfolded proteins get degraded, defective RNAs are destroyed, faulty cells undergo apoptosis
Damage repair system

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Our assumptions:

Average Human Protein: 1036 bp.
~30,000 proteins observed in mammalian genome.
A protein of length L = 3L nucleotides (bases) + a stop codon in the genome

Coding is the process by which DNA is transcribed into mRNA (triplets / codons), and mRNA (codons) is translated into a linear chain of amino acids (polypeptides), which folds into 3D protein structures.

How many different ways are there to code for an average human protein, meaning how many different DNA encodings would compile (transcribe and translate) down to the same protein (chain of amino acids) of length 1036 bp?

Codons are 3 nucleotides, each which have a base (A,C,G,T). There are 64 possible triplet combinations (codons) using the four bases (A, U, G, C). Each codon encodes one amino acid. An amino acid can be encoded by multiple codons. For instance, codons GAA and GAG both specify glutamic acid and exhibit redundancy. This is referred to as degeneracy.

The degeneracy of an amino acid refers to the number of codons which encode it. ie. d(Leu)=6, meaning Leucine has 6 codons which encode it.

Average codon degeneracy across amino acids is roughly 3.

So to calculate the number of possible encodings for a protein of length L=5 amino acids, we compute the degeneracy of each amino acid, and compute their product to find the maximum number of permutations. ie. for a protein of L=5, average degeneracy d(*)=3, num_permutations=d(*) * d(*) * d(*) * d(*) * d(*) = d(*)^L = 3^L

So for an average human protein of L=1036 bp, the number of possible encodings could be 3^L = 3^1036.

There is an intractable number of possible encodings. However, functional “good” encodings are a tiny subset constrained by expression, folding, RNA processing, regulation, and host biology.

Homework Questions from Dr. LeProust:

What’s the most commonly used method for oligo synthesis currently?

solid-phase chemical synthesis with phosphoramidite chemistry

Why is it difficult to make oligos longer than 200nt via direct synthesis?

Because direct phosphoramidite synthesis has a per-step yield <1.0, errors compound exponentially with length. P(success)=(1-e)^200 is improbable (e ~= 0.01)

Why can’t you make a 2000bp gene via direct oligo synthesis?

(1-e)^2000 is near impossible, due to errors accumulating from each synthetic cycle/step.

expected number of cleavage events scales ~linearly with cycle count and purine content
Misincorporations accumulate (wrong base addition)

Homework Question from George Church:

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?
[(Advanced students)] Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own:
https://arpa-h.gov/explore-funding/programs/boss
https://www.darpa.mil/research/programs/smart-rbc
https://www.darpa.mil/research/programs/go

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine (+ arginine conditional).

Out of the 20 amino acids needed, the body synthesizes 11-12, while the remaining 8-9, known as essential amino acids, must be obtained through diet.

This is not accurate to all animals, it seems? Counterexample: cats. Cats require taurine.

The Lysine Contingency was a genetic alteration Henry Wu performed in the dinosaur genome. The modification knocked out the ability of the dinosaurs to produce the amino acid Lysine.
This forced the dinosaurs to depend on lysine supplements provided by the park’s veterinary staff. In this way, dinosaurs could never escape from the park because they would never survive long without the food supplements.

Haha, I have to rewatch this film.

The way I would hack around this would be to introduce a substance containing the microbes that cows digest and feed it to the dinosaurs. These microbes synthesise the essential amino acids from nitrogen, thus mitigating the need for the dinosaurs to produce Lysine themselves, instead forming a symbiotic relationship with the microbes in their gut.

I don’t know what this question means, but it reminds me also of Liebig’s law - would the restriction of one amino acid necessarily debilitate the dinosaurs so they can’t escape, or is nature more nonlinear and complex than that?

LLM prompts used:

10 essential amino acids in all animals?
across all animals?
cows can synthesise most of their needed amino acids? how many which ones
how long can you survive without just one of the amnio acids ?

HW.3: Setup your website

CHECK IT OUT

https://pages.htgaa.org/2026a/liam-edwards-playne/

Week 02 HW

HW.1: Benchling & In-silico Gel Art

Make a free account at benchling.com, Import the Lambda DNA.
Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI HindIII BamHI KpnI EcoRV SacI SalI
Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.

Benchling screenshots.

Experimental design for Gel art.

HW.2: Gel Art - Restriction Digests and Gel Electrophoresis

In the wet-lab perform the lab experiment you designed in Part 1 and outlined in this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis”.

N/A - no access to BioClub Tokyo Lab.

HW.3: DNA Design Challenge

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

Miraculin - https://rest.uniprot.org/uniprotkb/P13087.fasta https://rest.uniprot.org/uniprotkb/P13087.txt

>sp|P13087|MIRA_SYNDU Miraculin OS=Synsepalum dulcificum OX=3743 PE=1 SV=3
MKELTMLSLSFFFVSALLAAAANPLLSAADSAPNPVLDIDGEKLRTGTNYYIVPVLRDHG
GGLTVSATTPNGTFVCPPRVVQTRKEVDHDRPLAFFPENPKEDVVRVSTDLNINFSAFMP
CRWTSSTVWRLDKYDESTGQYFVTIGGVKGNPGPETISSWFKIEEFCGSGFYKLVFCPTV
CGSCKVKCGDVGIYIDQKGRRRLALSDKPFAFEFNKTVYF

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

Using https://www.bioinformatics.org/sms2/rev_trans.html:

atgaaagaactgaccatgctgagcctgagctttttttttgtgagcgcgctgctggcggcg
gcggcgaacccgctgctgagcgcggcggatagcgcgccgaacccggtgctggatattgat
ggcgaaaaactgcgcaccggcaccaactattatattgtgccggtgctgcgcgatcatggc
ggcggcctgaccgtgagcgcgaccaccccgaacggcacctttgtgtgcccgccgcgcgtg
gtgcagacccgcaaagaagtggatcatgatcgcccgctggcgttttttccggaaaacccg
aaagaagatgtggtgcgcgtgagcaccgatctgaacattaactttagcgcgtttatgccg
tgccgctggaccagcagcaccgtgtggcgcctggataaatatgatgaaagcaccggccag
tattttgtgaccattggcggcgtgaaaggcaacccgggcccggaaaccattagcagctgg
tttaaaattgaagaattttgcggcagcggcttttataaactggtgttttgcccgaccgtg
tgcggcagctgcaaagtgaaatgcggcgatgtgggcatttatattgatcagaaaggccgc
cgccgcctggcgctgagcgataaaccgtttgcgtttgaatttaacaaaaccgtgtatttt

3.3. Codon optimization.

Describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

Proteins are translated from mRNA by tRNA’s. The tRNA’s “pair” with codons from the mRNA. A codon is a 3-base sequence which is then mapped onto a single amino acid. As we covered last week, there are 64 different codons (permutations of a string of 3 nuceleotide bases) which map down to only 20 amino acids. The degeneracy means we can swap out parts of the DNA/mRNA to express the same amino acids aka proteins. Why would we do this? Because mRNA codons are translated into amino acids by the available tRNA in the organism. Each tRNA matches a codon (or several synonymous codons, see wobble pairing at 3rd base). There is not a uniform concentration of tRNA for all codons. So some mRNA codons will translate more efficiently than others, because there is more tRNA.

To restate:

DNA encodes triplet codons.
mRNA is transcribed from DNA.
Ribosomes read mRNA in triplets.
tRNAs carrying amino acids base-pair with codons (binding with the tRNA’s complementary anticodon)
Translation rate is approximately proportional to local charged tRNA abundance and ribosomal processivity.

Multiple codons encode the same amino acid, yet different organisms use these synonymous codons at different frequencies (codon usage bias). If a gene from organism A is expressed in organism B without modification, the codon distribution may not match the tRNA pool of B.

You need to optimize codon usage in order to achieve (good) yields from your biomanufacturing process.

I choose Escherichia coli (E. coli) as the target host for optimization:

Takes less time
- Cell division is faster
Well established protocols to isolate plasmid
- Each cell has single chromosome
- Single circular plasmid
- Each replicated cell has exact copy of DNA
Easy method

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Recombinant expression in a host organism like E. Coli.

Clone the coding sequence into an expression vector (a plasmid).
- Promoter - T7 under lac control: binds the RNA polymerase
- Ribosome binding site - Shine–Dalgarno AGGAGG: recruits ribosome
- Coding sequence - see Miraculin DNA sequence above.
- Terminator - hairpin-forming sequence: stops transcription
- Antibiotic resistance gene - ampR: for selection of culture
Transform into E. Coli (transform the plasmid into host cells.)
1. Bacteria are given a heat shock.
2. Colonies grow.
3. Pick colonies.
  1. Plate on ampicillin → only plasmid-containing cells survive.
4. Inoculate the liquid cultures (by introducing single colonies)
Induce expression (e.g., add IPTG if T7/lac system).
- T7 RNA polymerase binds promoter
- DNA is transcribed into mRNA
- Ribosome binds RBS on mRNA.
- tRNA translates into protein, stop at terminator.
  - tRNAs decode codons
  - Amino acids polymerize into polypeptide
Harvest. Cells are lysed. Protein is purified.
1. Lyse cells (sonication or chemical lysis).
2. Purify protein (e.g., His-tag + Ni-NTA affinity column).

Apparently E. coli is possible but non-ideal for a cysteine-rich, glycosylated plant secreted protein like miraculin.

3.5. [Optional] How does it work in nature/biological systems?

Describe how a single gene codes for multiple proteins at the transcriptional level. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below.

HW.4: Twist DNA Synthesis Order

Steps to build a plasmid:

Import DNA into Benchling.
Add promoter, RBS, start/stop codons, 7x His Tag, and terminator
Export .fasta and import into Twist.
Order Twist clonal gene, using pTwist Amp High Copy vector.
Export .gb (genbank) file for plasmid.
Import plasmid .gb file into Benchling, open Info>Toplogy and set Circular.

HW.5: DNA Read/Write/Edit

DNA Read

No idea. Possibly my basil plant.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

I would use long-read sequencing (1–100+ kb). Even though it is more expensive, it would provide greater accuracy.

The way that DNA sequencing works currenly is by taking DNA, lysing it, and then reassembling fragments based on probabilistic approaches. The “read length” refers to how large these fragments are in terms of base pairs. A fragment of length = 1 bp would be near useless, since there is no way to “place” it probabilistically within the greater genome. A fragment of length = 150bp map well because apparently the human genome is largely non-repetitive at that scale.

Short-read sequencing is a read of 50–600 bp. Long-read sequencing is 1-100 kb.

Technologies:

Polymerase-based sequencing
Enzymatic digest sequencing
Nanopore sequencing
DNA microarrays

DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I have no idea.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

Recombinant DNA synthesis
Oligonucleotide synthesis - can make complex motifs, extremely large DNA molecules (1kbp+)

DNA Edit

(i) What DNA would you want to edit and why?

I have no idea. Potentially plant DNA. I don’t know anything about what DNA plants have. I would like to figure out how to increase the growth speed, change the bark texture. Or even doing experiments on yeast. Perhaps I could figure out the enzymes/proteins and what DNA/genes code for it, and then edit that.

(ii) What technology or technologies would you use to perform these DNA edits and why?

CRISPR-Cas9

Week 03 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-03/index.html

HW.1: Python Script for Opentrons Artwork

Review recitation materials and lab documentation.
Design artwork using the GUI at opentrons-art.rcdonovan.com.
Write a Python script using coordinates from the GUI via the “HTGAA26 Opentrons Colab”.
Sign up for a robot time slot and run the script on the Opentrons robot.
Submit Python file via provided form.

HW.2: Post-Lab Reflection

2.1. Find and describe a published paper utilizing Opentrons or similar liquid handling automation tools.

The paper I have found: Slowpoke: An Automated Golden Gate Cloning Workflow for Opentrons OT‑2 and Flex

Slowpoke is a tool which generates Opentron protocols for DNA assembly. DNA assembly is used to assemble larger strings of DNA than can be synthesised in one go, by joining together oligonucleotides.

It provides facilities to automate:

Golden Gate Cloning - automates the DNA assembly reaction setup, E. Coli transformation, and plating.
Colony PCR - automates colony PCR screening of resulting transformants.

Users provide for:

Golden Gate Cloning
1. Genetic toolkit map - e.g. MoClo YTK, STK plate layout
2. Custom parts map
3. Combination file
Colony PCR
1. Colony template positions
2. PCR deck maps
3. Reaction recipes.

The robot protocol automates the full pipeline of assembly, transformation, plating and colony PCR:

DNA and enzyme buffer extraction.
Golden gate reaction.
Transformation.
Plating.
Colony PCR.

It is compatible with multiple MoClo/Golden Gate toolkits (YTK, STK, and extensible to others).

Manual steps still required:

Colony picking - most labour-intensive step.
Sealing PCR plates in OT-2 thermocycler module.
Transferring PCR tubes to benchtop thermocycler.
Incubation, strain storage, and plasmid purification - still accounts for a lot of time.

https://github.com/Tom-Ellis-Lab/Slowpoke

2.2. Describe your intended automation use for your final project, including pseudocode, scripts, or implementation plans.

I intend to use a cloud lab platform to screen an array of biosensor constructs that I have designed, synthesised, and expressed using cell-free protein synthesis (CFPS).

HW.3: Final Project Ideas

Submit 1–3 slides with three individual project concept ideas.

Week 04 HW

Part A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang:

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Meat is roughly 20% protein by weight, so 0.2*500g=100g of protein. This is the only amino acid in meat, as carbs are sugars, and fats are triglycerides (fatty acids + glycerol).

1 g = 6.02217364335E+23 dalton

100 g = 6.022173643E+25 daltons

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Humans eat beef but don’t become a cow because the stomach metabolizes complex proteins and cells down to base level molecules and amino acids.

3. Why are there only 20 natural amino acids?

For evolutionary reasons probably. Much like how human language has a finite set of phonemes which allow us to express infinitely more higher-level syllables, words, and concepts and sentences - biology has a base grammar of 20 units. This has proven to be enough - 64 codons map onto 20 amino acids plus stop signals. There may have been more but this is evidently evolutionarily optimal as it is now.

4. Can you make other non-natural amino acids? Design some new amino acids.

β-amino acids are interesting - usually the amino group is attached onto the α-carbon, but here they are attached on the β-carbon. Due to this, proteases (enzymes which support digestion) are highly ineffective against β-peptides.

Others I googled:

Fluoroleucine — leucine with fluorine substituted in; more hydrophobic and metabolically stable
Azidohomoalanine — methionine analog with an azide group, useful for click chemistry bioconjugation

5. Where did amino acids come from before enzymes that make them, and before life started?

Meterorites that naturally carry amino acids
Miller-Urey experiment showed amino acids could form spontaneously from simple molecules + an electric arc

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Left-handed. Natural α-helices are right-handed because they’re built from L-amino acids. D-amino acids are the mirror image, so the resulting helix is the mirror image too — left-handed.

7. Can you discover additional helices in proteins?

Yes — beyond the common α-helix, proteins also contain 3₁₀-helices (3 residues per turn, tighter) and π-helices (4.4 residues per turn, rarer and wider). These are already known but underappreciated. Computational analysis of PDB structures keeps surfacing edge cases and unusual conformations that don’t fit neatly into existing categories.

8. Why are most molecular helices right-handed?

Because natural amino acids are L-enantiomers. The geometry of the L-α-carbon makes right-handed coiling energetically favorable — the side chains point outward without steric clashes in a right-handed helix. A left-handed helix built from L-amino acids would force side chains into the backbone, creating strain.

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

β-sheets have “sticky edges” — the backbone NH and C=O groups along the edge strands are unsatisfied hydrogen bond donors/acceptors. These can pair with the edge of another β-sheet. The driving forces are hydrogen bonding along the backbone and hydrophobic stacking between sheet faces. This makes lateral growth into large, ordered aggregates thermodynamically favorable.

10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

When proteins misfold or partially unfold, they expose their backbone, which can then hydrogen-bond with other misfolded proteins into cross-β structure (hydrogen bonds running perpendicular to the fibril axis). This structure is extremely stable — often more so than the native fold — so once nucleation starts, it propagates. Many proteins will form amyloid under the right conditions; some just do so more readily due to sequence composition or environmental stress.

Yes, amyloid fibrils can be used as materials — they’re stiff, stable, and self-assembling. Researchers have used them as scaffolds for nanomaterials, hydrogels, and functional coatings.

11. Design a β-sheet motif that forms a well-ordered structure.

Part B. Protein Analysis and Visualization

(1) Briefly describe the protein you selected and why you selected it.

Insulin. A 51-amino acid peptide hormone secreted by pancreatic β-cells that regulates blood glucose by signaling cells to take up glucose. I chose it because it’s small and well-studied, historically significant (first recombinantly produced therapeutic protein), and I’m curious how something so tiny has such a large physiological effect.

(2) Identify the amino acid sequence of your protein. How long is it? What is the most frequent amino acid? How many protein sequence homologs are there? Does your protein belong to any protein family?

51 amino acids total — two chains: A (21 aa) and B (30 aa), linked by two disulfide bonds. Most frequent: leucine (L) and cysteine (C), both at 6 occurrences. Many homologs — the insulin/IGF/relaxin superfamily includes IGF-1, IGF-2, relaxin, and insulin-like peptides across many organisms. Belongs to the insulin family (InterPro: Insulin/IGF/relaxin superfamily).

(3) Identify the structure page of your protein in RCSB. When was the structure solved? Is it a good quality structure? Are there any other molecules in the solved structure apart from protein? Does your protein belong to any structure classification family?

PDB: 4INS — human insulin hexamer, solved in 1989 at 1.5 Å resolution. Good quality structure. In addition to protein, the hexamer contains two zinc ions (Zn²⁺) coordinated by His B10 residues at the center, plus water molecules.

Why extra zinc ions? Insulin is stored as a zinc-stabilized hexamer in β-cells; once secreted, the hexamer dissociates into monomers and the zinc stays behind, so zinc is necessary for storage and secretion but not for receptor binding. Zinc deficiency is linked to impaired insulin secretion and increased type 2 diabetes risk.

In structural classification, insulin belongs to the “Insulin-like” fold under the all-α class.

(4) Open the structure in 3D visualization software. Visualize as “cartoon”, “ribbon”, and “ball and stick”. Color by secondary structure — does it have more helices or sheets? Color by residue type — what can you tell about hydrophobic vs hydrophilic distribution? Visualize the surface — does it have any binding pockets?

Color by secondary structure — does it have more helices or sheets

It has helices

Red spirals = α-helices

Yellow flat arrow shapes = β-sheets

Color by residue type — does it have more helices or sheets

PyMOL: util.cbag

- green is helices. I don’t see any β-sheets.

Visualize the surface — does it have any binding pockets?

Red is helix, Yellow is sheet, Green is loop.

PyMOL: show surface

The surface doesn’t show a deep binding pocket — the receptor-binding interface is relatively flat.

Part C. Using ML-Based Protein Design Tools

4INS : GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT

C1. Protein Language Modeling

https://colab.research.google.com/drive/1Pu0Nmmpn-OjL_UDqrjnAhowjz1ZP1hKJ?usp=sharing

Deep Mutational Scans

Use ESM2 to generate an unsupervised deep mutational scan. Explain any particular pattern (choose a residue and mutation that stands out). (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.

Latent Space Analysis

Use the provided sequence dataset to embed proteins in reduced dimensionality. Analyze the different formed neighborhoods. Place your protein in the resulting map and explain its position and similarity to its neighbors.

Explain its position and similarity to its neighbors

!!! TODO - I don’t know enough to describe it.

C2. Protein Folding

Fold your protein with ESMFold. Do the predicted coordinates match your original structure? Try changing the sequence — first some mutations, then large segments. Is your protein structure resilient to mutations?

!!! TODO - Cannot see the coordinates. Structure looks interesting.

Predicted folded:

ptm0.413_r3_default.pdb - PDB file.

Original 4INS Ribbon Diagram:

C3. Protein Generation

Use ProteinMPNN to inverse-fold your protein backbone. Analyze the predicted sequence probabilities and compare to the original. Input the predicted sequence into ESMFold and compare the predicted structure to your original.

Generating sequences...
>tmp, score=2.0974, fixed_chains=[], designed_chains=['A'], model_name=v_48_020
GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT
>T=0.1, sample=0, score=1.1989, seq_recovery=0.3137
SIIEKCCYHNCTEEELLKYCPEENKKKCLKNLIKELKKKCGPKCYVKIPKP

Generating sequences...
>tmp, score=2.0873, fixed_chains=[], designed_chains=['A'], model_name=v_48_020
GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT
>T=0.1, sample=0, score=1.2054, seq_recovery=0.3137
GLYEKCCYSNCTLAEIAKYCPKKNKKKCVKSKIKELKKKCGPKCWVYVPPP

 New Sequence:GLYEKCCYSNCTLAEIAKYCPKKNKKKCVKSKIKELKKKCGPKCWVYVPPP

Comparing:

GIVEQCCTSICSLYQLENYCNFVNQHLCGSHLVEALYLVCGERGFFYTPKT
GLYEKCCYSNCTLAEIAKYCPKKNKKKCVKSKIKELKKKCGPKCWVYVPPP

ESMFold predicted structure vs. original:

Predicted:

Original:

Part D. Group Brainstorm on Bacteriophage Engineering

NA - Sick

Week 05 HW

Part A. SOD1 Binder Peptide Design

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Challenge: Design short peptides that bind mutant SOD1, then decide which ones are worth advancing toward therapy.

Models used:

PepMLM: target sequence-conditioned peptide generation via masked language modeling
PeptiVerse: therapeutic property prediction
moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

Retrieve the human SOD1 sequence from UniProt (P00441), introduce the A4V mutation, and use the PepMLM Colab to generate four peptides of length 12 amino acids conditioned on the mutant sequence. Add the known binder FLYRWLPSRRGG for comparison. Record perplexity scores.

A4V mutant SOD1 sequence (deleted M at position 1, changed A→V at position 4):

ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Generated peptides:

#	Binder	Pseudo Perplexity
1	WRSPAVAVAHWE	7.77
2	WRVGWVGVELKE	24.21
3	WRSPAAXIEHKX	11.24
4	WRVYAAXIEWGK	20.45
Known	FLYRWLPSRRGG	22.53

A note on perplexity: A lower perplexity score means higher model confidence that the peptide satisfies the criteria for binding the target.

Part 2: Evaluate Binders with AlphaFold3

Submit each peptide + mutant SOD1 as separate chains to the AlphaFold Server. Record the ipTM score and describe where each peptide appears to bind — does it localize near the N-terminus (A4V site), the β-barrel, or the dimer interface? Is it surface-bound or partially buried? In a short paragraph, describe the ipTM values and whether any PepMLM-generated peptide matches or exceeds the known binder.

Peptide	Binding location	ipTM score
WRSPAVAVAHWE	None	0.28
WRVGWVGVELKE	None	0.35
WRSPAAXIEHKX	None	0.33
WRVYAAXIEWGK	None	0.34

Part 3: Evaluate Properties in the PeptiVerse

Using PeptiVerse, evaluate the therapeutic properties of each peptide against the A4V mutant SOD1 sequence. Check: predicted binding affinity, solubility, hemolysis probability, net charge (pH 7), and molecular weight.

Peptide	Solubility	Hemolysis	Binding Affinity	MW (Da)	Net Charge (pH 7)
WRSPAVAVAHWE	1.0	0.044 (Non)	5.361 (Weak)	1408.6	-0.14
WRVGWVGVELKE	1.0	0.117 (Non)	7.089 (Medium)	1457.7	-0.23
WRSPAAXIEHKX	1.0	0.011 (Non)	4.645 (Weak)	1158.5	0.85
WRVYAAXIEWGK	1.0	0.043 (Non)	6.724 (Weak)	1360.7	0.76
FLYRWLPSRRGG (known)	1.0	0.047 (Non)	5.962 (Weak)	1507.7	2.76

The best peptide to advance for wet lab validation would be WRVGWVGVELKE due to its relatively high binding affinity (7.089, Medium).

Part 4: Generate Optimized Peptides with moPPIt

Using the moPPIt Colab: paste your A4V mutant SOD1 sequence, choose specific residue indices to target (e.g. near position 4, the dimer interface, or another surface patch), set peptide length to 12 aa, and enable motif + affinity guidance. Briefly describe how the moPPIt peptides differ from your PepMLM peptides. How would you evaluate these before advancing to clinical studies?

Binder	Hemolysis	Solubility	Affinity	Motif
SVKTKCCTTYQS	0.964	0.917	6.576	0.890
DDTKKCSCIQTH	0.975	0.917	6.314	0.915
ENGETFQCTKKV	0.970	0.833	6.044	0.935
KKSKKAFVCCVC	0.963	0.667	8.172	0.614

For the long execution time and computational resources required, the main advantage of moPPIt over PepMLM (in this context) is the motif score — there was no option to check motif specificity in PeptiVerse. All other properties of the PepMLM-generated sequences were comparable to the moPPIt peptides.

Part B. BRD4 Drug Discovery Platform Tutorial

(Optional — skipped)

Part C. Final Project: L-Protein Mutants

High level summary: The objective of this assignment is to improve the stability and auto-folding of the lysis protein of a MS2-phage. This mechanism is key to the understanding of how phages can potentially solve antibiotic-resistance.

Full instructions: Google Doc

Chose option 3: generating random mutations in the lysis protein while avoiding loss-of-function or nonsense codons. A Python script (Colab) was used to load active mutations from experimental data and apply them randomly to unique positions.

Generated sequences:

0. METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT (Original)
1. METRFPQQSQQTPASTNRRRPFKHEDYPCQRQQRSSTLYVLIFLAFFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
2. METRFPQQSQQTLAATNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
3. METRFPQQSQQTPASTNRRRPFKHGGYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

AF2 Multimer was used to co-fold mutant sequence 1 with DnaJ. The plDDT score indicates low model confidence in the folding of the mutant L protein. Overall, the random mutation approach is very time-consuming for obtaining leads.

Week 06 HW

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

A Phusion HF PCR Master Mix is a pre-combined PCR reaction system optimised for a specific engineered DNA polymerase.

Phusion DNA polymerase — provides the catalytic activity which synthesises DNA, and includes a 3’→5’ exonuclease proofreading to reduce error
Reaction buffer
MgCl₂ — magnesium ions
dNTPs — deoxynucelotide triphosphates: dATP, dCTP, dGTP, and dTTP
Stabilizers/additives
Water

A typical setup only requires after adding:

Forward primer
Reverse primer
Template DNA
Additional water to reach final volume

What are some factors that determine primer annealing temperature during PCR?

Factors:

Primer melting temperature — dominant factor
GC content of primer
Primer length
Sequence features
Salt concentration in the reaction buffer
Template–primer mismatch tolerance

There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR creates DNA fragments by enzymatic replication using primers that define the fragment boundaries.

Protocol: The reaction contains template DNA, forward and reverse primers, dNTPs, buffer, Mg²⁺, and a thermostable DNA polymerase (e.g. Phusion or Taq). The protocol cycles temperature: denaturation (~95 °C) separates strands, annealing (~50–65 °C) allows primers to bind, and extension (~72 °C) synthesizes new DNA. After ~25–35 cycles, the region between the primers is exponentially amplified, producing many linear copies of a precisely defined sequence.

Restriction enzyme digestion produces linear fragments by cutting DNA at specific recognition sequences using restriction endonucleases.

Protocol: The protocol involves incubating DNA with one or more enzymes in the appropriate buffer (often ~37 °C) for a set time. The enzyme recognizes a short sequence (typically 4–8 bp) and cleaves the phosphodiester backbone, generating fragments with defined ends (blunt or sticky). The resulting fragment sizes depend entirely on where those recognition sites exist in the DNA.

Conceptually, PCR synthesizes a fragment by copying between two designed boundaries, whereas restriction digestion extracts a fragment by cutting an existing molecule at predetermined sequence motifs.

To understand when both are useful, consider an objective: engineer E. coli to produce human insulin, which requires building a plasmid containing the insulin gene under a bacterial promoter.

3 difference scenarios for getting insulin:

DNA comes from a biological sample (e.g. human genomic DNA). The insulin gene is buried inside billions of unrelated bases, so PCR is used to isolate and amplify only that specific region using primers that define its boundaries. PCR is therefore used when the goal is to retrieve a specific gene from a complex DNA mixture.
DNA already exists in a plasmid (e.g. moving GFP from plasmid A into plasmid B). The fragment is already isolated, so restriction enzymes are used to cut DNA at specific recognition sequences, allowing the gene to be excised and inserted into another vector. Restriction digestion is therefore used when the task is to cut and rearrange existing DNA molecules.
DNA is chemically synthesized because the sequence is already known. The synthesized fragment may still be PCR-amplified if more copies are needed, and restriction enzymes (or similar assembly methods) are used to insert it into plasmids. In practice, PCR isolates or amplifies sequences, while restriction enzymes cut DNA molecules so fragments can be inserted, removed, or reorganized.

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Desiderata: Linear DNA fragments whose terminal 20–40 bp regions are perfectly homologous to the neighboring fragment, unique, structurally stable, and present in a clean preparation so the Gibson enzymes can expose the overlaps, allow annealing, fill gaps, and ligate the final construct.

How does the plasmid DNA enter the E. coli cells during transformation?

A cell is boundaried by a lipid membrane wall. It is not a solid wall. It is more of a dense molecular fluid. Each phospholipid is held in place only by weak interactions (hydrophobic forces, van der Waals forces). Like an electron has no fixed static wall, rather a field it creates, a cell has no fixed solid wall, it is a highly dense molecular fluid.

Lipids constantly fluctuate, small gaps appear and disappear.

During transformation, the culture is treated to brief heat shock (~42 °C for ~30–60 s). The rapid temperature change causes a sudden increase in the lipid kinetic energy, resulting in transient disordering of phospholipid packing, resulting in transient aqueous pores in bilayer. Plasmid DNA molecules enter through these pores.

This is paired with a treatment of calcium ions, which neutralises negative charges on the DNA phosphate backbone and the membrane surface, and thus reduces electrostatic repulsion between DNA and cell envelope.

Describe another assembly method in detail (such as Golden Gate Assembly)

Explain the other method in 5–7 sentences plus diagrams (either handmade or online).

Design fragments with Type IIS sites and specific 4-bp overhangs. PCR amplify or synthesize fragments with those flanking sites. Mix fragments, plasmid backbone, Type IIS enzyme, ligase, and buffer. Run digestion–ligation thermal cycles. Transform assembled plasmid into bacteria.

Model this assembly method with Benchling or Asimov Kernel!

Week 07 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-07/index.html

Part 1: Intracellular Artificial Neural Networks (IANNs)

What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

They can compute continuous values, not just discrete ones.

Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

A feedback-response mechanism, for example how the pancreas runs a sort of PID controller / control system for managing glucose levels in the body - through two hormones that move glucose levels in opposite directions - insulin and glucagon.

Part 2: Fungal Materials

What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

Mycelium composites. You can use them to grow coffee cups, fashion (cloths).

What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

You might want to genetically engineer fungi to grow coffee cups.

The advantage is that in fungi, the chassis is the product - filamentous growth means the organism shapes itself into the material. Bacteria are essentially isotropic point-particles in materials terms; hyphae are intrinsically networked, anisotropic, and 3D, which is the prerequisite for any morphogenetic program over a material.

Part 3: First DNA Twist Order

Review the Individual Final Project documentation guidelines.

Submit the Google Form with your draft Aim 1, final project summary, HTGAA industry council selections, and shared folder for DNA designs.

Review Part 3: DNA Design Challenge of the week 2 homework. Design at least 1 insert sequence and place it into the Benchling/Kernel/Other folder you shared in the Google Form above. Document the backbone vector it will be synthesized in on your website.

N/A

Week 08 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-08/index.html

There was no week 8 - GOTCHA

On to week 9

Week 09 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-09/index.html

General homework questions

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Cell-free protein synthesis is technology where proteins are produced using lysed cell machinery (ribosomes, enzymes) rather than living cells.

The main advantages of CFPS:

Rapid, direct expression from DNA (plasmids). Bacterial colonies require tagging, growth and selection, whereas CFS are simpler and more straightforward.
Direct control over the reaction environment. CFS systems work more directly with primitive biomolecular elements, rather than encapsulated cells, so they are more straightforward to measure, instrument and perturb.

Describe the main components of a cell-free expression system and explain the role of each component.

The major components are:

Core cell extract (lysate).
- Contains ribosomes, tRNAs, translation factors, metabolic enzymes, chaperones, transcription machinery
- This performs the core processes of protein production. Transcription and translation: DNA → mRNA → protein
DNA template. e.g. plasmid DNA, linear PCR
Energy system - supplies chemical energy (ATP, GTP) for transcription and translation.

Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Transcription and translation burn ATP and GTP very quickly. Without sufficient ATP/GTP production, the reaction rapidly stalls.

Consider a car. Igniting fuel will not propel a car forward, it will dispel energy uselessly. However attaching a piston and a chamber can direct that energy in one direction.

A cell is like a car with a piston. A cell-free system lacks such a system.

This is accomplished in a cell through the membrane. The cell membrane (composed of lipids) creates constrained geometry and controlled sequencing of interactions.

Describe a method you could use to ensure continuous ATP supply in your cell-free experiment..

Add phosphoenolpyruvate, PEP, plus pyruvate kinase.

Protocol: I would run a small optimization matrix rather than assume one energy condition is best. For example, I would test 10, 20, and 40 mM PEP with fixed 1.5 mM ATP and fixed pyruvate kinase, then measure protein yield at 0, 30, 60, 120, and 240 minutes. If yield stops early while ATP is low, I would increase PEP or pyruvate kinase. If yield stops despite ATP remaining, then the problem is probably not ATP supply but substrate depletion, pH drift, magnesium imbalance, mRNA degradation, or protein instability. This separates “energy failure” from other failure modes.

Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic: lacking a nucleus. Examples: E. Coli lysates.

Eurokaryotic: having a nucleus. Examples: Wheat Germ Extract, Rabbit Reticulocyte Lysate, Tobacco BY-2 Lysate

Let’s produce INS_HUMAN (insulin).

Prokaryotic.

Oxidizing E. coli lysate (e.g., PURExpress + DsbC, or SHuffle-derived extract).
Target: proinsulin (B–C–A single chain).
Why? The downstream refold + cleavage chemistry is already industrially solved

Eukaryotic.

Rabbit Reticulocyte Lysate + canine pancreatic microsomes.
Target: preproinsulin
C-peptide still cleaved enzymatically post-translation.

The biosynthesis of insulin involves multiple intermediary steps:

preproinsulin: Insulin is synthesized as an inactive precursor molecule, a 110 amino acid-long protein called preproinsulin
proinsulin: Preproinsulin is translated directly into the rough endoplasmic reticulum (RER), where its signal peptide is removed by signal peptidase to form proinsulin
proinsulin:
- As the proinsulin folds, opposite ends of the protein, called the “A-chain” and the “B-chain”, are fused together with three disulfide bonds.[26]
- Folded proinsulin then transits through the Golgi apparatus and is packaged into specialized secretory vesicles, or granules.[26]
- In the granule, proinsulin is cleaved by proprotein convertase 1/3 and proprotein convertase 2, removing the middle part of the protein, called the “C-peptide”.[26]
- Finally, carboxypeptidase E removes two pairs of amino acids from the protein’s ends, resulting in active insulin

Preproinsulin contains a signal peptide. Signal peptides function to prompt a cell to translocate the protein, usually to the cellular membrane.

C-peptide is a connecting protein produced when the pancreas cleaves proinsulin to make mature insulin. C-peptide excision is required and is done by PC1/3 + PC2 in vivo — neither CFE system reproduces that, so both routes need an in vitro cleavage step (trypsin + carboxypeptidase B, ?apparently? standard industrial route).

prokaryotic = make it cheap and finish it on the bench;
eukaryotic = watch it fold the way a β-cell folds it.

How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

(This question was super hard and I was lost. I am verbatim coping the response from the Claude LLM here because I really loved how first-principles the answer was.)
Concepts I became aware of:
membrane protein
fake wall - detergents, liposomes
concept of a “screen”
readout - another new term which fits into screen. in programming, we’d just call it a test.

Proteins are chains of amino acids that fold into 3D shapes. Most proteins are water-soluble: their surface is “hydrophilic” (water-loving), they float around in the watery interior of the cell, no problem.

Membrane proteins are different. They’re designed to live embedded in the cell’s outer wall — the lipid bilayer, which is a thin sheet of greasy fat molecules. The parts of the protein that sit inside that greasy sheet are themselves greasy (“hydrophobic”, water-hating). Think of a bolt designed to be installed through a wall: the threaded shaft is meant to be inside the wall material, only the head and tail stick out into air.

Now: in CFPS, there’s no wall. The protein gets manufactured into pure water. Its greasy mid-section is exposed to water, which it hates. It does what greasy things do in water — clumps up with other greasy bits (think oil droplets in vinegar). It misfolds, aggregates into junk, and you get no usable product.

So the central engineering problem for membrane-protein CFPS is: you have to provide a fake wall for the protein to embed into while it’s being made. Everything else in the protocol is in service of that.

The menu of fake walls (in rough order of complexity):

Detergents — soap molecules. They have a greasy tail and a water-loving head, so they can wrap around the greasy parts of the protein and keep it in solution. Cheap, easy. Downside: detergents are often harsh enough to denature the protein.
Liposomes — actual little lipid bubbles, basically tiny vesicles made of the same fat the real membrane is made of. The protein can insert into the bubble’s wall as it’s manufactured. More native, harder to work with.
Nanodiscs — a clever trick: a small flat patch of lipid bilayer (~10 nm across) held together by a belt of protein around the edge, like a coin made of fat with a metal rim. The membrane protein sits in the middle of the disc. Very clean, very defined, beloved by structural biologists. Most expensive.
SMALPs — a synthetic polymer (SMA) that does the nanodisc trick without needing the protein belt. Cheaper than nanodiscs.

The supporting cast. A real cell does more than just provide a wall. It has:

Translocons — machines built into the membrane that thread the protein into the wall as it’s being made (like a sewing machine guiding fabric through). Without these, the protein doesn’t insert correctly even if a wall is present. You can buy these as “inverted membrane vesicles” — little fragments of bacterial membrane with the translocons still in them — and dump them into the reaction.
Chaperones — helper proteins that prevent misfolding. Add as purified extras.
Redox environment — some membrane proteins have internal “staples” (disulfide bonds) that only form in an oxidizing environment. Standard cell juice is reducing (the opposite). You add chemicals to flip it.

How you’d actually run the experiment. You’re optimizing across maybe 5 variables (which wall, how much wall, with or without translocons, temperature, redox state). Too many combinations to try one at a time, so you do a screen: 48–96 small parallel reactions in a plate, each with a different combination, and a fast readout that tells you which ones worked.

The standard fast readout uses split-GFP: GFP is the green fluorescent protein, and you can split it into a big piece and a tiny tag (11 amino acids). Attach the tiny tag to your membrane protein. Add the big piece to the reaction. They only find each other and turn green if your membrane protein folded correctly and the tag is accessible. So fluorescence = success, no fluorescence = junk. Read the plate in a few minutes, identify the winning conditions, then scale those up.

Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Yield ≈ template availability × transcription efficiency × translation efficiency × product stability × reaction lifetime

Aspects of process design:

DNA template supply
Transcription (DNA -> mRNA)
Translation (mRNA -> protein)
Folding / activation
Energy + raw materials supply
- system starts strong, then fuel runs out or waste accumulates
- ATP donor depleted, Amino acids consumed, NTP depletion, Phosphate buildup, Enzymes lose activity, Too much DNA overloads machinery
Process consistency

Troubleshooting is assumed to mean diagnosis here. Implementation of a solution is up to the experimetal designer.

Three possible reasons for low yield:

Poor folding. The protein may be successfully translated, but immediately misfolds, aggregates, or becomes insoluble. This is especially common for large proteins, membrane proteins, disulfide-rich proteins, or eukaryotic proteins expressed in bacterial extracts.
- Troubleshooting strategy:
  - DNA degradation: incubate DNA in extract, Sample at 0, 15, 30, 60 min, Run DNA gel or qPCR across the gene
Energy depletion / reaction burnout. Cell-free systems consume ATP extremely rapidly.
- Troubleshooting strategy:
  - Assume we measure yield somehow - eg. fluorescence (e.g. GFP), luciferase activity, SDS-PAGE band intensity, western blot, or mass spectrometry
  - Can diagnose energy depletion as yield declining abnormally (e.g. abruptly). Supplement fresh ATP mix midway.
    - if production resumes -> energy limitation confirmed
Poor template availability / transcription inefficiency.
- Troubleshooting strategy:
  - Weak Transcription initiation: measure mRNA by RT-qPCR, compare target template against a known-good positive-control template, keep coding region same, swap promoter/5’ region
    - if DNA intact but mRNA low = transcription problem
    - if positive control produces mRNA but target does not = target promoter/sequence design problem
    - if both fail = extract/polymerase/reaction chemistry problem

Useful synthetic cell.

Design an example of a useful synthetic minimal cell as follows:

Pick a function and describe it.
1. What would your synthetic cell do? What is the input and what is the output?
The synthetic cell would grow wood. The input is CO2 and photons and output is the structure it grows. Wood is a composite material made out of 40% cellulose (contained within cells), 30% lignin, and 30% hemicellulose.
1. Could this function be realized by cell-free Tx/Tl alone, without encapsulation?
No, since wood is made from cells. Cell-free structure could be created using crystalline approaches.
1. Could this function be realized by genetically modified natural cell?
Yes, but it would be more complex in the limit.
1. Describe the desired outcome of your synthetic cell operation.
Programming a tree to grow into a custom shape (a house) via morphological programming, growing a tree faster and more efficiently (without ancestral code which may no longer apply in current environment).
Design all components that would need to be part of your synthetic cell.
1. What would be the membrane made of?
Two lipids as in JVCI-syn3A - phosphatidylglycerol and a specific glycolipid.
1. What would you encapsulate inside? Enzymes, small molecules.
- Carbon capture and fixation - CETCH cycle inspired design?
- Energy production - cyanobacteria?
- Cellulose extrusion - Komagataeibacter? Not sure if it’s possible to put a bacteria in a cell
- Lignin production
There are many more things, I know. ;) These are the core ones that might be enough to answer the exercise.
1. Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promotors, like Tet-ON, you need mammalian)
Bacterial systems lack a nucleus and Tx/Tl happens in same place (vs. in human cell - where transcription occurs in the nucleus, translation occurs outside the nucleus but still in the cell).
I’m not sure yet. It would not seem we need mammalian gene complexes or closures from different organelles.
1. How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)
It needs to exchange:
- light
- CO2 / bicarbonate
- water
- ions
- small molecule substrates
- waste products
- possibly output molecules
Channels:
- Light - NA
- CO2 - membrane-permeable
- Bicarbonate - may need a bicarbonate transporter or channel
- Water - membrane-permeable
Experimental details
1. List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)
There are so many haha. And it would be the most adequate to describe this as an engineering design - ie. a hierarchy of encapsulation.
- Lipids
  - phosphatidylglycerol
  - specific glycolipid
- Genes
  - Entire cyanobacteria 12 Mb genome
  - Entire Komagataeibacter (K. xylinus E25) 3.4 Mb genome
  - (missing) lignin production gene
1. How will you measure the function of your system?
This monolith synthetic cell will have too much uncertainty to design-build-test and solve in one iteration. It will definitely require divide-and-conquer to break it down into testable subunits.

Homework question from Peter Nguyen

Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment.

Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:

Write a one-sentence summary pitch sentence describing your concept.

Seasoning which turns rapidly digestible starch into slower digestible starch, which reduces glucose spikes.

How will the idea work, in more detail? Write 3-4 sentences or more.

This product is an enzyme powder sprinkled onto starchy food after cooking, especially rice, potato, pasta, corn, or sweet potato. The active enzyme is a starch-debranching enzyme such as pullulanase or isoamylase, which cuts branch points in amylopectin and creates more linear starch chains. When the food is then cooled, these linear chains pack together into retrograded resistant starch, which human digestive enzymes break down more slowly than normal cooked starch. The result is not that carbohydrates disappear, but that some rapidly digestible starch is shifted into a slower-digesting or resistant form, reducing the speed and size of the post-meal glucose rise.

What societal challenge or market need will this address?

This addresses the need for simple food-preparation tools that reduce post-meal glucose spikes without requiring people to fully change what they eat. It is especially relevant for people managing diabetes, insulin dosing, blood glucose variability, or low-GI diets. It would be used like a seasoning or cooking aid for high-starch foods, not as a replacement for insulin or medical treatment.

How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

The cell-free limitation would be handled by treating the product as a freeze-dried biochemical system rather than a living organism. The enzyme would be produced in microbes, purified, then dried with stabilizers such as trehalose or other protein-protective excipients so it can survive storage as an inactive powder. Water from the food rehydrates the system and turns the enzyme back on, while buffer salts keep the local pH in the enzyme’s working range. Because there is no cell to repair damaged proteins or regenerate itself, the reaction is designed as a one-time use system: the enzyme acts during a short warm window, modifies the starch structure, then becomes inactive during later heating, digestion, or storage.

Homework question from Ally Huang

Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!

For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .

Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

Scientific research and invention requires intelligence. AI systems provide intelligence in much greater amounts, for orders of magnitude lower cost than biological intelligence. AI systems require energy and data centers. Building data centers in space can be done in a different tradeoff space - powered by solar (at much higher solar efficiency due to lack of atmosphere), unconstrained by regulatory cost. However transporting hardware to space is expensive.

What if we could build parts of the data center in space using cell-based systems? If transporting a seed to space could grow a tree, potentially we could transport a seed which grows a data center (or even a rack for a server).

Transporting seeds to space that grow into objects offers exponential savings compared to sending a data center as a manufactured object. This is due to the cost economics of sending load into space - the relationship between the mass of the rocket’s fuel and the mass of the payload is exponential, dictated by the Tsiolkovsky rocket equation. You can also think about it in the inverse - reducing load makes the fuel cost exponentially cheaper.

This requires solving:

morphological programming - how to program a single cell to grow into a 3D structure
materials design - ie. what is this rack made out of? Something that could survive in space’s vacuum? self-sealing lichen/coral

Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

Engineered photosynthetic cyanobacteria/lichen-like chassis genes controlling carbon fixation, extracellular polysaccharide secretion, melanin biosynthesis, mineralization, desiccation tolerance, and 3D morphogenesis.

Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

These enable 3D structure production in the same way a plant would, mineralisation provides the hard dead shell that would provide protection of the living inner cells from the vacuum of space.

Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

Above in Q1

Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

This experiment would be broken down:

solving morphological programming - growing a simple 3D shape from programming a single cell
building complex structures - instead of a shape made of a single material, growing a mineral layer and now the cell inner layer
testing photosynthesis - adding photosynthesis genes and building the mechnism for the organism to convert energy
sending it to space and testing it

Week 10 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-10/index.html

Homework: Final Project

Identify at least one aspect of your project to measure (protein mass, sequence, biomarker presence/quantity, etc.).

For this homework, let’s say my project is to produce human insulin (INS_HUMAN) via recombinant synthesis (cell-based system - E. Coli).

The aspect I want to measure is purification/presence of the insulin protein.

Describe all elements you intend to measure and the measurement procedures in detail.

Total protein content
- Bradford essay on cell lysate
Apparent molecular weight and purity
- SDS-PAGE on a Tricine gel system
  - Band Homogeneity: A pure protein will display a single, distinct band after staining.
Identity confirmation
- Western blot with anti-human-insulin antibody
Exact mass and sequence integrity
- LC-MS (ESI-TOF) on the purified product
  - Target intact mass 5807.57 Da (oxidized form, 3 disulfides).
- Tryptic digest followed by LC-MS/MS for full sequence coverage (explained in questions below)
Disulfide bond verification
- Non-reducing peptide mapping by LC-MS - to confirm the three native disulfides
Quantification and purity
- Reversed-phase HPLC against a USP insulin reference standard
Host cell protein and endotoxin
- Anti-E. coli HCP ELISA and LAL assay for endotoxin
  - it detects and quantifies residual E. coli Host Cell Proteins (HCPs)

Specify the technologies you will use (gel electrophoresis, DNA sequencing, mass spectrometry, etc.).

Mass spectometry
- LC-MS and LC-MS/MS
SDS-PAGE
Reversed-phase HPLC

Homework: Waters Part I — Molecular Weight

Calculate the predicted molecular weight of eGFP based on its amino acid sequence using tools like the ExPASy compute_pi calculator.

Molecular weight (Da):
    26886.32 (average mass)
    26869.36 (monoisotopic mass)

https://web.expasy.org/cgi-bin/compute_pi/pi_tool_bis.cgi?P42212@1-238@average

Claude thinks I have the WT avGFP, not eGFP (wild type GFP, not enhanced GFP)? Not sure how to find this on UniProt.

Calculate the molecular weight using the adjacent charge state approach:

Select two charge states from the intact LC-MS data.

mh = 1.00728
m1=1037.4423   # 45%
m2=1000.4302   # 50%

Determine z for adjacent peaks using the provided formula.

z = (m2 - mH) / (m1 - m2) 
  = (1000.4302 - 1.00728) / (1037.4423 - 1000.4302) 
  = 27.002599690371586

So m1 is the [M+27H]²⁷⁺ peak, m2 is [M+28H]²⁸⁺.

Reasoning/intuition:

You don't know M, and you don't know which divisor goes with which peak.
The trick: adjacent peaks must differ by exactly 1 in the divisor

peak_27 ≈ M / 27
peak_28 ≈ M / 28
peak_29 ≈ M / 29

Determine the protein MW using the relationship between m/z, MW, and z.

m/z = (MW + z * mh) / z
MW = z * ((m/z) - mh)

MW = 27.002599690371586 * (m1 - mh)
   = 27.002599690371586 * (1037.4423 - 1.00728)
   = 27986.439950142267 Da

Calculate measurement accuracy by comparing experimental vs. theoretical weight.

error = (m_measured - m_predicted) / (m_predicted)
      = (27986.439950142267 - 26886.32) / (26886.32)
      = 0.0409174610040447
      = 4.092%

Can you observe the charge state for the zoomed-in peak in the intact eGFP mass spectrum? If yes, what is it? If no, explain why.

Isotypes are species of atoms, characterised by the number of neutrons.
Mass of 1 neutron : m=1.003 Da
In an isotope spacing, adjacent peaks are separated by m/z = 1.003/z Da
We can identify a cluster; the spacings are - delta_m = 1474.1005 - 1474.0481 = 0.0524 Da apart
You can calculate the charge as z = 1.003 / delta_m.
Calculating the charge, we get z2 = 19
Apply z2 to compute the MW, and you find the molecular weights from the two calculations agree to within an accepetable tolerance (100 ppm)

# Method 1 — adjacent charge states (main spectrum)
z1 = (m2 - mh) / (m1 - m2)
   = (1000.4302 - 1.00728) / (1037.4423 - 1000.4302)
   = 999.42292 / 37.0121
   = 27.00

MW = z1 * (m1 - mh)
   = 27 * (1037.4423 - 1.00728)
   = 27 * 1036.43502
   = 27983.79 Da

# Method 2 — isotope spacing (inset)
z2 = 1.003 / delta_m
   = 1.003 / 0.0524
   = 19.14
   ≈ 19

# Cross-check — apply z2 to the inset peak
MW_check = z2 * (m_inset - mh)
         = 19 * (1473.74 - 1.00728)
         = 19 * 1472.73272
         = 27981.92 Da

# Agreement
error = |MW - MW_check|
      = |27983.79 - 27981.92|
      = 1.87 Da

ppm = error / MW * 1e6
    = 1.87 / 27983.79 * 1e6
    = 67 ppm  ✓

Working for the isotope spacing:

# Two adjacent isotope peaks: same protein, same charge state z,
# but peak 2 has one extra neutron (so +1.003 Da in true mass)

m1 = (MW         + z*mh) / z   # lighter isotope
m2 = (MW + 1.003 + z*mh) / z   # heavier isotope (+1 neutron)

# Subtract
delta_m = m2 - m1
        = ((MW + 1.003 + z*mh) - (MW + z*mh)) / z
        = 1.003 / z

# Rearrange
z = 1.003 / delta_m

Small note on units: m/z effectively is in Da — mass (Da) divided by charge (just an integer count) leaves you with Da on the x-axis; some textbooks call the unit “Thomson” (Th), but numerically it’s the same as Da.

Homework: Waters Part II — Secondary/Tertiary Structure

Explain the differences between native and denatured protein conformations. What happens when a protein unfolds? How is this determined with a mass spectrometer? What changes appear in the spectra between analyses?

A protein is one or more chains of amino acids. When DNA is read, it is assembled and stored in another form in mRNA, and later assembled by RNA into an amino acid chain.

These amino acids have inherent physical properties that results in various forces which twist the shape into a certain form. This process is referred to as protein folding and is a process of energy minimisation. When the folding reaches an equilibrium the protein is referred to as folded into its native form.

The 3D form of a protein is what confers its functionality. Proteins are in a sense 3D machines composed of chemical elements. Some proteins are merely static shapes, such as signalling molecules. Whereas others have a dynamic mechanical function, such as the ATP synthase.

How is this determined with a mass spectrometer? What changes appear in the spectra between analyses?

A mass spectrometer is a device which measures the mass-to-charge ratio of gaseus ions, which can be used to identify chemical substances.

Molecules are chemical compounds of atoms (ie. CO2 - one carbon, two oxygen atoms). An atomic element is composed of protons and neutrons in its nucleus, and electrons in its orbit. Ionisation is unpairing an electron from an atom.

A mass spectrometer ionises a substance, producing a charge (a free electron). For most small molecules and atoms (like Carbon-12), ionization predictably produces a single, stable charge state (typically +1).

A mass spectrometer outputs a plot of intensity (y) and $m/z$ mass-charge quotient (x). Mass can be used to characterise a specific atom, and charge is emitted when the substance is present and ionised (thus emitting an electron).

Thus a mass spectrometer can be used to map the presence of atoms and larger structures (molecules and proteins).

From what we know:

a denatured protein is highly charged, where each basic residue (amino acid) tends to become protonated.
a native protein has few charges, evenly spread across the surface.

The differences between the two runs:

Native — envelope at high m/z (~3000–4000), narrow (4–5 peaks), few charges. Compact part, few surface sites.
Denatured — envelope at low m/z (~1000–2000), broad (15–25 peaks), many charges. Sprawl, sites everywhere.
Mass itself — native shows the assembled mass: subunits stuck together, plus any bound ligand or metal. Denatured shows only individual subunit masses; the assembly has come apart and the cargo has fallen off.
Peak sharpness — denatured peaks are crisp; native peaks are fuzzier because the folded part drags along bound water, salt adducts, and some conformational wobble.

To detect proteins, typically a divide-and-conquer strategy is used. Proteins are cut using an enzyme (a protease) into smaller groups of residues (peptides), and then the spectral measurement of the mass-charge profiles of all the individual peptides is used to match against a signature of existing measured peptides.

Digestion: divide-and-conquer using enzyme.
Separation: separate using liquid chromatography (LC), peptides pass through specrometer gradually.
Ionisation: peptides are converted into charged, gas-phase ions.

This relies on a database of peptide fragments, whose usage is detailed below:

Peptide identification problem: given an unknown peptide — a chain of amino acids — identify which amino acids it contains and in what order, using only mass-charge measurements. One measurement of the intact peptide gives total mass but not order; many different orderings yield the same mass. Workaround: break many copies of the peptide at random points along the chain, producing sub-chains of every possible length, then measure the mass-charge of every sub-chain. The resulting set of mass-charges encodes the sequence, but noisily and incompletely — decoding the amino-acid order directly from that pattern is ill-posed because many candidate sequences fit any partial set of sub-chain masses. Reformulate as lookup against the known list of proteins in the organism, which fixes every peptide that could possibly exist (~10⁶ candidates). For each candidate, predict what its sub-chain mass-charges should be; filter the candidate set down to ~10¹ by demanding the intact-peptide mass match; score the survivors by how well their predicted sub-chain masses overlap with the observation; take the best. Bound the error rate by running the same pipeline against a catalog of fake (reversed) sequences and tuning the score threshold so fake matches stay below 1% of accepted matches — a calibrated null check standing in for ground truth.

In cases where the peptide sequence is unknown, de novo sequencing is performed.

From the native eGFP mass spectrum, discern the charge state of the peak at ~2800 m/z. What is the charge state? How can you tell?

Homework: Waters Part III — Peptide Mapping (Primary Structure)

Count the Lysines (K) and Arginines (R) in the eGFP sequence; circle or highlight them.

MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT TGKLPVPWPT LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF FKDDGNYKTR AEVKFEGDTL VNRIELKGID FKEDGNILGH KLEYNYNSHN VYIMADKQKN GIKVNFKIRH NIEDGSVQLA DHYQQNTPIG DGPVLLPDNH YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYK

How many peptides will be generated from tryptic digestion?

Navigate to the ExPASy PeptideMass tool.
Copy/paste the eGFP sequence.
Replicate the parameters shown in Figure 4.
Report the number of peptides generated.

Based on the LC-MS peptide map data, count the chromatographic peaks between 0.5–6 minutes (>10% relative abundance).

23-25

Does the peak count match the predicted peptide number? Are there more or fewer peaks?

There are more peaks!

Identify the m/z of the peptide in Figure 5b. Determine the charge (z) of the most abundant charge state using isotope separation. Calculate the singly charged peptide mass [M+H]⁺.

z = (m2 - mH) / (m1 - m2) 
m/z = (MW + z * mh) / z
MW = z * ((m/z) - mh)
mh = 1.00728
m = MW + z * mh

Most abundant/intense peak: 525.76712

# spacing
delta_b = 526.25918 - 525.76712 = 0.49206 Da 

# charge
z = 1.003 / 0.49206 = 2.03836930456
  = 2

# molecular weight
MW = z * (m/z - mh)
   = 2 * (525.76712 - 1.00728)
   = 1,049.51968 Da

M+H = m + mh
M+H = 1,049.51968 + 1.00728
    = 1,050.52696

Note: MW is the neutral protein’s mass (nothing added); m is the mass of the ion — i.e., the protein plus the z protons it picked up to get charged.

Identify the peptide by comparing to expected masses from the PeptideMass tool. Calculate the mass accuracy in ppm.

HAHA NICE!

1050.5214 	115-123 	0 	FEGDTLVNR

Peptide is the sequence FEGDTLVNR

What percentage of the sequence is confirmed by peptide mapping (see Figure 6)?

88%?

Homework: Waters Part IV — Oligomers

Using charge detection mass spectrometry data and known subunit masses (Table 1), identify the locations of:

“FU” = functional unit (each KLH subunit is built from ~7–8 globular FU domains of ~50 kDa each); “decamer” = 10 subunits assembled into a ring. So:

7FU Decamer = 10 KLH2 subunits (each with 7 FUs) → ~3.5 MDa
8FU Didecamer = 20 KLH1 subunits (8 FUs each) in 2 stacked rings → ~8 MDa
8FU 3-Decamer = 30 KLH1 subunits in 3 stacked rings → ~12 MDa
8FU 4-Decamer = 40 KLH1 subunits in 4 stacked rings → ~16 MDa
7FU Decamer - 3.4
8FU Didecamer - 8.33
8FU 3-Decamer - 12.67
8FU 4-Decamer - predicted 16 MDa, not labelled explicitly on chart

Homework: Waters Part V — Did I Make GFP?

Fill in the table with lab-acquired data from the Waters Immerse Lab showing theoretical vs. observed molecular weight with PPM mass error.

N/A - no not have access to node / lab.

Week 11 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-11/index.html

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Contribute at least one pixel to the global artwork experiment before editing ends. Discuss on the Discourse forum.

I missed the deadline for this, sorry!

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Provide 1–2 sentence descriptions of each component’s role in the cell-free reaction:

E. coli Lysate: BL21 (DE3) Star Lysate (includes T7 RNA Polymerase)

Provides the core machinery for translation and transcription of proteins.

Salts/Buffer: Potassium Glutamate, HEPES-KOH pH 7.5, Magnesium Glutamate, Potassium phosphate monobasic, Potassium phosphate dibasic

Potassium Glutamate — Provides K⁺ at high concentration (~100–200 mM) to support ribosome function and translation fidelity
HEPES-KOH pH 7.5 — Zwitterionic Good’s buffer that holds pH near the optimum for transcription/translation enzymes despite acid production
Magnesium Glutamate — Supplies Mg²⁺, the critical divalent cofactor for ribosome assembly, tRNA structure, aminoacyl-tRNA binding, peptidyl transferase, and RNA polymerase activity
Potassium phosphate monobasic (KH₂PO₄) / dibasic (K₂HPO₄) — Together act as a secondary phosphate buffer near pH 7 and supply inorganic phosphate that feeds the energy regeneration system

Energy/Nucleotide System: Ribose, Glucose, AMP, CMP, GMP, UMP, Guanine

Ribose — Sugar substrate that gets phosphorylated to ribose-5-phosphate and then to PRPP, supplying the sugar-phosphate backbone for nucleotide salvage
Glucose — Primary carbon/energy fuel feeding glycolysis to regenerate ATP via substrate-level phosphorylation
AMP, CMP, GMP, UMP — Cheap nucleoside monophosphate inputs that kinases phosphorylate up to NTPs for transcription and translation
Guanine — Free base salvaged with PRPP into GMP, cheaply replenishing the heavily-consumed GTP pool used in translation

Translation Mix (Amino Acids): 17 Amino Acid Mix, Tyrosine, Cysteine

17 Amino Acid Mix — Provides 17 of the 20 proteinogenic amino acids as monomers that aminoacyl-tRNA synthetases load onto tRNAs for ribosomal polymerization

Additives: Nicotinamide

Inhibits NAD+ consuming enzymes. NAD⁺ is required at the GAPDH step of glycolysis, so once it’s gone, ATP regeneration from glucose stalls and translation dies.

Backfill: Nuclease Free Water

Nuclease-Free Water — Brings the reaction to final volume while avoiding contaminating RNases/DNases that would degrade the mRNA and DNA template.
- water that’s been DEPC-treated and/or filtered, packaged sterile, and certified by the manufacturer to have no detectable RNase activity

Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix (2–3 sentences).

(This is my best effort answer using Claude’s LLM!)

The whole design question of any CFPS recipe is: what’s your refueling strategy.

PEP-NTP — bring premium fuel and a pressurized recharge cartridge. PEP-NTP mix supplies pre-made high-energy substrates directly.

Glucose-NMP — bring crude oil and an onboard refinery. NMP-Ribose-Glucose mix uses glucose-fed glycolysis as a slower but sustained ATP regenerator.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

For each of the 6 fluorescent proteins used for collaborative painting, identify and explain at least one biophysical or functional property affecting cell-free expression or readout (1–2 sentences each):

sfGFP

Engineered “superfolder” variant with rapid, chaperone-independent folding and fast maturation (~14 min), which is why it’s the default CFPS reporter

mRFP1

Slow chromophore maturation (hours) plus low brightness means much of the protein made during a short CFPS reaction never becomes fluorescent inside the readout window

mKO2

Combines fast maturation with comparatively low pKa for fluorescence (~5.5), making readout robust to the pH drift that occurs in glycolysis-fueled cell-free reactions as organic acids (lactate, pyruvate) accumulate

mTurquoise2

Highest quantum yield of any cyan FP (~0.93) and an unusually long fluorescence lifetime, so it gives a strong signal at low CFPS expression levels

mScarlet_I

The “I” (improved-maturation) variant trades a modest QY hit for a maturation time of ~36 min versus multi-hour for parent mScarlet,

Electra2

Blue FP with excitation ~403 nm and emission ~456 nm, which collides directly with NAD(P)H autofluorescence

Create a hypothesis for adjusting one or more reagents in the cell-free mastermix to improve a specific biophysical or functional property and maximize fluorescence over 36-hour incubation. Clearly state the protein, reagents, and expected effect.

Hypothesis: For expression of eGFP in BL21(DE3) Star E. coli lysate, increasing the ATP-regeneration capacity and stabilizing redox balance will extend productive translation time, improve chromophore maturation, and increase total fluorescence over a 36-hour incubation.

Protein: enhanced green fluorescent protein, eGFP.

Reagents to adjust: increase glucose or replace part of the glucose with maltodextrin as a slower-release carbon source

Expected effect: Maltodextrin should feed glycolysis more gradually than free glucose, reducing early fuel depletion and pH stress while sustaining ATP regeneration for longer

Begin composing master mix compositions here once assigned artwork wells are received.

Final phase: analyze fluorescence data to determine favorable reagent compositions (due one week after data return). Reaction composition per well: 6 μL Lysate, 10 μL 2X Optimized Master Mix, 2 μL assigned fluorescent protein DNA template, 2 μL custom reagent supplements (20 μL total).

Not sure where data was, sorry doing this HW late!!

Week 12 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-12/index.html

No HW this week.

Week 13 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-13/index.html

Work on final project

Week 14 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-14/index.html

Finish and present final project