1 Python Script for Opentrons Artwork I played around with the Colab notebook and adjusted some parameters in the examples. Here is an interesting one I liked:
For my individual assignment, I decided to create a pattern of a lychee, a fruit typical of my hometown. I chose the sketch below as a base, and then generated the coordinates with the help of the automation art interface.
After running the code in the colab notebook, I got the following image:
1 Conceptual Questions 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
If we assume the 500 g of meat is entirely protein, and take the average mass of one amino acid to be about 100 Da ≈ 100 g/mol, then:
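The arithmetic implied by these assumptions can be sketched in a few lines. This is only a rough estimate under the stated simplifications (all 500 g is protein, 100 g/mol per amino acid); the function name is my own.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def amino_acid_molecules(mass_g, avg_mass_g_per_mol=100.0):
    """Estimate the number of amino acid molecules in a given mass of protein."""
    moles = mass_g / avg_mass_g_per_mol  # 500 g / 100 g/mol = 5 mol
    return moles * AVOGADRO

print(f"{amino_acid_molecules(500):.2e}")  # 3.01e+24
```

So 500 g corresponds to about 5 mol, or roughly 3 × 10²⁴ amino acid molecules.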
Describe a biological engineering application or tool you want to develop and why.
An idea for your HTGAA final project / Your current research / A topic you are curious about
I would like to develop a biological engineering solution that enables fast-food and beverage chains to produce biodegradable packaging made from their own food waste. The idea is to create a closed-loop system in which organic waste generated by restaurants—such as coffee grounds, food scraps, or plant-based residues—can be processed through bioengineering techniques (such as microbial fermentation, mycelium growth, or biopolymer extraction) and transformed into functional, compostable packaging materials.
My motivation for developing this solution comes from two main concerns. First, many fast-food chains such as McDonald’s, Luckin Coffee, and other takeaway brands still rely heavily on conventional plastic packaging. While plastic is durable, its longevity far exceeds what is actually needed for single-use food packaging, which typically serves its purpose for only minutes or hours. As a result, these materials accumulate in landfills and natural environments, causing long-term pollution. I believe that packaging designed to biodegrade naturally after use would be a more environmentally responsible alternative.
Second, while some existing eco-friendly solutions—such as paper straws or certain plant-based plastics—are already on the market, they often provide a poor user experience. For example, paper straws tend to absorb liquid quickly, become soft, and lose structural integrity, which makes them inconvenient for consumers. Therefore, I aim to develop a hybrid biodegradable material that balances environmental sustainability with practical usability, ensuring that the packaging is both pleasant to use and significantly less harmful to the planet.
Overall, this project would not only reduce plastic waste but also encourage circular use of resources within the food industry, turning waste into valuable materials rather than pollution.
2
Describe one or more governance policy goals related to ensuring this application contributes to an ethical future & prevents harm.
In developing a biodegradable food packaging solution made from restaurant food waste, I believe that strong and thoughtful governance is essential to ensure that this innovation contributes to an ethical future while minimizing potential harm. In particular, I would align my work with three key policy goals: ensuring safety and security, promoting constructive use, and protecting equity and consumer autonomy.
First, with regard to safety and security, my application must align with and operate within existing regulatory frameworks that govern plastic use and food contact materials in China. China has already introduced important policies to limit plastic pollution, such as the “Opinions on Further Strengthening Plastic Pollution Control” issued by the National Development and Reform Commission and the Ministry of Ecology and Environment, as well as the so-called “Plastic Ban Order” (限塑令), which restricts or phases out certain single-use plastic products in the catering and retail sectors. These policies signal a clear national commitment to reducing environmental harm from plastics. My proposed biodegradable packaging should therefore not only comply with these regulations but actively support their goals by offering a viable, lower-impact alternative to conventional plastics.
At the same time, because my solution involves transforming food waste into materials that may directly contact food, a critical governance priority is to ensure strict compliance with food safety standards. In China, food contact materials are regulated under standards such as GB 4806.1 (General Safety Requirements for Food Contact Materials and Articles) and GB 4806.7 (Food Contact Plastics), which define permissible substances, hygiene requirements, and safety testing protocols. A key policy goal for my application would be to establish clear certification pathways, standardized testing procedures, and transparent quality benchmarks so that producers can reliably demonstrate that their materials are safe, non-toxic, and free from harmful contamination. This reduces health risks for consumers and increases trust in innovative, sustainable materials.
Second, in terms of promoting constructive use, governance should encourage the responsible adoption of circular, waste-to-material systems within the food industry. Policies could support pilot programs, technical guidelines, and best-practice frameworks that help restaurants and manufacturers safely process food waste into packaging materials. By making production processes as simple, standardized, and reproducible as possible, governance can lower barriers to adoption while ensuring consistent quality and safety across different producers.
Third, regarding equity and autonomy, it is crucial that consumers have sufficient information to make informed choices. Because packaging derived from food waste may be unfamiliar or even concerning to some people, I believe there should be clear labeling requirements that explain what the material is made of, how it is processed, and how it meets food safety standards. Transparency and traceability empower consumers rather than exposing them to hidden risks. Additionally, governance should consider equity across different businesses: small restaurants and local vendors should receive support—such as subsidies, technical training, or shared processing facilities—so that sustainable packaging is not only accessible to large corporations but can be adopted more broadly across society.
In summary, my application would be guided by governance policies that:
align with China’s plastic reduction policies and environmental objectives;
prioritize rigorous food safety standards for biodegradable, food-waste-derived materials;
promote transparent labeling and consumer knowledge; and
ensure fair and practical access to sustainable packaging solutions across different scales of business.
Together, these policy goals would help ensure that my proposed biological engineering application contributes to a safer, more sustainable, and more ethically responsible future.
3
Describe at least three different potential governance actions by considering the purpose, design, assumptions, and risks of failures & “success”
Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains such as 3D printing, drones, financial system, etc.
Purpose: What is done now and what changes are you proposing?
Design: What is needed to make it “work”? Consider the actor(s) involved - who must opt in, fund, approve, or implement?
Assumptions: What could you have wrong? Incorrect assumptions? Uncertainties?
Risks of Failures & Success: How might this fail, including any unintended consequences of the “success” of your proposed actions?
In developing a biodegradable food-waste-derived packaging system for fast-food and beverage businesses, I propose at least three complementary governance actions. Each action addresses different points in the system: production, certification, and consumer engagement. Below, I describe their purpose, design, underlying assumptions, and potential risks of failure or unintended “success.”
Governance Action 1 — Mandatory Food-Waste Collection & Standardized Upcycling Protocols for Large Chains
Purpose — What is done now and what I propose
Currently, most restaurant food waste is either discarded as general trash or handled through municipal waste systems, with little incentive for businesses to sort, store, or repurpose it. Even though some chains have adopted paper-based alternatives (e.g., pulp cup holders, paper straws, paper bags), many still rely heavily on plastic packaging, as I observed in my community research with students.
I propose a governance policy requiring large fast-food and coffee chains (e.g., national or regional franchises) to separate and store suitable organic waste streams (such as coffee grounds, plant residues, or food scraps) and follow standardized protocols for safe collection and transfer to approved bio-processing facilities that convert waste into packaging materials.
Design — What is needed to make it work
Actors who must opt in / implement: restaurant chains, third-party waste processors, and certified bio-manufacturers.
Government role: define which waste streams are acceptable, set storage standards (temperature, contamination control), and license approved upcycling facilities.
Funding: could involve a mix of corporate investment and government incentives (e.g., tax reductions or subsidies for participating businesses).
Infrastructure: shared regional processing hubs to reduce the burden on individual restaurants.
Assumptions — What could be wrong
I may be overestimating restaurants’ willingness or capacity to participate; some may see sorting and storing waste as operationally burdensome.
I assume that enough usable food waste can be collected consistently, which may not be true in smaller or less standardized outlets.
I also assume that centralized processing facilities would be economically viable at scale.
Risks of Failure & “Success”
Risk of failure: Restaurants might comply only superficially (minimal sorting, poor quality separation), leading to contaminated inputs that undermine material safety.
Unintended risk of “success”: If demand for “packaging-grade” food waste grows too quickly, businesses might overproduce or prioritize waste generation rather than reduction—similar to how some recycling systems have historically created perverse incentives.
Analogy: This is similar to e-waste or battery recycling mandates, where success depends not just on collection rules but on real downstream processing capacity and quality control.
Governance Action 2 — A Dedicated Food-Contact Certification System for Bio-Derived Packaging
Purpose — What is done now and what I propose
Existing food-contact regulations are primarily designed for conventional plastics or paper products. However, materials derived from food waste or biofilms (such as the SCOBY-based biofilm my students and I experimented with) fall into a grey area. In our project, although SCOBY films were highly eco-friendly, users worried about hygiene because of their uneven texture and slight acidity.
I propose a new, dedicated certification category for bio-derived and upcycled food packaging, integrated with existing standards (such as GB 4806.1 and GB 4806.7) but tailored to materials made from biological processes.
Design — What is needed to make it work
Actors involved: government regulators, independent testing laboratories, material scientists, and manufacturers.
Key elements:
Clear definitions of acceptable biological inputs.
Standardized testing for microbial safety, chemical leaching, durability, and user safety.
A visible labeling system (e.g., a “Certified Bio-Upcycled Food Contact Material” mark).
Approval process: materials must pass both lab tests and real-world pilot trials before commercial use.
Assumptions — What could be wrong
I assume that scientific testing can fully capture real-world risks; however, novel materials may behave unpredictably in different temperatures, humidity levels, or food types.
I also assume that certification costs will not be prohibitively expensive for smaller innovators, which may not be realistic without subsidies.
Risks of Failure & “Success”
Risk of failure: If certification is too strict or slow, innovation could be stifled, discouraging companies from developing sustainable alternatives.
Risk of “success”: If certification becomes a marketing tool rather than a meaningful safety benchmark, companies might “greenwash” their products while still producing suboptimal materials—similar to some issues seen with vague “biodegradable” labels in plastics.
Analogy: This is comparable to drone certification or medical device approval—necessary for safety, but potentially slowing innovation if poorly designed.
Governance Action 3 — Consumer Transparency, Labeling, and Public Engagement Program
Purpose — What is done now and what I propose
Currently, most consumers have little understanding of what their takeaway packaging is made of, where it comes from, or how it decomposes. Based on my student project interviews, users often reject alternatives like paper straws because of poor experience, and they were even more hesitant about SCOBY-based materials due to perceived hygiene concerns.
I propose a governance action that combines mandatory transparent labeling with public education campaigns about circular materials and responsible consumption.
Design — What is needed to make it work
Actors: government agencies, restaurants, packaging manufacturers, schools, and NGOs.
Key components:
Clear on-package labels explaining:
source of the material (e.g., “made from upcycled food waste”),
biodegradability conditions,
and food-safety certification status.
Public campaigns (e.g., in schools, cafés, and social media) explaining why these materials matter and how to dispose of them properly.
Pilot programs in selected cities to test consumer acceptance before national rollout.
Assumptions — What could be wrong
I assume that better information will lead to more responsible consumer behavior; however, price and convenience may still dominate decisions.
I may also underestimate how skeptical some consumers will remain, even with clear labeling.
Risks of Failure & “Success”
Risk of failure: If labels are too technical or confusing, consumers may ignore them entirely.
Risk of “success”: If awareness rises but affordable alternatives remain limited, consumers could feel frustrated or morally burdened without real choices—similar to debates around ethical fashion or carbon labels in food.
Analogy: This resembles nutrition labeling or carbon labeling in food systems—powerful for awareness, but insufficient without systemic change.
Overall Reflection: What Success Would Mean (and Why It Matters)
If these governance actions work together, I believe the “success” would be more than just reduced plastic waste—it would cultivate a culture where citizens see consumption as a shared responsibility rather than a disposable act. Restaurants would become active participants in circular systems, and consumers would be more mindful of what they use and discard.
At the same time, I recognize that the biggest risks lie in business resistance to additional costs and labor, and potential health or hygiene concerns if bio-derived materials are poorly regulated. Strong, thoughtful governance is therefore essential to balance environmental ambition, practical usability, and public safety.
4
Score each of your governance actions against your rubric of policy goals.
You can score from 1-3, 1 as best, or n/a
Does the option:
Option 1
Option 2
Option 3
Enhance Biosecurity
• By preventing incidents
1
1
• By helping respond
1
1
Foster Lab Safety
• By preventing incidents
1
1
• By helping respond
1
Protect the environment
• By preventing incidents
1
• By helping respond
1
Other considerations
• Minimizing costs and burdens to stakeholders
1
1
• Feasibility?
1
1
• Not impede research
• Promote constructive applications
1
1
1
5
Based on scores, describe which governance option or combination of options, you would prioritize, and why.
Outline any trade-offs you considered as well as assumptions and uncertainties.
Think about your audience - very local (MIT, Cambridge Mayoral Office), to national (President or Head of a Federal Agency), to international (United Nations Office of the Secretary-General)
Based on my scoring and risk assessment, I would prioritize a combined approach in this order:
(1) Food-contact safety certification (Option 2) — first priority.
Safety is the prerequisite for adoption. In my student project, even highly eco-friendly materials (e.g., SCOBY-based biofilm) triggered hygiene concerns due to uneven texture and slight acidity, which reduced user trust. A dedicated certification pathway (aligned with standards such as GB 4806.1/GB 4806.7 but tailored to bio-derived materials) would reduce health risks and give restaurants and consumers confidence.
Trade-off: certification can slow innovation and raise compliance costs.
Uncertainty: current tests may not fully capture long-term or real-world variability across foods, temperatures, and storage conditions.
(2) Standardized food-waste collection and upcycling protocols for large chains (Option 1) — second priority.
Even with safe materials, the system cannot scale without reliable, uncontaminated inputs. However, I see a major risk that restaurants will resist the added labor and logistics of sorting and storing food waste, so incentives and shared processing infrastructure may be necessary.
Trade-off: increased operational burden and cost for businesses.
Uncertainty: whether consistent, high-quality waste streams can be collected in everyday operations.
(3) Consumer transparency and labeling (Option 3) — third priority.
Labeling and public engagement are essential for autonomy and trust, but they are most effective once safe, scalable products exist. Implemented too early, awareness without affordable options could create frustration rather than adoption.
Trade-off: early transparency may increase skepticism; late transparency may slow trust-building.
Uncertainty: how much consumer behavior changes when price and convenience remain dominant.
Overall, I prioritize safety → supply scalability → consumer trust, because failures in safety or supply would undermine the entire system, while consumer education works best when credible alternatives are already available.
Homework Questions from Professor Jacobson
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?
The natural DNA replication machinery (DNA polymerases with proofreading) has an extremely low error rate — approximately 1 mistake per 10⁶ to 10⁸ base pairs copied due to both base selection and 3’→5’ exonuclease proofreading.
The human genome is about 3.2 billion base pairs long (≈3.2 × 10⁹ bp).
At the less accurate end of that range (~1 error per 10⁶ bases), copying the whole genome once would introduce roughly 3,200 errors; at ~1 per 10⁸ it would still introduce about 32. Biology closes this gap with several layers of error correction: proofreading by the polymerase during synthesis, followed by post-replication mismatch repair and other DNA repair pathways that catch and fix mismatches before they become permanent mutations.
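As a quick check of these numbers, here is a minimal sketch that multiplies a per-base error rate by the genome length. The genome size and the two error rates are the ones quoted above; the function name is just for illustration.

```python
GENOME_BP = 3.2e9  # approximate human genome length in base pairs

def expected_errors(error_rate, genome_bp=GENOME_BP):
    """Expected number of errors when copying the genome once."""
    return error_rate * genome_bp

for rate in (1e-6, 1e-8):
    print(f"error rate {rate:.0e} per bp -> ~{expected_errors(rate):.0f} errors per copy")
```

This treats every base as having the same independent chance of error, which is a simplification.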
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
A DNA “codon” is 3 nucleotides long, and there are 4 possible bases (A, T, C, G), giving 4³ = 64 possible codons.
Of these:
61 code for amino acids,
3 are stop signals that terminate translation.
Because there are 20 amino acids, many amino acids are encoded by multiple codons.
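So in theory, the number of DNA sequences that encode the same protein is the product of the number of synonymous codons at every position. With about three codons per amino acid on average, even a protein of a few hundred residues has an astronomically large number of possible encodings (on the order of 3^N for N residues).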
Even though multiple codons can code for the same amino acid:
Organisms exhibit codon usage bias — some codons are used much more frequently than others, and rare codons can slow or disrupt efficient translation.
Different codons can affect mRNA structure and stability, which impacts protein expression.
Some synonymous changes can still affect how fast or accurately the ribosome reads the mRNA.
Premature stop codons and frameshifts are not synonymous changes at all: they truncate or completely change the protein.
So while the genetic code is redundant, not all theoretically possible codings are biologically equivalent or work well to produce functional proteins.
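To make the counting concrete, here is a small sketch that multiplies the number of synonymous codons for each residue of a peptide. It uses the standard genetic code; the example peptide is hypothetical, and the function names are my own.

```python
from collections import Counter
from math import log10

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of codons per amino acid, e.g. 6 for L, 1 for M and W.
DEGENERACY = dict(Counter(CODON_TABLE.values()))

def count_encodings(protein):
    """Exact number of DNA sequences that encode the given protein."""
    total = 1
    for aa in protein:
        total *= DEGENERACY[aa]
    return total

def log10_encodings(protein):
    """Same count on a log10 scale, convenient for long proteins."""
    return sum(log10(DEGENERACY[aa]) for aa in protein)

print(count_encodings("MKLV"))  # 1 * 2 * 6 * 4 = 48
```

Even this four-residue peptide has 48 encodings, so the count for a full-length protein grows far beyond anything that could be tested exhaustively.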
Homework Questions from Dr. LeProust
What’s the most commonly used method for oligo synthesis currently?
The most commonly used method is phosphoramidite solid-phase synthesis.
Why is it difficult to make oligos longer than 200nt via direct synthesis?
Direct chemical synthesis has stepwise inefficiency:
Each addition cycle isn’t 100% perfect.
Even small inefficiencies compound exponentially over many steps.
If each coupling step is ~99% efficient, then by 200 bases the chance that every step succeeded is only about 0.99^200 ≈ 13%.
This leads to a mixture of truncated sequences and errors dominating the product.
Hence, oligos longer than ~200 nt become low yield, low purity, and difficult to purify.
Why can’t you make a 2000bp gene via direct oligo synthesis?
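Errors and truncations accumulate over too many coupling steps: at ~99% efficiency per step, essentially no full-length 2000-nt product would remain (0.99^2000 ≈ 2 × 10⁻⁹). Genes of this size must instead be assembled from shorter oligos.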
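The compounding effect can be sketched with a one-line model. I assume one coupling step per nucleotide and the same efficiency at every step, which is a simplification; the 99% figure is the illustrative value used above.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of strands in which every coupling step succeeded."""
    return coupling_efficiency ** length_nt

for n in (20, 200, 2000):
    print(f"{n:>5} nt: {full_length_fraction(n):.2e} full-length")
```

Under this model the full-length fraction drops exponentially with length, which is why long constructs are assembled from short oligos rather than synthesized in one run.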
Homework Question from George Church
What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency"?
For humans and most animals the amino acids that must be obtained from the diet (because they cannot be synthesized fast enough internally) are:
Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and (for some definitions) Arginine (especially in young animals).
Because lysine is already one of the essential amino acids that animals cannot synthesize on their own, the “Lysine Contingency” describes a dependency that every animal has by default rather than a special engineered safeguard. Animals routinely meet this need through their environment (diet or microbes), so lysine dependence alone would be a weak containment strategy. It still shows that organisms are fundamentally constrained by their metabolic capabilities, which makes lysine availability a potential evolutionary and ecological vulnerability.
Week 2 HW: DNA Read, Write, and Edit
Part 1: Benchling & In-silico Gel Art
See this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis” for details. Overview:
Make a free account at benchling.com
Import the Lambda DNA.
Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI.
Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
You might find Ronan’s website a helpful tool for quickly iterating on designs!
Below are the steps I took to finish this assignment.
Account at benchling.com √
Have the .gb file of Lambda_NEB saved.
Upload the .gb file to the platform.
Visual display of the structure of the DNA sequence. Very interesting!
Start to add a new list of restriction enzymes.
Adding them one by one.
Using the analyze tool to the right to see the result(s).
Patterns I got from each enzyme and all enzymes together.
I tried to get a Lychee out of them. Can you see it?
Part 3: DNA Design Challenge
3.1. Choose your protein.
For this assignment, I chose UDP-glucose pyrophosphorylase (UGPase) from Komagataeibacter xylinus.
I selected this protein because it plays a central role in bacterial cellulose biosynthesis.
UDP-glucose pyrophosphorylase catalyzes the conversion of glucose-1-phosphate and UTP into UDP-glucose, which is the immediate activated precursor used by cellulose synthase to polymerize β-1,4-glucan chains.
Since my final project focuses on improving bacterial cellulose production for biofilm-based food packaging materials, understanding and potentially enhancing the precursor supply through UGPase is directly relevant to my research goals.
To obtain the protein sequence, I used the UniProt database. I searched for “UDP-glucose pyrophosphorylase Komagataeibacter xylinus” and identified the corresponding UniProt entry (Accession: P27897). The database provides the complete annotated protein sequence in FASTA format.
Protein Information
Protein name: UDP-glucose pyrophosphorylase
Organism: Komagataeibacter xylinus
UniProt accession: P27897
Function: Catalyzes the formation of UDP-glucose, the direct precursor for cellulose biosynthesis.
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
The Central Dogma explains that DNA is transcribed into RNA, and RNA is translated into protein. Because the genetic code is degenerate (most amino acids are encoded by multiple codons), a given protein sequence can correspond to many possible DNA sequences. Reverse translation tools therefore generate one plausible coding DNA sequence based on a selected codon usage table.
To infer the nucleotide sequence corresponding to my chosen protein (UDP-glucose pyrophosphorylase, UGPase), I used an online reverse translation tool. I kept the default codon usage table, which the tool states was generated from all E. coli coding sequences in GenBank (obtained from the Codon Usage Database). Since E. coli is a representative bacterial organism and my protein is from a bacterium, this provides a reasonable codon-usage reference for generating a plausible coding sequence.
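The idea behind "most likely codon" reverse translation can be sketched as a simple lookup. The table below is my own reconstruction of one preferred E. coli codon per amino acid, chosen to match the codons that appear in the tool output; the entry for cysteine does not occur in that output and is an assumption. This is not the tool's actual implementation.

```python
# One preferred codon per amino acid (assumed E. coli "most likely" choices).
MOST_LIKELY_CODON = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}

def reverse_translate(protein):
    """Return one plausible coding sequence for a protein (one-letter codes)."""
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein.upper())

print(reverse_translate("MSTVKILANVAG"))
# atgagcaccgtgaaaattctggcgaacgtggcgggc
```

Picking the single most frequent codon everywhere is the simplest possible strategy; it ignores mRNA structure and repeated sequence, which is one reason a separate optimization step is useful.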
Reverse-translated coding DNA sequence (5’→3’):
reverse translation of Untitled to a 1008 base sequence of most likely codons.
atgagcaccgtgaaaattctggcgaacgtggcgggcgtgaaagcggatggcgtggtggtg
ccgaccggcgatctggcgaaagcgggctgggtgattgtgggcggcgatggcagcctgagc
gaaaccgtgcgcgtgggcaaactgctggaagaagcgcagctgcgcgcgagccgcgatccg
gcggaagtgattgtggcgctgaccccggaaggccatattctgggcgatgcgcagaccgtg
gtgattggcgcgggcggcaccggcaaaagcggctatgaaggcctggcgcgcattctgccg
gatgatagcctgagcgtgccgctgggcattaccgtggaaaaagcgcgcgatgcgtttctg
cgcaacccgattattgatgcgctgggcaaagtgatgggcaaagatgcggtgaacctggtg
gatcagggcgaactggtggatctggatgcgctgggcgtgagccgcgcggcgctgattgat
gcgggcggcggcacccgcggccataccctggcggtggcggcggcgggcgcgaacgcgcgc
ggcctggatcgcctgaaagcgggcgcggataaagcgaaactgggcggcgtggaaattctg
gataaaagcgtgggcgcggcgcatggcctgcaggcgctgcgcggcctgggcattgatagc
gatggcgcggcggtgattctgagccgcaaactgggcagctatgaaaaactgggcgcgggc
accattgtggcgccgctggcgctgctggcggaagcggtgggcgcgaaaggcatggtgtat
ggcgaagcgcgcctgattaccaacggcgaaggccagaccattgtggtggcgggcgcgggc
aacctggtgggcgcggataccattgtggtgaccgaaggctatgatcgcggcattctgagc
ggctatgaaggcgcgcatggcctgcgcgtgggcattgaaggcgtggtgcagccgattggc
gtgaacgcgggcgaagcgaccgatctgggcgtgctgggcgtggatctg
3.3. Codon optimization.
After determining the nucleotide sequence corresponding to my chosen protein (UDP-glucose pyrophosphorylase, UGPase), codon optimization was performed to improve potential expression efficiency.
Although multiple codons can encode the same amino acid, different organisms exhibit codon usage bias, meaning they preferentially use certain synonymous codons over others. If a gene contains codons that are rare in the target host, translation efficiency may decrease due to limited availability of corresponding tRNAs. This can lead to reduced protein expression or translational stalling. Therefore, codon optimization modifies the nucleotide sequence while preserving the amino acid sequence, replacing rare codons with host-preferred synonymous codons.
I used Benchling’s codon optimization tool to optimize the coding sequence. Since Komagataeibacter was not directly available in the organism list, I selected Escherichia coli as a representative bacterial host for codon optimization. E. coli is a well-characterized prokaryotic expression system and provides a reasonable approximation of bacterial codon usage bias.
The optimization process replaced multiple synonymous codons without altering the protein sequence. Notably, the GC content decreased from 66% to 58%, bringing the sequence closer to typical bacterial expression ranges. No additional RNA hairpin structures were introduced during optimization. These changes may improve translation efficiency and overall protein expression in the selected host.
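GC content like the values above can be checked with a short helper. This is a minimal sketch of the calculation, not Benchling's own code, and the example input is only the first 60 bases of my reverse-translated sequence.

```python
def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)

first_line = "atgagcaccgtgaaaattctggcgaacgtggcgggcgtgaaagcggatggcgtggtggtg"
print(f"GC content: {gc_content(first_line):.1f}%")
```

Running the same function on the sequence before and after optimization is an easy way to confirm the change reported by the tool.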
To produce the UDP-glucose pyrophosphorylase (UGPase) protein from the optimized DNA sequence, a cell-dependent expression system can be used.
In this approach, the optimized coding sequence is inserted into a plasmid vector. The plasmid contains regulatory elements necessary for gene expression, such as a promoter to initiate transcription, a ribosome binding site (RBS) to initiate translation, and a transcription terminator. The plasmid also includes an origin of replication and a selectable marker to maintain the plasmid within the host cells.
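The recombinant plasmid is then introduced into a bacterial host. Because the sequence above was optimized for E. coli, that is the most direct host; for Komagataeibacter (the intended production organism), the codons would ideally be re-optimized with that organism's own usage table. Once inside the cell, the DNA sequence follows the central dogma of molecular biology: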
DNA → RNA → Protein
First, the host cell’s RNA polymerase binds to the promoter and transcribes the DNA sequence into messenger RNA (mRNA). The mRNA is then recognized by ribosomes, which translate the nucleotide sequence into the corresponding amino acid sequence according to the genetic code. Transfer RNAs (tRNAs) deliver the appropriate amino acids, and the ribosome assembles them into the UGPase protein.
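The translation step can be mimicked in silico to confirm that a coding sequence still gives the intended protein. This sketch uses the standard genetic code and reads the DNA coding strand directly (T in place of U), skipping the separate mRNA step; the input is the first 36 bases of my reverse-translated sequence.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)

print(translate("atgagcaccgtgaaaattctggcgaacgtggcgggc"))  # MSTVKILANVAG
```

Translating the sequence before and after codon optimization and comparing the two proteins is a simple check that only synonymous changes were made.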
Through this process, the host cell uses its natural transcription and translation machinery to produce the target enzyme. In the context of this project, increased expression of UGPase may enhance the production of UDP-glucose, potentially contributing to improved cellulose biosynthesis.
Part 5: DNA Read/Write/Edit
5.1 DNA Read
(i) What DNA would you want to sequence (e.g., read) and why?
I am interested in DNA as a medium for long-term archival data storage. Specifically, I would want to sequence synthetic DNA molecules that encode digital information converted into nucleotide sequences.
DNA has extremely high storage density and long-term stability compared to traditional silicon-based data storage systems. Many forms of data—such as historical records, scientific archives, or cultural information—do not require frequent editing or real-time access. These “cold data” could potentially be stored in DNA to significantly reduce energy consumption, freeing silicon-based servers for high-demand or “hot” data.
Conceptually, I imagine a future system where biological organisms, such as trees, could serve as long-term data carriers. Through photosynthesis and natural growth, such systems could theoretically maintain archived data with minimal external energy input. While this remains speculative, it reflects the idea of integrating biological processes into sustainable information storage infrastructures.
(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
To retrieve information encoded in DNA, sequencing technologies would be required to accurately read the nucleotide sequences.
For DNA-based data storage, high-accuracy sequencing platforms such as Illumina sequencing would be suitable due to their low error rates and scalability. Illumina systems are widely used for short-read, high-fidelity sequencing, which aligns well with synthetic DNA data storage formats.
Alternatively, long-read sequencing technologies such as Oxford Nanopore could be considered, especially if larger DNA constructs are used. Nanopore sequencing offers portability and real-time readout, which may be advantageous for decentralized biological storage systems.
While genome editing technologies like CRISPR can insert DNA sequences into living organisms, they are not sequencing technologies. If biological hosts were used as storage media, genome sequencing would still rely on established sequencing platforms rather than gene-editing tools.
In the long term, bridging silicon-based storage systems and carbon-based biological storage systems may require new bioelectronic interfaces. Such interfaces could enable digital-to-biological encoding and biological-to-digital decoding, potentially integrating molecular biology with computational systems.
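To make the idea of digital-to-biological encoding and biological-to-digital decoding concrete, here is a minimal sketch that maps every two bits of data to one nucleotide. The mapping (00→A, 01→C, 10→G, 11→T) is an assumption chosen for illustration; practical DNA storage schemes add error correction and avoid long runs of the same base.

```python
# Hypothetical 2-bits-per-base mapping, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes into a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Convert a nucleotide string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "HTGAA".encode("utf-8")
strand = encode(message)
print(strand)                          # 20 bases for 5 bytes
print(decode(strand).decode("utf-8"))  # HTGAA
```

Sequencing the synthesized strand and passing the read to `decode` would recover the original bytes, provided the read is error-free.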
5.2 DNA Write
(i) What DNA would you want to synthesize (e.g., write) and why? For this project, I would like to synthesize DNA constructs aimed at enhancing the production of sustainable biomaterials that could potentially replace petroleum-based plastics.
Specifically, I am interested in engineering bacterial cellulose-producing systems, such as Komagataeibacter, to increase the yield and functional properties of cellulose-based biofilms. Bacterial cellulose is a promising biodegradable material with high mechanical strength, purity, and biocompatibility. It has potential applications in food packaging, medical materials, and sustainable textiles.
Conceptual DNA Constructs
Rather than synthesizing an entirely new genome, I would focus on designing modular genetic constructs that include:
Genes involved in UDP-glucose biosynthesis: Since UDP-glucose is the direct precursor for cellulose synthesis, enhancing enzymes in this pathway (e.g., UDP-glucose pyrophosphorylase, UGPase) may increase substrate availability and improve cellulose yield.
Regulatory elements for enhanced expression: These would include promoters and regulatory sequences that enable controlled, elevated expression of cellulose-related enzymes without imposing excessive metabolic burden.
Material-functionalization modules (optional future direction): Additional genetic modules could encode proteins that modify cellulose properties, for example:
Binding domains that improve mechanical strength
Enzymes that adjust porosity or hydration properties
Structural protein motifs that enable composite biomaterials
The overall goal would not be to fundamentally redesign the organism, but rather to create a tunable genetic system that enhances production of biodegradable cellulose-based biomaterials.
(ii) What technology or technologies would you use to perform this DNA synthesis and why?
To synthesize the DNA constructs, I would order them from a commercial DNA synthesis provider (for example, Twist Bioscience or IDT). These providers chemically synthesize short oligonucleotides from a digital design, typically by phosphoramidite chemistry, and assemble them into longer fragments. This allows precise writing of custom genetic sequences for biomaterial production.
Commercial synthesis is suitable because it is reliable, scalable, and widely used in synthetic biology research.
To confirm the DNA sequence, I would use a next-generation sequencing (NGS) technology.
In general, sequencing involves:
Preparing the DNA sample
Reading the nucleotide sequence using a sequencing machine
Converting signals into digital sequence data
Comparing the result to the designed sequence
This ensures that the synthesized DNA matches the intended design.
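The final comparison step can be sketched in Python as follows. This is a simplified illustration that assumes the sequencing read has already been aligned to the design and has the same length (no insertions or deletions); the function name and the example sequences are hypothetical.

```python
def find_mismatches(designed: str, observed: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, designed base, observed base) for each difference.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(designed) != len(observed):
        raise ValueError("sequences differ in length; align them first")
    return [
        (position, d, o)
        for position, (d, o) in enumerate(
            zip(designed.upper(), observed.upper()), start=1
        )
        if d != o
    ]


designed = "ATGGCTAAAGGT"  # hypothetical design
observed = "ATGGCTAGAGGT"  # hypothetical sequencing result
print(find_mismatches(designed, observed))  # [(8, 'A', 'G')]
```

An empty list would mean the synthesized DNA matches the design at every position.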
5.3 DNA Edit
(i) What DNA would you want to edit and why?
I am interested in exploring DNA editing as a way to create a biological–digital interface rather than traditional therapeutic or enhancement applications.
Specifically, I am curious about editing genetic pathways that influence how biological systems sense and respond to external signals. Instead of modifying core human traits, I would focus on engineering cells capable of detecting and translating molecular or electrical signals into readable outputs. Such systems could potentially serve as an interface between biological organisms and external data storage systems, whether silicon-based digital storage or DNA-based archival storage.
The goal would not be human augmentation in terms of physical or cognitive enhancement, but rather the creation of a communication bridge between biological systems and computational systems. This concept resembles brain–computer interfaces, but extended toward a broader biological–informational integration.
(ii) What technology or technologies would you use to perform these DNA edits and why?
For conceptual DNA editing, I would consider programmable genome editing tools such as CRISPR-based systems, which allow targeted modification of specific DNA regions.
These systems function by using a guide RNA to direct a nuclease to a specific DNA sequence. The nuclease creates a break at the targeted location, and the cell’s natural repair mechanisms then introduce changes or insert new genetic material.
In principle, this allows precise editing of genes involved in sensing, signal processing, or molecular response pathways.
Essential steps (conceptual overview)
Identify a target DNA region related to signal sensing or response.
Design a guide RNA to direct the editing machinery.
Deliver the editing components into cells.
Allow the cell’s repair mechanisms to incorporate the desired change.
Verify the modification through sequencing.
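The guide-design step can be sketched as a simple sequence search. The example assumes a SpCas9-style system, which recognizes a 20-nucleotide target followed by an NGG PAM; that choice, the function name, and the target sequence are assumptions for illustration. It searches only the given strand and does not score off-target risk.

```python
import re


def find_guide_candidates(target: str, guide_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (0-based start, protospacer, PAM) for each NGG PAM on the given strand."""
    target = target.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(target)]


protospacer = "GACCTGAAGCTAGCTAGCAT"          # hypothetical 20-nt target site
target_region = "TT" + protospacer + "CGG" + "AAT"  # hypothetical genomic context
for start, guide, pam in find_guide_candidates(target_region):
    print(start, guide, pam)  # 2 GACCTGAAGCTAGCTAGCAT CGG
```

A real design workflow would also search the reverse strand and check each candidate against the whole genome for similar sequences.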
Preparation and Inputs (conceptual)
Designed guide RNA sequences
Editing enzymes (e.g., programmable nucleases)
DNA templates (if inserting new sequences)
Target cells
Limitations
Efficiency: Not all cells undergo successful editing.
Precision: Off-target edits can occur.
Complexity: Editing human systems for functional interfaces would involve major ethical and biological challenges.
Ethical considerations: Any editing related to humans must be carefully regulated and ethically justified.
Week 3 HW: Lab Automation
1
Python Script for Opentrons Artwork
I played around with the Colab notebook and adjusted some parameters in the examples. Here is an interesting one I liked:
For my individual assignment, I decided to create a pattern of a lychee, a fruit typical of my hometown. I chose the sketch below as a base and then generated the coordinates with the help of the automation art interface.
After running the code in the Colab notebook, I got the following image:
2
Post-Lab Questions
1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
In “AssemblyTron: flexible automation of DNA assembly with Opentrons OT-2 lab robots”, the authors present a peer-reviewed example of using the Opentrons OT-2 open-source liquid-handling robot to enable a scalable workflow for synthetic biology. They introduce AssemblyTron, a flexible automation framework that programmatically executes routine yet error-prone steps of DNA assembly (including reaction setup, multi-part mixing, and plate-based transfers), thereby converting manual bench protocols into reproducible, high-throughput robotic procedures. The novelty is not merely automating pipetting, but packaging DNA construction as a modular, reusable pipeline that lowers the barrier to running DBTL (Design–Build–Test–Learn) cycles at scale. By increasing throughput and consistency while reducing human variability, AssemblyTron enables faster iteration across many genetic construct variants, accelerating exploration and optimization of the biological design space. It is an example of how lab automation can directly advance modern biological engineering.
2. Write a description about what you intend to do with automation tools for your final project.
I will use the Opentrons OT-2 liquid-handling robot to accelerate the Build and Test phases of a DBTL workflow for my final project. My project aims to clone a cellulose-supporting enzyme coding sequence into an E. coli plasmid (for construct building and amplification), then transfer validated constructs into a cellulose-producing bacterium to test whether cellulose/SCOBY formation is faster or yields more material.
With OT-2, I will automate repetitive, error-prone liquid handling steps that otherwise slow down iteration: (1) parallel setup of DNA assembly reactions across multiple construct variants (e.g., different promoter/RBS strengths), (2) high-throughput preparation of colony PCR / verification reactions to screen many clones consistently, and (3) plate-based preparation of production tests by dispensing media and generating controlled condition gradients (e.g., carbon-source levels and buffering conditions) across wells. This automation will allow me to evaluate multiple genotypes and culture conditions in parallel with improved reproducibility, turning my project from a single-shot cloning attempt into a scalable screening pipeline.
# ------------------------------------------------------------
# Goal: Use OT-2 to automate a 24-well screening experiment:
# (construct variants) × (culture conditions) in parallel
# for cellulose/SCOBY pellicle formation.
# NOTE: pseudocode; `ot2` and the helper functions are placeholders,
# not real Opentrons API calls.
# ------------------------------------------------------------

# 1) Define what you want to compare
constructs = [
    "enzyme_high_expression",    # plasmid variant A
    "enzyme_medium_expression",  # plasmid variant B
    "empty_vector_control",      # negative control
]
conditions = [
    {"carbon_level": "low", "buffer": "A"},
    {"carbon_level": "mid", "buffer": "A"},
    {"carbon_level": "high", "buffer": "A"},
    {"carbon_level": "low", "buffer": "B"},
    {"carbon_level": "mid", "buffer": "B"},
    {"carbon_level": "high", "buffer": "B"},
]
replicates = 2  # number of repeats per (construct × condition)
# NOTE: 3 constructs × 6 conditions × 2 replicates = 36 wells, which is more
# than one 24-well plate holds; use 1 replicate (18 wells) or two plates.

# 2) Build a 24-well plate layout (a "plate map")
# plate_map maps each well -> {construct, condition, replicate}
plate_map = create_24well_map(constructs, conditions, replicates)

# 3) OT-2 deck setup (conceptual)
ot2.load_labware("24_well_plate", slot=1)
ot2.load_labware("reservoir_for_media_and_stocks", slot=2)
ot2.load_tip_racks(slots=[3, 6])

# 4) Prepare each well with standardized media + condition gradients
for well, meta in plate_map.items():
    ot2.dispense(source="base_media", dest=well)  # same base in every well
    # add carbon source stock according to condition (low/mid/high)
    ot2.add(source=f"carbon_stock_{meta['condition']['carbon_level']}",
            dest=well)
    # add buffer stock according to condition (A or B)
    ot2.add(source=f"buffer_{meta['condition']['buffer']}",
            dest=well)
    ot2.mix(well)

# 5) Inoculation step (choose one practical option)
# Option A (common): you inoculate manually after OT-2 prepares the plate.
# Option B (shown here): OT-2 inoculates if you provide cultures in tubes/reservoir wells.
for well, meta in plate_map.items():
    ot2.add(source=f"culture_{meta['construct']}", dest=well)
    ot2.mix(well)

# 6) Sampling transfer to an assay plate at defined time points
timepoints = ["Day3", "Day5", "Day7"]
ot2.load_labware("96_well_assay_plate", slot=4)
for t in timepoints:
    for well in plate_map:
        ot2.transfer(source=well, dest=get_assay_well(well, t),
                     what="supernatant_sample")

# 7) Export the experiment map so results stay traceable
export_json(plate_map, "24well_experiment_map.json")
export_csv_template(plate_map, "cellulose_screen_results.csv")
This pseudocode describes a practical OT-2 workflow using a 24-well plate to screen cellulose/SCOBY formation across multiple plasmid constructs and culture conditions in parallel. First, it defines the construct variants (e.g., high expression, medium expression, and an empty-vector control) and a small set of environmental conditions (e.g., carbon-source levels and buffer types). It then generates a 24-well plate map so every well is traceable to a specific construct–condition–replicate combination. The OT-2 automates repetitive liquid handling by dispensing base media, adding condition “stocks” to create controlled gradients, mixing each well consistently, and optionally inoculating cultures. Finally, at scheduled time points, the OT-2 transfers standardized aliquots (e.g., supernatant samples) into an assay plate for measurements and exports the plate map so that downstream cellulose readouts (pellicle dry mass/thickness and basic chemistry measurements) can be linked back to the exact construct and condition.
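The plate-map step is the one part of this workflow that can be run without a robot. Below is one possible way to implement it in plain Python; the function name `create_plate_map` and the row-by-row well ordering are assumptions, not part of the Opentrons API. The sketch also checks that the design fits on the plate, which matters because 3 constructs × 6 conditions already fill 18 of the 24 wells with a single replicate.

```python
from itertools import product
from string import ascii_uppercase


def create_plate_map(constructs, conditions, replicates, rows=4, cols=6):
    """Assign each (construct, condition, replicate) to a well, row by row.

    The defaults describe a 24-well plate (rows A-D, columns 1-6).
    """
    wells = [
        f"{ascii_uppercase[r]}{c}" for r in range(rows) for c in range(1, cols + 1)
    ]
    combos = [
        {"construct": construct, "condition": condition, "replicate": rep}
        for construct, condition in product(constructs, conditions)
        for rep in range(1, replicates + 1)
    ]
    if len(combos) > len(wells):
        raise ValueError(
            f"{len(combos)} combinations do not fit on a {len(wells)}-well plate"
        )
    return dict(zip(wells, combos))


constructs = ["enzyme_high_expression", "enzyme_medium_expression", "empty_vector_control"]
conditions = [
    {"carbon_level": level, "buffer": buffer}
    for buffer in ("A", "B")
    for level in ("low", "mid", "high")
]
plate_map = create_plate_map(constructs, conditions, replicates=1)
print(len(plate_map), plate_map["A1"])
```

With `replicates=2` the same call raises a `ValueError`, signalling that the experiment would need a second plate or a smaller condition set.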
Week 4 HW: Protein Design Part 1
1
Conceptual Questions
1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
If we assume the 500 g of meat is entirely protein, and take the average mass of one amino acid to be about 100 Da ≈ 100 g/mol, then:
500 g ÷ 100 g/mol = 5 mol of amino acids
Then multiply by Avogadro’s number:
5 mol × 6.022 × 10^23 molecules/mol ≈ 3.0 × 10^24 molecules
So the answer is approximately 3 × 10^24 amino acid molecules.
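The estimate can be reproduced in a few lines of Python. The `protein_fraction` parameter is an added assumption: it defaults to 1.0 to match the "entirely protein" simplification above, and can be lowered to explore more realistic cases.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def amino_acid_count(mass_g: float, protein_fraction: float = 1.0,
                     avg_mass_g_per_mol: float = 100.0) -> float:
    """Estimate the number of amino acid molecules in a piece of meat."""
    moles = mass_g * protein_fraction / avg_mass_g_per_mol
    return moles * AVOGADRO


print(f"{amino_acid_count(500):.2e}")  # 3.01e+24
```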
2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Humans do not become cows or fish after eating them because food is broken down during digestion. Proteins in beef or fish are digested into amino acids, fats into fatty acids, and carbohydrates into simple sugars. Your body then uses these small molecules as building blocks to make human proteins, cells, and tissues according to human DNA.
So, we do not directly turn cow tissue into human tissue. We first deconstruct the food, then rebuild it in our own biological way. In other words, we eat the raw materials, not the identity of the animal.
3. Why are there only 20 natural amino acids?
There are only 20 natural amino acids because this set is enough to make a wide variety of proteins while keeping the genetic system relatively simple and stable. They provide different chemical properties, such as size, charge, polarity, and shape, which allow proteins to fold and function in many ways. Evolution likely selected this set early because it worked well, and once the genetic code was established, it became highly conserved.
4. Can you make other non-natural amino acids? Design some new amino acids.
Yes, it is possible to make non-natural amino acids. Scientists can keep the standard amino acid backbone and change only the side chain to give it new properties. For example, one can design a fluorinated alanine by replacing a hydrogen atom on alanine’s methyl side chain with fluorine. This can change its electronic properties and hydrophobicity, and may also increase the molecule’s stability. Non-natural amino acids like this can help expand the functions of proteins.
5. Where did amino acids come from before enzymes that make them, and before life started?
Before life existed, amino acids were likely formed by prebiotic chemistry, meaning ordinary chemical reactions that do not require enzymes or living cells. Miller–Urey-type experiments showed that amino acids can form from simple gases when energy is supplied, for example by electrical discharges that simulate lightning. Scientists have also found amino acids and their precursors in meteorites, suggesting that some may have formed in space and later arrived on early Earth. Researchers have also demonstrated plausible amino-acid formation through hydrothermal, impact-driven, and other geochemical processes on the early Earth. So the best answer is that amino acids probably came from a combination of natural chemical reactions on Earth and delivery from space, before enzymes and life evolved.
6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
A helix made from D-amino acids would be expected to form a left-handed α-helix. This is because the common right-handed α-helix in proteins comes from L-amino acids. If you switch to the mirror-image D-amino acids, the preferred helix also becomes the mirror image, which is left-handed.
7. Can you discover additional helices in proteins?
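Yes. Besides the α-helix, proteins also contain other helix types, such as 3₁₀ helices, π-helices, and polyproline helices. Scientists can find additional helices in natural proteins using structure-determination methods such as X-ray crystallography, cryo-EM, and NMR, and they can also design new helices by changing the amino acid sequence. But a new helix will only work if it fits the protein’s overall 3D structure: it may stabilize the protein, or it may disrupt it.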
8. Why are most molecular helices right-handed?
Most molecular helices are right-handed because the shape of their building blocks makes that twist more stable. In biology, proteins are usually made from L-amino acids. When many L-amino acids fold into an α-helix, the right-handed helix avoids steric clashes better and has lower energy than a left-handed one. So it is simply the more stable shape.
9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?
β-sheets tend to aggregate because their extended strands can line up easily and form many backbone hydrogen bonds with each other. Hydrophobic side chains can also pack together and avoid water. So, the main driving force is intermolecular hydrogen bonding, helped by hydrophobic interactions, which makes the aggregated β-sheet structure more stable.
2
Protein Analysis and Visualization
Briefly describe the protein you selected and why you selected it.
I chose UDP-glucose pyrophosphorylase (UGPase) from Komagataeibacter xylinus.
I selected this protein because it plays a central role in bacterial cellulose biosynthesis. UDP-glucose pyrophosphorylase catalyzes the conversion of glucose-1-phosphate and UTP into UDP-glucose, which is the immediate activated precursor used by cellulose synthase to polymerize β-1,4-glucan chains.
Since my final project focuses on improving bacterial cellulose production for biofilm-based food packaging materials, understanding and potentially enhancing the precursor supply through UGPase is directly relevant to my research goals.
Identify the amino acid sequence of your protein.
How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids. The total sequence length is 336, and the most frequent amino acid is G.
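For reference, counting residue frequencies needs only the Python standard library. The sketch below is a minimal stand-in for the Colab notebook; the short sequence is made up for illustration and is not the UGPase sequence.

```python
from collections import Counter


def amino_acid_frequencies(sequence: str) -> list[tuple[str, int]]:
    """Return (residue, count) pairs sorted from most to least frequent."""
    return Counter(sequence.upper()).most_common()


toy_sequence = "MGGSGKAGLG"  # hypothetical peptide, for illustration only
frequencies = amino_acid_frequencies(toy_sequence)
print(len(toy_sequence))  # 10
print(frequencies[0])     # ('G', 5)
```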
How many protein sequence homologs are there for your protein? Hint: Use UniProt’s BLAST tool to search for homologs. Using UniProt’s BLAST tool, I found 250 sequence homolog candidates for my protein in UniProtKB. The taxonomy results show that these homologs are distributed across a wide range of organisms. Most hits are found in Bacteria (179 results), followed by Eukaryota (70 results), and Archaea (1 result). Within bacteria, the largest groups are Pseudomonadati (91 results) and Bacillati (87 results). In eukaryotes, the hits are distributed among Metazoa (34 results), SAR (18 results), Viridiplantae (8 results), Fungi (8 results), Amoebozoa (1 result), and Haptista (1 result). This suggests that my protein, or related proteins, may be broadly distributed across different branches of life.
Does your protein belong to any protein family? Based on the current BLAST results, I cannot confidently assign my protein to a specific protein family yet, because the matched proteins have diverse annotations and many are uncharacterized.
Identify the structure page of your protein in RCSB.
When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å). The structure was deposited on 2021-09-14 and released on 2022-09-21. So it was solved by 2021 and became publicly available in 2022. This is a reasonably good-quality structure. Its resolution is 2.57 Å, which is better than 2.70 Å. Since a smaller resolution value indicates higher structural detail, this structure can be considered good quality.
Are there any other molecules in the solved structure apart from protein? Yes. Besides the protein, the structure also contains SO4 (sulfate ions), as shown by the Ligand Interactions (SO4) information.
Does your protein belong to any structure classification family? Yes. According to the PDB entry, this protein is classified as a TRANSFERASE.
Open the structure of your protein in any 3D molecule visualization software.
PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)
Visualize the protein as “cartoon”, “ribbon” and “ball and stick”. Screenshots attached below.
Color the protein by secondary structure. Does it have more helices or sheets?
After coloring the protein by secondary structure, I observed that it contains more alpha helices than beta sheets.
Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
After coloring the protein by residue type, I observed that hydrophobic residues are distributed mainly throughout the interior/core regions of the protein, while hydrophilic residues, including polar and charged residues, are more exposed on the surface. This suggests a typical soluble protein organization, where hydrophobic residues help stabilize the folded core and hydrophilic residues interact with the surrounding aqueous environment.
Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
The protein surface is not smooth; it contains several pockets and cavities of different sizes. These surface depressions could serve as potential binding sites.
3
Using ML-Based Protein Design Tools
Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU.
Choose your favorite protein from the PDB.
I chose vanillin synthase because it produces vanillin, the molecule responsible for the characteristic aroma of vanilla. Vanilla is one of the most recognizable and pleasant food aromas, commonly found in desserts such as ice cream, cakes, and pastries.
We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:
C1. Protein Language Modeling Deep Mutational Scans
a. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods. b. Can you explain any particular pattern? (choose a residue and a mutation that stands out)
A pattern that stands out in the heatmap is that mutations to cysteine (C) are frequently more negative than many other substitutions. This may be because cysteine is chemically special: its thiol group is reactive and can introduce constraints or unwanted interactions, so the model tends to consider cysteine substitutions less favorable in many positions.
C2. Protein Folding
a. Fold your protein with ESMFold. Do the predicted coordinates match your original structure? b. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?
The structure predicted by ESMFold is broadly consistent with the experimentally determined structure. Both structures show a compact globular fold composed of several α-helices and β-sheets. While some local differences are visible in loop regions and the orientation of certain helices, the overall topology of the protein appears similar. This suggests that the model successfully captured the general folding pattern of the protein.
The overall fold remained largely unchanged compared to the original structure. Only minor differences appeared in flexible loop regions. This suggests that the protein fold is robust to small conservative mutations.
(L11I)
(L11P)
I introduced both conservative (L11I) and more disruptive (L11P) mutations into the protein sequence and predicted the structures using ESMFold. In both cases, the overall protein fold remained very similar to the original structure. The positions of α-helices and β-sheets were largely preserved, and only small local differences appeared in flexible regions.
This suggests that the protein structure is relatively resilient to single-point mutations. Even mutations that might disrupt local secondary structure do not necessarily change the global fold of the protein.
I replaced residues 184–191 (TNAAVDEN) with GGGGSGGG. After running the structure prediction again, I observed that the overall fold of the protein remained very similar to the original structure. Most secondary structure elements such as α-helices and β-sheets were preserved.
However, some local changes appeared in flexible loop regions, especially near the tail of the protein. This suggests that the protein structure is relatively resilient to mutations, and small or moderate sequence changes mainly affect local conformations rather than the global fold.
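The sequence edits described above (point mutations such as L11I and the replacement of residues 184–191) can be generated programmatically before passing the result to ESMFold. The helpers below are a minimal sketch with hypothetical names; the short toy sequence only mimics the relevant residues and is not the real protein sequence.

```python
import re


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Apply a mutation written like 'L11I' (wild type, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, new_residue = match.group(1), match.group(3)
    position = int(match.group(2))
    if not 1 <= position <= len(sequence) or sequence[position - 1] != wild_type:
        raise ValueError(f"residue {position} is not {wild_type}")
    return sequence[:position - 1] + new_residue + sequence[position:]


def replace_segment(sequence: str, start: int, end: int, replacement: str) -> str:
    """Replace residues start..end (1-based, inclusive) with a new segment."""
    return sequence[:start - 1] + replacement + sequence[end:]


toy = "MKTAYIAKQRLTNAAVDEN"  # hypothetical sequence with L at position 11
print(apply_point_mutation(toy, "L11I"))        # MKTAYIAKQRITNAAVDEN
print(apply_point_mutation(toy, "L11P"))        # MKTAYIAKQRPTNAAVDEN
print(replace_segment(toy, 12, 19, "GGGGSGGG"))  # MKTAYIAKQRLGGGGSGGG
```

Checking the wild-type residue before mutating guards against off-by-one errors in residue numbering.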
C3. Protein Generation
a. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one. b. Input this sequence into ESMFold and compare the predicted structure to your original.