Week 1 HW: Principles and Practices
Part 1: Class Assignment
1. “The Big Idea”
In the world of Big Farm, nutrient pollution is a big problem, particularly near farms where fertilizers and manure release excess phosphorus and nitrogen into the environment. This leads to issues like eutrophication, dead zones, and human health impacts. It also causes losses in other industries such as fishing and recreation. Paradoxically, we also frequently see cases of nutrient depletion, particularly in the context of agriculture. Monocropping and poor agricultural practices have led to the depletion of topsoil, making it one of the scarcest resources in the world. According to the UN Food and Agriculture Organization, 90% of the world’s topsoil is at risk by 2050. To combat this, I’m interested in seeing if a circular nutrient economy is possible:
A. Capture nitrogen & phosphorus from the water
B. Convert them to stable bioproducts
C. Encapsulate them to regenerate soil
Phase 1 would involve pulling the nitrate and phosphate from the environment. There are plenty of natural phenomena I can take inspiration from in order to do so, but for this aspect I think I would have to do more research. Some examples I can think of are creating microbial biofilms on 3D-printed lattices, or mimicking natural filters.
Phase 2 would involve locking this biomass into soil-safe carriers, which would almost certainly involve microbiome engineering as well. Possible options include simple alginate/cellulose pellets, biopolymer beads, mycelium composites or mineralized granules. These would have to be designed for slow release so that run-off is minimized and we don’t recreate the very problem that inspired this project. One thing to note is that good soil is not just a few nutrients; it requires a balance of other factors, including microorganism diversity and organic matter. The final product might therefore be some sort of mixture rather than a homogeneous assortment of pellets.
I believe the development of Phase 2 would increase agricultural diversity across the globe and could also potentially allow for at-home growth in areas where soil generally is not necessarily suitable for doing so. This could reduce traditional lawns and increase area for people to garden in their yards, which is another added benefit for the environment.
2. Governing “The Big Idea”
There are a few goals I would want to target with this project. They can be further broken down into sub-components.
Environmental Protection
The most optimistic outcome of the project is that there is a beneficial environmental impact and close to no environmental harm. To achieve this, there are a couple of considerations:
Adequate testing
There should be field pilots and monitoring over multiple seasons before consideration for deployment
Protect biodiversity
Installation should not affect sensitive habitats, and in any scenario where this could occur, impact assessments should be done
Chemicals and components used should not pose a risk to the environment
Environmental Justice & Transparency
Potential risks should be addressed prior to the experiment. The project and its applications should also be placed in the correct cultural and social context.
Equitable access for all areas
Small farms and low-income areas need to be considered, so affordability is also a concern
Transparency in historically polluted areas
Communities should be consulted on consent and opinions
Scenario 1: Environmental Performance Standards
Purpose
Rather than relying on self-reported nutrient removal and soil impacts, minimum thresholds should be required for metrics like nutrient capture efficiency, runoff/leaching rates, and carbon footprint.
Design
Federal & state regulators set standards
Companies certify products before sale
Universities and other R&D groups test prototypes under common protocols
Independent parties audit field trials
This scenario could be analogous to emissions standards for vehicles.
Assumptions
Metrics are measurable and cheap to collect
Lab results translate to real watersheds
Regulators can keep up with new designs
Risks of Failure & “Success”
“Success”:
Firms would try to optimize only for regulated metrics, rather than ecosystem complexity
Start-ups are crowded out by compliance costs
Failure:
Innovation is bottlenecked by strict rules
Loopholes lead to greenwashing
Slow approvals delay overall benefits
Scenario 2: Transparency & Public Accountability
Purpose
There would likely be limited visibility into field performance, so open data platforms and certification schemes could let different members of the community evaluate systems.
Design
Universities publish standard test protocols
NGOs run registries
Firms disclose performance data
Local governments host dashboards
Assumptions
Transparency will deter bad practice
Communities are interested in engaging with data
Transparency puts pressure on firms to improve
Risks of Failure & “Success”
“Success”:
Reputational pressure stifles experimentation
Surveillance burdens small operators
Politicization of environmental metrics
Failure:
Data mishandling or misinterpretation
Firms selectively report
Continued public mistrust
Scenario 3: Market-Driven Scaling
Purpose
Nutrient recovery would probably struggle economically. To counteract this, there could be subsidies, nutrient-credit markets and public procurement to accelerate deployment once systems meet safety thresholds.
Design
Governments pay for nutrient removal
Farmers get rebates for recycled fertilizers
Cities host infrastructure for the capture
Community boards approve projects
This system could be analogous to current renewable energy tax credits.
Assumptions
Price signals will drive adoption
Farmers would accept the recycled inputs
Monitoring would prevent abuse of the system
Risks of Failure & “Success”
“Success”:
Dependence on incentives
Nutrient extraction from ecologically sensitive waters
Monoculture of this technology
Failure:
Gaming of credits
Inequitable deployment
Political instability
4. Scoring
The following scale is used to score these strategies:
I believe the most important strategy to value here is environmental performance standards. It seems that none of the other strategies quite work without solid thresholds and protocols. It also best supports my idea of aligning innovation with environmental protection rather than letting it fall into the hands of the market and the public.
By requiring these thresholds, regulators can ensure these technologies genuinely make a beneficial impact in reducing pollution instead of just shifting risks from waterways to soils or communities.
I have completed:
A. Lab Specific Training
B. Safety Training in Atlas
Part 3: Week 2 Lecture Prep
1. Questions from Professor Jacobson
Q1: The raw error rate of a polymerase is on the order of 1:10^2 per base addition. Scaled to the ~3 × 10^9 bp human genome, that would be roughly 3 × 10^7 errors per copy. To deal with this, biology adds quality-control steps: proofreading, mismatch repair systems, damage repair pathways and cell-cycle checkpoints. These are necessary to copy the human genome faithfully.
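A quick back-of-the-envelope check of that scaling (plain Python; the ~3.1 Gb genome size and the post-repair rate are commonly cited ballpark values, not from the lecture itself):

```python
genome_bp = 3.1e9        # approximate human genome size (assumed)
raw_error_rate = 1e-2    # errors per base addition, before quality control
print(f"{genome_bp * raw_error_rate:.1e} errors per copy")  # ~3.1e7

# Proofreading and mismatch repair each cut the rate by orders of magnitude,
# bringing the effective in vivo rate to roughly 1e-9 errors per base.
print(f"{genome_bp * 1e-9:.1f} errors per copy after repair")  # ~3.1
```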
Q2: Most amino acids have multiple codons. An average human protein is around ~400 amino acids long (https://bionumbers.hms.harvard.edu/bionumber.aspx?s=n&v=4&id=106445). This means the total number of possible DNA sequences could be on the order of 10^100 or more, so in theory, many different DNA strings could lead to the same amino-acid chain.
In practice, most of these sequences don’t work that well because different codon choices can fail or perform poorly for different reasons, for example codon bias, folding effects or other constraints.
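To make the estimate concrete, here is a rough count using the codon degeneracies of the standard genetic code (assuming a ~400-residue protein and ignoring composition and codon-usage constraints):

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
DEGENERACY = {"A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
              "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
              "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4}

avg = sum(DEGENERACY.values()) / len(DEGENERACY)   # ~3.05 codons per residue
log10_count = 400 * math.log10(avg)                # 400-residue protein
print(f"~10^{log10_count:.0f} synonymous DNA sequences")   # ~10^194
```

So the true number for an average protein is even larger than 10^100.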
2. Questions from Dr. LeProust
Q1: Currently, the most common method for oligo synthesis is the phosphoramidite cycle: coupling with a phosphoramidite → capping the unreacted sites → oxidation → deblocking. The deblocking step prepares the chain for the next nucleotide.
Q2: It is difficult to make oligos longer than about 200 nt, as compounding errors lead to truncated molecules.
Q3: A 2000 bp gene would require 2000 flawless coupling cycles with near-perfect chemistry every time. This involves a lot of effort, and that level of efficiency is unrealistic. Additionally, this scenario would probably lead to accumulating chemical damage and costs.
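The compounding-error argument is easy to see numerically; a sketch assuming a 99.5% per-cycle coupling efficiency (typical published values are roughly 98–99.5%):

```python
def full_length_yield(n_bases: int, coupling_efficiency: float = 0.995) -> float:
    """Fraction of strands that survive every coupling step."""
    return coupling_efficiency ** n_bases

for n in (60, 200, 2000):
    print(f"{n:>5} nt: {full_length_yield(n):.4%} full-length")
# 60 nt: ~74%, 200 nt: ~37%, 2000 nt: ~0.004% — hence genes are assembled
# from short oligos rather than synthesized in one run
```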
3. Questions from George Church
The ten amino acids generally considered essential for animals are arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. Essential amino acids cannot be synthesized by animals in sufficient amounts from their own carbon skeletons, so they must be obtained from diet or symbionts.
The “lysine contingency” refers to the fact that animals in particular have lost the ability to make lysine. Given that it’s an essential amino acid, this now seems more like an evolutionary constraint that lets ecosystems create reliance between species. Our need for this amino acid has led to unique agricultural systems and food webs that might not exist if we could produce it ourselves. For example, lysine production for animal feed is currently a major industry for optimizing livestock growth. If lysine were non-essential to animals, this industry might not exist, and we might not feel the need to farm so extensively.
I also wonder whether animals became lysine-dependent over millions of years, in the sense that lysine was once non-essential. In that case, it would have been a self-imposed evolutionary change.
As you can see, I’ve done my best to personalize this page so far. Yippee!
Week 2 HW: DNA READ, WRITE & EDIT
PART 0: BASICS
I have attended all lectures and recitation necessary to prepare for this week.
PART 1: GEL ART & BENCHLING
I made my free account on Benchling, following Ice’s tutorial in class on Lambda DNA. I then played around with Ronan’s website and Benchling’s Digest feature to try to come up with something I liked. Ultimately, I came up with something that looks vaguely like a lucky cat (if you squint).
PART 2: GEL ART & RESTRICTION DIGESTS
Look at my design here!
PART 3: DNA DESIGN CHALLENGE
3.1
The protein I will choose for this is CLOCK, which is apparently responsible for regulating the 24-hour mammalian circadian rhythm. I chose this because the concept of a circadian rhythm in general is interesting to me, and it’s interesting that it’s regulated by a protein and not some psychological factor. Although I guess if you dug deep into all human actions, you could boil them down to the result of some complex protein interactions.
To optimize my codon sequence, I used a tool offered by Twist. Since I’m just playing around with the protein, I left most options blank, for example, “Sites to Avoid Introducing.”
Codon optimization is necessary because different organisms prefer different synonymous codons to encode the same amino acid. Although the genetic code is universal, the frequency with which specific codons are used varies between species. If a gene is expressed in a host organism that does not frequently use certain codons, translation can become slow or inefficient. Optimizing codon usage ensures that the DNA sequence matches the host organism’s codon bias.
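As a toy illustration of the idea (the codon table below lists commonly cited human-preferred codons and is an assumption for illustration; a real optimizer like Twist’s also weighs GC content, secondary structure, forbidden sites, and more):

```python
# Map each amino acid to a codon frequently used in human genes (illustrative)
HUMAN_PREFERRED = {
    "A": "GCC", "R": "CGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}

def naive_optimize(protein: str) -> str:
    """Pick the host-preferred codon for every residue (no other constraints)."""
    return "".join(HUMAN_PREFERRED[aa] for aa in protein)

print(naive_optimize("MKT"))  # ATGAAGACC
```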
I chose to optimize the codon sequence for Homo sapiens because the CLOCK protein is a human transcription factor and is typically studied in mammalian cell systems. Since the goal is to express functional CLOCK protein in a human cellular environment, optimizing for human codon bias ensures efficient translation using human tRNA pools and supports proper protein folding and regulation. Additionally, expressing CLOCK in human cells preserves the relevant post-translational modifications and cellular context necessary for its biological function in circadian regulation.
To produce this CLOCK protein from optimized DNA, both in vivo and in vitro systems can be used.
In vivo methods (cell-dependent) involve inserting the optimized CLOCK DNA sequence into a plasmid expression vector under the control of a strong promoter. This plasmid can then be introduced into human cells. Once inside the cell:
The plasmid enters the nucleus, where transcription occurs
mRNA is processed
mRNA is translated
tRNA pairs with complementary codons on mRNA
The protein is synthesized
The protein folds
Because the DNA was optimized for Homo sapiens, this translation efficiency would be quite high.
In vitro methods (cell-free) involve adding the optimized DNA into a biochemical mixture containing things like tRNAs, amino acids, purified ribosomes, etc. In these systems:
DNA is first transcribed into mRNA
Ribosomes in the extract translate the mRNA directly into protein
The reaction follows the same biochemical principles of transcription/translation.
These systems allow rapid protein production and tighter control over experimental conditions, though they may lack some post-translational modifications found in living cells.
In both methods, the fundamental process remains the same.
PART 4: PREPARING A DNA ORDER
I followed the tutorial on the homework page.
PART 5: DNA READ/WRITE/EDIT
5.1 DNA READ
To be honest, I would want to sequence my own DNA the most. All of these other things are probably frequently sequenced in research, but there’s probably a <1% chance that my DNA would ever be sequenced in a relevant way. I could learn a lot about myself that would be almost impossible for me to learn otherwise.
I would use Illumina sequencing to do so. It is a second-generation technology and is well suited for human DNA because it provides high throughput and accuracy at a relatively low cost. The input would be genomic DNA extracted from my cells (like blood), which would then be fragmented into short pieces. After fragmentation, adapters would be ligated onto both ends of each fragment, allowing the fragments to bind to a flow cell and serve as primer-binding sites. These fragments are amplified through bridge PCR to create clusters of identical DNA copies. Sequencing then occurs through sequencing-by-synthesis: fluorescently labeled reversible-terminator nucleotides are added in each cycle, and each incorporated base emits a distinct fluorescent signal that is detected by a camera. After imaging, the terminator is removed and the next cycle can begin. Base calling is performed by identifying the fluorescent signal at each cycle to determine the sequence of each fragment. The output would ultimately be millions to billions of short DNA reads.
5.2 DNA WRITE
I don’t know much about DNA synthesis and what’s actually possible. I think sensors would be cool although I’m not creative enough to come up with novel ideas. If I had to choose, I think a biomaterial would be most up my alley, like synthetic spider silk (which is really hard) or something inspired by spider-silk for textiles. The construct could also include an anchoring domain to coat cotton fibers or a crosslinking handle for durability, enabling a bio-based coating or fiber additive that adapts comfort and moisture handling without petroleum-derived polymers. It would just be very interesting to me and it would be a very unique material to work with if I was actually able to synthesize something with the same relative strength as spider silk.
I would use chemical DNA synthesis followed by gene assembly. In modern DNA synthesis, short DNA fragments are chemically built one nucleotide at a time on a solid support. Because longer genes cannot be synthesized in a single piece, multiple short fragments are designed with overlapping regions and then assembled enzymatically into the full-length gene. After assembly, the construct would be inserted into a plasmid and verified using Sanger sequencing where DNA polymerase copies the template strand in the presence of fluorescently labeled chain-terminating nucleotides. When a terminator is incorporated, elongation stops, producing fragments of different lengths that are separated and read to determine the sequence. This method is highly accurate and well suited for confirming a single engineered gene, though it is relatively low-throughput and not ideal for sequencing very large genomes.
5.3 DNA EDIT
I think I would probably be most interested in conservation. I would edit the DNA of reef-building corals to enhance their resilience to oceanic changes. Coral bleaching occurs when heat stress disrupts the relationship between corals and their symbiotic algae, leading to widespread reef decline. Rather than drastically altering the organism, I would focus on targeted edits to genes involved in heat shock response, oxidative stress regulation, and cellular repair pathways. Enhancing the expression or function of these stress-response genes could increase thermal tolerance and reduce bleaching under moderate heat stress.
I would use CRISPR-based genome editing technologies, as they allow precise and targeted modifications to specific DNA sequences. CRISPR systems use a guide RNA designed to match a target gene sequence and a DNA-cutting enzyme to create a break at that location. The cell’s natural DNA repair machinery then repairs the break, either introducing small changes or incorporating a designed DNA template to achieve a specific modification. Preparation would involve identifying the target coral genes, designing guide RNAs to match those sequences, and preparing the necessary components—such as the Cas enzyme, guide RNA, and potentially a repair template. These components would be delivered into coral embryos or cells using appropriate transformation methods. While CRISPR offers high precision compared to older gene-editing tools, limitations include variable editing efficiency, potential off-target edits, and challenges associated with delivering editing machinery effectively in marine organisms.
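As a small sketch of the guide-design step, this scans the sense strand for SpCas9 protospacers (20 nt followed by an NGG PAM); the input here is a made-up sequence, not a real coral gene:

```python
import re

def find_spcas9_guides(dna: str):
    """Return (protospacer, PAM, start) for every 20-nt site with an NGG PAM."""
    dna = dna.upper()
    return [(m.group(1), m.group(2), m.start())
            for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", dna)]

print(find_spcas9_guides("GTCACCTCCAATGACTAGGGTGG"))
# [('GTCACCTCCAATGACTAGGG', 'TGG', 0)] — a real pipeline would also scan the
# antisense strand and filter candidates for off-target matches
```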
Week 3 HW: Lab Automation
Part 1: Python Script
Despite having taken 6.100A, I am still not very adept with Python. I am even less skilled with Google Colab, so I had no choice but to use the GUI to generate my Python script. I produced this beautiful piece of art:
I also created this one:
Part 1: Post-Lab Questions
Question 1
The paper I chose is titled “AssemblyTron: flexible automation of DNA assembly with Opentrons OT-2 lab robots.”
This paper introduces AssemblyTron, an open-source Python framework that automates DNA assembly workflows by linking j5 DNA design outputs directly to execution on the Opentrons OT-2 liquid handling robot, addressing a major bottleneck in the “Build” step of synthetic biology’s Design–Build–Test–Learn (DBTL) cycle.
The software can do a variety of tasks. For example, it parses combinatorial design files, generates deck setup instructions, tracks reagent volumes and concentrations, optimizes PCR conditions (including gradient annealing calculations), and produces robot-ready protocols with minimal human intervention.
The authors demonstrate that AssemblyTron can automate PCR setup with optimized annealing temperature gradients, Golden Gate assembly, and homology-based assembly (IVA/AQUA). Performance (transformation efficiency and assembly fidelity) was comparable to manual methods. Overall, the platform reduces time, training burden, cost, and human error in molecular cloning while increasing accessibility to automated synthetic biology workflows.
Question 2
One semi-final concept is: Automated high-throughput screening of heavy-metal–capturing protein/peptide hydrogels to identify formulations that (1) bind Pb²⁺/Cu²⁺/Ni²⁺ strongly, (2) remain mechanically stable, and (3) can be regenerated (release metals on command for reuse).
Rather than focusing on a single formulation, I intend to use automation to systematically explore a formulation space defined by binding motif sequence (e.g., histidine-rich, cysteine-rich, or acidic domains), polymer concentration, crosslink density, pH, and salt conditions. This approach allows rapid mapping of structure–property–function relationships for environmentally relevant remediation materials.
Using the Opentrons OT-2, I will automate preparation of a 96-well hydrogel library. The robot will dispense defined volumes of protein or peptide stock solutions, buffers at varying pH, salt solutions, and crosslinking agents to generate a matrix of conditions with built-in replicates. After gel formation, the OT-2 will add standardized metal solutions and perform timed incubations. By measuring the depletion of metal ions from solution, I can calculate binding capacity and compare performance across formulations.
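A minimal sketch of what that plate-setup step could look like in the Opentrons Python API (the labware choices, reservoir layout, and volume gradient are all assumptions for illustration, not a finished protocol):

```python
from opentrons import protocol_api

metadata = {"apiLevel": "2.13"}

def run(protocol: protocol_api.ProtocolContext):
    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)
    res = protocol.load_labware("nest_12_reservoir_15ml", 2)
    tips = protocol.load_labware("opentrons_96_tiprack_300ul", 3)
    p300 = protocol.load_instrument("p300_single_gen2", "right", tip_racks=[tips])

    peptide = res["A1"]                          # peptide stock (assumed layout)
    buffers = [res["A2"], res["A3"], res["A4"]]  # e.g. pH 5 / 6 / 7
    crosslinker = res["A5"]

    # Vary peptide volume across columns and buffer pH across rows;
    # a real run would load extra tip racks for this many transfers.
    for col in range(12):
        peptide_vol = 20 + 10 * (col % 4)        # 20-50 uL gradient
        for i, row in enumerate("ABCDEFGH"):
            well = plate[f"{row}{col + 1}"]
            p300.transfer(peptide_vol, peptide, well)
            p300.transfer(100 - peptide_vol, buffers[i % 3], well)
            p300.transfer(20, crosslinker, well, mix_after=(3, 50))
```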
To evaluate reusability, the robot could also perform regeneration cycles by washing gels and introducing elution buffers to release captured metals. Subsequent rebinding assays will quantify how much capacity is retained after multiple cycles. This enables screening not only for binding strength but also for material durability and practical reuse potential. Where possible, I will include simple mechanical proxies to confirm that optimized metal-binding formulations still form stable hydrogels.
To support the workflow, I plan to design and 3D print custom tube adapters and plate alignment fixtures to improve reproducibility and organization of reagents. If available, Ginkgo Nebula can be used to help design and track sequence variants, generate plate maps, and manage combinatorial condition matrices.
References: Bryant, J. A., Kellinger, M., Longmire, C., Miller, R., & Wright, R. C. (2023). AssemblyTron: Flexible automation of DNA assembly with Opentrons OT-2 lab robots. Synthetic Biology, 8(1), ysac032. https://doi.org/10.1093/synbio/ysac032
Week 4 HW: Protein Design Part I
Part 1: Conceptual Questions
Assuming the protein mass of meat is 20%, 0.20 × 500 g = 100 g of actual protein in this meat. If an average residue is roughly 100 Da, I can assume that’s roughly 100 g/mol, meaning I now have about 1 mol of residues. One mole contains roughly 6 × 10²³ molecules. Thus, there are roughly 6 × 10²³ amino acid residues in 500 grams of meat.
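The same arithmetic in a few lines of Python:

```python
AVOGADRO = 6.022e23

protein_g = 500 * 0.20                  # 100 g of protein in 500 g of meat
moles_of_residues = protein_g / 100.0   # at ~100 g/mol per residue
print(f"~{moles_of_residues * AVOGADRO:.0e} amino acid residues")  # ~6e23
```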
We digest these proteins, not incorporate them into our bodies. We rebuild our own proteins using molecular tools in our bodies; the building blocks may or may not come from what we eat.
There are only 20 “natural” amino acids because that is the set biology standardized on. These amino acids offer good chemical diversity and synthetic accessibility under different constraints. Now that we have evolutionarily reached this place, it is difficult to incorporate more without rewiring our whole system. As to why there are only 20 rather than, say, 100: if there were too many, the system would be costly and inefficient, so it’s better to keep our biological machinery simple.
Yes, noncanonical amino acids are frequently made, and they can also be genetically encoded, for example via engineered tRNA/synthetase pairs. Examples include fluoroleucine and p-iodo-phenylalanine.
Before enzymes and life, amino acids came from multiple plausible sources: for example, Strecker synthesis in watery environments, delivery by meteorites (amino acids have been found in chondrites), or atmospheric/UV chemistry.
They are left-handed (L-amino acids).
Yes, proteins contain other helical motifs like the π-helix, the collagen triple helix, and β-helix/solenoid-like helices. New helical patterns can be found by analyzing high-resolution structures.
Most molecular helices are right-handed because of the chirality bias of L-amino acids. Another reason could be that right-handed packing avoids clashes and is better for L residues in general (a result of sterics).
They have sticky edges: backbone hydrogen-bond donors and acceptors are exposed at the sheet edges, and adding another strand satisfies those H-bonds. This allows β-sheets to extend into larger assemblies easily.
Drivers include backbone hydrogen bonding, the hydrophobic effect and shape complementarity. Growth is probably also kinetically favorable.
Amyloid diseases tend to form β-sheets because the cross-β amyloid architecture is very stable and can form from many sequences once misfolded. This templates further misfolding and thus creates persistent aggregates that disrupt cells.
Yes, they are often found in nanofibers, hydrogels, templates for mineralization, etc.
A simple, reliable design is a β-hairpin that self-assembles with controlled registry:
Design rules:
Use alternating hydrophobic / polar residues to create amphipathic strands
Use a tight turn motif
Put charged residues at ends to control solubility and alignment
Keep strands ~6–10 residues each for clean hairpins, with hydrophobes on one face and charged/polar residues on the other
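A toy generator following these rules (the Val/Lys alternation and the Asn-Gly turn are illustrative choices, not a validated design):

```python
HYDROPHOBIC, POLAR = "V", "K"   # alternating pattern makes one strand face hydrophobic
TURN = "NG"                     # Asn-Gly favors a tight beta-turn

def hairpin(strand_len: int = 8) -> str:
    strand = "".join(HYDROPHOBIC if i % 2 == 0 else POLAR
                     for i in range(strand_len))
    # charged termini (E/K) aid solubility and help control registry
    return "E" + strand + TURN + strand[::-1] + "K"

print(hairpin())  # EVKVKVKVKNGKVKVKVKVK
```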
Part 2: Protein Analysis and Visualizations
I’m picking the Sonic hedgehog protein. I recently found out it exists and has a silly name. It is an important signaling molecule that plays a role in embryonic development in animals, so it’s pretty cool that such a silly protein name has quite an important role in our lives.
The amino acid sequence is as follows:
MLLLARCLLLVLVSSLLVCSGLACGPGRGFGKRRHPKKLTPLAYKQFIPNVAEKTLGASGRYEGKISRNS
ERFKELTPNYNPDIIFKDEENTGADRLMTQRCKDKLNALAISVMNQWPGVKLRVTEGWDEDGHHSEESLH
YEGRAVDITTSDRDRSKYGMLARLAVEAGFDWVYYESKAHIHCSVKAENSVAAKSGGCFPGSATVHLEQG
GTKLVKDLSPGDRVLAADDQGRLLYSDFLTFLDRDDGAKKVFYVIETREPRERLLLTAAHLLFVAPHNDS
ATGEPEASSGSGPPSGGALGPRALFASRVRPGQRVYVVAERDGDRRLLPAAVHSVTLSEEAAGAYAPLTA
QGTILINRVLASCYAVIEEHSWAHRAFAPFRLAHALLAALAPARTDRGGDSGGGDRGGGGGRVALTAPGA
ADAPGAGATAGIHWYSQLLYQIGTWLLDSEALHPLGMAVKSS
According to the Google Colab notebook:
The length of the protein is 462 amino acids.
The most common amino acid is: A, which appears 57 times.
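These statistics can be reproduced in a couple of lines (a sketch of what the Colab notebook presumably computes):

```python
from collections import Counter

shh = (
    "MLLLARCLLLVLVSSLLVCSGLACGPGRGFGKRRHPKKLTPLAYKQFIPNVAEKTLGASGRYEGKISRNS"
    "ERFKELTPNYNPDIIFKDEENTGADRLMTQRCKDKLNALAISVMNQWPGVKLRVTEGWDEDGHHSEESLH"
    "YEGRAVDITTSDRDRSKYGMLARLAVEAGFDWVYYESKAHIHCSVKAENSVAAKSGGCFPGSATVHLEQG"
    "GTKLVKDLSPGDRVLAADDQGRLLYSDFLTFLDRDDGAKKVFYVIETREPRERLLLTAAHLLFVAPHNDS"
    "ATGEPEASSGSGPPSGGALGPRALFASRVRPGQRVYVVAERDGDRRLLPAAVHSVTLSEEAAGAYAPLTA"
    "QGTILINRVLASCYAVIEEHSWAHRAFAPFRLAHALLAALAPARTDRGGDSGGGDRGGGGGRVALTAPGA"
    "ADAPGAGATAGIHWYSQLLYQIGTWLLDSEALHPLGMAVKSS"
)

print(len(shh))                      # 462
print(Counter(shh).most_common(1))   # [('A', 57)]
```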
Humans have three Hedgehog homologs: SHH, IHH and DHH. Across species, SHH is conserved. This protein belongs to the Hedgehog signaling protein family.
This particular structure was deposited on 2019-06-28 and released on 2019-11-13. It can be considered a good-quality structure, as the experiment type is X-ray diffraction with a resolution of 1.43 Å. This is a very small resolution value and means atomic positions are pretty well defined.
In this particular entry, there are also ions such as zinc and magnesium, as well as solvent molecules. This was found by clicking “Ligand Interactions.”
This protein belongs to the Hedgehog family of signaling proteins. SHH proteins have a conserved N-terminal signaling domain (Shh-N) found across animals. In structural classification systems (like SCOP, CATH), this domain is placed in a signaling/ligand family with a unique fold that binds Zn²⁺ and interacts with receptors such as Patched. So in structural classification terms, this is not a generic enzyme fold but rather a specific morphogen fold conserved among Hedgehog homologs.
I then used PyMOL to visualize my structure.
On the top, from left to right, the images are: ribbon, cartoon and ball-and-stick visualizations. On the bottom, the protein is colored by secondary structure, residue type, and the surface.
From what I can see, it definitely has more helices than sheets. I want to say all the residue types are roughly equally represented. I represented hydrophobic residues in orange and everything else (varying in charge) in other colors: blue for positive charge, red for negative charge, cyan for no charge. Beyond that, there are fewer hydrophobic residues than hydrophilic ones. Overall, there doesn’t seem to be a very deep binding pocket from any angle.
Part 3: Using ML-Based Protein Design Tools
Deep Mutational Scans
I chose to use model esm2_t6_8M_UR50D
The dark blue means that mutations at those positions are predicted to be strongly disfavored. Around the 400–410 mark, there is a dark blue column that is similar across all the different variations, meaning this location probably does not tolerate mutation; such conserved stretches are usually structurally or functionally constrained. I’m intrigued as to why this could be, and frankly I don’t know that I can come up with a more specific explanation for this region.
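For reference, here is a sketch of how wild-type-marginal mutation scores behind such a heatmap can be computed with the fair-esm package (this is one common scoring convention; the notebook’s exact method may differ):

```python
import torch
import esm

model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

seq = "MLLLARCLLL"  # truncated here; use the full SHH sequence in practice
_, _, tokens = batch_converter([("shh", seq)])
with torch.no_grad():
    log_probs = torch.log_softmax(model(tokens)["logits"], dim=-1)

pos = 5                          # 0-indexed position in seq
wt = alphabet.get_idx(seq[pos])
for aa in "ACDEFGHIKLMNPQRSTVWY":
    # log-likelihood ratio of mutant vs. wild type; strongly negative values
    # correspond to the dark-blue (disfavored) cells in the heatmap
    llr = log_probs[0, pos + 1, alphabet.get_idx(aa)] - log_probs[0, pos + 1, wt]
    print(aa, f"{llr.item():+.2f}")  # the +1 index skips the BOS token
```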
Latent Space Analysis
This is what the code produced.
My plot looks like a single dense blob with smooth gradients (especially along TSNE3). This suggests that there are no strongly separated protein families. Furthermore, TSNE3 captures continuous variation, possibly in protein length, compositional differences or evolutionary divergence. This looks more like a continuum than discrete groups.
The embedding does not show sharply separated clusters, suggesting that the proteins form a continuous similarity space rather than distinct families. My protein lies within a dense region of the embedding, indicating that it shares sequence-level similarity with many neighboring proteins. The local neighborhood likely contains proteins with related sequence motifs or structural features.
Protein Folding
I was able to find my protein. The first image shows the complete sequence, which seems to differ from the original structure; however, it is difficult to tell because they are oriented differently. When adding mutations (the second image), it seems that the protein is quite susceptible to mutational changes. I deleted a very large chunk, however, and it’s possible that deleting 20 amino acids affects the structure much less than deleting 50.
Week 5 HW: Protein Design Part II
Part 1: SOD1 Binder Peptide Design
Part A:
The retrieved SOD1 sequence is:
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Upon introducing the A4V mutation, we get:
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
The following peptides were generated with their perplexity scores:
The known binder FLYRWLPSRRGG was added for comparison.
Part B:
There was an apparent issue with the predictions, as the final letter was X, though AlphaFold does not accept this. To solve this issue, any X was replaced with an A:
The images are somewhat blurry, but it seems the peptides tend to be very surface-bound, typically near the β-barrel region. Interestingly, even the known peptide did not have a very high ipTM score, as there were still orange and yellow parts, which made me very curious. The ipTM scores were 0.38, 0.23, 0.38, 0.29, and 0.31 for the known binder and indices 0–3, respectively. The AlphaFold ipTM score is a key metric for evaluating the accuracy of protein-protein, protein-nucleic acid, or protein-ligand interaction models. It specifically measures the confidence in the relative positions and orientation of the interacting chains. Values above 0.8 are generally considered high-confidence predictions, while values below 0.6 usually indicate a failed prediction. Evidently, everything I had was a failed prediction, though interestingly, index 2 had the same score as the known binder.
Part C:
Using PeptiVerse, I got the following images:
These predictions pretty much align with what I actually saw in AlphaFold, particularly the weak binding. Between the known binder and indices 0, 1 and 3, the binding predictions were marginally different, which is surprising because the known binder had a higher ipTM score than those, and the same score as index 2. Index 2 also shows a relatively large jump in predicted affinity. Other than that, they were all predicted to be soluble and non-hemolytic.
If I had to choose a peptide to advance, I would pick index 2, only because on paper it looks slightly stronger than the other options, with a slightly higher ipTM and binding affinity score. But to be honest, if I were tasked with this, I would try to generate other peptides first, because these are all very poor options.
Part D:
To follow the homework, I had my target protein with the A4V mutation, and I chose 3 samples with a binder length of 12. I also checked off the boxes: Hemolysis, Affinity, Solubility and Motif. For the motif positions, I chose 2-7 because I wanted it to bind near the site 4 mutation.
The following peptides were generated:
EKLQCKKTFENQ
KVKQCGFTQGDE
STESGDTSYGTA
Unsurprisingly to me, these optimized peptides had weak predicted binding affinities according to PeptiVerse. It’s possible that I just did not pick a good site to bind to, but part of me feels that the tools for predicting peptide binders are not advanced enough to make significant predictions. They differ from my PepMLM peptides greatly, though, as there seems to be a different general composition and more variety in amino acids.
In theory, if they were worth advancing, I would make sure to observe their binding and activity first. Then, I would try to assess stability and pharmacokinetic properties; stability and half-life would be very useful measurements here. I think it would be worth evaluating almost everything, since to be used in clinical applications, peptides should be screened seriously so as not to interfere with the human body.
Part 3: Lab
To be honest, this lab was super confusing. I generated all of these things from the Google Colab, running it up until the “stop here” point. This is what was generated:
I identified the probable transmembrane helix by locating the longest hydrophobic stretch in the sequence, which spans approximately residues 40–62. Residues before this region were treated as soluble. Based on this topology estimate, I selected at least 2 mutations from the transmembrane region and at least 2 from the soluble region, prioritizing substitutions with high mutation scores and avoiding changes likely to disrupt conserved or experimentally sensitive residues.
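The hydrophobic-stretch search can be automated with a Kyte-Doolittle sliding window (a sketch; the 19-residue window and ~1.6 cutoff are the commonly used defaults for membrane spans):

```python
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def hydropathy(seq: str, window: int = 19):
    """Mean Kyte-Doolittle hydropathy for each window along the sequence."""
    return [(i + 1, sum(KD[aa] for aa in seq[i:i + window]) / window)
            for i in range(len(seq) - window + 1)]

# windows with mean hydropathy > ~1.6 suggest a transmembrane helix;
# on this protein the peak should fall around residues 40-62
```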
A good set of five mutations would be:
S9Q (soluble region) – Position 9 is in the N-terminal soluble region. The mutation S→Q had a high computational score and similar polarity, so it likely preserves structure while allowing variation.
C29R (soluble region) – Position 29 is also in the soluble domain and appears multiple times in the mutation-score ranking. Substituting arginine introduces a charged residue that may improve solubility while remaining tolerated experimentally.
Y39L (soluble / boundary region) – Position 39 sits right before the transmembrane helix. The mutation to leucine had a strong score and may stabilize the transition into the hydrophobic helix because leucine is hydrophobic.
K50L (transmembrane region) – Position 50 lies inside the predicted transmembrane helix. Replacing lysine (charged) with leucine (hydrophobic) should better match the membrane environment. It also had the highest LLR score, suggesting it is strongly favored.
N53L (transmembrane region) – Position 53 is also inside the hydrophobic helix. Substituting leucine increases hydrophobicity and is consistent with residues typically found in membrane-spanning helices.
I selected mutations with high predicted scores, avoided stop mutations or positions that eliminated protein expression in the experimental dataset, and ensured the substitutions were biophysically reasonable for their regions. For soluble regions I chose substitutions that maintain polarity or introduce tolerated charges, while for the transmembrane helix I favored hydrophobic substitutions that stabilize membrane insertion.
Week 6 HW: Genetic Circuits Part I
Part 1: Homework Questions
Phusion High-Fidelity PCR Master Mix contains several components required for DNA amplification. The Phusion DNA polymerase is the enzyme that synthesizes new DNA strands; it has proofreading (3’→5’ exonuclease) activity, which greatly reduces mutation rates compared with standard Taq polymerase. The mix also includes dNTPs (deoxynucleotide triphosphates), which are the nucleotide building blocks incorporated into the newly synthesized DNA. A reaction buffer provides the correct chemical environment (pH, salts, stabilizers) to maximize enzyme activity and fidelity. Mg²⁺ ions (usually from MgCl₂) act as essential cofactors for polymerase function and influence enzyme efficiency.
Primer annealing temperature is mainly determined by the melting temperature (Tm) of the primers, which depends on their sequence composition. Primers with higher GC content generally have higher Tm because G–C pairs form three hydrogen bonds compared with two in A–T pairs. Primer length also affects Tm, as longer primers form more stable duplexes with the template. Salt concentration and Mg²⁺ levels in the reaction buffer influence DNA duplex stability and therefore the optimal annealing temperature. Additionally, primer secondary structure or mismatches can reduce binding stability and may require lower annealing temperatures.
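For short primers, a first-pass Tm estimate can come from the Wallace rule (2 °C per A/T, 4 °C per G/C); real Phusion workflows typically use nearest-neighbor calculators instead:

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm (Wallace rule), reasonable only for ~14-20 nt primers."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

print(wallace_tm("ATGCGCTAGCTAGGCT"))  # 50 — GC-rich primers score higher
```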
PCR and restriction enzyme digestion both produce linear DNA fragments but operate through different mechanisms. PCR uses primers and a DNA polymerase to amplify a specific DNA region from a template through thermal cycling. This method is highly flexible because primers can introduce mutations, overhangs, or homologous regions, making PCR useful when generating fragments for cloning or modifying sequences. In contrast, restriction enzyme digestion uses enzymes that recognize specific short DNA sequences and cut at those sites, producing predictable fragments with defined ends (often sticky or blunt). The digest protocol is simpler and faster if the required restriction sites already exist in the DNA. PCR is preferable when amplifying small regions, adding sequences, or working from low DNA amounts, while restriction digests are preferable when cutting large plasmids or isolating fragments with existing restriction sites without introducing polymerase errors.
For Gibson Assembly, DNA fragments must have overlapping homologous sequences at their ends so they can anneal during the assembly reaction. These overlaps are usually designed into PCR primers, ensuring that adjacent fragments share complementary sequences. After PCR amplification or digestion, the fragments should be checked by gel electrophoresis to confirm the correct size and purity. It is also important to verify that the overlaps match the intended assembly order and that no incompatible restriction sites remain within the overlaps.
Golden Gate Assembly is a DNA cloning method that uses Type IIS restriction enzymes and DNA ligase in a single reaction to assemble multiple DNA fragments in a defined order. Unlike standard restriction enzymes, Type IIS enzymes cut outside their recognition site, producing custom overhangs that can be designed to be unique for each fragment. During the reaction, the restriction enzyme cuts the DNA to create compatible overhangs, and DNA ligase simultaneously joins the fragments together. Because the recognition sites are removed after ligation, the final assembled product cannot be cut again, allowing the reaction to proceed efficiently toward the correct construct.
To model this, I just chose two random sequences (LACMG that we worked with once, and a gibberish one that I have for some reason). I went to Benchling to model the assembly, and this is what came out of it:
Hopefully the steps I followed were right, because I just followed the template I was provided through Benchling.
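Separately, a quick way to sanity-check Type IIS sites before a Golden Gate design is Biopython’s restriction module (the fragment below is a made-up sequence carrying two flanking BsaI sites):

```python
from Bio.Restriction import BsaI
from Bio.Seq import Seq

# BsaI recognizes GGTCTC and cuts outside it, leaving a 4-nt overhang
fragment = Seq("AAAGGTCTCAGGAGATGACCATGATTACGGATTCACTGAGACCAAA")
print(BsaI.search(fragment))   # cut positions on the sense strand
```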
Part 2: Kernel
First, I just messed around with the Bacterial Demos Repository and clicked on random parts to see what the platform felt like. I then went to the J23117 promoter and viewed what was going on there. I used the simulate button to simulate the process at a random timestep and duration of my choosing, with E. coli as the chassis. Here is a screenshot of my results:
Then, I created a repressilator, with a comparison below to the one located in Bacterial Demos:
I’m fairly confident that I did mine correctly as the simulations look quite similar, and the overall set-up contains similar components.
Afterwards, I just put together some random constructs after fumbling around Kernel forever. Not sure if there were guidelines but I thought they would be cool.
Predictions here:
I think the simulations overall match my predictions fairly well. Not very much to comment on either!
Week 7 HW: Genetic Circuits Part II
Part 1: Homework Questions
IANNs allow cells to perform analog, weighted decision-making rather than simple binary logic. Traditional genetic circuits usually implement Boolean gates, where inputs are treated as on/off signals and outputs are discrete. In contrast, IANNs use components whose activities can vary continuously, allowing inputs to contribute different weights to a final output. This allows cells to integrate multiple signals simultaneously, filter noise and produce graded responses. IANNs overall can scale more easily to complex behaviors, making them better suited for biological environments with continuous, noisy signals.
A useful application of an IANN would be a smart probiotic diagnostic cell that detects complex disease states in the gut. Inputs: The circuit could receive several molecular signals associated with inflammation, such as nitric oxide levels, reactive oxygen species or other responsive promoters. Each input drives production of regulators that act with different weights on the expression of a reporter gene.
Processing: Each regulator modifies the stability or translation of the reporter mRNA. If the combined signal exceeds a threshold, the cell expresses a fluorescent protein or therapeutic molecule. This allows the cell to classify complex physiological states, rather than triggering on a single biomarker that might fluctuate naturally. Output: Low combined signal → little or no reporter expression. Moderate signal → weak expression. High combined signal → strong reporter or drug release.
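A toy numerical model of this weighted, graded behavior (the weights, Hill parameters, and input levels are invented for illustration):

```python
import numpy as np

def hill(x: float, k: float = 0.5, n: int = 4) -> float:
    """Graded, saturating promoter response (Hill function)."""
    return x**n / (k**n + x**n)

weights = np.array([0.5, 0.3, 0.2])  # assumed weights for NO, ROS, third marker

for label, signals in [("healthy", [0.1, 0.1, 0.0]),
                       ("mild",    [0.5, 0.4, 0.2]),
                       ("flare",   [0.9, 0.8, 0.7])]:
    drive = weights @ np.array(signals)   # weighted sum of inputs
    print(f"{label:8s} reporter = {hill(drive):.2f}")
# healthy ~0.00, mild ~0.31, flare ~0.88 — a graded classifier, not a Boolean gate
```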
Limitations: There are several constraints that could limit implementation. Gene expression fluctuations can distort weights and thresholds, making outputs inconsistent. Promoters and translation systems may saturate, preventing precise analog weighting. Large networks can slow cell growth or destabilize circuits, and tuning the weights requires iterative experimental optimization.
Part 2: Fungal Materials
Several commercial materials are made from fungal mycelium. One example is mycelium-based packaging produced by Ecovative, which grows fungal mycelium through agricultural waste to create molded protective packaging that replaces polystyrene foam. Mycelium composites are also used for insulation panels and structural building materials, such as mycelium bricks and boards that can be grown into shape. Another emerging product is mycelium leather, developed by companies like MycoWorks and Bolt Threads, which produces flexible sheet materials that mimic animal leather for fashion products.

These fungal materials offer several advantages over traditional materials. They are renewable and biodegradable, can be grown from agricultural waste, and require much lower energy input than plastics or synthetic foams. Mycelium materials can also be grown directly into molds, reducing manufacturing steps and waste. However, they also have disadvantages: mechanical strength and durability are generally lower than plastics or synthetic composites, they can be sensitive to moisture, and scaling production with consistent material properties remains challenging.
One useful direction would be engineering fungi to produce stronger or more functional mycelium materials. For example, genes could be modified to increase chitin or glucan crosslinking in the cell wall to improve stiffness and toughness of mycelium composites used in construction or packaging. Fungi could also be engineered to produce functional biomaterials, such as mycelium that incorporates conductive proteins for bioelectronics or that secretes adhesives or antimicrobial compounds. Another application could be fungi engineered to capture pollutants, such as heavy metals or microplastics, allowing grown fungal materials to act as environmental filtration systems.

Fungi offer several advantages as engineering hosts compared with bacteria. Because fungi are eukaryotes, they perform complex post-translational modifications and protein folding, which are necessary for many enzymes and biomaterials that bacteria cannot produce efficiently. Filamentous fungi naturally grow large structural networks (mycelium), allowing them to form macroscopic materials without external scaffolds, something bacteria generally cannot do. Fungi also secrete large amounts of enzymes and proteins, making them good platforms for producing extracellular biomolecules or structural polymers.

However, fungi are generally harder to genetically manipulate than bacteria: transformation efficiencies are lower, genetic tools are less standardized, and growth is typically slower.