<NAFISAT ABDULLAHI> — HTGAA Spring 2026


About me 🦠🧬

  • Let food be thy medicine and medicine be thy food! - Hippocrates
  • I am interested in how crypto, blockchain technology and decentralized autonomous organizations (DAOs) can be used to accelerate science.
  • I have a Masters in Biochemistry, with a focus on Nutritional and Industrial Biochemistry.
  • I am exploring synthetic biology and protein engineering for food applications.

Contact info

Homework

Labs

Projects


Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices
  • Week 2 HW: DNA Read, Write & Edit
  • Week 3 HW: Lab Automation
  • Week 4 HW: Protein Design I


Week 1 HW: Principles and Practices

Heaps of cassava peels
  1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

I want to develop a biological engineering application that uses directed evolution to develop super-enzyme(s) capable of converting cassava peels, a rich, low-cost source of lignocellulosic waste, into prebiotic oligosaccharides. Cassava peels contain xylan (a type of hemicellulose) that, when broken down through alkali pretreatment and enzymatic hydrolysis, yields xylooligosaccharides (XOS). XOS act as prebiotics by promoting the growth of beneficial gut bacteria and can be used in food and nutraceutical applications.

Nigeria is the world’s largest producer of cassava. For every ton of cassava processed, about 10-15% of the total weight is lost in the form of wet peels. Over 95% of these peels are currently wasted, often discarded in open dumpsites where they rot, produce methane, or are burnt, representing a missed opportunity for value creation within local food systems. At the same time, there is a growing global demand for prebiotic oligosaccharides, due to their role in supporting gut health, metabolic health, and immune function. Despite high local demand, Nigeria imports most functional food ingredients, including prebiotics.

  2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.

The overarching governance goal is to ensure that enzyme-based agricultural biomass conversion technologies advance sustainable development, public health, and economic inclusion without contributing to biosafety risks, environmental harm, or extractive IP practices.

Goal 1 - Biosafety and Containment: to prevent harm while recognizing limited biosafety infrastructure.

Sub-goals

  • To ensure safe handling of engineered enzymes and host organisms in laboratories and pilot facilities.
  • To avoid environmental contamination from enzyme residues or production organisms.

Goal 2 - Ethical and Regulatory Accountability: to strengthen trust and legitimacy through oversight.

Sub-goals

  • To align research and deployment with Nigerian biosafety laws and food safety regulations.
  • To prevent diversion into environmentally harmful or monopolistic practices.
  • To ensure that farming communities/processors generating cassava waste are not excluded from decision-making or value creation.

Goal 3 - Industry Standards: to prevent unsafe or environmentally damaging scale-ups.

Sub-goals

  • To establish minimum quality and safety standards for enzyme-produced prebiotics
  • To prevent pollution of water bodies near cassava processing zones.

Goal 4 - Equitable IP, Data Sharing and Value Distribution: to avoid repeating extractive biotechnology models.

Sub-goals

  • To enable local researchers and stakeholders to co-own innovations.

  3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).

Action 1 - Biosafety by Design

Purpose

Nigeria’s biosafety system, overseen by the National Biosafety Management Agency (NBMA), is primarily focused on genetically modified crops, not on enzyme engineering or cell-free biocatalysis. I propose enzyme-specific biosafety guidance tailored to the Nigerian context and its industrial conditions.

Design

  • Actors: NBMA, universities, research institutes (FIIRO, IITA)
  • Key elements:
    • Preferential use of cell-free enzyme systems or GRAS microbial hosts
    • Mandatory enzyme inactivation steps before waste disposal
    • Tiered containment standards matching laboratory capacities
    • Regulators periodically audit and validate compliance.

Assumptions:

  • Enzymes pose lower risk than living GMOs but still require oversight
  • Clear, practical guidelines increase compliance more than imported standards

Risks of Failure & “Success”:

  • Failure - Informal or unregulated adoption bypassing safety constraints
  • Success - Excessively conservative safety requirements may slow adoption or raise costs for low-resource users.

Action 2 - Bioeconomy Ethics and Community Oversight Boards

Purpose

To embed ethical review beyond technical biosafety, particularly in cassava-producing communities.

Design

  • Actors: Federal Ministry of Science, Technology & Innovation (FMSTI), universities, local governments, farmer associations.
  • Mechanism:
    • Establish regional Bioeconomy Ethics Boards modelled after IRBs (Institutional Review Boards)
    • Require community consultation and benefit-sharing plans for pilot processing plants.
    • Integrate social scientists and local representatives.

Assumptions:

  • Community inclusion reduces resistance and improves sustainability
  • Ethical review can be localized without excessive bureaucracy.

Risks of Failure & “Success”:

  • Failure - Boards lack authority or funding
  • Success - Political capture or elite dominance in decision-making

Action 3 - Blockchain-Enabled, DAO-Based Governance

Purpose

To manage enzyme IP, processing data, and revenue flows in a transparent and participatory way.

Design:

  • Actors: Researchers, startups, cassava cooperatives, fintech partners.
  • DAO features
    • On-chain registration of enzyme variants and performance data
    • Smart contracts governing licensing to food manufacturers and exporters.
    • Revenue-sharing mechanisms allocating tokens to researchers, processing bodies, and community development funds.
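
The revenue-sharing idea can be made concrete with a small allocation rule. The Python sketch below is not a smart contract (an on-chain version would be written in a contract language and deployed on a specific blockchain); it only illustrates one possible rule, proportional splitting by the largest-remainder method, so that whole tokens are allocated and none are created or lost. The stakeholder names and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stakeholder:
    name: str    # e.g. "researchers" (hypothetical)
    weight: int  # relative share; hypothetical, e.g. set by a DAO vote


def split_revenue(total_tokens: int, stakeholders: list[Stakeholder]) -> dict[str, int]:
    """Split `total_tokens` in proportion to each stakeholder's weight.

    Largest-remainder method: everyone first gets the floor of their exact
    share, then any leftover tokens go to the largest fractional remainders.
    The result always sums exactly to `total_tokens`.
    """
    total_weight = sum(s.weight for s in stakeholders)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    allocation: dict[str, int] = {}
    remainders: list[tuple[int, str]] = []
    for s in stakeholders:
        share, remainder = divmod(total_tokens * s.weight, total_weight)
        allocation[s.name] = share
        remainders.append((remainder, s.name))

    leftover = total_tokens - sum(allocation.values())
    # Largest remainder first; ties broken alphabetically so the result is deterministic.
    for _, name in sorted(remainders, key=lambda r: (-r[0], r[1]))[:leftover]:
        allocation[name] += 1
    return allocation


if __name__ == "__main__":
    parties = [
        Stakeholder("researchers", 40),
        Stakeholder("processors", 35),
        Stakeholder("community fund", 25),
    ]
    print(split_revenue(1000, parties))
```

With the hypothetical 40/35/25 weights above, 1000 tokens split into 400, 350, and 250. The largest-remainder rule was chosen over rounding each share independently, because independent rounding can hand out one token too many or too few.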

Assumptions:

  • Stakeholders have, or can acquire, enough technical knowledge about DAOs to participate
  • Nigerian fintech adoption lowers barriers to blockchain governance
  • That DAO-based governance can meaningfully represent diverse stakeholders.
  • DAOs can complement, not replace, national IP law.

Risks of Failure & “Success”:

  • Failure - Digital exclusion of rural actors
  • Success - Speculation overwhelms productive governance

  4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.

| Governance Action | A. Biosafety Effectiveness | B. Institutional Fit | C. Env. & Community Protection | D. Equity & Value Retention | E. Scalability Without Harm |
|---|---|---|---|---|---|
| Action 1: Biosafety by Design | 1 | 1 | 2 | 3 | 1 |
| Action 2: Bioeconomy Ethics & Community Boards | 2 | 2 | 1 | 1 | 2 |
| Action 3: DAO Governance | 2 | 3 | 2 | 1 | 3 |

  5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

I would prioritize a layered governance approach that combines Action 1 and Action 2 as the foundational governance infrastructure, while treating Action 3 as a longer-term experimental complement.

Biosafety by Design directly reduces the likelihood of environmental or laboratory incidents without relying heavily on enforcement capacity. However, it scores weakly on equity and value retention and does very little to address how value flows through the system. Ethics and Community Boards, by contrast, address ethical risks that biosafety frameworks often ignore. They create social legitimacy and accountability, which are essential for long-term sustainability. DAO-based governance, on the other hand, has high transformative potential, but it is also early-stage and highly fragile. While Nigeria’s fintech adoption makes blockchain-based coordination plausible, the risks of digital exclusion, speculation, and legal ambiguity mean that DAO governance should not be the backbone of early-stage policy development.
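
One way to read the scoring table is to total each action’s scores (lower is better) and to score a layered combination by taking the best score any member achieves on each criterion. The Python sketch below does both. Treating all five criteria as equally important, and assuming a combination performs as well as its best component on every criterion, are simplifying assumptions, not part of the rubric itself.

```python
# Scores copied from the rubric table: 1 = best, 3 = worst.
# Criteria order: A. Biosafety, B. Institutional fit, C. Env. & community protection,
# D. Equity & value retention, E. Scalability without harm.
SCORES = {
    "Action 1": [1, 1, 2, 3, 1],  # Biosafety by Design
    "Action 2": [2, 2, 1, 1, 2],  # Bioeconomy Ethics & Community Boards
    "Action 3": [2, 3, 2, 1, 3],  # DAO Governance
}


def total(scores: list[int]) -> int:
    """Unweighted total across criteria (assumption: all criteria equally important)."""
    return sum(scores)


def layered(*actions: str) -> list[int]:
    """Score a combination as the best (lowest) score any member achieves on each criterion.

    This is an optimistic assumption: it presumes the actions do not interfere.
    """
    return [min(column) for column in zip(*(SCORES[a] for a in actions))]


if __name__ == "__main__":
    for action, scores in SCORES.items():
        print(action, total(scores))
    combo = layered("Action 1", "Action 2")
    print("Actions 1 + 2:", combo, total(combo))
```

Under these assumptions, Actions 1 and 2 tie at 8, Action 3 totals 11, and combining Actions 1 and 2 scores 1 on every criterion, which is consistent with the layered approach recommended above.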

Trade-Offs Considered

  • Speed vs. legitimacy: Action 1 enables rapid, safe deployment; Action 2 slows processes but builds trust and social license.
  • Technical control vs. distributive justice: Biosafety-by-design controls risk but does not redistribute value.
  • Innovation vs. institutional readiness: DAO governance promises equity but currently outpaces regulatory and social readiness.

Assumptions and Uncertainties

  • Assumptions include:
    • That Nigeria’s regulatory agencies can adapt existing biosafety frameworks to enzyme engineering without major reform.
    • That community governance structures can be resourced and insulated from political capture
    • That DAO-based governance can eventually interoperate with national IP law rather than conflict with it.
  • Uncertainties include:
    • How quickly informal enzyme production might scale beyond regulatory reach
    • Whether community boards will have meaningful enforcement power
    • Whether blockchain governance will remain accessible or become exclusionary over time

Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues. This should be included on your class page for this week.

One ethical concern that arose was the possibility of malicious actors using biotechnological innovation for harm rather than good. The idea was not new to me (I had seen it in movies), but it was interesting to learn that it could happen in real life as well. In the Nigerian context, governance must be robust enough to prevent harm, flexible enough to scale, and intentional enough to redistribute value.

Use of AI Tools Disclosure: AI tools (ChatGPT and Gemini) were used to refine wording, improve structure, and adapt the context to Nigeria’s governance landscape. The core ideas and ethical framing were developed independently.

References

  1. Kumar, R., Næss, G., & Sørensen, M. (2024). Xylooligosaccharides from lignocellulosic biomass and their applications as nutraceuticals: a review on their production, purification, and characterization. Journal of the science of food and agriculture, 104(13), 7765–7775. https://doi.org/10.1002/jsfa.13523

Assignment (Week 2 Lecture Prep)

Homework Questions from Professor Jacobson

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Answer: Polymerase has an error rate of approximately 1 in 1,000,000 (1:10⁶). This is significantly more accurate than standard chemical synthesis methods, which have an error rate of about 1 in 100. It is still high, however, compared with the length of the human genome, which consists of billions of base pairs (between 10⁹ and 10¹⁰). If the polymerase simply copied at a rate of one error per million bases, every replication of the human genome would introduce thousands of errors (roughly 3,000 errors per copy). Biology deals with this discrepancy through proofreading and repair. DNA polymerase has built-in 3’→5’ exonuclease activity, which allows the enzyme to detect when an incorrect nucleotide has been added, pause to remove it, and then resume extending the strand with the correct base. (Some polymerases, such as E. coli DNA polymerase I, also have a 5’→3’ exonuclease activity, which mainly removes primers and damaged DNA ahead of the enzyme.) Additionally, mismatch-repair proteins (such as MutS, MutL, and MutH) identify and correct mismatches that escape the polymerase’s initial proofreading.
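
The error-count estimate in this answer is a single multiplication: expected errors = genome length × error rate. A minimal Python sketch, assuming a genome of about 3 × 10⁹ bp (the size implied by "roughly 3,000 errors per copy") and writing each error rate as "one error per N bases" so the arithmetic stays exact:

```python
# Assumption: ~3 x 10^9 bp, the genome size implied by "~3,000 errors per copy".
HUMAN_GENOME_BP = 3_000_000_000


def expected_errors(length_bp: int, bases_per_error: int) -> float:
    """Expected number of errors when copying `length_bp` bases at 1 error per `bases_per_error` bases."""
    return length_bp / bases_per_error


if __name__ == "__main__":
    # The two error rates quoted in the answer above.
    for label, bases_per_error in [("polymerase (1 in 10^6)", 1_000_000),
                                   ("chemical synthesis (1 in 100)", 100)]:
        errors = expected_errors(HUMAN_GENOME_BP, bases_per_error)
        print(f"{label}: {errors:,.0f} errors per copy")
```

This reproduces the ~3,000 errors per copy for polymerase alone and shows that copying a genome at chemical-synthesis accuracy would introduce about 30 million errors.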

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Answer: The number of ways to code for a single protein is astronomical because of the redundancy of the genetic code (multiple codons can specify the same amino acid). An average human protein is encoded by approximately 1036 base pairs of DNA, which translates to a chain of roughly 345 amino acids. Since most amino acids have several synonymous codons, a protein of this length has an enormous number of potential DNA coding sequences: roughly 3³⁴⁵ ≈ 10¹⁶⁴, far more than the estimated 10⁸⁰ atoms in the observable universe. In practice, most of these sequences don’t work, for several physical and biological reasons:

  • RNA cleavage and degradation - The specific nucleotide sequence can create targets for cellular enzymes that destroy RNA.
  • Secondary structure - The DNA sequence determines how the resulting mRNA folds. Synonymous sequences (which code for the same protein) can fold into very different minimum free energy (MFE) secondary structures. If an mRNA folds into a tight structure that blocks the ribosome, or is unstable, translation can fail.
  • Biological function and context - Research into Genomically Recoded Organisms demonstrates that altering codon usage (recoding) is not neutral; it can affect biological functions such as viral resistance and cell doubling times. This indicates that the “synonymous” codes interact differently with the cell’s machinery (e.g., tRNA availability or translation speed).
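
The 3³⁴⁵ estimate assumes about three codons per amino acid. The exact count for a given protein is the product of the number of synonymous codons for each residue. A minimal Python sketch that derives those counts from the standard genetic code (written in the common compact TCAG-ordered form) and multiplies them; the choice of stop codon is ignored:

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code in compact form: codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
# ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# Number of synonymous codons for each of the 20 amino acids (stops excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def count_encodings(protein: str) -> int:
    """Exact number of DNA sequences encoding `protein` (choice of stop codon ignored).

    Unknown letters have a count of 0, so they make the product 0.
    """
    return math.prod(DEGENERACY[aa] for aa in protein.upper())


if __name__ == "__main__":
    print(sorted(DEGENERACY.items()))
    print(count_encodings("MNDTQDFISA"))
    # Order of magnitude of the 3^345 estimate used above.
    print(f"3^345 is about 10^{345 * math.log10(3):.1f}")
```

For example, the first ten residues of GeDH (MNDTQDFISA, from the UniProt sequence on this page) already have 9,216 possible encodings; for a full-length sequence, `len(str(count_encodings(seq))) - 1` gives the order of magnitude.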

Homework Questions from Dr. LeProust

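What’s the most commonly used method for oligo synthesis currently? Answer: Solid-phase phosphoramidite chemistry, in which nucleotides are added one at a time through repeated cycles of deblocking, coupling, capping, and oxidation. Array-based platforms, such as the electrochemical microarray developed by CombiMatrix in 2005, apply this chemistry to synthesize many oligos in parallel.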

Why is it difficult to make oligos longer than 200nt via direct synthesis? Answer: The core problem is error accumulation from imperfect stepwise yields. Each coupling step succeeds slightly less than 100% of the time, so yield losses compound, truncated products come to dominate, and chemical damage (such as depurination) accumulates. Beyond 150-200 nt, the fraction of correct, full-length, error-free oligos becomes impractically low and expensive to purify.

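Why can’t you make a 2000bp gene via direct oligo synthesis? Answer: For essentially the same reason as above, only far worse: with about ten times as many coupling steps, the full-length product is a vanishingly small fraction of the mixture, and separating a 2000 nt full-length gene from thousands of near-length failures is practically impossible.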
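
The compounding-yield argument can be made quantitative: if every coupling step succeeds with probability p, the fraction of strands that are full length after n - 1 couplings is p^(n-1). A Python sketch, assuming a 99.5% coupling efficiency (an illustrative value, not a figure from the lecture) and ignoring side reactions such as depurination:

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands that are full length after synthesis.

    Assumes an n-nt oligo needs n - 1 coupling steps, each succeeding
    independently with probability `coupling_efficiency`; side reactions
    such as depurination are ignored.
    """
    if length_nt < 1:
        raise ValueError("length must be at least 1 nt")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling efficiency must be between 0 and 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    for length in (60, 200, 2000):
        # 0.995 is an illustrative coupling efficiency, not a measured value.
        frac = full_length_fraction(length, 0.995)
        print(f"{length:>5} nt: {frac:.2e} of strands are full length")
```

At 99.5% per step this gives roughly 37% full-length product for a 200-mer but well under 0.01% for a 2000-mer, which is why long genes are instead assembled from shorter oligos.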

Homework Question from George Church

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 amino acids generally considered essential for animals (meaning they must be obtained through diet) are arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. The “Lysine Contingency” (from the movie Jurassic Park) was a biocontainment measure in which the dinosaurs were supposedly genetically modified to be unable to produce lysine, making them dependent on supplemental lysine to survive. The idea is flawed from a biological standpoint: because lysine is an essential amino acid, all animals are already lysine-dependent, and, like any animal, an escaped dinosaur could obtain lysine from its food. The contingency was therefore either unnecessary or based on a deep misunderstanding of basic biology.

Week 2 HW: DNA Read, Write & Edit

Part 1: Benchling & In-silico Gel Art

First, I created an account on Benchling.

Then, I imported the complete genome of Escherichia phage lambda (GenBank ID: J02459.1).

Fig 1: Linear map of lambda DNA (LAMCG) imported into Benchling.

Next, I simulated restriction enzyme digestion with the following enzymes

  • EcoRI
  • HindIII
  • BamHI
  • KpnI
  • EcoRV
  • SacI
  • SalI

Fig 2: Simulating restriction enzyme digestion of lambda DNA on Benchling.

I then created the following design pattern by simulating single, double, and triple restriction enzyme digestions.

Fig 3: Design pattern generated by simulated restriction enzyme digestions of lambda DNA on Benchling
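
The in-silico digests can be reproduced in a few lines: find each enzyme’s recognition site, cut at the enzyme’s cut position, and measure the fragments between cuts. This Python sketch assumes a linear molecule (lambda’s cos ends and sticky-end overhangs are ignored) and uses the recognition sequences and top-strand cut positions of the seven enzymes listed above; because all seven sites are palindromic, searching one strand is enough. It is not Benchling’s own algorithm.

```python
# Recognition site and top-strand cut offset (bases from the start of the
# site) for the seven enzymes used above; '^' marks the cut.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "EcoRV": ("GATATC", 3),    # GAT^ATC (blunt)
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "SalI": ("GTCGAC", 1),     # G^TCGAC
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Top-strand cut positions. All seven sites are palindromic, so one strand suffices."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)  # +1 also catches overlapping sites
    return cuts


def digest(seq: str, enzymes: list[str]) -> list[int]:
    """Fragment lengths (bp) from cutting a linear molecule with all `enzymes` at once."""
    cuts = sorted({pos for enzyme in enzymes for pos in cut_positions(seq, enzyme)})
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAAGAATTCAAAGGATCCAAA"  # hypothetical 21 bp test sequence
    print(digest(toy, ["EcoRI"]))           # single digest: [4, 17]
    print(digest(toy, ["EcoRI", "BamHI"]))  # double digest: [4, 9, 8]
```

To digest lambda, one could paste the J02459.1 sequence into a string (for example a hypothetical variable `lambda_seq`) and call `digest(lambda_seq, ["EcoRI", "HindIII"])` for a double digest, then compare the fragment lengths with the simulated gel lanes.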


Part 2: Gel Art - Restriction Digests and Gel Electrophoresis


Part 3: DNA Design Challenge

3.1. Choose your protein.

One of my ideas for the final project is designing a biosensor-coupled aroma production system in which a detectable scent serves as the output signal of a genetic circuit. Specifically, the project focuses on engineering the biosynthesis of citral, a fragrant acyclic monoterpene aldehyde that is the primary component responsible for the characteristic lemon scent of Cymbopogon citratus (lemongrass).

The protein of interest is geraniol dehydrogenase (GeDH), an NAD⁺-dependent oxidoreductase that catalyzes the oxidation of geraniol to geranial, one of the two isomers (with neral) that make up citral.

Why GeDH?

This specific protein was chosen for several reasons:

  1. It directly produces the compound responsible for the desired aromatic output, making it a critical component of the biosensor system.
  2. Structurally, GeDH belongs to the medium-chain dehydrogenase/reductase family and functions in the cytosol without requiring complex post-translational modifications such as glycosylation. This makes it well-suited for heterologous expression.
  3. In a biosensor-coupled system, placing the gene encoding GeDH under a stimulus-responsive promoter ensures that citral is only produced when the target signal is detected. Because GeDH catalyzes the final step of the pathway, it is a natural reporter enzyme for scent-based detection.

Using UniProt, I obtained the protein sequence of GeDH from Castellaniella defragrans, a Gram-negative, motile, denitrifying bacterium that can degrade monoterpenes under anaerobic conditions (Lüddeke et al., 2012).

>sp|H1ZV38|GEOA_CASD6 Geraniol dehydrogenase OS=Castellaniella defragrans (strain DSM 12143 / CCUG 39792 / 65Phen) OX=1437824 GN=geoA PE=1 SV=1
MNDTQDFISAQAAVLRQVGGPLAVEPVRISMPKGDEVLIRIAGVGVCHTDLVCRDGFPVP
LPIVLGHEGSGTVEAVGEQVRTLKPGDRVVLSFNSCGHCGNCHDGHPSNCLQMLPLNFGG
AQRVDGGQVLDGAGHPVQSMFFGQSSFGTHAVAREINAVKVGDDLPLELLGPLGCGIQTG
AGAAINSLGIGPGQSLAIFGGGGVGLSALLGARAVGADRVVVIEPNAARRALALELGASH
ALDPHAEGDLVAAIKAATGGGATHSLDTTGLPPVIGSAIACTLPGGTVGMVGLPAPDAPV
PATLLDLLSKSVTLRPITEGDADPQRFIPRMLDFHRAGKFPFDRLITRYRFDQINEALHA
TEKGEAIKPVLVF

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

I used the NovoPro Reverse Translation tool ➡️ https://www.novoprolabs.com/tools/translate-protein-to-dna to reverse-translate the GeoA protein sequence from Castellaniella defragrans into DNA.

>Reverse Translated sequence of GeoA gene from Castellaniella defragrans
ATGAAYGAYACNCARGAYTTYATHWSNGCNCARGCNGCNGTNYTNMGNCARGTNGGNGGNCCNYTNGCNGTNGARCCNGTNMGNATHWSNATGCCNAARGGNGAYGARGTNYTNATHMGNATHGCNGGNGTNGGNGTNTGYCAYACNGAYYTNGTNTGYMGNGAYGGNTTYCCNGTNCCNYTNCCNATHGTNYTNGGNCAYGARGGNWSNGGNACNGTNGARGCNGTNGGNGARCARGTNMGNACNYTNAARCCNGGNGAYMGNGTNGTNYTNWSNTTYAAYWSNTGYGGNCAYTGYGGNAAYTGYCAYGAYGGNCAYCCNWSNAAYTGYYTNCARATGYTNCCNYTNAAYTTYGGNGGNGCNCARMGNGTNGAYGGNGGNCARGTNYTNGAYGGNGCNGGNCAYCCNGTNCARWSNATGTTYTTYGGNCARWSNWSNTTYGGNACNCAYGCNGTNGCNMGNGARATHAAYGCNGTNAARGTNGGNGAYGAYYTNCCNYTNGARYTNYTNGGNCCNYTNGGNTGYGGNATHCARACNGGNGCNGGNGCNGCNATHAAYWSNYTNGGNATHGGNCCNGGNCARWSNYTNGCNATHTTYGGNGGNGGNGGNGTNGGNYTNWSNGCNYTNYTNGGNGCNMGNGCNGTNGGNGCNGAYMGNGTNGTNGTNATHGARCCNAAYGCNGCNMGNMGNGCNYTNGCNYTNGARYTNGGNGCNWSNCAYGCNYTNGAYCCNCAYGCNGARGGNGAYYTNGTNGCNGCNATHAARGCNGCNACNGGNGGNGGNGCNACNCAYWSNYTNGAYACNACNGGNYTNCCNCCNGTNATHGGNWSNGCNATHGCNTGYACNYTNCCNGGNGGNACNGTNGGNATGGTNGGNYTNCCNGCNCCNGAYGCNCCNGTNCCNGCNACNYTNYTNGAYYTNYTNWSNAARWSNGTNACNYTNMGNCCNATHACNGARGGNGAYGCNGAYCCNCARMGNTTYATHCCNMGNATGYTNGAYTTYCAYMGNGCNGGNAARTTYCCNTTYGAYMGNYTNATHACNMGNTAYMGNTTYGAYCARATHAAYGARGCNYTNCAYGCNACNGARAARGGNGARGCNATHAARCCNGTNYTNGTNTTY
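
The reverse-translated sequence is written in IUPAC degeneracy codes (N = any base, R = A/G, Y = C/T, W = A/T, S = C/G, M = A/C, H = A/C/T, and so on), so it describes many possible DNA sequences at once. One subtlety is that leucine, serine, and arginine each have six codons spread over two codon families, so a single degenerate codon cannot cover them exactly. The Python sketch below expands a degenerate codon and lists the amino acids its expansions encode; function names are illustrative.

```python
from itertools import product

# IUPAC nucleotide codes -> the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon: str) -> list[str]:
    """All concrete codons matched by a (possibly degenerate) IUPAC codon."""
    choices = (IUPAC[base] for base in degenerate_codon.upper())
    return ["".join(bases) for bases in product(*choices)]


def amino_acids_for(degenerate_codon: str) -> set[str]:
    """Amino acids ('*' = stop) encoded by the expansions of a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand(degenerate_codon)}


if __name__ == "__main__":
    for codon in ("GCN", "ATH", "YTN", "MGN", "WSN"):
        print(codon, sorted(amino_acids_for(codon)))
```

For example, `amino_acids_for("YTN")` returns both leucine and phenylalanine (TTT and TTC encode Phe), and `amino_acids_for("WSN")` includes threonine, cysteine, tryptophan, and a stop codon as well as serine, so the degenerate sequence is a compact summary rather than an orderable DNA sequence.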

3.3. Codon optimization

Codon optimization is important for maximizing production of the protein of interest when it is expressed in a host organism. The need arises because the native source organism and the expression host often have different molecular machinery: although the genetic code is degenerate, different species have evolved preferences for specific synonymous codons, a phenomenon called codon usage bias. Codon optimization replaces codons that are rare in the host with synonymous codons the host uses frequently, without changing the encoded protein. In this sense, it acts as a molecular “translator” of sorts.
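
The simplest form of codon optimization is a "one amino acid, one codon" back-translation that always picks a frequently used host codon. The Python sketch below illustrates only that idea. The preference table is an assumption for illustration, not VectorBuilder’s table or a measured E. coli codon-usage table. Real tools also balance GC content, avoid repeats and unwanted restriction sites, and usually spread usage across several codons; the VectorBuilder output on this page, for example, encodes alanine with both GCG and GCA.

```python
# ASSUMPTION: illustrative "preferred codon" per amino acid for an E. coli host.
# Replace with a real codon-usage table for any serious design.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Naive codon optimization: one preferred codon per amino acid."""
    protein = protein.upper().rstrip("*")  # drop any stop symbol already present
    codons = [PREFERRED_CODON[aa] for aa in protein]
    if add_stop:
        codons.append(PREFERRED_CODON["*"])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MNDTQDFISA"))
```

Because every amino acid always gets the same codon, this naive approach can create long repeats and skewed tRNA demand, which is one reason dedicated tools take a more balanced approach.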

For this project, codon optimization was performed using VectorBuilder, with E. coli selected as the expression host.

Why E. coli?

  • Rapid growth and high yields - E. coli doubles every 20-30 minutes in cheap media, enabling grams-per-liter protein production for quick prototyping.
  • Simple genetics and expression - Plasmid-based systems (e.g., pET/T7) offer tight inducible control, and GeDH does not need the eukaryotic post-translational modifications (PTMs) that E. coli cannot perform.
  • Cost-effective and scalable - Low setup costs and established protocols make it ideal for iterative testing

Codon-optimized DNA sequence of the geoA gene:
ATGAATGATACGCAGGATTTTATTAGCGCGCAGGCGGCAGTACTGCGTCAGGTGGGCGGCCCGCTGGCGGTCGAACCGGTGCGTATCTCGATGCCGAAAGGCGATGAAGTTCTGATTCGTATTGCCGGCGTGGGCGTGTGTCATACCGATCTGGTGTGTCGCGATGGCTTCCCGGTTCCGCTGCCGATTGTGCTGGGCCATGAAGGCAGCGGCACAGTGGAAGCTGTGGGCGAACAGGTTCGCACCTTAAAACCGGGCGATCGCGTTGTGCTGAGCTTTAATAGCTGCGGCCACTGCGGTAACTGTCACGACGGTCATCCGAGCAATTGCCTGCAAATGCTGCCGCTGAATTTTGGTGGTGCCCAGCGCGTTGATGGTGGACAGGTGTTAGATGGCGCCGGTCATCCGGTGCAGAGCATGTTTTTTGGCCAGTCGAGCTTTGGCACCCATGCGGTGGCGCGCGAAATTAATGCGGTGAAAGTGGGCGATGATCTGCCGCTGGAGTTACTGGGACCGCTGGGCTGCGGTATTCAGACCGGCGCCGGCGCCGCCATTAATTCTCTGGGTATTGGCCCGGGCCAGAGCCTGGCCATTTTTGGCGGCGGCGGCGTTGGTCTGAGCGCCCTGCTGGGTGCGCGCGCCGTGGGCGCGGATCGCGTGGTAGTAATCGAGCCGAACGCGGCGCGTCGTGCGCTGGCCCTGGAACTGGGTGCGAGCCACGCGCTGGATCCGCATGCGGAAGGCGATCTGGTTGCGGCCATTAAAGCCGCGACCGGCGGCGGCGCGACCCATAGCCTGGATACGACCGGTCTGCCGCCGGTTATTGGTAGCGCCATTGCGTGCACCCTGCCGGGCGGCACCGTGGGTATGGTGGGACTGCCGGCGCCTGATGCCCCGGTTCCGGCGACCCTGCTGGATCTGCTGAGCAAGTCAGTAACGCTGCGTCCAATTACCGAAGGCGATGCGGATCCGCAGCGTTTTATTCCGCGCATGCTGGATTTTCATCGTGCCGGCAAATTTCCGTTTGATCGCCTGATTACCCGCTATCGTTTTGATCAGATTAATGAAGCGCTGCACGCCACCGAAAAAGGCGAAGCAATTAAACCGGTGCTGGTGTTTTAA
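
Before ordering, it is worth checking that the optimized sequence still encodes the original protein. A minimal Python sketch, using the standard genetic code, that translates a DNA string, checks that it is a single open reading frame encoding a given protein, and reports GC content; the function names are illustrative, and non-ACGT characters will raise an error.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean(seq: str) -> str:
    """Remove whitespace/line breaks and upper-case a pasted sequence."""
    return "".join(seq.split()).upper()


def translate(dna: str) -> str:
    """Translate DNA in frame 1 with the standard code ('*' = stop)."""
    dna = clean(dna)
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def check_orf(dna: str, protein: str) -> bool:
    """True if `dna` starts with ATG, has exactly one stop codon (at the end), and encodes `protein`."""
    aa = translate(dna)
    return (
        clean(dna).startswith("ATG")
        and aa.endswith("*")
        and "*" not in aa[:-1]
        and aa[:-1] == clean(protein)
    )


def gc_content(dna: str) -> float:
    """Fraction of G + C bases."""
    dna = clean(dna)
    return (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    print(translate("ATGAATGATACGCAGGATTTTATTAGCGCG"))
```

Pasting the optimized sequence above as the DNA and the UniProt sequence (H1ZV38, without the FASTA header) as the protein, `check_orf` should return True if the optimization preserved the protein.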

References

Lüddeke, F., Wülfing, A., Timke, M., Germer, F., Weber, J., Dikfidan, A., Rahnfeld, T., Linder, D., Meyerdierks, A., & Harder, J. (2012). Geraniol and geranial dehydrogenases induced in anaerobic monoterpene degradation by Castellaniella defragrans. Applied and environmental microbiology, 78(7), 2128–2136. https://doi.org/10.1128/AEM.07226-11

3.4. You have a sequence! Now what?

To produce GeDH from the optimized DNA sequence, I plan to use both cell-free and cell-dependent systems. The cell-free method will be done first to confirm that the gene is expressed and a functional protein is produced before moving on to a cell-dependent system.

Cell-free

In this case, the optimized DNA sequence will be added to a reaction mixture containing RNA polymerase, ribosomes, tRNAs, amino acids, and nucleotides. RNA polymerase transcribes the DNA into mRNA, and the ribosome reads the mRNA codon by codon, with each codon specifying an amino acid that is delivered by a matching tRNA. The amino acids are linked into a polypeptide chain, which then folds into the functional GeDH enzyme.

Cell-dependent

The optimized DNA will be inserted into an expression vector and introduced into the host organism, E. coli. Inside the cell, the host transcription machinery produces mRNA from the inserted DNA. Ribosomes then translate the mRNA into protein using codon-anticodon pairing, and the protein folds into its 3D structure and becomes active.

3.5. [Optional] How does it work in nature/biological systems?

Describe how a single gene codes for multiple proteins at the transcriptional level. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below.

Part 4: Prepare a Twist DNA Synthesis Order


Benchling Link: https://benchling.com/s/seq-a3hKqpNADiAR4fUfT5Sx?m=slm-uo9AlXvO5VzOp4O9Re83


Part 5: DNA Read/Write/Edit

Week 3 HW: Lab Automation

This week I designed the Bitcoin logo as automation art and converted it into a grid of XY dot coordinates that can be dispensed by the Opentrons OT-2 onto an agar plate.

On the Automation Art Interface, I browsed the gallery and selected the Bitcoin design, which I redesigned using pink, purple, and blue.
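
Converting a pixel design into XY dot coordinates amounts to mapping each filled grid cell to an offset from the design centre. The Python sketch below assumes a hypothetical character grid (one character per cell, "." for empty, one letter per colour) and a hypothetical 1 mm dot spacing; it is not the Automation Art Interface’s own code, and in an OT-2 protocol the offsets would still need to be added to the plate’s centre position.

```python
# Hypothetical design: '.' = empty cell, P = pink, U = purple, B = blue.
DESIGN = [
    ".PP.",
    "UBBU",
    ".PP.",
]


def grid_to_coordinates(rows: list[str], spacing_mm: float = 1.0) -> dict[str, list[tuple[float, float]]]:
    """Map filled grid cells to (x, y) offsets in mm from the design centre, grouped by colour.

    x increases to the right and y increases upward (row 0 is the top row).
    """
    n_rows = len(rows)
    n_cols = max(len(row) for row in rows)
    centre_x = (n_cols - 1) / 2
    centre_y = (n_rows - 1) / 2
    points: dict[str, list[tuple[float, float]]] = {}
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if cell == ".":
                continue  # nothing to dispense here
            x = (c - centre_x) * spacing_mm
            y = (centre_y - r) * spacing_mm
            points.setdefault(cell, []).append((x, y))
    return points


if __name__ == "__main__":
    for colour, coords in grid_to_coordinates(DESIGN).items():
        print(colour, coords)
```

Grouping by colour mirrors how a dispensing run would likely be organized: one pass per colour, so each fluorescent culture is loaded once and dispensed at all of its coordinates.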


Week 4 HW: Protein Design I

Part A. Conceptual Questions

I skipped 4 and 11.

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Based on Claude Sonnet 4.6 using the prompt: “How many molecules of amino acids are in 500 grams of meat? Assume the average amino acid has a molecular weight of 100 Daltons”

The approach is dimensional analysis: go from grams of meat → grams of protein → moles of amino acids → number of molecules, using Avogadro’s number. The assumptions are:

  • Meat is roughly 20% protein by weight (most of the rest is water and fat).
  • Average amino acid mass = 100 Daltons, i.e., a molar mass of 100 g/mol.
  • Avogadro’s number = 6.022 × 10²³ molecules/mol.

Therefore:

  1. Mass of protein in 500 g of meat = 500 g × 0.20 = 100 g of protein
  2. Moles of amino acids = 100 g ÷ 100 g/mol = 1 mol of amino acids
  3. Number of molecules = 1 mol × 6.022 × 10²³ molecules/mol ≈ 6 × 10²³ molecules

So a 500 g piece of meat provides roughly 6 × 10²³ amino acid molecules (almost exactly one mole!).
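
The same dimensional analysis as a small Python function, with the protein fraction and average amino acid mass exposed as parameters (both are the assumptions stated above, not measured values):

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g: float, protein_fraction: float = 0.20,
                         mean_aa_mass_da: float = 100.0) -> float:
    """Approximate number of amino acid molecules in `meat_g` grams of meat.

    A molecule of mass M daltons has a molar mass of M g/mol, so grams of
    protein divided by the molar mass gives moles, and moles x Avogadro's
    number gives molecules.
    """
    protein_g = meat_g * protein_fraction
    moles = protein_g / mean_aa_mass_da
    return moles * AVOGADRO


if __name__ == "__main__":
    print(f"{amino_acid_molecules(500):.2e}")
```

`amino_acid_molecules(500)` returns about 6.0 × 10²³; raising `protein_fraction` to 0.25 gives about 7.5 × 10²³, which shows how sensitive the answer is to the protein-content assumption.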

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

When humans eat meat or fish, our digestive enzymes break the proteins down into individual amino acids (and very short peptides). These are absorbed into the bloodstream as simple building blocks; no intact cow or fish protein is built into our bodies. The body then uses these building blocks to make human proteins, following instructions from human DNA. The amino acids themselves are identical across species, so a leucine or alanine from beef is the same as human leucine or alanine; only the blueprint differs. It’s like dismantling a LEGO cow and rebuilding it using a human instruction manual.

3. Why are there only 20 natural amino acids?

The existence of exactly 20 natural amino acids is best understood through prebiotic availability, chemical sufficiency, genetic code structure, and evolutionary history.

  • Prebiotic availability: The set of 20 amino acids likely reflects, at least in part, what was chemically available before life began. The landmark Miller-Urey experiment in 1953 demonstrated that several of the standard amino acids form spontaneously under simulated early Earth conditions, and the amino acids found in the Murchison meteorite show that these molecules arise readily from abiotic chemistry.
  • Chemical sufficiency: The 20 amino acids collectively cover the chemical diversity needed to build virtually all protein functions, providing hydrophobic residues for protein cores (leucine, valine, isoleucine), charged residues for surfaces (lysine, arginine, aspartate, glutamate), catalytic residues for enzyme active sites (histidine, cysteine, serine), and structural residues (glycine for flexibility, proline for rigidity). Additional amino acids would add complexity without meaningfully expanding functional capability.
  • Structure of the genetic code: DNA encodes amino acids using three-base codons, giving 4³ = 64 possible combinations, enough to encode 20 amino acids plus stop signals, with significant redundancy. This redundancy buffers against point mutations. Freeland & Hurst showed that the natural genetic code is extraordinarily efficient at minimizing the effects of errors, outperforming all but 1 in a million randomly generated alternative codes. This suggests there was little evolutionary pressure to expand the code further.
  • Evolutionary freezing: The genetic code is nearly universal across all life, suggesting that the 20 amino acids were locked in more than 3.5 billion years ago. This is the idea behind Francis Crick’s “frozen accident” hypothesis: the code became too deeply embedded in all biological machinery to change. Any mutation that reassigned a codon to a new amino acid would alter many existing proteins throughout the organism at once, making such a change evolutionarily catastrophic.

References:

Doig, A. J. (2017). Frozen, but no accident - why the 20 standard amino acids were selected. The FEBS journal, 284(9), 1296–1305. https://doi.org/10.1111/febs.13982

Freeland, S. J., & Hurst, L. D. (1998). The genetic code is one in a million. Journal of molecular evolution, 47(3), 238–248. https://doi.org/10.1007/pl00006381

Kirschning, A. (2022). On the Evolutionary History of the Twenty Encoded Amino Acids. Chemistry (Weinheim an der Bergstrasse, Germany), 28(55), e202201419. https://doi.org/10.1002/chem.202201419

Lawless, J. G. (1973). Amino acids in the Murchison meteorite. Geochimica et Cosmochimica Acta, 37(9), 2207–2212. https://doi.org/10.1016/0016-7037(73)90017-3

4. Can you make other non-natural amino acids? Design some new amino acids.

H₂N — CH — COOH
         |
         R (side chain)

NH₂ → amino group (contains nitrogen)

COOH → carboxyl group (contains carbon and oxygen)

R → side chain (what makes each amino acid unique)

I would use ChemDraw to draw the amino acid backbone, explore alternative side-chain structures, and copy the resulting SMILES strings.
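Because every amino acid shares the H₂N–CH(R)–COOH backbone, a SMILES string for a new amino acid can be assembled by dropping a side-chain fragment into a fixed backbone template. The sketch below assumes the side chain is written starting from the atom bonded to the α-carbon, in which case `N[C@@H](R)C(=O)O` has the same spatial arrangement as L-alanine (`N[C@@H](C)C(=O)O`). The non-canonical examples (norleucine, azidohomoalanine, p-azidophenylalanine) are known non-natural amino acids used only for illustration. The bracket check is a cheap sanity test, not a chemistry validator; a tool such as ChemDraw or RDKit (`Chem.MolFromSmiles`) would be needed for that.

```python
# Backbone template for an L-alpha-amino acid: amine N, chiral alpha carbon,
# side chain R (written from the atom attached to C-alpha), carboxylic acid.
L_AMINO_ACID_TEMPLATE = "N[C@@H]({side_chain})C(=O)O"

# Illustrative side-chain SMILES fragments.
SIDE_CHAINS = {
    "alanine (natural)": "C",
    "leucine (natural)": "CC(C)C",
    "norleucine (non-canonical)": "CCCC",
    "azidohomoalanine (non-canonical)": "CCN=[N+]=[N-]",
    "p-azidophenylalanine (non-canonical)": "Cc1ccc(cc1)N=[N+]=[N-]",
}


def amino_acid_smiles(side_chain: str) -> str:
    """Attach a side-chain SMILES fragment to the L-amino-acid backbone."""
    return L_AMINO_ACID_TEMPLATE.format(side_chain=side_chain)


def looks_balanced(smiles: str) -> bool:
    """Check only that parentheses and square brackets are balanced (not a SMILES parser)."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


if __name__ == "__main__":
    for name, side_chain in SIDE_CHAINS.items():
        smiles = amino_acid_smiles(side_chain)
        print(f"{name:40s} {smiles}  balanced={looks_balanced(smiles)}")
```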

5. Where did amino acids come from before enzymes that make them, and before life started?

Before life existed, amino acids are thought to have arisen abiotically from simple chemistry through three main proposed pathways.

  • Atmospheric Chemistry (Miller-Urey): The simplest explanation is that amino acids formed directly from gases present in early Earth's atmosphere. The Miller-Urey experiment simulated early Earth conditions using methane, ammonia, hydrogen, and water, with electrical sparks mimicking lightning, and produced amino acids (Miller, 1953). This demonstrated that no biology is needed, just simple molecules and energy. A key reaction pathway is the Strecker synthesis: an aldehyde, ammonia, and HCN combine to form an α-aminonitrile, which then hydrolyzes to a simple amino acid. All three starting molecules are thought to have been available on early Earth.
  • Hydrothermal Vents: Deep-sea hydrothermal vents provide another plausible source. Together with meteorite delivery, they are the two other major proposed sources of prebiotic amino acids beyond atmospheric chemistry. The heat, pressure, and chemical gradients at vents can drive spontaneous synthesis of organic molecules, including amino acids; for example, Zhang et al. (2017) produced glycine from ethanolamine under simulated Archean alkaline hydrothermal vent conditions.
  • Extraterrestrial Delivery: Amino acids may not have formed only on Earth; some likely arrived from space. The Murchison meteorite, which fell in Australia in 1969, was found to contain extraterrestrial amino acids and hydrocarbons (Lawless, 1973). This shows that amino acids can form abiotically beyond Earth and suggests that meteorites could have delivered them to the early Earth.
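To make the Strecker route concrete, the sketch below writes out the synthesis of glycine from formaldehyde (the simplest aldehyde) as two steps, formation of the α-aminonitrile (aminoacetonitrile) followed by hydrolysis, and checks that each step conserves atoms. The formula parser is a minimal helper assumed to handle only simple formulas without parentheses or charges; the function names are illustrative.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Parse a simple molecular formula such as 'C2H5NO2' (no parentheses or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(species):
    """Total atoms on one side of a reaction, given (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in species:
        for element, number in atom_counts(formula).items():
            total[element] += coefficient * number
    return total


def is_balanced(reactants, products):
    """True if every element appears the same number of times on both sides."""
    return side_total(reactants) == side_total(products)


# Step 1: formaldehyde + ammonia + hydrogen cyanide -> aminoacetonitrile + water
STEP_1 = ([(1, "CH2O"), (1, "NH3"), (1, "HCN")], [(1, "C2H4N2"), (1, "H2O")])
# Step 2: hydrolysis of the aminonitrile -> glycine + ammonia
STEP_2 = ([(1, "C2H4N2"), (2, "H2O")], [(1, "C2H5NO2"), (1, "NH3")])

if __name__ == "__main__":
    for name, (reactants, products) in [("step 1", STEP_1), ("step 2", STEP_2)]:
        print(name, "balanced:", is_balanced(reactants, products))
```

Swapping in acetaldehyde (C2H4O) and 2-aminopropanenitrile (C3H6N2) gives the analogous route to alanine (C3H7NO2).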

References

Lawless, J. G. (1973). Amino acids in the Murchison meteorite. Geochimica et Cosmochimica Acta, 37(9), 2207–2212. https://doi.org/10.1016/0016-7037(73)90017-3

Miller, S. L. (1953). A production of amino acids under possible primitive earth conditions. Science (New York, N.Y.), 117(3046), 528–529. https://doi.org/10.1126/science.117.3046.528

Zhang, X., Tian, G., Gao, J., et al. (2017). Prebiotic synthesis of glycine from ethanolamine in simulated Archean alkaline hydrothermal vents. Origins of Life and Evolution of Biospheres, 47, 413–425. https://doi.org/10.1007/s11084-016-9520-3

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

A D-amino acid α-helix would be left-handed.

Natural proteins use L-amino acids, which form right-handed α-helices. D-amino acids are exact mirror images of L-amino acids, so they twist in the opposite direction, producing a left-handed helix.

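This follows from the Ramachandran plot: the allowed backbone angles for D-amino acids are the mirror image of those for L-amino acids (φ, ψ → −φ, −ψ). D-residues therefore favor the left-handed helical region (φ ≈ +57°, ψ ≈ +47°) and cannot adopt the right-handed helix angles (φ ≈ −57°, ψ ≈ −47°) without steric clashes.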
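The mirror-image argument can be checked numerically. The sketch below places Cα-like points on an ideal helix using typical α-helix geometry (about 100° of rotation and 1.5 Å of rise per residue, radius about 2.3 Å; these are standard textbook values assumed here), measures handedness from the sign of the triple product of consecutive bond vectors (a discrete torsion), then reflects the coordinates. Reflecting a whole L-peptide gives the corresponding D-peptide, so the flip in sign mirrors the L-to-D switch. Function names are illustrative.

```python
import math


def ideal_helix(n_residues=12, radius=2.3, rise=1.5, degrees_per_residue=100.0):
    """C-alpha-like points on an ideal helix turning counterclockwise as z increases (right-handed)."""
    points = []
    for i in range(n_residues):
        angle = math.radians(degrees_per_residue * i)
        points.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return points


def mirror(points):
    """Mirror image through the xy-plane (z -> -z)."""
    return [(x, y, -z) for x, y, z in points]


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def handedness(points):
    """Sum of triple products of consecutive bond vectors: positive = right-handed, negative = left-handed."""
    total = 0.0
    for i in range(len(points) - 3):
        b1 = _sub(points[i + 1], points[i])
        b2 = _sub(points[i + 2], points[i + 1])
        b3 = _sub(points[i + 3], points[i + 2])
        total += _dot(_cross(b1, b2), b3)
    return "right" if total > 0 else "left"


if __name__ == "__main__":
    helix = ideal_helix()
    print("original helix:", handedness(helix))
    print("mirror image:  ", handedness(mirror(helix)))
```

A reflection has determinant −1, so it flips the sign of every triple product, which is why the mirrored helix always comes out with the opposite handedness.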

7. Can you discover additional helices in proteins?

Yes. Beyond the familiar α-helix, several other helices exist in proteins, and new ones may still be discovered.

Known helices: The type of helix is determined by its hydrogen-bonding pattern, specifically how many residues apart the donor and acceptor atoms are (the C=O of residue n bonds to the N-H of residue n+k):

| Helix | H-bond | Residues/turn |
|-------|--------|---------------|
| 3₁₀ | n → n+3 | 3.0 |
| α | n → n+4 | 3.6 |
| π | n → n+5 | 4.4 |
| PPII | none | 3.0 |

The α-helix dominates because it sits in a stability "sweet spot" for L-amino acids: its backbone angles avoid steric clashes while giving near-ideal n → n+4 hydrogen-bond geometry. The 3₁₀ helix is the second most common, often appearing at the ends of α-helices. The π-helix was long considered rare, but π-helices are found in 15% of known protein structures and are believed to be an evolutionary adaptation arising from the insertion of a single amino acid into an α-helix. The PPII helix is unusual: it is a well-defined regular structure with no main-chain hydrogen bonds, stabilized instead primarily by interactions with water, and it is functionally important even though it is largely unclassified in standard databases.
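The two numbers in the table translate directly into helix geometry: residues per turn gives the rotation per residue (360° divided by residues/turn), and the H-bond offset k says which backbone pairs are bonded. The sketch below encodes the table as a small dictionary (an illustrative data structure, not from any library) and lists the ideal hydrogen-bond pairs for a short helix.

```python
# Helix types from the table: (H-bond offset k, residues per turn).
# PPII has no backbone H-bonds, so its offset is None.
HELICES = {
    "3_10": (3, 3.0),
    "alpha": (4, 3.6),
    "pi": (5, 4.4),
    "PPII": (None, 3.0),
}


def twist_per_residue(residues_per_turn: float) -> float:
    """Rotation about the helix axis contributed by each residue, in degrees."""
    return 360.0 / residues_per_turn


def hbond_pairs(n_residues: int, offset: int):
    """Ideal backbone H-bonds C=O(n) ... H-N(n+offset) in a helix of n residues (1-indexed)."""
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


if __name__ == "__main__":
    for name, (offset, per_turn) in HELICES.items():
        pairs = hbond_pairs(8, offset) if offset else []
        print(f"{name}: {twist_per_residue(per_turn):.1f} deg/residue, "
              f"H-bonds in an 8-residue helix: {pairs}")
```

For example, the α-helix's 3.6 residues per turn corresponds to 100° per residue, and in an 8-residue α-helix residue 1's carbonyl pairs with residue 5's amide.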

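Can we discover new helices? Yes, potentially through three approaches:

  • Mining existing structural databases (such as the PDB) for unusual hydrogen-bonding patterns and backbone geometries
  • Non-natural amino acids (for example, D- or β-amino acids), which can fold into helices not seen in natural proteins
  • Extremophile proteins, whose unusual environments may stabilize uncommon conformations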
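A first step in the database-mining approach is to scan secondary-structure assignments for runs of each helix type. The sketch below assumes you already have a per-residue DSSP string (for example, from running DSSP on PDB files). It uses DSSP's codes H (α), G (3₁₀) and I (π), plus P for PPII, which to my knowledge is only assigned by newer DSSP releases (4.x); treat that code as an assumption. The function name and the example string are illustrative.

```python
from itertools import groupby

# DSSP one-letter codes for the helix types discussed above.
# "P" (polyproline II) is assumed to come from DSSP 4.x; older versions do not assign it.
HELIX_CODES = {"H": "alpha", "G": "3_10", "I": "pi", "P": "PPII"}


def helix_segments(dssp: str, min_length: int = 1):
    """Return (helix_type, start, end) for each run of a helix code.

    Positions are 0-indexed and `end` is exclusive.
    """
    segments = []
    pos = 0
    for code, run in groupby(dssp):
        length = len(list(run))
        if code in HELIX_CODES and length >= min_length:
            segments.append((HELIX_CODES[code], pos, pos + length))
        pos += length
    return segments


if __name__ == "__main__":
    example = "--HHHHGGG-IIIII--"  # made-up assignment for illustration
    for helix_type, start, end in helix_segments(example):
        print(f"{helix_type:6s} residues {start}-{end - 1} (length {end - start})")
```

Run over many structures, unusual segment types or lengths could then be flagged for closer inspection.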

8. Why are most molecular helices right-handed?

Most molecular helices are right-handed because of the chirality of life's building blocks: L-amino acids in proteins and D-sugars in DNA. L-amino acids naturally adopt backbone angles (φ, ψ) that favor right-handed conformations on the Ramachandran plot. For L-residues the right-handed α-helix is more stable by about 1 kcal/mol per residue, mainly because in a left-handed helix the side chain's β-carbon clashes sterically with the backbone; the backbone hydrogen bonds are essentially the same in both.

But why life uses L-amino acids in the first place remains an open question. Leading hypotheses include:

  • Circularly polarized ultraviolet light in space (for example, from neutron stars) preferentially destroying D-amino acids
  • Parity violation in the weak nuclear force creating a tiny energy bias toward L-amino acids
  • Meteorites delivering amino acids with a small excess of the L-form, possibly caused by exposure to polarized light in space

Once life committed to L-amino acids, right-handed helices became inevitable. Because helix formation is cooperative, a small per-residue preference for right-handed turns is amplified over the whole helix, while D-sugars in DNA likewise favor right-handed conformations. Chirality is self-reinforcing across biological molecules.
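A quick Boltzmann estimate shows why cooperativity makes the preference so decisive. The sketch below assumes the per-residue figure quoted above (about 1 kcal/mol) and an all-or-none switch in which a whole n-residue segment is either right- or left-handed, so the free-energy differences simply add. This is a deliberately crude model; a proper treatment would use a helix-coil theory such as Zimm-Bragg.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def right_to_left_ratio(n_residues: int, ddg_per_residue: float = 1.0,
                        temperature: float = 298.0) -> float:
    """Equilibrium ratio [right-handed]/[left-handed] for an n-residue segment.

    Assumes a fully cooperative (all-or-none) switch and an additive
    per-residue free-energy preference ddg_per_residue (kcal/mol).
    """
    return math.exp(n_residues * ddg_per_residue / (R_KCAL * temperature))


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:2d} residues: right/left ~ {right_to_left_ratio(n):.3g}")
```

Under these assumptions a single residue prefers the right-handed form only a few-fold, but the ratio grows exponentially with length, so even a short helix is overwhelmingly right-handed.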

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

11. Design a β-sheet motif that forms a well-ordered structure.


Part B: Protein Analysis and Visualization

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

  1. Briefly describe the protein you selected and why you selected it.
  2. Identify the amino acid sequence of your protein. How long is it? What is the most frequent amino acid? (You can use this Colab notebook to count the frequency of amino acids.)
  3. How many protein sequence homologs are there for your protein? (Hint: use UniProt's BLAST tool to search for homologs.) Does your protein belong to any protein family?
  4. Identify the structure page of your protein in RCSB. When was the structure solved? Is it a good-quality structure? A good-quality structure has good resolution: the smaller the value, the better (e.g., Resolution: 2.70 Å).
  5. Are there any other molecules in the solved structure apart from protein? Does your protein belong to any structure classification family?
  6. Open the structure of your protein in any 3D molecule visualization software, such as PyMOL (PyMol Tutorial Here; hint: ChatGPT is good at PyMOL commands).
    • Visualize the protein as "cartoon", "ribbon" and "ball and stick".
    • Color the protein by secondary structure. Does it have more helices or sheets?
    • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
    • Visualize the surface of the protein. Does it have any "holes" (aka binding pockets)?
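The sequence questions (length, most frequent amino acid) and the hydrophobic/hydrophilic question can be approached with a few lines of counting, similar in purpose to the Colab notebook mentioned above. The sequence below is a made-up placeholder, not a real protein, and the hydrophobic set (A, V, I, L, M, F, W, C) is one common grouping assumed here; classifications differ between sources.

```python
from collections import Counter

# Hypothetical placeholder sequence for illustration only; paste your protein's sequence here.
SEQUENCE = "MKVLAAGIVGLLLSAQAEAKKEEG"

# One common hydrophobic grouping (an assumption: other sources differ).
HYDROPHOBIC = set("AVILMFWC")


def summarize(sequence):
    """Length, most frequent residue, and hydrophobic fraction of a one-letter protein sequence."""
    seq = sequence.upper().replace("\n", "").replace(" ", "")
    counts = Counter(seq)
    residue, n = counts.most_common(1)[0]
    hydrophobic = sum(counts[aa] for aa in HYDROPHOBIC)
    return {
        "length": len(seq),
        "most_frequent": (residue, n),
        "hydrophobic_fraction": hydrophobic / len(seq),
        "counts": counts,
    }


if __name__ == "__main__":
    summary = summarize(SEQUENCE)
    print("length:", summary["length"])
    print("most frequent residue:", summary["most_frequent"])
    print(f"hydrophobic fraction: {summary['hydrophobic_fraction']:.2f}")
```

Overall composition does not show where residues sit; the PyMOL residue-type coloring is still needed to see whether hydrophobic residues are buried in the core.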


Part C. Using ML-Based Protein Design Tools

In this section, we will learn about the capabilities of modern protein AI models and test some of them on your chosen protein.

Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU. Choose your favorite protein from the PDB.

We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

C1. Protein Language Modeling

  1. Deep Mutational Scans: Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods. Can you explain any particular pattern? (Choose a residue and a mutation that stands out.) (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
  2. Latent Space Analysis: Use the provided sequence dataset to embed proteins in reduced dimensionality. Analyze the different formed neighborhoods: do they approximate similar proteins? Place your protein in the resulting map and explain its position and similarity to its neighbors.
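For the deep mutational scan, one common way to turn language-model likelihoods into mutation scores is a log-likelihood ratio: score = log p(mutant) − log p(wild type) at that position. The sketch below assumes you have already extracted a per-position log-probability table from ESM2 (not shown), represented as a list of dicts; it is similar in spirit to the wild-type-marginal scoring described for ESM models, not necessarily the notebook's exact method. For the latent-space question it also ranks neighbors of your protein's embedding by cosine similarity, assuming one fixed-length vector per protein (for example, mean-pooled ESM2 representations). All names are illustrative.

```python
import math

CANONICAL_AAS = "ACDEFGHIKLMNPQRSTVWY"


def dms_scores(wild_type, log_probs):
    """Score every single substitution as log p(mut) - log p(wt) at that position.

    log_probs[i][aa] is a log-probability for amino acid `aa` at position i
    (0-indexed), e.g. taken from a protein language model.
    Negative score = the model finds the mutant less likely than the wild type.
    Keys use the usual notation, e.g. "A1G" (wild type, 1-indexed position, mutant).
    """
    scores = {}
    for i, wt in enumerate(wild_type):
        for aa in CANONICAL_AAS:
            if aa != wt:
                scores[f"{wt}{i + 1}{aa}"] = log_probs[i][aa] - log_probs[i][wt]
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(query, embeddings, k=3):
    """Return the k labelled embeddings most similar to `query`, with their similarities."""
    ranked = sorted(embeddings.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [(name, cosine(query, vec)) for name, vec in ranked[:k]]
```

Sorting the output of `dms_scores` by value surfaces the substitutions the model considers least likely, which is one way to pick "a mutation that stands out".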

C2. Protein Folding

Folding a protein: Fold your protein with ESMFold. Do the predicted coordinates match your original structure? Try changing the sequence, first with some point mutations, then with large segments. Is your protein structure resilient to mutations?
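To judge whether predicted coordinates "match" the original, the structures need to be compared in a way that ignores overall rotation and translation. The sketch below uses distance-matrix RMSD (dRMSD), a superposition-free alternative to the more common RMSD-after-superposition or TM-score; it assumes you have already extracted equal-length Cα coordinate lists from the PDB and ESMFold files (for example with Biopython's PDB parser, not shown). Function names are illustrative.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equal-length C-alpha traces.

    Invariant to rotation and translation, so no superposition step is needed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    diffs = [(da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(diffs) / len(diffs))
```

Comparing the wild-type prediction, point mutants, and segment replacements against the experimental structure with the same metric gives a simple way to see how resilient the fold is.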

C3. Protein Generation

Inverse-folding a protein: Let's now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN. Analyze the predicted sequence probabilities and compare the predicted sequence with the original one. Input this sequence into ESMFold and compare the predicted structure to your original.
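Because ProteinMPNN designs a sequence for a fixed backbone, the designed and native sequences have the same length and can be compared position by position. The sketch below computes sequence recovery (the fraction of identical positions) and lists the differences as mutations in native-position-designed notation. The example sequences in the test are toy strings; the function names are illustrative.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length (same backbone)")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def mismatches(designed: str, native: str):
    """Differences written as native residue, 1-indexed position, designed residue (e.g. 'E4F')."""
    return [f"{n}{i}{d}"
            for i, (d, n) in enumerate(zip(designed, native), start=1)
            if d != n]
```

Low recovery does not by itself mean a bad design; folding the designed sequence with ESMFold and comparing it to the original structure, as described above, is the more direct check.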


Part D. Group Brainstorm on Bacteriophage Engineering

Week 5 HW: Protein Design Part II

Subsections of Labs

Week 1 Lab: Pipetting


Subsections of Projects

Individual Final Project


Group Final Project
