In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose
The protein I have chosen for the homework is PETase (poly(ethylene terephthalate) hydrolase) from the bacterium Piscinibacter sakaiensis (previously known as Ideonella sakaiensis).
I find this protein particularly interesting because it represents a breakthrough in addressing one of the world’s major environmental challenges: plastic pollution. PETase is an enzyme that can break down polyethylene terephthalate (PET), a common plastic used in bottles, packaging, and textiles. Discovered in a bacterium isolated from plastic waste, PETase enables the microbe to use PET as a carbon and energy source by hydrolyzing its ester bonds. This natural biological degradation process offers hope for sustainable recycling and bioremediation of plastics, unlike traditional mechanical or chemical methods that are energy-intensive or produce pollutants. The enzyme’s specificity for PET and its activity at relatively mild temperatures also make it exciting for potential biotechnological applications, such as engineered variants for industrial plastic breakdown.
Using UniProt (one of the tools mentioned in recitation for protein information), I retrieved the protein sequence for PETase from Piscinibacter sakaiensis. The UniProt accession is A0A0K8P6T7, and here is the full amino acid sequence (290 residues):
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backward from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.
Once the nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize Google for a “codon optimization tool”.
In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for, and why?
Codon optimization matters because it improves the efficiency of protein synthesis, in terms of mRNA stability, translation speed, and ultimately yield. It works by replacing rare codons with synonymous codons preferred by the organism of interest, which match that host's tRNA abundance. This translates into increased protein expression.
In this case, I selected Escherichia coli, one of the standard model organisms for protein production in biotechnology. This preference reflects how easily its genes can be manipulated and how rapidly it proliferates, since it is an undemanding organism in terms of culture conditions. This makes it an ideal organism for this type of experiment.
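The reverse-translation step above can be sketched in a few lines of Python. This is a minimal illustration, not one of the named web tools: the preferred-codon table is a hand-picked partial subset of common E. coli codon preferences (an assumption for demonstration), whereas a real optimizer works from a full codon-usage table for the host.

```python
# Minimal reverse translation: map each amino acid to one preferred E. coli
# codon. The table is a hand-picked partial subset for demonstration only;
# a real optimizer uses a full codon-usage table for the host organism.
PREFERRED_ECOLI_CODON = {
    'M': 'ATG', 'A': 'GCG', 'S': 'AGC', 'N': 'AAC', 'P': 'CCG',
    'Y': 'TAT', 'F': 'TTT', 'Q': 'CAG', 'T': 'ACC', 'G': 'GGC',
}

def reverse_translate(peptide: str) -> str:
    """Return one codon-optimized DNA sequence encoding the peptide."""
    return ''.join(PREFERRED_ECOLI_CODON[aa] for aa in peptide)

print(reverse_translate("MQTNP"))  # -> ATGCAGACCAACCCG
```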
3.4. You have a sequence! Now what?
What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.
In this case, it is possible to use both methods:
Cell-free methods: these rely on cell extracts (or reconstituted components) that contain the transcription and translation machinery (ribosomes, RNA polymerase, tRNAs, etc.), so no living cells are needed. They are usually packaged as cell-free protein synthesis (CFPS) systems, from which the expressed protein can be collected directly. One example involves preparing a bacterial lysate and encapsulating it in vesicles; commercial CFPS kits can also be used to produce a protein of interest.
Cell-dependent methods: these rely on live cells; in this case, plasmids can be used to produce recombinant proteins in E. coli. One of the most widely used plasmid series in recent years is the pET line, which allows efficient, inducible expression. In these systems the cell's own machinery executes transcription and translation, but the construct must supply the necessary elements: the coding DNA sequence, a promoter and other regulatory sequences, a terminator, a ribosome binding site, and start and stop codons, among others, while the host provides the RNA polymerase. Beyond inserting the gene or genes, it is also necessary to transform the bacteria, induce expression, and finally extract and purify the protein.
Part 4: My first Benchling plasmid 🧬
Part 5: DNA Read/Write/Edit
5.1 DNA Read
(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).
I consider it would be of interest to work with the eae gene of the enteropathogenic pathotype of E. coli (EPEC), which encodes the intimin protein required for adherence to the intestinal epithelium and which, as a consequence, causes diarrheal disease worldwide. Sequencing this gene could be very useful for environmental monitoring and for studying epidemiological patterns in developing countries such as Ecuador. Since EPEC is one of the main pathogens posing a public health risk, sequencing is proposed as a way to study it in complex environments such as river waters and other heavily contaminated sources.
(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions:
a. Is your method first-, second- or third-generation or other? How so?
The first-generation Sanger method is proposed for this case. It falls into this category as one of the first DNA sequencing methods, introduced in 1977. It is based on the incorporation of chain-terminating dideoxynucleotides (ddNTPs) alongside normal dNTPs during strand elongation. It is also well suited here because of its accuracy, simplicity, and cost, and above all because the fragment of interest (881 bp) is within the read length the technology handles well.
b. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
Extraction of DNA from the study samples (e.g., contaminated water). An extraction kit is suggested to ensure higher sample purity and to avoid carrying over contaminants.
Performing a conventional PCR to amplify the fragment to an adequate amount and purity. Only standard PCR components are required: primers flanking the target, normal nucleotides (dNTPs), and a thermostable DNA polymerase (e.g., Taq).
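The amplification step can be sanity-checked with simple arithmetic: ideal PCR doubles the target each cycle. A short sketch (the input copy number is invented for illustration):

```python
# Ideal PCR doubles the target each cycle; real efficiency is a bit lower,
# which the optional efficiency parameter can model.
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    return initial_copies * (1 + efficiency) ** cycles

print(pcr_copies(100, 30))  # 100 x 2^30, roughly 1.1e11 copies
```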
c. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
For Sanger sequencing, the DNA obtained from PCR is mixed with a sequencing primer, polymerase, normal nucleotides (dNTPs), and fluorescently labeled chain-terminating dideoxynucleotides (ddNTPs).
The polymerase then synthesizes a new strand; whenever a ddNTP is incorporated, elongation stops, producing fragments of every possible length.
These fragments are separated by capillary electrophoresis, where shorter fragments migrate faster. As each fragment passes the detector, a laser excites its terminal ddNTP, which emits a fluorescence signal specific to that base.
These signals are recorded by a detector and translated into a nucleotide sequence.
d. What is the output of your chosen sequencing technology?
The method generates an electropherogram: a trace of fluorescence peaks, one per position in the DNA sequence, in which each color corresponds to a specific base (A, T, C, or G).
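In idealized form, base calling from such a trace reduces to picking the strongest fluorescence channel at each peak. A toy sketch (real base callers also perform peak finding, mobility correction, and quality scoring; the trace values below are invented):

```python
# Toy base caller: at each idealized, pre-aligned peak, call the base whose
# fluorescence channel is strongest. Trace values are invented.
CHANNELS = ('A', 'C', 'G', 'T')

def call_bases(peak_intensities):
    """peak_intensities: iterable of (A, C, G, T) fluorescence tuples."""
    return ''.join(CHANNELS[max(range(4), key=lambda i: peaks[i])]
                   for peaks in peak_intensities)

trace = [(900, 30, 10, 20),   # A peak
         (15, 20, 950, 40),   # G peak
         (10, 870, 25, 30),   # C peak
         (20, 15, 30, 910)]   # T peak
print(call_bases(trace))  # -> AGCT
```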
5.2 DNA Write
(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)
For this section, I would be interested in synthesizing DNA encoding Shiga toxin 2 (Stx2), which is responsible for multiple outbreaks at the global level and is the cause of hemolytic uremic syndrome. This toxin is usually produced by pathogenic Shiga toxin-producing E. coli (STEC) serotypes, so its synthesis could be of interest for developing recombinant vaccines based on attenuated antigens.
(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also, answer the following questions:
I would use Gibson Assembly because it is highly accurate and efficient compared with alternatives such as Golden Gate, which I consider essential in vaccine development. It is also well suited to assembling a plasmid carrying an attenuated version of the toxin, and flexible enough to accommodate modifications if they become necessary to improve the immune response.
a. What are the essential steps of your chosen synthesis method?
In the first instance, it is necessary to synthesize or amplify an attenuated version of the gene for the protein (toxin) of interest. This means removing the domains associated with toxicity while retaining the elements that activate the immune response in the patient's body. This gene can be obtained by PCR and must carry ends that overlap with the plasmid into which it will be inserted.
The plasmid to be used is also pre-designed and linearized to facilitate insertion.
The next step is the assembly itself: the components are mixed in a tube with the Gibson master mix, which contains a 5' exonuclease that chews back the fragment ends to expose the single-stranded overlaps, a polymerase that fills the gaps once the overlaps anneal, and a ligase that seals the remaining nicks.
Finally, the next step is the transformation of the organism chosen, in this case, E. coli, by the addition of this recombinant plasmid.
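A quick way to check the overlap requirement from the first step is to confirm that the fragment and the linearized vector share a terminal overlap of sufficient length and melting temperature. The sketch below uses the rough Wallace-rule Tm estimate and made-up sequences; it is a design sanity check, not a full primer-design tool:

```python
# Sanity check for Gibson overlap design: does the fragment end share an
# overlap of adequate length and melting temperature with the vector start?
# Tm uses the rough Wallace rule (2*(A+T) + 4*(G+C)); sequences are made up.
def check_overlap(frag_end: str, vector_start: str,
                  overlap_len: int = 20, min_tm: int = 48):
    overlap = frag_end[-overlap_len:]
    if not vector_start.startswith(overlap):
        return False, 0
    at = overlap.count('A') + overlap.count('T')
    gc = overlap.count('G') + overlap.count('C')
    tm = 2 * at + 4 * gc
    return tm >= min_tm, tm

print(check_overlap("CCGGATGGCTAGCAAAGGAGAAGA",
                    "ATGGCTAGCAAAGGAGAAGATTTACG"))  # -> (True, 58)
```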
b. What are the limitations of your synthesis method (if any) in terms of speed, accuracy, and scalability?
Among the limitations of this method are the possible formation of secondary structures in the overlaps and the need for long overlapping sequences, both of which can complicate design and synthesis. The cost can also be relatively high compared with the alternatives.
5.3 DNA Edit.
(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?
For this part of the assignment, I would again bring up the idea of modifying the genes of plants that suffer from drought and desiccation stress, such as bananas. I believe the agricultural sector in countries like Ecuador has great potential to test these technologies and improve yield and productivity levels.
(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:
a. How does your technology of choice edit DNA? What are the essential steps?
It starts with the design of the construct of interest, in this case consisting of the DREB1A gene, which is inserted into an expression vector together with its promoter.
This vector is then introduced into A. tumefaciens and the plants of interest are infected in an in vitro culture, which will allow the integration of the gene of interest. The principle of this technology is based on the ability of this bacterium to transfer DNA to other cells, using its Ti plasmid in which the region associated with the tumors is replaced by the region of interest. Thus, when this bacterium infects plant tissue, this genetic alteration is also transferred.
Subsequently, the correctly transformed plants are selected, for example by means of a fluorescent marker such as GFP.
Additionally, expression tests can be performed by RT-qPCR, and lastly, the regeneration and re-planting of the culture of interest is performed.
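The RT-qPCR expression test mentioned above is typically evaluated with the 2^-ΔΔCt method; a small sketch with invented Ct values:

```python
# 2^-ddCt fold-change calculation for an RT-qPCR expression check.
# Ct values below are invented for illustration.
def fold_change(ct_target_sample: float, ct_ref_sample: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    d_sample = ct_target_sample - ct_ref_sample      # dCt in transformant
    d_control = ct_target_control - ct_ref_control   # dCt in wild type
    return 2 ** -(d_sample - d_control)              # 2^-ddCt

# DREB1A vs. a housekeeping gene, transformant vs. wild-type control:
print(fold_change(22.0, 18.0, 26.0, 18.0))  # -> 16.0 (16-fold upregulation)
```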
b. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
This process requires the selected gene of interest and a suitable vector compatible with A. tumefaciens, including a promoter, terminator, and selection marker. Also required are designed primers, restriction enzymes and ligases for cloning, culture media, and plant growth hormones for regeneration.
c. What are the limitations of your editing methods (if any) in terms of efficiency or precision?
The main limitations concern transformation efficiency and the fact that the T-DNA integrates at random genomic positions, which compromises the specificity and precision of the edit and can produce unwanted adverse effects from random insertions.
1. Published Paper Using Opentrons for Novel Biological Applications
One compelling example is the paper “Semi-automated Production of Cell-Free Biosensors” by Dylan M. Brown, Daniel A. Phillips, and colleagues (bioRxiv preprint October 13, 2024; formally published in ACS Synthetic Biology, 2025).
The team used the affordable Opentrons OT-2 liquid-handling robot to scale up manufacturing of cell-free synthetic biology biosensors for point-of-need diagnostics (e.g., detecting fluoride in drinking water). They developed a semi-automated protocol that precisely assembles viscous cell-free reaction mixes (DNA template + PANOx extract + buffers) into full 384-well plates in ~30 minutes—something that was previously done manually with high operator-to-operator variability.
Key novel application: They created and lyophilized hundreds of identical fluoride-riboswitch biosensors that can be rehydrated in the field and give a clear colorimetric or fluorescent readout. By optimizing robot parameters (dispense height, mix volume, aspiration rate), they achieved reproducibility that matched or exceeded manual assembly while drastically reducing hands-on time and batch-to-batch variation. This opens the door to cheap, deployable diagnostics in low-resource settings (they reference prior field tests in Kenya and Costa Rica). The work is especially elegant because it shows how open-source automation turns cell-free systems from lab curiosities into manufacturable products—exactly the kind of scalability we need in synthetic biology.
2. What I Intend to Do with Automation Tools for My Final Project
Project Title: Microbial “Plastic Eaters” – Engineering On-Site Industrial Recycling Pods with Recombinant PETase/MHETase in a Cell-Free + Bacterial Pipeline
My final project builds a portable “recycling pod” that uses engineered bacteria (or their secreted enzymes) to break down PET plastic waste directly on factory floors. The bottleneck is rapid optimization of PETase and MHETase variants for faster degradation, higher temperature tolerance, and better secretion. Automation will let me screen dozens-to-hundreds of variants in parallel, run degradation assays remotely, and iterate in days instead of weeks.
Here is exactly what I plan to automate:
A. High-Throughput Variant Library Assembly & Cell-Free Expression Screening (Primary automation goal – inspired by the cell-free biosensor paper above)
Opentrons OT-2 (or cloud lab equivalent) will perform Golden Gate assembly of PETase mutant libraries (active-site saturation + secretion-signal variants).
Echo transfer or Opentrons p20 multi-channel will dispense 50–100 ng of each linearized plasmid + cofactors into 96-well or 384-well plates.
Bravo / Opentrons stamps in the cell-free protein synthesis (CFPS) master mix (E. coli lysate + energy components).
Multiflo dispenses the full reaction volume to start expression.
PlateLoc seals the plate.
Inheco or Opentrons temperature module incubates at 30 °C / 37 °C for 4–16 h.
XPeel removes seal.
PHERAstar or plate reader measures either (a) fluorescence (GFP-fused PETase) or (b) enzymatic activity via p-nitrophenyl ester surrogate substrate at 405 nm.
Pseudocode / Opentrons Python sketch:
```python
from opentrons import protocol_api

metadata = {'apiLevel': '2.15'}

def run(protocol: protocol_api.ProtocolContext):
    # Labware
    tiprack = protocol.load_labware('opentrons_96_tiprack_20ul', 1)
    source_plate = protocol.load_labware('nest_96_wellplate_200ul_flat', 2)  # DNA variants
    cfps_plate = protocol.load_labware('nest_96_wellplate_200ul_flat', 3)
    master_mix = protocol.load_labware('nest_12_reservoir_15ml', 5)          # CFPS master mix
    temp_module = protocol.load_module('temperature module gen2', 4)
    temp_module.set_temperature(30)
    p20 = protocol.load_instrument('p20_multi_gen2', 'left', tip_racks=[tiprack])

    # Step 1: transfer DNA variants (12 columns x 8-channel head = 96 variants);
    # transfer() handles tip pickup and drop itself
    for col in range(12):
        p20.transfer(2,
                     source_plate.columns()[col],
                     cfps_plate.columns()[col],
                     mix_after=(3, 10),
                     new_tip='always')

    # Step 2: add CFPS master mix; with the 8-channel head, targeting row A
    # fills every column (18 uL + 2 uL disposal fits the 20 uL pipette)
    p20.distribute(18, master_mix.wells()[0], cfps_plate.rows()[0],
                   disposal_volume=2)

    # Incubate & read later
    protocol.pause("Incubate 6 h at 30 °C")
```
B. 3D-Printed Custom Holders (from Opentrons 3D Printing Directory style) I will design and print (using the class Prusa or lab printer) a PET-flake assay tray: a 96-well-compatible holder that securely positions 5 mm × 5 mm shredded PET flakes or thin PET film strips at the bottom of each well. The holder has sloped walls and a mesh bottom so supernatant can be easily aspirated for downstream HPLC or weight-loss measurements without losing plastic particles. This turns a messy manual assay into a clean, robot-friendly 96-well format.
C. Cloud-Lab Integration (Ginkgo Nebula / similar remote biofoundry) Once top variants are identified on the Opentrons, I will upload the best 10–20 constructs to Ginkgo Nebula (or equivalent cloud laboratory) for larger-scale bacterial expression and real PET degradation in 1 L bioreactors. The cloud lab will:
Run parallel fermentations with automated sampling.
Perform continuous OD600, pH, and TPA/EG monomer quantification via inline HPLC.
Return lyophilized enzyme powders ready for pod prototyping.
D. Full Degradation Validation Loop After cell-free hits, Opentrons will set up 24–48 replicate mini-reactions with purified enzyme + real factory PET scraps, incubate with shaking, and automatically sample at 0/24/48/72 h for mass-loss and LC-MS readout. This closed loop (design → assemble → express → assay → analyze) will run with minimal intervention, letting me test 50+ variants per week.
By combining the Opentrons for precision liquid handling, 3D-printed custom labware for PET-specific assays, and cloud-lab scale-up, I will move from gene sequence to validated high-performance enzyme cocktail in a matter of weeks—exactly what an industrial recycling pod needs. This automation plan directly mirrors the cell-free biosensor paper’s success in scaling reproducible reactions and will make my project robust, repeatable, and genuinely ready for Lagos factory floors.
Week 1: Principles & Practices
About Me
My name is Peter Olawumi, and I’m based in Ibadan, Nigeria. As a software developer with the handle @dev_roc, I’m passionate about bridging technology and biology to create innovative, accessible solutions for real-world problems, especially in the Global South. Joining HTGAA is an exciting opportunity to explore synthetic biology and apply it to challenges like waste management in our growing industrial sectors.
Proposed Biological Engineering Application or Tool
I propose developing microbial “Plastic Eater” pods for on-site industrial recycling. These are compact, factory-floor bioreactors using engineered bacteria to break down PET plastic waste into reusable monomers.
Why this? In bustling manufacturing plants in Lagos and Ibadan, discarded PET bottles and packaging pile up daily, leading to costly hauling, environmental pollution, and health risks from microplastics. Traditional recycling is energy-intensive and inefficient, with global rates at just 18%. In Nigeria, informal recycling dominates but lags in efficiency. My tool would be a lunchbox-sized pod that processes 500g-1kg of PET scraps per cycle at ambient temperatures, yielding 80-90% monomer recovery (terephthalic acid and ethylene glycol) for repolymerization or new chemicals. It’s low-energy, scalable, and deployable without shipping, inspired by natural degraders like Ideonella sakaiensis, supercharged with synthetic biology for faster action.
The core: Engineer Ideonella sakaiensis or a surrogate like Pseudomonas putida with optimized PETase and MHETase enzymes, fused to secretion signals and reporters for efficiency. This could cut waste transport emissions by 40%, create bio-recycling jobs, and align with UN SDG 12 for sustainable consumption.
Governance/Policy Goals
To ensure this tool contributes to an ethical future, I focus on non-malfeasance (preventing harm). I’ve adapted the synthetic genomics framework for safety/security and equity.
Goal 1: Biosafety Lockdown – Prevent Unintended Microbial Escapes and Toxicity. This goal aims to contain recombinant strains so they cannot cause ecological disruptions, such as outcompeting native microbes or leaching toxins into biodiverse areas like the Lagos lagoons.
Sub-goal 1a: Engineered Containment Mechanisms – Integrate two orthogonal kill switches (e.g., mazEF toxin-antitoxin and light-inducible CRISPRi) in plasmids. Validate with in vitro escape assays (>99.99% die-off in 48 hours via qPCR).
Sub-goal 1c: Toxicity Profiling for Byproducts and Enzymes – Conduct assays on outputs (Ames test for genotoxicity <2x induction; yeast screen for endocrine disruption EC50 >100μM). Cap enzyme secretion to avoid risks.
Goal 2: Equitable Deployment – Ensure Broad Access Without Widening Industrial Divides This prevents social harms like job displacement, promoting inclusive scaling inspired by the African Union’s biotech equity charter.
Sub-goal 2a: Open-Source IP and Tech Transfer – Classify designs as Creative Commons (CC-BY-SA) for non-commercial use in developing economies. Host on iGEM registry with modular parts for local adaptations.
Sub-goal 2b: Socio-Economic Impact Audits – Use agent-based modeling (NetLogo) to forecast job shifts (e.g., aim for Gini coefficient drop <0.1). Include community “right-to-reject” via town halls (>60% approval).
Sub-goal 2c: Adaptive Monitoring for Long-Term Equity – Integrate IoT sensors into pods for blockchain-ledger yield tracking (70% monomer value back to operators). Cap market share (<30%) to avoid over-reliance.
Governance Actions
I’ve outlined three actions: a regulatory rule, an incentive program, and a technical strategy, involving different actors. Analogies draw from drones (certification), finance (buffers), and 3D printing (open designs).
Action 1: Mandatory Pre-Deployment “Escape-Proof” Certification (Regulatory Rule by Federal Agencies) Analogy: FAA drone certification for safe airspace.
Purpose: Current Nigerian biosafety (NBMA 2015 Act) is ad-hoc, risking spills. Propose standardized “synbio passport” with <0.01% escape risk proven via simulations, shifting to proactive approvals.
Design: Amend Biosafety Regulations (2020) for dossiers (COPASI models, assays, audits). Actors: NBMA approves (6-month review); companies fund (₦500k-1M, offset by permits); academics validate. Use open API for data.
Assumptions: Regulators have capacity (50+ assessors); models translate to real-world (e.g., floods); industry complies without loopholes.
Action 2: “Green Pod” Subsidy Incentives with Equity Audits (Incentive Program by Industry-Academia Consortia) Analogy: Basel III capital buffers for financial resilience.
Purpose: Factories prioritize profits over equity; propose 40% tax credits for adopters passing audits (30% revenue shared with informal sectors), shifting to impact investing.
Design: Co-designed by MAN/universities, funded by 1% levy (₦10B pot). Actors: Companies self-audit (NetLogo); consortia approve; NGOs monitor. Use blockchain for payouts; train 1k workers/year.
Assumptions: Big firms lead (70% pilot adoption); audits capture nuances; economic stability holds.
Action 3: Open-Source “Watchdog” Microbial Sentinel Network (Technical Strategy by Academic Researchers) Analogy: Thingiverse for 3D printing with safety mods.
Purpose: Fragmented tracking leaves surveillance gaps; propose free platform with sentinel kits (qPCR for HGT) for crowdsourced monitoring, shifting to community-driven oversight.
Design: Led by UNILAG/iGEM Africa with $500k grants. Actors: Researchers upload (CC-BY); factories deploy ($50/unit); NBMA integrates. Use Raspberry Pi/ML for alerts; beta in HTGAA, then 100-node pilot.
Assumptions: Open-source thrives (1k contributors); low-tech adoption; data privacy holds.
Using an adapted rubric (1 = best/strong positive, 3 = weak/neutral, n/a = not applicable):
Does the option:                    Action 1   Action 2   Action 3
Enhance Biosecurity
• By preventing incidents              1          2          1
• By helping respond                   2          3          1
Foster Lab Safety
• By preventing incidents              1         n/a         2
• By helping respond                   2         n/a         1
Protect the Environment
• By preventing incidents              1          2          2
• By helping respond                   2          3          1
Promote Equity
• By ensuring access                   3          1          2
• By minimizing divides                3          1          2
Other Considerations
• Minimize costs/burdens               2          1          1
• Feasibility                          2          2          1
• Not impede research                  3          2          1
• Promote constructive apps            2          1          2
Explanation: Action 1 excels in prevention but burdens innovation (higher costs). Action 2 boosts equity and feasibility via incentives but weaker on direct security. Action 3 is feasible and responsive but risks privacy issues.
Prioritization and Trade-offs
I prioritize a combination of Action 2 (incentives) and Action 3 (sentinel network), starting with academics and industry consortia, targeted at national audiences like Nigeria’s Ministry of Science & Technology and international like the African Union. Why? This balances proactive equity (Action 2’s audits prevent divides) with responsive monitoring (Action 3’s crowdsourcing flags harms early), scoring well on feasibility and constructive uses without heavy regulation that could slow adoption in resource-limited settings.
Trade-offs: Incentives may increase short-term costs (levy) but yield long-term savings (20% waste reduction); open-source risks IP theft but promotes access. Assumptions: Strong community buy-in (e.g., 70% SME uptake); uncertainties include enforcement in informal sectors and tech literacy. If unaddressed, fall back to Action 1 for high-risk deployments.
Reflection on Class Learnings
From lectures by David Kong, George Church, and Joe Jacobson, I learned about biotech’s rapid evolution and ethical imperatives like biosecurity and equity. A new concern for me: In the Global South, unequal access could exacerbate divides—e.g., advanced tools benefiting only elites. Another: Dual-use risks, where degraders might be misused for harmful polymers.
To address: Propose mandatory equity clauses in grants (e.g., 20% project budget for community training) and international standards for dual-use reviews (adapt WHO guidelines). This ties to my project, emphasizing open designs with built-in safeties.
Lecture 2 Preparation – Homework Answers
For Professor Jacobson Lecture
Error Rate of Polymerase
The error rate of nature's DNA polymerase (that is, a replicative polymerase together with its built-in error correction) is approximately 1 error per 10⁹ (1 billion) base pairs added.
The human genome is roughly 3 × 10⁹ (3 billion) base pairs long. This means that, on average, DNA replication of the entire human genome would introduce about 3 errors per replication cycle if relying solely on this error rate.
Biology addresses this discrepancy through multiple layers of error correction and repair mechanisms beyond the base polymerase error rate. These include:
Built-in proofreading via 3’–5’ exonuclease activity in the polymerase itself, which immediately detects and corrects mismatches during synthesis.
Post-replication mismatch repair systems that scan for and fix errors shortly after replication.
Additional DNA repair pathways (e.g., base excision repair, nucleotide excision repair, and double-strand break repair) that operate continuously to detect and correct damage from replication errors, environmental factors, or spontaneous mutations.
These combined mechanisms can reduce the effective mutation rate to as low as 10⁻¹⁰ per base pair in vivo, ensuring genome stability across cell divisions.
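The arithmetic behind these figures is straightforward:

```python
# Expected new errors per replication of a 3 x 10^9 bp genome
# at the error rates quoted above.
genome_bp = 3e9
print(genome_bp * 1e-9)   # net polymerase error rate  -> 3.0 errors
print(genome_bp * 1e-10)  # after repair pathways      -> ~0.3 errors
```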
Number of Ways to Code for an Average Human Protein
An average human protein is encoded by approximately 1036 base pairs of DNA, corresponding to about 345 amino acids (since each amino acid is coded by a 3-base codon, or triplet).
The genetic code uses 64 possible codons (4³) to specify 20 amino acids and 3 stop signals. Excluding stop codons, there are 61 codons for the 20 amino acids, yielding an average degeneracy of about 3.05 codons per amino acid.
For a specific protein sequence of 345 amino acids, the total number of different DNA nucleotide sequences (coding sequences) that could translate to the exact same amino acid sequence is enormous — on the order of 3.05³⁴⁵ ≈ 10¹⁶⁷.
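The combinatorics can be made concrete in a few lines. The per-residue codon counts come from the standard genetic code, and the final line reproduces the ~10¹⁶⁷ estimate as an exponent:

```python
import math

# Count exact synonymous codings for a peptide by multiplying per-residue
# codon degeneracies (standard genetic code; table covers all 20 residues).
SYNONYMS = {'L': 6, 'R': 6, 'S': 6, 'A': 4, 'G': 4, 'P': 4, 'T': 4, 'V': 4,
            'I': 3, 'K': 2, 'N': 2, 'D': 2, 'E': 2, 'Q': 2, 'H': 2, 'C': 2,
            'F': 2, 'Y': 2, 'M': 1, 'W': 1}

def num_codings(peptide: str) -> int:
    n = 1
    for aa in peptide:
        n *= SYNONYMS[aa]
    return n

print(num_codings("MAGIC"))  # 1 * 4 * 4 * 3 * 2 -> 96

# Average degeneracy 61/20 = 3.05; exponent for a 345-residue protein:
print(345 * math.log10(61 / 20))  # ~167, i.e. on the order of 10^167
```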
In practice, not all of these theoretically possible coding sequences work effectively to produce the protein of interest (especially in the context of gene synthesis and expression). Important limiting factors include:
Codon usage bias — different organisms prefer certain synonymous codons due to tRNA abundance
mRNA secondary structure and stability (hairpins, degradation signals)
Synthesis errors — chemical DNA synthesis has higher error rates (~1:10² per base)
Regulatory constraints (e.g., in recoded organisms with codon reassignment)
Functional impacts of synonymous changes on folding, translation kinetics, and expression levels
For these reasons, synthetic genes are usually designed with a subset of “optimal” codons rather than exploring the full theoretical space.
For Dr. LeProust Lecture
Most Commonly Used Method for Oligo Synthesis Currently
The most commonly used method for oligonucleotide (oligo) synthesis is solid-phase phosphoramidite chemistry.
This involves a cyclic process on a solid support (controlled pore glass or silicon-based chips, as used by Twist Bioscience):
Coupling — DMT-protected phosphoramidite monomer is added to the growing chain
Capping — Unreacted sites are capped to prevent further extension
Oxidation — Phosphite linkage is oxidized to a stable phosphate
Deblocking — DMT group is removed to allow the next coupling
This method, developed in the early 1980s, remains the industry standard for automated, high-throughput oligo synthesis.
Why It Is Difficult to Make Oligos Longer Than 200 nt Via Direct Synthesis
Direct chemical synthesis of oligos longer than ~200 nucleotides is challenging primarily due to the limitations of coupling efficiency in phosphoramidite chemistry (typically 98–99% per step).
For a 200 nt oligo, theoretical yield of full-length product is approximately (0.99)¹⁹⁹ ≈ 13%, but in practice it is significantly lower due to accumulating side reactions such as:
Depurination (acid-induced base loss)
Incomplete deprotection
Branching and other side products
These issues cause exponential yield drop and increasing error accumulation (deletions, insertions, substitutions), making purification of full-length, error-free products very difficult beyond ~200 nt.
While advanced platforms (e.g. Twist Bioscience) have improved chemistry to routinely reach ~350 nt and demonstrated ~700 nt experimentally (with ~97% full-length material), these are not standard for direct synthesis beyond 200 nt.
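The exponential fall-off is easy to tabulate; this sketch assumes a uniform 99% per-step coupling efficiency, the idealized figure used above:

```python
# Full-length yield for an n-mer at a uniform per-step coupling efficiency:
# every nucleotide after the first requires one successful coupling.
def full_length_yield(n: int, efficiency: float = 0.99) -> float:
    return efficiency ** (n - 1)

# Tabulate the lengths discussed in this section:
for n in (100, 200, 350, 700, 2000):
    print(f"{n:5d} nt  ->  {full_length_yield(n):.2e}")
```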
Why You Can’t Make a 2000 bp Gene Via Direct Oligo Synthesis
A 2000 base pair gene cannot be made via direct oligo synthesis because current chemical methods are fundamentally limited in length (routine max ~350 nt, experimental ~700 nt).
Attempting 2000 bp directly would result in near-zero yield due to:
Extremely low coupling efficiency over thousands of steps → theoretical yield (0.99)¹⁹⁹⁹ ≈ 10⁻⁹ (practically nonexistent)
Massive accumulation of chemical errors (depurination, oxidation byproducts, etc.)
Impractical purification at that scale
Instead, genes of this length are constructed by assembling multiple shorter oligos (typically 50–300 nt) using enzymatic methods such as:
Gibson assembly
Enzymatic assembly platforms (e.g. Twist HELIX2)
Followed by cloning, error correction, and verification via long-read sequencing
This modular approach overcomes the direct synthesis length barrier.
For George Church Lecture
Suggested Code for AA:AA Interactions
For AA:AA (amino acid–amino acid) interactions in proteins — which enable folding, oligomerization, and interfaces (analogous to NA:NA basepairing or AA:NA ribosomal translation) — I suggest a Side Chain Complementarity Code based on physicochemical properties of amino acid side chains.
This probabilistic code categorizes preferred pairings:
Hydrophobic–Hydrophobic — van der Waals forces (e.g. Leu ↔ Ile, Val ↔ Phe) → core stabilization, coiled-coils, β-sheets
Special / covalent — disulfide bonds (Cys ↔ Cys), metal coordination (e.g. His ↔ His via Zn²⁺)
This framework aligns with natural protein interaction rules and could be extended for synthetic biology applications, e.g. incorporating non-standard amino acids to create novel interaction pairs.
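As a toy illustration of how such a code could be operationalized, the sketch below classifies a residue pair into one of the categories named above. The groupings are an illustrative subset, not a validated interaction model:

```python
# Toy lookup for the proposed side-chain complementarity code: classify a
# residue pair by physicochemical class. Groupings are an illustrative
# subset of the categories above, not a validated interaction model.
HYDROPHOBIC = set('AVLIFMW')
SPECIAL = {('C', 'C'): 'disulfide bond',
           ('H', 'H'): 'metal coordination'}

def classify_pair(a: str, b: str) -> str:
    if (a, b) in SPECIAL:
        return SPECIAL[(a, b)]
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return 'hydrophobic packing (van der Waals)'
    return 'unclassified'

print(classify_pair('L', 'I'))  # -> hydrophobic packing (van der Waals)
print(classify_pair('C', 'C'))  # -> disulfide bond
```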