Week 1 HW: Principles and Practices


First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project, something you are already doing in your research, or something you are just curious about.

I’m excited to develop AstroMicrobes AI, a computational platform that serves as a biological engineering tool for comparative genomics and synthetic design. This idea is inspired by my research interests in astrobiology and synthetic biology, where I’ve been exploring how microbial evolution in extreme environments (like space) can inform bioengineering solutions. It’s also driven by curiosity about bridging dry-lab computational tools with wet-lab-free innovations.

Figure 1: Extremophiles in Space Exploration | Indian Journal of Microbiology | Springer Nature Link

What is Project-AstroMicrobes? Project-AstroMicrobes is a web-based computational platform that analyzes microbial genomes from space and Earth environments to detect mutations, predict their biological impacts, assess risks, and propose engineered solutions. At its core, it’s a biological engineering application that combines bioinformatics and synthetic biology design tools to create actionable insights without requiring a physical lab (dry-lab only). Users upload genome sequences (e.g., FASTA files), and the platform processes them through a six-step workflow:

  • Input: Upload space microbe sequences (e.g., from NASA GeneLab) and Earth counterparts (e.g., from NCBI).
  • Comparison: Align sequences and detect mutations (e.g., SNPs, insertions/deletions) using tools like BLAST or MAFFT.
  • Functional Prediction: Use AI models (e.g., trained on protein databases) to predict changes in protein structure and function.
  • Risk Scoring: Generate quantitative scores for adaptation trends, resistance potential, and pathogenicity risks.
  • Drug Target Mapping: Identify altered proteins and suggest potential drug targets by cross-referencing with databases like DrugBank.
  • Synthetic Design Suggestions: Propose engineered genetic constructs (e.g., CRISPR edits or synthetic promoters) to mitigate harmful traits or enhance beneficial ones, with computational simulations for efficacy.
The output includes visualizations (e.g., mutation maps), risk reports, and downloadable design blueprints (e.g., in GenBank format) for lab validation by collaborators.
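
As a minimal sketch of the Comparison step, substitution calling between a space-flown isolate and its Earth counterpart might look like the following. This assumes the two sequences have already been aligned (e.g., by MAFFT, so they are equal length with `-` gap characters); the function name and gap handling are illustrative, not part of any existing tool:

```python
from typing import List, Tuple

def call_snps(ref: str, query: str) -> List[Tuple[int, str, str]]:
    """Report (position, ref_base, query_base) for each substitution.

    Positions are 0-based indices into the alignment; gap columns ('-')
    are skipped, since those represent insertions/deletions, not SNPs.
    """
    return [
        (i, r, q)
        for i, (r, q) in enumerate(zip(ref, query))
        if r != q and r != "-" and q != "-"
    ]

# Example: one substitution (G -> C) at alignment position 2
snps = call_snps("ATGCA", "ATCCA")
```

A real pipeline would of course run BLAST or MAFFT first and classify indels separately; this only illustrates the core comparison logic.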

Why Do I Want to Develop This Tool? Microbes in space undergo rapid mutations due to radiation and microgravity, which can lead to adaptations like increased antibiotic resistance or pathogenicity. Traditional genomics tools (e.g., for Earth-based pathogens) don’t account for these space-specific factors. Project-AstroMicrobes fills this gap by enabling comparative analysis, helping bioengineers design safer microbes for space missions (e.g., planetary protection) or Earth applications (e.g., combating antibiotic resistance). It’s inspired by my curiosity about how space data can “hack” evolution, turning observational biology into proactive engineering.

Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security alongside other goals, like promoting constructive uses, but you could propose other goals, for example those relating to equity or autonomy.

  1. Ensuring Safety and Security: This big goal focuses on non-malfeasance by safeguarding against biological hazards, especially in space missions where engineered microbes could pose planetary contamination risks.

Sub-Goal 1: Implement Mandatory Biosafety Checks in Design Outputs. Require the platform to automatically include biosafety features (e.g., kill switches or containment sequences) in all proposed synthetic constructs, with simulations to verify stability and non-proliferation. This prevents accidental harm from lab validation of designs.

Sub-Goal 2: Require Independent Ethical Review of High-Risk Designs. Integrate a user-facing “ethical review” module where designs flagged as high-risk (e.g., those enhancing pathogenicity) must be vetted by independent experts or regulatory bodies (e.g., via a pre-submission check against biosecurity databases like those from the WHO or CDC). This mitigates intentional misuse, such as weaponizing space-adapted microbes.

  2. Promoting Constructive Uses (Encouraging Beneficial Applications)

Sub-Goal 1: Prioritize Open-Source Access for Research and Education. Make core platform features freely available for academic and non-profit use, with incentives (e.g., grants) for applications in space exploration or global health. This promotes transparency and constructive collaborations, reducing the risk of proprietary misuse.

Sub-Goal 2: Develop Use-Case Guidelines and Monitoring. Create policy guidelines limiting commercial uses to ethical domains (e.g., drug discovery vs. bioweapons), with built-in tracking of design downloads and user reports. This ensures ongoing oversight, encouraging innovation while preventing diversion to harmful ends.

  3. Ensuring Equity and Autonomy (Fostering Inclusive Access and User Control)

Sub-Goal 1: Guarantee Equitable Access to Data and Tools. Policy mandates free or subsidized access for underrepresented groups (e.g., researchers in developing countries) and integration with global databases (e.g., open NCBI data), preventing knowledge gaps that could exacerbate health inequities in microbial threats.

Sub-Goal 2: Empower User Autonomy Through Transparency and Consent. Require clear disclosures on AI biases, data privacy (e.g., anonymizing uploaded genomes), and opt-in features for sharing designs. Users must affirm ethical use, fostering autonomy and reducing harm from uninformed or coerced applications.

Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).

  • Purpose: What is done now and what changes are you proposing?
  • Design: What is needed to make it “work”? (including the actor(s) involved: who must opt-in, fund, approve, or implement, etc.)
  • Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
  • Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

Action 1: Regulatory Rule – Mandatory Pre-Submission Risk Assessment for High-Risk Designs

Purpose: Currently, synthetic biology tools allow users to propose genetic constructs without centralized oversight, leading to potential biosecurity gaps (e.g., designs that could enhance microbial pathogenicity for space or Earth applications). I propose a new federal rule requiring users to submit high-risk design outputs (e.g., those with elevated risk scores for pathogenicity) to a regulatory body for pre-approval, similar to how the FAA requires licensing for commercial drones to prevent unauthorized flights. This change shifts from voluntary self-regulation to mandatory vetting, ensuring non-malfeasance by blocking harmful proposals before lab implementation.

Design: To make this work, federal regulators would establish a centralized portal integrated with Project-AstroMicrobes. Users must opt-in by uploading flagged designs; the body would fund and staff reviewers (biologists and ethicists) to approve/reject within 30 days, with appeals processes. Companies developing the platform (e.g., a biotech firm hosting it) would implement API integrations for automatic submissions, and law enforcement could access anonymized logs for investigations. No user consent overrides the rule; it’s enforced via software locks on high-risk outputs.
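
A toy sketch of the proposed software lock; the 0.7 threshold, the field names, and the `gate_design` function are my own illustrative assumptions, not an existing regulation or API:

```python
def gate_design(design_id: str, risk_score: float, threshold: float = 0.7) -> dict:
    """Hypothetical gate: designs at or above the threshold are locked
    until a regulatory reviewer clears them; everything else is released."""
    if risk_score >= threshold:
        return {
            "design": design_id,
            "status": "locked",
            "action": "submit to regulatory portal for pre-approval",
        }
    return {
        "design": design_id,
        "status": "released",
        "action": "download enabled",
    }

# A high-scoring pathogenicity design is held back; a benign one passes through
held = gate_design("construct-A", risk_score=0.92)
passed = gate_design("construct-B", risk_score=0.15)
```

In practice the threshold would be set by the regulator, not hard-coded, and the lock would be enforced server-side so it cannot be patched out by users.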

Assumptions: I assume regulators have the expertise and resources to handle increased workloads without delays, and that users will comply rather than circumvent via offline tools. Uncertainties include whether “high-risk” thresholds (based on AI scores) are universally agreed upon, potentially varying by jurisdiction.

Risks of Failure & “Success”: Failure could occur if regulators are underfunded, leading to backlogs and users abandoning the platform (analogous to how drone regulations initially slowed adoption). Unintended consequences of “success” might include over-regulation stifling innovation (e.g., benign designs delayed), or a “chilling effect” where researchers avoid space-focused projects due to bureaucracy, reducing equitable access for global teams. In extreme failure, black-market designs could emerge, worsening dual-use risks.

Action 2: Incentive – Corporate Grants for Constructive Applications (Analogous to Financial System Tax Incentives)

Purpose: Today, companies in synthetic biology often prioritize profit-driven uses (e.g., proprietary drug designs), with limited incentives for ethical features. I propose companies offer grants or tax credits for verified constructive uses, akin to financial systems’ AML incentives where banks earn rewards for fraud prevention. This encourages directing the tool toward beneficial ends, promoting equity by funding underrepresented researchers in developing regions for health applications.

Design: Companies would fund a matching-grant program administered through a neutral body like the Synthetic Biology Consortium. Academic researchers or startups must opt-in by submitting proposals via the Project-AstroMicrobes interface, detailing how designs address ethical goals. Approval requires peer review and public reporting; law enforcement could monitor for misuse claims. Implementation involves companies integrating grant applications into the platform, with funds disbursed quarterly to successful applicants.

Assumptions: I assume companies view this as a PR win (enhancing brand ethics) and that grants won’t be gamed by low-quality proposals. Uncertainties include market volatility affecting funding availability and whether incentives truly shift priorities away from commercial gains.

Risks of Failure & “Success”: Failure might happen if grants are too small or biased toward large institutions, failing to reach equitable users and perpetuating disparities (like how financial incentives sometimes favor wealthy firms). Success could lead to unintended consequences, such as “grant farming” where researchers inflate benefits, or over-commercialization where ethical projects become profit-driven, undermining autonomy and potentially introducing biases in AI models trained on grant-funded data.

Action 3: Technical Strategy – Built-in AI Bias Audits and User Transparency Features (Analogous to 3D Printing Safety Filters)

Purpose: Existing tools may harbor AI biases, with limited user awareness of limitations. I propose academic researchers develop and mandate a technical strategy where the platform includes automated bias audits and transparency dashboards, similar to 3D printing software that filters out unsafe designs (e.g., blocking gun blueprints). This ensures equity and autonomy by empowering users to make informed decisions, reducing harm from flawed predictions.

Design: Academic researchers would lead development, funding open-source audits using tools like Fairlearn for AI fairness. Platform developers (companies or open-source communities) must implement this as a core feature (users opt-in to view audits and customize settings). Regulators could approve audit standards, and law enforcement could access logs for forensic analysis. There is no opt-out for the audits themselves; they are enforced via code updates.
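
One common audit metric is the demographic parity difference (the quantity that Fairlearn’s `fairlearn.metrics.demographic_parity_difference` reports). A self-contained toy version, with invented group labels and predictions purely for illustration, might look like:

```python
from collections import defaultdict
from typing import Iterable

def parity_difference(groups: Iterable[str], preds: Iterable[int]) -> float:
    """Largest gap in positive-prediction rate between any two groups.

    0.0 means every group receives positive predictions at the same rate;
    larger values indicate the model treats some group differently.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for g, p in zip(groups, preds):
        counts[g][0] += p
        counts[g][1] += 1
    rates = [pos / total for pos, total in counts.values()]
    return max(rates) - min(rates)

# Toy audit: group "a" is flagged "high risk" twice as often as group "b"
gap = parity_difference(["a", "a", "b", "b"], [1, 1, 0, 1])
```

A real dashboard would compute this per model output (risk score, pathogenicity flag, etc.) across genome-source regions and surface the gaps to users.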

Assumptions: I assume researchers can access diverse datasets for audits and that users value transparency over ease of use. Uncertainties include technical feasibility and whether biases are fully detectable without human oversight.

Risks of Failure & “Success”: Failure could result from poor audit design that produces false positives, eroding user trust and driving abandonment. Successful implementation might unintentionally create over-reliance on audits, reducing user autonomy, or expose sensitive data in transparency features, inviting cyberattacks that compromise equity by targeting vulnerable users.

References

  1. Synthetic biology, security and governance. PMC, NIH. https://pmc.ncbi.nlm.nih.gov/articles/PMC7100137/
  2. Principles & Practices Assignments, HTGAA class site. http://fab.cba.mit.edu/classes/S63.21/class_site/pages/class_1.html
  3. Lin, Albert C. Do-It-Yourself Biology and 3D Printing. SSRN. https://papers.ssrn.com/sol3/Delivery.cfm/SSRN_ID3100387_code345050.pdf?abstractid=3100387&mirid=1
  4. Webinar on Synthetic Biology Governance and Cooperation Opportunities. YouTube. https://www.youtube.com/watch?v=eQu4hVv0DAI

Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

Does the option:                                    Option 1   Option 2   Option 3

Enhance Biosecurity
  • By preventing incidents                             1          1          2
  • By helping respond                                  2          1          2
Foster Lab Safety
  • By preventing incidents                             1          2          2
  • By helping respond                                  2          2          1
Protect the environment
  • By preventing incidents                             2          2          1
  • By helping respond                                  2          2          1
Other considerations
  • Minimizing costs and burdens to stakeholders        3          2          2
  • Feasibility                                         2          2          3
  • Not impeding research                               2          2          3
  • Promoting constructive applications                 1          1          2
Option 1 - Ethical Licensing
Option 2 - AI
Option 3 - Compliance Gateway
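
Since 1 is the best score, the rubric can be totaled to compare options (lower sum = better overall fit). A quick sketch, with the scores transcribed row by row from the table above:

```python
# Rubric scores in table row order (1 = best, 3 = worst)
scores = {
    "Option 1 (Ethical Licensing)":  [1, 2, 1, 2, 2, 2, 3, 2, 2, 1],
    "Option 2 (AI)":                 [1, 1, 2, 2, 2, 2, 2, 2, 2, 1],
    "Option 3 (Compliance Gateway)": [2, 2, 2, 1, 1, 1, 2, 3, 3, 2],
}

# Total each option; the lowest total is the best overall fit
totals = {name: sum(vals) for name, vals in scores.items()}
best = min(totals, key=totals.get)
```

Unweighted, Options 2 and 1 come out ahead of Option 3, which is consistent with the hybrid recommendation below; a real analysis might weight the biosecurity rows more heavily than cost or feasibility.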

Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

Audience: International research agencies, space agencies, and the global science community

Recommended strategy: A hybrid governance model. Based on the scoring matrix, I would prioritize a combination of Option 1 (Ethical Access Licensing) and Option 2 (AI), with Option 3 (Compliance Gateway) implemented gradually as an international standard.

This hybrid model provides the strongest balance between biosecurity, lab safety, environmental protection, and scientific freedom.

Why this combination? Ethical Access Licensing (Option 1): This option scored best for preventing biosecurity and lab safety incidents. It ensures that only trained, verified users can access powerful features.

Prevention is more ethical and cost-effective than response. This system creates a culture of responsibility before harm can occur.

AI Use (Option 2): This option scored highest for helping respond to threats and promoting constructive use. It acts like a biosafety firewall that adapts as new risks emerge.

Even well-trained users can make mistakes. This system provides continuous protection without requiring constant human oversight.

Compliance Gateway (Option 3): Although it scored lower for feasibility, it is essential for planetary protection and biodiversity ethics. It should be phased in through international agreements.

It requires legal alignment across nations and may slow innovation if enforced too early.

Homework Questions from Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Error Rate of DNA Polymerase

In biological DNA replication, the primary enzymatic machinery is an error-correcting DNA polymerase, which operates via template-dependent 5’-3’ primer extension, supplemented by 5’-3’ error-correcting exonuclease and 3’-5’ proofreading exonuclease activities. This system achieves an error rate of approximately 1:10^6 (one error per 10^6 nucleotides incorporated). This fidelity is attained at a throughput of 10 milliseconds per base addition, in stark contrast to chemical synthesis methods, which exhibit an error rate of 1:10^2 and lack inherent correction mechanisms.

Comparison to the Human Genome Length

The human genome is quantified in the slides as approximately 3.2 gigabase pairs (Gbp), equivalent to 3.2 × 10^9 base pairs. Applying the polymerase error rate of 1:10^6, a single replication cycle would theoretically introduce circa 3.2 × 10^3 errors (3.2 × 10^9 / 10^6). This disparity highlights a critical vulnerability: uncorrected errors at this scale could precipitate deleterious mutations, oncogenic transformations, or cellular inviability. The slides underscore biology’s adaptive advantage through a throughput-error rate product differential of ~10^8 relative to chemical approaches, facilitating the replication of extensive genomes with minimal disruption.
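
The arithmetic behind these estimates, using the figures from the slides (3.2 Gbp genome, 1:10^6 raw polymerase error rate, ~1:10^9 effective rate after proofreading and mismatch repair):

```python
genome_bp = 3.2e9  # human genome, base pairs

# Without repair: ~3,200 errors per replication cycle
raw_errors = genome_bp / 1e6

# With proofreading + mismatch repair (~1:10^9): only ~3 errors per cycle
repaired_errors = genome_bp / 1e9
```

This three-orders-of-magnitude reduction is what makes faithful replication of a gigabase-scale genome tenable.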

Biological Mechanisms for Error Mitigation

To reconcile this discrepancy, biological systems deploy multifaceted error-correction strategies, reducing the effective error rate to ~1:10^9 or lower in vivo.

Mechanisms include:

  1. Intra-synthetic Proofreading The 3’-5’ proofreading exonuclease excises mismatched nucleotides concurrently with polymerization.

  2. Post-incorporation Repair The 5’-3’ exonuclease activity enables excision and resynthesis of erroneous segments.

  3. Ancillary Repair Pathways Mismatch repair systems, such as the MutS complex (Lamers et al., 2000, as cited in the error correction section), perform post-replicative surveillance and rectification.

These processes render biological synthesis inherently “error-correcting,” in opposition to the open-loop paradigms of chemical methods (e.g., phosphoramidite cycles). Consequently, organisms can faithfully replicate large genomes, such as the human 3.2 Gbp, sustaining cellular and evolutionary viability where raw error rates would otherwise prove untenable.

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Number of Synonymous DNA Encodings for an Average Human Protein

The genetic code, characterized in the lecture slides as “Life’s Operating System,” consists of 64 codons specifying 20 amino acids and 3 termination signals, with degeneracy enabling multiple nucleotide triplets to encode identical amino acids. The slides indicate that an average human protein comprises 1,036 base pairs (bp), corresponding to approximately 345 codons (1,036 / 3 ≈ 345, excluding the stop codon).

The cardinality of synonymous DNA sequences for such a protein is contingent upon the amino acid composition, with individual residues encoded by 1-6 codons (e.g., 1 for methionine, 6 for leucine). Employing an average degeneracy of ~3 codons per amino acid (reflective of the code’s overall distribution), the theoretical number of encodings approximates 3^345, or on the order of 10^164 variants (log10(3^345) ≈ 164).
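
The back-of-envelope count, assuming the ~3 codons-per-residue average used above:

```python
import math

n_codons = 345       # ~1,036 bp / 3, per the slides, excluding the stop codon
avg_degeneracy = 3   # rough average of the code's 1-6 codons per amino acid

# log10 of 3^345, computed in log space to avoid overflow
log10_encodings = n_codons * math.log10(avg_degeneracy)  # ~164.6
```

An exact count for a specific protein would multiply the actual degeneracy of each residue (1 for Met and Trp, up to 6 for Leu, Ser, and Arg) rather than using this average.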

Practical Limitations Precluding Functionality of Many Encodings

  1. Codon usage bias: Organismal preferences for synonymous codons can attenuate translation efficiency, inducing ribosomal stalling or suboptimal tRNA utilization. Recoding discussions in the slides (e.g., for phage resistance) imply that non-preferred codons may engender expression failure in heterologous hosts.
  2. mRNA secondary structure: Sequences predisposed to deleterious folding can impede ribosomal procession or mRNA integrity. Illustrative cases from the slides depict minimum free energy (MFE) configurations at 25°C across GC contents of 10%, 50%, and 90%.
     • Low GC content (e.g., 10%) yields labile structures prone to degradation.
     • Elevated GC content (e.g., 90%) fosters hyperstable hairpins or loops, obstructing translation initiation.
     • These phenomena are governed by base-pairing free energies (A/T ≈ -1.2 kcal/mol; G/C ≈ -2.0 kcal/mol), with GC-rich motifs exacerbating folding propensity and hindering mRNA processing.
  3. Nuclease-sensitive motifs: Motifs susceptible to endonucleases, such as RNase III in Escherichia coli, precipitate premature mRNA degradation.
  4. Cryptic regulatory elements: Cryptic elements (e.g., promoters, terminators, or splice junctions) can disrupt transcriptional or post-transcriptional regulation.
  5. Synthesis constraints: Gene assembly challenges, particularly for repetitive or GC-biased sequences, amplify inaccuracies in chemical or enzymatic synthesis.
  6. Host context: Factors such as tRNA abundance or cellular milieu can preclude functional proteogenesis, necessitating optimization for applications like pharmaceutical or biofuel production.
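
As a small illustration of the GC-content screening implied by the folding point above, a candidate coding sequence can be checked against the extremes noted in the slides; the 30-70% comfort zone below is an illustrative assumption, not a value from the lecture:

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_flag(seq: str, low: float = 0.3, high: float = 0.7) -> bool:
    """True if GC content falls outside an assumed 30-70% comfort zone,
    i.e. toward the labile (low-GC) or hyperstable (high-GC) extremes."""
    return not (low <= gc_fraction(seq) <= high)

# A balanced sequence passes; a 90%-GC sequence is flagged
ok = gc_flag("ATGCATGCAT")      # 40% GC -> False
risky = gc_flag("GGGCGGGCGA")   # 90% GC -> True
```

Real codon optimizers also fold the candidate mRNA (MFE calculation) rather than relying on GC content alone.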

Homework Questions from Dr. LeProust

1. What’s the most commonly used method for oligo synthesis currently?

The most commonly used method for oligonucleotide (oligo) synthesis currently is solid-phase phosphoramidite chemistry. This approach, developed by Marvin Caruthers in 1981, involves a cyclic process of deblocking, coupling with a phosphoramidite nucleoside, capping unreacted sites, and oxidation, repeated for each nucleotide addition. It is performed on a solid support, such as controlled pore glass (CPG), enabling automated synthesis and high efficiency for short to medium-length oligos.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Synthesizing oligos longer than 200 nucleotides (nt) via direct chemical synthesis is challenging primarily due to the imperfect coupling efficiency in each cycle, typically around 98-99%. As the chain length increases, the yield of full-length product decreases exponentially according to the formula: yield ≈ (efficiency)^(n-1), where n is the number of nucleotides. For example, at 99% efficiency, the yield for a 200 nt oligo is approximately 0.99^199 ≈ 0.135 (13.5%), but for longer sequences it drops significantly, leading to low yields, increased truncated products, and higher error rates from side reactions or depurination. Additionally, purification becomes more difficult, and secondary structures in long sequences can hinder synthesis.

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Direct oligo synthesis cannot produce a 2000 base pair (bp) gene because the length far exceeds the practical limits of chemical synthesis methods like phosphoramidite chemistry, where yields become negligible (e.g., 0.99^1999 ≈ 10^-8.7, essentially zero). Genes of this size are double-stranded and require error-free sequences, but direct synthesis accumulates errors and impurities exponentially. Instead, long genes are assembled from shorter oligos (typically 40-300 nt) using enzymatic methods like PCR-based assembly or ligation, as illustrated in classical gene synthesis protocols, to achieve high fidelity and yield.
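
The exponential yield decay quoted in both answers can be checked directly, assuming a 99% per-cycle coupling efficiency and n - 1 couplings for an n-nt oligo:

```python
def full_length_yield(n_nt: int, coupling: float = 0.99) -> float:
    """Fraction of chains reaching full length after n_nt - 1 coupling cycles."""
    return coupling ** (n_nt - 1)

y_200 = full_length_yield(200)    # ~0.135, i.e. ~13.5% full-length product
y_2000 = full_length_yield(2000)  # ~10^-8.7, effectively zero
```

This is why gene synthesis pivots to assembling short oligos enzymatically: each 40-300 nt piece stays on the usable side of this curve.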

Homework Question from George Church

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

What code would you suggest for AA:AA interactions?

Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own:

The 10 essential amino acids required by all animals (those that cannot be synthesized endogenously in sufficient quantities and must be obtained through the diet) are: arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. This list is consistent across various species, including mammals like dogs, pigs, and horses, though some animals (e.g., cats) have an additional requirement for taurine. Slide #4 from Prof. Church’s lecture illustrates the standard genetic code mapping RNA codons to these amino acids (plus others), highlighting the ribosomal translation process that relies on this code to incorporate them into proteins.

The “Lysine Contingency” refers to a fictional genetic failsafe in Jurassic Park, where cloned dinosaurs were engineered to lack the ability to synthesize lysine, forcing dependence on park-supplied supplements to prevent survival if they escaped. Knowing that lysine is one of the 10 essential amino acids reinforces why this approach is inherently flawed: in nature, animals routinely obtain essential amino acids (including lysine) from dietary sources like plants (e.g., beans, soy) or other animals, without needing to synthesize them. Escaped dinosaurs could simply forage or hunt for lysine-rich foods, rendering the contingency ineffective, as depicted in the story where they thrive on Isla Nublar. This underscores the limitations of single-amino-acid dependency as a safety measure. More robust biocontainment, like full genome recoding to alter multiple codons (as in Syn61Δ3 bacteria), could create true barriers incompatible with wild-type biology, unlike the dietary workaround possible with lysine.