Week 1 HW: Principles and Practices

Introduction and Motivation

This week emphasized that biological engineering is not only about what we can build, but also how and why we choose to build it. The lectures and recitation highlighted that ethics, safety, security, and governance should not be treated as external constraints applied only after a technology is developed. Instead, they should be considered as integral design dimensions from the earliest stages of a project.

Revisiting a previous biosensing project through the HTGAA framework allowed me to explicitly articulate design decisions that were originally motivated by technical performance, but which also carry strong ethical, safety, and governance implications. This exercise helped me move beyond a purely technical evaluation and reflect more deeply on responsibility, context, accessibility, and downstream impact.

Class Assignment: Biological Engineering Application and Governance

Biological Engineering Application

The biological engineering application I focus on is a cell-free biosensor based on a Pb²⁺-specific DNAzyme coupled to CRISPR-Cas12a, designed for the ultrasensitive detection of lead in water.

Lead contamination is a serious public health concern, and no safe level of chronic lead exposure has been identified. While analytical techniques such as ICP-MS or atomic absorption spectroscopy provide high sensitivity and specificity, they require centralized laboratories, specialized equipment, trained personnel, and relatively long processing times. This limits their accessibility for frequent, decentralized, or field-based monitoring.

Previous generations of biological sensors, including whole-cell bacterial biosensors, demonstrated the feasibility of biological detection. However, whole-cell systems can suffer from long response times, relatively high detection limits, regulatory barriers, and biosafety concerns related to the use of living genetically modified organisms.

In contrast, this project deliberately adopts a cell-free, in vitro architecture. The goal is to translate the presence of Pb²⁺ into a fluorescent signal in under one hour, while reducing biological containment risks. The proposed system combines:

A Pb²⁺-responsive DNAzyme as the recognition module.
A DNA trigger released or exposed upon Pb²⁺-dependent cleavage.
A CRISPR-Cas12a amplification module activated by the DNA trigger.
A fluorescent reporter cleaved by activated Cas12a to produce a measurable signal.

The motivation behind this application is to combine high sensitivity, portability, and safety by design, enabling environmental monitoring in settings where conventional laboratory infrastructure is unavailable, while minimizing biological risks.

Governance and Policy Goals

Reframing this project within the HTGAA framework led to the identification of several governance and policy goals that extend beyond technical performance.

Goal A — Prevent Harm and Misuse

The first goal is to ensure that the technology does not enable harmful applications or irresponsible deployment.

Specific sub-goals include:

Avoid enabling biological manipulation, propagation, or amplification of hazardous agents.
Prevent repurposing of the sensing platform for unintended or harmful biological activities.
Avoid creating a false sense of security through poorly validated field tests.
Ensure that results are interpreted responsibly and not used to make unsupported public health or environmental claims.

Goal B — Enhance Biosafety and Biosecurity

The second goal is to reduce the biological risks associated with biosensor development and deployment.

Specific sub-goals include:

Minimize risks associated with handling living organisms by using a fully cell-free system.
Reduce the likelihood of accidental environmental release or uncontrolled replication.
Design the system so that it cannot reproduce, evolve, or persist in the environment.
Encourage safe handling, storage, and disposal of biological and chemical reagents.

Goal C — Promote Constructive and Equitable Use

The third goal is to ensure that the technology is used for beneficial, accessible, and socially responsible environmental monitoring.

Specific sub-goals include:

Enable access to sensitive environmental monitoring tools without requiring advanced infrastructure.
Support public health and environmental decision-making rather than surveillance, coercive enforcement, or unsupported alarmism.
Make limitations, false positives, false negatives, and validation requirements clear to users.
Encourage deployment in collaboration with local communities, public health actors, and environmental agencies.

Governance Actions

Option 1 — Safe-by-Design, Cell-Free System Architecture

Purpose

Many biosensing platforms rely on living cells, which introduce biosafety, containment, and regulatory challenges. This project replaces whole-cell systems with a fully cell-free, non-replicative architecture.

The proposed change is to integrate safety directly into the technical design. Instead of relying only on downstream regulation or user behavior, the system itself is designed to reduce the likelihood of biological release, persistence, or replication.

Design

This approach is implemented directly by academic researchers during the design phase and can be reinforced by funding agencies, institutional biosafety committees, and educational programs that prioritize safe-by-design technologies.

Key design features include:

No living genetically modified organisms in the final detection reaction.
No self-replicating biological components.
In vitro CRISPR-Cas12a activity limited to reporter cleavage.
Clear separation between detection chemistry and any organismal engineering.

Assumptions

This option assumes that:

Eliminating living components significantly reduces biosafety risks.
Performance can be maintained or improved in vitro.
The major risks of the platform are related more to deployment, interpretation, and reagent handling than to biological propagation.
Users will understand that a cell-free system is safer, but not risk-free.

Risks of Failure and “Success”

Failure risk: The system may be less robust in complex environmental matrices, such as dirty water samples containing inhibitors, particulates, organic matter, or competing metal ions.

Success risk: A highly portable test could be deployed too broadly without adequate validation, leading to overconfidence in results or inappropriate decision-making based on preliminary measurements.

Option 2 — Transparent Documentation of Limitations and Failures

Purpose

Scientific reporting often emphasizes successful outcomes while underreporting failures, optimization dead ends, matrix effects, and ambiguous results. This option proposes transparent documentation of both successful and unsuccessful experimental steps.

The goal is to improve reproducibility, avoid overclaiming, and make ethical reflection part of the scientific record.

Design

This action can be implemented through:

Detailed lab records.
Public documentation on the HTGAA website.
Clear separation between simulated, preliminary, and experimentally validated results.
Explicit reporting of failed designs, negative controls, and troubleshooting.
Discussion of limitations and uncertainties.

This action is mainly implemented by researchers, students, instructors, and academic communities, but it can also be encouraged by journals, funders, and training programs.

Assumptions

This option assumes that:

Transparency improves reproducibility.
Reporting failures can help others avoid repeating the same mistakes.
Open documentation builds trust.
Students and early-stage researchers can document uncertainty without being penalized for not having a perfect final result.

Risks of Failure and “Success”

Failure risk: Documentation could become superficial or performative if researchers include generic statements without meaningful detail.

Success risk: Excessive documentation requirements could increase workload, especially for students and early-stage researchers, and could discourage experimentation if not balanced with practical expectations.

Option 3 — Context-Specific Deployment Guidelines

Purpose

Environmental biosensors may be deployed in diverse contexts with different ethical, social, legal, and public health implications. A test used for classroom demonstration is not equivalent to a test used for regulatory enforcement or public health decision-making.

This option proposes context-aware deployment guidelines that distinguish between:

Educational use.
Research use.
Preliminary environmental screening.
Public health monitoring.
Regulatory or legal decision-making.

Design

These guidelines would be developed by public health and environmental agencies in collaboration with researchers, local institutions, and community stakeholders.

A context-specific guideline could include:

Minimum validation requirements before field use.
Clear interpretation guidelines for positive and negative results.
Requirements for confirmatory testing with gold-standard methods.
Communication protocols for reporting contamination risks.
Ethical considerations for community-level environmental data.

Assumptions

This option assumes that:

Misuse risk depends strongly on deployment context.
Local institutions have the capacity to enforce or adapt guidelines.
Communities benefit from access to environmental information when it is communicated responsibly.
Preliminary tests should support, not replace, validated analytical methods.

Risks of Failure and “Success”

Failure risk: Guidelines may be inconsistently applied across regions, especially where regulatory infrastructure is weak.

Success risk: If guidelines become too restrictive or bureaucratic, they could delay deployment in high-need environments where accessible monitoring is urgently needed.

Scoring Matrix

Scoring key:
1 = strongest / most favorable alignment with the policy goal
2 = moderate alignment
3 = weakest / least favorable alignment
n/a = not applicable

Policy Goal / Evaluation Criterion	Option 1: Cell-free safe-by-design	Option 2: Transparent documentation	Option 3: Context-specific deployment guidelines
Enhance biosecurity by preventing incidents	1	2	2
Enhance biosecurity by helping respond	2	1	1
Foster lab safety by preventing incidents	1	2	2
Foster lab safety by helping respond	2	1	2
Protect the environment by preventing incidents	2	2	1
Protect the environment by helping respond	2	1	1
Minimize costs and burdens to stakeholders	1	3	2
Feasibility	1	2	2
Not impede research	1	2	3
Promote constructive applications	1	1	2
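As a quick check on the matrix, the column totals can be computed with a short sketch. The scores are copied from the table above; summing them with equal weight per criterion is my own assumption, since the matrix does not define an aggregation rule, and the names `SCORES`, `OPTIONS`, and `column_totals` are illustrative only.

```python
# Scores copied from the matrix above (1 = most favorable, 3 = least favorable).
# Tuple order: Option 1, Option 2, Option 3.
SCORES = {
    "Enhance biosecurity by preventing incidents": (1, 2, 2),
    "Enhance biosecurity by helping respond": (2, 1, 1),
    "Foster lab safety by preventing incidents": (1, 2, 2),
    "Foster lab safety by helping respond": (2, 1, 2),
    "Protect the environment by preventing incidents": (2, 2, 1),
    "Protect the environment by helping respond": (2, 1, 1),
    "Minimize costs and burdens to stakeholders": (1, 3, 2),
    "Feasibility": (1, 2, 2),
    "Not impede research": (1, 2, 3),
    "Promote constructive applications": (1, 1, 2),
}
OPTIONS = ("Option 1", "Option 2", "Option 3")


def column_totals(scores):
    """Sum each option's scores over all criteria (assumes equal weights)."""
    totals = [0] * len(OPTIONS)
    for row in scores.values():
        for i, value in enumerate(row):
            totals[i] += value
    return dict(zip(OPTIONS, totals))


totals = column_totals(SCORES)
# Lower totals mean more favorable overall alignment under this scoring key.
for name, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{name}: {total}")
```

Under equal weighting the totals come out to 14, 17, and 18 for Options 1, 2, and 3. A different weighting of the criteria could change the ordering.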

Prioritization and Recommendation

Based on this analysis, the highest priority should be given to Option 1: safe-by-design, cell-free architecture, complemented by Option 2: transparent documentation of limitations and failures.

This combination embeds ethical and governance considerations directly into technical design and research practice, rather than relying only on downstream regulation. The cell-free architecture reduces the biological risks associated with living engineered organisms, while transparent documentation reduces the risk of overclaiming, improves reproducibility, and helps future users understand the true limits of the system.

This combined approach is particularly relevant for academic research institutions, teaching laboratories, and funding agencies, where early design choices strongly influence future applications. While these decisions may introduce additional development effort, they significantly enhance safety, trust, and long-term societal benefit.

Option 3, context-specific deployment guidelines, is also important, but I would prioritize it at a later stage, once the technical system has been experimentally validated. Deployment governance becomes especially relevant when moving from proof-of-concept research to real-world environmental monitoring.

The main trade-off is that stronger governance can slow deployment. However, for environmental health technologies, speed should not come at the cost of unreliable or poorly interpreted results. A portable lead biosensor should empower communities and researchers, but it should not replace validated confirmatory testing before major public health or regulatory decisions are made.

Weekly Reflection

A key insight from this week is that biosensing technologies are not ethically neutral, even when developed for public health or environmental protection. Portability and accessibility are usually framed as purely positive features, but they can also enable misuse, misinterpretation, or premature deployment if the social and regulatory context is not carefully considered.

Engaging with the recitation examples reinforced the importance of situating my project at the detection and prevention end of the biological intervention spectrum. My proposed system does not edit genomes, release organisms, or introduce engineered biological entities into the environment. However, it still carries ethical responsibilities related to data quality, communication, access, and interpretation.

This week shifted my perspective from asking only:

Can this work?

to also asking:

Should it work this way, under what conditions, and who could be affected by its use?

That mindset is especially important for biosensors intended for environmental monitoring, because the consequences of a result are not only technical. A positive lead detection result could influence public trust, community concern, regulatory response, and resource allocation. Therefore, responsible biosensor development must include validation, transparency, and careful communication from the beginning.

Documentation Practice

In alignment with the course emphasis on documentation, I am recording all in silico design steps, experimental iterations, failed conditions, and troubleshooting decisions. This documentation is intended to support reproducibility, collaborative learning, and ethical transparency.

For this project, I aim to make visible the full design journey rather than only the successful outcomes. This includes:

Conceptual design decisions.
Sequence design rationale.
Simulation and modeling steps.
Failed or uncertain design choices.
Limitations of the proposed detection system.
Safety and governance considerations.

This approach is important because reproducibility and responsible innovation depend not only on final results, but also on documenting how those results were reached.

Week 2 Lecture Preparation

In preparation for Week 2, “DNA Read, Write, and Edit,” I reviewed the lecture questions and answered the required prompts from Professor Jacobson, Dr. LeProust, and one selected question from Professor Church.

Professor Jacobson — Homework Questions

1. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

DNA polymerases are highly accurate, but they are not perfect. A typical raw DNA polymerase error rate can be around 10^-5 to 10^-6 errors per nucleotide incorporated, depending on the polymerase and biological context. After proofreading and mismatch repair, the final replication error rate can be reduced to approximately 10^-9 to 10^-10 errors per base per cell division.

This is important because the human genome contains approximately 3.2 billion base pairs in the haploid genome, or about 6.4 billion base pairs in a diploid cell. Even a very low error rate can therefore generate many potential mistakes if no correction mechanisms exist.
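To make the scale concrete, the expected number of errors per genome copy can be estimated by multiplying the error rate by the genome size. This is a minimal sketch that assumes a uniform, independent per-base error rate and uses the approximate diploid genome size quoted above. The function name `expected_errors` is hypothetical.

```python
DIPLOID_GENOME_BP = 6.4e9  # approximate diploid human genome size from the text


def expected_errors(error_rate_per_base, genome_bp=DIPLOID_GENOME_BP):
    """Expected errors when every base is copied once at a uniform error rate."""
    return error_rate_per_base * genome_bp


# Error rates quoted in the answer above; labels are descriptive only.
for label, rate in [
    ("raw polymerase (high end)", 1e-5),
    ("raw polymerase (low end)", 1e-6),
    ("after proofreading + repair (high end)", 1e-9),
    ("after proofreading + repair (low end)", 1e-10),
]:
    print(f"{label:<40} {rate:.0e} -> {expected_errors(rate):,.2f} errors per division")
```

With these inputs, the uncorrected rates correspond to thousands of errors per division, while the corrected rates correspond to roughly one to a few errors, or fewer, per division.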

Biology deals with this discrepancy through several layers of quality control:

Nucleotide selectivity by DNA polymerases.
Exonuclease proofreading, which removes incorrectly incorporated nucleotides.
Mismatch repair, which corrects errors that escape proofreading.
DNA damage repair pathways, which repair chemically damaged bases or strand breaks.
Cell-cycle checkpoints, which prevent damaged cells from continuing division.
Apoptosis or senescence, which can eliminate cells with severe genome instability.

Together, these mechanisms reduce the mutational burden and help preserve genome integrity across cell divisions.

2. How many different ways are there to code for an average human protein? In practice, what are some of the reasons that all of these different codes do not work to code for the protein of interest?

Because the genetic code is degenerate, most amino acids can be encoded by more than one codon. For a protein of length n, the number of possible DNA coding sequences is the product of the number of synonymous codons available for each amino acid:

Number of possible coding sequences = d1 × d2 × d3 × ... × dn

where each d is the codon degeneracy for a given amino acid.

For an average human protein of several hundred amino acids, this number is astronomically large. A rough estimate using an average degeneracy of about 3 codons per amino acid for a 400-amino-acid protein gives:

3^400 ≈ 7 × 10^190 possible coding sequences
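The product formula can also be evaluated exactly for a specific sequence. The sketch below uses the codon degeneracies of the standard genetic code and ignores the stop codon. The peptide `MKLV` is a made-up example, and the function names are illustrative.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(protein):
    """Exact number of DNA coding sequences: d1 * d2 * ... * dn."""
    return math.prod(DEGENERACY[aa] for aa in protein)


def log10_coding_sequences(protein):
    """Same quantity on a log10 scale, convenient for long proteins."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


peptide = "MKLV"  # hypothetical 4-residue example: 1 * 2 * 6 * 4
print(count_coding_sequences(peptide))

# Rough estimate from the text: 400 residues at an average degeneracy of 3.
print(f"3^400 is about 10^{400 * math.log10(3):.2f}")
```

The 61 sense codons spread over 20 amino acids give an average degeneracy close to 3, which is where the rough estimate comes from. A real protein's count depends on its composition, since methionine and tryptophan contribute a factor of 1 while leucine, serine, and arginine contribute 6.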

However, not all synonymous coding sequences work equally well in practice. Several factors influence whether a DNA sequence can efficiently produce the desired protein:

Codon usage bias: Different organisms prefer different synonymous codons.
tRNA abundance: Rare codons can slow translation or reduce expression.
GC content: Very high or very low GC content can affect synthesis, stability, and amplification.
mRNA secondary structure: Strong structures near the ribosome binding site or start codon can reduce translation.
Cryptic splice sites: In eukaryotic systems, some sequences may be incorrectly spliced.
Premature termination or polyadenylation-like motifs: These can interfere with transcription or RNA processing.
Internal repeats: Repetitive DNA can be difficult to synthesize, clone, or maintain.
Restriction sites: Some sequences may contain sites that interfere with cloning strategies.
RNA stability: Synonymous changes can alter mRNA half-life.
Translation speed and co-translational folding: Codon choice can influence how the protein folds during translation.
Synthesis and assembly constraints: Some DNA sequences are harder to chemically synthesize or assemble.

Therefore, although the theoretical number of coding sequences is enormous, the number of practical, expressible, and functional sequences is much smaller.

Dr. LeProust — Homework Questions

1. What is the most commonly used method for oligo synthesis currently?

The most commonly used method for oligonucleotide synthesis is solid-phase phosphoramidite chemistry.

In this method, the oligonucleotide is synthesized step by step on a solid support. Each nucleotide addition cycle typically includes:

Deprotection, which exposes a reactive hydroxyl group.
Coupling, where the next phosphoramidite nucleotide is added.
Capping, which blocks unreacted chains.
Oxidation, which stabilizes the phosphate linkage.

This cyclic chemistry allows controlled synthesis of DNA or RNA oligonucleotides with defined sequences.

2. Why is it difficult to make oligos longer than 200 nt via direct synthesis?

It is difficult to synthesize oligos longer than approximately 200 nucleotides because oligo synthesis is a stepwise chemical process and each coupling cycle is less than 100% efficient.

Even if each individual step is highly efficient, small inefficiencies accumulate over many cycles. As the sequence becomes longer, several problems increase:

The fraction of full-length correct product decreases.
Truncated products accumulate.
Deletion errors become more likely.
Depurination and chemical damage can occur.
Sequence heterogeneity increases.
Purification becomes more difficult.
Quality control becomes more challenging.

For example, if each coupling step were 99% efficient, the theoretical full-length yield would be about 60% after 50 additions (0.99^50) but only about 13% after 200 additions (0.99^200). Therefore, long oligos are harder to synthesize accurately and economically by direct chemical synthesis.
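The compounding of per-step losses can be sketched numerically. This simplified model assumes every addition succeeds independently with the same efficiency and that failed chains never reach full length. It ignores depurination and other side reactions, so real yields would be lower. The efficiencies other than 99% are illustrative values, not figures from the lecture.

```python
def full_length_yield(coupling_efficiency, additions):
    """Fraction of chains that are full length after `additions` coupling steps."""
    return coupling_efficiency ** additions


# Compare a few assumed per-step efficiencies across oligo lengths.
for efficiency in (0.985, 0.99, 0.995):
    row = ", ".join(
        f"{n} additions: {full_length_yield(efficiency, n):.1%}"
        for n in (50, 100, 200)
    )
    print(f"{efficiency:.1%} per step -> {row}")
```

Even a half-percent change in per-step efficiency has a large effect at 200 additions, which is why length limits are so sensitive to coupling chemistry.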

3. Why can’t you make a 2000 bp gene via direct oligo synthesis?

A 2000 bp gene cannot be reliably produced by direct oligo synthesis because the cumulative error rate and loss of full-length product over roughly 2000 synthesis cycles would be too high.

Directly synthesizing a 2000 nucleotide sequence would produce a complex mixture of incomplete, mutated, and damaged products rather than a clean full-length gene. The longer the sequence, the lower the probability that every nucleotide was added correctly.
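The same reasoning can be inverted to ask what per-step efficiency a 2000-nucleotide synthesis would need. This is a minimal sketch under the same simplified independent-step model. The 50% target yield and the 99% and 99.5% efficiencies are assumptions chosen for illustration, and `required_efficiency` is a hypothetical helper name.

```python
def full_length_yield(coupling_efficiency, additions):
    """Fraction of chains that are full length after `additions` coupling steps."""
    return coupling_efficiency ** additions


def required_efficiency(target_yield, additions):
    """Per-step efficiency e such that e ** additions equals target_yield."""
    return target_yield ** (1.0 / additions)


GENE_LENGTH = 2000  # nucleotides, as in the question

for efficiency in (0.99, 0.995):
    fraction = full_length_yield(efficiency, GENE_LENGTH)
    print(f"{efficiency:.1%} per step -> full-length fraction {fraction:.2e}")

needed = required_efficiency(0.5, GENE_LENGTH)
print(f"per-step efficiency needed for 50% full-length product: {needed:.5%}")
```

Under this model, essentially no full-length product survives at the assumed efficiencies, and the required per-step efficiency is well above 99.9%. This is consistent with assembling genes from shorter oligos instead.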

Instead, genes are usually produced by a modular strategy:

Shorter oligos are chemically synthesized.
These oligos are assembled into larger fragments.
Larger fragments are joined enzymatically or through DNA assembly methods.
The final construct is cloned and sequence-verified.

This strategy improves yield, accuracy, and error correction. It also allows problematic regions to be redesigned or corrected before the final full-length gene is obtained.

George Church — Homework Question

Question chosen

AA:AA and NA:NA codes — What code would you suggest for AA:AA interactions?

Why We Need a Code and What It Can and Cannot Do

Protein-protein interactions are not “pairwise letters” like Watson-Crick base pairing. They depend strongly on three-dimensional context, including distance, orientation, solvent exposure, dynamics, post-translational modifications, pH, ionic strength, and local environment.

Still, a useful amino acid to amino acid interaction “code” can exist as a coarse-grained interaction alphabet: a compact way to describe which residue pairs are likely to attract, repel, stabilize, or modulate protein interfaces.

The goal is not to create a perfect predictor of protein structure. Instead, the goal is to create a portable interaction language that is:

Symmetric: A-B is equivalent to B-A.
Composable: Many local contacts can describe one interface.
Extendable: The code can include non-standard amino acids or post-translational modifications.
Human-usable: The system should be simpler than a full 20 × 20 interaction table.

Proposed AA:AA Interaction Code

I propose a two-layer code.

Layer 1 — Assign Each Amino Acid to an Interaction Class

Each amino acid can be assigned to a dominant chemical interaction class:

Class	Meaning	Amino acids
H	Hydrophobic aliphatic	A, V, L, I, M
Ar	Aromatic	F, Y, W
P	Polar uncharged	S, T, N, Q
D+	Cationic / donor-leaning	K, R, H
A−	Acidic / anionic	D, E
S	Sulfur / thiol special	C
G	Glycine / conformational special	G
Pro	Proline / conformational breaker	P

H and Ar are separated because aromatic residues can participate in π-stacking and cation-π interactions, which are distinct from simple hydrophobic packing. Cysteine is treated separately because it can form disulfide bonds and participate in redox or metal-binding interactions. Glycine and proline are treated separately because their main importance is often conformational rather than purely chemical.

Layer 2 — Use an Interaction Operator Between Classes

A small set of operators can describe the type of contact between classes:

Operator	Meaning	Example
⊕	Favorable hydrophobic packing	H-H, H-Ar, Ar-Ar
±	Electrostatic attraction / salt bridge	D+ - A−
≠	Electrostatic repulsion	D+ - D+ or A− - A−
⋯	Hydrogen bonding	P-P, P-D+, P-A−
π+	Cation-π interaction	D+ - Ar
S-S	Disulfide bond	Cys-Cys
⟂	Conformational modulation	Pro-X or Gly-X

This yields a compact grammar:

Contact = Class(residue 1) OP Class(residue 2)

Examples:

Lys-Glu → D+ ± A−
Leu-Ile → H ⊕ H
Arg-Trp → D+ π+ Ar
Cys-Cys → S-S
Pro-X → Pro ⟂ X
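The two layers can be sketched as a small lookup, which also makes the symmetry requirement explicit. This is an illustrative sketch, not a validated predictor. Two choices here are my own assumptions: conformational classes (Pro, G) take precedence over chemical operators, and class pairs not listed in the operator table return `None`. One-letter residue codes are used as input.

```python
# Layer 1: interaction class -> one-letter residues (from the class table above).
CLASS_MEMBERS = {
    "H": "AVLIM",
    "Ar": "FYW",
    "P": "STNQ",
    "D+": "KRH",
    "A−": "DE",
    "S": "C",
    "G": "G",
    "Pro": "P",
}
CLASS_OF = {res: cls for cls, members in CLASS_MEMBERS.items() for res in members}


def pair(a, b):
    """Unordered class pair, so that A-B and B-A look up the same operator."""
    return frozenset((a, b))


# Layer 2: unordered class pair -> operator (from the operator table above).
OPERATOR = {
    pair("H", "H"): "⊕",
    pair("H", "Ar"): "⊕",
    pair("Ar", "Ar"): "⊕",
    pair("D+", "A−"): "±",
    pair("D+", "D+"): "≠",
    pair("A−", "A−"): "≠",
    pair("P", "P"): "⋯",
    pair("P", "D+"): "⋯",
    pair("P", "A−"): "⋯",
    pair("D+", "Ar"): "π+",
}
CONFORMATIONAL = {"Pro", "G"}


def contact(res1, res2):
    """Return 'Class OP Class' for two residues, or None if no operator applies."""
    c1, c2 = CLASS_OF[res1], CLASS_OF[res2]
    if c1 == "S" and c2 == "S":
        return "S-S"  # written as in the Cys-Cys example above
    if c1 in CONFORMATIONAL or c2 in CONFORMATIONAL:
        return f"{c1} ⟂ {c2}"  # assumed precedence for Pro-X and Gly-X
    op = OPERATOR.get(pair(c1, c2))
    return None if op is None else f"{c1} {op} {c2}"


for a, b in [("K", "E"), ("L", "I"), ("R", "W"), ("C", "C"), ("P", "A"), ("L", "K")]:
    print(a, b, "->", contact(a, b))
```

Pairs such as Leu-Lys fall outside the listed operators and return `None`, which shows where the table would need further entries to cover every class combination.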

Why This Code Is Useful

This code is useful because it compresses many possible amino acid interactions into a smaller, interpretable set of interaction modes.

Advantages include:

Small alphabet, broad coverage: It reduces the complexity of 20 × 20 amino acid combinations into a readable set of chemical interaction types.
Extendability: It can be expanded to include modified residues or non-standard amino acids.
Connection to protein design: Protein interface design often relies on the same basic principles: hydrophobic cores, hydrogen bond networks, salt bridges, cation-π interactions, disulfides, and conformational constraints.
Interpretability: It provides a human-readable vocabulary for reasoning about protein-protein interfaces.

Known Limitations

This code has important limitations:

Context dependence: The same residue pair can behave differently depending on whether it is buried or solvent-exposed.
pH dependence: Protonation states can change interactions, especially for histidine, acidic residues, and termini.
Geometry dependence: A chemically favorable interaction may not occur if the residues are not properly oriented.
Water mediation: Some contacts are mediated by water molecules rather than direct side-chain interactions.
Many-body effects: Protein interfaces are cooperative networks, not just sums of pairwise contacts.
Not a folding code: This is an interaction vocabulary, not a complete structural prediction system.

Optional Refinement

If more precision is needed, an environmental tag can be added:

(B) = buried
(E) = exposed

For example:

D+ ± A− (B)

This would represent a buried salt bridge, which may have a different energetic contribution than an exposed salt bridge.

Similarly:

H ⊕ H (B)

would represent buried hydrophobic packing, which is usually more stabilizing than exposed hydrophobic contact.

AI / Prompt Citation

I used ChatGPT to help draft and structure this answer.

Prompt used:

Given George Church’s lecture framing of codes beyond DNA-to-amino-acid translation, propose a concise, extensible AA:AA interaction code that captures major interaction types including hydrophobic contacts, salt bridges, hydrogen bonds, cation-π interactions, disulfides, and conformational effects.

I then edited and adapted the response to fit my own reasoning and the context of this homework.

Lab Preparation Note

The lab preparation and MIT safety training components were listed as required for MIT/Harvard students, but not applicable to Committed Listeners. Therefore, I did not complete the in-person lab-specific safety training or Atlas safety modules as part of this homework.

Summary

This week helped establish a framework for thinking about biological engineering as a technical, ethical, and governance challenge. For my proposed DNAzyme-Cas12a Pb²⁺ biosensor, the most important lesson was that safety and responsibility should be designed into the system from the beginning.

The main governance strategy I would prioritize is a safe-by-design, cell-free architecture, combined with transparent documentation of limitations, failures, and uncertainties. This combination supports biosafety, reproducibility, and constructive use while preserving the educational and scientific value of the project.
