Subsections of Kshitij Sodani — HTGAA Spring 2026
Homework
Weekly homework submissions:
Week 1 HW: Principles and Practices
Subsections of Homework
Week 1 HW: Principles and Practices
1. Biological engineering application
I propose a “DNA Compiler,” a software tool that helps researchers turn DNA designs into safe, synthesis-ready sequences. The main idea is to build safety checks directly into the design process rather than relying only on downstream screening or manual review. The compiler would analyze a DNA sequence, flag potential issues, and suggest safer alternatives (for example, adjusting sequence features or highlighting areas that require review). It would also generate a clear record of how the design was modified or approved. The goal is to make good safety practices automatic and easy to follow.
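The flag-and-record workflow described above could be sketched as follows. This is a minimal illustration, not the actual tool: the `Report` class, the `compile_design` function, the GC range of 0.3 to 0.7, and the homopolymer limit of 8 are all hypothetical placeholders, and the two rules shown are simple sequence-feature checks standing in for whatever rule set the compiler would really use.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Record of what was checked and flagged (hypothetical format)."""
    sequence_length: int
    flags: list = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        return bool(self.flags)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 0
    previous = ""
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def compile_design(seq: str, gc_range=(0.3, 0.7), max_run=8) -> Report:
    """Run every rule and collect flags rather than rejecting the design outright."""
    report = Report(sequence_length=len(seq))
    invalid = set(seq.upper()) - set("ACGT")
    if invalid:
        report.flags.append(f"non-ACGT characters: {sorted(invalid)}")
    gc = gc_fraction(seq)
    if not gc_range[0] <= gc <= gc_range[1]:
        report.flags.append(f"GC fraction {gc:.2f} outside {gc_range}")
    run = longest_homopolymer(seq)
    if run > max_run:
        report.flags.append(f"homopolymer run of {run} exceeds {max_run}")
    return report


if __name__ == "__main__":
    print(compile_design("ATGCATGCATGC"))
    print(compile_design("A" * 12))
```

Returning a report with a list of flags, instead of raising an error on the first problem, matches the stated goal of offering suggestions and keeping a record rather than simply blocking a design.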
2. Governance and policy goals
Primary goal: reduce harm while supporting useful biological research.
Sub-goals:
- Prevent accidents by identifying risky designs early in the process.
- Improve accountability by keeping a clear record of how designs were created and approved.
- Avoid slowing research unnecessarily by offering helpful suggestions rather than simply blocking designs.
3. Governance actions
Option 1, Institutional adoption
Research institutions could make the DNA Compiler part of their standard workflow. Before ordering synthetic DNA, researchers would run their designs through the tool.
Purpose: move safety checks earlier in the process.
Design: integrate with existing ordering systems and biosafety review procedures.
Assumptions: researchers will use the tool if it is easy and helpful.
Risks: people may try to bypass it if it becomes too restrictive.
Option 2, Vendor integration
DNA synthesis companies could accept or encourage compiler-generated safety reports when customers submit sequences.
Purpose: create a shared safety baseline across different labs and providers.
Design: vendors recognize a standard report format generated by the compiler.
Assumptions: companies see value in reducing risk and simplifying screening.
Risks: could increase costs or create barriers if requirements are too strict.
Option 3, Shared rule updates
A community group maintains and updates the safety rules used by the compiler as new risks or best practices emerge.
Purpose: keep the tool current as biology advances.
Design: periodic updates distributed to users, similar to software updates.
Assumptions: collaboration improves coverage of new issues.
Risks: disagreements about rules or slow updates.
4. Scoring
(1 = best)
| Goal | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance biosecurity | 1 | 2 | 2 |
| Foster lab safety | 1 | 2 | 2 |
| Protect environment | 2 | 2 | 2 |
| Minimize burden | 2 | 3 | 2 |
| Feasibility | 1 | 2 | 2 |
| Promote constructive uses | 1 | 2 | 1 |
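One way to read the table is to total each column. The sketch below assumes every goal carries equal weight, which the scoring above does not state, so the totals are only a rough summary (lower is better).

```python
# Scores copied from the table above: (Option 1, Option 2, Option 3), 1 = best.
scores = {
    "Enhance biosecurity": (1, 2, 2),
    "Foster lab safety": (1, 2, 2),
    "Protect environment": (2, 2, 2),
    "Minimize burden": (2, 3, 2),
    "Feasibility": (1, 2, 2),
    "Promote constructive uses": (1, 2, 1),
}

# Assumption: equal weights, so each option's total is a plain column sum.
totals = [sum(column) for column in zip(*scores.values())]

for option_number, total in enumerate(totals, start=1):
    print(f"Option {option_number}: {total}")
```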
5. Prioritization
I would prioritize Option 1 first because it is the most practical starting point. Integrating the DNA Compiler into institutional workflows creates immediate benefits by improving design quality and reducing accidents without requiring major policy changes. After adoption grows, Option 2 can extend the approach across the industry by creating shared standards between labs and vendors. Option 3 should develop alongside these steps to ensure that the rules evolve over time, but it likely works best once the tool already has a strong user base.
Week 2 Pre-Lecture: Homework
Homework Questions from Professor Jacobson
Nature’s machinery for copying DNA is DNA polymerase. According to the lecture slides, an error-correcting polymerase has an error rate of approximately 1 error per 10⁶ bases added.
The human genome is about 3.2 × 10⁹ base pairs long. Comparing these numbers, if replication relied only on polymerase accuracy, we would expect on the order of thousands of errors during replication of a single human genome. This highlights a discrepancy between the intrinsic error rate of polymerase and the need to faithfully copy very large genomes.
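The arithmetic behind this estimate, using only the two numbers quoted above (the variable names are illustrative):

```python
GENOME_LENGTH_BP = 3_200_000_000   # about 3.2 x 10^9 base pairs
BASES_PER_ERROR = 1_000_000        # about 1 error per 10^6 bases added


def expected_errors(genome_length_bp: int, bases_per_error: int) -> float:
    """Expected number of errors in one pass over the genome."""
    return genome_length_bp / bases_per_error


print(expected_errors(GENOME_LENGTH_BP, BASES_PER_ERROR))  # 3200.0
```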
Biology resolves this with multiple layers of error correction. The proofreading activity of the polymerase, which is already reflected in the 1-in-10⁶ figure, detects and removes mismatched nucleotides during synthesis. Additional repair pathways (such as the mismatch repair system shown in the lecture) then correct most of the remaining errors after replication. Together, these mechanisms allow cells to maintain high fidelity despite the large size of the genome.
The lecture states that an average human protein corresponds to about 1036 base pairs. Since each codon consists of three nucleotides, this corresponds to about 345 amino acids (1036 / 3). The genetic code is degenerate, meaning that multiple codons can encode the same amino acid. Because there are 64 possible codons (61 of which specify amino acids) but only 20 amino acids, many different DNA sequences can encode the same protein sequence. The number of possible coding sequences is the product of the number of synonymous codons at each position, so it grows exponentially with protein length, and an average human protein can be encoded by a very large number of distinct DNA sequences.
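To put a rough number on "very large", the sketch below counts synonymous coding sequences using the codon degeneracy of the standard genetic code. The per-residue average of 3 codons used in `log10_estimate` is an assumption (61 sense codons spread over 20 amino acids is about 3), and the function names are illustrative.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_coding_sequences(protein: str) -> int:
    """Exact number of DNA sequences encoding a given protein sequence."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein.upper())


def log10_estimate(n_residues: int, mean_codons_per_residue: float = 3.0) -> float:
    """Order-of-magnitude estimate when only the protein length is known."""
    return n_residues * math.log10(mean_codons_per_residue)


print(count_coding_sequences("MALG"))      # 1 * 4 * 6 * 4 = 96
print(round(log10_estimate(1036 // 3)))    # exponent of 10 for a 345-residue protein
```

Under that assumption, a 345-residue protein has on the order of 10^164 possible coding sequences.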
In practice, not all synonymous sequences work equally well. The lecture shows that nucleotide composition (such as GC content) and sequence-dependent secondary structures affect molecular behavior. Different synonymous sequences can produce different RNA folding patterns or energetics, which can influence transcription, translation efficiency, and stability. As a result, biological and physical constraints limit which DNA sequences successfully produce the desired protein, even if they encode the same amino acid sequence.
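As a small illustration of how synonymous sequences can differ in composition, the sketch below compares two made-up sequences that both encode the peptide M-A-L-G. The sequences and the reduced codon table are examples chosen for this sketch, not data from the lecture.

```python
# Minimal codon table covering only the codons used in this example.
CODONS = {
    "ATG": "M",
    "GCT": "A", "GCC": "A",
    "TTA": "L", "CTG": "L",
    "GGT": "G", "GGC": "G",
}


def translate(seq: str) -> str:
    """Translate a DNA sequence codon by codon using the table above."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


low_gc = "ATGGCTTTAGGT"
high_gc = "ATGGCCCTGGGC"

for name, seq in (("low GC", low_gc), ("high GC", high_gc)):
    print(name, translate(seq), f"{gc_fraction(seq):.2f}")
```

Both sequences translate to the same peptide, but their GC fractions are 5/12 and 9/12, which is the kind of difference that can change folding and stability.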
Homework Questions from Dr. LeProust
The most commonly used method for synthesizing oligonucleotides is solid-phase phosphoramidite chemical synthesis. In this approach, nucleotides are added sequentially to a growing DNA chain attached to a solid support. Each cycle consists of deprotection of the growing chain's 5′ end, coupling of a phosphoramidite nucleotide, capping of unreacted sites, and oxidation, and this cycle is repeated until the desired length is reached.
Direct oligo synthesis proceeds one base at a time, and each chemical addition step is not perfectly efficient. Because the synthesis is iterative, small inefficiencies compound with every cycle. As the sequence length increases:
- The fraction of full-length molecules decreases.
- Truncated and error-containing byproducts accumulate.
- Overall yield and purity drop significantly.
This makes it increasingly difficult to obtain high-quality long oligos directly.
A 2000 bp gene would require about 2000 sequential chemical coupling steps per strand. Since each step has less than 100% efficiency, the probability of producing a perfect full-length molecule becomes extremely low. Errors and truncations would dominate the product mixture.
Instead, long genes are typically made by synthesizing shorter oligos (for example, around 100–200 nt) and then assembling them enzymatically into longer fragments or full genes. This avoids the exponential loss in yield and accuracy associated with very long direct chemical synthesis.
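The compounding loss can be estimated with a simple model in which every coupling step succeeds independently with the same probability. The 99% coupling efficiency used below is an assumed illustrative value, not a figure from the lecture, and the sketch assumes the first nucleotide comes pre-attached to the support, so an n-nt product needs n - 1 couplings.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float = 0.99) -> float:
    """Fraction of chains that reach full length if each coupling succeeds independently."""
    couplings = max(length_nt - 1, 0)  # first base is assumed to start on the support
    return coupling_efficiency ** couplings


for length in (20, 100, 200, 2000):
    print(f"{length:>5} nt: {full_length_fraction(length):.3g}")
```

With these assumptions, roughly 13% of chains reach full length at 200 nt, while at 2000 nt the fraction falls to about 2 in a billion.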
Homework Question from George Church
Unlike NA:NA base pairing or the NA-to-AA genetic code, AA:AA interactions are not defined by a strict one-to-one symbolic mapping. Instead, an AA:AA code would be based on physicochemical compatibility between amino acid side chains. Key rules would include charge complementarity (positively charged residues interacting with negatively charged ones), hydrogen-bond donor/acceptor matching, hydrophobic residues packing together, and steric shape complementarity for efficient packing. This is similar to the lecture's framing that different biological codes reflect interaction constraints: DNA base pairs follow specific pairing rules, while protein interactions emerge from chemical properties and geometry rather than fixed symbolic pairs.
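A toy version of such a rule set might look like the sketch below. The residue groupings are deliberately coarse textbook categories chosen for illustration (histidine, cysteine, glycine, proline, and tyrosine are left out), the function name `pair_rule` is hypothetical, and steric shape complementarity is not modeled at all, so this is a sketch of the idea and not a predictive model.

```python
# Coarse side-chain classes (an illustrative simplification, not a complete scheme).
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AVLIMFW")
POLAR = set("STNQ")


def pair_rule(a: str, b: str) -> str:
    """Name the simple rule, if any, that applies to a pair of one-letter residues."""
    a, b = a.upper(), b.upper()
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return "charge complementarity"
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return "charge repulsion"
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return "hydrophobic packing"
    if a in POLAR and b in POLAR:
        return "possible hydrogen bonding"
    return "no simple rule"


for pair in (("K", "E"), ("L", "V"), ("S", "N"), ("K", "R"), ("D", "L")):
    print(pair, pair_rule(*pair))
```

Unlike a base-pairing table, the same residue can fall under several rules depending on its partner, which reflects the point that the AA:AA "code" is many-to-many.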
Labs
Lab writeups:
Subsections of Labs
Week 1 Lab: Pipetting
