Subsections of Kshitij Sodani — HTGAA Spring 2026
Homework
Weekly homework submissions:
Week 1 HW: Principles and Practices
Subsections of Homework
Week 1 HW: Principles and Practices
1. Biological engineering application
I propose a “DNA Compiler,” a software tool that helps researchers turn DNA designs into safe, synthesis-ready sequences. The main idea is to build safety checks directly into the design process rather than relying only on downstream screening or manual review. The compiler would analyze a DNA sequence, flag potential issues, and suggest safer alternatives (for example, adjusting sequence features or highlighting areas that require review). It would also generate a clear record of how the design was modified or approved. The goal is to make good safety practices automatic and easy to follow.
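As a rough illustration of the "analyze, flag, and record" flow described above, here is a minimal Python sketch. Everything in it is hypothetical: the names `compile_design` and `CompileReport`, the GC-content window, and the homopolymer limit are illustrative assumptions, and the checks are simple sequence-feature checks rather than real biosafety screening rules.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CompileReport:
    """Hypothetical record of what the compiler checked and flagged."""
    sequence: str
    flags: list = field(default_factory=list)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def compile_design(seq: str, gc_range=(0.30, 0.70), max_homopolymer=8) -> CompileReport:
    """Run illustrative checks and return a report listing any flags.

    The thresholds are assumptions chosen for the example, not
    values taken from any vendor or standard.
    """
    seq = seq.upper()
    report = CompileReport(sequence=seq)

    # Flag characters that are not standard DNA bases.
    invalid = sorted(set(seq) - set("ACGT"))
    if invalid:
        report.flags.append(f"non-ACGT characters: {''.join(invalid)}")

    # Flag GC content outside the assumed acceptable window.
    gc = gc_fraction(seq)
    if not gc_range[0] <= gc <= gc_range[1]:
        report.flags.append(f"GC content {gc:.2f} outside {gc_range}")

    # Flag the first run of one base longer than max_homopolymer.
    run = re.search(r"(.)\1{%d,}" % max_homopolymer, seq)
    if run:
        report.flags.append(
            f"homopolymer run of {len(run.group(0))} at position {run.start()}"
        )
    return report


if __name__ == "__main__":
    print(compile_design("ATGC" * 10).flags)  # balanced sequence
    print(compile_design("A" * 20).flags)     # low GC and a long run
```

A real tool would replace these toy checks with reviewed safety rules and would store the report alongside the design as the approval record.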
2. Governance and policy goals
Primary goal: reduce harm while supporting useful biological research.
Sub-goals:
- Prevent accidents by identifying risky designs early in the process.
- Improve accountability by keeping a clear record of how designs were created and approved.
- Avoid slowing research unnecessarily by offering helpful suggestions rather than simply blocking designs.
3. Governance actions
Option 1, Institutional adoption
Research institutions could make the DNA Compiler part of their standard workflow. Before ordering synthetic DNA, researchers would run their designs through the tool.
Purpose: move safety checks earlier in the process.
Design: integrate with existing ordering systems and biosafety review procedures.
Assumptions: researchers will use the tool if it is easy and helpful.
Risks: people may try to bypass it if it becomes too restrictive.
Option 2, Vendor integration
DNA synthesis companies could accept or encourage compiler-generated safety reports when customers submit sequences.
Purpose: create a shared safety baseline across different labs and providers.
Design: vendors recognize a standard report format generated by the compiler.
Assumptions: companies see value in reducing risk and simplifying screening.
Risks: could increase costs or create barriers if requirements are too strict.
Option 3, Shared rule updates
A community group maintains and updates the safety rules used by the compiler as new risks or best practices emerge.
Purpose: keep the tool current as biology advances.
Design: periodic updates distributed to users, similar to software updates.
Assumptions: collaboration improves coverage of new issues.
Risks: disagreements about rules or slow updates.
4. Scoring
(1 = best)
| Goal | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance biosecurity | 1 | 2 | 2 |
| Foster lab safety | 1 | 2 | 2 |
| Protect environment | 2 | 2 | 2 |
| Minimize burden | 2 | 3 | 2 |
| Feasibility | 1 | 2 | 2 |
| Promote constructive uses | 1 | 2 | 1 |
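One simple way to read the table is to total each option's scores, treating every goal as equally important. That equal weighting is an assumption made only for this sketch; the totals are a rough summary, not a substitute for the reasoning in the prioritization.

```python
# Scores copied from the table above (1 = best), ordered as
# (Option 1, Option 2, Option 3).
scores = {
    "Enhance biosecurity": (1, 2, 2),
    "Foster lab safety": (1, 2, 2),
    "Protect environment": (2, 2, 2),
    "Minimize burden": (2, 3, 2),
    "Feasibility": (1, 2, 2),
    "Promote constructive uses": (1, 2, 1),
}

# Unweighted totals per option; a lower total is better.
totals = [sum(row[i] for row in scores.values()) for i in range(3)]

for number, total in enumerate(totals, start=1):
    print(f"Option {number}: {total}")
# Option 1: 8
# Option 2: 13
# Option 3: 11
```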
5. Prioritization
I would prioritize Option 1 first because it is the most practical starting point. Integrating the DNA Compiler into institutional workflows creates immediate benefits by improving design quality and reducing accidents without requiring major policy changes. After adoption grows, Option 2 can extend the approach across the industry by creating shared standards between labs and vendors. Option 3 should develop alongside these steps to ensure that the rules evolve over time, but it likely works best once the tool already has a strong user base.
Week 2 Pre-Lecture: Homework
Homework Questions from Professor Jacobson
Nature’s machinery for copying DNA is DNA polymerase. According to the lecture slides, an error-correcting polymerase has an error rate of approximately 1 error per 10⁶ bases added.
The human genome is about 3.2 × 10⁹ base pairs long. Comparing these numbers, if replication relied only on polymerase accuracy, we would expect roughly 3,200 errors (3.2 × 10⁹ ÷ 10⁶) during replication of a single human genome. This highlights a discrepancy between the intrinsic error rate of polymerase and the need to faithfully copy very large genomes.
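The comparison can be checked with a few lines of Python. The two inputs are the figures quoted above; the variable names are just illustrative.

```python
# Figures quoted in the text above.
GENOME_LENGTH_BP = 3.2e9   # human genome length in base pairs
BASES_PER_ERROR = 1e6      # roughly 1 error per 10^6 bases added

# Expected errors per genome copy if polymerase accuracy were the only safeguard.
expected_errors = GENOME_LENGTH_BP / BASES_PER_ERROR

print(f"Expected errors per replication: {expected_errors:.0f}")
```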
Biology resolves this by incorporating multiple layers of error correction. DNA polymerases include proofreading activity that detects and removes mismatched nucleotides during synthesis, and additional repair pathways (such as mismatch repair systems shown in the lecture) further correct errors after replication. Together, these mechanisms allow cells to maintain high fidelity despite the large size of the genome.
The lecture states that an average human protein corresponds to about 1036 base pairs. Since each codon consists of three nucleotides, this corresponds to about 345 amino acids (1036 ÷ 3). The genetic code is degenerate, meaning that multiple codons can encode the same amino acid. Because there are 64 possible codons (61 of which encode amino acids) but only 20 amino acids, many different DNA sequences can encode the same protein sequence. The number of possible coding sequences therefore grows exponentially with protein length, so an average human protein can be encoded by an astronomically large number of distinct DNA sequences.
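To make the exponential growth concrete, the sketch below multiplies the number of synonymous codons for each residue, using the standard genetic code. The function names are illustrative, and the estimate for an average-length protein assumes all 20 amino acids are equally frequent, which real proteins are not.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (one-letter codes; the 61 sense codons in total).
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2,
    "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def count_encodings(protein: str) -> int:
    """Exact number of DNA sequences encoding the given protein."""
    return math.prod(DEGENERACY[aa] for aa in protein.upper())


def log10_encodings_estimate(length: int) -> float:
    """log10 of the encoding count for a protein of the given length,
    assuming (unrealistically) a uniform amino acid composition."""
    mean_log = sum(math.log10(d) for d in DEGENERACY.values()) / len(DEGENERACY)
    return length * mean_log


if __name__ == "__main__":
    print(count_encodings("MLSR"))              # 1 * 6 * 6 * 6 = 216
    print(round(log10_encodings_estimate(345)))  # exponent for ~345 residues
```

Even under this crude assumption, the count for a protein of about 345 residues has well over a hundred digits.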
In practice, not all synonymous sequences work equally well. The lecture shows that nucleotide composition (such as GC content) and sequence-dependent secondary structures affect molecular behavior. Different synonymous sequences can produce different RNA folding patterns or energetics, which can influence transcription, translation efficiency, and stability. As a result, biological and physical constraints limit which DNA sequences successfully produce the desired protein, even if they encode the same amino acid sequence.
Homework Questions from Dr. LeProust
The most commonly used method is solid-phase phosphoramidite chemical synthesis. In this approach, nucleotides are added sequentially to a growing DNA chain attached to a solid support. Each cycle consists of deprotection of the growing chain's 5′ end, coupling of a phosphoramidite nucleotide, capping of unreacted sites, and oxidation; the cycle is repeated until the desired length is reached.
Direct oligo synthesis proceeds one base at a time, and each chemical addition step is not perfectly efficient. Because the synthesis is iterative, small inefficiencies compound with every cycle. As the sequence length increases:
- The fraction of full-length molecules decreases.
- Truncated and error-containing products accumulate.
- Overall yield and purity drop significantly.
This makes it increasingly difficult to obtain high-quality long oligos directly.
A 2000 bp gene would require about 2,000 sequential chemical coupling steps. Since each step has less than 100% efficiency, the probability of producing a perfect full-length molecule becomes extremely low. Errors and truncations would dominate the product mixture.
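The compounding effect can be sketched numerically. The model below assumes every coupling step succeeds independently with the same efficiency, and the 99.5% figure is an illustrative assumption rather than a value from the lecture.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float = 0.995) -> float:
    """Fraction of chains that reach full length.

    A chain of n nucleotides needs n - 1 coupling steps, because the first
    nucleotide starts out attached to the solid support. The default
    efficiency is an assumed, illustrative value.
    """
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    for length in (20, 200, 2000):
        fraction = full_length_fraction(length)
        print(f"{length:>5} nt: {fraction:.2e} full-length")
```

Under these assumptions, a 200 nt oligo still yields a usable fraction of full-length product, while a 2000 nt chain yields well under one full-length molecule per ten thousand started.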
Instead, long genes are typically made by synthesizing shorter oligos (for example, around 100–200 nt) and then assembling them enzymatically into longer fragments or full genes. This avoids the exponential loss in yield and accuracy associated with very long direct chemical synthesis.
Homework Question from George Church
Unlike NA:NA base pairing or the NA-to-AA genetic code, AA:AA interactions are not defined by a strict one-to-one symbolic mapping. Instead, an AA:AA code would be based on physicochemical compatibility between amino acid side chains. Key rules would include charge complementarity (positively charged residues interacting with negatively charged ones), hydrogen-bond donor/acceptor matching, hydrophobic residues packing together, and steric shape complementarity for efficient packing. This is consistent with the lecture's framing that different biological codes reflect interaction constraints: DNA base pairs follow specific pairing rules, while protein interactions emerge from chemical properties and geometry rather than fixed symbolic pairs.
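A toy version of such a rule set might look like the sketch below. The residue groupings are a coarse, hypothetical simplification (for example, histidine is treated as polar rather than charged), steric complementarity is not modeled at all, and the "charge repulsion" case is an added assumption that follows from the charge rule rather than something stated above.

```python
# Coarse, illustrative residue classes (one-letter codes).
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AVLIMFW")
POLAR = set("STNQYH")


def interaction_rule(a: str, b: str) -> str:
    """Name the simple rule, if any, that applies to a pair of residues."""
    a, b = a.upper(), b.upper()
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return "charge complementarity"
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return "charge repulsion"
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return "hydrophobic packing"
    if a in POLAR and b in POLAR:
        return "hydrogen bonding"
    return "no simple rule"


if __name__ == "__main__":
    for pair in ("KD", "LV", "ST", "KR", "LD"):
        print(pair, "->", interaction_rule(*pair))
```

The many pairs that fall through to "no simple rule" show why an AA:AA code is better thought of as graded compatibility than as a lookup table.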
Week 4 HW: Protein Design
Part A – Conceptual Questions
- How many amino acid molecules are in 500 g of meat?
A typical amino acid has a mass of about 100 g per mole.
If you have 500 g, that corresponds to roughly 5 moles.
Since one mole contains about 6 × 10²³ molecules, 5 moles would contain about 3 × 10²⁴ amino acid molecules.
This shows how enormous molecular numbers are, even in everyday amounts of food.
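The same estimate in Python, using the rounded inputs from the answer. Like the answer itself, this treats the whole 500 g as amino acids, which overstates the count because meat is mostly water.

```python
AVOGADRO = 6.022e23          # molecules per mole
MASS_G = 500                 # mass of meat in grams
MOLAR_MASS_G_PER_MOL = 100   # approximate mass of one mole of amino acid

moles = MASS_G / MOLAR_MASS_G_PER_MOL
molecules = moles * AVOGADRO

print(f"{moles:.0f} mol is about {molecules:.1e} amino acid molecules")
```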
- Why don’t we turn into a cow when we eat beef?
When we digest food, proteins are broken down into individual amino acids.
Our body does not keep cow proteins intact.
Instead, we reuse those amino acids to build new proteins based on instructions from our own DNA.
So what we eat provides building blocks, not the identity of the organism.
- Is it possible to create new, artificial amino acids?
Yes. Chemists can synthesize amino acids that do not occur naturally.
These can include unusual side chains, special reactive groups, or atoms like fluorine.
Such modified amino acids are used in research to design proteins with new properties.
- Where did amino acids originate before life existed?
Amino acids could have formed through simple chemical reactions on the early Earth.
Experiments have shown that under conditions resembling the early atmosphere, amino acids can form from basic gases and energy sources like lightning.
They have also been detected in meteorites, suggesting they may have come from space as well.
- What happens if you build an α-helix from D-amino acids?
Natural proteins use L-amino acids and form right-handed helices.
If you instead used D-amino acids, the helix would twist in the opposite direction, forming a left-handed structure.
- Are there other types of helices beyond the common ones?
Yes. While the α-helix is the most familiar, researchers have identified other helical forms in proteins, such as the 3₁₀ helix and the π-helix, and have even engineered new ones.
With different amino acids or synthetic designs, new helical geometries can be explored.
- Why do β-sheets often clump together? What drives this?
β-strands align side by side and form hydrogen bonds.
When the edge strands of a sheet are exposed, their unpaired backbone hydrogen-bonding groups can easily bind to other β-strands.
Aggregation is mainly driven by:
- Hydrogen bonding between strands
- Hydrophobic side chains packing together
- The flat, extended shape of β-sheets that allows stacking
These features make β-sheets prone to sticking together.
- Why are β-sheets common in amyloid diseases? Could they be useful?
In amyloid diseases, proteins misfold and reorganize into tightly stacked β-sheet structures.
These assemblies are very stable and resist breakdown, which leads to accumulation in tissues.
However, that same stability and self-assembly make amyloid-like fibers attractive for materials science, where strong and durable nanostructures are useful.
- Propose a β-sheet sequence that forms an ordered structure.
A repeating pattern that alternates hydrophobic and polar residues can promote organized packing, for example:
Val–Thr–Val–Thr–Val–Thr
This arrangement allows one face of the sheet to interact with water while the other packs tightly against neighboring sheets, helping create a stable and ordered structure.
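The alternating pattern can be generated and checked with a short sketch. It relies on the fact that side chains along a β-strand point to alternating faces of the sheet; the function names are illustrative, and the code says nothing about whether a given peptide actually folds as intended.

```python
def alternating_strand(hydrophobic: str = "V", polar: str = "T", repeats: int = 3) -> str:
    """Build a strand that alternates a hydrophobic and a polar residue."""
    return (hydrophobic + polar) * repeats


def faces(strand: str) -> tuple:
    """Split a strand into the residues pointing to each face of the sheet.

    Consecutive side chains in a beta-strand point to opposite faces,
    so even and odd positions end up on different sides.
    """
    return strand[0::2], strand[1::2]


if __name__ == "__main__":
    strand = alternating_strand()
    print(strand)         # VTVTVT
    print(faces(strand))  # ('VVV', 'TTT')
```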
Labs
Lab writeups:
Dialga: A Legendary Pokemon from the Sinnoh region
Subsections of Labs
Week 1 Lab: Pipetting

