Week 4 HW: Protein Design Part I
This page covers all of the Week 4 homework assignments.
Part A. Conceptual Questions
| Questions | Answers |
|---|---|
| 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) | Assuming the average meat is ~25% protein by weight[1], and that proteins are essentially entirely composed of amino acids, we need the number of amino acids in 125 g. Since an average amino acid is ~100 Daltons (100 g/mol): 125 g ÷ 100 g/mol = 1.25 mol, and 1.25 mol × 6.022 × 10²³ mol⁻¹ ≈ 7.5 × 10²³ molecules. |
| 2. Why do humans eat beef but do not become a cow, eat fish but do not become fish? | The human digestive system breaks all raw materials down into their basic building blocks (proteins into amino acids, for example), which the body then reuses for its own processes. If intact foreign proteins were somehow ingested as-is and made it into a living human cell, they could set off pathways resembling their function in the original organism, with unintended and potentially lethal consequences for the carnivore. Nucleic acids (the constituents of DNA/RNA) are likewise broken down during digestion, preventing foreign DNA fragments from being incorporated into our cells; even when a "leaky gut" lets some protein or DNA fragments into the body, the immune system responds appropriately. The digestive system thus acts as an information shredder, passing only raw, memory-less ingredients on to the host. |
| 3. Why are there only 20 natural amino acids? | More than 500 amino acids occur naturally, but only 22 of them are genetically encoded.[2] Why only 22 is a hard question for evolutionary biology; my hunch is that even if life started out expressing many more types of amino acids, selective pressures could have reinforced the specific pathways we currently have while the other pathways slowly disappeared. Alternatively, the origin of life may have begun in a pond-cluster that used these 22 essentials, while other pond-clusters never scaled up to large organisms. Solid proof could only come from replicating a fast-tracked evolutionary process using pathways spanning all 500+ amino acids and observing whether evolution converges on the same 22. |
| 4. Can you make other non-natural amino acids? Design some new amino acids. | By definition, an amino acid must contain an amino group and a carboxylic acid group (with an alpha carbon connecting the two; beta and gamma amino acids are also worth researching). So it is fairly simple to design a new AA under this definition. One easy way to generate unique new AAs is to take the heaviest AA designed to date and add another carbon atom somewhere (for example at the far end); this is a general trick for extending artificial AAs, though instability issues must be kept in mind. Another reason for the limit of ~22 AAs could be their size, which allows them to pass through cellular membranes via AA-transporter proteins (larger/heavier AAs may face more resistance). Furthermore, a new AA won't matter much unless a matching aminoacyl-tRNA synthetase (aaRS, the essential enzyme that attaches a specific amino acid to its corresponding tRNA) is also designed; humans have 20 different aaRS types for attaching the 20 standard AAs to their respective tRNAs. Worth mentioning: single added carbons are often ignored by ribosomes and the translation machinery, so it is better to add larger groups; in this vein, phosphoserine analogs with non-hydrolyzable bonds are used to study signaling. |
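The arithmetic in Part A, Q1 can be checked in a few lines. This is a minimal sketch using the round numbers assumed above (25% protein by mass, ~100 g/mol per amino-acid residue):

```python
# Back-of-envelope estimate: amino-acid molecules in 500 g of meat.
AVOGADRO = 6.022e23               # molecules per mole
meat_g = 500
protein_fraction = 0.25           # assumed protein content of meat by weight
residue_mass_g_per_mol = 100      # assumed average amino-acid mass (~100 Da)

protein_g = meat_g * protein_fraction          # 125 g of amino-acid residues
moles = protein_g / residue_mass_g_per_mol     # 1.25 mol
molecules = moles * AVOGADRO                   # ~7.5e23 molecules
print(f"{molecules:.2e}")
```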
| Question 1 | First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project, something you are already doing in your research, or something you are just curious about. |
| Answer 1 | All living cells divide; however, every cell division causes telomere shortening (telomeres are protective caps at the ends of chromosomes). The number of cell divisions possible within a safe limit, such that no useful information is lost directly from the DNA (the telomeres still shorten throughout), is known as the Hayflick Limit (discovered by Leonard Hayflick in the 1960s). Telomere shortening/attrition is one of the (currently 13) Hallmarks of Ageing; therefore, understanding how to raise this limit would be a game-changer. Scientists and researchers have been attempting this with different techniques; the purpose of this HTGAA individual project is to suggest a few novel methods, understand how they fare with respect to existing methods, and examine the bio-technical nuances/problems that might arise from these changes to the DNA during subsequent cell divisions. The methods to be explored include:
|
| Question 2 | Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals, for example those relating to equity or autonomy. |
| Answer 2 | To ensure the technology does not disrupt the evolutionary processes of species, biosystems, etc., or enable the development of bioweapons, a few governance or policy goals are suggested.
|
| Question 3 | Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.). • Purpose: What is done now and what changes are you proposing? • Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc) • Assumptions: What could you have wrong (incorrect assumptions, uncertainties)? • Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions? |
| Answer 3 |
|
| Question 4 | Next, score (from 1–3, with 1 as best, or n/a) each of your governance actions against your rubric of policy goals: |
| Answer 4 |
|
| Question 5 | Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Trump or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix. |
| Answer 5 | Many of the governance options suggested above are already practised in one form or another (at varying intensities)[7] to prevent biowarfare. However, with the advent of powerful AI infrastructure allowing real-time decision-making, integrating the proposed Knowledge Tree with a continuous data stream from detection units is a promising direction (albeit one that heightens cybersecurity risks), allowing immediate detection of environmental contamination at a wider scale than previously possible. Furthermore, an international, decentralised framework in which the scientific community itself governs the direction of technology development is suggested to prevent misuse and/or proliferation. In this regard, a combination of the Technology-Knowledge Graph of participating members, the establishment of a decentralised Oversight Body, and the leveraging of state-of-the-art biologics detection systems, along with real-time data analysis for immediate threat perception through autonomous (AI-enabled, human-in-the-loop) decision-making, is key to developing a tight-knit, trustworthy, and unbiased ecosystem. |
| Question 6 | Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues. This should be included on your class page for this week. |
| Answer 6 | Philosophically speaking, the class focused on why D/Acc[8] (i.e. cautiously moving towards technological progress, ensuring that existing or in-research technologies cannot cause near-doomsday events or anything close) is more important than E/Acc[9] (a techno-optimistic, utopian idea of allowing unrestricted technological progress). For my individual project idea, the ultimate goal is to test the suggested methods on embryos of smaller organisms (such as worms, flies, and mice). The final implementation in larger organisms and humans needs to be handled extremely carefully. Governance mechanisms must ensure that this does not cascade to humans until the holistic, deep after-effects of such methods are well understood; these mechanisms should aim to extend our current understanding of ripple/butterfly effects across massive timescales, e.g. how much of the chromatin/DNA/genetic edits are actually inherited (if at all) and what their evolutionary impact could be. |
References
1. Johns Hopkins Medicine, Protein Content of Common Foods, 2019. https://www.hopkinsmedicine.org/-/media/bariatrics/nutrition_protein_content_common_foods.pdf
2. Amino acid, Wikipedia. https://en.wikipedia.org/wiki/Amino_acid
3. Nuclear Threat Initiative, NTI | bio proposes new strategies to prevent bioweapons, Dec 2024. https://www.nti.org/news/nti-bio-proposes-new-solutions-to-prevent-bioweapons-development-and-use/
4. Zimmer, C., Creating ‘mirror life’ could be disastrous, scientists warn, Scientific American, Dec 2024. https://www.scientificamerican.com/article/creating-mirror-life-could-be-disastrous-scientists-warn/
5. Hashemi, S., Scientists weigh the risks of 'mirror life,' synthetic molecules with a reverse version of life's building blocks, Smithsonian Magazine, Sep 2025. https://www.smithsonianmag.com/smart-news/scientists-weigh-the-risks-of-mirror-life-synthetic-molecules-with-a-reverse-version-of-lifes-building-blocks-180987360/
6. Rezaei, A. R., Emergence of techniques to combat biological warfare during and after COVID-19, Preprints.org, Nov 2024. https://www.preprints.org/manuscript/202411.1220
7. Gronvall, G. K., Prevention of the development or use of biological weapons, Health Security, 2017. https://doi.org/10.1089/hs.2016.0096
8. Defensive accelerationism, EverybodyWiki Bios & Wiki, Feb 2025. https://en.everybodywiki.com/Defensive_Accelerationism
9. Effective accelerationism, Wikipedia, Jan 2026. https://en.wikipedia.org/wiki/Effective_accelerationism
Answers to the Homework Assignment Questions:
| Questions | Answers |
|---|---|
| ~from Professor Jacobson: | |
| 1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy? | DNA polymerase has a raw error rate of approximately 10⁻⁴–10⁻⁵ errors per nucleotide added; compared with the ~3 × 10⁹ base pairs of the human genome, this would introduce thousands of mutations per cell division. The discrepancy is tackled through multiple layers of error control, including polymerase proofreading, post-replication mismatch repair, and cell-cycle checkpoints or apoptosis that eliminate heavily damaged cells, reducing the effective mutation rate to ~10⁻⁹–10⁻¹⁰ per base per division. |
| 2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest? | An average human protein is ~300 amino acids, and each amino acid is encoded by 1–6 synonymous codons (take an average of ~3 codons per amino acid). This makes the number of possible DNA sequences encoding the same protein roughly 3³⁰⁰ ≈ 10¹⁴³. In practice, synonymous codons can affect translation dynamics and mRNA stability; rare codons affect translation speed via tRNA bias, slowing ribosomes as they wait for low-abundance tRNAs. |
| ~from Dr. LeProust: | |
| 1. What’s the most commonly used method for oligo synthesis currently? | Phosphoramidite solid-phase synthesis is currently the most commonly used method for oligonucleotide (oligo) synthesis: an automated chemical process that builds oligonucleotides nucleotide by nucleotide on a solid support. |
| 2. Why is it difficult to make oligos longer than 200nt via direct synthesis? | Direct chemical synthesis of oligonucleotides longer than ~200 nt is extremely difficult because per-step errors accumulate: even a 1% failure rate per coupling step eliminates roughly 87% of the desired full-length product by 200 nt. |
| 3. Why can’t you make a 2000bp gene via direct oligo synthesis? | - Per-step errors compound exponentially over thousands of cycles, making full-length synthesis essentially impossible - Longer chains on solid supports block reagent diffusion, dropping coupling efficiency - Additionally, extremely large quantities of reagents would be required for the steps (which are performed in batches) |
| ~from George Church: | |
| 1. What are the 10 essential amino acids in all animals, and how does this affect your view of the “Lysine Contingency”? | The 10 essential amino acids required by most animals are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine. The "Lysine Contingency" from Jurassic Park (1993) involved genetically engineered dinosaurs unable to synthesise lysine, making them dependent on external lysine sources (and thereby on the humans feeding it to them). This would not actually work, as lysine is already available in meat, fish, grains, etc., and even in many single-celled organisms; the dinosaurs could simply get lysine from their prey. Herbivorous dinosaurs could also obtain lysine through microbial gut fermentation (it would be impossible for a gut to have no microbiota at all, as the digestive system would then collapse; it would be another interesting project to study the consequences of removing all microbiota from the healthy gut of a mouse, both computationally via metabolic-pathway analysis and experimentally). |
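The coupling-efficiency argument in Dr. LeProust's questions above can be made concrete. This is a minimal sketch that assumes an idealized, uniform 99% per-step coupling efficiency and no other failure modes:

```python
# Fraction of strands that survive every coupling step of solid-phase synthesis.
def full_length_yield(coupling_efficiency: float, length_nt: int) -> float:
    """Yield of full-length product after length_nt sequential coupling steps."""
    return coupling_efficiency ** length_nt

# Yield collapses with length: ~13% at 200 nt, effectively zero at 2000 nt.
for n in (50, 100, 200, 2000):
    print(f"{n:>5} nt -> {full_length_yield(0.99, n):.2e}")
```

This is why long genes are assembled from short oligos rather than synthesized directly.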
