Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices

1. Biological engineering application or tool to develop and why

Idea

My idea is a biosensor designed to detect early changes in dependence on the molecular chaperone HSP90 in KIT-mutant gastrointestinal stromal tumors (GIST), as functional indicators of cellular adaptation preceding the development of resistance to imatinib.

Figure 1. Conceptual schematic of the proposed biological biosensor. Image generated using AI for graphical representation purposes only.

Gastrointestinal Stromal Tumors (GIST)

Gastrointestinal stromal tumors (GIST) are rare mesenchymal neoplasms, but they represent the most common subtype of sarcoma of the gastrointestinal tract. The majority of GISTs harbor activating mutations in the receptor tyrosine kinases KIT or PDGFRA, which has enabled the development of targeted therapies such as imatinib. Despite these advances, progression to metastatic disease remains a major clinical challenge, even among patients who share similar mutational profiles (Hemming et al., 2018a; Antonescu et al., n.d.).

Imatinib and KIT Mutations

Imatinib mesylate is a selective tyrosine kinase inhibitor that binds to the ATP-binding site of receptor tyrosine kinases, thereby blocking their phosphorylation and downstream proliferative and anti-apoptotic signaling cascades. In GIST, its primary therapeutic targets are KIT and, in specific subgroups, PDGFRA, resulting in inhibition of constitutive oncogenic signaling and induction of cell cycle arrest and apoptosis (Fletcher & Rubin, 2007). This approach represented one of the earliest successes of molecularly targeted therapy in oncology, leading to substantial improvements in survival in patients with advanced or metastatic disease (Fletcher & Rubin, 2007).

Approximately 70–85% of GISTs carry activating mutations in KIT, most frequently in exon 11, followed by exons 9, 13, and 17. These mutational subtypes strongly influence tumor biology and therapeutic response to imatinib. Tumors harboring exon 11 mutations exhibit the highest response rates, whereas other mutational subtypes often require dose escalation or develop secondary resistance through additional kinase domain mutations, ultimately leading to clinical progression after an initial response (Fletcher & Rubin, 2007; Killock, 2022).

Figure 2. Molecular structure of imatinib, highlighting its role as a tyrosine kinase inhibitor in KIT-driven tumors and the biological context in which resistance mechanisms may emerge.

Role of HSP90 in GIST and Therapeutic Resistance

Heat shock protein 90 (HSP90) is a molecular chaperone essential for the stability and functional activity of multiple oncogenic proteins. In GIST, KIT, both wild-type and mutant, critically depends on HSP90 to maintain its active conformation. Inhibition of HSP90 results in destabilization and proteasomal degradation of KIT, including variants harboring mutations associated with imatinib resistance, positioning HSP90 as an alternative or complementary therapeutic target to conventional tyrosine kinase inhibitors (NIHMS212509; Frontiers in Immunology, 2024).

In addition, HSP90 is involved in biological processes linked to tumor progression and therapeutic resistance, which has driven interest in combination strategies for patients with advanced or refractory GIST (Frontiers in Immunology, 2024).

Acquired Resistance to Imatinib: A Central Clinical Problem

Despite the initial efficacy of imatinib in most patients with GIST, a substantial proportion eventually develops acquired resistance, leading to tumor progression and limited therapeutic options. This resistance is driven by heterogeneous molecular mechanisms that cannot always be explained solely by the emergence of new KIT or PDGFRA mutations, complicating both prediction and timely clinical monitoring.

Biosensor Design Concept

The proposed biosensor is conceived as a functional molecular sensor designed to detect cellular states associated with acquired resistance to imatinib in GIST, rather than relying solely on static mutational information. While current clinical stratification primarily focuses on the presence or absence of mutations in KIT or PDGFRA, resistance to imatinib frequently arises through dynamic molecular adaptations that are not fully captured by genomic profiling alone (Fletcher & Rubin, 2007; Killock, 2022). This biosensor aims to bridge that gap by translating complex intracellular signaling states into a measurable and interpretable output, reflecting functional tumor behavior beyond mutational status.

Conceptual Design

At a conceptual level, the biosensor would operate by coupling a biological recognition element, sensitive to KIT-associated signaling activity or HSP90-dependent protein stability, to a reporter output that reflects the functional response of the cell to imatinib exposure. Rather than detecting specific mutations, the sensor would respond to changes in intracellular conditions indicative of loss of KIT dependency, activation of alternative survival pathways, or increased reliance on molecular chaperones such as HSP90, all of which have been implicated in imatinib resistance in GIST (Fletcher & Rubin, 2007; NIHMS212509; Frontiers in Immunology, 2024).

Need for New Functional Analytical Tools

In this context, there is an unmet need for tools capable of capturing the functional state of tumor cells and their dependence on KIT-associated oncogenic pathways, beyond static mutational characterization. A biosensor designed to detect dynamic changes in molecular activity related to imatinib sensitivity or resistance could complement current genomic approaches and provide a pathway toward earlier identification of therapeutic resistance.

2. Governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm).

Clinical safety and non-malfeasance
- How can we ensure the biosensor is used strictly to support cancer treatment and does not cause harm?
- How can we prevent its use as a standalone diagnostic tool, and ensure that clinical decisions are not made solely based on biosensor readouts without proper medical interpretation?
Responsible clinical integration
- How should the biosensor be integrated into clinical workflows so that it provides additional biological information rather than replacing clinician judgment?
- Who is authorized to interpret the results, and under what conditions should the data influence treatment decisions?
Equitable access
- Who will have access to this technology?
- How can barriers related to cost, infrastructure, or healthcare disparities be minimized so that the benefits of early adaptation detection are available to a broader patient population and not only to a privileged few?
Research integrity and prevention of misuse
- How can the biosensor be governed to ensure it is used to better understand KIT-mutant tumor biology and mechanisms of drug adaptation, rather than being applied in ways that could intentionally or unintentionally cause harm?

3. Different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

Governance Action 1: Clinical use restriction policy
Purpose: Currently, emerging biosensors may be interpreted as direct indicators of treatment response or resistance. This action proposes restricting the use of the biosensor strictly to a supportive clinical and research role, not as a standalone diagnostic or decision-making instrument.
Design: This would require hospitals and clinical research institutions to implement internal guidelines stating that biosensor outputs must be interpreted by trained oncologists or molecular pathologists. Ethics committees and hospital review boards would approve its use, and researchers would include clear disclaimers in publications and clinical protocols.
Assumptions: This action assumes that clinicians will follow institutional guidelines and that proper training will be available. It also assumes that misuse mainly occurs at the interpretation stage rather than during data generation.
Risks of Failure & “Success”: If the policy fails, clinicians might still rely too heavily on biosensor readouts. If it succeeds too well, overly strict rules could slow down clinical adoption or discourage innovation, delaying potential patient benefits.

Governance Action 2: Mandatory validation and transparency requirements (technical strategy)
Purpose: At present, early biomarkers may be applied before their limitations are fully understood. This action proposes requiring rigorous validation and transparency regarding uncertainty, sensitivity, and limitations of the biosensor.
Design: Academic researchers and biotech developers would be required to publish validation data, uncertainty ranges, and known limitations alongside biosensor results. Regulatory agencies or funding bodies could mandate this as a condition for approval or funding.
Assumptions: This assumes that validation metrics can adequately capture real-world biological variability and that transparency will lead to more responsible interpretation rather than confusion.
Risks of Failure & “Success”: Failure could occur if validation data are incomplete or misleading. If successful, excessive emphasis on uncertainty might reduce clinician confidence or limit the biosensor’s perceived usefulness.

Governance Action 3: Equity-oriented access and deployment incentives
Purpose: Advanced molecular tools often benefit only well-resourced institutions. This action proposes promoting equitable access so the biosensor does not widen existing healthcare disparities.
Design: Public research institutions, funding agencies, or non-profit organizations could subsidize deployment in public hospitals or low-resource settings. Open research protocols or cost-reduction strategies could be encouraged at early development stages.
Assumptions: This assumes that cost and infrastructure are the main barriers to access, and that equitable deployment is feasible once the technology is validated.
Risks of Failure & “Success”: If unsuccessful, the technology may remain inaccessible to most patients. If overly successful, rapid deployment without adequate training or infrastructure could increase misinterpretation or misuse.

4. Score each of the governance actions against the rubric of policy goals.

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity
• By preventing incidents	1	1	3
• By helping respond	2	1	n/a
Foster Lab Safety
• By preventing incidents	2	1	n/a
• By helping respond	3	1	n/a
Protect the environment
• By preventing incidents	n/a	n/a	n/a
• By helping respond	n/a	n/a	n/a
Other considerations
• Minimizing costs and burdens to stakeholders	1	2	3
• Feasibility?	1	1	2
• Not impede research	1	2	2
• Promote constructive applications	2	1	1
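
As a rough cross-check of the table, the scores can be averaged per option. The sketch below is only illustrative: it assumes that a lower score is better (1 = best), which the table itself does not state, and it simply skips "n/a" entries instead of penalizing them. The names `SCORES`, `mean_score`, and `rank_options` are made up for this example.

```python
# Scores copied row by row from the rubric table above; None stands for "n/a".
SCORES = {
    "Option 1": [1, 2, 2, 3, None, None, 1, 1, 1, 2],
    "Option 2": [1, 1, 1, 1, None, None, 2, 1, 2, 1],
    "Option 3": [3, None, None, None, None, None, 3, 2, 2, 1],
}


def mean_score(scores):
    """Average the numeric scores, skipping n/a (None) entries."""
    rated = [s for s in scores if s is not None]
    return sum(rated) / len(rated)


def rank_options(table):
    """Return (option, mean) pairs sorted from best to worst.

    Assumption: a lower score is better (1 = best).
    """
    pairs = [(name, mean_score(values)) for name, values in table.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, avg in rank_options(SCORES):
        print(f"{name}: mean score {avg:.2f}")
```

Under these assumptions Option 2 and Option 1 come out ahead of Option 3. Averaging is only one possible way to aggregate; weighting the rows differently could change the order.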

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

Based on the scoring, I would prioritize a combination of Option 1 (clinical-use restriction policy) and Option 2 (mandatory validation and transparency requirements), as these most directly support non-malfeasance by preventing harmful clinical misuse and reducing misinterpretation of biosensor outputs. While Option 3 (equitable access incentives) is ethically important, prioritizing it too early could increase risks if the technology is deployed before sufficient validation. This approach assumes that clinicians and researchers will follow institutional guidelines and that validation data will be communicated clearly. A key trade-off is that stricter rules and validation requirements may slow adoption and increase administrative burden, but this is justified to protect patient safety. This recommendation is primarily directed toward academic research institutions, clinical research hospitals, and institutional review boards (IRBs), which play a central role in approving, overseeing, and guiding the ethical use of emerging biomedical technologies. This exercise also highlighted the ethical risk of over-relying on early biological signals, even when technologies are developed with good intentions, as well as the tension between promoting access and ensuring safety. As a result, governance actions that clearly define scope of use, emphasize transparency, and support phased deployment are essential to prevent unintended harm.

Questions from Lecture 2

Homework Questions from Professor Jacobson: [Lecture 2 slides]

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

R/: DNA polymerase is the enzyme that copies DNA in cells. It is very accurate, but not perfect. On its own, with its built-in proofreading (checking each base as the DNA is copied), it makes roughly one mistake per million bases. The human genome is about 3 billion bases long, so at that rate every round of copying would introduce thousands of errors. Biology closes this gap with additional DNA repair mechanisms, such as mismatch repair, that find and fix errors after the polymerase has passed. Together, these systems bring the final error rate down to about one mistake every 100 million to 10 billion bases, which means somewhere between a few dozen and less than one error per genome copy.
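
The comparison in this answer comes down to one multiplication: expected errors per genome copy = per-base error rate × genome length. The short sketch below works through it. The two low rates are the ones quoted in the answer; the rate of 1 in a million is included as an assumption, a commonly quoted figure for a proofreading polymerase without later repair. The function name `expected_errors` is made up for this example.

```python
GENOME_LENGTH_BP = 3_000_000_000  # approximate human genome length used in the answer


def expected_errors(error_rate_per_base, genome_length=GENOME_LENGTH_BP):
    """Expected number of errors in one full copy of the genome."""
    return error_rate_per_base * genome_length


if __name__ == "__main__":
    rates = {
        "1 in a million (assumed, proofreading only)": 1e-6,
        "1 in 100 million": 1e-8,
        "1 in 10 billion": 1e-10,
    }
    for label, rate in rates.items():
        print(f"{label}: about {expected_errors(rate):g} errors per genome copy")
```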

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

R/: There are many different ways to write the DNA code for a human protein. That’s because the genetic code is redundant: most amino acids can be coded by more than one set of three DNA letters (called codons). For example, the amino acid leucine has six different codons. The number of possible DNA sequences is the product of the number of codons available at each position, so a protein with 300 amino acids has an astronomically large number of possible codings. With about three codons per amino acid on average (61 coding codons for 20 amino acids), that is roughly 3^300, or about 10^143, sequences that all make the same protein. But in practice, not all of these versions work well. Some codons are rare and slow down protein production. Some DNA or RNA sequences can fold into secondary structures that block the process. Also, each organism has codon preferences, meaning it uses some codons more than others. That’s why scientists choose codons carefully when they want the protein to be made properly in a specific cell.
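
To make the counting concrete, here is a small sketch that multiplies the number of codons available for each amino acid in a protein. The degeneracy values are those of the standard genetic code; the function name `count_codings` and the example sequences are only for illustration.

```python
from math import log10, prod

# Number of codons per amino acid in the standard genetic code (61 in total).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_codings(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(CODON_DEGENERACY[aa] for aa in protein.upper())


if __name__ == "__main__":
    short = "MPEETQ"  # first six residues of HSP90-alpha
    print(f"{short}: {count_codings(short)} possible codings")

    # A made-up 300-residue example: the six residues above repeated 50 times.
    long_protein = short * 50
    magnitude = log10(count_codings(long_protein))
    print(f"300 residues: about 10^{magnitude:.0f} possible codings")
```

Python integers have arbitrary precision, so the product can be computed exactly even though it is far too large to enumerate.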

Homework Questions from Dr. LeProust: [Lecture 2 slides]

What’s the most commonly used method for oligo synthesis currently?

R/: The most common way to make short pieces of DNA (called oligos) is by using a chemical method that adds one DNA letter at a time. This method is called phosphoramidite synthesis. It’s like building a word by adding one letter at a time, very carefully. Scientists do this on a small surface in the lab. It works well for making short DNA pieces, which are used in many experiments.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

R/: It’s hard to make DNA oligos longer than 200 bases because the process isn’t perfect. Each time a new DNA letter is added, there’s a small chance it goes wrong or doesn’t stick. If you’re only adding a few letters, most of the strands are correct. But if you try to add 200 letters, those small mistakes add up and many of the strands end up incomplete or with errors. That’s why longer DNA pieces are usually made by putting together shorter, more accurate pieces.

Why can’t you make a 2000bp gene via direct oligo synthesis?

R/: You can’t make a 2000bp gene directly with oligo synthesis because the process isn’t accurate enough for something that long. When scientists make DNA, they add one letter at a time, and each step has a small chance of making a mistake. For short pieces of DNA (like 100–200 bases), the process works well. But if you try to make something as long as 2000 bases all at once, too many mistakes happen, and most of the DNA ends up broken or wrong. So instead, scientists make shorter pieces and then join them together to build the full gene.
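
The "small mistakes add up" argument in the two answers above can be written as a simple power law: if each coupling step succeeds with probability p, the fraction of strands that are full length after n - 1 couplings is p^(n - 1). The sketch below uses a per-step efficiency of 99% purely as an assumed illustrative value, not a figure from the lecture, and `full_length_fraction` is a made-up name.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of strands in which every coupling step succeeded.

    A strand of n nucleotides needs n - 1 couplings, because the first
    nucleotide is already attached to the solid support.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    efficiency = 0.99  # assumed per-step coupling efficiency (illustrative)
    for length in (20, 200, 2000):
        fraction = full_length_fraction(length, efficiency)
        print(f"{length:>4} nt: {fraction:.3g} of strands are full length")
```

With this assumed efficiency, a useful share of 200 nt strands is still full length, while essentially none of the 2000 nt strands are, which is why long genes are assembled from shorter oligos.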

Homework Question from Professor George Church: [Lecture 2 slides]

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

R/: The 10 essential amino acids in all animals are: histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine (which is considered essential in some conditions, like during growth or illness). These amino acids are called “essential” because animals cannot make them on their own and must obtain them from food sources. The concept of the “Lysine Contingency,” as described in Professor George Church’s lecture (slide #4), is a biosafety strategy in synthetic biology. It involves engineering organisms to be dependent on lysine, meaning they cannot survive without an external supply of it. This acts as a safety switch: if the organism escapes into an environment where lysine is not available, it cannot grow or survive. Understanding that lysine is essential and not naturally produced by the organism itself makes this strategy seem very effective and logical. It adds a layer of control to prevent genetically modified organisms from spreading outside the lab, which is an important concern in biotechnology research.

Sources:

Prof. George Church – Lecture 2, Slide #4: “The Genetic Code”
Wu, G. (2013). Functional amino acids in nutrition and health. NCBI Bookshelf. https://www.ncbi.nlm.nih.gov/books/NBK234922/
Mayo Clinic. What are essential amino acids? https://www.mayoclinic.org/healthy-lifestyle/nutrition-and-healthy-eating/expert-answers/protein/faq-20058354

Week 2 HW: DNA read write and edit

Part 1: Benchling & In-silico Gel Art

The sequence digested with the given restriction enzymes:

Figure 1. Simulated agarose gel showing the LAMCG plasmid digested with different restriction enzymes.

Figure 2. Screenshot of the online tool used to simulate restriction digests and generate the virtual gel pattern for LAMCG.

Testing update

I was arranging the bands to try to form the letter “M.”

Part 3: DNA Design Challenge

3.1. Choose your protein.

I chose the protein HSP90.

Original sequence:

>sp|P07900|HS90A_HUMAN Heat shock protein HSP 90-alpha OS=Homo sapiens OX=9606 GN=HSP90AA1 PE=1 SV=5
MPEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNKEIFLRELISNSSDALDKIRYESLTDPSKLDSGKELHINLIPNKQDRTLTIVDTGIGMTKADLINNLGTIAKSGTKAFMEALQAGADISMIGQFGVGFYSAYLVAEKVTVITKHNDDEQYAWESSAGGSFTVRTDTGEPMGRGTKVILHLKEDQTEYLEERRIKEIVKKHSQFIGYPITLFVEKERDKEVSDDEAEEKEDKEEEKEKEEKESEDKPEIEDVGSDEEEEKKDGDKKKKKKIKEKYIDQEELNKTKPIWTRNPDDITNEEYGEFYKSLTNDWEDHLAVKHFSVEGQLEFRALLFVPRRAPFDLFENRKKKNNIKLYVRRVFIMDNCEELIPEYLNFIRGVVDSEDLPLNISREMLQQSKILKVIRKNLVKKCLELFTELAEDKENYKKFYEQFSKNIKLGIHEDSQNRKKLSELLRYYTSASGDEMVSLKDYCTRMKENQKHIYYITGETKDQVANSAFVERLRKHGLEVIYMIEPIDEYCVQQLKEFEGKTLVSVTKEGLELPEDEEEKKKQEEKKTKFENLCKIMKDILEKKVEKVVVSNRLVTSPCCIVTSTYGWTANMERIMKAQALRDNSTMGYMAAKKHLEINPDHSIIETLRQKAEADKNDKSVKDLVILLYETALLSSGFSLEDPQTHANRIYRMIKLGLGIDEDDPTADDTSAAVTEEMPPLEGDDDTSRMEEVD

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

reverse translation of sp|P07900|HS90A_HUMAN Heat shock protein HSP 90-alpha OS=Homo sapiens OX=9606 GN=HSP90AA1 PE=1 SV=5 to a 2196 base sequence of most likely codons. atgccggaagaaacccagacccaggatcagccgatggaagaagaagaagtggaaacctttgcgtttcaggcggaaattgcgcagctgatgagcctgattattaacaccttttatagcaacaaagaaatttttctgcgcgaactgattagcaacagcagcgatgcgctggataaaattcgctatgaaagcctgaccgatccgagcaaactggatagcggcaaagaactgcatattaacctgattccgaacaaacaggatcgcaccctgaccattgtggataccggcattggcatgaccaaagcggatctgattaacaacctgggcaccattgcgaaaagcggcaccaaagcgtttatggaagcgctgcaggcgggcgcggatattagcatgattggccagtttggcgtgggcttttatagcgcgtatctggtggcggaaaaagtgaccgtgattaccaaacataacgatgatgaacagtatgcgtgggaaagcagcgcgggcggcagctttaccgtgcgcaccgataccggcgaaccgatgggccgcggcaccaaagtgattctgcatctgaaagaagatcagaccgaatatctggaagaacgccgcattaaagaaattgtgaaaaaacatagccagtttattggctatccgattaccctgtttgtggaaaaagaacgcgataaagaagtgagcgatgatgaagcggaagaaaaagaagataaagaagaagaaaaagaaaaagaagaaaaagaaagcgaagataaaccggaaattgaagatgtgggcagcgatgaagaagaagaaaaaaaagatggcgataaaaaaaaaaaaaaaaaaattaaagaaaaatatattgatcaggaagaactgaacaaaaccaaaccgatttggacccgcaacccggatgatattaccaacgaagaatatggcgaattttataaaagcctgaccaacgattgggaagatcatctggcggtgaaacattttagcgtggaaggccagctggaatttcgcgcgctgctgtttgtgccgcgccgcgcgccgtttgatctgtttgaaaaccgcaaaaaaaaaaacaacattaaactgtatgtgcgccgcgtgtttattatggataactgcgaagaactgattccggaatatctgaactttattcgcggcgtggtggatagcgaagatctgccgctgaacattagccgcgaaatgctgcagcagagcaaaattctgaaagtgattcgcaaaaacctggtgaaaaaatgcctggaactgtttaccgaactggcggaagataaagaaaactataaaaaattttatgaacagtttagcaaaaacattaaactgggcattcatgaagatagccagaaccgcaaaaaactgagcgaactgctgcgctattataccagcgcgagcggcgatgaaatggtgagcctgaaagattattgcacccgcatgaaagaaaaccagaaacatatttattatattaccggcgaaaccaaagatcaggtggcgaacagcgcgtttgtggaacgcctgcgcaaacatggcctggaagtgatttatatgattgaaccgattgatgaatattgcgtgcagcagctgaaagaatttgaaggcaaaaccctggtgagcgtgaccaaagaaggcctggaactgccggaagatgaagaagaaaaaaaaaaacaggaagaaaaaaaaaccaaatttgaaaacctgtgcaaaattatgaaagatattctggaaaaaaaagtggaaaaagtggtggtgagcaaccgcctggtgaccagcccgtgctgcattgtgaccagcacctatggctggaccgcgaacatg
gaacgcattatgaaagcgcaggcgctgcgcgataacagcaccatgggctatatggcggcgaaaaaacatctggaaattaacccggatcatagcattattgaaaccctgcgccagaaagcggaagcggataaaaacgataaaagcgtgaaagatctggtgattctgctgtatgaaaccgcgctgctgagcagcggctttagcctggaagatccgcagacccatgcgaaccgcatttatcgcatgattaaactgggcctgggcattgatgaagatgatccgaccgcggatgataccagcgcggcggtgaccgaagaaatgccgccgctggaaggcgatgatgataccagccgcatggaagaagtggat
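
The reverse translation above can be reproduced with a simple lookup: each amino acid is replaced by one fixed codon. In the sketch below, the table `MOST_LIKELY_CODON` was read off the output above (one codon per amino acid, as it appears in that sequence); it is not taken from the tool's documentation, and the function name `reverse_translate` is made up for this example.

```python
# One codon per amino acid, as observed in the reverse-translated output above.
MOST_LIKELY_CODON = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}


def reverse_translate(protein):
    """Replace every amino acid with its single chosen codon."""
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein.upper())


if __name__ == "__main__":
    start = "MPEETQTQDQPM"  # first 12 residues of HSP90-alpha
    dna = reverse_translate(start)
    print(dna)
    print(f"{len(start)} amino acids -> {len(dna)} bases")
```

Because the same codon is always used for a given amino acid, this kind of output ignores the codon usage balance of the host, which is the issue the codon optimization step addresses.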

3.3. Codon optimization.

ATGCCGGAAGAAACCCAGACCCAGGATCAGCCGATGGAAGAAGAAGAAGTGGAAACCTTTGCGTTTCAGGCAGAAATTGCGCAGCTGATGTCTCTGATTATTAATACCTTTTATAGCAATAAAGAAATCTTCCTGCGTGAACTGATTAGCAACAGCAGCGATGCACTGGATAAAATTCGCTATGAATCGCTGACCGATCCGAGCAAACTGGATAGCGGCAAAGAACTGCATATTAATCTGATTCCGAACAAACAGGATCGCACCCTGACCATTGTGGATACCGGCATTGGCATGACCAAAGCGGATCTGATTAATAATCTGGGCACCATTGCCAAATCGGGCACCAAAGCCTTTATGGAAGCCCTGCAGGCCGGCGCGGATATTAGCATGATTGGCCAGTTCGGCGTGGGTTTCTATAGCGCCTATCTGGTGGCCGAAAAAGTGACCGTTATCACCAAACATAATGATGATGAACAGTATGCGTGGGAAAGCTCCGCGGGCGGCAGCTTTACCGTGCGCACCGATACCGGCGAACCGATGGGCCGCGGCACGAAAGTTATTCTGCACCTGAAAGAAGATCAGACCGAGTACTTAGAAGAACGTCGTATTAAAGAAATTGTGAAAAAACATAGCCAGTTCATCGGCTATCCGATCACCCTGTTCGTGGAAAAAGAACGCGATAAAGAAGTTAGCGATGATGAAGCGGAAGAAAAAGAAGATAAAGAAGAAGAAAAAGAGAAGGAAGAAAAAGAGAGCGAAGATAAACCGGAAATTGAAGATGTGGGCTCGGATGAAGAAGAAGAAAAAAAAGATGGCGATAAAAAAAAGAAAAAAAAAATTAAAGAAAAATACATTGATCAGGAAGAACTGAATAAGACCAAACCGATTTGGACCCGTAACCCGGATGACATTACCAACGAGGAATATGGCGAATTTTATAAAAGCCTGACCAACGATTGGGAAGATCACCTGGCGGTTAAACATTTTAGCGTGGAAGGCCAGCTGGAATTTCGCGCGCTGCTGTTCGTACCGCGCCGCGCCCCGTTTGATCTGTTTGAAAATCGCAAAAAAAAAAACAATATTAAACTGTATGTTCGCCGCGTCTTCATTATGGATAATTGCGAAGAACTGATTCCGGAATACCTGAACTTTATTCGCGGCGTGGTTGATAGCGAAGATCTGCCGCTGAACATTAGCCGCGAAATGCTGCAGCAGAGCAAAATTCTGAAAGTGATTCGCAAAAACCTGGTAAAAAAATGCCTGGAACTGTTTACCGAACTGGCGGAAGATAAAGAAAATTATAAAAAGTTTTATGAACAGTTTAGCAAAAACATTAAACTGGGCATTCATGAGGATAGCCAAAATCGTAAGAAACTGAGCGAACTGCTGCGCTACTATACCTCGGCGAGCGGCGATGAAATGGTGAGCCTGAAAGATTACTGTACCCGTATGAAAGAGAATCAGAAACATATTTATTACATCACCGGCGAAACTAAAGATCAGGTGGCGAATAGCGCTTTTGTGGAACGTCTGCGTAAACACGGCCTGGAAGTGATTTACATGATTGAACCGATTGATGAATATTGCGTGCAGCAGCTGAAAGAATTTGAAGGTAAAACCCTGGTTAGCGTTACCAAAGAAGGCCTGGAATTACCGGAAGATGAAGAAGAAAAAAAAAAACAGGAAGAAAAAAAAACCAAATTTGAAAATCTGTGCAAAATTATGAAAGATATTCTGGAAAAAAAAGTGGAAAAAGTGGTGGTCAGCAATCGCCTGGTGACCAGCCCGTGCTGCATTGTGACCAGCACCTACGGCTGGACCGCGAATATGGAACGTATTATGAAAGCGCAGGCCCTGCGCGACAATAGCACCATGGGCTACATGGCCGCGAAAAAACACCTGGAAATTAACCCGGATCACAGCATTATTGAAACCCTGCGTCAGAAAGCGGAAGCGGATAAAAACGATAAATCGGTTAAAGATCTGGTGATTCTGCTGTA
TGAAACCGCGCTGCTGAGCAGCGGCTTTAGCCTGGAAGATCCGCAGACCCATGCGAATCGCATTTATCGCATGATTAAACTGGGACTGGGTATTGATGAAGATGATCCGACCGCGGATGATACCAGTGCGGCGGTTACCGAAGAAATGCCGCCGCTGGAAGGCGATGATGACACCAGCCGCATGGAGGAAGTGGAT

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

R/:Once I have the DNA sequence, I would use a cell-free expression system to produce the protein. In this approach, the DNA is added directly to a reaction mixture that contains all the molecular components required for transcription and translation, such as RNA polymerase, ribosomes, nucleotides, amino acids, tRNAs, and necessary enzymes.

First, the DNA sequence is transcribed into mRNA by RNA polymerase within the reaction mixture. Then, ribosomes bind to the mRNA and translate it into the corresponding amino acid sequence, forming the protein. Because this process occurs in vitro (outside of living cells), it allows rapid protein production without the need for cell transformation, growth, or cloning into a host organism.

Cell-free systems are particularly useful for fast prototyping, testing gene constructs, or producing proteins that may be toxic to living cells. The entire process still follows the central dogma (DNA → RNA → Protein), but it happens in a controlled biochemical environment rather than inside a living cell.

3.5. How does it work in nature/biological systems?

R/:In nature, gene expression follows the central dogma of molecular biology. First, the DNA sequence of a gene is transcribed into messenger RNA (mRNA) by RNA polymerase. In eukaryotic cells, the initial transcript (pre-mRNA) undergoes processing, including 5’ capping, polyadenylation, and splicing to remove introns. Once mature mRNA is formed, it is transported to the cytoplasm, where ribosomes bind to it and translate the nucleotide sequence into an amino acid chain. Transfer RNAs (tRNAs) bring the appropriate amino acids according to codon–anticodon pairing. The resulting polypeptide then folds into its functional three-dimensional structure, sometimes with the help of molecular chaperones. This tightly regulated process ensures that proteins are produced at the right time, in the right amount, and in the appropriate cellular context.

Describe how a single gene codes for multiple proteins at the transcriptional level.

R/:A single gene can produce multiple proteins at the transcriptional level mainly through alternative splicing. In eukaryotic cells, genes are composed of exons and introns. After transcription, the initial RNA transcript (pre-mRNA) contains both regions. During RNA processing, introns are removed, and exons are joined together. However, the cell does not always combine exons in the same way. Different exons can be included or excluded, generating distinct mRNA transcripts from the same gene. These different mRNAs are then translated into different protein isoforms. This mechanism increases protein diversity without increasing the total number of genes in the genome.
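
A toy sketch of the exon inclusion and exclusion described above: given an ordered list of exons and a set of exons that may be skipped, it lists every mRNA that can result. The exon names and sequences are invented for illustration, and `splice_isoforms` is a hypothetical helper, not a model of any real gene.

```python
from itertools import product


def splice_isoforms(exons, optional):
    """Enumerate mRNAs made by keeping or skipping each optional exon.

    exons: ordered list of (name, sequence) pairs.
    optional: set of exon names that may be skipped (cassette exons).
    Returns a list of (isoform_name, mRNA_sequence) pairs.
    """
    choices = [(True, False) if name in optional else (True,)
               for name, _ in exons]
    isoforms = []
    for keep in product(*choices):
        kept = [exon for exon, flag in zip(exons, keep) if flag]
        name = "-".join(n for n, _ in kept)
        sequence = "".join(s for _, s in kept)
        isoforms.append((name, sequence))
    return isoforms


if __name__ == "__main__":
    # Invented three-exon gene in which the middle exon can be skipped.
    toy_gene = [("E1", "AUGGCU"), ("E2", "GAAUUU"), ("E3", "CCGUAA")]
    for name, mrna in splice_isoforms(toy_gene, optional={"E2"}):
        print(f"{name}: {mrna}")
```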

Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein.
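
One way to do this alignment for a short stretch is sketched below, using the first 18 bases of the codon-optimized sequence from section 3.3. It treats the input as the coding strand (so the mRNA is the same sequence with U in place of T) and uses the standard genetic code. The function names are made up for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(coding_dna):
    """Split the coding strand into complete codons, dropping any remainder."""
    dna = coding_dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def transcribe(coding_dna):
    """mRNA with the same sequence as the coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(coding_dna):
    """Translate codon by codon until the first stop codon."""
    protein = []
    for codon in split_codons(coding_dna):
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def aligned_view(coding_dna):
    """Three rows (DNA, RNA, protein) with one column per codon."""
    codons = split_codons(coding_dna)
    rows = [
        "DNA:     " + " ".join(codons),
        "RNA:     " + " ".join(transcribe(c) for c in codons),
        "Protein: " + " ".join(f"{CODON_TABLE[c]:^3}" for c in codons),
    ]
    return "\n".join(rows)


if __name__ == "__main__":
    print(aligned_view("ATGCCGGAAGAAACCCAG"))
```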

Part 4: Prepare a Twist DNA Synthesis Order

4.2. Build Your DNA Insert Sequence
Benchling link: https://benchling.com/s/seq-52XZoIYMGmiV8rSoyWAZ?m=slm-6exDNpn6ZWIEA6nOots7

4.3 to 4.6

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

R/: I would like to sequence DNA from gastrointestinal stromal tumor (GIST) samples, specifically focusing on mutations in the KIT and PDGFRA genes. KIT mutations cause the receptor to be constitutively activated, meaning it continuously signals cell proliferation even without an external signal. This is why sequencing this gene is important: knowing the specific mutation helps select the most effective existing treatment, such as tyrosine kinase inhibitors like imatinib, for each patient. Additionally, since mutations vary between patients, sequencing improves the precision of medicine by allowing more personalized and targeted therapies.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

R/: I would use Illumina sequencing.

Also answer the following questions:

Is your method first-, second- or third-generation or other? How so?

R/: It is second-generation because it uses massively parallel sequencing, reading millions of fragments simultaneously, unlike Sanger sequencing, which reads one fragment at a time.

What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

R/: The input is DNA from GIST samples. The essential steps are: (1) fragmentation, because the DNA is long and Illumina reads short fragments; (2) adapter ligation, so that the instrument can capture and recognize each fragment; and (3) PCR amplification, so that there is enough material to read.

What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

R/: After preparation, the fragments are attached to a flow cell, where cluster generation occurs: each fragment is copied thousands of times to amplify the signal. Then sequencing by synthesis begins: fluorescently labeled nucleotides are incorporated one base per cycle, with each base (A, T, C, G) emitting a different color. A camera captures an image after each incorporation, and the computer translates each color into a base; this is called base calling. The process repeats for each position along the fragment.

What is the output of your chosen sequencing technology?

R/: The output of Illumina sequencing is millions of short reads, sequences of bases (A, T, C, G) that are typically 150-300 base pairs long. These reads are then aligned to a reference genome using bioinformatics tools to identify mutations in the KIT gene, such as substitutions or insertions/deletions, that may be driving GIST.
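
The base calling and mutation-finding steps described above can be illustrated with a very simplified sketch. It assumes one intensity value per color channel per cycle and picks the brightest channel; real instruments also correct for channel cross-talk and other effects, which is not modeled here. The channel order, the intensity values, the toy reference, and the function names are all invented for illustration.

```python
CHANNELS = "ACGT"  # assumed channel order for this toy example


def call_bases(intensities):
    """Call one base per cycle by picking the brightest channel.

    intensities: list of cycles, each a sequence of four signal values
    in CHANNELS order.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


def find_mismatches(read, reference, offset):
    """Compare an already aligned read to the reference.

    Returns (reference_position, reference_base, read_base) for each
    substitution. Insertions and deletions are not handled.
    """
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(read)
        if reference[offset + i] != base
    ]


if __name__ == "__main__":
    cycles = [
        (0.9, 0.1, 0.0, 0.0),  # brightest: A
        (0.1, 0.8, 0.1, 0.0),  # brightest: C
        (0.0, 0.1, 0.9, 0.1),  # brightest: G
        (0.7, 0.1, 0.1, 0.2),  # brightest: A
    ]
    read = call_bases(cycles)
    toy_reference = "TTACGTTT"  # invented sequence, not the real KIT gene
    print(read, find_mismatches(read, toy_reference, offset=2))
```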

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

R/: I would like to synthesize a genetic biosensor circuit designed to detect early increases in HSP90 dependence in KIT-mutant GIST cells. The biosensor would consist of a synthetic DNA sequence that, when inserted into tumor cells, monitors HSP90 activity levels and produces a detectable signal (such as a fluorescent protein) when levels rise above a threshold. This is clinically relevant because increased HSP90 dependence is proposed here as an early indicator of cellular adaptation preceding imatinib resistance, meaning the biosensor could alert clinicians before full resistance develops, allowing earlier treatment adjustments. The specific sequence would include an HSP90-responsive promoter driving a reporter gene, designed to activate transcription in proportion to HSP90 activity levels in KIT-mutant cells.
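
The intended behavior of the reporter (signal that grows with HSP90 activity and a threshold above which it counts as "detected") can be sketched with a Hill-type dose-response curve. This is only a conceptual model: the Hill form, every parameter value, the arbitrary units, and the function names are assumptions chosen for illustration, not measured properties of any real promoter.

```python
def reporter_output(hsp90_activity, k_half=1.0, hill_n=2.0, max_signal=100.0):
    """Hill-type response: signal rises with activity and saturates.

    All parameters are hypothetical and in arbitrary units.
    k_half is the activity that gives half of max_signal.
    """
    if hsp90_activity < 0:
        raise ValueError("activity cannot be negative")
    powered = hsp90_activity ** hill_n
    return max_signal * powered / (k_half ** hill_n + powered)


def exceeds_threshold(hsp90_activity, threshold_signal=50.0, **hill_params):
    """True if the modeled reporter signal reaches the detection threshold."""
    return reporter_output(hsp90_activity, **hill_params) >= threshold_signal


if __name__ == "__main__":
    for activity in (0.0, 0.5, 1.0, 2.0, 4.0):
        signal = reporter_output(activity)
        flag = "detected" if exceeds_threshold(activity) else "below threshold"
        print(f"activity {activity:.1f}: signal {signal:5.1f} ({flag})")
```

A steeper Hill coefficient would make the response more switch-like, while a coefficient of 1 would give a more gradual, closer to proportional response.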

(ii) What technology or technologies would you use to perform this DNA synthesis and why? R/:For synthesizing my biosensor genetic circuit, I would use array-based phosphoramidite synthesis, as offered by Twist Bioscience. This technology is ideal because it allows precise, high-throughput synthesis of specific designed sequences at relatively low cost, without needing a biological template.

Also answer the following questions:

What are the essential steps of your chosen sequencing methods? R/:The essential steps are: (1) the DNA sequence is designed computationally; (2) synthesis occurs on a chip where nucleotides are added one at a time chemically to growing DNA chains; (3) each nucleotide is protected by a chemical group that is removed before the next base is added, allowing controlled sequential addition; (4) the completed sequences are cleaved from the chip and assembled into longer constructs if needed through Gibson assembly or similar methods.

What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability? R/: The main limitation is accuracy: errors accumulate as the sequence gets longer, which makes synthesizing very long sequences challenging. Longer sequences are also more expensive and slower to produce. Array-based synthesis improves scalability, but error rates remain a concern and often require sequence verification after synthesis using Sanger or Illumina sequencing.
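A rough way to see why errors accumulate with length: if each chemical coupling step succeeds with some fixed probability, the fraction of full-length chains falls geometrically with the number of steps. The sketch below assumes a hypothetical 99% per-step coupling efficiency; real values depend on the chemistry and the platform.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of chains that reach full length after (length_nt - 1) couplings.

    coupling_efficiency is an assumed per-step success probability.
    """
    return coupling_efficiency ** (length_nt - 1)

for n in (50, 100, 200, 300):
    print(n, "nt:", round(full_length_fraction(n), 3))
```

This is why long constructs are usually assembled from shorter synthesized pieces and then sequence-verified.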

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

R/: I would want to edit the KIT gene in GIST tumor cells using CRISPR-Cas9 technology. Specifically, I would correct the activating mutations in KIT that cause constitutive activation of the tyrosine kinase receptor, driving uncontrolled cell proliferation. By editing the mutated sequence back to its wild-type form, the receptor would only signal when appropriate, stopping uncontrolled tumor growth at its source rather than just blocking it pharmacologically as imatinib does. This approach is particularly compelling because imatinib resistance is a major clinical challenge in GIST: patients eventually develop secondary mutations that render the drug ineffective. A CRISPR-based correction of the primary KIT mutation could offer a more permanent solution, potentially eliminating the tumor's ability to develop resistance. Additionally, since different patients carry different KIT mutations, CRISPR could be personalized to target each patient's specific mutation, aligning with the precision medicine approach.

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps?

R/:I would use CRISPR-Cas9 to correct the activating KIT mutations in GIST tumor cells, as it allows precise, targeted editing of specific DNA sequences and can be personalized for each patient’s mutation. CRISPR-Cas9 edits DNA through the following steps: (1) the guide RNA (gRNA) directs the Cas9 protein to the specific mutated KIT sequence; (2) Cas9 creates a double-strand break at that exact location; (3) a correct DNA template is provided and the cell repairs the break using homology-directed repair (HDR), replacing the mutated sequence with the corrected wildtype sequence.

What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

R/: The preparation steps include: (1) computationally designing a gRNA that matches the patient's specific KIT mutation; (2) preparing the Cas9 protein or encoding it in a plasmid; (3) preparing an HDR template containing the correct wild-type KIT sequence; (4) delivering all components into the tumor cells via a viral vector or nanoparticles. The inputs are therefore the gRNA, the Cas9 enzyme, the HDR DNA template, and the target tumor cells.
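The computational gRNA design step can be sketched as a search for candidate target sites. The sketch below assumes SpCas9, which requires an NGG PAM immediately 3' of a 20-nt protospacer, and scans only the given strand. The input sequence is a made-up placeholder, not the real KIT sequence, and a real design tool would also scan the reverse strand and score off-target risk.

```python
import re

def find_guides(seq, guide_len=20):
    """Return (start, protospacer, pam) for every NGG PAM on the given strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Hypothetical placeholder sequence, for illustration only.
toy_target = "ACGT" * 5 + "AGG" + "TTTT"
for start, protospacer, pam in find_guides(toy_target):
    print(start, protospacer, pam)
```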

What are the limitations of your editing methods (if any) in terms of efficiency or precision?

R/: The main limitations are: (1) off-target edits: Cas9 may cut unintended genomic locations, causing unwanted mutations; (2) low HDR efficiency, especially in non-dividing cells; (3) delivery challenges: getting the CRISPR machinery efficiently into tumor cells in vivo remains technically difficult; (4) since KIT mutations vary between patients, a new gRNA must be designed for each case, making large-scale application complex.

Week 3 HW: Lab Automation

Generate an artistic design using the GUI at opentrons-art.rcdonovan.com.

Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons. You may use AI assistance for this coding — Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept. If you’re a proficient programmer and you’d rather code something mathematical or algorithmic instead of using your GUI coordinates, you may do that instead.

Google colab link: https://colab.research.google.com/drive/1CuzPdG5pSYIgSalc1o3C1FA-yUbnaW0_?usp=sharing

Post-Lab Questions

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

paper: GPCR signaling measurement and drug profiling with an automated live-cell microscopy system

pdf: https://pmc.ncbi.nlm.nih.gov/articles/PMC9994309/pdf/nihms-1878342.pdf

doi: https://doi.org/10.1021/acssensors.2c01341

This paper addresses a key limitation in GPCR signaling measurement: typically, signals can only be measured one at a time and at a single time point, making it impossible to observe multiple signaling pathways or their dynamics over time. The authors used the Opentrons to automate drug delivery and imaging, allowing them to measure various signaling pathways simultaneously and track their changes over time rather than only capturing a final readout. This automation enabled the discovery that GPR68 cAMP responses are pH-dependent in a kinetic way, and that Ogerin has an unexpected off-target effect at high concentrations through phosphodiesterase inhibition.

Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate. This is due at the start of lecture and does not need to be tested on the Opentrons yet.

For my final project, I would automate the process of exposing GIST cells to different imatinib concentrations to observe how HSP90 dependence changes in response to the drug. I would automate this part because it requires testing many concentrations repeatedly, and automation would reduce pipetting errors that could compromise the dilution series while also saving time. Additionally, I would automate the biosensor signal readout across all wells, allowing me to track changes in HSP90 activity over time rather than capturing only a single final measurement.
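As a starting point for the automation, the dilution series itself can be planned in plain Python before any robot commands are written. This is a minimal sketch: the top concentration, fold change, number of wells, and volumes are hypothetical placeholders, and the resulting volumes would still need to be translated into Opentrons transfer steps.

```python
def plan_serial_dilution(top_conc_um, fold, n_wells, mix_volume_ul):
    """Return one dict per well: its concentration and the volumes to pipette."""
    transfer_ul = mix_volume_ul / fold        # carried over from the previous well
    diluent_ul = mix_volume_ul - transfer_ul  # medium pre-loaded in each diluted well
    plan = []
    conc = top_conc_um
    for i in range(n_wells):
        plan.append({
            "well": i + 1,
            "conc_um": conc,
            # Well 1 holds the undiluted top concentration.
            "diluent_ul": 0.0 if i == 0 else diluent_ul,
            "transfer_in_ul": 0.0 if i == 0 else transfer_ul,
        })
        conc /= fold
    return plan

# Hypothetical example: 10 uM top dose, 2-fold steps, 8 wells, 200 uL mixes.
for step in plan_serial_dilution(10.0, 2, 8, 200.0):
    print(step)
```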

Week 4 HW: Protein design part I

Part A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

R/: Meat is approximately 20% protein by mass, so 500g of meat contains roughly 100g of protein. Since the average amino acid has a molecular weight of ~100 Daltons (100 g/mol), 100g of protein equals 1 mole of amino acids. Multiplying by Avogadro’s number gives approximately 6 × 10²³ molecules = 600 sextillion amino acid molecules in a single piece of meat.
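The arithmetic above can be checked in a few lines. The 20% protein fraction and the 100 g/mol average residue mass are the same approximations used in the answer.

```python
AVOGADRO = 6.022e23          # molecules per mole

meat_g = 500.0
protein_fraction = 0.20      # approximate protein content of meat by mass
residue_mass_g_per_mol = 100.0

protein_g = meat_g * protein_fraction
moles = protein_g / residue_mass_g_per_mol
molecules = moles * AVOGADRO

print(f"{protein_g:.0f} g protein, {moles:.1f} mol, {molecules:.2e} amino acids")
```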

Why do humans eat beef but do not become a cow, eat fish but do not become fish?

R/: When we digest meat, our digestive enzymes (pepsin, trypsin, chymotrypsin) break down foreign proteins into their individual amino acid building blocks. These amino acids are chemically identical regardless of their source: glycine from a cow is the same as glycine from a fish. Once absorbed into the bloodstream, our own DNA acts as the sole blueprint, directing cells to reassemble those amino acids into specifically human proteins. The organism's identity is determined by its genetic instructions, not by the raw materials it consumes.

Why are there only 20 natural amino acids?

R/: The 20 natural amino acids represent an evolutionary solution that balances three factors: chemical diversity (they collectively cover all necessary properties: charged, neutral, hydrophobic, hydrophilic, rigid, flexible), genetic coding capacity (the 4-letter DNA alphabet produces 64 possible codons, providing enough redundancy to encode 20 amino acids reliably), and evolutionary stability (the genetic code was fixed ~3.5 billion years ago in the earliest cells, and changing it would be catastrophic for all life). Two exceptions, selenocysteine and pyrrolysine, exist in rare organisms, confirming that expansion is possible but extremely costly, having occurred only twice in nearly 4 billion years of evolution.

Can you make other non-natural amino acids? Design some new amino acids.

Where did amino acids come from before enzymes that make them, and before life started?

R/: Amino acids originated through abiotic (non-biological) chemistry on early Earth. The landmark Miller-Urey experiment (1953) demonstrated that passing electricity through a mixture of simple gases (methane, ammonia, hydrogen, and water vapor), mimicking early Earth's atmosphere and lightning, spontaneously produces amino acids. Additionally, amino acids have been discovered in carbonaceous meteorites such as the Murchison meteorite, indicating that they form naturally elsewhere in the universe via basic organic chemistry. Hydrothermal vents on the ocean floor also provide conditions where amino acids can form abiotically through mineral-catalyzed reactions. This means amino acids predate life entirely: they are a natural product of carbon chemistry under energetic conditions, and early life simply inherited and later optimized their production through enzymes.

If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

R/: A D-amino acid α-helix would be left-handed. Normal α-helices built from L-amino acids are right-handed because the geometry of L-amino acids favors phi/psi backbone angles that produce right-handed coiling. D-amino acids are the exact mirror image of L-amino acids, so their backbone geometry favors the mirror-image conformation: a left-handed helix. The structure is equally stable and retains the same hydrogen bonding pattern, but is the complete mirror image of a natural α-helix. This has been confirmed experimentally with synthetic D-peptides.

Can you discover additional helices in proteins?

Why are most molecular helices right-handed?

R/: Most biological helices are right-handed because life universally uses L-amino acids, whose backbone bond angles (phi/psi angles) favor right-handed coiling as the lowest-energy conformation. This is a direct consequence of homochirality: life's ancient, universal selection of L-amino acids over D-amino acids, which became permanently fixed early in evolution. The Ramachandran plot confirms that right-handed helical conformations occupy the most energetically favorable region for L-amino acid backbone geometry. The deeper origin of L-amino acid selection may involve asymmetric radiation from neutron stars, preferential adsorption on chiral mineral surfaces, or random chance reinforced by self-replication.

Why do β-sheets tend to aggregate?

R/: β-sheets aggregate due to two primary forces. First, the edge strands of β-sheets have unsatisfied hydrogen bond donors and acceptors along their backbone, which seek to form additional hydrogen bonds with the edges of neighboring β-sheets. Second, β-sheets frequently present hydrophobic surfaces on their faces; the hydrophobic effect drives these surfaces together to minimize unfavorable interactions with surrounding water molecules. These two forces, combined with the geometric flatness and complementarity of β-sheets, create highly stable aggregates. Aggregation follows nucleation-dependent kinetics (slow initial seed formation followed by rapid, self-propagating growth), which explains the progressive nature of amyloid diseases.

What is the driving force for β-sheet aggregation?

Why do many amyloid diseases form β-sheets?

R/: Amyloid diseases are characterized by β-sheet formation because β-sheets represent an extremely thermodynamically stable protein conformation. In normal protein folding, hydrophobic regions are buried inside the structure; under stress conditions (aging, mutations, pH changes), proteins partially unfold and expose their hydrophobic cores. These exposed regions spontaneously reorganize into β-sheets, as hydrogen bonds form along the protein backbone regardless of the specific amino acid sequence. Critically, β-sheets have flat, complementary surfaces that stack onto each other, allowing one misfolded protein to act as a template that recruits and converts neighboring proteins into the same conformation, a self-propagating cascade. The resulting fibers are insoluble and resistant to cellular degradation machinery, causing progressive accumulation and cell death. This mechanism underlies Alzheimer's disease (amyloid-β and tau), Parkinson's disease (α-synuclein), and prion diseases, among others.

Can you use amyloid β-sheets as materials?

R/: Yes, amyloid β-sheet fibers are increasingly recognized as valuable nanomaterials. Their properties, such as exceptional mechanical strength, chemical stability, nanoscale regularity, and spontaneous self-assembly, make them attractive for multiple applications. Current research includes their use as nanoscaffolds for assembling metal nanoparticles into nanowires, drug delivery vehicles that protect and release therapeutics under controlled conditions, food-grade thickeners and emulsifiers derived from whey protein amyloids, and living materials through engineered bacterial curli fibers. Spider silk's remarkable toughness is partially attributed to embedded β-sheet nanocrystals, inspiring synthetic fiber design. The central challenge is controlling assembly precisely, that is, distinguishing between pathological uncontrolled aggregation and useful directed self-assembly.

Design a β-sheet motif that forms a well-ordered structure.

Part B: Protein Analysis and Visualization

Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

Briefly describe the protein you selected and why you selected it.

Identify the amino acid sequence of your protein.

How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

Does your protein belong to any protein family?

Identify the structure page of your protein in RCSB

When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)

Are there any other molecules in the solved structure apart from protein?

Does your protein belong to any structure classification family?

Open the structure of your protein in any 3D molecule visualization software:

PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)

Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

Color the protein by secondary structure. Does it have more helices or sheets?

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Week 5 HW: Protein design part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

Original sequence:

>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

A4V mutation:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
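The substitution above can be reproduced programmatically, which helps avoid off-by-one mistakes. The sketch assumes that "A4V" is numbered on the mature protein, so the initiator methionine is not counted; the helper name and its `leading_residues` parameter are illustrative, not part of any library.

```python
WILD_TYPE = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

def apply_mutation(seq, mutation, leading_residues=1):
    """Apply a substitution such as 'A4V'.

    leading_residues is the number of N-terminal residues not counted in the
    mutation numbering (assumed here: 1, the initiator Met).
    """
    wild, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1 + leading_residues
    if seq[index] != wild:
        raise ValueError(f"expected {wild} at position {position}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]

mutant = apply_mutation(WILD_TYPE, "A4V")
print(mutant[:10])  # MATKVVCVLK
```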

Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card: https://colab.research.google.com/drive/1tOhdz-ZO91A0u--wk65WY69CSMcL0e70?usp=sharing
Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
Record the perplexity scores that indicate PepMLM’s confidence in the binders.

     Binder        Pseudo Perplexity
0    WHYPPVGAEHKE  19.688582
1    WRYPATAARWGX  7.422996
2    WRYPVVAAELWX  12.680934
3    WHYYVVGVAWKX  17.178792
     FLYRWLPSRRGG  (known binder, added for comparison)

Part 2: Evaluate Binders with AlphaFold3

https://alphafoldserver.com/fold/79f46573a7da07d6

Navigate to the AlphaFold Server: alphafoldserver.com
For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

For WHYPPVGAEHKE peptide:

ipTM = 0.28
pTM = 0.83

For WRYPATAARWGX peptide:

ipTM = 0.42
pTM = 0.87

For WRYPVVAAELWX peptide:

ipTM = 0.3
pTM = 0.83

For WHYYVVGVAWKX peptide:

ipTM = 0.42
pTM = 0.81

For FLYRWLPSRRGG peptide:

ipTM = 0.31
pTM = 0.83
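To compare the scores above at a glance, the values can be collected in a small table and checked against the known binder. This sketch only reorganizes the numbers reported above; it does not call AlphaFold.

```python
# (ipTM, pTM) per peptide, copied from the results listed above.
SCORES = {
    "WHYPPVGAEHKE": (0.28, 0.83),
    "WRYPATAARWGX": (0.42, 0.87),
    "WRYPVVAAELWX": (0.30, 0.83),
    "WHYYVVGVAWKX": (0.42, 0.81),
    "FLYRWLPSRRGG": (0.31, 0.83),
}
KNOWN_BINDER = "FLYRWLPSRRGG"

known_iptm = SCORES[KNOWN_BINDER][0]
ranked = sorted(SCORES, key=lambda p: SCORES[p][0], reverse=True)
better_than_known = [
    p for p in ranked if p != KNOWN_BINDER and SCORES[p][0] > known_iptm
]

for peptide in ranked:
    iptm, ptm = SCORES[peptide]
    print(f"{peptide}  ipTM={iptm:.2f}  pTM={ptm:.2f}")
print("Higher ipTM than the known binder:", better_than_known)
```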

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

https://huggingface.co/spaces/ChatterjeeLab/PeptiVerse

For each PepMLM-generated peptide:

Paste the peptide sequence.
Paste the A4V mutant SOD1 sequence in the target field.
Check the boxes
- Predicted binding affinity
- Solubility
- Hemolysis probability
- Net charge (pH 7)
- Molecular weight

Across the five peptides evaluated, ipTM scores ranged from 0.28 to 0.42, all below the 0.5 threshold typically considered indicative of a confident interface prediction, suggesting that none of the peptides forms a confidently predicted complex with A4V mutant SOD1. Interestingly, the two peptides with the highest ipTM scores (WRYPATAARWGG and WHYYVVGVAWKG, both 0.42) did not show the strongest predicted binding affinity in PeptiVerse. Instead, WRYPVVAAELWG had the highest affinity (pKd 6.761) despite a lower ipTM of 0.30, indicating that structural confidence and predicted affinity do not perfectly correlate in this case. None of the peptides was predicted to be hemolytic or poorly soluble, so none is ruled out on these preliminary safety criteria. The known binder FLYRWLPSRRGG performed modestly, with an ipTM of 0.31 and an affinity of 5.968, and was outperformed by several PepMLM-generated candidates. The peptide that best balances structural confidence and therapeutic properties is WHYYVVGVAWKG, with the joint-highest ipTM (0.42), strong predicted affinity (6.752), excellent solubility, and very low hemolysis probability (0.068). I would advance WHYYVVGVAWKG for further development based on this combination of predicted structural and therapeutic properties.

Part 4: Generate Optimized Peptides with moPPIt

Due to GPU memory constraints with the available T4 runtime (moPPIt requires an A100 or L4 GPU), the notebook encountered errors during execution. However, based on the moPPIt framework, the key difference between moPPIt-generated peptides and PepMLM-generated peptides is the level of design control. While PepMLM samples peptides conditioned only on the target sequence, moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to simultaneously optimize binding affinity, solubility, and hemolysis while steering the peptide toward specific residues, in this case residues 1-7 near the A4V mutation site. This means moPPIt peptides would be expected to show more consistent engagement with the N-terminal region of SOD1, potentially with better therapeutic profiles overall.

Before advancing any moPPIt-generated peptide to clinical studies, I would evaluate them through the following steps: (1) run PeptiVerse to assess binding affinity, solubility, and hemolysis, as done for the PepMLM peptides; (2) submit to AlphaFold3 to confirm structural engagement with residues 1-7; (3) perform in vitro binding assays such as surface plasmon resonance (SPR) to measure actual Kd values; (4) test cytotoxicity in neuronal cell lines; and (5) assess proteolytic stability, since short peptides are rapidly degraded in vivo.

Part C: Final Project: L-Protein Mutants

Stage 1: Engineer novel L-protein mutants using protein design tools
Stage 2: Synthesize the L-protein mutant gene via Twist
Stage 3: Clone the L-protein mutant gene into a plasmid using Gibson Assembly
Stage 4: Test the L-protein mutant's structural integrity using the Nuclera system
Stage 5: Test the L-protein in E. coli with plaque assays

Week 6 HW: Genetic Circuits part I

Assignment: DNA Assembly

Answer these questions about the protocol in this week’s lab:

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

R/The Phusion HF PCR Master Mix contains several key components. The Phusion DNA Polymerase is a high-fidelity polymerase with 3’→5’ exonuclease (proofreading) activity, which significantly reduces errors during amplification. dNTPs (dATP, dTTP, dCTP, dGTP) serve as the building blocks for the new DNA strand. The HF Buffer stabilizes pH and ionic conditions optimal for standard templates, while the GC Buffer is used for GC-rich sequences. MgCl₂ acts as an essential cofactor for polymerase activity. Additional stabilizers and additives improve enzyme stability and overall reaction efficiency.

What are some factors that determine primer annealing temperature during PCR?

R/Several factors influence the annealing temperature of primers during PCR. The GC content of the primer is a major factor, since G≡C base pairs have three hydrogen bonds (stronger than A=T), so higher GC content raises the melting temperature (Tm). Primer length also matters — longer primers have a higher Tm. The standard rule is to set the annealing temperature approximately 5°C below the lowest Tm of the two primers. Salt concentration in the reaction buffer stabilizes the DNA duplex and raises Tm. Mismatches between the primer and template lower the Tm. Finally, design tools such as NEB’s Tm Calculator or Primer3 account for all these variables and help determine the optimal annealing temperature.
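The effect of GC content and length on Tm can be illustrated with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. This is only a rough estimate for short oligos: it ignores salt concentration and mismatches, and it is not the algorithm used by NEB's Tm Calculator or Primer3. The primer sequences below are hypothetical.

```python
def wallace_tm(primer):
    """Rough Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def annealing_temp(forward, reverse, offset=5):
    """Annealing temperature: `offset` degrees below the lower of the two Tm values."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset

# Hypothetical primers, for illustration only.
fwd = "ATGCATGCATGCATGCAT"
rev = "GGCCGGATCCGGAATTCC"
print(wallace_tm(fwd), wallace_tm(rev), annealing_temp(fwd, rev))
```

For a real reaction, the annealing temperature should come from the calculator recommended for the specific polymerase.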

There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

R/Both PCR and restriction enzyme (RE) digestion generate linear DNA fragments, but they differ significantly in mechanism and application. PCR amplifies a target sequence through repeated cycles of denaturation, annealing, and extension in a thermocycler, using two primers that define the fragment’s ends. Restriction enzyme digestion, by contrast, incubates DNA with a sequence-specific enzyme (typically at 37°C) that cuts at defined recognition sites. A key advantage of PCR is flexibility: the primers can be designed to add any desired overhang sequences to the ends of the fragment, which is especially useful for Gibson Assembly. Restriction digestion is limited to cutting wherever recognition sites naturally exist in the sequence. PCR is preferable when you need to amplify a fragment from a template and simultaneously engineer specific end sequences. RE digestion is preferable when the fragment already exists in a plasmid with convenient restriction sites, or when you want to avoid the small but real risk of polymerase-introduced mutations. PCR requires minimal starting material, while RE digestion requires sufficient amounts of purified DNA containing the target sequence.

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

R/Gibson Assembly requires that adjacent fragments share homologous overlaps of 15–40 bp at their ends. For PCR fragments, primers must be designed with 5’ extensions that are homologous to the adjacent fragment or vector. These extensions do not anneal to the template during PCR but are incorporated into the final product, creating the necessary overlaps. For restriction enzyme-digested fragments, the ends must be verified to share sequence with the neighboring fragment, or overlaps can be added through a subsequent PCR step. Before performing the experiment, an in silico simulation using tools like Benchling or SnapGene should be done to confirm that all overlaps are correct and in the right orientation. Gel electrophoresis of all fragments prior to assembly confirms they are the correct size and free of contamination.
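The overlap check described above can also be done with a short script before opening Benchling or SnapGene. This is a minimal sketch that assumes both fragments are written 5'→3' on the same strand; the fragment sequences are invented placeholders.

```python
def shared_overlap(upstream, downstream, min_len=15, max_len=40):
    """Length of the longest upstream suffix that equals a downstream prefix.

    Only overlaps between min_len and max_len bases count; returns 0 otherwise.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for n in range(longest, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0

# Hypothetical fragments sharing a 20-bp homology region.
homology = "GATTACAGATTACACCGGTT"
fragment_a = "AAAA" + homology
fragment_b = homology + "CCCC"
print(shared_overlap(fragment_a, fragment_b))  # 20
```

For a circular product, the last fragment also needs to be checked against the first.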

How does the plasmid DNA enter the E. coli cells during transformation?

R/The most common method used in the lab is chemical transformation. First, cells are made competent by treatment with ice-cold CaCl₂, which destabilizes the cell membrane and allows it to interact with DNA. The plasmid is then mixed with the competent cells on ice. A brief heat shock at 42°C for approximately 30–45 seconds causes a sudden expansion of the membrane, creating transient pores through which the plasmid DNA can enter the cell. The exact molecular mechanism is not fully understood, but the CaCl₂ treatment is thought to neutralize the negative charges on both the DNA and the membrane, facilitating their interaction. An alternative method, electroporation, uses a high-voltage electrical pulse to physically create pores in the membrane, allowing DNA entry. After transformation, cells are allowed to recover in rich media before being plated on selective antibiotic plates.

Describe another assembly method in detail (such as Golden Gate Assembly) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

R/Golden Gate Assembly is a DNA assembly method that uses Type IIS restriction enzymes, such as BsaI, which cut downstream of their recognition sequence rather than within it. This property allows researchers to design any desired 4-nucleotide overhang at the cut site, independent of the enzyme’s recognition sequence. Each DNA fragment is designed so that the Type IIS recognition sites flank the insert and are oriented to cut outward, removing the recognition site from the final product. When the enzyme cuts, it generates unique 4-nt overhangs on each fragment that are complementary only to the correct neighboring fragment, ensuring directional and specific assembly. A DNA ligase (T4 DNA Ligase) then seals the fragments together. The entire reaction — digestion and ligation — can be performed simultaneously in a single tube by thermocycling between 37°C (digestion) and 16°C (ligation), allowing incorrect assemblies to be re-cut and re-ligated until the correct product accumulates. Because the recognition sites are eliminated from the final product, Golden Gate Assembly is scarless, and it can assemble up to 20 or more fragments in a single reaction, making it highly efficient for building complex genetic constructs.
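The overhang logic can be sketched in code. BsaI recognizes GGTCTC and cuts the top strand one nucleotide downstream of the site, leaving a 4-nt overhang made of the next four bases. The sketch below handles only sites written on the forward strand, and the example sequence is a made-up placeholder.

```python
BSAI_SITE = "GGTCTC"

def bsai_overhangs(seq):
    """4-nt overhangs from forward-strand BsaI sites (GGTCTC N^NNNN)."""
    seq = seq.upper()
    overhangs = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + 1  # top strand is cut 1 nt past the site
        if cut + 4 <= len(seq):
            overhangs.append(seq[cut:cut + 4])
        start = seq.find(BSAI_SITE, start + 1)
    return overhangs

def overhangs_are_unique(overhangs):
    """Directional assembly needs every junction overhang to be distinct."""
    return len(set(overhangs)) == len(overhangs)

# Hypothetical part: spacer, BsaI site, one spacer base, then the overhang AATG.
example = "TTGGTCTCAAATGCCC"
print(bsai_overhangs(example))  # ['AATG']
```

A fuller check would also reject palindromic overhangs and overhangs that are reverse complements of one another.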

Model this assembly method with Benchling or Asimov Kernel!

Assignment: Asimov Kernel

Create a Repository for your work

Create a blank Notebook entry to document the homework and save it to that Repository

Explore the devices in the Bacterial Demos Repo to understand how the parts work together by running the Simulator on various examples, following the instructions for the simulator found in the “Info” panel (click the “i” icon on the right to open the Info panel)

Create a blank Construct and save it to your Repository

Recreate the Repressilator in that empty Construct by using parts from the Characterized Bacterial Parts repository

Search the parts using the Search function in the right menu

Drag and drop the parts into the Construct

Confirm it works as expected by running the Simulator (“play” button) and compare your results with the Repressilator Construct found in the Bacterial Demos repository

Document all of this work in your Notebook entry - you can copy the glyph image and the simulator graphs, and paste them into your Notebook

Build three of your own Constructs using the parts in the Characterized Bacterials Parts Repo

Explain in the Notebook Entry how you think each of the Constructs should function

Run the simulator and share your results in the Notebook Entry

If the results don’t match your expectations, speculate on why and see if you can adjust the simulator settings to get the expected outcome

Week 7 HW: Genetic Circuits Part II

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. Advantages of IANNs over Traditional Boolean Genetic Circuits

Continuous Input/Output Behavior

Traditional genetic circuits operate as Boolean functions: a gene is either ON or OFF (1 or 0). IANNs instead process continuous values, meaning inputs and outputs can exist across a full range (such as protein concentrations or expression levels). This is much closer to how biological systems actually work, where gene expression is rarely all-or-nothing but rather graded and proportional to stimuli.

Non-Linear Computation

IANNs can compute non-linear functions of their inputs, meaning they can capture complex, threshold-based, and context-dependent behaviors. For example, a cell differentiating in response to a morphogen gradient doesn't just respond to "present/absent": it responds proportionally and with threshold effects that require non-linear processing.

Weighted Multi-Input Integration

IANNs can simultaneously integrate multiple inputs with different weights, meaning some signals contribute more than others to the final output. Boolean circuits would require an exponentially growing number of logic gates to approximate this, making them impractical at scale.

Scalability and Efficiency

As the number of inputs grows, Boolean circuits become combinatorially complex. IANNs handle additional inputs more elegantly because the same architecture naturally accommodates new weighted signals without redesigning the entire circuit.

2. Application: Assessing Metastatic Potential of a Localized Tumor

An IANN could be engineered into cells to continuously monitor the expression levels of multiple biomarkers associated with metastatic progression, such as matrix metalloproteinases (MMPs), epithelial-mesenchymal transition (EMT) markers like E-cadherin and vimentin, and hypoxia-inducible factors (HIFs).

Input Behavior

The inputs would be the continuous concentration levels of these biomarkers inside or around the tumor cell. Each input would carry a different weight depending on its known contribution to metastatic risk. For example, high MMP expression might be weighted more heavily than moderate HIF levels.

Output Behavior

The output would be a graded, continuous signal such as the expression level of a fluorescent reporter protein proportional to the calculated metastatic risk. A low fluorescence signal would indicate low risk, while high fluorescence would indicate high metastatic potential, allowing spatial visualization of the tumor.
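The weighted integration described above can be sketched as a single artificial neuron. This is only an illustration: the weights, the bias, the sigmoid steepness, and the normalized 0 to 1 input scale are hypothetical values chosen for the example, not measured parameters. The negative weight on E-cadherin is an assumption reflecting that this marker is typically lost during EMT.

```python
import math

# Hypothetical weights and bias; real values would have to be fitted to data.
WEIGHTS = {"MMP": 0.5, "vimentin": 0.3, "HIF": 0.2, "E_cadherin": -0.4}
BIAS = -0.3
STEEPNESS = 8.0  # hypothetical; controls how sharp the threshold is


def metastatic_risk_signal(levels):
    """Map normalized biomarker levels (0 to 1) to a graded reporter output (0 to 1)."""
    weighted_sum = sum(WEIGHTS[name] * levels[name] for name in WEIGHTS) + BIAS
    # Sigmoid activation: a smooth, non-linear threshold on the weighted sum.
    return 1.0 / (1.0 + math.exp(-STEEPNESS * weighted_sum))


low = metastatic_risk_signal(
    {"MMP": 0.1, "vimentin": 0.1, "HIF": 0.2, "E_cadherin": 0.9}
)
high = metastatic_risk_signal(
    {"MMP": 0.9, "vimentin": 0.8, "HIF": 0.6, "E_cadherin": 0.1}
)
print(f"low-risk profile:  {low:.3f}")
print(f"high-risk profile: {high:.3f}")
```

The output stays between 0 and 1 and changes smoothly with the inputs, which is the graded behavior a fluorescent reporter would be expected to show.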

Limitations

Biological noise: natural fluctuations in gene expression inside cells could produce false positives or inconsistent readings
Delivery: engineering and delivering the IANN circuit into tumor cells in vivo remains technically challenging
Crosstalk: circuit components could interfere with endogenous cellular pathways, affecting accuracy
Response time: protein concentration changes are slow compared to electronic systems, potentially delaying the output signal
Stability: the engineered circuit may be lost or silenced over cell divisions due to epigenetic changes or plasmid dilution

3. Multilayer Perceptron Diagram

Diagram: An intracellular multilayer perceptron where Layer 1 outputs an endoribonuclease (e.g., Csy4) that regulates a fluorescent protein output in Layer 2.

Layer 1: X1, X2 → Tx → mRNA → Tl → Csy4 (endoribonuclease)
Layer 2: Csy4 → regulates mRNA of fluorescent protein → Tl → Fluorescent Protein Output
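A minimal numerical sketch of this two-layer arrangement follows. It assumes that Csy4 represses the reporter by cleaving its mRNA, and it models both layers with Hill functions; the weights, thresholds, and Hill coefficients are hypothetical and only illustrate the signal flow.

```python
def hill_activation(x, k=0.5, n=2.0):
    """Fraction of maximal expression induced by an activating input x >= 0."""
    return x**n / (k**n + x**n)


def hill_repression(x, k=0.5, n=2.0):
    """Fraction of maximal expression remaining under a repressing input x >= 0."""
    return k**n / (k**n + x**n)


def layer1_csy4(x1, x2, w1=0.6, w2=0.4):
    """Layer 1: weighted inputs X1 and X2 drive Csy4 production (hypothetical weights)."""
    return hill_activation(w1 * x1 + w2 * x2)


def layer2_fluorescence(csy4):
    """Layer 2: Csy4 cleaves the reporter mRNA, lowering the fluorescent output."""
    return hill_repression(csy4, k=0.3)


for x1, x2 in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
    csy4 = layer1_csy4(x1, x2)
    print(f"X1={x1} X2={x2} -> Csy4={csy4:.2f} -> fluorescence={layer2_fluorescence(csy4):.2f}")
```

With these assumed parameters the reporter is fully on when both inputs are absent and drops as the weighted input grows, so the second layer inverts the first.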

Assignment Part 2: Fungal Materials

1. Existing Fungal Materials and Their Uses

Some examples of fungal materials include Ecovative, used for packaging and construction, and Mylo, a leather made from mycelium used in clothing and accessories.

Advantages over Traditional Materials

Biodegradable: they do not pollute the environment
Can grow into any desired shape
Produced from cheap agricultural waste such as straw or rice husks

Disadvantages over Traditional Materials

Sensitive to moisture
Lower mechanical resistance than traditional counterparts such as plastic or animal leather
Large-scale production cost remains high

2. Genetic Engineering of Fungi and Advantages over Bacteria

Fungi could be genetically engineered to improve their ability to degrade plastic more efficiently, helping reduce environmental contamination. They could also be engineered to produce complex medicines or to make their mycelium materials more resistant to moisture.

Advantages of Synthetic Biology in Fungi vs. Bacteria

Fungi are eukaryotic organisms: they have a nucleus and are more complex cells, similar to human cells, which allows them to produce more complex proteins correctly
Fungi can grow as 3D structures through their mycelium network, which bacteria cannot, making them better suited for material production and large-scale industrial applications

Assignment Part 3: First DNA Twist Order

Review Part 3: DNA Design Challenge of the week 2 homework. Design at least 1 insert sequence and place it into the Benchling/Kernel/Other folder you shared in the Google Form above. Document the backbone vector it will be synthesized in on your website.

Insert Design: HSP90-GFP Fusion Biosensor

The insert sequence consists of a fusion protein between the chaperone HSP90 and Green Fluorescent Protein (GFP). To create this fusion, the HSP90 stop codon was removed and replaced with a short linker sequence, followed by the GFP sequence without its start codon, creating a continuous fusion protein where GFP fluorescence directly indicates HSP90 presence and expression levels.
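The construction steps described above (drop the upstream stop codon, add a linker, drop the downstream start codon) can be sketched in a few lines. The sequences below are short toy stand-ins, not the real HSP90 or GFP coding sequences, and the Gly-Gly-Gly-Gly-Ser linker is an assumed example; the function name `build_fusion` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_fusion(upstream_orf, linker, downstream_orf):
    """Join two ORFs in frame: strip the upstream stop and the downstream start codon."""
    upstream_orf = upstream_orf.upper()
    linker = linker.upper()
    downstream_orf = downstream_orf.upper()
    for name, seq in (("upstream", upstream_orf), ("linker", linker), ("downstream", downstream_orf)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} sequence length is not a multiple of 3")
    if upstream_orf[-3:] in STOP_CODONS:
        upstream_orf = upstream_orf[:-3]      # remove the stop codon
    if downstream_orf.startswith("ATG"):
        downstream_orf = downstream_orf[3:]   # remove the start codon
    fusion = upstream_orf + linker + downstream_orf
    codons = [fusion[i:i + 3] for i in range(0, len(fusion), 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature stop codon inside the fusion ORF")
    return fusion


# Toy sequences for illustration only (not the real genes).
toy_hsp90 = "ATGCCTGAAGAATAA"    # M P E E stop
toy_gfp = "ATGGTGAGCAAGGGCTAA"   # M V S K G stop
toy_linker = "GGTGGCGGTGGCTCT"   # G G G G S

fusion = build_fusion(toy_hsp90, toy_linker, toy_gfp)
print(fusion)
```

The internal stop-codon check is a quick way to confirm that the linker keeps the second protein in frame before ordering the insert.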

Biological purpose: HSP90 stabilizes oncogenic proteins such as BCR-ABL. When cancer cells develop resistance to imatinib, they frequently overexpress HSP90 to protect mutated BCR-ABL. This biosensor allows real-time visualization of HSP90 levels: the brighter the fluorescence, the higher the HSP90 expression and the higher the likelihood of emerging imatinib resistance.

Backbone vector: The insert was placed into the pET-28a(+) backbone vector within the Multiple Cloning Site (MCS). This vector provides a T7 promoter for strong expression, a kanamycin resistance marker for bacterial selection, and a His-tag for protein purification.

https://benchling.com/majobolivar/f_/7qelUF55UQ-week-7-dna-order/

Week 9 HW: Cell Free Systems

🧬 General Homework Questions

Cell-Free Protein Synthesis

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables.

R/: The main advantage of cell-free protein synthesis over traditional in vivo methods is the greater flexibility and control over experimental variables. In living cells, there are many uncontrollable factors: the cell may degrade the target protein through proteolysis, compete for resources, or respond unpredictably to foreign sequences. Cell-free systems eliminate the cell itself as a variable, allowing direct manipulation of reaction conditions such as temperature, pH, and component concentrations.

Name at least two cases where cell-free expression is more beneficial than cell production.

R/: Two cases where cell-free expression is more beneficial than cell production are the following. First, the production of toxic proteins: proteins that would kill a living host cell can be safely expressed in a cell-free system since there is no living organism to harm. Second, the production of complex human proteins: a cell-free system derived from human or other eukaryotic extracts can perform the necessary post-translational modifications, such as glycosylation, that bacterial cells cannot, resulting in a functional protein.

Components of Cell-Free Systems

Describe the main components of a cell-free expression system.
Explain the role of each component.

R/: The main components of a cell-free expression system are: ribosomes, which translate the mRNA into a protein; RNA polymerase and transcription/translation factors, which transcribe the DNA into mRNA and assist in the process; ATP, which provides the energy needed to drive the reactions; amino acids, which are the building blocks assembled into the final protein; and the DNA or mRNA template, which contains the instructions for the protein to be produced.

Energy Regeneration

Why is energy provision and regeneration critical in cell-free systems?
Describe a method to ensure continuous ATP supply in a cell-free experiment.

R/: Energy regeneration is critical in cell-free systems because, unlike living cells, the reaction has no intact, self-sustaining metabolism to continuously produce ATP. Once the initial ATP supply is consumed, all transcription and translation reactions stop, even if all other components are still available. To ensure a continuous ATP supply, one method is to include a creatine phosphate and creatine kinase regeneration system in the reaction. Creatine kinase transfers the phosphate group from creatine phosphate to ADP, continuously regenerating ATP throughout the experiment. Alternatively, glucose and glycolytic enzymes can be added to the system to mimic the cell’s natural ATP production pathway.

Expression Systems Comparison

Compare prokaryotic versus eukaryotic cell-free expression systems.
Choose a protein to produce in each system and explain why.

R/: Prokaryotic cell-free systems, such as those derived from E. coli, are simpler, faster, and less expensive to prepare. However, they cannot perform post-translational modifications such as glycosylation. A good protein to produce in this system would be insulin, a relatively simple protein that can be produced quickly and in large quantities for industrial purposes.

Eukaryotic cell-free systems, such as those derived from human or yeast cells, are more complex but can perform the post-translational modifications necessary for many human proteins to function correctly. A good protein to produce in this system would be erythropoietin (EPO), a hormone that requires glycosylation to be biologically active, a modification that a prokaryotic system cannot perform.

Membrane Protein Expression

How would you design a cell-free experiment to optimize the expression of a membrane protein?
What challenges are involved, and how would you address them?

R/: Membrane proteins require a lipid environment to fold correctly. Without a membrane, they aggregate and become insoluble and nonfunctional, making them particularly challenging to produce in cell-free systems.

To address this, I would supplement the cell-free reaction with liposomes or nanodiscs, artificial lipid structures that mimic the cell membrane and provide a surface where the membrane protein can insert itself correctly. Additionally, detergents can be added to keep the protein soluble during synthesis. I would test different lipid and detergent concentrations systematically, varying one condition at a time while keeping all others constant, to find the optimal amounts.

Other variables to optimize include temperature, since lower temperatures can help the protein fold correctly instead of aggregating, and DNA template concentration, which affects how much protein is produced without overwhelming the system. Finally, success would be measured by running an SDS-PAGE gel to confirm protein production, and a solubility assay to confirm the protein is folding correctly and not precipitating. A functional activity assay could also be used to verify that the protein is working as expected.

Troubleshooting

Imagine you observe a low yield of your target protein in a cell-free system.
Describe three possible reasons for this.
Suggest a troubleshooting strategy for each reason.

R/: First, ATP depletion: the ATP in the reaction is consumed quickly and synthesis stops. To address this, a creatine phosphate and creatine kinase regeneration system should be added to continuously regenerate ATP throughout the experiment.

Second, insufficient DNA template: if there is not enough DNA, transcription will be limited and little mRNA will be produced. To troubleshoot this, the DNA template should be amplified using PCR before the experiment to ensure a sufficient concentration.

Third, poor ribosome activity: ribosomes require the correct magnesium concentration to function properly. If magnesium levels are too high or too low, translation efficiency drops significantly. To address this, magnesium concentration should be systematically optimized by testing a range of concentrations.

🧪 Homework Question from Kate Adamala

Synthetic Minimal Cell Design

Pick a function and describe it.
What would your synthetic cell do? What is the input and what is the output?

R/: The synthetic minimal cell would function as a biosensor for Chagas disease, caused by the parasite Trypanosoma cruzi. Chagas is known as the “silent disease” because most infected people are unaware they carry it, and current diagnostic methods require specialized laboratory equipment unavailable in many endemic regions. The input would be molecules unique to Trypanosoma cruzi, such as its GPI-anchored surface proteins, present in a patient’s blood sample. The output would be GFP fluorescence: the synthetic cell would glow green under UV light when the parasite is detected, providing a simple yes/no diagnostic result visible to the naked eye.

Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

R/: No. Without encapsulation, the cell-free machinery would be exposed to the complex environment of the patient’s blood, where proteases would rapidly degrade the ribosomes and enzymes before they could produce GFP. Additionally, without a membrane, there would be no control over which molecules activate the system, leading to false positives and unreliable results. Encapsulation is essential to protect the internal machinery and ensure specific detection.

Could this function be realized by a genetically modified natural cell?

R/: Yes, this function could theoretically be realized by a genetically modified natural cell. However, it would be significantly more complex and potentially dangerous. A living cell has its own metabolism and processes that could interfere with the detection signal. Additionally, introducing living genetically modified organisms into a patient raises serious safety and regulatory concerns. A synthetic minimal cell is safer because it is not alive, cannot replicate or mutate, and is designed to perform only one specific function — detecting Trypanosoma cruzi and producing a GFP signal.

Describe the desired outcome of your synthetic cell operation.

R/: In the presence of Trypanosoma cruzi molecules, the synthetic cell detects the parasite and produces a visible green fluorescence signal, enabling early diagnosis of Chagas disease in field settings without specialized laboratory equipment.

System Design

Design all components that would need to be part of your synthetic cell.
What would the membrane be made of?

R/: The membrane would be made of POPC phospholipids and cholesterol, similar to a natural cell membrane. It would incorporate specific membrane channel proteins that allow molecules unique to Trypanosoma cruzi to enter the synthetic cell and trigger GFP expression, while protecting the internal machinery from degradation by external proteases in the patient’s blood.

What would you encapsulate inside (enzymes, small molecules)?

Cell-free Tx/Tl system from E. coli: ribosomes, RNA polymerase, and transcription/translation factors
DNA encoding eGFP under the control of a promoter activated by Trypanosoma cruzi molecules
ATP and a creatine phosphate regeneration system for continuous energy supply
Amino acids as building blocks for GFP synthesis
Membrane channel proteins to allow parasite molecules to enter

Which organism will your Tx/Tl system come from?
Is a bacterial system sufficient, or do you need a mammalian system? Why?

R/: The Tx/Tl system will come from bacteria (E. coli). Since eGFP is a simple protein that does not require post-translational modifications such as glycosylation, a bacterial system is sufficient and more practical. It is cheaper, faster, and easier to prepare than a mammalian system, making it ideal for a diagnostic tool intended for field use in endemic regions.

Communication

How will your synthetic cell communicate with the environment?
Are substrates permeable, or do you need to express membrane channels?

R/: The synthetic cell will communicate with the environment through specific membrane channel proteins, namely alpha-hemolysin (aHL), expressed on its surface. These channels will be selectively permeable, allowing only molecules unique to Trypanosoma cruzi to enter and activate the GFP promoter.

Experimental Details

List all lipids and genes required.
Specify actual genes where possible (e.g., membrane channels).

R/:

Lipids: POPC, cholesterol
Genes: eGFP (enhanced Green Fluorescent Protein), alpha-hemolysin (aHL) membrane channel, TcGPR aptamer-controlled promoter activated by Trypanosoma cruzi surface molecules

How will you measure the function of your system?

R/: The function will be measured by detecting GFP fluorescence. In the presence of Trypanosoma cruzi, the synthetic cell will fluoresce green under UV light, providing a simple yes/no result visible to the naked eye for field use. For more precise quantification in a laboratory setting, flow cytometry could be used to measure fluorescence intensity. A negative control (synthetic cells exposed to a sample without Trypanosoma cruzi) would be run simultaneously to confirm specificity and rule out false positives.

🤖 Homework Question from Peter Nguyen

Application Design

Choose one application field: Architecture, Textiles/Fashion, or Robotics.

R/: Robotics

Propose an application using freeze-dried cell-free systems integrated into materials.

Proposal Questions

Write a one-sentence summary pitch describing your concept.

R/: A soft robot embedded with freeze-dried cell-free biosensors that detects heavy metals in rivers close to mines and reports contamination that could affect nearby communities.

Explain how the idea works in detail (3–4 sentences or more).

R/: The soft robot is built with a silicone skin layer containing encapsulated freeze-dried cell-free biosensors distributed across its surface. When the robot is deployed into a river, a mechanically-triggered release system opens the protective encapsulation, allowing river water to rehydrate the cell-free system. The biosensors contain metal-responsive inducible promoters (such as MerR for mercury or ArsR for arsenic) that, when activated by the presence of heavy metals, drive the expression of a fluorescent reporter protein like GFP. This fluorescent signal is detected by an onboard optical sensor that wirelessly transmits a contamination alert to the operator in real time.

What societal challenge or market need does this address?

R/: Illegal and industrial mining operations frequently cause heavy metal leakage into nearby rivers, contaminating water sources for rural and indigenous communities that depend on them for drinking, agriculture, and fishing. Current water monitoring methods require manual sample collection and laboratory analysis, which is slow, expensive, and logistically difficult in remote areas. This robot provides a fast, deployable, and low-cost alternative for real-time environmental monitoring in regions where traditional infrastructure is absent.

How will you address limitations of cell-free systems (e.g., activation, stability, single use)?

R/: The single-use limitation is addressed by the mission design itself: each robot deployment corresponds to one sampling event in one location, making single use acceptable and even practical. The stability of the freeze-dried system eliminates the need for cold chain storage, allowing robots to be stored and transported to remote mining regions without refrigeration. Premature activation by humidity or ambient moisture is prevented by the protective encapsulation layer, which only opens upon deliberate mechanical triggering by the operator before river entry.

🚀 Homework Question from Ally Huang

Genes in Space Proposal

Provide background information describing the space biology question or challenge (≤100 words).

R/: Space exploration exposes astronauts to two major biological threats: cosmic radiation and microgravity. Cosmic radiation directly breaks DNA strands, while microgravity disrupts the cellular cytoskeleton, impairing the efficiency of DNA repair mechanisms. Together, these factors cause cumulative DNA damage that increases the risk of mutations, cancer, and accelerated cellular aging. For long-duration missions such as a Mars expedition, where astronauts would be exposed for up to three years without access to advanced medical facilities, real-time monitoring of DNA damage is critical to ensure crew health and mission success.

Name the molecular or genetic target (≤30 words).

R/: Expression levels of DNA damage response genes, specifically p53, BRCA1, and RAD51, detected through their messenger RNA (mRNA) in astronaut blood samples.

Explain how this target relates to the space biology challenge (≤100 words).

R/: When DNA is damaged by radiation, cells activate an emergency response pathway that upregulates repair genes such as p53, BRCA1, and RAD51. Elevated mRNA levels of these genes in an astronaut’s blood indicate active DNA damage and repair attempts. By measuring the expression levels of these genes over time, we can track the accumulation of DNA damage throughout a mission. This provides a direct, molecular-level window into how the combination of radiation and microgravity is affecting the astronaut’s genome in real time.

State your hypothesis or research goal and explain the reasoning (≤150 words).

R/: My hypothesis is that astronauts on long-duration missions will show progressively increasing expression of DNA damage response genes (p53, BRCA1, RAD51) compared to pre-flight baseline levels, and that this increase will correlate with mission duration and radiation exposure levels.

This hypothesis is based on two established facts: first, that cosmic radiation causes double-strand DNA breaks, which are known to activate these repair pathways; and second, that microgravity impairs cytoskeletal organization, reducing the efficiency of DNA repair and allowing damage to accumulate. By tracking gene expression over time using the BioBits® cell-free protein expression system, we can establish whether current radiation shielding and countermeasures are sufficient, and at what point during a mission DNA damage reaches clinically concerning levels. This is critical information for planning a safe Mars mission.

Outline your experimental plan, including samples, controls, and measurements (≤100 words).

R/: Blood samples will be collected from astronauts monthly throughout the mission. mRNA will be extracted and introduced into BioBits® freeze-dried cell-free systems engineered with fluorescent reporters linked to p53, BRCA1, and RAD51 expression. Fluorescence intensity, measured with the P51 Molecular Fluorescence Viewer, will indicate gene expression levels. Controls will include pre-flight baseline samples from each astronaut and Earth-based samples from non-exposed individuals. The miniPCR® thermal cycler will be used to amplify target sequences for confirmation. Data will be compared across time points to track damage accumulation trends throughout the mission.

Homework Part B: Individual Final Project

Put your chosen final project slide in the appropriate slide deck following the instructions on slide 1:

Week 10 HW: Imaging and Measurement

Homework: Final Project

Please identify at least one (ideally many) aspect(s) of your project that you will measure.

R/:
I would like to measure the expression levels of mutant p53 protein across different cancer types. High accumulation of mutant p53 (due to MDM2 being unable to degrade it) would indicate a GOF mutation, making those cancers more susceptible to p53 GOF-driven oncogenesis.

Please describe all of the elements you would like to measure and how you will perform these measurements.

R/:
First, I will perform DNA sequencing to confirm the specific GOF mutation in TP53. Then, I will use Western Blot (SDS-PAGE + anti-p53 antibody detection) to quantify p53 protein levels across cancer samples. Additionally, mass spectrometry (LC-MS) will be used to confirm the exact mutant protein sequence and molecular weight.

Homework: Waters Part I — Molecular Weight

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system.

1. Theoretical Molecular Weight

eGFP Sequence

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Note: Contains His-tag (HHHHHH) and linker (LE).

R/:
Based on ExPASy Compute pI/Mw, the theoretical molecular weight of eGFP is 28,006.60 Da (~28 kDa).

2. Molecular Weight from Charge States

Calculate MW using adjacent charge states from LC-MS (Figure 1).

3. Accuracy Calculation

Compare deconvoluted MW vs theoretical MW.

Homework: Waters Part II — Secondary/Tertiary Structure

1. Native vs Denatured Proteins

R/:
Native proteins maintain their 3D folded structure, while denatured proteins unfold and expose more basic residues (Lys, Arg), allowing higher protonation.

In MS:

Denatured proteins: many peaks at low m/z (high charge states)
Native proteins: fewer peaks at high m/z (low charge states)

2. Charge State Determination

R/:
The peak at ~2800 m/z corresponds to z = 10.

Calculation:

z = 28,000 / 2,799.4199 ≈ 10

This low charge state is consistent with a folded (native) protein.
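The estimate above divides the approximate mass by the observed m/z. A slightly more careful version, sketched below, subtracts the mass of one proton per charge and uses the theoretical mass from Part I; the function name `charge_state` is hypothetical, and the result still rounds to the same integer.

```python
PROTON = 1.00728  # mass of a proton in Da


def charge_state(neutral_mass_da, observed_mz):
    """Estimate z for an [M + zH]z+ ion: m/z = (M + z * PROTON) / z."""
    return neutral_mass_da / (observed_mz - PROTON)


z_estimate = charge_state(28006.60, 2799.4199)
print(f"estimated z = {z_estimate:.3f}, nearest integer = {round(z_estimate)}")
```

The small offset from an exact integer depends on how well the observed peak matches the theoretical mass, so only the rounded value should be interpreted.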

Homework: Waters Part III — Peptide Mapping (Primary Structure)

We digest eGFP with trypsin (cuts after K and R).

1. Lysine and Arginine Count

R/:
eGFP contains:

20 Lysines (K)
6 Arginines (R)
Total = 26 cleavage sites

2. Number of Peptides

Steps

Use ExPASy PeptideMass
Input sequence
Perform trypsin digestion

R/:
ExPASy PeptideMass reports 25 peptides for the tryptic digest of eGFP.
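As a cross-check on the residue counts and the digest, the sketch below performs a simple in silico digestion of the sequence given in Part I. It assumes complete cleavage after every K or R, skips sites followed by proline, and applies no minimum-mass filter, so its count can differ from a tool's output depending on that tool's settings (for example a mass cutoff or allowed missed cleavages).

```python
# eGFP sequence with linker and His-tag, copied from Part I (spaces removed).
EGFP = (
    "MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL "
    "VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV "
    "NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD "
    "HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH"
).replace(" ", "")


def trypsin_digest(sequence):
    """Cleave after every K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal fragment
    return peptides


peptides = trypsin_digest(EGFP)
print("K:", EGFP.count("K"), "R:", EGFP.count("R"))
print("fragments from complete digestion:", len(peptides))
print("fragments of 1 to 2 residues:", [p for p in peptides if len(p) <= 2])
```

Under these assumptions the 26 cleavage sites give 27 fragments, several of which are only one or two residues long and would be too small to detect as separate peptides.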

3. Chromatographic Peaks in the Peptide Map

Based on the LC-MS peptide map shown in Figure 5a, approximately 22 chromatographic peaks above 10% relative abundance can be observed between 0.5 and 6 minutes.

Figure 5a shows the total ion chromatogram (TIC) of the eGFP peptide map. The peak at 2.78 minutes is circled, and its MS data is shown in Figure 5b.

4. Comparison Between Observed and Predicted Peptides

R/:
The number of peaks does not perfectly match the number of predicted peptides. There are fewer peaks observed in the chromatogram (22 peaks) than peptides predicted from tryptic digestion (25 peptides).

This could be explained by the size and behavior of some peptides. Very small peptides (<500 Da) are not retained efficiently in the LC column and may therefore not appear as distinct chromatographic peaks.

5. Identification of m/z, Charge State, and Singly Charged Mass

Part 1: Identify the m/z of the Most Abundant Charge State

Looking at Figure 5b, the tallest peak in the spectrum is located at:

m/z = 525.76712

This corresponds to the most abundant charge state of the peptide.

Part 2: Determine the Charge (z) Using Isotope Separation

In the inset (zoom-in) of Figure 5b, the isotope peaks are:

525.76712
526.25918
526.76845

The separation between consecutive isotope peaks is:

526.25918 − 525.76712 = 0.492 ≈ 0.5 m/z units

Because isotopes differ by approximately 1 Da in real mass, but the spectrum displays m/z, the observed separation corresponds to:

1/z

Therefore:

z = 1 / 0.5 = 2

The most abundant charge state of this peptide is:

z = 2+

Part 3: Calculate the Singly Charged Form [M+H]+

To convert from the observed m/z value (z = 2) to the singly charged mass [M+H]+, the following equation is used:

[M+H]+ = (m/z × z) − (z − 1) × 1.00728

Substituting the values:

[M+H]+ = (525.76712 × 2) − (1 × 1.00728)

= 1051.53424 − 1.00728

= 1050.527 Da

This matches the peak observed at 1050.52438 in Figure 5b, confirming the calculation.
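The two steps above (charge from isotope spacing, then conversion to the singly charged mass) can be reproduced with a short script. The function names are hypothetical; the peak values are the ones read from Figure 5b.

```python
PROTON = 1.00728  # mass of a proton in Da


def charge_from_isotopes(mz_first, mz_second):
    """Adjacent isotope peaks are about 1 Da apart in mass, so their m/z spacing is 1/z."""
    return round(1.0 / (mz_second - mz_first))


def singly_charged_mass(mz, z):
    """Convert an observed m/z at charge z to the [M+H]+ value."""
    return mz * z - (z - 1) * PROTON


z = charge_from_isotopes(525.76712, 526.25918)
mh_plus = singly_charged_mass(525.76712, z)
print(f"z = {z}")
print(f"[M+H]+ = {mh_plus:.3f}")
```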

6. Peptide Identification and Mass Accuracy

Using the PeptideMass tool (ExPASy) with the eGFP sequence and trypsin digestion, the peptide with a theoretical monoisotopic [M+H]+ mass closest to the experimentally observed value of 1050.52438 Da was identified as:

FEGDTLVNR (residues 115–123)

The mass accuracy was calculated as follows:

error (ppm) = |1050.52438 − 1050.5214| / 1050.5214 × 10⁶

≈ 2.84 ppm

This low error (~2.84 ppm) is consistent with the high mass accuracy expected from a TOF (Time-of-Flight) mass spectrometer.
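The theoretical mass and the ppm error can also be recomputed by hand as a sanity check. The sketch below sums standard monoisotopic residue masses for the nine residues in FEGDTLVNR, adds one water and one proton, and compares the result with the observed value; the helper names are hypothetical, and the last digits depend on the rounding of the residue masses.

```python
# Standard monoisotopic residue masses (Da) for the residues in this peptide.
MONOISOTOPIC = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728


def theoretical_mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONOISOTOPIC[aa] for aa in peptide) + WATER + PROTON


def ppm_error(observed, theoretical):
    """Absolute mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


theoretical = theoretical_mh_plus("FEGDTLVNR")
error = ppm_error(1050.52438, theoretical)
print(f"theoretical [M+H]+ = {theoretical:.4f}")
print(f"mass error = {error:.2f} ppm")
```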

7. Sequence Coverage Confirmed by Peptide Mapping

R/:
Based on the amino acid coverage map shown in Figure 6, approximately 88% of the eGFP sequence was confirmed by peptide mapping. This means that peptides covering 88% of the total amino acid sequence were successfully identified using their calculated masses and fragmentation patterns with the BioAccord LC-MS system.

Homework: Waters Part IV — Oligomers

Using the known subunit masses from Table 1, the expected masses of each oligomeric species were calculated and matched to the peaks observed in Figure 7:

7FU Decamer:
10 × 340 kDa = 3,400 kDa (3.4 MDa) → peak at 3.4 MDa
8FU Didecamer:
20 × 400 kDa = 8,000 kDa (8.0 MDa) → peak at 8.33 MDa
8FU 3-Decamer:
30 × 400 kDa = 12,000 kDa (12.0 MDa) → peak at 12.67 MDa
8FU 4-Decamer:
40 × 400 kDa = 16,000 kDa (16.0 MDa) → small peak near 16 MDa

The calculated masses are consistent with the peaks observed in the CDMS spectrum, confirming the presence of these oligomeric states of KLH in solution.
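To make the comparison explicit, the sketch below recomputes each expected mass from the subunit values quoted above and reports how far the quoted peak lies from it. The 4-decamer is left out because only an approximate peak position ("near 16 MDa") is given.

```python
# (species, number of subunits, subunit mass in kDa, observed peak in MDa)
SPECIES = [
    ("7FU decamer", 10, 340, 3.40),
    ("8FU didecamer", 20, 400, 8.33),
    ("8FU 3-decamer", 30, 400, 12.67),
]

deviations = {}
for name, n_subunits, subunit_kda, observed_mda in SPECIES:
    expected_mda = n_subunits * subunit_kda / 1000
    deviations[name] = (observed_mda - expected_mda) / expected_mda * 100
    print(f"{name}: expected {expected_mda:.2f} MDa, observed {observed_mda:.2f} MDa, "
          f"deviation {deviations[name]:+.1f}%")
```

With the rounded subunit masses used here, the observed 8FU peaks sit a few percent above the calculated values, so the match is approximate rather than exact.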

Week 11 HW: Bioproduction

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Reflection on the Community Bioart Project

R/: I contributed one pixel in the upper-right area of the collaborative canvas. Although it was a small contribution, it became part of a collective bioart piece created by students from around the world.

What I liked most about this project was the idea of global collaboration. Watching how individual contributions, even something as small as a single pixel, combined to form a unified artwork was a beautiful reflection of how collective science and art can work together.

For next year, I think the project could be improved by adding a live update feature so contributors could watch the canvas evolve in real time. It would also be interesting to include a small profile or identifier showing who contributed each pixel, making the human connection behind each contribution more visible.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

1. Roles of Each Component in the Cell-Free Reaction

E. coli Lysate

BL21 (DE3) Star Lysate (includes T7 RNA Polymerase)

R/: The BL21 (DE3) Star lysate provides all the cellular machinery required for transcription and translation, including ribosomes, tRNAs, metabolic enzymes, and translation factors. It also contains T7 RNA Polymerase, which specifically transcribes genes under the control of a T7 promoter.

Salts and Buffer Components

Potassium Glutamate

R/: Potassium glutamate maintains ionic strength and stabilizes the transcription and translation machinery, mimicking the intracellular ionic environment of E. coli.

HEPES-KOH pH 7.5

R/: HEPES-KOH acts as a buffer that maintains the reaction pH near 7.5, which is optimal for enzymatic activity during transcription and translation.

Magnesium Glutamate

R/: Magnesium glutamate provides Mg²⁺ ions, which are essential cofactors for ribosomes, RNA polymerase, and many enzymes involved in protein synthesis.

Potassium Phosphate Monobasic/Dibasic

R/: These phosphate salts help stabilize pH and provide phosphate ions required for energy metabolism and nucleotide synthesis.

Energy / Nucleotide System

Ribose

R/: Ribose is a pentose sugar that serves as a precursor for nucleotide biosynthesis, helping regenerate NTPs required for transcription.

Glucose

R/: Glucose acts as a carbon and energy source that fuels metabolic pathways involved in ATP regeneration during translation.

AMP, CMP, GMP, UMP

R/: These nucleoside monophosphates serve as precursors for RNA synthesis and are phosphorylated into their triphosphate forms (NTPs) used during transcription.

Guanine

R/: Guanine can be salvaged and converted into GMP/GTP, providing an additional source of guanine nucleotides for RNA synthesis.

Translation Mix

17 Amino Acid Mix

R/: The amino acid mix provides 17 of the 20 standard amino acids required as substrates for protein synthesis.

Tyrosine

R/: Tyrosine is added separately because it has low solubility and must be prepared independently to ensure sufficient concentration in the reaction.

Cysteine

R/: Cysteine is added separately because it is prone to oxidation and must remain in its reduced form for efficient protein synthesis.

Additives

Nicotinamide

R/: Nicotinamide acts as a precursor for NAD⁺, an essential cofactor involved in metabolic reactions that regenerate ATP during the cell-free reaction.

Backfill

Nuclease-Free Water

R/: Nuclease-free water is used to bring the reaction to its final volume without introducing RNases or DNases that could degrade RNA or DNA templates.

2. Differences Between the 1-Hour Optimized PEP-NTP Master Mix and the 20-Hour NMP-Ribose-Glucose Master Mix

R/: The 1-hour optimized PEP-NTP master mix uses phosphoenolpyruvate (PEP) as the primary energy source and directly supplies nucleoside triphosphates (NTPs), allowing rapid and efficient protein production during short reactions.

In contrast, the 20-hour NMP-Ribose-Glucose master mix uses nucleoside monophosphates (NMPs) together with ribose and glucose, relying on endogenous metabolic enzymes in the lysate to regenerate NTPs gradually over time. This system is more cost-effective and better suited for long-duration protein expression.

3. Bonus Question: How Can Transcription Occur if GMP Is Not Included but Guanine Is?

R/: Although GMP is not directly included, transcription can still occur because guanine can be salvaged by enzymes present in the E. coli lysate. Purine phosphoribosyltransferases, such as xanthine-guanine phosphoribosyltransferase (Gpt), the E. coli counterpart of hypoxanthine-guanine phosphoribosyltransferase (HGPRT), convert guanine into GMP using phosphoribosyl pyrophosphate (PRPP), and GMP is subsequently phosphorylated into GDP and GTP, which are required for RNA synthesis by RNA polymerase.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

1. Biophysical or Functional Properties of the Fluorescent Proteins

sfGFP

R/: sfGFP (superfolder GFP) has exceptionally fast and robust folding kinetics, with chromophore maturation completing within minutes to a few tens of minutes at 37°C. This makes it highly suitable for cell-free systems where reaction time is limited. However, like all GFP derivatives, it requires molecular oxygen for chromophore maturation.

mRFP1

R/: mRFP1 has incomplete chromophore maturation, meaning that a fraction of the synthesized protein never becomes fluorescent. It also has relatively low photostability and quantum yield, which can reduce fluorescence intensity in cell-free systems.

mKO2

R/: mKO2 is moderately sensitive to acidic conditions, meaning that slight decreases in pH during long incubations can reduce fluorescence output. Proper buffering is therefore important for maintaining signal stability.

mTurquoise2

R/: mTurquoise2 undergoes slow two-step chromophore maturation, making it one of the slowest-maturing cyan fluorescent proteins. In short cell-free reactions, fluorescence may underestimate the total amount of synthesized protein.

mScarlet-I

R/: mScarlet-I is known for robust folding and efficient maturation, even under oxidizing conditions. This makes it a reliable reporter protein in cell-free systems with varying redox environments.

Electra2

R/: Electra2 is a blue fluorescent protein that is brighter than many other blue fluorescent proteins. However, like other β-barrel fluorescent proteins, it requires molecular oxygen for chromophore maturation, so oxygen availability becomes important during long incubations.

2. Hypothesis for Improving Fluorescence Over a 36-Hour Incubation

R/: I hypothesize that increasing the concentrations of glucose and ribose in the cell-free master mix would improve the fluorescence output of mTurquoise2 during a 36-hour incubation.

Because mTurquoise2 undergoes slow two-step chromophore maturation, it requires extended reaction times to achieve maximum fluorescence. By increasing glucose and ribose concentrations, ATP regeneration could be sustained for longer periods, maintaining active translation and allowing a larger fraction of synthesized mTurquoise2 proteins to complete chromophore maturation. As a result, higher fluorescence intensity would be expected at later time points (18–36 hours) compared to the standard master mix composition.
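The reasoning behind this hypothesis can be illustrated with a very simple kinetic sketch. It assumes that protein is synthesized at a constant rate while the energy system is active and that each molecule then matures with first-order kinetics. The synthesis rate, the maturation half-time, and the two active windows are hypothetical values chosen for illustration, not measurements. The model also takes for granted that extra glucose and ribose extend the active window, which is the hypothesis itself, and it ignores oxygen depletion and pH drift.

```python
import math


def mature_protein(t_h, active_h, k_syn=1.0, maturation_half_time_h=1.0):
    """Mature (fluorescent) protein at time t_h, in arbitrary units.

    Protein is made at a constant rate k_syn until active_h hours, then
    synthesis stops. Immature protein matures with first-order kinetics.
    All parameter values are illustrative assumptions.
    """
    k_mat = math.log(2) / maturation_half_time_h
    t_syn = min(t_h, active_h)
    # Immature pool during synthesis: solution of dI/dt = k_syn - k_mat * I.
    immature = k_syn * (1 - math.exp(-k_mat * t_syn)) / k_mat
    mature = k_syn * t_syn - immature
    # After synthesis stops, the remaining immature pool keeps maturing.
    t_after = max(0.0, t_h - active_h)
    mature += immature * (1 - math.exp(-k_mat * t_after))
    return mature


# Compare a shorter and a longer (hypothetical) active translation window.
for t in (6, 12, 18, 24, 36):
    standard = mature_protein(t, active_h=8)
    extended = mature_protein(t, active_h=16)
    print(f"t = {t:2d} h  standard = {standard:5.2f}  extended = {extended:5.2f}")
```

In this sketch the two conditions are identical while both are still synthesizing protein and diverge only after the shorter window ends, which is why the expected difference appears at the later time points.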