Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Describe a biological engineering application or tool you want to develop and why. Virus Hunting: using virus hunting to discover viruses in animal populations that might cause a pandemic, and exploiting such viruses as gene therapy tools. First, the viruses are isolated from hosts of interest; then their genomes are sequenced and the viruses are characterized. The following steps will be:

  • Week 2 HW: DNA Read, Write and Edit

    Part 0: Attend or watch all lecture and recitation videos. Part 1: Benchling & In-silico Gel Art Make a free account at benchling.com Import the Lambda DNA. Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI HindIII BamHI KpnI EcoRV SacI SalI Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. I imagine the pattern as a hand signaling the number one. Part 3: DNA Design Challenge Choose your protein. I chose tau protein, whose hyperphosphorylation is involved in Alzheimer’s disease progression. I used UniProt to get its sequence. Reference: https://rest.uniprot.org/uniprotkb/P10636.fasta

  • Week 3 HW: Lab Automation

    Python Script for Opentrons Artwork I made an artistic design inspired by the Egyptian beetle using the GUI. Link: https://opentrons-art.rcdonovan.com/?id=1xb86617h0wq061

  • Week 4 HW: Protein Design Pt. 1

    Part A: Conceptual Questions How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Why do humans eat beef but do not become a cow, eat fish but do not become fish? Why are there only 20 natural amino acids? Ref: https://www.chemistryworld.com/features/why-are-there-20-amino-acids/3009378.article

  • Week 5 HW: Protein Design Pt. 2

    Part A: SOD1 Binder Peptide Design (From Pranam) Pt 1: Generate Binders with PepMLM Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card: Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence. To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison. Record the perplexity scores that indicate PepMLM’s confidence in the binders. human SOD1 sequence

  • Week 6 HW: Genetic Circuits Pt.1

    DNA Assembly

    What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

    The components of the Phusion High-Fidelity PCR Master Mix are the following:

      • Phusion DNA Polymerase: incorporates nucleotides to “fill in” the gaps in the annealed DNA fragments. It is a hot-start, proofreading PCR enzyme, enabling generation of PCR amplicons with high sequence accuracy, sensitivity, and specificity. It is a thermostable polymerase that possesses 5´→ 3´ polymerase activity and 3´→ 5´ exonuclease activity, and it generates blunt-ended products.
      • Nucleotides: building blocks for new DNA strands during amplification.
      • Buffer: provides the optimal pH, ionic strength, and Mg²⁺ concentration (1.5 mM final) required for Phusion DNA polymerase to bind primers, extend DNA efficiently, and maintain its high fidelity.

    What are some factors that determine primer annealing temperature during PCR?

    The primer annealing temperature depends on the length and sequence of the primers. It also depends on the melting temperature (Tm) of the primers, and therefore on their GC content. A common rule of thumb for short primers is Tm = 4(G + C) + 2(A + T), where G, C, A, and T are the numbers of each base in the primer and Tm is in °C.

    There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

    Restriction Enzyme Digest

  • Week 7 HW: Genetic Circuits Pt.2

    Part 1: Intracellular Artificial Neural Networks (IANNs)

    What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

    The main advantage IANNs hold over traditional genetic circuits is scalability and the ability to support multilayer networks for complex decision-making. Traditional genetic circuits suffer from poor predictability and struggle to reliably program multiple functions simultaneously because of inherent scalability limitations. ANNs, on the other hand, are more predictable and offer improved robustness for complex designs. Because of their multiple layers and non-linear activations, neural networks can model complex, non-linear decision boundaries.

    Traditional genetic circuits have input/output behaviors that function as Boolean operations. They process discrete signals (ON/OFF, high/low expression) through logic gates such as AND, OR, and NOT, producing binary outputs based on truth tables. In contrast, the output layer of an ANN, which produces the final prediction, may be binary, multi-class, or a continuous value.

    Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

    Application of CNNs: tumor and MSI detection in gastrointestinal cancer

    Convolutional Neural Networks (CNNs) are deep learning models designed to analyze structured grid-like data such as images. CNNs were used as an automatic tumor detector and to predict MSI (microsatellite instability), which determines whether a patient with gastrointestinal cancer will respond well to immunotherapy. The authors used hematoxylin and eosin (H&E)-stained histology slides as input. For tumor detection in gastrointestinal cancer, the authors trained a convolutional neural network with deep residual learning (resnet18) to classify tumor versus normal tissue by transfer learning. Transfer learning means reusing a pre-trained neural network model on a new but related task, instead of training from scratch. For MSI detection, the authors trained another resnet18 model for each tumor type.

    Input/output behavior

      • Input: Tiles extracted from digitized histology slides.
      • Output: For each tile, a probability score indicating tumor vs. normal or MSI vs. MSS status.
      • Behavior: The neural network processes image features within each tile to generate these probability scores, enabling localized tissue characterization and subsequent patient-level molecular classification.

    The mentioned limitations of the CNN were:

      • Classification ability is limited to the cancer types and ethnicities in the training set. Larger training cohorts are therefore needed to boost classification performance, so that rare morphological variants can be learned by the network.
      • The required tissue size. To define its lower limit, the authors generated ‘virtual biopsies’ and found that performance plateaued at approximately 100 tiles of 256 μm edge length, suggesting that biopsies are sufficient for MSI prediction.

    Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.

    References

  • Week 9 HW: Cell Free Systems

    Part A: General and Lecturer-Specific Questions General homework questions Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. The main advantages of cell-free protein synthesis (CFPS) over traditional in vivo methods include

  • Week 10 HW: Imaging and Measurement

  • Week 11 HW: Building Genomes

    Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

    I contributed 123 pixels to the global artwork experiment by making HTGAA letters at the bottom left. I liked the collaborative work and that it represents all of us. I think it can be better by not allowing the replacement of anyone’s work.

    Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

    Component | Role

      E. coli Lysate
        • BL21 (DE3) Star Lysate (includes T7 RNA Polymerase)
      Salts/Buffer
        • Potassium Glutamate
        • HEPES-KOH pH 7.5
        • Magnesium Glutamate
        • Potassium phosphate monobasic
        • Potassium phosphate dibasic
      Energy / Nucleotide System
        • Ribose
        • Glucose
        • AMP
        • CMP
        • GMP
        • UMP
        • Guanine
      Translation Mix (Amino Acids)
        • 17 Amino Acid Mix
        • Tyrosine
        • Cysteine
      Additives
        • Nicotinamide
      Backfill
        • Nuclease Free Water
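The rule of thumb quoted in the Week 6 summary above, Tm = 4(G + C) + 2(A + T), can be checked with a few lines of Python. This is only a minimal sketch: the function name `wallace_tm` is made up for illustration, the rule is generally considered a rough estimate for short primers, and the example primer is simply the first 18 nucleotides of the tau coding sequence used in Week 2, not a primer from the course protocol.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer melting temperature in deg C as 4(G+C) + 2(A+T)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 4 * gc + 2 * at


# Illustrative primer: first 18 nt of the tau coding sequence from Week 2.
example_primer = "ATGGCGGAACCGCGCCAG"
print(example_primer, wallace_tm(example_primer))
```

Because G and C count double, a GC-rich primer gets a higher estimate than an AT-rich primer of the same length, which is the dependence on GC content described in the answer.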

Subsections of Homework

Week 1 HW: Principles and Practices


Describe a biological engineering application or tool you want to develop and why.

Virus Hunting

Using virus hunting to discover viruses in animal populations that might cause a pandemic, and exploiting such viruses as gene therapy tools. First, the viruses are isolated from hosts of interest; then their genomes are sequenced and the viruses are characterized. The following steps will be:

  1. Developing arrays for virus detection that provide a faster and cheaper method.
  2. Exploiting the virus replication machinery to deliver compounds / biopharmaceuticals to humans or animals.

Discovering potential pandemic pathogens early will help prevent outbreaks and prepare us well.

  • Biosafety and biosecurity aim to prevent loss, theft and misuse of high-consequence material. This can be done by providing and implementing risk control measures that address the risks associated with conducting high-consequence research and working with high-consequence material, including other biosecurity-relevant material.

  • The intrinsic risks of working with biological agents are not only of a biosafety nature, such as exposure or unintentional release, but also of biosecurity, which includes the theft, misuse, or intended release of biological material.

Describe at least three different potential governance actions by considering the purpose, design, assumptions, and risks of failures & “success”

  1. Development of a board to organize and authorize suitable scientists to conduct virus hunting
  • Purpose: The aim is to allow only trained professionals to conduct such procedures
  • Design: Every country will have a trusted board that will allow and oversee the virus hunting procedures, and these boards will be under the supervision of a central board that will get periodic reports
  • Assumptions: Incorrect selection of personnel might lead to improper viral isolation and process organization, leading to an outbreak
  • Risks of Failures & Success: This action might fail if not properly implemented
  2. Development of an agreed-upon method of biological materials disposal
  • Purpose: The aim is to control and oversee disposal methods to prevent any outbreaks
  • Design: Professionals will be further trained
  • Assumptions: Ignoring the right protocol for disposal may lead to an outbreak
  • Risks of Failures & Success: The action fails if the right training and control are not provided
  3. Providing enough funds to conduct the required procedures in the countries of interest
  • Purpose: This action aims to fund labs in developing countries of interest
  • Design: The organization will provide the funds and supervise their use to buy the right equipment and tools
  • Assumptions: Corruption or a failure to provide the funds will hinder the virus hunting procedures in that country
  • Risks of Failures & Success: Not providing enough funds will stop the required procedures

Score each of your governance actions against your rubric of policy goals.

Does the option:                                   Authorizing Board   Biological Materials Disposal   Funds
Enhance Biosecurity
• By preventing incidents                          1                   2                               3
• By helping respond                               1                   2                               3
Foster Lab Safety
• By preventing incidents                          2                   1                               3
• By helping respond                               1                   2                               3
Protect the environment
• By preventing incidents                          2                   1                               3
• By helping respond                               1                   2                               3
Other considerations
• Minimizing costs and burdens to stakeholders     2                   3                               1
• Feasibility?                                     1                   2                               3
• Not impede research                              2                   1                               3
• Promote constructive applications                3                   2                               1
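To compare the three options at a glance, the rubric scores above can be totaled per option. The sketch below only adds up the numbers from the table; the row labels are shortened for readability, and because the page does not state whether 1 or 3 is the better score, the totals are reported without ranking them.

```python
OPTIONS = ("Authorizing Board", "Biological Materials Disposal", "Funds")

# One row per rubric criterion, scores in the same order as OPTIONS.
SCORES = {
    "Biosecurity: preventing incidents": (1, 2, 3),
    "Biosecurity: helping respond": (1, 2, 3),
    "Lab safety: preventing incidents": (2, 1, 3),
    "Lab safety: helping respond": (1, 2, 3),
    "Environment: preventing incidents": (2, 1, 3),
    "Environment: helping respond": (1, 2, 3),
    "Minimizing costs and burdens": (2, 3, 1),
    "Feasibility": (1, 2, 3),
    "Not impede research": (2, 1, 3),
    "Promote constructive applications": (3, 2, 1),
}


def total_scores(options, scores):
    """Sum each option's score over all rubric rows."""
    totals = dict.fromkeys(options, 0)
    for row in scores.values():
        for option, value in zip(options, row):
            totals[option] += value
    return totals


for option, total in total_scores(OPTIONS, SCORES).items():
    print(f"{option}: {total}")
```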

Based on scores, describe which governance option or combination of options, you would prioritize, and why.

Based on the scores:

  • I would prioritize the formation of the board because it is the base upon which every other step will follow.
  • I would also prioritize providing enough funds, especially for developing countries, many of which have knowledgeable scientists but not enough funds to buy the necessary equipment.

References

Hunting for the next pandemic virus (no date) ASM.org. Available at: https://asm.org/magazine/2022/fall/hunting-for-the-next-pandemic-virus

Vaidyanathan, G. (2011) ‘Virus hunters: Catching bugs in the field’, Cell, 147(6), pp. 1209–1211. doi:10.1016/j.cell.2011.11.037.

World Health Organization. Available at: https://iris.who.int/

Week 2 HW: DNA Read, Write and Edit

[Image: DNA_Picture]

Part 0: Attend or watch all lecture and recitation videos.

Part 1: Benchling & In-silico Gel Art

  • Make a free account at benchling.com
  • Import the Lambda DNA.
  • Simulate Restriction Enzyme Digestion with the following Enzymes:
    • EcoRI
    • HindIII
    • BamHI
    • KpnI
    • EcoRV
    • SacI
    • SalI
  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
[Images: virtual-digest, pattern-artwork]
  • I imagine the pattern as a hand signaling the number one
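The virtual digest that Benchling performs can be approximated in a few lines: find every occurrence of an enzyme's recognition site, cut there, and report the fragment lengths that would appear as bands. The sketch below assumes a linear molecule and tracks only the top-strand cut position; the `SITES` table and the `digest` function are illustrative names, and the toy sequence is made up (it is not Lambda DNA).

```python
# Recognition site and top-strand cut offset (bases from the site start).
SITES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def digest(sequence, enzyme):
    """Return fragment lengths (5' to 3') for a linear sequence cut by one enzyme."""
    site, cut_offset = SITES[enzyme]
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    edges = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(edges, edges[1:])]


# Toy 22-nt sequence with two EcoRI sites (hypothetical, for illustration only).
toy = "AAAA" + "GAATTC" + "CCCC" + "GAATTC" + "GG"
print(digest(toy, "EcoRI"))
print(digest(toy, "BamHI"))
```

Since all seven sites are palindromic, searching one strand is enough to find every site. Choosing which enzyme goes in which lane then amounts to picking the fragment-length lists that draw the desired pattern.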

Part 3: DNA Design Challenge

  1. Choose your protein.
  • I chose tau protein, whose hyperphosphorylation is involved in Alzheimer’s disease progression
  • I used UniProt to get its sequence
[Image: Tau-Protein-Sequence]

Reference: https://rest.uniprot.org/uniprotkb/P10636.fasta

  2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
  • Tau Protein DNA Sequence:

atggcggaaccgcgccaggaatttgaagtgatggaagatcatgcgggcacctatggcctg ggcgatcgcaaagatcagggcggctataccatgcatcaggatcaggaaggcgataccgat gcgggcctgaaagaaagcccgctgcagaccccgaccgaagatggcagcgaagaaccgggc agcgaaaccagcgatgcgaaaagcaccccgaccgcggaagatgtgaccgcgccgctggtg gatgaaggcgcgccgggcaaacaggcggcggcgcagccgcataccgaaattccggaaggc accaccgcggaagaagcgggcattggcgataccccgagcctggaagatgaagcggcgggc catgtgacccaggaaccggaaagcggcaaagtggtgcaggaaggctttctgcgcgaaccg ggcccgccgggcctgagccatcagctgatgagcggcatgccgggcgcgccgctgctgccg gaaggcccgcgcgaagcgacccgccagccgagcggcaccggcccggaagataccgaaggc ggccgccatgcgccggaactgctgaaacatcagctgctgggcgatctgcatcaggaaggc ccgccgctgaaaggcgcgggcggcaaagaacgcccgggcagcaaagaagaagtggatgaa gatcgcgatgtggatgaaagcagcccgcaggatagcccgccgagcaaagcgagcccggcg caggatggccgcccgccgcagaccgcggcgcgcgaagcgaccagcattccgggctttccg gcggaaggcgcgattccgctgccggtggattttctgagcaaagtgagcaccgaaattccg gcgagcgaaccggatggcccgagcgtgggccgcgcgaaaggccaggatgcgccgctggaa tttacctttcatgtggaaattaccccgaacgtgcagaaagaacaggcgcatagcgaagaa catctgggccgcgcggcgtttccgggcgcgccgggcgaaggcccggaagcgcgcggcccg agcctgggcgaagataccaaagaagcggatctgccggaaccgagcgaaaaacagccggcg gcggcgccgcgcggcaaaccggtgagccgcgtgccgcagctgaaagcgcgcatggtgagc aaaagcaaagatggcaccggcagcgatgataaaaaagcgaaaaccagcacccgcagcagc gcgaaaaccctgaaaaaccgcccgtgcctgagcccgaaacatccgaccccgggcagcagc gatccgctgattcagccgagcagcccggcggtgtgcccggaaccgccgagcagcccgaaa tatgtgagcagcgtgaccagccgcaccggcagcagcggcgcgaaagaaatgaaactgaaa ggcgcggatggcaaaaccaaaattgcgaccccgcgcggcgcggcgccgccgggccagaaa ggccaggcgaacgcgacccgcattccggcgaaaaccccgccggcgccgaaaaccccgccg agcagcggcgaaccgccgaaaagcggcgatcgcagcggctatagcagcccgggcagcccg ggcaccccgggcagccgcagccgcaccccgagcctgccgaccccgccgacccgcgaaccg aaaaaagtggcggtggtgcgcaccccgccgaaaagcccgagcagcgcgaaaagccgcctg cagaccgcgccggtgccgatgccggatctgaaaaacgtgaaaagcaaaattggcagcacc gaaaacctgaaacatcagccgggcggcggcaaagtgcagattattaacaaaaaactggat ctgagcaacgtgcagagcaaatgcggcagcaaagataacattaaacatgtgccgggcggc ggcagcgtgcagattgtgtataaaccggtggatctgagcaaagtgaccagcaaatgcggc 
agcctgggcaacattcatcataaaccgggcggcggccaggtggaagtgaaaagcgaaaaa ctggattttaaagatcgcgtgcagagcaaaattggcagcctggataacattacccatgtg ccgggcggcggcaacaaaaaaattgaaacccataaactgacctttcgcgaaaacgcgaaa gcgaaaaccgatcatggcgcggaaattgtgtataaaagcccggtggtgagcggcgatacc agcccgcgccatctgagcaacgtgagcagcaccggcagcattgatatggtggatagcccg cagctggcgaccctggcggatgaagtgagcgcgagcctggcgaaacagggcctg

Reference https://www.bioinformatics.org/sms2/rev_trans.html
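Reverse translation with a single codon per amino acid can be sketched as a dictionary lookup. The codon table below was read off the reverse-translated tau sequence above (for example M → atg, A → gcg, E → gaa); tryptophan does not occur in that sequence, so `tgg`, its only codon, was added by hand. The function name is illustrative, and this is not a claim about how the referenced tool is implemented.

```python
# One codon per amino acid, as observed in the reverse-translated tau sequence.
PREFERRED_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}


def reverse_translate(protein):
    """Map each amino acid to one fixed codon and join the codons."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


# First 20 residues of tau (UniProt P10636).
tau_start = "MAEPRQEFEVMEDHAGTYGL"
print(reverse_translate(tau_start))
```

The output for these 20 residues matches the first 60 nucleotides of the sequence above. Using one fixed codon per amino acid is why the unoptimized sequence is so repetitive, which codon optimization then addresses.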

  3. Codon optimization.
    • Codon optimization is conducted to increase expression efficiency. Although most amino acids are encoded by more than one codon, the host does not use these synonymous codons equally efficiently; optimization therefore chooses the most efficient codons to increase translation efficiency and favor a stable mRNA structure.
    • I chose to optimize the codons for E. coli because of its fast replication and versatile applications
    • Tau protein DNA sequence with codon optimization:

GC=59.89%, CAI=0.90

ATGGCGGAACCGCGCCAGGAGTTCGAAGTGATGGAAGATCATGCGGGCACCTATGGCCTGGGCGATCGTAAAGATCAGGGCGGCTACACGATGCATCAGGATCAGGAAGGCGATACCGATGCAGGCCTGAAAGAAAGCCCGCTGCAGACCCCGACCGAAGATGGTAGCGAAGAACCGGGCAGCGAAACCAGCGATGCGAAAAGCACCCCGACCGCCGAAGATGTTACCGCCCCTTTAGTGGATGAAGGCGCGCCGGGCAAACAGGCGGCGGCCCAGCCGCATACCGAAATTCCGGAAGGCACGACCGCGGAAGAAGCGGGCATTGGCGATACCCCGAGCCTGGAAGATGAAGCAGCGGGTCACGTGACCCAGGAACCGGAAAGCGGCAAAGTTGTGCAGGAAGGCTTTCTGCGCGAGCCGGGACCGCCCGGCCTGAGCCATCAACTGATGAGCGGCATGCCGGGTGCGCCGTTACTGCCGGAAGGCCCGCGCGAAGCCACCCGCCAGCCGAGCGGCACGGGCCCGGAAGATACCGAAGGCGGCCGTCATGCGCCGGAACTGCTGAAACATCAGCTGCTGGGCGATCTGCATCAGGAAGGCCCGCCGCTGAAAGGCGCGGGTGGCAAAGAACGTCCGGGCAGCAAAGAAGAAGTGGATGAAGATCGTGATGTGGATGAAAGCAGCCCGCAGGATAGCCCGCCGAGCAAAGCCAGCCCGGCCCAGGATGGCCGTCCGCCGCAAACCGCGGCACGTGAAGCCACCTCAATTCCGGGCTTCCCGGCGGAAGGCGCGATTCCGCTGCCGGTGGATTTCCTGAGCAAAGTGAGCACCGAAATTCCGGCGAGCGAACCGGATGGCCCGAGCGTGGGTCGCGCCAAAGGCCAGGATGCGCCGCTGGAATTCACCTTTCATGTGGAAATTACCCCGAACGTGCAGAAAGAACAGGCGCATAGCGAAGAGCATCTGGGACGCGCGGCCTTTCCGGGCGCGCCGGGTGAAGGTCCGGAAGCGCGCGGTCCGTCTCTGGGCGAAGATACGAAAGAAGCGGATCTGCCGGAACCGAGCGAAAAACAGCCGGCGGCGGCGCCGCGCGGTAAACCGGTGAGCCGCGTTCCGCAACTGAAAGCGCGCATGGTTTCGAAATCAAAAGATGGCACGGGCAGCGACGATAAAAAAGCCAAAACCAGCACCCGCAGCAGTGCCAAAACCCTGAAAAACCGCCCGTGCCTGAGCCCGAAACATCCGACGCCGGGCAGCAGCGATCCGCTGATTCAGCCGAGCTCTCCGGCGGTTTGTCCTGAACCGCCGTCAAGTCCGAAATATGTTAGCAGCGTTACCAGCCGCACCGGCTCAAGCGGCGCCAAAGAAATGAAACTGAAAGGTGCCGATGGTAAAACTAAAATTGCGACCCCGCGCGGCGCGGCCCCGCCGGGCCAGAAAGGCCAGGCGAACGCAACCCGCATTCCGGCGAAAACCCCGCCGGCGCCGAAAACCCCGCCGAGTTCAGGTGAACCGCCGAAAAGCGGCGATCGCTCAGGCTATAGTAGCCCGGGCAGCCCGGGGACCCCGGGCAGCCGTTCACGTACCCCGAGCCTGCCGACCCCGCCGACTCGTGAACCGAAAAAAGTCGCCGTGGTACGCACCCCGCCGAAAAGCCCGTCGTCGGCGAAAAGCCGCCTGCAGACCGCGCCGGTTCCGATGCCGGATCTGAAAAATGTGAAAAGCAAAATTGGCTCTACCGAAAACCTGAAACACCAGCCGGGAGGCGGCAAAGTGCAAATCATTAATAAAAAACTGGATCTGTCAAACGTGCAATCAAAATGCGGTTCGAAAGATAACATTAAACATGTTCCGGGTGGCGGCTCGGTGCAGATTGTGTATAAACCCGTGGATCTGAGCAAAGTTACCTCGAAGTGTGGATCTCTGGGCAATATCCATCATAAACCGGGCGGCGGCCAGGTTGAAGTTAAATCTGAAAAACTGGATTTTAAAGATCGCGT
GCAGAGCAAAATTGGCAGCCTGGATAATATCACCCATGTGCCGGGCGGCGGCAACAAAAAAATTGAAACCCATAAACTGACCTTTCGCGAAAATGCCAAAGCGAAAACCGATCACGGTGCGGAAATTGTTTATAAAAGCCCGGTTGTTAGCGGTGATACGAGCCCGCGTCATCTGTCGAACGTTAGCTCAACCGGTAGCATTGATATGGTGGATAGCCCGCAACTGGCGACGCTGGCCGATGAAGTGTCGGCGTCCCTGGCGAAACAGGGTCTG

Reference: https://en.vectorbuilder.com/tool/codon-optimization/6781bd12-7b93-4071-a97a-e6f9b8f17287.html
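Two quick sanity checks can be run on an optimized coding sequence: its GC content, and whether it still looks like an intact open reading frame of the expected length. The sketch below uses hypothetical helper names, and the example string is only the first 18 nucleotides of the sequence above; pasting in the full sequence would let the reported GC value be compared directly.

```python
def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def looks_like_cds(seq, protein_length):
    """Check for an ATG start and one codon per residue (no stop codon expected)."""
    seq = seq.upper()
    return seq.startswith("ATG") and len(seq) == 3 * protein_length


# First 18 nt of the optimized sequence above, encoding 6 residues.
fragment = "ATGGCGGAACCGCGCCAG"
print(round(gc_content(fragment), 2), looks_like_cds(fragment, 6))
```

For the full 758-residue tau sequence, the same check expects 3 × 758 = 2274 nucleotides.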

Part 4: Prepare a Twist DNA Synthesis Order

  1. Create a Twist account and a Benchling account [DONE]
  2. Build Your DNA Insert Sequence

Sharing link in Benchling: https://benchling.com/s/seq-LWrWyNWwQivMCCt0o4Ra?m=slm-qbMaWSkeMRm5NZ7JzEJw

  • SBOL Image
[Images: SBOL, my_first_plasmid]

Part 5: DNA Read/Write/Edit

DNA Read
  1. What DNA would you want to sequence (e.g., read) and why?
  • The gene I’m interested in is the APP (Amyloid Beta Precursor Protein) gene, which is involved in Alzheimer’s disease. I chose this gene because I’m interested in using synthetic biology to understand neurodegenerative disorders, especially Alzheimer’s disease.
  2. In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
  • I chose PacBio sequencing, a third-generation sequencing technology that can produce long and highly accurate DNA reads. It is based on the single-molecule real-time (SMRT) sequencing principle.
  • To prepare the input, the DNA is made into a SMRTbell library by ligating hairpin adapters to both ends of the double-stranded DNA, forming a circular template. Primers and polymerases are added to this library, which is loaded onto the sequencing instrument that contains the SMRT Cell and its zero-mode waveguides (ZMWs). A single template DNA molecule is immobilized in each ZMW. As the polymerase adds fluorescently labeled nucleotides to the growing DNA strand, light is emitted. This light emission is measured in real time, and these signals are converted into nucleotide sequences.
  • The first step is library construction, which involves several steps to prepare DNA for sequencing:
    • DNA is cleaved into fragments of the desired size and undergoes end repair.
    • Then, adaptors with hairpin structures are ligated to both ends of the DNA fragments, which creates single-stranded circular structures called SMRTbell templates.
    • Finally, the templates are purified and loaded onto the PacBio sequencing instrument.
  • The output is fluorescent signals that are translated into base sequences; alignment and assembly are then conducted.
DNA Write
  1. What DNA would you want to synthesize (e.g., write) and why?
  • I would synthesize a genetic circuit that senses, for example, high levels of hyperphosphorylation in the brain.
  2. What technology or technologies would you use to perform this DNA synthesis and why?
  • I would order the genetic circuit from a commercial DNA synthesis provider such as Twist (see Part 4). Oxford Nanopore is a sequencing technology, so I would use it to verify the synthesized construct rather than to synthesize it.
DNA Edit
  1. What DNA would you want to edit and why?
  • I would want to edit a gene that has a disease-causing mutation.
  2. What technology or technologies would you use to perform these DNA edits and why?
  • I would use CRISPR-Cas to edit the gene. The main steps involve designing a gRNA that guides Cas9 to cut at the specific site.
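The gRNA design step can be illustrated by scanning a sequence for candidate target sites. The sketch below assumes the commonly used SpCas9 enzyme, which requires an NGG PAM immediately 3' of a 20-nt protospacer, and searches the forward strand only; the function name and the toy input are hypothetical, and real design tools also score off-target risk and efficiency.

```python
import re


def find_spcas9_targets(seq, protospacer_len=20):
    """Return (start, protospacer, PAM) for every forward-strand NGG site."""
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(
        rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))"
    )
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy input (hypothetical): 20 A's followed by a TGG PAM.
toy_target = "A" * 20 + "TGG"
for start, protospacer, pam in find_spcas9_targets(toy_target):
    print(start, protospacer, pam)
```

To cover both strands, the same function could be run on the reverse complement of the sequence.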

Week 4 HW: Protein Design Pt. 1

[Image: tauprotein]

Part A: Conceptual Questions

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

  3. Why are there only 20 natural amino acids? Ref: https://www.chemistryworld.com/features/why-are-there-20-amino-acids/3009378.article

  4. Can you make other non-natural amino acids? Design some new amino acids.

  5. Where did amino acids come from before enzymes that make them, and before life started?

  6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

  7. Can you discover additional helices in proteins?

  8. Why are most molecular helices right-handed?

  9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

  10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

  11. Design a β-sheet motif that forms a well-ordered structure.
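Question 1 is a unit-conversion exercise: 1 Da corresponds to 1 g/mol, so an average amino acid of about 100 Da weighs about 100 g per mole, and the number of molecules is moles times Avogadro's number. The sketch below is a rough estimate only. Treating all 500 g as amino acids gives an upper bound; the 20% protein fraction in the second call is an assumed, illustrative value and does not come from the course material.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def amino_acid_molecules(mass_g, protein_fraction=1.0, avg_residue_da=100.0):
    """Estimate the number of amino acid molecules in a piece of meat."""
    moles = mass_g * protein_fraction / avg_residue_da
    return moles * AVOGADRO


# Upper bound: every gram counted as amino acids.
print(f"{amino_acid_molecules(500):.2e}")
# Assumed protein fraction of 20% (illustrative value).
print(f"{amino_acid_molecules(500, protein_fraction=0.2):.2e}")
```

Either way the answer is on the order of 10^23 to 10^24 molecules.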

Part B: Protein Analysis and Visualization

  1. Briefly describe the protein you selected and why you selected it.
  • I selected tau protein. It is a microtubule-associated protein that promotes microtubule assembly and stability, and might be involved in the establishment and maintenance of neuronal polarity. In neurodegeneration, this protein becomes hyperphosphorylated, detaches from microtubules, and aggregates into toxic, insoluble neurofibrillary tangles (NFTs). Since I’m interested in using synthetic biology to better understand neurodegenerative disorders, this protein is of interest.

  • Another protein is amyloid-beta precursor protein (APP). The UniProt entry used below, P51693, is annotated as amyloid-like protein 1 (APLP1), a member of the APP family.

  • It may play a role in postsynaptic function. The C-terminal gamma-secretase processed fragment, ALID1, activates transcription through APBB1 (Fe65) binding. Couples to JIP signal transduction through C-terminal binding. May interact with cellular G-protein signaling pathways. Can regulate neurite outgrowth through binding to components of the extracellular matrix such as heparin and collagen I. The gamma-CTF peptide, C30, is a potent enhancer of neuronal apoptosis.

  2. Identify the amino acid sequence of your protein.

    Tau Protein

  • Sequence length: 758 amino acids

  • The most common amino acid is proline (P), which appears 93 times.

  • Amino acid frequencies: P: 93 (12.27%) G: 82 (10.82%) S: 79 (10.42%) K: 64 (8.44%) A: 60 (7.92%) E: 59 (7.78%) T: 50 (6.60%) D: 43 (5.67%) L: 43 (5.67%) V: 41 (5.41%) Q: 33 (4.35%) R: 30 (3.96%) H: 20 (2.64%) I: 20 (2.64%) N: 13 (1.72%) M: 9 (1.19%) F: 9 (1.19%) Y: 6 (0.79%) C: 4 (0.53%)

  • Homologs: there are 10 protein homologs.

  • Protein family: yes, it belongs to the microtubule-associated protein family.

    APP

  • Sequence length: 650 amino acids (FASTA file: https://rest.uniprot.org/uniprotkb/P51693.fasta)

  • The most common amino acid is leucine (L), which appears 70 times.

  • Amino acid frequencies: L: 70 E: 67 P: 61 R: 57 G: 51 A: 51 Q: 46 S: 43 V: 36 D: 26 T: 23 H: 22 M: 16 I: 16 F: 14 K: 14 C: 12 Y: 11 N: 10 W: 4

  • Homologs: 250 homologs (reference: https://www.uniprot.org/blast/uniprotkb/ncbiblast-R20260307-104740-0382-64643955-p1m/overview)

  • Protein family: it belongs to the APP family.
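The length and frequency figures above can be reproduced with `collections.Counter`. The sketch below is one possible way to do it; the function name is illustrative, and to keep the example short it runs on the 28-residue amyloid beta-peptide sequence quoted in Part C rather than on the full tau or APP FASTA files.

```python
from collections import Counter


def composition(seq):
    """Return (amino acid, count, percent) tuples, most frequent first."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return [(aa, n, 100.0 * n / total) for aa, n in counts.most_common()]


# Amyloid beta-peptide fragment (PDB 1AMC, chain A) quoted in Part C.
peptide = "DAEFRHDSGYEVHHQKLVFFAEDVGSNK"
print("Sequence length:", len(peptide))
for aa, n, pct in composition(peptide):
    print(f"{aa}: {n} ({pct:.2f}%)")
```

The percentages listed for tau follow the same formula, for example 93 / 758 × 100 ≈ 12.27% for proline.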

  3. Identify the structure page of your protein in RCSB

Tau Protein

  • The structure was solved/released on 2015-07-08.
  • It is a high-quality NMR structure.
[Image: structure]
  • No, it doesn’t belong to any structure classification family.

APP Protein

  • Good resolution: 2.60 Å

References:
https://www.uniprot.org/uniprotkb/P10636/entry
https://www.ebi.ac.uk/pdbe/scop/search?t=txt;q=tau%20protein

  4. Open the structure of your protein in any 3D molecule visualization software
  • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”

  • The protein visualized is APP (Amyloid-beta precursor protein)

  • Cartoon: [Image: cartoon]

  • Ribbon: [Image: ribbon]

  • Ball and Stick: [Image: ballandstick]

  • Color the protein by secondary structure. Does it have more helices or sheets?

  • Helices are colored cyan and sheets magenta. From the image, there are more helices than sheets.

[Image: APPsecondarystructure]

  • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
    • The hydrophobic residues (ALA, VAL, LEU, ILE, MET, PHE, TRP, PRO) are colored in yellow
    • The hydrophilic residues (SER, THR, ASN, GLN, TYR, CYS, LYS, ARG, HIS, ASP, GLU) are colored in blue
    • GLY (neutral) is colored in white
    • From the image, the hydrophilic residues are grouped together and outnumber the hydrophobic ones

[Images: 1, 2, 3]

  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

  • No, it doesn’t have any holes.

Part C: Using ML-Based Protein Design Tools

I chose the Amyloid Beta-Peptide protein

C1. Protein Language Modeling
Deep Mutational Scans
  • Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
  • Can you explain any particular pattern? (choose a residue and a mutation that stands out)

Fasta File

>1AMC_1|Chain A|AMYLOID BETA-PEPTIDE|Homo sapiens (9606)

DAEFRHDSGYEVHHQKLVFFAEDVGSNK

[Image: heatmap]

The heatmap shows hotspots for mutations that are likely beneficial or detrimental to the function of the protein. In the dark blue regions, the LLR (log-likelihood ratio) values are negative, indicating mutations that are likely detrimental to function; in the lighter yellow regions, the LLR values are positive, indicating mutations that are likely beneficial. Dark bands running vertically indicate positions that are likely evolutionarily conserved, and brighter vertical bands indicate positions where other residues may in fact be preferable to the wild-type residue. Dark bands running horizontally along most of the protein indicate amino acids that are likely detrimental when substituted at almost any position. Similarly, brighter yellow horizontal bands indicate amino acids that would be preferred over the wild type at almost any position.

for example replacing by is prefereable. mutations at as well are favourable
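The LLR values behind such a heatmap are differences of log-probabilities: for each position, the model's log-probability of a candidate amino acid minus its log-probability of the wild-type residue. The sketch below shows only that arithmetic in plain Python. It assumes the per-position logits over the 20 amino acids have already been obtained from a masked language model such as ESM2; the toy logits here are made up, and the function names are illustrative.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def llr_matrix(sequence, logits_per_position):
    """Rows are positions, columns follow AMINO_ACIDS; entry = log p(mut) - log p(wt)."""
    matrix = []
    for wild_type, logits in zip(sequence, logits_per_position):
        log_probs = log_softmax(logits)
        wt_log_prob = log_probs[AMINO_ACIDS.index(wild_type)]
        matrix.append([lp - wt_log_prob for lp in log_probs])
    return matrix


# Toy example: a 2-residue "protein" with made-up logits.
toy_sequence = "AC"
toy_logits = [[0.0] * 20, [0.0, 2.0] + [0.0] * 18]
for row in llr_matrix(toy_sequence, toy_logits):
    print([round(value, 2) for value in row])
```

The wild-type entry in every row is 0 by construction, negative entries correspond to the dark (likely detrimental) cells, and positive entries to the bright (likely preferred) cells.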

Latent Space Analysis
  • Use the provided sequence dataset to embed proteins in reduced dimensionality.
  • Analyze the different formed neighborhoods: do they approximate similar proteins?
  • Place your protein in the resulting map and explain its position and similarity to its neighbors.
[Image: latentspaceanalysis]
C2. Protein Folding
  • Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
[Image: APP-ESMFold]

No, the predicted coordinates do not match the original structure.

  • Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

Mutation 1: replacing position 22 (L) with K

[Image: mutation1]

Mutation 2: replacing positions 20 to 29 (LPLLLPLLLL) with NNNNNNNNNN

[Image: mutation2]
C3. Protein Generation
  1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
  2. Input this sequence into ESMFold and compare the predicted structure to your original.

Part D: Group Brainstorm on Bacteriophage Engineering

  • [x] Find a group of ~3–4 students
  • [ ] Read through the Phage Reading material listed under “Reading & Resources” below.

References

Colab link: https://colab.research.google.com/drive/16VrQUyOY0s-a7m07FFV2H-4UhvRD3eje?authuser=2#scrollTo=ySOWXRjTja9D
https://huggingface.co/blog/AmelieSchreiber/mutation-scoring

Week 5 HW: Protein Design Pt. 2

Part A: SOD1 Binder Peptide Design (From Pranam)

Pt 1: Generate Binders with PepMLM

  1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
  2. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:
  3. Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
  4. To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
  5. Record the perplexity scores that indicate PepMLM’s confidence in the binders.

human SOD1 sequence

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

A4V mutation

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
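Introducing a point mutation is a one-character string edit, but the numbering needs care: in the two sequences above the substituted alanine is the fifth character, which is consistent with A4V being numbered without the initiator methionine. The sketch below uses a hypothetical helper with a `numbering_offset` argument to make that shift explicit, and it checks the wild-type residue before substituting. Only the first 20 residues of the sequence are used to keep the example short.

```python
def apply_substitution(seq, mutation, numbering_offset=0):
    """Apply a substitution such as 'A4V'; offset shifts the 1-based position."""
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1 + numbering_offset
    if seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + new + seq[index + 1:]


# First 20 residues of human SOD1 as quoted above (includes the initiator Met).
sod1_start = "MATKAVCVLKGDGPVQGIIN"
# numbering_offset=1 because A4V is counted without the initiator methionine.
print(apply_substitution(sod1_start, "A4V", numbering_offset=1))
```

Without the offset, the check fails because residue 4 of the Met-containing sequence is K, not A, which guards against mutating the wrong position.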

Generated Peptides

Peptide        Perplexity   Type
WHYGAAGARLKE   10.803203    Generated peptide No. 1
WHYPAAVAEWGK   10.861847    Generated peptide No. 2
WRSPATAVAHKK   8.193301     Generated peptide No. 3
WLYYPAALEHGE   14.861894    Generated peptide No. 4
FLYRWLPSRRGG   not reported SOD1-binding peptide
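Perplexity is the exponential of the average negative log-likelihood per token, so a lower value means the model finds the peptide less surprising. The sketch below shows that formula and ranks the four generated peptides by the scores in the table. The function name is illustrative, and the exact way PepMLM computes its score (for example, which positions are masked) is not described on this page, so the formula should be read as the general definition rather than as PepMLM's implementation.

```python
import math


def pseudo_perplexity(token_log_probs):
    """exp of the mean negative log-probability over the peptide's tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Perplexity scores recorded in the table above.
perplexities = {
    "WHYGAAGARLKE": 10.803203,
    "WHYPAAVAEWGK": 10.861847,
    "WRSPATAVAHKK": 8.193301,
    "WLYYPAALEHGE": 14.861894,
}

# Lower perplexity first, i.e. the peptide the model is most confident about.
ranked = sorted(perplexities.items(), key=lambda item: item[1])
for peptide, score in ranked:
    print(f"{peptide}  {score:.2f}")
```

As a reference point, a 12-residue peptide whose residues each had probability 0.5 would have a perplexity of 2.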

Colab Link: https://colab.research.google.com/drive/1mFeOfeeTxAycc_tvqmw2YbZpVIvX6E3E?authuser=2#scrollTo=VtfbXYndhyle

Pt 2: Evaluate Binders with AlphaFold3

  1. Navigate to the AlphaFold Server: alphafoldserver.com
  2. For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
  3. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
  4. In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

ipTM measures the accuracy of the predicted relative positions of the subunits within the complex. Values higher than 0.8 represent confident, high-quality predictions, while values below 0.6 suggest a likely failed prediction. ipTM values between 0.6 and 0.8 are a gray zone where predictions could be correct or incorrect.
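The three ipTM bands described above translate directly into a small classifier. The sketch below applies it to the ipTM values reported for the five peptides in this section; the function name and the band labels are illustrative, and a value of exactly 0.6 is placed in the gray zone, following the wording "below 0.6" for failed predictions.

```python
def interpret_iptm(iptm):
    """Map an ipTM score to the confidence band described in the text."""
    if iptm > 0.8:
        return "confident"
    if iptm >= 0.6:
        return "gray zone"
    return "likely failed"


# ipTM values reported in this section for the mutant SOD1 complexes.
iptm_scores = {
    "WHYGAAGARLKE": 0.30,
    "WHYPAAVAEWGK": 0.28,
    "WRSPATAVAHKK": 0.60,
    "WLYYPAALEHGE": 0.32,
    "FLYRWLPSRRGG": 0.37,  # known SOD1-binding peptide (control)
}

for peptide, score in iptm_scores.items():
    print(f"{peptide}  ipTM={score:.2f}  {interpret_iptm(score)}")
```

Only WRSPATAVAHKK reaches the gray zone, and it is also the only generated peptide that exceeds the known binder's ipTM.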

WHYGAAGARLKE

ipTM = 0.3. Because the value is below 0.6, it suggests a likely failed prediction.

[Image: first peptide]

WHYPAAVAEWGK

ipTM = 0.28. Because the value is below 0.6, it suggests a likely failed prediction.

[Image: second peptide]

WRSPATAVAHKK

ipTM = 0.6. This is the highest value obtained; it sits at the lower edge of the 0.6 to 0.8 gray zone, so the prediction could be correct or incorrect.

[Image: third peptide]

WLYYPAALEHGE

ipTM = 0.32. Because the value is below 0.6, it suggests a likely failed prediction.

[Image: fourth peptide]

FLYRWLPSRRGG

ipTM = 0.37. Because the value is below 0.6, it suggests a likely failed prediction.

[Image: control]

Link: https://alphafoldserver.com/fold/44913f6ed245c97c
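
The ipTM thresholds described above can be written as a small helper. This is a sketch: the function name `classify_iptm` is hypothetical, and treating exactly 0.6 as part of the gray zone follows the interpretation used for WRSPATAVAHKK above.

```python
def classify_iptm(iptm):
    """Map an ipTM score to the three confidence bands described in the text."""
    if iptm > 0.8:
        return "confident"
    if iptm >= 0.6:
        return "gray zone"
    return "likely failed"


# ipTM values recorded above for each peptide in complex with A4V SOD1.
results = {
    "WHYGAAGARLKE": 0.30,
    "WHYPAAVAEWGK": 0.28,
    "WRSPATAVAHKK": 0.60,
    "WLYYPAALEHGE": 0.32,
    "FLYRWLPSRRGG": 0.37,  # known binder (control)
}

for peptide, iptm in results.items():
    print(peptide, iptm, classify_iptm(iptm))
```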

Pt 3: Evaluate Properties of Generated Peptides in the PeptiVerse

  1. Paste the peptide sequence.
  2. Paste the A4V mutant SOD1 sequence in the target field.
  3. Check the boxes
    • Predicted binding affinity
    • Solubility
    • Hemolysis probability
    • Net charge (pH 7)
    • Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.

WHYGAAGARLKE

Property | Prediction | Value | Unit
Solubility | Soluble | 1.000 | Probability
Hemolysis | Non-hemolytic | 0.025 | Probability
Binding Affinity | Weak binding | 5.546 | pKd/pKi
Molecular weight | | 1358.5 | Da
Net Charge (pH 7) | | 0.85 |
ipTM | | 0.3 |

WHYPAAVAEWGK

Property | Prediction | Value | Unit
Solubility | Soluble | 1.000 | Probability
Hemolysis | Non-hemolytic | 0.023 | Probability
Binding Affinity | Weak binding | 5.037 | pKd/pKi
Molecular weight | | 1414.6 | Da
Net Charge (pH 7) | | -0.15 |
ipTM | | 0.28 |

WRSPATAVAHKK

Property | Prediction | Value | Unit
Solubility | Soluble | 1.000 | Probability
Hemolysis | Non-hemolytic | 0.011 | Probability
Binding Affinity | Weak binding | 4.520 | pKd/pKi
Molecular weight | | 1351.6 | Da
Net Charge (pH 7) | | 2.85 |
ipTM | | 0.6 |

WLYYPAALEHGE

Property | Prediction | Value | Unit
Solubility | Soluble | 1.000 | Probability
Hemolysis | Non-hemolytic | 0.037 | Probability
Binding Affinity | Weak binding | 5.588 | pKd/pKi
Molecular weight | | 1448.6 | Da
Net Charge (pH 7) | | -2.14 |
ipTM | | 0.32 |

FLYRWLPSRRGG

Property | Prediction | Value | Unit
Solubility | Soluble | 1.000 | Probability
Hemolysis | Non-hemolytic | 0.047 | Probability
Binding Affinity | Weak binding | 5.968 | pKd/pKi
Molecular weight | | 1507.7 | Da
Net Charge (pH 7) | | 2.76 |
ipTM | | 0.37 |
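
To compare the PeptiVerse affinities with the AlphaFold3 ipTM scores, the values in the tables above can be put side by side. The sketch below converts pKd to an approximate Kd (Kd = 10^(-pKd) M) and computes a Pearson correlation. With only five peptides the correlation is indicative at best, and the helper names are hypothetical.

```python
import math

# (peptide, ipTM, predicted pKd/pKi) taken from the tables above.
peptides = [
    ("WHYGAAGARLKE", 0.30, 5.546),
    ("WHYPAAVAEWGK", 0.28, 5.037),
    ("WRSPATAVAHKK", 0.60, 4.520),
    ("WLYYPAALEHGE", 0.32, 5.588),
    ("FLYRWLPSRRGG", 0.37, 5.968),
]


def kd_micromolar(pkd):
    """Convert pKd to a dissociation constant in micromolar."""
    return 10 ** (-pkd) * 1e6


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


for name, iptm, pkd in peptides:
    print(f"{name}  ipTM={iptm:.2f}  pKd={pkd:.3f}  Kd~{kd_micromolar(pkd):.1f} uM")

r = pearson([p[1] for p in peptides], [p[2] for p in peptides])
print("correlation is negative:", r < 0)
```

For this data set the correlation is negative: the peptide with the highest ipTM (WRSPATAVAHKK) has the weakest predicted affinity, while the known binder FLYRWLPSRRGG has the strongest.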

Link: https://huggingface.co/spaces/ChatterjeeLab/PeptiVerse

Pt 4: Generate Optimized Peptides with moPPIt

[Image: first peptide generated with moPPIt]

  1. Open the moPPit Colab linked from the HuggingFace moPPIt model card
  2. Make a copy and switch to a GPU runtime.
  3. In the notebook:
    • Paste your A4V mutant SOD1 sequence.
    • Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
    • Set peptide length to 12 amino acids.
    • Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
  4. After generation, briefly describe how these moPPit peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?
Generated Peptide | Hemolysis | Solubility | Affinity | Motif
HMCVNYQKKTKN | 0.9812432918697596 | 0.8333333134651184 | 6.3176960945129395 | 0.7024670839309692
STDTCTGRFKQK | 0.9649077206850052 | 0.9166666865348816 | 5.785930633544922 | 0.8285709619522095
KKKTYSKKGDFY | 0.9740752298384905 | 0.9166666865348816 | 5.8559699058532715 | 0.56052565574646

Link: https://colab.research.google.com/drive/1Ie8j4XEG3AOVj37FhpHVrb8fNjl1bMrv?authuser=2#scrollTo=hpBXJwHg4ZRz

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

  • Pt 0: Sign-up to Boltz Lab
  • Pt 1: Structural Predictions in the Sandbox
Compound | Binding Confidence | Optimization Score | Structure Confidence
Hit | 0.45 | 0.23 | 0.98
Lead | 0.75 | 0.26 | 0.98
JQ1 | 0.96 | 0.44 | 0.99

Discussion Questions

  • Does Binding Confidence increase as you move from hit to clinical candidate? What would you expect, and why might it deviate? Binding Confidence, which reflects how confidently the ligand is placed in the binding site, rises from the hit (0.45) through the lead (0.75) to JQ1 (0.96).

  • Inspect the predicted binding pose for JQ1. Can you identify potential key binding interactions.

  • Compare the Optimization Scores. How do the scores compare for JQ1 vs the Lead? The Optimization Score is 0.44 for JQ1 and 0.26 for the Lead, indicating tighter predicted binding for JQ1.

  • Pt 2: Setting Up a BRD4 Design Project

The predicted structure from Boltz vs. the structure from RCSB:

Boltz

[Image: Boltz structure]

RCSB

[Image: RCSB structure]

Pocket Structure Prediction

[Image: pocket prediction]
  • Pt 3: Running Your Virtual Screen

Generating 1000 binders

[Image: generative design]
  • Pt 4: Analysis and Discussion

High-confidence binders: only one

[Image: high-confidence binder]

[Image: high]

Moderate-confidence binders: 23

[Image: moderate]

Low-confidence binders / non-binders: 621

[Image: low]

Links:

  1. https://lab.boltz.bio/app/nour-abdelrahman-htgaa-Uz4g/p/brd-4-workshop-f6Wt/experiments/generative-binders-k84A/virtual-screens/fa4123a5-cc15-4cc4-8b5c-287349f74144/overview
  2. https://www.rcsb.org/structure/3MXF

Week 6 HW: Genetic Circuits Pt.1

DNA Assembly

  1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

The components of the Phusion High-Fidelity PCR Master Mix are the following:

  • Phusion DNA Polymerase: incorporates nucleotides to synthesize new DNA strands from the primers. It is a hot-start, proofreading PCR enzyme that generates PCR amplicons with high sequence accuracy, sensitivity, and specificity. It is thermostable, possesses 5´→3´ polymerase activity and 3´→5´ exonuclease (proofreading) activity, and generates blunt-ended products.
  • Nucleotides (dNTPs): building blocks for new DNA strands during amplification.
  • Buffer: provides the optimal pH, ionic strength, and Mg²⁺ concentration (1.5 mM final) required for Phusion DNA polymerase to bind primers, extend DNA efficiently, and maintain its high fidelity.
  1. What are some factors that determine primer annealing temperature during PCR?
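  • The annealing temperature depends on the length and sequence of the primers.
  • It also depends on the melting temperature (Tm) of the primers, and therefore on their GC content. A common rule of thumb for short primers is Tm = 4(G + C) + 2(A + T) °C, where each letter is the count of that base in the primer.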
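
The Tm rule of thumb above is easy to compute. This is a minimal sketch: the function names are hypothetical, the example primer is made up for illustration, and the rule is only a rough estimate best suited to short primers.

```python
def wallace_tm(primer):
    """Estimate Tm in degrees C with the rule Tm = 4(G + C) + 2(A + T)."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    if gc + at != len(primer):
        raise ValueError("Primer must contain only A, C, G, T")
    return 4 * gc + 2 * at


def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


example = "ATGCGTACGTTAGCCTAGGA"  # hypothetical 20-nt primer
print(wallace_tm(example), gc_content(example))  # 60 0.5
```

A primer with more G and C bases gets a higher estimated Tm, which is why GC content shifts the annealing temperature.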
  1. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

Restriction Enzyme Digest

  • It is a process in which DNA is cut at specific sites, dictated by the surrounding DNA sequence.
  • It is accomplished by incubating the target DNA with restriction enzymes, which recognize and bind specific DNA sequences and cleave at specific nucleotides either within or outside the recognition sequence.
  • Restriction digestion can produce blunt ends (ends of a DNA molecule that end with a base pair) or sticky ends (ends with a single-stranded overhang).
  • Restriction digestion is usually used to prepare a DNA fragment for subsequent molecular cloning.
  • The results of a restriction digestion can be evaluated by gel electrophoresis, in which the products of the digestion are separated by length.
  • The components of a typical restriction digestion reaction include the DNA template, the restriction enzyme of choice, a buffer, and sometimes BSA protein. The reaction is incubated at the temperature required for optimal activity of the restriction enzyme and terminated by heat.
    1. Protocol in brief: mix the reaction, then incubate at the required temperature for the required time.
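
As an illustration of how a digest yields fragments that a gel separates by length, here is a small sketch for a linear molecule. It looks at the top strand only, the function name and the test sequence are hypothetical, and EcoRI (recognition site GAATTC, cutting after the first G) is used as the example enzyme.

```python
def digest_linear(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases into the site where the top strand
    is cut (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Hypothetical 26-bp sequence containing two EcoRI sites.
dna = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
print(digest_linear(dna, "GAATTC", 1))  # [5, 14, 7]
```

The fragment lengths always sum to the length of the input, which is a quick sanity check when reading a gel.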

PCR

  • PCR is a method for amplifying DNA. Millions of copies of a DNA sequence can be generated from a single copy or just a few copies of DNA.
  • PCR protocols consist of assembling a PCR reaction mix containing Taq polymerase (a thermostable DNA polymerase that can withstand the high temperatures required for thermal cycling), primers (short DNA sequences that define the target region for amplification), deoxynucleotide triphosphates (dNTPs, the building blocks of DNA) and MgCl2 (Taq polymerase co-factor) in a buffered solution. The reaction mixture undergoes three basic thermal cycling steps: (1) denaturation (usually at 95°C), (2) annealing (usually 5°C below the lowest primer melting temperature), and (3) extension (usually at 72°C).
    1. Protocol in brief: mix the components; denature the double-stranded DNA into single strands; anneal the primers; then extend, where the DNA polymerase copies the template. Repeating the cycle creates millions of copies of the required DNA fragment.
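
Two pieces of arithmetic in the PCR description can be sketched in code: the annealing temperature (lowest primer Tm minus 5°C) and the idealized growth in copy number. The function names are hypothetical, and perfect doubling every cycle is an idealization that real reactions only approach.

```python
def annealing_temperature(primer_tms, margin=5.0):
    """Annealing temperature: the lowest primer Tm minus a safety margin (in C)."""
    return min(primer_tms) - margin


def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealized copy number after a number of cycles.

    efficiency=1.0 means every template is copied in every cycle.
    """
    return start_copies * (1 + efficiency) ** cycles


print(annealing_temperature([60, 64]))  # 55.0
print(copies_after(30))                 # 1073741824.0, about a billion copies
```

Even with lower efficiency, exponential growth explains how a single template can give millions of copies within a few dozen cycles.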
  1. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
  • After generating DNA fragments by PCR, run an agarose gel to check their size and yield.
  1. How does the plasmid DNA enter the E. coli cells during transformation?

    • Heat shock transformation is also known as chemical transformation and calcium chloride transformation. This method involves subjecting the cells to a sudden increase in temperature, often achieved by briefly immersing them in hot water or placing them in a heating block, followed by a rapid decrease in temperature through incubation on ice. The heat shock causes the cell membrane to become more permeable, facilitating the uptake of exogenous DNA.

    • Plasmid uptake by chemically competent cells is facilitated by heat shock, and plasmid uptake by electrocompetent cells is facilitated by electroporation.

      [Image: transformation]
    • Electroporation transformation

  • Bacterial transformation aided by electroporation is called electroporation transformation. Electroporation uses an electroporator to subject competent cells and the plasmid carrying the DNA construct to a brief pulse of a high-voltage electric field. This treatment induces transient pores in the cell membranes, which permit plasmid entry into the cells.

  • One of the main issues with electroporation is arcing, or electric discharge, which may lower cell viability and transformation efficiency. Arcing often results from electroporation in conductive buffers, such as those containing MgCl2 and phosphates.

    • In summary, the plasmid enters through transient pores in the bacterial membrane, which can be created by either heat shock or electroporation.
    [Image: electroporation]
  1. Describe another assembly method in detail (such as Golden Gate Assembly)

References

  1. https://www.neb.com/en/products/m0531-phusion-high-fidelity-pcr-master-mix-with-hf-buffer?srsltid=AfmBOopKylwQ43HL-LppGa9GH2B6iMXIrXuwRITgpYMEq3PQqljrIYko
  2. https://www.thermofisher.com/eg/en/home/life-science/cloning/cloning-learning-center/invitrogen-school-of-molecular-biology/pcr-education/pcr-reagents-enzymes/pcr-cycling-considerations.html
  3. https://www.genscript.com/what-is-restriction-digestion.html
  4. https://www.genscript.com/what-is-pcr.html
  5. https://www.addgene.org/protocols/gibson-assembly/
  6. https://www.thermofisher.com/eg/en/home/life-science/cloning/cloning-learning-center/invitrogen-school-of-molecular-biology/molecular-cloning/transformation/bacterial-transformation-workflow.html

Asimov Kernel

  1. Create a Repository for your work
  2. Create a blank Notebook entry to document the homework and save it to that Repository
  3. Explore the devices in the Bacterial Demos Repo to understand how the parts work together by running the Simulator on various examples, following the instructions for the simulator found in the “Info” panel (click the “i” icon on the right to open the Info panel)
  4. Create a blank Construct and save it to your Repository
    1. Recreate the Repressilator in that empty Construct by using parts from the Characterized Bacterial Parts repository
    2. Search the parts using the Search function in the right menu
    3. Drag and drop the parts into the Construct
    4. Confirm it works as expected by running the Simulator (“play” button) and compare your results with the Repressilator Construct found in the Bacterial Demos repository
  5. Document all of this work in your Notebook entry - you can copy the glyph image and the simulator graphs, and paste them into your Notebook
  6. Build three of your own Constructs using the parts in the Characterized Bacterials Parts Repo
    1. Explain in the Notebook Entry how you think each of the Constructs should function
    2. Run the simulator and share your results in the Notebook Entry
    3. If the results don’t match your expectations, speculate on why and see if you can adjust the simulator settings to get the expected outcome

Week 7 HW: Genetic Circuits Pt.2

Part 1: Intracellular Artificial Neural Networks (IANNs)

  1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
  • The main advantages IANNs hold over traditional genetic circuits are scalability and the ability to support multilayer networks for complex decision-making. Traditional genetic circuits suffer from poor predictability and struggle to reliably program multiple functions simultaneously because of inherent scalability limitations. ANNs, by contrast, offer good predictability and improved robustness for complex designs. Because of their multiple layers and non-linear activations, neural networks can model complex, non-linear decision boundaries.
  • Traditional genetic circuits have input/output behaviors that function as Boolean operations. They process discrete signals (ON/OFF, high/low expression) through logic gates like AND, OR, and NOT, producing binary outputs based on truth tables. In contrast, the output layer of an ANN, which produces the final prediction, may be binary, multi-class, or a continuous value.
  1. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
  • Application of CNNs: tumor and MSI detection in gastrointestinal cancer
    • Convolutional Neural Networks (CNNs) are deep learning models designed to analyze structured grid-like data such as images.
    • The CNNs were used as an automatic tumor detector and to predict MSI (microsatellite instability), which determines whether a patient with gastrointestinal cancer will respond well to immunotherapy. The authors used hematoxylin and eosin (H&E)-stained histology slides as input.
    • For tumor detection in gastrointestinal cancer, the authors trained a convolutional neural network with deep residual learning (a resnet18 model) to classify tumor versus normal tissue by transfer learning. Transfer learning means reusing a pre-trained neural network model on a new but related task instead of training from scratch. For MSI detection, they trained another resnet18 model for each tumor type.
  • input/output behavior
    • Input: Tiles extracted from digitized histology slides.
    • Output: For each tile, a probability score indicating tumor vs. normal or MSI vs. MSS status.
    • Behavior: The neural network processes image features within each tile to generate these probability scores, enabling localized tissue characterization and subsequent patient-level molecular classification.
  • The limitations of the CNN mentioned were:
    • Classification ability is limited to the cancer types and ethnicities in the training set. Larger training cohorts are therefore needed to boost classification performance, so that rare morphological variants can be learned by the network.
    • The required tissue size. To define its lower limit, the authors generated ‘virtual biopsies’ and found that performance plateaued at approximately 100 tiles of 256 μm edge length, suggesting that biopsies are sufficient for MSI prediction.
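
The tile-to-patient step described in the input/output behavior can be sketched as follows. This is an assumption-laden illustration: averaging tile probabilities and thresholding at 0.5 is one simple pooling choice and is not claimed to be the exact aggregation used in the cited study; the function names and example numbers are hypothetical.

```python
def patient_score(tile_probabilities):
    """Pool per-tile MSI probabilities into one patient-level score (mean pooling)."""
    if not tile_probabilities:
        raise ValueError("At least one tile is required")
    return sum(tile_probabilities) / len(tile_probabilities)


def patient_call(tile_probabilities, threshold=0.5):
    """Label a patient MSI or MSS from the pooled tile score."""
    return "MSI" if patient_score(tile_probabilities) >= threshold else "MSS"


# Hypothetical tile-level outputs for two patients.
print(patient_call([0.9, 0.8, 0.4]))  # MSI
print(patient_call([0.1, 0.3, 0.2]))  # MSS
```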
  1. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.
[Image: perceptron diagram]
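
A single-layer perceptron like the one described can be sketched numerically. Everything here is an assumption for illustration: the weights and bias are made-up values, and the negative weight on X1 assumes that Csy4 represses the output by cleaving its mRNA, while X2 (the output gene dosage) contributes positively.

```python
import math


def sigmoid(z):
    """Smooth activation squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron(x1, x2, w1=-4.0, w2=4.0, bias=-2.0):
    """Weighted sum of the two DNA inputs passed through a sigmoid.

    x1: relative amount of Csy4-encoding DNA (assumed repressive, w1 < 0)
    x2: relative amount of output-encoding DNA (activating, w2 > 0)
    """
    return sigmoid(w1 * x1 + w2 * x2 + bias)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(perceptron(x1, x2), 3))
```

With these illustrative weights the output is high only when the output DNA is present and the Csy4 DNA is absent, and intermediate input levels give graded rather than purely Boolean outputs.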

References

  1. https://www.geeksforgeeks.org/deep-learning/what-is-perceptron-the-simplest-artificial-neural-network/
  2. https://www.sciencedirect.com/science/article/pii/S0303264724000492?ref=pdf_download&fr=RR-2&rr=9e292d67be62edc7
  3. https://www.geeksforgeeks.org/deep-learning/convolutional-neural-networks-cnns-in-r/
  4. https://pmc.ncbi.nlm.nih.gov/articles/PMC7423299/

Part 2: Fungal Materials

  1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
  • Rigid fungal composites

    • They are created by combining fungi with lignocellulosic fibers or particles, producing materials whose properties depend chiefly on the finishing method (e.g., hot/cold pressing), followed by the substrate used, as well as the fungal species and strain (particularly the growth behavior and hyphal type), the substrate's nutritional profile, and the growth conditions.
    • Disadvantages: their mechanical strength and moisture uptake limit their use primarily to non-weight-bearing applications, such as interior panels and acoustic absorption
    • Advantages: biodegradable and have demonstrated potential in architectural designs
    • Examples
      • Mycotectural Alpha (2009): Utilized G. lucidum-bound sawdust for its construction.
        [Image: Mycotectural Alpha]
      • Hy-Fi (2014): A cluster of circular towers made from mycelium-based bricks.
        [Image: Hy-Fi]
      • MycoTree (2017): Featured mycelium-bound composite blocks in its installation.
        [Image: MycoTree]
      • Growing Pavilion (2020): Incorporated Ganoderma lingzhi mycelium composite panels mounted on wooden frames.
        [Image: Growing Pavilion]
      • My-Co Space (2021): Showcased elements of hemp-grown F. fomentarius on a supporting structure.
        [Image: My-Co Space]
  • Flexible Fungal Materials
    • Flexible fungal materials have diverse applications, including fungal wound dressings (e.g., F. fomentarius), medical cell scaffolds, paper-like materials, fungal chitin nanomaterials, filters for water treatment, and meat analogs.
    • Disadvantages: limited availability and fragility
    • Advantages: sustainable, biodegradable, and customizable, and their properties depend on the fungal strain, substrate, growth regime, and post-processing techniques (e.g., drying, pigmenting, plasticizing) for enhancing (microbiological) robustness and appearance. Mycelium-based foams and leather alternatives, made from agricultural waste, are cruelty-free and more eco-friendly than traditional materials, as they generate less pollution and use less water. They are lightweight, offer good thermal and acoustic insulation, and require fewer resources to produce
  1. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
  • Filamentous fungi are considered as unique cell factories for protein production due to the high efficiency of protein secretion and superior capability of post-translational modifications. Therefore, they can be engineered to secrete proteins with higher efficiency.

  • Genetically engineered fungi have diverse applications across food, industry, medicine, and agriculture due to their eukaryotic biology and secretion capabilities.

  • Applications:

    • Food production: high-protein, meat-like alternatives with enhanced nutrition.
    • Industrial enzymes: fungi serve as cell factories for secreted enzymes such as glucoamylase or cellulases, used in biofuels, detergents, and food processing.
    • Pharmaceuticals: fungi produce secondary metabolites (antibiotics, anticancer drugs).
  • Both bacteria and fungi have their unique properties in synthetic biology. Synthetic biology in fungi offers key advantages over bacteria, particularly for complex eukaryotic pathways, due to their eukaryotic machinery and natural industrial traits.

    • Bacteria are prokaryotes with a simple cell wall and a fast growth rate. They are versatile and easily genetically manipulated, and they are extensively used in the production of antibiotics, enzymes, and biofuels.

    • In agriculture, bacteria serve as biofertilizers and biopesticides, enhancing soil fertility and protecting crops from pests and diseases.

    • In medicine, bacteria are harnessed to produce therapeutic proteins and vaccines, and they are central to the development of new antibiotics

    • Fungi are eukaryotes with a thick chitinous cell wall and a slower growth rate. They have long been utilized in biotechnology for their ability to produce a wide range of metabolites, including antibiotics, enzymes, and organic acids. Fungi grow on cheap, complex substrates like lignocellulose or waste, reducing costs compared to bacteria’s need for purified sugars.

    • They have applications in many sectors, for example:

    • Food industry: production of bread, beer, and cheese.

    • Agriculture: they improve agricultural crop yield and quality by enhancing plant physiology and stress tolerance.

    • Environmental sustainability: they play a significant role by decomposing organic matter, thus recycling nutrients in ecosystems.

    • Medicine: they are sources of important pharmaceuticals, such as penicillin, and are being explored for their potential in developing new drugs.

References:

  1. https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2020.00293/full
  2. https://www.mdpi.com/2673-8856/4/4/30
  3. https://pmc.ncbi.nlm.nih.gov/articles/PMC12565570/#abstract1

Week 9 HW: Cell Free Systems

Part A: General and Lecturer-Specific Questions

General homework questions

  1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
  • The main advantages of cell-free protein synthesis (CFPS) over traditional in vivo methods include

    • Cell-free systems do not need time-consuming cloning steps
    • Easy manipulation of reaction conditions
    • High-throughput potential
    • Synthesis of difficult-to-express proteins, such as toxic and transmembrane proteins. In addition, the absence of the cellular membrane allows the synthesis of modified proteins with statistically as well as site-specifically embedded non-canonical amino acids.
    • CFPS is easily adaptable to the translational requirements of a particular target protein, and the synthesis conditions can be adjusted for a desired subsequent analytical setup.
    • Novel automated high-throughput systems are being developed due to the simple handling of liquids and the easy scalability of cell-free reactions.
    • By removing the cell membranes and redundant parts of cells, CFPS provides flexibility in directly dissecting and manipulating the Central Dogma with rapid feedback. Non-native chemicals can be introduced directly into the system, allowing greater flexibility in the selection of regulating reagents.
    • The open nature of CFPS enables the programming of modular, cell-mimicking processes with active transcription and translation support.
    • Ease of use, rapid protein production, and minimal requirements for lab space, equipment, and expertise compared to traditional methods.
    • Flexibility: The cell-free expression system can use various template DNAs, including PCR products, plasmid DNA, and synthetic DNA, making it suitable for expressing different types of proteins.
  • Two cases where cell-free expression is more beneficial than cell-based production:

    • cell-free protein expression lets researchers incorporate unnatural labels or amino acids into targets of interest, as well as express toxic proteins
    • The accessibility of cell-free reactions enables optimization impossible in cells. Researchers can directly adjust pH, ionic strength, redox potential, metal ion concentrations, or temperature without considering cellular viability. Specific folding catalysts, chaperones, or cofactors can be added at precise concentrations. For disulfide-bonded proteins, the oxidation-reduction balance can be fine-tuned by adding specific ratios of reduced and oxidized glutathione. For metalloproteins, appropriate metal ions can be supplemented. This level of control over the biochemical environment enables optimization of yield and proper folding for challenging targets that fail in standard cellular environments.
  1. Describe the main components of a cell-free expression system and explain the role of each component.

There are three fundamental components:
  • Cell-Free Extract: This is the heart of the system, containing the essential cellular components for protein synthesis, such as ribosomes, tRNA, amino acids, and enzymes. The source of the extract can vary, with commonly used ones including E. coli, rabbit reticulocyte, and wheat germ extracts.
  • DNA Template: Researchers provide the genetic information for the desired protein in the form of a DNA template. This template typically contains a promoter sequence to initiate transcription and a coding sequence for the target protein.
  • Energy and Cofactors: Energy sources (e.g., ATP, GTP) and cofactors (e.g., magnesium ions) are supplied to facilitate transcription and translation processes.
  1. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Energy regeneration is critical in cell-free systems because protein synthesis is ATP-intensive, and there is no living metabolism in the reaction mix to continuously replenish ATP as a cell would. Without regeneration, ATP is depleted quickly, protein synthesis slows or stops, and yield drops; stable energy supply is also a major determinant of reaction duration and cost.

Current energy module engineering solutions:

ATP regeneration systems for CFPS

  • Phosphate-donor systems: phosphoenolpyruvate (PEP) with pyruvate kinase, creatine phosphate with creatine kinase, or acetyl phosphate with acetate kinase. These systems help maintain energy levels throughout the protein synthesis process, significantly improving the yield and duration of the reaction.
    • Enzymes such as creatine kinase or acetate kinase regenerate ATP from ADP using high-energy phosphate donors.
    • Secondary energy sources such as phosphoenolpyruvate, creatine phosphate, and acetyl phosphate can be incorporated into cell-free protein synthesis systems to enhance energy availability. These compounds serve as phosphate donors in enzymatic reactions that regenerate ATP.
    • Continuous-exchange cell-free protein synthesis systems: these systems continuously supply energy substrates and remove inhibitory byproducts during the reaction. They use specialized reaction chambers with semi-permeable membranes that allow small molecules to diffuse while retaining larger components like ribosomes and enzymes. This approach significantly extends reaction lifetimes and increases protein yields by preventing the energy depletion and byproduct accumulation that typically limit batch reactions.
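
Why regeneration extends the reaction can be illustrated with a toy model. This is a deliberately crude sketch, not a kinetic model of any real CFPS system: all quantities are in arbitrary units, the rates are made up, ADP is not tracked, and the function name is hypothetical.

```python
def reaction_lifetime(atp, donor=0.0, consumption=1.0, regeneration=0.0,
                      dt=1.0, max_steps=10_000):
    """Time until ATP runs out in a toy batch reaction.

    atp: starting ATP pool; donor: phosphate-donor pool (e.g. creatine phosphate)
    consumption: ATP used per unit time by protein synthesis
    regeneration: ATP regenerated per unit time while donor remains
    """
    steps = 0
    while atp > 0 and steps < max_steps:
        regenerated = min(regeneration * dt, donor)  # limited by remaining donor
        donor -= regenerated
        atp += regenerated - consumption * dt
        steps += 1
    return steps * dt


without_regen = reaction_lifetime(atp=10.0)
with_regen = reaction_lifetime(atp=10.0, donor=30.0, regeneration=0.8)
print(without_regen, with_regen)
```

In this toy setup the reaction with a phosphate donor runs several times longer than the one without, mirroring the qualitative point that regeneration delays ATP depletion until the donor itself is used up.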

Secondary energy sources and cofactors

Beyond primary ATP regeneration, cell-free protein synthesis energy modules incorporate secondary energy sources and essential cofactors. These include NAD+/NADH, NADP+/NADPH, and GTP, which support various biochemical reactions during protein synthesis. Optimized ratios of these cofactors are critical for maintaining redox balance and ensuring efficient translation. Some systems also utilize glucose or maltose with appropriate enzymes to create a continuous energy supply pathway, enhancing the overall efficiency and productivity of the cell-free system.

Engineered extracts for improved energy efficiency

Specially engineered cell extracts can significantly improve the energy efficiency of cell-free protein synthesis systems. These extracts are often derived from modified organisms with enhanced metabolic pathways or reduced energy-consuming side reactions. By eliminating competing pathways that deplete energy resources and optimizing the concentration of key enzymes involved in energy metabolism, these engineered extracts can sustain protein synthesis for longer periods with higher yields. Some approaches include genetic modifications to reduce phosphatase activity or enhance glycolytic flux.

  1. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic cell-free expression systems

The E. coli-based CFPS system sets the standard for protein synthesis at scale. Its core advantage lies in the simplicity and metabolic robustness of the prokaryotic machinery, allowing for high yields in batch and continuous-flow reactions. This platform is the leading choice for low-cost, high-throughput synthesis in projects where functional folding (e.g., disulfide bonds or glycosylation) is not a critical factor.

  • Scale Advantage: Capable of producing up to 2 mg/mL of protein, dramatically reducing the cost per gram.
  • Speed Metric: Protein production can be completed within 2–4 hours, allowing for parallel synthesis of hundreds of constructs in a single day.
  • Modification Niche: Highly adaptable for specialized labeling, such as efficient incorporation of non-natural amino acids, due to easy depletion of natural amino acids in the lysate.
  • Limitation: Lacks the machinery (PDI, chaperones, microsomal membranes) for correct folding and processing of complex eukaryotic proteins, often resulting in insoluble aggregates or inactive products.

Eukaryotic cell-free expression systems

Eukaryotic CFPS systems specialize in overcoming the functional bottlenecks of prokaryotic expression. By retaining cell-specific endogenous elements (including ribosomes, tRNA pools, and PTM enzymes), they achieve functional integrity for complex targets.

Systems based on mammalian cells (HEK293, CHO) are essential for therapeutic protein research.

  • Fidelity Core: The presence of microsomal membranes allows for co-translational or post-translational translocation, which is critical for synthesizing membrane proteins and functional antibodies.
  • Key PTM: Capable of performing initial N-glycosylation and forming correct disulfide bonds, reportedly reaching a functional correctness of 99% for single-chain variable fragments (scFv).
  • Limitation: Production yields are generally 5–10 times lower than E. coli systems, and the lysate preparation process is costly.
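
As a rough worked example of the yield figures quoted above (up to 2 mg/mL for E. coli systems, and 5 to 10 times lower for eukaryotic systems), the implied eukaryotic range can be computed directly. The function name is hypothetical, and the numbers are upper-bound estimates taken from the text, not measurements.

```python
def eukaryotic_yield_range(ecoli_yield=2.0, fold_low=5, fold_high=10):
    """Return (min, max) eukaryotic yield in mg/mL implied by the fold reduction."""
    return ecoli_yield / fold_high, ecoli_yield / fold_low


low, high = eukaryotic_yield_range()
print(low, high)  # 0.2 0.4

# Reaction volume (mL) needed to make 1 mg of protein at each yield.
for label, mg_per_ml in [("E. coli", 2.0), ("eukaryotic, best case", high), ("eukaryotic, worst case", low)]:
    print(label, 1.0 / mg_per_ml, "mL")
```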
  1. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

  2. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

References

  1. https://pubmed.ncbi.nlm.nih.gov/26478227/
  2. https://www.sciencedirect.com/science/article/pii/S200103702300185X
  3. https://www.cytion.com/us/About-Cytion/Knowledge-Hub/Articles-Updates/Cell-Free-Systems-for-Protein-Production-Advantages-Over-Living-Cells/
  4. https://www.idtdna.com/pages/applications/cell-free-protein-synthesis
  5. https://www.cusabio.com/cell-free-expression-system.html
  6. https://eureka.patsnap.com/report-energy-module-engineering-for-sustainable-cell-free-protein-synthesis
  7. https://www.biosynsis.com/a-comparative-guide-prokaryotic-vs-eukaryotic-cell-free-expression-systems-for-eukaryotic-proteins.html

Week 10 HW: Imaging and Measurement

Week 11 HW: Building Genomes

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

I contributed 123 pixels to the global artwork experiment by making HTGAA letters at the bottom left. I liked the collaborative work and that it represents all of us. I think it can be better by not allowing the replacement of anyone’s work.

[Image: pixels]

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Component | Role
E. coli Lysate
• BL21 (DE3) Star Lysate (includes T7 RNA Polymerase)
Salts/Buffer
• Potassium Glutamate
• HEPES-KOH pH 7.5
• Magnesium Glutamate
• Potassium phosphate monobasic
• Potassium phosphate dibasic
Energy / Nucleotide System
• Ribose
• Glucose
• AMP
• CMP
• GMP
• UMP
• Guanine
Translation Mix (Amino Acids)
• 17 Amino Acid Mix
• Tyrosine
• Cysteine
Additives
• Nicotinamide
Backfill
• Nuclease Free Water