Homework

Weekly homework submissions:

  • Committed listener milestones

  • Obsidian Setup for local webpage edits

  • Week 1 HW: Principles and Practices

  • Week 2 HW: DNA read, write, and edit

  • Week 3 HW: Lab Automation

  • Week 4 HW: Protein design part 1

    Answer conceptual questions Learn basic concept of protein design Brainstorm how to apply these together in the group project Part A - Conceptual questions Amino Acid & Protein Structure Q&A How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) To find the number of molecules, we first determine the mass of protein and then convert that to moles and molecules.

  • Week 5 HW: Protein design part 2

    Design short peptides that bind mutant SOD1. Then decide which ones are worth advancing toward therapy. Uniprot SOD1 protein sequence: sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2 MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Apply A4V mutation: (ignoring Methionine) The Bolded V used to be A MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Pepmlm Generated peptides and their perplexity score:

Subsections of Homework

Committed listener milestones

Milestone 1 - CL Requirements

  • Attend Class 1
  • Submit HW1
  • Submit node preference form

Milestone 2 - CL Requirements

  • Watch the Weekly Classes & Recitations
  • Attend at least 33% of all BioClub Node Meetings
  • Submit the HW3 Robot Form
  • 3 Final Project Ideas on the Shared Slide Deck & in your repo
  • Make a post or send a chat message on the Forum
  • Sign the Committed Listener MoU by committing it to your repo

Obsidian Setup for local webpage edits

Why Obsidian?

  • Obsidian does live rendering of markdown, so it is easier to do formatting while doing the homework
  • Obsidian brings the webpages to the local machine, so it is easier to save files, paste screenshots and move things around between folders.

What is Obsidian again?

Obsidian is a powerful, local-first note-taking and knowledge management app that uses interconnected Markdown files.

How to use Obsidian for HTGAA?

  1. Ensure your computer has git installed.
  2. Ensure Obsidian is installed.
  3. Set up your git global username and email.
  4. Clone the webpages repository into a directory of your choice using git clone {your https URL} [Image: git clone location]
  5. Navigate to the webpages directory.
  6. Run git pull in the terminal, just in case. git pull syncs the locally stored folder with the repository stored on the web.
  7. If you are on a Windows-based system, it may also help to run git config --global core.autocrlf true. This solves issues caused by UNIX-based and Windows-based systems handling new lines differently.
  8. Open Obsidian and select “Open folder as vault”. [Image: obsidian opening]
  9. Navigate to the {your chosen root directory}\webpages\content directory and select it as the vault folder.
  10. Et voila, you have live rendering of your markdown files. You can even navigate using the navigation bar on the left. [Image: obsidian navbar]
  11. Once you’re done editing and playing around, you have to commit your changes:
    1. Go back to the webpages directory.
    2. Run git add .
    3. Run git commit -m "my new changes"
    4. Run git push
  12. The above push needs to be done every time you make changes to the content. As good practice, run git pull every time you start making changes.

    The first time you run git push, you will be redirected to the pages.htgaa site to provide authentication. You will probably have to select “authorize application”. [Image: git push authorization]
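The add, commit and push steps above can also be wrapped in a small script. This is only a minimal sketch, not part of the course tooling: the function names `publish_commands` and `publish` and the example path are made up for illustration, and the git commands are the same ones listed in the steps.

```python
import subprocess
from pathlib import Path


def publish_commands(message: str) -> list[list[str]]:
    """Return the git commands from the steps above, in the order they are run."""
    return [
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def publish(repo_dir: str, message: str) -> None:
    """Run add, commit and push inside repo_dir, stopping at the first failure."""
    for cmd in publish_commands(message):
        # check=True raises CalledProcessError when git exits with an error,
        # which also happens when there is nothing new to commit.
        subprocess.run(cmd, cwd=Path(repo_dir), check=True)


# Example call (hypothetical path, replace with the directory you cloned into):
# publish("webpages", "my new changes")
```

Remember to run git pull yourself before you start editing; the sketch only covers the publishing half of the cycle.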

DISCLAIMER - Do this at your own risk; any and all actions you take on your computer system are completely your own responsibility. The above is simply a documentation of the series of steps I took to get Obsidian up and running.

I am sure someone else has done this before. I am just bringing it up so that people who haven’t yet discovered that Obsidian can be used to edit their webpages are made aware of it. There are a lot of housekeeping and security related things I have not gone into above. Feel free to tell me if I have made any mistakes or left things out, or if you have ideas on how to make this better.

Week 1 HW: Principles and Practices

[Image: cover image]

1 Describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

Bioengineering tool/application: Autonomous Space Biomanufacturing Platform for Active Pharmaceutical Ingredients (APIs)

My biological engineering application is an autonomous, space-based biomanufacturing and analysis system capable of producing active pharmaceutical ingredients (APIs) and nutritionally relevant biomolecules during long-duration space missions. The system integrates:

  • engineered microbial or cell-free biosynthesis platforms,
  • automated microfluidic bioreactors,
  • in-situ laboratory analysis,
  • robotic sample handling and process control,
  • edge-computing-driven optimization.

The motivation for this system arises from a fundamental constraint in human space exploration: mass and resupply dependence. Pharmaceuticals degrade over time because of radiation exposure, storage conditions, and their own limited shelf life, and resupply from Earth becomes impossible for Mars-class missions. Current mission architectures rely heavily on pre-packed medications and Earth-based analysis.

There is a NASA capability gap described in a white paper by OSMED - Organization for space medicine, engineering and design. The white paper highlights that NASA currently lacks sufficient in-situ automated laboratory analysis capability, limiting the ability to monitor biological systems, diagnose health issues, and support medical decision-making during deep-space missions. This same gap directly limits the feasibility of onboard pharmaceutical manufacturing, since production requires real-time quality verification and process monitoring.

The proposed system addresses two linked challenges:

  • Medical autonomy — astronauts can produce essential drugs (antibiotics, anti-inflammatory compounds, metabolic supplements) on demand rather than relying on pre-manufactured supplies.
  • Utilization of the space environment — microgravity and radiation environments may enable production pathways or crystallization outcomes difficult to achieve on Earth, potentially improving yield or purity for certain molecules.

From a robotics perspective, the system functions as a self-driving laboratory, where robotic handling and microfluidic automation minimize crew cognitive load. This requirement is explicitly emphasized in the NASA analysis, which identifies crew time and cognitive burden as limiting factors for biological operations in deep space. The system could help on Earth as well as in space, by manufacturing source crystals and drugs that cannot be made on Earth.

Governance policy

Goal: Enable autonomous biological manufacturing in space while ensuring safety, non-maleficence, and equitable long-term use of space biotechnology.

This goal can be broken into three specific sub-goals.

  • Safety and containment - Prevent unintended, harmful leaks of biological growth
  • Security and responsible use - Only approved pathways can be executed, and all executions are transparent and open
  • Open-source architecture - Interoperable standards are available to any other organization as well

3 Describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

Governance Action 1: Certified Biological Production Libraries

Purpose

Current biological research systems allow flexible modification of organisms and production pathways, which introduces safety and security risks in autonomous space environments. This action proposes restricting onboard biological manufacturing to pre-certified biological production libraries.

Design

  • Space agencies and regulatory bodies define approved biological production libraries.
  • Only certified organisms, enzymes, or cell-free pathways are permitted onboard.
  • Production protocols are digitally verified before execution.
  • Academic researchers and companies contribute to library development under safety review.

Assumptions

  • Essential pharmaceuticals can be produced using a limited set of approved biological pathways.
  • Certification processes can be updated fast enough to support mission needs.

Risks of Failure & “Success”

  • Overly restrictive certification could limit flexibility during medical emergencies.
  • Centralized approval systems may slow innovation or adaptation to new conditions.

Governance Action 2: Mandatory Autonomous In-Situ Quality Verification

Purpose

Biological manufacturing in space requires reliable verification of product identity and safety without reliance on Earth-based laboratories. This action proposes requiring automated quality verification for any biologically produced medical compound.

Design

  • Biomanufacturing systems must include integrated analytical tools for identity, purity, and contamination checks.
  • Automated analysis prevents dispensing of products that fail quality thresholds.
  • Space agencies and mission integrators enforce this requirement during system certification.

Assumptions

  • Miniaturized analytical systems can reach sufficient reliability for medical decision-making.
  • Automated verification can substitute for expert laboratory oversight during deep-space missions.

Risks of Failure & “Success”

  • False negatives (a good batch wrongly flagged as failing the quality check) may prevent use of necessary medication in emergencies.
  • Increased system complexity may introduce additional failure modes.

Governance Action 3: Open Interface and Data Standards for Space Biology Systems

Purpose

Fragmented ownership and incompatible systems can prevent effective use of biological technologies in space. This action proposes standardized interfaces and data formats for biological analysis and manufacturing systems.

Design

  • Space agencies, private companies, and academic partners agree on common data and interface standards.
  • Biological production logs and analysis results follow shared formats.
  • Systems support interoperable integration with robotic handling and automated laboratory platforms.

Assumptions

  • Stakeholders are willing to cooperate on shared standards.
  • Open standards will improve reliability and reduce duplication of effort.

Risks of Failure & “Success”

  • Standards may become overly bureaucratic or slow to evolve.
  • Successful standardization could accelerate widespread adoption faster than governance mechanisms adapt.

4 Score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework, but feel free to make your own.

| Does the option: | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 1 | 3 |
| • By helping respond | 3 | 1 | 2 |
| Foster Lab Safety | | | |
| • By preventing incident | 1 | 2 | 2 |
| • By helping respond | 3 | 1 | 3 |
| Protect the environment | | | |
| • By preventing incidents | 1 | 2 | 1 |
| • By helping respond | 2 | 2 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 1 | 3 | 1 |
| • Feasibility? | 2 | 3 | 3 |
| • Not impede research | 3 | 3 | 2 |
| • Promote constructive applications | 3 | 3 | 1 |

Professor Jacobson

  • Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

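Error rate = 1:10^6. The human genome is about 3 billion (3 × 10^9) bases long, so at this rate every copy of the genome would pick up thousands of errors. Throughput Error Rate Product Differential: ~10^8. Biology doesn't depend on the polymerase alone for copying DNA accurately; it also uses error-correction mechanisms such as DNA mismatch repair.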
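To make the discrepancy concrete, here is a quick back-of-the-envelope calculation. It is only a sketch that uses the two figures quoted in the answer (about 3 billion bases, one error per 10^6 bases); the function name is made up for illustration.

```python
GENOME_LENGTH_BP = 3_000_000_000  # approximate human genome length, from the answer above
BASES_PER_ERROR = 1_000_000       # polymerase makes about 1 error per 10^6 bases


def expected_errors(genome_length_bp: int, bases_per_error: int) -> float:
    """Expected number of miscopied bases in one pass over the genome."""
    return genome_length_bp / bases_per_error


errors_per_copy = expected_errors(GENOME_LENGTH_BP, BASES_PER_ERROR)
print(f"Expected errors per genome copy: {errors_per_copy:.0f}")
```

So a polymerase working at this error rate with no further correction would leave on the order of a few thousand mistakes in every copy of the genome.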

  • How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

The average human protein is coded by about 1036 base pairs. Not all of the possible codes work in practice, maybe due to codon usage bias: certain organisms prefer to use certain codons, while others don’t.
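The "how many ways" part of the question can be estimated by multiplying the number of synonymous codons at each position. The sketch below uses the codon counts of the standard genetic code; the assumption of roughly 3 codons per amino acid for the average-protein estimate is mine, and the function names are made up for illustration.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences that encode the given protein (stop codon excluded)."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein.upper())


def log10_encodings_estimate(length_bp: int = 1036, codons_per_aa: float = 3.0) -> float:
    """Order-of-magnitude estimate, assuming about 3 synonymous codons per residue."""
    n_codons = length_bp // 3
    return n_codons * math.log10(codons_per_aa)


# First six residues of the LEA3 sequence used in Week 2, as a small worked case.
print(count_encodings("MEVPKS"))
print(f"about 10^{log10_encodings_estimate():.0f} possible encodings for a 1036 bp gene")
```

Even a six-residue peptide already has hundreds of possible encodings, so the count for a full-length protein is astronomically large.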

Professor Dr. LeProust

  • What’s the most commonly used method for oligo synthesis currently? Solid-phase phosphoramidite synthesis.
  • Why is it difficult to make oligos longer than 200nt via direct synthesis? Each nucleotide addition has a small failure probability, and these failures keep accumulating with every added base.
  • Why can’t you make a 2000bp gene via direct oligo synthesis? 2000 sequential chemical coupling steps are required, which is too many: errors accumulate until practically no full-length molecule is made.
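The error accumulation argument can be put into numbers. This is a minimal sketch that assumes a per-step coupling efficiency of 99%; that value is an illustrative assumption, not a figure from the lecture, and real chemistries may do better.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float = 0.99) -> float:
    """Fraction of molecules for which every coupling step succeeded."""
    # A length-n oligo needs n - 1 couplings after the first, support-bound nucleotide.
    return coupling_efficiency ** (length_nt - 1)


for n in (20, 200, 2000):
    print(f"{n:>5} nt: {full_length_fraction(n):.3g} of molecules are full length")
```

Under this assumption the full-length fraction drops from most of the molecules at 20 nt to a small minority at 200 nt, and to a vanishing fraction at 2000 nt, which is why genes are assembled from shorter oligos instead.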

Professor George Church

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, Arginine

Since animals cannot exist without lysine, and this requirement is common to all animals, it seems clear that there was a common ancestor that was similarly limited. So the lysine contingency reflects a fundamental and deeply rooted biological limit that has not been overcome in a very long time. The 10 amino acids must be obtained somehow, and that is a very profound insight.

HTGAA Website assignment

  • Template has been filled in
  • All parts of homework written

Week 2 HW: DNA read, write, and edit

Week 2 - DNA read, write and edit

We’re in the deep now, time to get the hands dirty.

Part 0 - Basics of Gel electrophoresis -

What did the small DNA fragment say to the large one? “Catch me if you can!”

In gel electrophoresis, the positive electrode should be on the opposite side from the sample wells, i.e. the sample wells should be close to the cathode (the negative electrode).

DNA is an acid, so it readily gives away a positive ion (H+) and becomes negative. DNA is negatively charged due to the phosphate groups in its sugar-phosphate backbone, each of which carries one negative charge.

By convention, the top strand is written 5’ to 3’. The complementary strand runs 3’ to 5’; when the top strand is the coding strand, the other strand is called the template strand.

Video tutorial on Gel electrophoresis - https://www.youtube.com/watch?v=TIZRGt3YAug

Part 1 - Benchling & In-silico Gel Art

  • Make Benchling account
  • Import Lambda DNA
  • Simulate restriction enzyme digestion with the enzymes below:
    • EcoRI
    • HindIII
    • BamHI
    • KpnI
    • EcoRV
    • SacI
    • SalI
  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks

In Benchling, create a new project. In the left menu bar, use the “+” button to create a new sequence.
Name it, select DNA, then select linear topology.
Then click Create.
One can also import sequences.
Just copy the GenBank code.
Go to import from database in the “+” section.
Paste the code and search.

[Image: lambda phage DNA]

Let’s do a virtual restriction enzyme digestion!
Once the DNA is imported and the sequence is visible, select the scissors icon on the right control panel.
Click on new digest, and add the restriction enzyme name.
Select the enzyme and run the digest. Move on to the virtual digest tab in the right panel to see and compare a virtual gel electrophoresis.
Ensure the right ladder is picked; it should be consistent with the one used with the machine.
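What the virtual digest does under the hood can be sketched in a few lines: find every recognition site, cut there, and list the fragment lengths that would show up as bands. This is a simplified illustration and not Benchling's implementation; it only handles linear DNA, ignores sticky-end overhangs, and the toy sequence is made up (it is not Lambda DNA).

```python
# Recognition site and top-strand cut offset (bases from the start of the site)
# for the enzymes listed in Part 1.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def digest(sequence: str, enzyme_names: list[str]) -> list[int]:
    """Return fragment lengths (largest first) after cutting a linear sequence."""
    seq = sequence.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    positions = [0] + sorted(cuts) + [len(seq)]
    lengths = [b - a for a, b in zip(positions, positions[1:])]
    return sorted(lengths, reverse=True)


# Toy 40-base sequence with one EcoRI site and one BamHI site.
toy = "ATATAT" + "GAATTC" + "C" * 9 + "GGATCC" + "T" * 13
print(digest(toy, ["EcoRI"]))           # single digest
print(digest(toy, ["EcoRI", "BamHI"]))  # double digest
```

All seven sites are palindromic, so searching the top strand alone finds every site. Sorting the lengths largest first mirrors the band order on a gel, where large fragments stay near the wells.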

[Image: Lambda virtual digest file]

To those who wanna view my benchling work: https://benchling.com/s/seq-3PpRFMxTkg5VowzYd9ya?m=slm-zacrpUMhznCXawpSU6AX

Final part of Part 1 - create virtual gel art.
I used rcdonovan’s art automation website.
It was quite cumbersome, because I assumed I could draw any shape and it would figure it out for me.
It turned out I had to click up and down until I got something I could possibly work with.
Given the proximity to Valentine’s Day, I decided to go with a heart with a smiley face in the middle.

[Image: heart with smiley in gel]

I got the following directions to follow to create the art in Benchling:

[Image: benchling directions]

The same done in Benchling after doing all the digestions. Half the heart is there in the gel simulation; the rest can be made through a mirror reflection.

[Image: benchling half]

Part 2 - Gel Art - Restriction Digests and Gel electrophoresis

(Unable to do this part since I do not have access to a lab)

Part 3 - DNA Design Challenge

  • Choose my Protein
  • Reverse translation: Amino acid sequence to DNA nucleotide sequence
  • Codon Optimization
  • You have a sequence now what?
  • How does it work in nature/biological systems?

3.1 Choose your protein: In my governance paper, I talked about the importance of synbio in space. Continuing from that, I have chosen to work with LEA (Late Embryogenesis Abundant) proteins. The idea is to produce active pharmaceutical ingredients (APIs) in space, while dealing with storage and water shortage issues by dehydrating the output of the bioreactor so that the water can be reused. Dehydration should not affect the API, so a protein that is produced along with the API and protects it would be great. Reading this paper gave me the idea of using LEA3: https://pmc.ncbi.nlm.nih.gov/articles/PMC2292704/.

Protein sequence acquired from: https://www.uniprot.org/uniprotkb/O81843/entry

Protein sequence for LEA3

AAN13065.1 unknown protein [Arabidopsis thaliana] MEVPKSSLLMIIFVVASCFLHVKAWHGQTYCGGNATPRCQLRYIDCPEECPTEMFPNSQNKICWVDCFKP LCEAVCRAVKPNCESYGSICLDPRFIGGDGIVFYFHGKSNEHFSIVSDPDFQINARFTGHRPAGRTRDFT WIQALGFLFNSHKFSLETTKVATWDSNLDHLKFTIDGQDLIIPQETLSTWYSSDKDIKIERLTEKNSVIV TIKDKAEIMVNVVPVTKEDDRIHNYKLPVDDCFAHFEVQFKFINLSPKVDGILGRTYRPDFKNPAKPGVV MPVVGGEDSFRTSSLLSHVCKTCLFSEDPAVASGSVKPKSTYALLDCSRGASSGYGLVCRK

Using the following website to do reverse translation: https://www.bioinformatics.org/sms2/rev_trans.html

The site requested a codon table. The codon table of Arabidopsis Thaliana was obtained using: https://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=3702&aa=1&style=GCG

Reverse translation result:

reverse translation of AAN13065.1 unknown protein [Arabidopsis thaliana] to a 1023 base sequence of most likely codons. atggaagttcctaagtcttctcttcttatgattatttttgttgttgcttcttgttttctt catgttaaggcttggcatggacaaacttattgtggaggaaatgctactcctagatgtcaa cttagatatattgattgtcctgaagaatgtcctactgaaatgtttcctaattctcaaaat aagatttgttgggttgattgttttaagcctctttgtgaagctgtttgtagagctgttaag cctaattgtgaatcttatggatctatttgtcttgatcctagatttattggaggagatgga attgttttttattttcatggaaagtctaatgaacatttttctattgtttctgatcctgat tttcaaattaatgctagatttactggacatagacctgctggaagaactagagattttact tggattcaagctcttggatttctttttaattctcataagttttctcttgaaactactaag gttgctacttgggattctaatcttgatcatcttaagtttactattgatggacaagatctt attattcctcaagaaactctttctacttggtattcttctgataaggatattaagattgaa agacttactgaaaagaattctgttattgttactattaaggataaggctgaaattatggtt aatgttgttcctgttactaaggaagatgatagaattcataattataagcttcctgttgat gattgttttgctcattttgaagttcaatttaagtttattaatctttctcctaaggttgat ggaattcttggaagaacttatagacctgattttaagaatcctgctaagcctggagttgtt atgcctgttgttggaggagaagattcttttagaacttcttctcttctttctcatgtttgt aagacttgtcttttttctgaagatcctgctgttgcttctggatctgttaagcctaagtct acttatgctcttcttgattgttctagaggagcttcttctggatatggacttgtttgtaga aag

reverse translation of AAN13065.1 unknown protein [Arabidopsis thaliana] to a 1023 base sequence of consensus codons. atggargtnccnaarwsnwsnytnytnatgathathttygtngtngcnwsntgyttyytn caygtnaargcntggcayggncaracntaytgyggnggnaaygcnacnccnmgntgycar ytnmgntayathgaytgyccngargartgyccnacngaratgttyccnaaywsncaraay aarathtgytgggtngaytgyttyaarccnytntgygargcngtntgymgngcngtnaar ccnaaytgygarwsntayggnwsnathtgyytngayccnmgnttyathggnggngayggn athgtnttytayttycayggnaarwsnaaygarcayttywsnathgtnwsngayccngay ttycarathaaygcnmgnttyacnggncaymgnccngcnggnmgnacnmgngayttyacn tggathcargcnytnggnttyytnttyaaywsncayaarttywsnytngaracnacnaar gtngcnacntgggaywsnaayytngaycayytnaarttyacnathgayggncargayytn athathccncargaracnytnwsnacntggtaywsnwsngayaargayathaarathgar mgnytnacngaraaraaywsngtnathgtnacnathaargayaargcngarathatggtn aaygtngtnccngtnacnaargargaygaymgnathcayaaytayaarytnccngtngay gaytgyttygcncayttygargtncarttyaarttyathaayytnwsnccnaargtngay ggnathytnggnmgnacntaymgnccngayttyaaraayccngcnaarccnggngtngtn atgccngtngtnggnggngargaywsnttymgnacnwsnwsnytnytnwsncaygtntgy aaracntgyytnttywsngargayccngcngtngcnwsnggnwsngtnaarccnaarwsn acntaygcnytnytngaytgywsnmgnggngcnwsnwsnggntayggnytngtntgymgn aar
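The "most likely codons" result is simply a lookup: each amino acid is replaced by the single codon the table prefers. A minimal sketch is below. The codon choices are read off the tool output above rather than from the Kazusa table directly, and the function name is made up for illustration.

```python
# Preferred codon per amino acid, read off the "most likely codons" result above
# (Arabidopsis thaliana table). Lower case matches the tool's output.
PREFERRED_CODON = {
    "A": "gct", "C": "tgt", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "gga", "H": "cat", "I": "att", "K": "aag", "L": "ctt",
    "M": "atg", "N": "aat", "P": "cct", "Q": "caa", "R": "aga",
    "S": "tct", "T": "act", "V": "gtt", "W": "tgg", "Y": "tat",
}


def reverse_translate(protein: str) -> str:
    """Replace every amino acid with its preferred codon."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


first_20_residues = "MEVPKSSLLMIIFVVASCFL"
print(reverse_translate(first_20_residues))
```

The printed 60 bases should match the first block of the reverse translation result, which is a quick sanity check that the lookup reproduces what the website did.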

Using another site to get the codon-optimized sequence for Saccharomyces cerevisiae S288C: https://en.vectorbuilder.com/tool/codon-optimization.html

ATGGAAGTACCAAAGTCGTCTTTATTAATGATTATATTTGTTGTTGCATCTTGTTTTTTACATGTTAAAGCTTGGCATGGTCAAACTTATTGCGGTGGTAACGCTACACCACGTTGCCAACTTAGATATATCGATTGTCCAGAAGAGTGCCCAACTGAAATGTTTCCAAATTCACAGAACAAAATTTGTTGGGTTGATTGTTTTAAACCACTATGTGAAGCAGTTTGCAGAGCTGTTAAACCAAATTGCGAAAGCTACGGTAGCATATGTTTAGATCCAAGGTTCATTGGAGGAGATGGTATTGTTTTCTACTTCCACGGTAAATCCAATGAACATTTTTCTATTGTATCTGACCCAGATTTTCAAATTAACGCTAGATTTACAGGACATAGACCAGCCGGTAGAACAAGAGATTTCACCTGGATACAAGCTTTAGGATTTCTGTTTAACTCCCATAAGTTCTCTTTAGAAACAACAAAAGTTGCCACCTGGGACTCTAATTTGGATCATTTGAAGTTTACAATTGATGGCCAGGACTTGATAATTCCTCAAGAAACTTTATCAACATGGTACTCCTCTGATAAAGATATTAAAATTGAAAGATTAACCGAAAAGAATTCTGTTATCGTTACAATTAAAGATAAGGCCGAAATTATGGTTAATGTTGTTCCAGTTACAAAAGAAGATGACAGAATACACAATTATAAACTTCCAGTTGATGACTGTTTTGCTCATTTTGAGGTTCAATTCAAATTCATTAACTTGTCTCCTAAAGTTGATGGTATTTTAGGTAGGACTTATAGACCAGATTTCAAGAACCCAGCTAAGCCAGGTGTCGTGATGCCAGTCGTTGGCGGTGAGGATAGCTTTAGAACTTCTTCTTTGTTATCACACGTTTGTAAAACCTGTTTGTTTTCTGAGGACCCTGCCGTTGCTTCTGGTTCTGTTAAACCAAAATCTACTTATGCTCTATTGGATTGTTCAAGAGGTGCCTCCTCCGGTTATGGTTTGGTTTGTAGAAAATGA
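Codon optimization should change the DNA but not the protein, so a useful check is to translate the optimized sequence back and compare it with the original amino acids. The sketch below builds the standard genetic code table and translates a DNA string; only short pieces of the optimized sequence are used here to keep the example small, and the function name is made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 30 and last 15 bases of the codon-optimized sequence above.
print(translate("ATGGAAGTACCAAAGTCGTCTTTATTAATG"))
print(translate("GTTTGTAGAAAATGA"))
```

Running the same function on the full optimized sequence and comparing the result with the protein sequence would confirm that the optimization was silent.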

What technologies could be used to produce this protein from your DNA? Describe in your words the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Some technologies that can be used to produce this protein:

  • Cell based
    • Using S. cerevisiae as a chassis to make the protein for us
    • Using plasmid-based protein expression
  • Cell free
    • Using a cell-free expression system (in vitro transcription and translation), with PCR used to amplify the DNA template

How does DNA get transcribed and translated into protein?

DNA gets converted into RNA by RNA polymerase, in the nucleus. This RNA, the messenger RNA (mRNA), leaves the nucleus for the cytoplasm, where it eventually finds its way to a ribosome. In the ribosome, tRNAs pair with the codons on the mRNA at one end and carry the matching amino acids at the other, and the amino acids are joined to make the protein chain.

For the next part using: TPA_inf: Saccharomyces cerevisiae S288C chromosome I, complete sequence https://www.ncbi.nlm.nih.gov/nuccore/BK006935.2/

  • Promoter - pTEF1
  • Translation Initiation Region (Yeast Kozak Sequence) - AAAAAAATGTCT
  • Start codon - ATG
  • Stop codon - TAA
  • Terminator - tADH1

Found something new: the consensus codons use letters such as y and n. These are IUPAC ambiguity codes: y stands for u/c (t/c in DNA) and n stands for any base.

Part 4 - Prepare a Twist DNA synthesis order

  • Create account in twist
  • Build your DNA insert sequence - New DNA/RNA sequence in benchling
    • Give insert sequence a name with a linear topology
    • Add each codon optimized sequence in benchling and color it
    • Click on linear map to preview
    • Download FASTA file
    • Visualize DNA design in SBOL canvas
  • On twist, select the Genes option
  • Select “Clonal Genes” option
  • Import sequence
  • Choose your vector
    • Choose cloning vectors like pTwist Amp High Copy
    • Download construct - GenBank
    • Import back into Benchling

Part 5 - DNA read/write/edit

DNA Read

  1. What DNA would you want to sequence and why?

  2. In lecture, a variety of sequencing technologies were mentioned. What tech would you use to perform sequencing on your DNA and why?

  3. Is your method first-, second- or third-generation or other? How so?

  4. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

  5. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

  6. What is the output of your chosen sequencing technology?

Ans. In addition to LEA3, I would sequence DNA related to microbial environmental monitoring and crew health, specifically microbial genomes present in the spacecraft habitat and bioreactor systems. Monitoring microbial populations is important in long-duration missions because microbial evolution, contamination, or biofilm formation can affect both human health and biological manufacturing systems. Sequencing allows detection of pathogenic organisms, identification of mutations in engineered production strains, and verification that biological systems remain genetically stable over time. For sequencing, I would use Oxford Nanopore sequencing, which is a third-generation sequencing technology, because it allows long-read sequencing, minimal sample preparation, compact instrumentation, and real-time analysis, making it suitable for autonomous space environments. The input is purified DNA extracted from environmental or biological samples, which is prepared by DNA extraction, optional fragmentation depending on desired read length, adapter ligation, and sometimes PCR amplification if DNA quantities are low. During sequencing, individual DNA molecules pass through a nanopore embedded in a membrane, and changes in ionic current are measured as different nucleotide sequences move through the pore; these electrical signal changes are decoded computationally into base sequences through base-calling algorithms. The output of the sequencing process is a digital file containing nucleotide sequences (reads), along with quality scores, which can then be assembled, compared to reference genomes, or analyzed for mutations and species identification.
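The reads and quality scores described above are commonly exchanged as FASTQ files, where each read takes four lines and the quality line encodes one Phred score per base. Assuming the reads are exported as FASTQ with the usual Phred+33 encoding, a minimal reader could look like this. The two reads in the example are made up, not real sequencer output.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, mean_phred_quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # "+" separator line
        quality = handle.readline().strip()
        # Phred+33 encoding: quality score = ASCII code of the character - 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, sum(scores) / len(scores)


# Made-up two-read example held in memory instead of a file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
for read_id, seq, mean_q in read_fastq(example):
    print(read_id, seq, mean_q)
```

Filtering reads by mean quality before assembly or species identification is a typical first step once the reads are parsed.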

DNA Write

  1. What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I would synthesize a genetic circuit for sensing oxidative stress and activating a protective response in Saccharomyces cerevisiae. In a space environment, radiation and altered metabolism can increase reactive oxygen species (ROS), which can damage both cells and pharmaceutical production pathways. A ROS-responsive genetic circuit could act as a biological sensor that activates protective genes or temporarily pauses production when stress levels become high, improving reliability and safety of autonomous biomanufacturing systems.

The construct would include a stress-responsive promoter, a reporter or regulatory protein, and a downstream response gene. For example, a simplified design could include a yeast oxidative stress promoter (such as a promoter derived from oxidative stress response genes), followed by a coding sequence for a fluorescent reporter or regulatory protein, and a transcription terminator.

  2. What technology would you use to perform this DNA synthesis and why?
    • What are the essential steps in the chosen sequencing methods?
    • What are the limitations of your sequencing methods in terms of speed, accuracy and scalability?

I would use chemical DNA synthesis using phosphoramidite chemistry, which is the most common and reliable method used by commercial DNA synthesis companies. This method builds DNA one nucleotide at a time on a solid support, allowing precise control over the final sequence and making it suitable for synthesizing gene-length constructs.

The essential steps include chemically adding nucleotides sequentially to build short DNA fragments, removing protective groups after each step, assembling short fragments into a full gene if needed, and verifying the final sequence through DNA sequencing.

The main limitations are that longer DNA sequences are harder to synthesize directly because small errors accumulate during each chemical step. This limits the length of DNA that can be made in one piece and requires assembly from shorter fragments. While accuracy is high after verification, synthesis speed depends on chemical cycle time, and very large constructs become slower and more expensive to produce at scale.

source - Chatgpt 5.2

DNA edit

  1. What DNA would you want to edit and why?

I would edit the DNA of Saccharomyces cerevisiae to modify stress-response and metabolic regulation genes so that the organism produces pharmaceutical compounds efficiently while limiting uncontrolled growth. In a space environment, it is useful for production organisms to remain stable, tolerate radiation and oxidative stress, and enter low-activity states when production is not required. Editing genes involved in stress tolerance or metabolic regulation could allow the yeast to maintain productivity without excessive biomass accumulation, improving safety and reducing resource consumption in a closed system.

  1. What technology or technologies would be used to perform these DNA edits and why?

I would use CRISPR–Cas9 genome editing, because it allows precise, targeted modification of DNA and is widely used in yeast engineering. CRISPR-based editing is efficient, relatively simple to design, and supports both gene knockouts and insertion of new genetic elements at specific locations in the genome.

  2. How does your technology of choice edit DNA? What are the essential steps?

CRISPR–Cas9 uses a guide RNA to direct the Cas9 enzyme to a specific DNA sequence in the genome. Cas9 creates a double-stranded break at that location. The cell then repairs the break using its natural repair mechanisms. If a repair template is provided, the cell incorporates the new DNA sequence during repair, allowing precise edits such as insertions, deletions, or replacements.

  1. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

The preparation involves designing a guide RNA that matches the target DNA sequence and designing a repair template containing the desired genetic change. The inputs typically include a plasmid encoding Cas9, the guide RNA sequence, the repair DNA template, and yeast cells to be edited. After transformation, edited cells are selected and verified using sequencing to confirm that the intended modification was successfully introduced.

  1. What are the limitations of your editing methods (if any) in terms of efficiency or precision?

Week 3 HW: Lab Automation

Week 3 lab automation

Assignment: Python Script for Opentrons Artwork

  • Review Recitation
  • Generate an artistic design using the GUI at opentrons-art.rcdonovan.com.
  • Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons.
  • Submit form

Using AI to generate an image. Image prompt: I am making agar art. I need an image reference. In the image add 3 star flares in the color blue. Lets have an “orbit” around the 3 stars in yellow color, which is an ellipse/spiral that starts thin and gets thicker. In red, lets have a NASA style triangle in the bottom right and top left that is pointing upwards and is small. Give discrete points, not a continous image, similar to bitmap. Use only the colors specified. Give top view.

Image from the prompt

I then used the image to generate the art concept in the GUI on rcdonovan's Opentrons art site.

Image from the prototyping site

Here are the coordinates I got from the rcdonovan website for yellow (mko2), blue (electra2) and red (mrfp1):

mko2_points = [(4.4, 37.4),(6.6, 37.4),(8.8, 37.4),(11, 37.4),(13.2, 37.4),(-4.4, 35.2),(-2.2, 35.2),(0, 35.2),(2.2, 35.2),(4.4, 35.2),(6.6, 35.2),(8.8, 35.2),(11, 35.2),(13.2, 35.2),(15.4, 35.2),(17.6, 35.2),(-8.8, 33),(-6.6, 33),(-4.4, 33),(-2.2, 33),(0, 33),(2.2, 33),(4.4, 33),(6.6, 33),(8.8, 33),(11, 33),(13.2, 33),(15.4, 33),(17.6, 33),(19.8, 33),(-4.4, 30.8),(-2.2, 30.8),(0, 30.8),(2.2, 30.8),(4.4, 30.8),(6.6, 30.8),(8.8, 30.8),(11, 30.8),(13.2, 30.8),(15.4, 30.8),(17.6, 30.8),(19.8, 30.8),(-4.4, 28.6),(-2.2, 28.6),(0, 28.6),(2.2, 28.6),(4.4, 28.6),(6.6, 28.6),(8.8, 28.6),(-19.8, 26.4),(-17.6, 26.4),(-4.4, 26.4),(-24.2, 24.2),(-22, 24.2),(-19.8, 24.2),(-17.6, 24.2),(-15.4, 24.2),(-28.6, 22),(-26.4, 22),(-24.2, 22),(-22, 22),(-19.8, 22),(-17.6, 22),(-15.4, 22),(-13.2, 22),(-30.8, 19.8),(-28.6, 19.8),(-26.4, 19.8),(-24.2, 19.8),(-22, 19.8),(-33, 17.6),(-30.8, 17.6),(-28.6, 17.6),(-26.4, 17.6),(-24.2, 17.6),(-35.2, 15.4),(-33, 15.4),(-30.8, 15.4),(-28.6, 15.4),(-26.4, 15.4),(-4.4, 15.4),(-2.2, 15.4),(0, 15.4),(2.2, 15.4),(4.4, 15.4),(6.6, 15.4),(8.8, 15.4),(11, 15.4),(13.2, 15.4),(15.4, 15.4),(17.6, 15.4),(19.8, 15.4),(22, 15.4),(-35.2, 13.2),(-33, 13.2),(-30.8, 13.2),(-28.6, 13.2),(-11, 13.2),(-8.8, 13.2),(-6.6, 13.2),(24.2, 13.2),(-37.4, 11),(-35.2, 11),(-33, 11),(-30.8, 11),(-13.2, 11),(-11, 11),(-37.4, 8.8),(-35.2, 8.8),(-33, 8.8),(-30.8, 8.8),(-15.4, 8.8),(-13.2, 8.8),(0, 8.8),(2.2, 8.8),(4.4, 8.8),(8.8, 8.8),(11, 8.8),(13.2, 8.8),(-37.4, 6.6),(-35.2, 6.6),(-33, 6.6),(-17.6, 6.6),(-15.4, 6.6),(-6.6, 6.6),(-4.4, 6.6),(-2.2, 6.6),(-39.6, 4.4),(-37.4, 4.4),(-35.2, 4.4),(-33, 4.4),(-19.8, 4.4),(-17.6, 4.4),(-6.6, 4.4),(-4.4, 4.4),(17.6, 4.4),(30.8, 4.4),(33, 4.4),(-39.6, 2.2),(-37.4, 2.2),(-35.2, 2.2),(-19.8, 2.2),(-6.6, 2.2),(-4.4, 2.2),(30.8, 2.2),(33, 2.2),(-39.6, 0),(-37.4, 0),(-35.2, 0),(-19.8, 0),(-2.2, 0),(17.6, 0),(30.8, 0),(33, 0),(-39.6, -2.2),(-37.4, -2.2),(-35.2, -2.2),(17.6, -2.2),(28.6, -2.2),(30.8, -2.2),(33, -2.2),(-39.6, -4.4),(-37.4, 
-4.4),(-35.2, -4.4),(15.4, -4.4),(28.6, -4.4),(30.8, -4.4),(33, -4.4),(-37.4, -6.6),(-35.2, -6.6),(-19.8, -6.6),(11, -6.6),(13.2, -6.6),(26.4, -6.6),(28.6, -6.6),(30.8, -6.6),(-37.4, -8.8),(-35.2, -8.8),(-33, -8.8),(-19.8, -8.8),(-17.6, -8.8),(6.6, -8.8),(8.8, -8.8),(24.2, -8.8),(26.4, -8.8),(28.6, -8.8),(30.8, -8.8),(-37.4, -11),(-35.2, -11),(-33, -11),(-15.4, -11),(-13.2, -11),(-11, -11),(-8.8, -11),(-6.6, -11),(22, -11),(24.2, -11),(26.4, -11),(28.6, -11),(-35.2, -13.2),(-33, -13.2),(19.8, -13.2),(22, -13.2),(24.2, -13.2),(26.4, -13.2),(28.6, -13.2),(-35.2, -15.4),(-33, -15.4),(-30.8, -15.4),(13.2, -15.4),(15.4, -15.4),(17.6, -15.4),(19.8, -15.4),(22, -15.4),(24.2, -15.4),(26.4, -15.4),(-33, -17.6),(-30.8, -17.6),(-28.6, -17.6),(11, -17.6),(13.2, -17.6),(15.4, -17.6),(17.6, -17.6),(19.8, -17.6),(22, -17.6),(24.2, -17.6),(-33, -19.8),(-30.8, -19.8),(-28.6, -19.8),(-26.4, -19.8),(-24.2, -19.8),(-22, -19.8),(-19.8, -19.8),(8.8, -19.8),(11, -19.8),(13.2, -19.8),(15.4, -19.8),(17.6, -19.8),(19.8, -19.8),(-30.8, -22),(-28.6, -22),(-26.4, -22),(-24.2, -22),(-22, -22),(-19.8, -22),(-17.6, -22),(-15.4, -22),(-13.2, -22),(-11, -22),(-8.8, -22),(-6.6, -22),(-4.4, -22),(-2.2, -22),(0, -22),(2.2, -22),(4.4, -22),(6.6, -22),(8.8, -22),(11, -22),(13.2, -22),(15.4, -22),(17.6, -22),(-28.6, -24.2),(-26.4, -24.2),(-24.2, -24.2),(-22, -24.2),(-19.8, -24.2),(-17.6, -24.2),(-15.4, -24.2),(-13.2, -24.2),(-11, -24.2),(-8.8, -24.2),(-6.6, -24.2),(-4.4, -24.2),(-2.2, -24.2),(0, -24.2),(2.2, -24.2),(4.4, -24.2),(6.6, -24.2),(8.8, -24.2),(11, -24.2),(13.2, -24.2),(-22, -26.4),(-19.8, -26.4),(-17.6, -26.4),(-15.4, -26.4),(-13.2, -26.4),(-11, -26.4),(-8.8, -26.4),(-6.6, -26.4),(-4.4, -26.4),(-2.2, -26.4),(0, -26.4),(2.2, -26.4),(4.4, -26.4),(6.6, -26.4),(8.8, -26.4),(11, -26.4),(-2.2, -28.6),(0, -28.6),(2.2, -28.6),(4.4, -28.6)] electra2_points = [(6.6, 11),(6.6, 8.8),(4.4, 6.6),(6.6, 6.6),(8.8, 6.6),(-13.2, 4.4),(0, 4.4),(2.2, 4.4),(4.4, 4.4),(6.6, 4.4),(8.8, 4.4),(11, 4.4),(13.2, 
4.4),(-13.2, 2.2),(4.4, 2.2),(6.6, 2.2),(8.8, 2.2),(-15.4, 0),(-13.2, 0),(-11, 0),(6.6, 0),(-19.8, -2.2),(-17.6, -2.2),(-15.4, -2.2),(-13.2, -2.2),(-11, -2.2),(-8.8, -2.2),(-6.6, -2.2),(6.6, -2.2),(-15.4, -4.4),(-13.2, -4.4),(-11, -4.4),(-13.2, -6.6),(0, -6.6),(-13.2, -8.8),(0, -8.8),(-2.2, -11),(0, -11),(2.2, -11),(-6.6, -13.2),(-4.4, -13.2),(-2.2, -13.2),(0, -13.2),(2.2, -13.2),(4.4, -13.2),(6.6, -13.2),(-2.2, -15.4),(0, -15.4),(2.2, -15.4),(0, -17.6),(0, -19.8)] mrfp1_points = [(-19.8, 30.8),(-17.6, 30.8),(-15.4, 30.8),(-13.2, 30.8),(-11, 30.8),(-8.8, 30.8),(-6.6, 30.8),(-17.6, 28.6),(-15.4, 28.6),(-13.2, 28.6),(-11, 28.6),(-8.8, 28.6),(-6.6, 28.6),(-15.4, 26.4),(-13.2, 26.4),(-11, 26.4),(-8.8, 26.4),(-6.6, 26.4),(-13.2, 24.2),(-11, 24.2),(-8.8, 24.2),(-6.6, 24.2),(-11, 22),(-8.8, 22),(-6.6, 22),(-8.8, 19.8),(-6.6, 19.8),(-6.6, 17.6),(30.8, 17.6),(28.6, 15.4),(30.8, 15.4),(26.4, 13.2),(28.6, 13.2),(30.8, 13.2),(24.2, 11),(26.4, 11),(28.6, 11),(30.8, 11),(22, 8.8),(24.2, 8.8),(26.4, 8.8),(28.6, 8.8),(30.8, 8.8),(19.8, 6.6),(22, 6.6),(24.2, 6.6),(26.4, 6.6),(28.6, 6.6),(30.8, 6.6)]

My copy of the Google Colab notebook is here: https://colab.research.google.com/drive/18vdbBdfkDA-w4Ehbb-0FUUs5g1yCzt-Q#scrollTo=pczDLwsq64mk&line=7&uniqifier=1 Please reach out to me if the file is not shared or viewable.

Google form post-submission link (for TAs only): https://docs.google.com/forms/d/e/1FAIpQLSddW_4xl32vZHADiUUx4llXO-fwLYnOpT-YNbJB1FozGhT7jQ/alreadyresponded

My code (ignoring the boilerplate):

  # mko2 is yellow
  # electra2 is blue
  # mrfp1 is red

  mko2_points = [...]      # yellow coordinates listed above
  electra2_points = [...]  # blue coordinates listed above
  mrfp1_points = [...]     # red coordinates listed above

  def draw_points(points, color):
    # Use a fresh tip per color and draw one 1 uL dot per coordinate
    pipette_20ul.pick_up_tip()
    for i, (x, y) in enumerate(points):
      # The tip holds 20 uL, so refill before every 20th dot
      if i % 20 == 0:
        pipette_20ul.aspirate(20, location_of_color(color))
      cursor = center_location.move(types.Point(x=x, y=y))
      dispense_and_detach(pipette_20ul, 1, cursor)
    pipette_20ul.drop_tip()

  # Start dispensing the fluorescent proteins and drawing
  draw_points(mko2_points, 'Yellow')
  draw_points(electra2_points, 'Blue')
  draw_points(mrfp1_points, 'Red')
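The refill pattern in the drawing code above (aspirate a full tip, dispense 1 µL per dot, refill every 20 dots) can be checked off-robot before running the simulation. The sketch below is plain Python with no Opentrons dependency. `plan_aspirations` is a hypothetical helper, and sizing the last aspiration to the remaining dots is a variation on the code above, which always aspirates 20 µL.

```python
def plan_aspirations(n_points: int, tip_capacity_ul: int = 20,
                     dot_volume_ul: int = 1) -> list[int]:
    """Return the volume (in µL) to aspirate at each refill for one color."""
    if n_points < 0:
        raise ValueError("n_points must not be negative")
    if dot_volume_ul <= 0 or tip_capacity_ul < dot_volume_ul:
        raise ValueError("the tip must hold at least one dot")

    dots_per_fill = tip_capacity_ul // dot_volume_ul
    volumes = []
    remaining = n_points
    while remaining > 0:
        # Fill the tip only as far as the remaining dots require
        dots = min(dots_per_fill, remaining)
        volumes.append(dots * dot_volume_ul)
        remaining -= dots
    return volumes


if __name__ == "__main__":
    # 51 dots is an arbitrary example count, not taken from the design
    print(plan_aspirations(51))
```

The length of the returned list is the number of trips to the color well, and the sum is the total volume of that color the design needs.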

Et voilà, the image generated by the Opentrons simulation:

Image from the prompt

Post-Lab Questions

  • Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
  • Write a description about what you intend to do with automation tools for your final project

Published paper that uses automation: There are various providers of lab automation, such as Opentrons, Trilobio, Tetsuwan Scientific and Ginkgo Bioworks.

  • This recent paper describes using GPT-5 and Ginkgo Bioworks' autonomous lab to do cell-free protein synthesis: Using a GPT-5-driven autonomous lab to optimize the cost and titer of cell-free protein synthesis | bioRxiv
  • In this paper, the team achieved a 27% increase in protein titer and a 40% reduction in the cost of cell-free protein synthesis (CFPS) in $/gram. The protein of choice was sfGFP (superfolder green fluorescent protein).
  • Pydantic was used to check that the AI-generated protocols were valid before sending them to Ginkgo Bioworks.

LLM generated:

  • programmatic specification of multi-instrument biological workflows by Ginkgo’s Catalyst software
  • Use YAML/XML output from Ginkgo to redesign and optimize the experiment further.

Description of what I intend to do: I want to use GPT-5 and Ginkgo's RAC automated lab to optimize protein-MOF symbiosis.

I uploaded the paper into ChatGPT and asked it to program itself to design an experiment for the above.

Output:

Ginkgo catalyst orchestration system JSON workflow graph:

{
  "workflow_name": "mof_binding_protein_screen",
  "steps": [

    {
      "name": "dna_synthesis",
      "tool": "twist_synthesis",
      "input_sequences": "${protein_sequences}"
    },

    {
      "name": "cell_free_expression",
      "tool": "opentrons_cfpx",
      "protocol": "sfGFP_style_cfpx",
      "parameters": {
        "temperature": 30,
        "reaction_volume": "20uL"
      }
    },

    {
      "name": "mof_binding_assay",
      "tool": "automated_binding_plate",
      "parameters": {
        "mof_type": "ZIF-8",
        "incubation_time": "2h"
      }
    },

    {
      "name": "fluorescence_read",
      "tool": "plate_reader",
      "readout": "binding_intensity"
    },

    {
      "name": "data_upload",
      "tool": "lab_data_store"
    }
  ]
}
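The paper summary above mentions validating AI-generated protocols before sending them to the lab. A similar sanity check can be sketched for this workflow graph using only the standard library. This is not Pydantic and not Ginkgo's Catalyst schema, which the source does not show. The required keys `name` and `tool` are assumptions read off the JSON above, and `validate_workflow` is a hypothetical helper.

```python
import json

# Assumed from the example JSON above, not from an official schema
REQUIRED_STEP_KEYS = ("name", "tool")


def validate_workflow(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        workflow = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(workflow, dict):
        return ["top level must be an object"]

    problems = []
    if not isinstance(workflow.get("workflow_name"), str):
        problems.append("missing or non-string 'workflow_name'")

    steps = workflow.get("steps")
    if not isinstance(steps, list) or not steps:
        return problems + ["'steps' must be a non-empty list"]

    seen_names = set()
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"step {index}: must be an object")
            continue
        for key in REQUIRED_STEP_KEYS:
            if not isinstance(step.get(key), str):
                problems.append(f"step {index}: missing or non-string '{key}'")
        name = step.get("name")
        if isinstance(name, str):
            if name in seen_names:
                problems.append(f"step {index}: duplicate name '{name}'")
            seen_names.add(name)
    return problems


example = ('{"workflow_name": "mof_binding_protein_screen", '
           '"steps": [{"name": "data_upload", "tool": "lab_data_store"}]}')
print(validate_workflow(example))
```

Returning a list of problems instead of raising on the first one makes it easy to hand the full list back to the model that wrote the workflow.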

High-level protein workflow pseudocode:

goal = "design protein that stabilizes ZIF-8 MOF surface"

while experiment_budget > 0:

    # 1 Hypothesis generation
    hypothesis = GPT5.generate_hypothesis(
        goal,
        literature_context,
        previous_results
    )

    # 2 Protein design
    candidate_proteins = RFdiffusion.design(
        target_surface="ZIF-8",
        binding_motif=hypothesis["motif"],
        count=50
    )

    # 3 Sequence optimization
    sequences = ProteinMPNN.optimize(candidate_proteins)

    # 4 Structure prediction
    structures = ESMFold.predict(sequences)

    # 5 Binding simulation
    scores = docking_simulation(
        structures,
        MOF_surface="ZIF-8"
    )

    # 6 Select best candidates
    top_candidates = select(scores, top=10)

    # 7 Design experiments
    experiment_plan = GPT5.design_experiment(
        proteins=top_candidates,
        assay="MOF binding stability",
        conditions=["pH", "temperature", "salt"]
    )

    # 8 Execute via robotic lab
    results = catalyst_orchestrator.run(experiment_plan)

    # 9 Evaluate
    analysis = GPT5.analyze(results)

    # 10 Update experiment policy
    update_model(analysis)

    experiment_budget -= 1

Final Project Ideas

  1. Make on Earth for space - Send things we couldn’t send before to space in new ways.
  2. Make in space for Earth - Protein crystals and APIs that can only be made in microgravity
  3. Make in space for space - Hardware to enable space synbio

Slides are yet to be updated.

A few more ideas I have been exploring:

  • exploring how MOFs and proteins go hand in hand
  • dealing with composites that have built-in bio-sensors.
  • bio-electric connections

Committed listener page has been updated.

Final ideas submitted:

  • Bio-hybrid Reef builder Robot
  • Protein@MOFs cell-free system for extreme habitat stability
  • Cell free On-chip cartridge framework for modular exobiology research instrumentation

Personal slides link (could have more than 3 ideas, but in shared slide, only 3 are shared): https://docs.google.com/presentation/d/14bcZB3e4qWsHIfbFh_S83iBtNpFeC-ODNHlv9NLe4sI/edit?usp=sharing


Sources

Using a GPT-5-driven autonomous lab to optimize the cost and titer of cell-free protein synthesis

Alexus A. Smith, Edmund L. Wong, Ronan C. Donovan, Brad A. Chapman, Ryan Harry, Pooyan Tirandazi, Paulina Kanigowska, Elizabeth A. Gendreau, Robert H. Dahl, Michal Jastrzebski, Jose E. Cortez, Christopher J. Bremner, José C. Morales Hemuda, James Dooner, Ian Graves, Rahul Karandikar, Christopher Lionetti, Kevin Christopher, Andrew L. Consiglio, Alyssa Tran, William McCusker, Duy X. Nguyen, Isis Botelho Nunes da Silva, Alvaro R. Bautista-Ayala, Monica P. McNerney, Sean Atkins, Michael McDuffie, Will Serber, Bradley P. Barber, Trinh Thanongsinh, Andrew Nesson, Bibek Lama, Brandon Nichols, Cameron LaFrance, Tenzing Nyima, Alicia Byrn, Rashard Thornhill, Bryan Cai, Lizvette Ayala-Valdez, Alycia Wong, Austin J. Che, Walter Thavarajah, Daniel Smith, Thomas F. Knight Jr., David W. Borhani, Jerry Tworek, Mostafa Rohaninejad, Ahmed El-Kishky, Nathan C. Tedford, Tejal Patwardhan, Yunxin Joy Jiao, Reshma P. Shetty

bioRxiv 2026.02.05.703998; doi: https://doi.org/10.64898/2026.02.05.703998

My chatGPT chat asking it to design an experiment: https://chatgpt.com/share/69ad5492-7f1c-8003-87aa-b1758c36f74b

Week 4 HW: Protein design part 1

  • Answer conceptual questions
  • Learn basic concept of protein design
  • Brainstorm how to apply these together in the group project

Part A - Conceptual questions

Amino Acid & Protein Structure Q&A


How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

To find the number of molecules, we first determine the mass of protein and then convert that to moles and molecules.

  1. Protein Mass: Lean meat is roughly 20% protein by weight.

    $500\text{ g} \times 0.20 = 100\text{ g of protein}$.

  2. Moles of Amino Acids: Using the average molecular weight (100 Daltons = 100 g/mol):

    $100\text{ g} / 100\text{ g/mol} = 1\text{ mole of amino acids}$.

  3. Molecules: Using Avogadro’s number ($6.022 \times 10^{23}$):

    You consume approximately $6.022 \times 10^{23}$ amino acid molecules.
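The same three steps can be written as a short calculation. This is a minimal sketch using the assumptions stated above (20% protein by weight, 100 g/mol per amino acid); the function name is made up for this example.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def amino_acid_molecules(meat_g: float, protein_fraction: float = 0.20,
                         avg_residue_mass_g_per_mol: float = 100.0) -> float:
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_g = meat_g * protein_fraction            # step 1: protein mass
    moles = protein_g / avg_residue_mass_g_per_mol   # step 2: moles of amino acids
    return moles * AVOGADRO                          # step 3: molecules


print(f"{amino_acid_molecules(500):.3e} amino acid molecules")
```

Changing `protein_fraction` shows how sensitive the estimate is to the kind of meat assumed.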

Why are there only 20 natural amino acids?

The “Standard 20” represents a biological “frozen accident” that reached a functional optimum early in evolution.

  • Chemical Versatility: These 20 provide a sufficient range of acidity, basicity, hydrophobicity, and polarity to fold into almost any required 3D shape.

  • Genetic Code Constraints: Our triplet codon system ($4^3 = 64$ combinations) must balance diversity with error tolerance.

  • Metabolic Cost: Adding more amino acids requires more complex metabolic pathways and specialized tRNA synthetases.

  • Note: Some organisms do use “extra” ones like Selenocysteine or Pyrrolysine.

Can you make other non-natural amino acids? Design some new amino acids.

Yes, “Non-Canonical Amino Acids” (ncAAs) are frequently synthesized.

  1. The “Photo-Switch”: An amino acid with an azobenzene side chain that changes shape (cis/trans) when hit by light.

  2. The “Click-Linker”: Incorporating an azide or alkyne group into the side chain to allow “Click Chemistry” reactions.

  3. The “Boron-Carrier”: Adding a boronic acid group for creating glucose-sensing proteins.

Where did amino acids come from before enzymes that make them, and before life started?

Prebiotic chemistry provided several pathways:

  • Strecker Synthesis: Spontaneous formation in a “primordial soup” containing ammonia, hydrogen cyanide, and aldehydes.

  • The Miller-Urey Experiment: Demonstrated that sparking a mixture of $CH_4$, $NH_3$, $H_2$, and $H_2O$ creates various amino acids.

  • Hydrothermal Vents: High pressure and temperature gradients at the ocean floor catalyze organic formation.

  • Exogenesis: Amino acids found on meteorites suggest they formed in space via UV radiation.

If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Natural L-amino acids form right-handed α-helices. Because D-amino acids are the mirror image (enantiomer) of L-amino acids, a polymer made entirely of D-amino acids would form a left-handed α-helix.

Can you discover additional helices in proteins?

Yes, they are classified by their hydrogen-bonding patterns:

  • $3_{10}$ helix: A tighter, more elongated helix often found at the ends of $\alpha$-helices.

  • $\pi$-helix: A wider, shorter helix that is relatively rare and often associated with functional sites.

  • Polyproline II helix: A left-handed helix that doesn’t rely on internal hydrogen bonds; common in collagen.

Why are most molecular helices right-handed?

This is due to homochirality. In biology, almost all amino acids are L-isomers. When L-amino acids link together, the steric hindrance (physical bumping) between the side chains and the backbone is minimized in a right-handed geometry. A left-handed helix made of L-amino acids would cause the side chains to clash significantly.

Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

$\beta$-sheets are “sticky” at their edges.

  • The Driving Force: The primary drivers are hydrogen bonding and the hydrophobic effect.

  • Unlike $\alpha$-helices, the peptide backbone at the edge of a $\beta$-strand has “exposed” donors and acceptors. If a strand doesn’t find a partner within its own protein, it will bind to a neighboring protein.

Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

Amyloid Formation: When proteins misfold, they expose hydrophobic segments and backbone hydrogen-bonding groups that snap into stable, insoluble cross-$\beta$ structures. These are incredibly resistant to degradation.

Use as Materials: They are being researched for:

  • Nanowires: Conductive amyloids for bio-electronics.

  • Drug Delivery: Hydrogels for slow-release medication.

  • Adhesives: Mimicking super-strong underwater glues used by bacteria.

Part B - Protein analysis and visualization

  1. Briefly describe the protein you selected and why you selected it.
  2. Identify the amino acid sequence of your protein
    1. How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
      • The total sequence length is 336, with L (leucine) being the most frequent amino acid.
    2. How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.
      • There are around 250 homologs.
    3. Does the protein belong to a family?
      • Yes - GH116 Glycosyl-hydrolase family 116 catalytic region domain-containing protein
  3. Identify the structures page of your protein in RCSB - RCSB PDB - 1GOW: BETA-GLYCOSIDASE FROM SULFOLOBUS SOLFATARICUS
    1. When the structure was resolved
      1. Released: 1997-08-20 at resolution of 2.60 Å
    2. Are there any other molecules in the solved structure apart from protein
      1. No
    3. Does your protein belong to any structure classification family?
      1. Glycoside Hydrolase Family
  4. PyMOL section
    1. Download the PDBx/mmCIF file of the protein from RCSB.
    2. In the import section, select the above file.
    3. The protein is visualized in the main console. On the right-side panel, the “ASHLC” buttons (Action, Show, Hide, Label, Color) next to the object name are what change the visualization.
    • Show cartoon, ribbon and sticks/stones view
    • Color by secondary structure
    • Color by residue type
    • Show holes
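The length and amino acid count from step 2.1 above can also be reproduced outside the Colab notebook with the standard library. A minimal sketch: `residue_frequencies` is a made-up helper name, and the toy sequence is hypothetical, not the homework protein.

```python
from collections import Counter


def residue_frequencies(sequence: str) -> list[tuple[str, int]]:
    """Return (residue, count) pairs sorted from most to least frequent."""
    # Drop whitespace and line breaks so pasted FASTA bodies work too
    cleaned = "".join(sequence.split()).upper()
    return Counter(cleaned).most_common()


toy_sequence = "MLLKALLG"  # hypothetical toy sequence
counts = residue_frequencies(toy_sequence)
print(f"length: {len(toy_sequence)}, most frequent: {counts[0]}")
```

Pasting the full protein sequence in place of `toy_sequence` gives the length and the most frequent residue in one run.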

PyMOL screenshots

PyMOL protein visualizations

Cartoon view of the molecule: select S, then cartoon.

Ribbon view: select S, then show as, then ribbon, on the 1GOW object rather than on all.

Show as spheres first, then in the same drop-down click on sticks as well.

Color the protein by its secondary structure

Colored by secondary structure. It seems to have more helices than sheets, but more loops than both. Cyan is helix, pink is sheet, orange is loop.

Color by Residue, hydrophobicity and hydrophilicity

  • Import the Python script and run it from File > Run Script.
  • In the command line below, run the script name, in this case color_h

Source: Mapping properties onto a structure: Electrostatic potential, conservation, hydrophobicity/polarity

The redder a part is, the more hydrophobic it is; white is not necessarily hydrophilic. In the second image below, green marks polar residues and white marks non-polar residues, so green could indicate hydrophilic regions.

Check if there are any “holes”

There don’t seem to be any outright holes, but there are a lot of surface grooves and gaps that could act as binding pockets.

Part C - Using ML based protein design tools

Copy of google colab - https://colab.research.google.com/drive/1Hn82J2OK4n2e_SrKc0UW3Pw9U4y6dzv6?usp=sharing

C1 Protein Language Modelling

Deep Mutational scan

  1. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
  2. Can you explain any particular pattern? (choose a residue and a mutation that stands out)

Used ESM2 to run a mutational scan on the protein above: I pasted the amino acid sequence into the code and executed it in relative mode. The model used was esm2_t6_8M_UR50D.

There is an interesting pattern: there are 5-6 major stretches where a mutation is predicted to be detrimental, and right before those sites there are positions where a mutation is predicted to be beneficial. This holds when the target amino acid is replaced by any other amino acid.

  • According to the graph, the most dangerous amino acid to substitute in seems to be P, because there is a horizontal line showing that replacement with P is detrimental across positions.
  • Replacing the 162nd amino acid with T is predicted to be a very beneficial mutation.
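One way to read the heatmap: each cell compares how likely the model finds the mutant residue versus the wild-type residue at that position. The sketch below assumes that "relative mode" means a log-likelihood ratio, log p(mutant) minus log p(wild type), which is a common convention for language-model mutational scans; the Colab's exact definition is not shown here. The probabilities are made-up toy numbers, not ESM2 output, and `relative_scores` is a hypothetical helper.

```python
import math


def relative_scores(position_probs: dict[str, float], wild_type: str) -> dict[str, float]:
    """Log-likelihood ratio of each residue versus the wild type at one position."""
    if wild_type not in position_probs:
        raise KeyError(f"no probability given for wild-type residue {wild_type!r}")
    wt_logp = math.log(position_probs[wild_type])
    # Residues with zero probability are skipped because log(0) is undefined
    return {aa: math.log(p) - wt_logp for aa, p in position_probs.items() if p > 0}


# Toy probabilities for a single position whose wild-type residue is A
toy_probs = {"A": 0.50, "T": 0.25, "G": 0.24, "P": 0.01}
scores = relative_scores(toy_probs, wild_type="A")
for aa, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"A -> {aa}: {score:+.2f}")
```

Under this convention a negative score means the model finds the mutant less likely than the wild type (read as detrimental), and a positive score means it finds the mutant more likely.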

There are a few patterns in the latent-space graph, but most proteins seem to be similar to each other. Clusters of about 10-15 proteins sit on the outskirts. The t-SNE color scale also tracks the vertical axis, so proteins lower in the plot tend to be lower along one other dimension as well.

I added our protein, the beta-glycosidase, to the existing list of proteins and redid the plot. Our target is part of a small cluster at the edge of the graph, shown in red.

  • It is grouped with 2 other yeast- and bacteria-based glucoamylases
  • Nisin biosynthesis protein
  • arylamine N-acetyltransferase
  • epimerase
  • mannosidase
  • putative NAG-isomerase
  • endo-β-1,4-glucanase

The code below was used to add it to the rest of the sequences, and then to create the graph.

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Create a new sequence (can be any protein or nucleic acid sequence)
new_seq_data = "MGRFAIYEAPQNCPYLGTIGACYEFGSLPVILMFPELEKSFLKLLIRHIREDGYVPHDLGYHSLDSPIDGTTSPPRWKDMNPSLILLVYRYFKFTNDIEFLKEVYPILVKVMDWELRQCKGNLPFMEGEMDNAFDATIIKGHDSYTSSLFIGSLIAMREIAKLVGDSNYVDFISEKLSSAREAFRRMFNGRYFKAWDSVDNASFLAQLYGEWFTTLVGLEDIVEEDIIKKALESIIRLNGNASPHCVPNLVDDNGKIVGLSVQTYSSWPRMVFAICWLAYKKGVGDLSFCKKEWDNLVKNGMVWDQPSRINGYNGKAEMNYLDHYIGSPSPWSFLF"

new_seq = Seq(new_seq_data)
# Create a new SeqRecord object
new_record = SeqRecord(
    new_seq,
    id="Somename_",
    name="Somename_",
    description="d1dlwa_ a.1.1.1 (A:) Sulfolobus solfataricus {archaea (extremophile) [TaxId: 2287]}"
)
 
# Add the new SeqRecord object to the sequences list
sequences.append(new_record)
print("New SeqRecord added to the sequences list.")
print("First three entries of the updated sequences list:")
print(sequences[0:3])





from sklearn.manifold import TSNE
import plotly.express as px
import numpy as np
import pandas as pd

# Convert the list of embeddings to a numpy array if not already done
embeddings_array = np.array(embeddings)
protein_sequence_annotations = [str(record.description) for record in sequences]
print(f"Shape of embeddings array before 3D t-SNE: {embeddings_array.shape}")

# Apply t-SNE for 3D dimensionality reduction
# Note: scikit-learn 1.5+ renames n_iter to max_iter; swap the keyword if n_iter is rejected
tsne_3d = TSNE(n_components=3, perplexity=30, n_iter=300, random_state=42)
embeddings_3d = tsne_3d.fit_transform(embeddings_array)
print(f"Shape of embeddings array after 3D t-SNE: {embeddings_3d.shape}")

# Create a DataFrame for Plotly
tsne_df_3d = pd.DataFrame(embeddings_3d, columns=['TSNE1', 'TSNE2', 'TSNE3'])
# Create a category column to highlight the last added sequence
tsne_df_3d['category'] = 'Other sequences'

# Assuming the last element in `embeddings` (and thus `embeddings_3d`) corresponds to the last added sequence
tsne_df_3d.loc[len(tsne_df_3d) - 1, 'category'] = 'Last added sequence'  

# Define custom colors
color_map = {'Other sequences': 'blue', 'Last added sequence': 'red'}
# Visualize with Plotly 3D scatter plot, coloring by the new category
fig_3d = px.scatter_3d(
    tsne_df_3d,
    x='TSNE1',
    y='TSNE2',
    z='TSNE3',
    color='category', # Color points based on the new category
    color_discrete_map=color_map, # Apply custom colors
    title='3D t-SNE Visualization of Protein Sequence Embeddings (Last Added Highlighted)',
    hover_name=protein_sequence_annotations[:len(embeddings_array)]
)
fig_3d.update_layout(
    height=800 # Increase the height of the plot
)
fig_3d.show()

C2 Protein Folding

  1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
  2. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

The ESMFold-generated structure looks very different from the structure viewed in PyMOL.

Changed the 162nd amino acid to T, since according to the previous analysis it was supposed to be very beneficial.

C3 Protein Generation

  1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
  2. Input this sequence into ESMFold and compare the predicted structure to your original.

Generated sequence:

T=0.1, sample=0, score=0.7474, seq_recovery=0.5031
MIEFPKDFKFGFSTHPYSSYMGTPGSENPNTDWYAWHRDPENIAAGLVSGDNIDNGPGWWGNYQKYIDAAKELGATIIRTSVDWSAIFPKPLPPDPDFDPASKRVDSVEINPERIAELKKYANQAAIAHYKKILTAIKDAGFHIILNIWAGPLPLWLFDALAVRKGDFSTPIGWLDPRTVVEFALYAGFIVSEFKDLIDEIEVMNEPNDYSYFGYIETEKGYPPGYKSEELSEIALRNLIQAIALARDAIKKVSDLPVGISINTRGFKPATPDDKEAVEKARYDNVDKFLDAIINGVYSEGGEEKTDPSLKGSLDFIGVNYYTYDVVKKEGGGYEILDGYGWRCKKNSVDANGNPTDEAGYEFYPEGLYDVLKYLYDKYKLPMYVTEFGIADKDGTMQPYYIVSNIAQVAKAIKDGVDIKGALYHHLADEYHWSKGWALQYGLLSVDFATKELSRKPAADVFKKIATNNAIPDEIAHLIAKPDVSPLPK

The generated sequence is completely different from the original: it is longer and shares no obvious stretches with it. The amino acid probabilities seem completely random.
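The `seq_recovery` value in the header above is, in the usual inverse-folding convention, the fraction of positions where the designed sequence matches the sequence of the input structure. A minimal sketch of that calculation, assuming two already-aligned sequences of equal length; the helper name and the toy strings are hypothetical, and the tool's own definition may differ in detail.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must be aligned to the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Toy example: 6 of the 8 positions are identical
print(sequence_recovery("MKTAYIAK", "MKSAYLAK"))
```

Because the comparison is position by position, it only makes sense against the sequence of the structure that was used as input to the design step.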

Regenerated inverse folded protein:

This looks drastically different, with one extra helix, and different placement for loops as well as sheets

Part D - Group Brainstorm on bacteriophage engineering

  • Form team of 3-4 people
  • Read through phage reading resources
  • Review bacteriophage final project goals for engineering the L-protein
  • Do brainstorm session
  • Include plan to engineer phage on the website

Group brainstorm result:

  1. Use ESMFold to see in what ways the protein can mutate.
  2. Use analytical methods to find target protein regions.
  3. Use genomic language models to create more lytic proteins and their target regions.
  4. Check folding again using AlphaFold.

Week 5 HW: Protein design part 2

  • Design short peptides that bind mutant SOD1.
  • Then decide which ones are worth advancing toward therapy.

Uniprot SOD1 protein sequence:

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Apply the A4V mutation (numbering ignores the initiator methionine). The fifth letter below, V, used to be A:
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
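The substitution can also be applied programmatically, which guards against editing the wrong position by hand. A minimal sketch: `apply_point_mutation` is a hypothetical helper, and the `skip_initiator_met` flag encodes the numbering convention used above (position 4 is counted after the initial methionine).

```python
WILD_TYPE_SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_point_mutation(sequence: str, mutation: str,
                         skip_initiator_met: bool = True) -> str:
    """Apply a mutation written like 'A4V' (wild type, 1-based position, mutant)."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    if position < 1:
        raise ValueError("position must be 1 or greater")
    # When the initiator Met is not counted, position p sits at 0-based index p
    index = position if skip_initiator_met else position - 1
    if index >= len(sequence):
        raise ValueError("position is beyond the end of the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


mutant_sod1 = apply_point_mutation(WILD_TYPE_SOD1, "A4V")
print(mutant_sod1[:11])
```

Checking the wild-type residue before substituting catches off-by-one mistakes in the numbering convention.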

PepMLM

Generated peptides and their perplexity score:

  • KRYPAAGIELKE - 17
  • WSYPPTAAEHWK - 15
  • WSYPWAAAKHAX - 12
  • WHYYWYAARRKX - 11
  • FLYRWLPSRRGG - 20

Perplexity measures how well a probability model predicts a sample. It has no upper bound, and the closer it is to 1, the better the language model predicts the sample. Our scores are quite close.

PepMLM peptides
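For reference, perplexity is the exponential of the average negative log-probability that the model assigns to the observed tokens. The sketch below computes it from a list of per-residue probabilities. The inputs are toy values, not PepMLM output, and whether PepMLM scores peptides in exactly this way is an assumption.

```python
import math


def perplexity(token_probabilities: list[float]) -> float:
    """exp of the mean negative log-probability assigned to the observed tokens."""
    if not token_probabilities:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probabilities) / len(token_probabilities)
    return math.exp(mean_nll)


# A model that guesses uniformly over the 20 standard amino acids
# assigns probability 1/20 to every residue of a 12-residue peptide
uniform_guess = [1 / 20] * 12
print(round(perplexity(uniform_guess), 2))
```

A uniform guess over 20 amino acids gives a perplexity of 20, which is a useful baseline when reading the scores listed above.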

AlphaFold

For the FLYRWLPSRRGG peptide: ipTM score - 0.36