Assignment: Python Script for Opentrons Artwork I had to write a Python script for an art design. I chose to create a silhouette of the Indian subcontinent, with my city highlighted. I did that using the Opentrons Artwork website. I thought I would make a pattern of sorts with code, but I realized that would be time consuming and not very symbolic. I got a clipart of India from Google, cropped it, and then used that to generate my artwork. It didn’t look very good; I had to fiddle around with the contrast, brightness and other values to make it work. It still wasn’t looking how I’d expected it to, so I decided to redo it.
Part A : Conceptual Questions 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Amino Acids are protein building blocks, so whatever percent of protein the meat contains is technically the AA content. A quick google search tells me that most cooked meats contain 20%-30% protein by weight. I’ll take 25% as my number. Now, 25% of 500g is 125g. (500/4)
Part A: SOD1 Binder Peptide Design Part 1: Generating Binders with PepMLM Part 2: Evaluating Binders with AlphaFold3 Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse Part 4: Generate Optimized Peptides with moPPIt Part B: BRD4 Drug Discovery Platform Tutorial Part C: Final L Protein Mutants
DNA Assembly What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
The Phusion Hi-Fi PCR Master Mix has multiple components. The Phusion Hot Start II DNA Polymerase is the central enzyme. It is a polymerase with 3’ to 5’ exonuclease activity that corrects mismatched bases and therefore has a lower error rate than Taq polymerase. The ‘Hot Start’ part of the name refers to a modification that keeps the enzyme inactive until the initial denaturation step, so that the polymerase doesn’t produce nonspecific products at room temperature. The mix also contains dNTPs, which is a given as the nucleotides are the building blocks used for extension. Another component is MgCl2, which is a cofactor required for polymerase activity. The magnesium ion helps catalyze the formation of the phosphodiester bond between nucleotides. Magnesium ions are also needed to form the active dNTP substrate that is recognized by the polymerase. (The magnesium ions neutralize some of the charge of the triphosphate group so it can fit into the active site of the polymerase without hindrance.) Other components in the mix are the reaction buffer and stabilizers. The buffer maintains the optimal pH and ionic strength for enzyme activity. What are some factors that determine primer annealing temperature during PCR?
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Traditional genetic circuits are Boolean, like the question says. They can therefore only be ‘on’ or ‘off’ and can only compute Boolean functions, which limits the cell’s computational ability. IANNs are different in that they produce continuous signals and can take in multiple inputs. I think the benefits of IANNs over conventional genetic circuits are analogous to the benefits of a neural network over a hard-coded solution. IANNs can react to novel inputs, whereas conventional genetic circuits can only respond to the inputs they were designed for. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
Homework Part A: General and Lecturer-Specific Questions General homework questions Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Describe the main components of a cell-free expression system and explain the role of each component. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each. Homework question from Kate Adamala Design an example of a useful synthetic minimal cell as follows:
Subsections of Homework
Week 1 HW: Principles and Practices
A1. I want to develop a living biological tool that works like a chromatography instrument. I have been thinking for a long time: what if we could use 3D bioprinting to create a living tissue (I like to call it an ‘organstrument’) that can selectively bind and separate ions/molecules? I propose it could work similarly to an ion-exchange or affinity chromatography column, but instead of using mechanical parts it would be bio-engineered. It would be made of cells and biomaterials that do the separation biologically.
The idea came to me while attending an industrial training program where we were being taught chromatography and other such techniques; the program also had a guest lecture on 3D bioprinting. While being taught HPLC, I asked if I could do it hands-on and try things out. The lecturer refused, because if anything went awry and the column got damaged, it would cost the university a lot of money. I thought that if the current instruments are so expensive and rigid, how about replacing them entirely with biological ones? The biological nature might allow for self-adaptation and regeneration. Perhaps this could open the way for bio-disposable columns (a one-and-done kind of thing). One thing that also fascinates me is the ethics behind the idea of using a living system as a tool.
A2. The overall goal should be that these ‘organstruments’ are developed and used safely, without causing any kind of harm. The goal can be divided into two subgoals.
a. Prevent harm - if the tool is highly efficient, it has to be made sure that it is not used for concentrating toxic compounds. Access will have to be monitored, or the tool will have to be tested for alternate use cases.
b. Ensure safety and environmental protection - reduce risks of contamination and make sure these tissues cannot evolve outside controlled settings.
A3. Governance Actions
Safe by Design - Biosafety is a major factor to consider if these tools are going to be used. Biosafety currently depends on lab training and rules; we would have to make the tools safe by design. Building in a dependence on a specific nutrient, or using non-replicating cells, would help achieve that. The assumptions here are that these safety mechanisms work properly and reliably and that the standards of use are followed honestly. Risks: safety mechanisms may fail over time, and the design becomes more complex.
Class-based tool division - Dividing the organstruments into categories based on their risk would make it easier to reduce malicious use of the tools. Low-risk tools would be open to use, high-risk tools would have restricted access, and so on. The assumption is that risks can be clearly defined. Risks: heavy reliance on documentation and approvals for restricted access could slow down research.
Transparency and shared registry - Researchers would voluntarily register organstrument designs, uses and safety features. If we provide incentives for doing so, the shared information would help make the tools better and safer. The risks here are that the shared information could be misused and that the friction of registration could lead researchers not to register, so the registration process would have to be made as smooth as possible.
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 2 | 3 |
| • By helping respond | 2 | 1 | 2 |
| Foster Lab Safety | | | |
| • By preventing incident | 1 | 2 | 2 |
| • By helping respond | 2 | 2 | 1 |
| Protect the environment | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 2 | 1 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 1 |
| • Feasibility? | 1 | 2 | 1 |
| • Not impede research | 2 | 3 | 1 |
| • Promote constructive applications | 2 | 2 | 1 |
Week 10 HW: Imaging and Measurement
Homework: Final Project
What will I measure?
My final project involves modifying an existing flowering plant species to enhance anthocyanin pigment production, so metric number one is how much pigment is being produced. The reason we’re tweaking anthocyanin production is to turn the petals of the plant into reliable pH indicators, so I would also need to measure the change in color, the rate at which the pigment deteriorates after a petal is plucked, the correlation between temperature and pigment concentration, and the overall pigment concentration in the petals, to see whether it can be even roughly standardized. (Not all petals may contain the same amount of pigment, which they would need to for the indicator to function as intended. So we tweak and check whether all petals at least have a similar concentration, and if not, what the limiting factor is, e.g. specific environmental conditions.)
What would I like to measure and how ?
I am not really sure about the entire list of things to measure, but I have an idea. I would like to measure:
- total anthocyanin pigment concentration in petals (HPLC-MS to identify the kinds of anthocyanins being produced) (to know if enough pigment is being produced to even have a color change)
- petal color response to pH, and variability between two petals/two plants (not sure yet, some type of colorimetry I guess, testing against solutions of known pH) (to check if the plant even works reliably)
- pH response accuracy compared to high-accuracy pH meters/litmus paper (create some sort of calibration curve and compare to pH meters) (to see how the petals compare to standard methods)
- some measure of gene expression (using RNA-seq?) (to check the expression level of the inserted/modified genes)
- metabolic activity / metabolomics (LC-MS to separate and identify metabolites) (to optimize the anthocyanin synthesis pathway)
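To make the "calibration curve" idea a bit more concrete, here is a minimal sketch of how petal color readings could be calibrated against solutions of known pH. Everything in it is an assumption for illustration: the color signal (for example a hue angle from a photo), the function names `fit_line` and `predict_ph`, and above all the idea that the response is roughly linear over the working range, which real anthocyanins may not satisfy. The numbers in the usage note are made up, not measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_ph(signal, slope, intercept):
    """Invert the calibration line: color signal -> estimated pH."""
    return (signal - intercept) / slope


# Hypothetical calibration points: known buffer pH vs. measured color signal.
known_ph = [4.0, 6.0, 8.0, 10.0]
color_signal = [10.0, 20.0, 30.0, 40.0]

slope, intercept = fit_line(known_ph, color_signal)
estimated = predict_ph(25.0, slope, intercept)
```

The estimated pH for an unknown sample could then be compared with a pH meter reading of the same solution to judge how accurate the petals are.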
What technologies would I use?
Mostly chromatographic techniques to separate and study the pigments, to study the metabolic activity. Techniques like RNA-seq will help me study the genes behind the metabolic pathways and look for ways to optimize/upregulate.
Side Note
While trying to figure out how to make this work, I came across a flower species, ‘Clitoria ternatea’, also known as Butterfly Pea.
The flower already shows a wide range of color changes in response to pH. As it contains ternatins, which are among the most stable anthocyanins, it shows a color change across the pH 4-12 range, which makes it pretty usable. I think it can be picked as the candidate species, since upregulating and optimizing the existing pathways and inducing sterility might be enough to make the final project possible.
I also found out that ternatins are reported to kill cancer cells and inhibit fat accumulation, which led me to think that maybe a tea from butterfly pea would help with the fat that my body is genetically inclined to store, and it turns out Butterfly pea tea is a REAL THING!
Homework: Waters Part I — Molecular Weight
Assignees for this section
- MIT/Harvard students: Required
- Committed Listeners: Required
We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).
Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/
eGFP Sequence:
MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH
Note: This contains a His-purification tag (HHHHHH) and a linker (the LE before it).
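As a rough cross-check on the online calculator, the average molecular weight can also be estimated directly from the sequence. The sketch below is only an illustration, not how Expasy implements it: it sums standard average residue masses and adds one water for the free termini. The function name `average_mw` and the optional `chromophore_loss` argument (roughly 20 Da is lost when the GFP chromophore matures) are my own assumptions.

```python
# Average residue masses in Da (amino acid minus one water), standard values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water added for the N- and C-terminal groups

# eGFP standard sequence from the assignment (spaces removed below).
EGFP = (
    "MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL "
    "VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV "
    "NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD "
    "HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE "
    "HHHHHH"
).replace(" ", "")


def average_mw(seq, chromophore_loss=0.0):
    """Average molecular weight in Da of an unmodified linear peptide.

    chromophore_loss is an optional correction (assumption: about 20 Da
    for a mature GFP chromophore) subtracted from the total.
    """
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER - chromophore_loss


mw_plain = average_mw(EGFP)
mw_mature = average_mw(EGFP, chromophore_loss=20.0)
print(f"Unmodified: {mw_plain:.1f} Da, with chromophore correction: {mw_mature:.1f} Da")
```

Whether the N-terminal methionine is still present, and whether the chromophore correction should be applied, are the "known modifications" the question is asking about, so the sketch leaves both as choices.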
Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:
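Determine $z$ for each adjacent pair of peaks, where $(m/z)_1 > (m/z)_2$ carry charges $z$ and $z+1$ and $1.00728$ is the mass of a proton in Da, using: $z = \frac{(m/z)_2 - 1.00728}{(m/z)_1 - (m/z)_2}$
Determine the MW of the protein using the relationship between $m/z$, $z$, and $MW$: $MW = z \times \left((m/z)_1 - 1.00728\right)$
Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1 using: $\text{ppm error} = \frac{MW_\text{measured} - MW_\text{predicted}}{MW_\text{predicted}} \times 10^6$
Figure 1. Mass Spectrum of intact eGFP protein from the Waters Xevo G3 LC-MS (a mass spectrometer with 30,000 resolution) with individual charge state peaks labeled with $m/z$ values.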
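The adjacent charge state approach can be sketched in a few lines. This is a minimal illustration under the usual textbook assumptions (positive-mode ions that differ only in the number of attached protons, proton mass 1.00728 Da); the function names are mine, and the example peaks are generated from a made-up 28,000 Da protein rather than read from Figure 1.

```python
PROTON = 1.00728  # mass of a proton in Da


def charge_from_adjacent(mz_high, mz_low):
    """Charge z of the peak at mz_high, given the neighbouring peak at
    mz_low, which carries one more proton (charge z + 1)."""
    return (mz_low - PROTON) / (mz_high - mz_low)


def neutral_mass(mz, z):
    """Neutral molecular weight from one peak's m/z and its charge."""
    return z * (mz - PROTON)


def ppm_error(measured, predicted):
    """Mass accuracy in parts per million."""
    return (measured - predicted) / predicted * 1e6


# Hypothetical example: two adjacent peaks of a 28,000 Da protein.
true_mass = 28000.0
mz_20 = true_mass / 20 + PROTON  # peak carrying 20 protons
mz_21 = true_mass / 21 + PROTON  # peak carrying 21 protons

z = round(charge_from_adjacent(mz_20, mz_21))
mass = neutral_mass(mz_20, z)
print(z, round(mass, 2), round(ppm_error(mass, true_mass), 3))
```

With real data the two m/z values would be read off the labeled peaks, and repeating the calculation for several adjacent pairs gives a feel for how consistent the deconvoluted mass is.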
Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?
Homework: Waters Part II — Secondary/Tertiary structure
Assignees for this section
- MIT/Harvard students: Optional but highly recommended
- Committed Listeners: Optional but highly recommended
We will analyze eGFP in its native, folded state and compare it to its denatured, unfolded state on a quadrupole time-of-flight MS. We will be doing MS-only analysis (no liquid chromatography, also known as “direct infusion” experiments) on the Waters Xevo G3-QToF MS.
Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?
Figure 2. Comparison of the mass spectra between denatured (top) and native (bottom) eGFP standard on the Waters Xevo G3 QTof MS.
Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 $m/z$? What is the charge state? How can you tell?
Figure 3. Native eGFP mass spectrum from the Waters Xevo G3 Q-Tof MS. The inset is a zoomed-in view of the charge state at ~2800 $m/z$ on a mass spectrometer with 30,000 resolution.
Homework: Waters Part III — Peptide Mapping - primary structure
Assignees for this section
- MIT/Harvard students: Required
- Committed Listeners: Required
We will digest the eGFP protein standard into peptides using trypsin (an enzyme that selectively cleaves the peptide bond after Lysine (K) and Arginine (R) residues). The resulting peptides will be analyzed on the Waters BioAccord LC-MS to measure their molecular weights and fragmented to confirm the amino acid sequence within each peptide, generating a “peptide map”. This process is used to confirm the primary structure of the protein.
There are a variety of tools available online to calculate protein molecular weight and predict a list of peptides generated from a tryptic digest. We will be using tools within the online resource Expasy (the bioinformatics resource portal of the Swiss Institute of Bioinformatics (SIB)) to predict a list of tryptic peptides from eGFP.
How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).
How many peptides will be generated from tryptic digestion of eGFP?
Copy/paste the sequence above into the input box in the PeptideMass tool to generate expected list of peptides.
Use Figure 4 below as a guide for the relevant parameters to predict peptides from eGFP.
Click “Perform the Cleavage” button in the PeptideMass tool and report the number of peptides generated when using trypsin to perform the digest.
Figure 4. Example conditions for predicting the number of tryptic peptides from the eGFP standard. Please replicate all parameters shown above.
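Counting K and R and predicting the tryptic peptides can also be done with a short script, as a sanity check on Benchling and PeptideMass. This is a simplified sketch: it cleaves after every K or R, optionally skipping sites followed by proline (a commonly used rule), and it does not model missed cleavages or any mass cut-off that the PeptideMass settings in Figure 4 may apply, so the peptide count may differ from the tool's. The function names are my own.

```python
import re

# eGFP standard sequence from Waters Part I (spaces removed below).
EGFP = (
    "MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL "
    "VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV "
    "NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD "
    "HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE "
    "HHHHHH"
).replace(" ", "")


def count_k_r(seq):
    """Return (number of lysines, number of arginines)."""
    return seq.count("K"), seq.count("R")


def tryptic_peptides(seq, skip_before_proline=True):
    """Split after every K or R; optionally do not cut when the next
    residue is P. Requires Python 3.7+ (split on zero-width matches)."""
    pattern = r"(?<=[KR])(?!P)" if skip_before_proline else r"(?<=[KR])"
    return [peptide for peptide in re.split(pattern, seq) if peptide]


lysines, arginines = count_k_r(EGFP)
peptides = tryptic_peptides(EGFP)
print(f"K: {lysines}, R: {arginines}, predicted peptides: {len(peptides)}")
```

Joining the peptides back together should always reproduce the original sequence, which is a quick way to confirm that nothing was lost in the split.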
Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.
Figure 5a. Total ion chromatogram (TIC) of the eGFP peptide map. The peak at 2.78 minutes is circled, and its MS data is shown in the mass spectrum in Figure 5b, below.
Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?
Identify the mass-to-charge ($m/z$) of the peptide shown in Figure 5b. What is the charge ($z$) of the most abundant charge state of the peptide? (Use the separation of the isotopes to determine the charge state.) Calculate the mass of the singly charged form of the peptide ($[M+H]^+$) based on its $m/z$ and $z$.
Figure 5b. Mass spectrum figure to show for the chromatographic peak at 2.78 min from Figure 5a above. The inset is a zoom-in of the peak at 525.76, to discern the isotope peaks.
Figure 5c. Fragmentation spectrum of the peptide eluting at retention time 2.78 minutes in Figure 5a (above).
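The isotope-spacing trick can be written down as a small calculation. Neighbouring isotope peaks differ by about 1 Da in mass, so on the m/z axis they sit roughly 1/z apart. The sketch below assumes that relationship and a proton mass of 1.00728 Da; the function names are mine, and the spacing of 0.5 used in the example is hypothetical, since the real value has to be read from the inset of Figure 5b.

```python
PROTON = 1.00728  # mass of a proton in Da


def charge_from_isotope_spacing(spacing):
    """Isotope peaks are ~1 Da apart in mass, so ~1/z apart in m/z."""
    return round(1.0 / spacing)


def singly_charged_mass(mz, z):
    """Convert an observed m/z at charge z to the [M+H]+ value."""
    return mz * z - (z - 1) * PROTON


# m/z 525.76 is the peak named in the Figure 5b caption;
# the 0.5 spacing is a hypothetical reading, not taken from the figure.
z = charge_from_isotope_spacing(0.5)
mh_plus = singly_charged_mass(525.76, z)
print(z, round(mh_plus, 4))
```

The resulting [M+H]+ value is what would be compared against the singly charged masses listed by the PeptideMass tool.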
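Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is the mass accuracy of the measurement? Please calculate the error in ppm. (Recall that $\text{ppm error} = \frac{m_\text{observed} - m_\text{theoretical}}{m_\text{theoretical}} \times 10^6$.)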
What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)
Figure 6. Amino Acid Coverage Map of eGFP based on BioAccord LC-MS peptide identification data.
Bonus Peptide Map Questions
Can you determine the peptide sequence for the peptide fragmentation spectrum shown in Figure 5c? (HINT: Use your results from Question 2 above to match the peptide molecular weight that is closest to that shown in Figure 5b. Copy and paste its sequence into this tool online to predict the fragmentation pattern based on its amino acid sequence: http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html.) What is the sequence of the eGFP peptide that best matches the fragmentation spectrum in Figure 5c?
Does the peptide map data make sense, i.e. do the results indicate the protein is the eGFP standard? Why or why not? Consult with Figure 6, which depicts the % amino acid coverage of peptides positively identified using their calculated mass and fragmentation pattern.
Homework: Waters Part IV — Oligomers
Assignees for this section
- MIT/Harvard students: Required
- Committed Listeners: Required
We will determine Keyhole Limpet Hemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 1) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 7):
7FU Decamer
8FU Didecamer
8FU 3-Decamer
8FU 4-Decamer
| Polypeptide Subunit Name | Subunit Mass |
|---|---|
| 7FU | 340 kDa |
| 8FU | 400 kDa |

Table 1: KLH Subunit Masses
Figure 7. Mass spectrum of Keyhole Limpet Hemocyanin (KLH) acquired on the CDMS.
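The expected position of each oligomer on the CDMS spectrum follows from simple multiplication of the subunit masses in Table 1. The sketch below assumes that a decamer contains 10 subunits, a didecamer 20, and that "3-Decamer" and "4-Decamer" mean 30 and 40 subunits; those subunit counts are my reading of the names, not something stated in the assignment.

```python
# Subunit masses from Table 1, in kDa.
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}

# (species name, subunit type, assumed number of subunits)
SPECIES = [
    ("7FU Decamer", "7FU", 10),
    ("8FU Didecamer", "8FU", 20),
    ("8FU 3-Decamer", "8FU", 30),
    ("8FU 4-Decamer", "8FU", 40),
]


def expected_masses_mda():
    """Expected mass of each oligomer in MDa (1 MDa = 1000 kDa)."""
    return {name: SUBUNIT_KDA[sub] * n / 1000 for name, sub, n in SPECIES}


for name, mass in expected_masses_mda().items():
    print(f"{name}: {mass:.1f} MDa")
```

Each computed mass can then be matched to the nearest peak in Figure 7.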
Homework: Waters Part V — Did I make GFP?
Assignees for this section
- MIT/Harvard students: Required
- Committed Listeners: Required
Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.
| | Theoretical | Observed/measured on the Intact LC-MS | PPM Mass Error |
|---|---|---|---|
| Molecular weight (kDa) | | | |
Week 2 HW: DNA Read, Write & Edit
Week 2 : Pre-HW
Professor Jacobson:
A1. DNA polymerase with proofreading has an error rate of about 1 error per 10⁶ bases (10⁻⁶). This is due to its 3’ to 5’ exonuclease proofreading activity. The human genome is about 3.2 billion base pairs. At an error rate of 10⁻⁶, replication would introduce thousands of errors (about 3,200) per genome copy, which is unacceptable. Biology deals with this via multiple layers of correction: DNA polymerase proofreading, post-replication mismatch repair and other such systems.
A2. The genetic code is degenerate, so there are many possible DNA sequences that could encode the same protein sequence. With roughly three synonymous codons per amino acid on average, the number of possible sequences grows exponentially with protein length and is, in theory, astronomically large for a typical protein. Most of these sequences wouldn’t work well in practice due to biological constraints like codon bias, repetitive sequences causing errors, and inhibition of transcription/translation.
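Both answers come down to short calculations, sketched below. The first multiplies genome size by error rate; the second multiplies the number of synonymous codons for each residue using the standard genetic code, which shows that for all but the shortest proteins the count is far larger than millions. The function names and the example peptide are my own, chosen only for illustration.

```python
import math

# Number of codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def expected_errors(genome_bp, error_rate):
    """Expected number of errors in one copy of the genome."""
    return genome_bp * error_rate


def synonymous_sequences(protein):
    """How many distinct coding sequences encode this protein
    (stop codon not counted)."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein)


print(expected_errors(3.2e9, 1e-6))        # errors per genome copy
print(synonymous_sequences("MSKGEELFTG"))  # first ten residues of GFP
```

Even a ten-residue stretch already allows thousands of encodings, so a full-length protein has vastly more.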
Dr. LeProust:
A1. The most common method is chemical phosphoramidite DNA synthesis. It works via:
Stepwise base addition
Chemical protection/deprotection cycles
Typically ~5 minutes per base addition
A2. It is harder to synthesize oligos longer than ~200 nt as errors accumulate with the addition of every base. Depurination and incomplete reactions increase with time and by the time you reach ~200 nt the drops in yield and accuracy make the product unreliable.
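The way errors accumulate can be illustrated with the stepwise coupling efficiency. If each base addition succeeds with some probability, the fraction of full-length product falls off exponentially with length. The sketch below assumes a simple model in which an n-mer needs n - 1 couplings and every coupling has the same efficiency; the 99% and 99.5% figures are illustrative assumptions, not values from the lecture.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of strands that are full length after synthesis,
    assuming length_nt - 1 independent coupling steps."""
    return coupling_efficiency ** (length_nt - 1)


for efficiency in (0.99, 0.995):
    for length in (50, 100, 200):
        fraction = full_length_fraction(length, efficiency)
        print(f"{length} nt at {efficiency:.1%} per step: {fraction:.1%} full length")
```

At 99% per step, only about one strand in seven is full length by 200 nt, before depurination and other side reactions are even considered.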
A3. Directly synthesizing a 2000 bp gene would accumulate too many errors to be of any actual use. The yield would be extremely low and the process would be expensive. Instead, we use assembly-based approaches to make long genes, using methods like Gibson Assembly, followed by sequencing and error correction.
George Church:
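A1. There are 10 essential amino acids that animals cannot synthesize and must get from food. They are: arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan and valine.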
I wasn’t aware of the ‘Lysine Contingency’, but a quick Google search revealed that it is a reference to “Jurassic Park”, wherein the dinosaurs are engineered so that they cannot synthesize lysine and must receive it externally, which is meant to act as a biological control mechanism. In reality, all animals already lack the ability to synthesize lysine, making them inherently dependent on plants and microbes for it. So the Lysine Contingency describes a dependency that already exists in nature rather than a new safeguard, but if a dependency that could not be met from ordinary food could be engineered, it could be used to control organisms.
Part 1: BENCHLING ADVENTURES AND GEL ART
This week’s homework was pretty daunting as it involved Benchling, something I’d never heard of before. I decided to just follow the steps and figure stuff out as I went.
After creating a Benchling account and logging in, I was greeted by a screen that looked very complex: a plasmid on the right, a DNA sequence on the left, and a lot of restriction sites. I decided to just follow the next step, clicking on the ‘plus’ icon and selecting Import DNA/RNA sequence.
A pop-up window asked me to upload the DNA sequence. I thought I could just add the accession number or something (something I’d used in my graduate biotechnology coursework), but I wasn’t sure, so I decided to stick to the scaffold and just follow the next step. :)
The link to lambda DNA Sequence was in the Google Doc for the homework, I opened the link and right-clicked to save the file.
I made sure that I saved it with the .gb extension, as I was downloading a GenBank file but it was being downloaded as a .txt file. (I didn’t want any uploading problems.)
Then I just dragged and dropped the .gb file into the Benchling pop-up window and the sequence started uploading. (So far so good.)
I was awestruck when I saw the screen post sequence upload; I was being overwhelmed with information. Everywhere I looked, there was something new yet it seemed familiar. I then found the digest button in the side panel on the left (SCISSOR ICON).
When I clicked on the scissor icon, another panel for ‘new digest’ opened up, and it seemed intuitive. I was supposed to add the enzymes from the HW Doc and then do an in-silico restriction digest of the DNA. I managed to add all the enzymes to the list and then clicked on the big blue ‘RUN DIGEST’ button.
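What the digest button does can be approximated with a few lines of code: find every recognition site for the chosen enzymes, record where each one cuts, and measure the distances between cuts. The sketch below is only my illustration of the idea, not how Benchling works internally. It treats the DNA as linear, searches one strand only (fine here because these recognition sites are palindromic), and runs on a short made-up sequence rather than the lambda genome.

```python
# Recognition site and cut offset (position within the site on the top strand).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),     # G^AATTC
    "BamHI": ("GGATCC", 1),     # G^GATCC
    "HindIII": ("AAGCTT", 1),   # A^AGCTT
    "KpnI": ("GGTACC", 5),      # GGTAC^C
    "SacI": ("GAGCTC", 5),      # GAGCT^C
    "NotI": ("GCGGCCGC", 2),    # GC^GGCCGC
}


def cut_positions(seq, enzyme_names):
    """Sorted top-strand cut positions for the given enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    return sorted(cuts)


def fragment_lengths(seq, enzyme_names):
    """Fragment sizes, in order along a linear molecule."""
    edges = [0] + cut_positions(seq, enzyme_names) + [len(seq)]
    return [end - begin for begin, end in zip(edges, edges[1:])]


# Hypothetical 26 bp toy sequence with one BamHI site and one SacI site.
TOY = "AAAA" + "GGATCC" + "TTTTTT" + "GAGCTC" + "AAAA"
print(fragment_lengths(TOY, ["BamHI", "SacI"]))
print(fragment_lengths(TOY, ["NotI"]))
```

An enzyme with no site in the sequence leaves a single uncut fragment, which is the behaviour a non-cutting enzyme gives in a gel lane.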
Okay, so before moving ahead: I was very intimidated by Benchling and the entire homework, so I had tinkered around in the whole HW Doc and had also visited the DNA Gel Art Interface website by ‘rcdonovan’ (https://rcdonovan.com/gel-art). At first I wasn’t able to understand what was happening. I only had a general idea of what this was, and since there were no tutorials or tooltips, I wasn’t sure what each button did. After selecting/deselecting enzymes and pressing the arrows, I found out that this was also like Benchling’s Digest thingie, but it was quicker and allowed faster tweaks. To make sure, I selected all the enzymes in DGAI (I’ll refer to rcdonovan’s website this way to keep things simple) and then tried to replicate the same in Benchling.
While trying to do this, I found out that the table below the enzymes in DGAI was the main thing to focus on. When I clicked on the arrows for a specific well, with a specific combination of enzymes, the table showed which enzymes were used for THAT specific result. This way I was able to find out how to replicate the DGAI gel in Benchling. For some reason, my Benchling results looked slightly different than DGAI’s. (side by side comparison below)
I thought maybe it was because in Benchling, it showed N/A for KpnI and SacI and in DGAI, they were selected??
I decided that to make my pattern, I’d tinker around with different enzyme combinations, note down what they gave me, and then see what I could muster up from the patterns. I did think that maybe I could reverse engineer which combination gives which types of bands, and then make a program that could tell me the closest enzymes I could use to get a particular result, so that I could select the areas to keep ‘on’, like a display. But all that abductive reasoning would be of no use as I wasn’t sure if it could be done, so I continued with trying to make a pattern art.
This is the table that I mentioned a while ago. You can see that in well 1, the enzymes used are BamHI and SacI. I then used the same combination in Benchling and ran a digest.
Mission Successful! I was able to get how to replicate DGAI Gels in Benchling Digest (mostly.)
Fast forward an hour or so of experimenting with multiple combos, and I was able to make something in DGAI that looked like an M. My friends said they did see it, but to me it looked like that one cat meme (minus the whiskers and ears).
Then I used the table and enzymes combos from DGAI to replicate the digest in Benchling!
I had to get a bit crafty as Benchling didn’t allow an empty digest. I googled which enzyme doesn’t cut lambda DNA.
I used NotI in the digest to get an empty well. :) Mission two Successful! I was able to achieve the same output in Benchling.
It was the end but I clicked on a band in the Benchling Gel and found out that it also shows you the exact point where the cut was made and what made THAT band. I figured that if I want to refine my art further, I can maybe use this information to my advantage.
Part 3: DNA Design
3.1. Choose your protein.
The protein that I would choose is Green Fluorescent Protein. I chose GFP because it is used a lot to track other proteins and to see whether proteins are being expressed. It’s just interesting to me that it allows us to study other proteins up close.
The protein sequence for GFP (I used UniProt to get it):
>sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL
VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV
NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD
HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
The nucleotide sequence is:
>reverse translation of sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1 to a 714 base sequence of most likely codons. atgagcaaaggcgaagaactgtttaccggcgtggtgccgattctggtggaactggatggc gatgtgaacggccataaatttagcgtgagcggcgaaggcgaaggcgatgcgacctatggc aaactgaccctgaaatttatttgcaccaccggcaaactgccggtgccgtggccgaccctg gtgaccacctttagctatggcgtgcagtgctttagccgctatccggatcatatgaaacag catgatttttttaaaagcgcgatgccggaaggctatgtgcaggaacgcaccatttttttt aaagatgatggcaactataaaacccgcgcggaagtgaaatttgaaggcgataccctggtg aaccgcattgaactgaaaggcattgattttaaagaagatggcaacattctgggccataaa ctggaatataactataacagccataacgtgtatattatggcggataaacagaaaaacggc attaaagtgaactttaaaattcgccataacattgaagatggcagcgtgcagctggcggat cattatcagcagaacaccccgattggcgatggcccggtgctgctgccggataaccattat ctgagcacccagagcgcgctgagcaaagatccgaacgaaaaacgcgatcatatggtgctg ctggaatttgtgaccgcggcgggcattacccatggcatggatgaactgtataaa
I used Bioinformatics.org Reverse Translation tool to reverse translate the AA sequence to the DNA sequence.
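The idea behind this kind of reverse translation is a simple lookup: each amino acid is replaced by one preferred codon. The sketch below is my own illustration and not the tool's code. The codon table is simply read off the output shown above (one codon per amino acid), and `reverse_translate` is a hypothetical name.

```python
# One codon per amino acid, as observed in the reverse-translated output above.
MOST_LIKELY_CODON = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}


def reverse_translate(protein):
    """Replace each residue with its single preferred codon."""
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein.upper())


# First ten residues of the GFP sequence used above.
print(reverse_translate("MSKGEELFTG"))
```

Because every residue maps to exactly one codon, the output length is always three times the protein length, which matches the 714 bases reported for the 238-residue GFP.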
3.3. Codon optimization.
I would optimize the codon usage for E.coli because it grows fast, it is well-researched. The thing about GFP is that it will not be the main protein of interest but rather it’ll be used to study one. So if I have a protein that I have expressed in a certain microorganism, then I will have to optimize the codon according to that.
About why do we optimize codons, I know this! I once had a question in my mind that why is thermos thermophilus heat resistant. Why can it live in such high temperatures? I basically went on a bioinformatics quest. To answer the question so I. First, my hypothesis was that maybe it has more GC content because GC has three hydrogen bonds and just having overall more hydrogen bonds would make it more heat stable. Then to validate my hypothesis, I had to see what genes it had and I had to compare it genes with. E coli. I saw that the codons with GC were preferred more. (Codon Bias) https://www.youtube.com/watch?v=1Jrawq9fnMs&t=1791s
The reason that we optimize codons is because certain microorganisms have their own preferences of codons to use. It could be so that if the organism has a preference of a certain codon then that tRNA which is required for the protein expression is in abundance and if you pick a codon whose tRNA is not readily available in that organism, then there is a chance that because of the lack of the tRNA the protein might not be expressed. Therefore in order to increase the chances of expression we have to optimize the codons for our nucleotide sequence.
I used VectorBuilder to optimize the codons. The interface is pretty intuitive: just paste your sequence and select the organism. (There was also an option to avoid sites for certain restriction enzymes. I think that is so the sequence does not contain sites for enzymes we might use later, which would otherwise cut our DNA during cloning?)
GFP protein DNA sequence with codons optimized for E.Coli ATGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAATGGCCATAAATTTAGCGTGAGCGGCGAAGGTGAAGGCGATGCGACCTATGGCAAACTGACCCTGAAATTTATCTGCACCACCGGTAAACTGCCGGTGCCGTGGCCGACCCTGGTGACCACCTTCAGCTACGGCGTGCAGTGTTTTAGCCGCTACCCGGATCATATGAAACAGCATGATTTTTTTAAAAGCGCGATGCCGGAAGGCTATGTGCAGGAACGCACCATTTTTTTCAAAGATGATGGCAATTACAAAACCCGTGCCGAAGTGAAATTCGAAGGCGATACCCTGGTGAATCGCATTGAACTGAAAGGCATTGATTTTAAAGAAGATGGTAACATTCTGGGCCACAAACTGGAATACAACTATAACAGCCATAACGTGTACATTATGGCGGATAAACAGAAAAATGGCATTAAAGTGAACTTTAAAATTCGCCATAACATTGAAGATGGCTCAGTGCAGCTGGCGGATCACTATCAGCAGAACACCCCGATTGGCGATGGCCCGGTTCTGCTGCCGGATAACCACTATCTGAGCACCCAGAGCGCGCTGTCGAAAGATCCGAACGAAAAACGCGATCACATGGTGCTGCTGGAATTTGTGACCGCCGCGGGCATCACCCATGGTATGGATGAACTGTATAAA
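A quick sanity check after codon optimization is to confirm that the optimized DNA still encodes the same amino acids. The sketch below is a minimal illustration using the standard genetic code; the names `translate`, `original_fragment` and `optimized_fragment` are mine, and the two fragments are the same four codons taken from the reverse-translated and optimized sequences above (`aac` became `AAT`, both coding for asparagine).

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


original_fragment = "gtgaacggccat"   # codons 22-25 of the reverse-translated GFP
optimized_fragment = "GTGAATGGCCAT"  # same codons after optimization

# Both fragments should give the same peptide even though the DNA differs.
print(translate(original_fragment), translate(optimized_fragment))
```

Pasting the two full 714-base sequences into `translate` and comparing the results would extend the same check to the whole gene.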
3.4. What do we do with the sequence?
The sequence can be chemically synthesized and then cloned into a plasmid. The plasmid can then be introduced into our host organism (via transformation, e.g. heat shock or electroporation) and our protein can be expressed.
3.5.
Describe how a single gene codes for multiple proteins at the transcriptional level.
A single gene can code for multiple proteins through ‘alternative splicing’. Different combinations of exons from the same pre-mRNA are joined together to create different mRNA molecules, which allows one gene to produce multiple protein isoforms (variants).
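To make the idea concrete, here is a toy Python sketch of exon skipping, one simple form of alternative splicing. It assumes a made-up three-exon gene and a hypothetical helper `splice_isoforms`; real splicing involves splice sites, regulatory factors and other patterns that are not modeled here.

```python
from itertools import combinations


def splice_isoforms(exons: dict, constitutive: set) -> dict:
    """Enumerate mRNAs made by always keeping the constitutive exons and
    either including or skipping each remaining exon, preserving exon order."""
    optional = [name for name in exons if name not in constitutive]
    isoforms = {}
    for k in range(len(optional) + 1):
        for kept in combinations(optional, k):
            names = [n for n in exons if n in constitutive or n in kept]
            isoforms["-".join(names)] = "".join(exons[n] for n in names)
    return isoforms


# Hypothetical pre-mRNA with three tiny exons (introns already removed).
toy_exons = {"E1": "AUG", "E2": "GCC", "E3": "UAA"}

for name, mrna in splice_isoforms(toy_exons, constitutive={"E1", "E3"}).items():
    print(name, mrna)
```

With one optional exon the same pre-mRNA gives two mRNAs; each additional optional exon doubles the number of possible isoforms in this simplified model.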
I aligned the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! (Using Photoshop, stitching together screenshots from Benchling)
BTS: Aligning Sequences in Photoshop
Part 4 : Fake Twist DNA Synthesis Order
I just created an account using the ‘Sign Up’ button. Pretty simple stuff really. Just add details and set a password. In the organization field I added HTGAA and Lab = 2026. I didn’t really think too much. I verified my email and I was in! I didn’t have to create a Benchling account: Part 1 of the HW required Benchling, so I already had one.
4.2. Build Your DNA Insert Sequence
I imported the DNA sequence into Benchling, just like in Part 1. I selected linear topology, as this is meant to be inserted into a circular vector of our choice. As I went ahead, I realized that the exercise already uses GFP as an example. (well, good for me :))
I went through the sequences given in the HW document and pasted them into the Benchling file one after the other (just the way we imported a sequence in Part 1, but here I had to copy and paste each one separately). I annotated the sequences based on the information in the HW document. (screenshots below on how, from the HW Doc)
I then exported the sequence by clicking on the metadata tab, clicking the three dots and selecting ‘Export sequence’. I chose the FASTA format.
4.3. Benchling to Twist: continuing with our fake order
On the Twist E-commerce platform, I went and selected Genes -> Clonal Genes. (Screenshot from HW Doc)
Then I had to import my sequence, so I dragged and dropped the FASTA file that I had downloaded from Benchling.
After the sequence had been uploaded successfully, I clicked on the sequence and I saw this screen. (the Twist platform also allows you to do codon optimization, niceee!)
I had to refer to the HW Doc to know what was next. Turns out I had to select a vector, I did that by clicking on select vector option on the sequence, a drop-down dialog allowed me to choose a vector in ‘Cloning’. I chose pTwist Amp High Copy based on the HW Doc.
Then I clicked on my sequence again to see the ‘construct’. I pressed the ‘Show Construct’ button to view the construct and I was able to see two different tabs.
Sequence
Circular
Then I clicked on the Download Construct link to download the GenBank file to my construct. (screenshot below, from HW Doc)
I downloaded the GenBank file of my construct and imported it to Benchling.
Part 4: Done! I built a plasmid with my own DNA of choice that is ready to insert! Exhilarating feeling!
Part 5: DNA RW+
5.1 DNA Read
(i) What DNA would you want to sequence (e.g., read) and why?
I would want to sequence my own DNA. I’ve wanted to understand for a long while, what makes me, ‘ME’. What is my ancestry, what genes have I carried. Why am I naturally strong but fat? Why can I conserve muscle by little workout but fat just never budges? It might seem a bit small but yes I would want to read my own DNA first. (priority-wise)
(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
I would use WGS (Whole Genome Sequencing) on an Illumina platform (Next-Gen Sequencing). To analyze my genome, I would require a method that covers the entire genome with high accuracy, and Illumina’s NGS offers that capability.
Is your method first-, second- or third-generation or other? How so?
The method is second generation. First-generation methods like Sanger sequencing use chain termination to sequence one fragment at a time, and third-generation methods read single molecules in real time. Illumina sequencing sits in between: it amplifies fragments and sequences millions of them in parallel as short reads.
What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
The input would be a DNA sample of mine (blood/saliva). The preparation steps are: extracting the DNA, fragmentation (breaking the DNA into short pieces, e.g. using enzymes), attaching adapters so that primers can bind and, if there is little sample, PCR (to amplify the DNA and make sure there’s enough to sequence).
What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
The fragments of DNA attach to the flow cell and undergo bridge amplification to create clusters of identical strands. Fluorescently labeled reversible terminator nucleotides are added, and the polymerase adds a single matching base to the growing strand. (This is why the method is also called Sequencing by Synthesis.) A sensor then captures the fluorescent signal to identify which base was added, the terminator and dye are cleaved off, and the next cycle begins.
What is the output of your chosen sequencing technology?
The output of this sequencing method is FASTQ files. They are like FASTA files, but a FASTQ file also stores a Q (quality score) for every base, indicating how confident the machine is in that base call.
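To show what the per-base quality score means, here is a small sketch that decodes a FASTQ quality string. It assumes the common Phred+33 encoding (each quality character's ASCII code minus 33 gives Q, and the error probability is 10^(-Q/10)); the read name and sequence in `record` are invented for the example.

```python
def decode_quality(quality_string: str, offset: int = 33) -> list:
    """Convert a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# A made-up four-line FASTQ record: name, bases, separator, qualities.
record = ["@read_1_hypothetical", "GATTACA", "+", "IIII5+!"]

for base, q in zip(record[1], decode_quality(record[3])):
    print(f"{base}  Q={q:2d}  P(error)={phred_to_error_prob(q):.4f}")
```

A score of Q20 corresponds to a 1 in 100 chance of a wrong call, and Q40 to 1 in 10,000, which is why low-quality read ends are often trimmed before analysis.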
5.2 DNA Write
(i) What DNA would you want to synthesize (e.g., write) and why?
I would want to synthesize the PprI gene from Deinococcus radiodurans. Ever since I heard about a bacterium surviving Chernobyl levels of radiation, I have been fascinated by it. I would want to synthesize its DNA and study it further; perhaps the genes for radiation resistance can be expressed in other organisms to help them operate in radiation-heavy environments.
(ii) What technology or technologies would you use to perform this DNA synthesis and why?
I would use the phosphoramidite oligonucleotide synthesis method, because I would need accurate synthesis to create gene sequences for insertion into other organisms.
What are the essential steps of your chosen sequencing methods?
The first step is in-silico design: breaking the gene of interest down into shorter chunks (oligos). Each synthesis cycle then has four steps: deblocking, which removes the chemical cap from the previous base to make it reactive; coupling, which adds the next nucleotide to the growing chain; capping, which blocks strands that didn’t accept the new base; and oxidation, which stabilizes the newly formed linkage between the bases. Once all the short fragments are made on the silicon chip, they are released and then stitched together to form the full-length genes.
What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?
High GC sequences are difficult to synthesize because they form secondary structures and they also have high melting temperatures, which can cause synthesis to fail or introduce errors.
5.3 DNA Edit
(i) What DNA would you want to edit and why?
This might be a bit controversial but I would want to edit my MSTN gene and try and tweak the gene for lower myostatin expression. I would want to make myself more muscular. I am aware however that the cascading effects could be unwelcome and lead to disorders. That is why this is just hypothetical.
(ii) What technology or technologies would you use to perform these DNA edits and why?
The best method I know of is CRISPR. CRISPR is the most programmable and efficient method to edit specific genes.
How does your technology of choice edit DNA? What are the essential steps?
CRISPR makes use of a guide RNA (gRNA) that binds to the Cas9 protein and directs it to the specific DNA sequence in the MSTN gene that matches the guide. The Cas9 nuclease creates a double-strand break at that precise location. The cell then repairs the break. Left alone, it usually does this by error-prone non-homologous end joining, which can introduce small insertions or deletions that disrupt the gene. If we want to add a gene, we provide a template, and the cell uses homology-directed repair to copy the new sequence into the DNA.
What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
We have to design the sgRNA so that it specifically targets the gene of interest with minimal potential for off-target effects. The input materials are the Cas9 nuclease, the gRNA and, for homology-directed repair, the template DNA.
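The first step of guide design, finding candidate target sites, can be sketched in Python. This sketch assumes the commonly used SpCas9, which needs a 20-nt target followed by an NGG PAM; it only scans the forward strand, does no off-target scoring, and the sequence `toy_dna` is invented (it is not the MSTN gene). The function name `find_spcas9_targets` is hypothetical.

```python
import re


def find_spcas9_targets(dna: str, protospacer_len: int = 20) -> list:
    """Return (position, protospacer, PAM) for every forward-strand site
    where a protospacer is immediately followed by an NGG PAM."""
    dna = dna.upper()
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Made-up sequence for illustration only.
toy_dna = "ACGTACGTACGTACGTACGTAGGTT"

for position, protospacer, pam in find_spcas9_targets(toy_dna):
    print(position, protospacer, pam)
```

A real design workflow would also scan the reverse strand and rank each candidate by its predicted off-target sites across the genome, which is what dedicated guide design tools do.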
What are the limitations of your editing methods (if any) in terms of efficiency or precision?
Off-target effects- the nuclease might accidentally cut similar sequences elsewhere and cause mutations.
Getting the gRNA and Cas9 into the cell is difficult.
Not all cells get edited.
Week 3 HW: Lab Automation
Assignment : Python Script for Opentrons Artwork
I had to write a Python script for an art design. I chose to create a silhouette of the Indian subcontinent, with my city highlighted. I did that using the Opentrons Artwork website. I thought I would make a pattern of sorts with code, but I realized that would be time consuming and not very symbolic. I got a clipart of India from Google, cropped it, and used that to generate my artwork. It didn’t look very good; I had to fiddle around with the contrast, brightness and other values to make it work. It still wasn’t looking how I’d expected it to, so I decided to redo it.
This is the India 2.0 Art (image below). I like this one much better. The green outline in the previous one was not a very good design choice. I don’t know what I was thinking. I created this one by editing my original clipart and then striking a balance between the contrast and brightness values. This one looks much better in my opinion.
The coordinates were right below my art on the Opentrons art website, so I just decided to download the script. I edited the script a bit in PyCharm (adding my name).
Reading further, I found out that I didn’t even need to download my script. I could just publish my design on the Opentrons art website and then share the link. I did exactly that and submitted my script to the Google form. But in case Murphy’s Law decides to apply, here’s my code:
Question 1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
I thought that I will need to put in effort to find a paper but the Opentrons website has its own embedded search box thingie to find papers. I just selected the area of my preference and then got a paper I liked.
The paper I chose was titled “Optimizing automated layer‑by‑layer deposition of engineered ECM‑like microenvironments for mammalian cell culture”. I chose it because last year I’d attended a workshop on 3D bioprinting and learnt about organoid cultures and replicating ECM-like conditions for better cell cultures, so I already had a sense of what the paper was about. Here’s a link to the paper: https://doi.org/10.1557/s43579-025-00912-9
Question 1 Detour: Reading a Paper and Consolidation.
Back in my graduate coursework, I didn’t have to read a lot of papers but my final dissertation did require me to read a lot of papers. I’d chosen a broad topic “Machine Learning in Life Sciences” There were a lot of papers to be read and a lot of distillation to be done. I used to read papers like header-to-footer but I realized that it was not an optimal approach. The very act of reading 30+ papers for my bachelor’s dissertation led me to evolve, iterate on my paper reading method. So now what I do is this.
ABSTRACT! I read the abstract to get a basic idea of what the paper is about. If I don’t get it, perhaps because of a lot of jargon, I look up the jargon, revisit the fundamentals and then read the abstract again. I write down everything I understood from the abstract.
QUESTIONS! I try and see the abstract as a summary of the paper and the expectations I can make, then I skim the paper looking for the references to the abstract. For example, in this case the abstract talked about using heparin and collagen, I skimmed and found out the information related to it, WHY they were using it, WHAT were they creating. I try to connect the components via questions (you can see what I mean in my distillation sheet of sorts)
SKETCHES and FLOWS! I make a lot of sketches wherever methods/compositions are involved. I also try to write workflows out properly in sequence, as that helps me build a narratively coherent understanding. I DO NOT try to polish it; nobody else has to understand it, only me. I try not to be a perfectionist, as looking for the perfect sketch/metaphor often leads to wasted time. (In case I am making some kind of content, I use this sheet as a start and polish it further. NEVER try to be clean while understanding, especially if your sheet isn’t going to be used by someone else!)
DATA! I have an inherent problem with graphs (I am actively trying to mitigate it) so I spend a lot of time, trying to understand graphs and then convert them into legit statements (for my ease, graphs don’t instantly make sense to me for some reason) I verify if the data makes sense with respect to the abstract, how they’ve validated the results, the metrics they’ve chosen (this part can take some time, but with exposure to more papers, it starts getting efficient as validation methods/approaches often follow a core principle that can repeat (it’s like a six basic plots type thing))
I don’t try to make it clean. This one might seem clean, but I’ve read a lot of papers before this and iterated on my method, so I have gained a sense of what to write, where to write it and what to expect. Thanks to that prior experience, the running clutter is naturally reduced.
My distillation sheet for the paper:
Question 2. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more.
For my final project, I have multiple ideas. I could use lab automation to help me perfect the accessory/secondary culture for the second step. It would let me test various strains of bacteria with the clarified broth from a benchtop bioreactor. The benchtop reactor would contain the target culture producing our product, and I could then take multiple engineered cultures with different affinity proteins and test how efficiently they bind the product, allowing me to perfect the downstream culture. This would leave only the engineering/design part to the scientists and take away the tedious repetition.
Lab automation could also help me test an engineered yeast culture for my astrobioreactor. If the Opentrons could also have a custom automated temperature-controlled centrifuge, I could test and iterate on modified yeast with better shear durability and heat resistance to support fermentation in space.
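As a first step towards automating the strain screen, a plate layout can be planned in plain Python before writing any robot protocol. The sketch below is only a planning aid, not an Opentrons protocol: the function `plan_plate`, the strain names and the replicate count are all hypothetical placeholders, and it assumes a standard 96-well plate filled column by column.

```python
import string


def plan_plate(strains, replicates=3, rows=8, cols=12):
    """Assign each (strain, replicate) pair to a well, filling the plate
    column by column (A1, B1, ... H1, A2, ...)."""
    wells = [
        f"{string.ascii_uppercase[r]}{c + 1}"
        for c in range(cols)
        for r in range(rows)
    ]
    samples = [(s, rep) for s in strains for rep in range(1, replicates + 1)]
    if len(samples) > len(wells):
        raise ValueError("not enough wells for all samples")
    return dict(zip(wells, samples))


# Hypothetical engineered strains, each carrying a different affinity protein.
layout = plan_plate(["binder_A", "binder_B", "no_binder_control"], replicates=3)

for well, (strain, rep) in layout.items():
    print(f"{well}: {strain} (replicate {rep})")
```

The resulting well-to-sample mapping could then drive the liquid-handling steps, for example by looping over `layout` to dispense clarified broth into each assigned well.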
Final Project Ideas
Week 4 HW: Protein Design I
Part A : Conceptual Questions
1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
Amino acids are the building blocks of proteins, so whatever percentage of protein the meat contains is effectively its AA content. A quick Google search tells me that most cooked meats contain 20%-30% protein by weight. I’ll take 25% as my number. Now, 25% of 500 g is 125 g (500/4).
Amount of protein = 125 g
Now, 1 AA avg. = 100 Daltons, and 1 Dalton corresponds to 1 g/mol,
so 1 AA = 100 g/mol. Number of moles = mass / molar mass,
therefore no. of moles of amino acids = 125 / 100 = 1.25 moles
number of molecules = moles x Avogadro’s Number = 1.25 x 6.022 x 10^23
= 7.53 x 10^23 molecules of amino acids per 500 grams of meat.
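The same estimate can be written as a short Python function, which makes it easy to try other protein fractions. The 25% protein fraction and 100 g/mol average are the assumptions from the calculation above; the function name `amino_acid_molecules` is just for illustration.

```python
AVOGADRO = 6.022e23  # molecules per mole (rounded)


def amino_acid_molecules(meat_g, protein_fraction=0.25, aa_molar_mass=100.0):
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_g = meat_g * protein_fraction   # grams of protein
    moles = protein_g / aa_molar_mass       # moles of amino acids
    return moles * AVOGADRO


print(f"{amino_acid_molecules(500):.3e} amino acids in 500 g of meat")
print(f"{amino_acid_molecules(500, protein_fraction=0.20):.3e} at 20% protein")
```

Across the 20%-30% protein range the answer only moves between roughly 6 x 10^23 and 9 x 10^23, so the order of magnitude does not depend on the exact fraction chosen.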
2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
The proteins from other animals are built out of the same universal building blocks, and digestion breaks food down into those building blocks (catabolism). These building blocks are then used to make YOUR own proteins using YOUR DNA (anabolism). Basically:
When humans eat cow, human body not take cow protein. Human body break down protein into free amino acids. Free amino acid used by human body to make its own protein. Free amino acid not make a human a cow or fish.
3. Why are there only 20 natural amino acids?
They’re basically frozen in place by evolution. Early life settled on 20 AAs that were chemically diverse enough to build different functional proteins. Once the genetic code was ‘locked’, there was no way evolution could swap it without breaking everything. 20 amino acids have enough chemical variety to accomplish the protein goal.
4. Can you make other non-natural amino acids? Design some new amino acids.
Yes, it is possible to make new amino acids. Labs do this using engineered tRNA/aminoacyl-tRNA synthetase pairs that insert a non-natural AA at a reassigned stop codon. An example of a new amino acid created by modifying the side chain is ‘fluorophenylalanine’ - it is basically a phenylalanine with a fluorine atom, making it more stable and UV trackable.
5. Where did amino acids come from before enzymes that make them, and before life started?
The Miller-Urey experiment has shown that lightning + early earth atmosphere could form amino acids spontaneously. Also amino acids have been found in meteorites like the Murchison meteorite, which could indicate that amino acids could’ve come from space. Amino acids aren’t strictly a product of life, but rather a tool life used.
6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
Naturally, α-helices made from L-amino acids have right-handed turns. So logically, if we were to use D-amino acids to make α-helices, they should have left-handed turns, since they are the mirror image.
7. Can you discover additional helices in proteins?
Skipped.
8. Why are most molecular helices right-handed?
Most molecular helices are right-handed because life uses L-AAs. The geometry of L-AAs favors right-handed twisting when they form hydrogen bonds along a backbone. It is just like the answer to Q.3: L-AAs dominated in early life and that dominance carried over.
9.Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?
β-sheets have exposed hydrogen bond donors and acceptors along their edges. So when two β-sheets meet edge to edge, they form hydrogen bonds with each other and grow into ordered stacks. The driving forces of this aggregation are: 1. Hydrogen bonding 2. Van der Waals interactions between sheets 3. The hydrophobic effect - water shoves the sheets together to get those nonpolar faces out of its way.
10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?
Once a protein misfolds, it can cause other copies of the same protein to misfold in the same way. In a misfolded protein, a β-strand’s edge can sometimes become exposed; this edge then acts like a template, causing other proteins to misfold the same way and stack onto it, and the stack keeps growing. The result is insoluble amyloid fibrils. Diseases like Alzheimer’s (Aβ plaques) and Parkinson’s (α-synuclein) involve this.
The same properties that make amyloids pathological also make them useful as materials. Amyloid aggregates can be incredibly stable and heat resistant. They’re highly ordered and self-assembling/self-propagating.
11. Design a β-sheet motif that forms a well-ordered structure.
Skipped.
Part B : Protein Analysis and Visualization
Part C : Using ML-Based Design Tools
Part D: Group Brainstorm on Bacteriophage Engineering
Week 5 HW: Protein Design II
Part A: SOD1 Binder Peptide Design
Part 1: Generating Binders with PepMLM
Part 2: Evaluating Binders with AlphaFold3
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
Part 4: Generate Optimized Peptides with moPPIt
Part B: BRD4 Drug Discovery Platform Tutorial
Part C: Final L Protein Mutants
Week 6 HW: Genetic Circuits I
DNA Assembly
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
The Phusion High-Fidelity PCR Master Mix has several components. The central enzyme is Phusion Hot Start II DNA Polymerase, a polymerase with 3’ to 5’ exonuclease (proofreading) activity that corrects mismatched bases and therefore has a lower error rate than Taq polymerase. The ‘Hot Start’ part of the name refers to a modification that keeps the enzyme inactive until the initial denaturation step, so that the polymerase doesn’t extend non-specifically bound primers at room temperature. The mix also contains dNTPs, the nucleotide building blocks used for extension. Another component is MgCl2, a cofactor required for polymerase activity: the magnesium ion helps catalyze phosphodiester bond formation between nucleotides. Magnesium ions also form the active substrate complex with dNTPs that the polymerase recognizes. (The magnesium ions neutralize some of the charge of the triphosphate group so it can fit into the active site of the polymerase without hindrance.) The remaining components are the reaction buffer and stabilizers. The buffer maintains the optimal pH and ionic strength for enzyme activity.
What are some factors that determine primer annealing temperature during PCR?
The annealing temperature is usually set about 5 °C below the melting temperature (Tm) of the primers. (The melting temperature is the temperature at which half of the primer-template duplexes are separated and half are still annealed.) The factors that determine the melting temperature (and therefore the annealing temperature, since it is set about 5 degrees below Tm) are:
primer length (longer primer = higher melting temperature)
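A rough way to see how primer length and base composition affect Tm is the Wallace rule, Tm ≈ 2 °C x (A+T) + 4 °C x (G+C). This rule is a swapped-in approximation for short primers, not the method any particular supplier uses; real Tm calculators use nearest-neighbor models and salt corrections. The function names are illustrative, and the example primer is simply the first 18 bases of the optimized GFP sequence.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) with the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def annealing_temp(primer: str, offset: int = 5) -> int:
    """Annealing temperature set `offset` degrees below the estimated Tm."""
    return wallace_tm(primer) - offset


primer = "ATGAGCAAAGGCGAAGAA"  # first 18 bases of the optimized GFP sequence
print(f"Tm ~ {wallace_tm(primer)} C, anneal at ~ {annealing_temp(primer)} C")
```

Each added base raises the estimate, and a G or C raises it twice as much as an A or T, which matches the intuition that longer and more GC-rich primers melt at higher temperatures.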
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
Both methods create DNA fragments, but they work very differently. PCR amplifies a region of existing DNA and synthesizes new copies of it. Restriction enzymes cut existing DNA into fragments; they only break the existing molecule into pieces and do not create new DNA.
PCR amplifies a specific sequence using primers and a polymerase, whereas restriction enzymes cut DNA at specific recognition sites, separating fragments from an existing DNA molecule. PCR requires knowing the sequence at the ends of the target in order to design primers; using restriction enzymes requires knowing where the restriction sites are in the DNA.
PCR is preferable when a sequence needs to be amplified and doesn’t exist as an isolated fragment, or when you want to pull a specific region out of complex DNA for closer study. RE digestion is preferable when the DNA already contains compatible restriction sites.
PCR can generate any fragment with primer design but RE digestion is limited by where restriction sites naturally occur.
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
Gibson Assembly requires that the insert and vector share 15-30 base pairs of overlapping homologous sequence at their junctions. PCR fragments can be ensured to be Gibson-compliant by designing primers with 5’ overhangs that are homologous to the adjacent sequence in the vector. After the PCR, the overlaps will be built into the ends.
As for RE-digested fragments, they are less commonly used in Gibson Assembly, as restriction digests rarely leave ends with the exact overlapping sequences needed. But we can verify that the sequences left after digestion are compatible with the adjacent vector. Tools like SnapGene/Benchling can be used for in-silico verification, to confirm that the overlaps are 15-30 bp and contain no repetitive sequences.
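The primer design idea for Gibson-ready PCR fragments can be sketched as follows. Each primer is a 5’ tail copied from the vector (the overlap) joined to a 3’ part that anneals to the insert. The function names, the 20-nt overlap, the 18-nt annealing region and the toy sequences are all assumptions for illustration; real designs should also check Tm and secondary structure.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gibson_forward_primer(vector_upstream, insert, overlap=20, anneal=18):
    """5' tail = last `overlap` bases of the vector before the junction,
    3' part = first `anneal` bases of the insert."""
    return vector_upstream[-overlap:].upper() + insert[:anneal].upper()


def gibson_reverse_primer(insert, vector_downstream, overlap=20, anneal=18):
    """Reverse complement of (insert end + start of the downstream vector),
    so the vector-derived tail sits at the primer's 5' end."""
    return reverse_complement(insert[-anneal:] + vector_downstream[:overlap])


# Toy sequences with short overlaps so the result is easy to read.
fwd = gibson_forward_primer("AAAACCCC", "GGGGTTTA", overlap=4, anneal=4)
rev = gibson_reverse_primer("GGGGTTTA", "CCAAGGTT", overlap=4, anneal=4)
print(fwd, rev)
```

After PCR with such primers, both ends of the insert carry the vector-matching overlap, so the fragment can be joined to the linearized vector in the assembly reaction.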
How does the plasmid DNA enter the E. coli cells during transformation?
The most common lab method is heat-shock transformation using chemically competent Escherichia coli cells. Cells are treated with cold calcium chloride. This helps plasmid DNA stick closer to the cell surface by reducing charge repulsion between the negatively charged DNA and the negatively charged cell membrane. Then the cells are briefly warmed (usually 42°C for 30–45 seconds). This sudden temperature change temporarily makes the membrane more permeable, allowing some DNA to enter the cell. After entering, the plasmid must remain intact and begin replicating using its own origin of replication.
Another method is Electroporation: A short high-voltage pulse creates temporary pores in the membrane, allowing DNA to enter. This is usually more efficient than heat shock, but it requires special equipment and specially prepared cells.
Describe another assembly method in detail (such as Golden Gate Assembly) in 5-7 sentences with diagrams.
Golden Gate Assembly is a cloning method used to join DNA fragments in a chosen order in one reaction. It uses a Type IIS restriction enzyme such as BsaI, which cuts outside its recognition site and creates custom-designed overhangs. These overhangs allow DNA fragments and the vector to fit together specifically and directionally. The reaction contains both BsaI and T4 DNA Ligase, so cutting and joining happen in the same tube during temperature cycling. Incorrect products are cut again, while correct assemblies remain intact. This makes the method efficient for assembling multiple fragments at once. Golden Gate Assembly is popular because it is fast, accurate, and leaves no unwanted extra sequence at the junctions.
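The way a Type IIS enzyme creates custom overhangs can be illustrated with a short Python sketch. BsaI recognizes GGTCTC and cuts 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, leaving a 4-nt 5’ overhang whose sequence is set by the designer. The sketch only looks at forward-strand sites, and the function name and test sequence are made up for the example.

```python
BSAI_SITE = "GGTCTC"


def bsai_overhangs(dna: str) -> list:
    """Return (site_position, overhang) for each forward-strand BsaI site.
    The overhang is the 4 nt that start 1 nt after the recognition site."""
    dna = dna.upper()
    overhangs = []
    start = dna.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + 1      # top-strand cut position
        overhang = dna[cut:cut + 4]
        if len(overhang) == 4:                # skip sites too close to the end
            overhangs.append((start, overhang))
        start = dna.find(BSAI_SITE, start + 1)
    return overhangs


# Made-up part: BsaI site, one spacer base, then the designed overhang AATG.
print(bsai_overhangs("TTGGTCTCAAATGCCC"))
```

Because the overhang lies outside the recognition site, two fragments designed with matching 4-nt overhangs ligate together and lose the BsaI sites in the process, which is why correct products are no longer cut.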
Model this assembly method with Benchling or Asimov Kernel!
Asimov Kernel
Create a Repository for your work
Create a blank Notebook entry to document the homework and save it to that Repository
Explore the devices in the Bacterial Demos Repo to understand how the parts work together by running the Simulator on various examples, following the instructions for the simulator found in the “Info” panel (click the “i” icon on the right to open the Info panel)
Create a blank Construct and save it to your Repository
Recreate the Repressilator in that empty Construct by using parts from the Characterized Bacterial Parts repository
Search the parts using the Search function in the right menu
Drag and drop the parts into the Construct
Confirm it works as expected by running the Simulator (“play” button) and compare your results with the Repressilator Construct found in the Bacterial Demos repository
Document all of this work in your Notebook entry - you can copy the glyph image and the simulator graphs, and paste them into your Notebook
Build three of your own Constructs using the parts in the Characterized Bacterials Parts Repo
Explain in the Notebook Entry how you think each of the Constructs should function
Run the simulator and share your results in the Notebook Entry
If the results don’t match your expectations, speculate on why and see if you can adjust the simulator settings to get the expected outcome
Week 7 HW: Genetic Circuits II
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Traditional genetic circuits are Boolean, as the question says. They can only be ‘on’ or ‘off’ and can only compute Boolean functions, which limits the cell’s computational ability. IANNs are different in that they produce continuous signals and can weigh multiple inputs. I think the benefits of IANNs over conventional genetic circuits are analogous to the benefits of a neural network over a hard-coded solution. IANNs can react to novel inputs, whereas conventional genetic circuits can only respond to the inputs they were designed for.
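The difference between a Boolean gate and a perceptron-style node can be shown with a toy comparison in Python. This is a purely numerical analogy, not a model of any real genetic circuit: the threshold, weights and input values are arbitrary numbers chosen for illustration.

```python
import math


def boolean_and(x1: float, x2: float, threshold: float = 0.5) -> int:
    """Boolean-style gate: output is 1 only if both inputs exceed a threshold."""
    return int(x1 > threshold and x2 > threshold)


def perceptron(inputs, weights, bias: float = 0.0) -> float:
    """Perceptron-style node: weighted sum passed through a sigmoid,
    giving a graded output between 0 and 1."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))


# Arbitrary example inputs (e.g. relative concentrations of two signals).
for x1, x2 in [(0.2, 0.2), (0.6, 0.4), (0.6, 0.6), (1.0, 0.9)]:
    gate = boolean_and(x1, x2)
    node = perceptron([x1, x2], weights=[2.0, 2.0], bias=-2.0)
    print(f"inputs=({x1}, {x2})  AND gate={gate}  perceptron={node:.2f}")
```

The gate jumps between 0 and 1, while the perceptron output changes smoothly with the inputs and their weights, which is the property that lets a network of such nodes respond sensibly to input combinations it was not explicitly designed for.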
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
One application would be a cell engineered with an IANN to continuously monitor biomarkers of active demyelination, which is seen in serious conditions like MS or Alzheimer’s. The inputs could be the MBP (myelin basic protein) fragment concentration (a direct readout of myelin damage), neurofilament light chain (a direct readout of axon damage) and local ROS (an indicator of inflammatory stress), and the output could be a fluorescent compound or a peptide indicating the stage of demyelination.
Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.
SKIPPED FOR NOW
Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
SKIPPED FOR NOW
Assignment 2
What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
Some existing fungal materials are mycelium composites, which use the dense hyphal network of fungi like Ganoderma grown on agricultural waste to form rigid, foam-like structures. They are used for packaging and thermal insulation. Their advantages are that they are fully biodegradable, their production sequesters carbon and they can be grown into arbitrary molds with minimal energy. Their disadvantages are lower compressive strength and impact resistance than plastics. They are also very sensitive to moisture, and the growth conditions need to be sterile, which can be tough to maintain in large-scale operations. There are also mycoprotein foods that are used as meat substitutes. They are nutritionally well rounded, cause far less pollution than meat and can be produced continuously in bioreactors. Fungi have also been used to produce dyes and pigments, which have lower color permanence than their synthetic counterparts.
What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
The reasonable next step would be to engineer fungi to produce materials with more tensile strength and water resistance than the current ones. Fungi could also be used as biosensors that express colored reporters in response to environmental pollution. The advantages of fungi over bacteria are that they are eukaryotes and therefore closer to human cells. They possess the post-translational modification machinery that is needed to produce complex mammalian proteins. Fungi also naturally form large 3D structures, which makes them suitable for large-scale material production.
Week 9 HW: Cell-Free Systems
Homework Part A: General and Lecturer-Specific Questions
General homework questions
Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Describe the main components of a cell-free expression system and explain the role of each component.
Why is energy regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.
Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.
Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.
Homework question from Kate Adamala
Design an example of a useful synthetic minimal cell as follows:
Pick a function and describe it.
What would your synthetic cell do? What is the input and what is the output?
Could this function be realized by cell-free Tx/Tl alone, without encapsulation?
Could this function be realized by a genetically modified natural cell?
Describe the desired outcome of your synthetic cell operation.
Design all components that would need to be part of your synthetic cell.
What would be the membrane made of?
What would you encapsulate inside? Enzymes, small molecules.
Which organism will your Tx/Tl system come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promoters, like Tet-ON, you need mammalian)
How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)
Experimental details
List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)
How will you measure the function of your system?
Example solution
Based on: Lentini, R. et al., 2014. Nature Communications, 5, p. 4012.
Pick a function and describe it.
What would your synthetic cell do? What is the input and what is the output? Expand the sensing capacity of bacteria. Input: theophylline (inert to bacteria). Output of the SMC: IPTG. Output of the whole system: GFP produced in bacteria. (Theophylline aptamer reference: Martini, L. & Mansy, S.S., 2011. Cell-like systems with riboswitch controlled gene expression. Chemical Communications, 47(38), p. 10734.)
Could this function be realized by cell-free Tx/Tl alone, without encapsulation? No. If the IPTG were not encapsulated, it would go into the bacteria without the need of theophylline-induced membrane channel synthesis, thus the synthetic cell actuator would not exist.
Could this function be realized by a genetically modified natural cell? Yes, in this particular case: the theophylline aptamer could be incorporated into a transformed gene. This lacks generality, though: it is easier to make an SMC than to modify bacteria, so in this system a single bacterial reporter can be used to detect various small molecules.
Describe the desired outcome of your synthetic cell operation. In the presence of the SMC, bacteria sense theophylline.
Design all components that would need to be part of your synthetic cell.
What would be the membrane made of? Phospholipids + cholesterol.
What would you encapsulate inside? Enzymes, small molecules. cell-free Tx/Tl system, IPTG, gene for membrane transporter under the control of theophylline aptamer.
Which organism will your Tx/Tl system come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promoters, like Tet-ON, you need mammalian) Bacterial, because of the theophylline riboswitch used as SMC input.
How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?) The membrane is permeable to the input molecule (theophylline). The output is IPTG, which crosses the membrane via the membrane pore created after theophylline-initiated gene expression.
Experimental details
List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)
Lipids: POPC, cholesterol
Enzymes: bacterial cell-free Tx/Tl
Genes: α-hemolysin (αHL) to encapsulate in SMC
Biological cells: E. coli transformed with GFP under T7 promoter and a lac operator
How will you measure the function of your system? Measure GFP output of the cells via flow cytometry. Alternatively, use an enzymatic reporter, like luciferase, and measure bulk output of the enzyme.
Artificial cells translate chemical signals for E. coli. (a) In the absence of artificial cells (circles), E. coli (oblong) cannot sense theophylline. (b) Artificial cells can be engineered to detect theophylline and in response release IPTG, a chemical signal that induces a response in E. coli.
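The signal chain in the example solution (theophylline in, αHL pore made, IPTG out, GFP in E. coli) can be read as a small piece of boolean logic. The sketch below is only a toy truth-table model, not a kinetic simulation and not code from Lentini et al. All names (`SyntheticCell`, `releases_iptg`, `ecoli_gfp`, `system_output`) and the `encapsulated` flag are hypothetical, introduced here for illustration. The model assumes that every step is all-or-nothing.

```python
from dataclasses import dataclass


@dataclass
class SyntheticCell:
    """Toy model of the theophylline-sensing SMC (assumption: all-or-nothing steps)."""
    iptg_loaded: bool = True    # IPTG is encapsulated inside the vesicle
    has_ahl_gene: bool = True   # aHL gene under control of the theophylline riboswitch

    def releases_iptg(self, theophylline: bool) -> bool:
        # The riboswitch allows aHL expression only when theophylline is present;
        # the resulting pore lets the encapsulated IPTG leave the vesicle.
        pore_formed = theophylline and self.has_ahl_gene
        return pore_formed and self.iptg_loaded


def ecoli_gfp(iptg_outside: bool) -> bool:
    # Reporter E. coli: GFP is produced only when IPTG reaches the bacteria.
    return iptg_outside


def system_output(theophylline: bool, smc_present: bool, encapsulated: bool = True) -> bool:
    """Return True if the reporter bacteria end up producing GFP."""
    if not smc_present:
        return ecoli_gfp(False)   # no IPTG source at all
    if not encapsulated:
        return ecoli_gfp(True)    # free IPTG reaches bacteria regardless of input
    return ecoli_gfp(SyntheticCell().releases_iptg(theophylline))


if __name__ == "__main__":
    # Print the truth table for the encapsulated system.
    for smc_present in (False, True):
        for theophylline in (False, True):
            gfp = system_output(theophylline, smc_present)
            print(f"SMC={smc_present!s:5} theophylline={theophylline!s:5} GFP={gfp}")
```

In this model GFP appears only when both the SMC and theophylline are present. Setting `encapsulated=False` gives GFP even without theophylline, which mirrors the argument above for why cell-free Tx/Tl alone would not work.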
Homework question from Peter Nguyen
Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:
Write a one-sentence summary pitch describing your concept.
How will the idea work, in more detail? Write 3-4 sentences or more.
What societal challenge or market need will this address?
How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?
Homework question from Ally Huang
Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!
For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .
Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)
Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)
Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)
Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)
Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)