Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices
1. Biological engineering application I propose a “DNA Compiler,” a software tool that helps researchers turn DNA designs into safe, synthesis-ready sequences. The main idea is to build safety checks directly into the design process rather than relying only on downstream screening or manual review. The compiler would analyze a DNA sequence, flag potential issues, and suggest safer alternatives (for example, adjusting sequence features or highlighting areas that require review). It would also generate a clear record of how the design was modified or approved. The goal is to make good safety practices automatic and easy to follow.
Week 2 Homework
Week 2 Homework Documentation This week I worked on DNA gel art, restriction digests, and DNA design. I designed a gel art pattern, ran the gel in lab, and prepared a Benchling construct for a spider silk equivalent protein. Part 1: Benchling and In-silico Gel Art For the in-silico gel art part, I used restriction digests of lambda DNA to plan a gel pattern before running the wet-lab version. The main idea was to use different restriction enzymes to create different DNA fragment sizes, so the bands would appear at different positions in the gel.
Week 3 HW: Lab Automation
Final projects were added to the slide and the python file were submitted too. Post-Lab Questions 1. One paper I found is “AssemblyTron: flexible automation of DNA assembly with Opentrons OT-2 lab robots” by Bryant et al. The paper describes a workflow for using an Opentrons OT-2 robot to automate DNA assembly. Instead of manually pipetting every DNA part, enzyme, and reagent, the robot can set up many assembly reactions in a more consistent way. The authors made this system because DNA assembly is a common step in synthetic biology, but it becomes slow and error-prone when many constructs have to be tested.
Week 4 HW: Protein Design
Part A – Conceptual Questions How many amino acid molecules are in 500 g of meat? A typical amino acid has a mass of about 100 g per mole. If you have 500 g, that corresponds to roughly 5 moles. Since one mole contains about 6 × 1023 molecules, 5 moles would contain about 3 × 1024 amino acid molecules.
Week 5: Protein Design Part II
Week 5: Protein Design Part II Part A: SOD1 Binder Peptide Design Background Superoxide dismutase 1, or SOD1, is an enzyme that helps protect cells from oxidative stress by converting superoxide radicals into hydrogen peroxide and oxygen. SOD1 normally folds into a stable structure and forms a homodimer. It also binds metal cofactors, which are important for its activity.
Week 6 HW
DNA Assembly Homework Phusion High-Fidelity PCR Master Mix contains a high-fidelity DNA polymerase, dNTPs, MgCl2, buffer salts, and stabilizers. The polymerase copies the DNA template, while its proofreading activity helps reduce mutations. The dNTPs are the building blocks used to make the new DNA strand. MgCl2 is needed for polymerase activity, and the buffer keeps the pH and salt conditions suitable for PCR. The master mix format also makes the reaction more consistent because many components are already premixed.
Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits
Homework: IANNs and Fungal Materials Assignment Part 1: Intracellular Artificial Neural Networks 1. Advantages of IANNs over traditional genetic circuits IANNs can handle more than simple on/off behavior. Traditional genetic circuits often work like Boolean logic gates, where an input is either present or absent and the output is either low or high. In real cells, signals are usually more gradual than that. An IANN can combine several input levels and produce a more flexible output.
Week 9 Homework
Homework: Cell-Free Systems and Synthetic Minimal Cells General homework questions 1. Advantages of cell-free protein synthesis Cell-free protein synthesis is useful because it is faster and easier to control than expression in living cells. Since there are no cells to keep alive, I can directly change the DNA amount, salts, cofactors, energy source, and reaction conditions.
Week 10 Homework
Final Project Measurement For the development of bio-synthetic spider silk musical strings, I will measure several critical parameters to ensure the protein is synthesized correctly, the fiber is engineered for high tension, and the resulting sound meets professional acoustic standards. Aspects to be Measured Protein sequence and purity: I will verify that the recombinant “Mini-Spidroin” matches the intended genetic design and that all bacterial cellular debris has been removed. Protein concentration: The density of the purified protein in the liquid “spin dope” must be quantified to ensure it has the correct viscosity for extrusion. Fiber diameter and morphology: I will measure the thickness of the thread to ensure it remains consistent and within the 0.20–0.30 mm range required for instrument compatibility. Mechanical properties: Specifically, I will measure the tensile strength and elasticity of the dried fiber to determine if it can withstand the high-tension environment of a violin or guitar. Acoustic frequency and harmonics: I will measure the fundamental resonance and the richness of the overtones produced when the string is under load. Measurement Methods and Technologies
Week 11 Homework
Week 11 Homework: Bioproduction and Cloud Labs Part A: The 1,536 Pixel Artwork Canvas I contributed two pixels on the bottom left of the artwork, but for some reason they did not sync or show up correctly in the final version. I still liked the idea of the project because it turned a biology class assignment into a shared artwork, where everyone’s tiny contribution could become part of a larger image.

Week 1 HW: Principles and Practices

1. Biological engineering application

I propose a “DNA Compiler,” a software tool that helps researchers turn DNA designs into safe, synthesis-ready sequences. The main idea is to build safety checks directly into the design process rather than relying only on downstream screening or manual review. The compiler would analyze a DNA sequence, flag potential issues, and suggest safer alternatives (for example, adjusting sequence features or highlighting areas that require review). It would also generate a clear record of how the design was modified or approved. The goal is to make good safety practices automatic and easy to follow.

2. Governance and policy goals

Primary goal: reduce harm while supporting useful biological research.

Sub-goals:

Prevent accidents by identifying risky designs early in the process.
Improve accountability by keeping a clear record of how designs were created and approved.
Avoid slowing research unnecessarily by offering helpful suggestions rather than simply blocking designs.

3. Governance actions

Option 1, Institutional adoption

Research institutions could make the DNA Compiler part of their standard workflow. Before ordering synthetic DNA, researchers would run their designs through the tool.

Purpose: move safety checks earlier in the process.
Design: integrate with existing ordering systems and biosafety review procedures.
Assumptions: researchers will use the tool if it is easy and helpful.
Risks: people may try to bypass it if it becomes too restrictive.

Option 2, Vendor integration

DNA synthesis companies could accept or encourage compiler-generated safety reports when customers submit sequences.

Purpose: create a shared safety baseline across different labs and providers.
Design: vendors recognize a standard report format generated by the compiler.
Assumptions: companies see value in reducing risk and simplifying screening.
Risks: could increase costs or create barriers if requirements are too strict.

Option 3, Shared rule updates

A community group maintains and updates the safety rules used by the compiler as new risks or best practices emerge.

Purpose: keep the tool current as biology advances.
Design: periodic updates distributed to users, similar to software updates.
Assumptions: collaboration improves coverage of new issues.
Risks: disagreements about rules or slow updates.

4. Scoring

(1 = best)

Goal	Option 1	Option 2	Option 3
Enhance biosecurity	1	2	2
Foster lab safety	1	2	2
Protect environment	2	2	2
Minimize burden	2	3	2
Feasibility	1	2	2
Promote constructive uses	1	2	1

5. Prioritization

I would prioritize Option 1 first because it is the most practical starting point. Integrating the DNA Compiler into institutional workflows creates immediate benefits by improving design quality and reducing accidents without requiring major policy changes. After adoption grows, Option 2 can extend the approach across the industry by creating shared standards between labs and vendors. Option 3 should develop alongside these steps to ensure that the rules evolve over time, but it likely works best once the tool already has a strong user base.

Homework Questions from Professor Jacobson

Nature’s machinery for copying DNA is DNA polymerase. According to the lecture slides, an error-correcting polymerase has an error rate of approximately 1 error per 10⁶ bases added.

The human genome is about 3.2 × 10⁹ base pairs long. Comparing these numbers, if replication relied only on polymerase accuracy, we would expect on the order of thousands of errors during replication of a single human genome. This highlights a discrepancy between the intrinsic error rate of polymerase and the need to faithfully copy very large genomes.

Biology resolves this by incorporating multiple layers of error correction. DNA polymerases include proofreading activity that detects and removes mismatched nucleotides during synthesis, and additional repair pathways (such as mismatch repair systems shown in the lecture) further correct errors after replication. Together, these mechanisms allow cells to maintain high fidelity despite the large size of the genome.

The lecture states that an average human protein corresponds to about 1036 base pairs. Since codons consist of three nucleotides, this corresponds to roughly a few hundred amino acids. The genetic code is degenerate, meaning that multiple codons can encode the same amino acid. Because there are 64 possible codons but only 20 amino acids, many different DNA sequences can theoretically encode the same protein sequence. The number of possible coding sequences therefore grows exponentially with protein length, so an average human protein can be encoded by a very large number of distinct DNA sequences.

In practice, not all synonymous sequences work equally well. The lecture shows that nucleotide composition (such as GC content) and sequence-dependent secondary structures affect molecular behavior. Different synonymous sequences can produce different RNA folding patterns or energetics, which can influence transcription, translation efficiency, and stability. As a result, biological and physical constraints limit which DNA sequences successfully produce the desired protein, even if they encode the same amino acid sequence.

Homework Questions from Dr. LeProust

The most commonly used method is solid-phase phosphoramidite chemical synthesis. In this approach, nucleotides are added sequentially to a growing DNA chain attached to a solid support. Each cycle consists of coupling a phosphoramidite nucleotide, capping unreacted sites, oxidation, and deprotection, and this cycle is repeated until the desired length is reached.

Direct oligo synthesis proceeds one base at a time, and each chemical addition step is not perfectly efficient. Because the synthesis is iterative, small inefficiencies compound with every cycle. As the sequence length increases:

The fraction of full-length molecules decreases.
Products accumulate.
Overall yield/purity drop significantly.

This makes it increasingly difficult to obtain high-quality long oligos directly.

A 2000 bp gene would require thousands of sequential chemical coupling steps. Since each step has less than 100% efficiency, the probability of producing a perfect full-length molecule becomes extremely low. Errors and truncations would dominate the product mixture.

Instead, long genes are typically made by synthesizing shorter oligos (example around 100–200 nt) and then assembling them enzymatically into longer fragments or full genes. This avoids the exponential loss in yield and accuracy associated with very long direct chemical synthesis.

Homework Question from George Church

Unlike NA:NA base pairing or the NA to AA genetic code, AA:AA interactions are not defined by a strict one-to-one symbolic mapping. Instead, an AA:AA code would be based on physico chemical compatibility between amino acid side chains. Key rules would include charge complementarity (positive interacting with negative residues), hydrogen-bond donor/acceptor matching, hydrophobic residues packing together, and steric shape complementarity for efficient packing. This is similar to lecture notes framing that different biological codes reflect interaction constraints: DNA basepairs emphasize specific pairing rules, while protein interactions emerge from chemical properties and geometry rather than fixed symbolic pairs.

Week 2 Homework

Documentation

This week I worked on DNA gel art, restriction digests, and DNA design. I designed a gel art pattern, ran the gel in lab, and prepared a Benchling construct for a spider silk equivalent protein.

Part 1: Benchling and In-silico Gel Art

For the in-silico gel art part, I used restriction digests of lambda DNA to plan a gel pattern before running the wet-lab version. The main idea was to use different restriction enzymes to create different DNA fragment sizes, so the bands would appear at different positions in the gel.

Part 2: Gel Art: Restriction Digests and Gel Electrophoresis

I performed the gel art experiment in lab based on the in-silico design. The gel was cast, the DNA digest samples were loaded, electrophoresis was run, and the gel was imaged.

The final gel image showed the wells clearly, but the band pattern was faint and did not come out very strongly. This could have been due to low DNA concentration, loading issues, imaging exposure, or the fragments not separating visibly enough.

Part 3: DNA Design Challenge

3.1 Choose your protein

I chose a spider silk equivalent protein because my final project is about making synthetic spider silk-like material. Spider silk is interesting because it is strong, lightweight, and could be useful as a biomaterial.

Protein sequence:

MSYQQGQGAGAAAAAAAAAAGGAGQGGYGGLGSQGAGAAAAAAAAAAGGAGQGGYGGLGSQGAGAAAAAAAAAAGGAGQGGYGGLGSQGAGAAAAAAAAAAGGAGQGGYGGLGSQGAGAAAAAAAAAAGGAGQGGYGGLGHHHHHH

3.2 Reverse translation

DNA sequence:

TAATACGACTCACTATAGGGAAAGAGGAGAAAATGAGCTATCAGCAAGGCCAGGGTGCGGGCGCAGCCGCGGCTGCAGCCGCGGCTGCAGCCGGTGGCGCGGGCCAGGGTGGCTATGGCGGCCTGGGCAGCCAGGGCGCGGGTGCTGCAGCCGCGGCTGCAGCCGCGGCTGCAGGCGGTGCGGGCCAGGGTGGCTACGGCGGTCTGGGCAGCCAGGGTGCGGGCGCCGCGGCTGCAGCCGCGGCTGCAGCCGCGGGCGGCGCGGGCCAGGGTGGCTATGGCGGCCTGGGCAGCCAGGGCGCGGGTGCTGCAGCCGCGGCTGCAGCCGCGGCTGCAGGCGGTGCGGGCCAGGGTGGCTACGGCGGTCTGGGCAGCCAGGGTGCGGGCGCCGCGGCTGCAGCCGCGGCTGCAGCCGCGGGCGGCGCGGGCCAGGGTGGCTATGGCGGCCTGGGCCATCATCATCATCATCATTAATAA

3.3 Codon optimization

Codon optimization is needed because different organisms use different codons more often, even when the codons encode the same amino acid. If the DNA uses codons that are rare in the host, the protein may express poorly.

I optimized the sequence for bacterial expression, since E. coli is a common host for producing recombinant proteins. This makes sense for my project because I want to test whether a spider silk-like protein can be expressed efficiently.

3.4 You have a sequence, now what?

This DNA sequence can be used as an expression construct. The promoter and RBS help start transcription and translation, the coding sequence encodes the spider silk equivalent protein, and the His tag can help with purification.

In a cell-based system, the DNA could be placed in a plasmid and transformed into E. coli. The cells would transcribe the DNA into mRNA and translate the mRNA into protein. In a cell-free system, the DNA could be added directly to a Tx/Tl reaction to produce the protein without living cells.

Part 4: Prepare a Twist DNA Synthesis Order

For this part, I prepared a DNA synthesis design for my final project. I built an annotated Benchling insert fragment for a spider silk equivalent protein. The construct includes a promoter, RBS, coding sequence, His tag, and stop codon.

I used this insert as the basis for a Twist synthesis setup. The goal was to make a construct that could later be used for expression and testing of a synthetic spider silk-like protein.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

I would want to sequence my spider silk equivalent construct to confirm that the ordered DNA matches the design. This is important because the sequence is repetitive, and even small errors could affect expression or the final protein.

I would use Sanger sequencing because this construct is short enough to check with sequencing primers. Sanger sequencing is a first-generation sequencing method. The input would be purified plasmid DNA or a PCR product. The sample would be prepared with a sequencing primer and sent for sequencing.

Sanger sequencing uses DNA synthesis with fluorescent chain-terminating nucleotides. The output is a chromatogram and a DNA sequence, which I would align to my designed construct.

5.2 DNA Write

I would want to synthesize the DNA sequence for my spider silk equivalent protein. This would let me test whether the designed repetitive protein can be produced and eventually assembled into a silk-like material.

I would use commercial DNA synthesis, such as Twist, because it is faster and more reliable than manually assembling the sequence. The basic steps are to design the DNA, codon optimize it, add expression parts, check the sequence, order it, and then test it in cells or a cell-free system.

One limitation is that repetitive DNA can be harder to synthesize accurately. Longer or highly repetitive constructs may also be more expensive or take longer to produce.

5.3 DNA Edit

I would edit an expression host such as E. coli to improve production of spider silk-like proteins. For example, I might reduce protease activity or improve amino acid supply so the host can make more of the recombinant protein.

I would use CRISPR-Cas9 or recombineering for this. The edit would need a target site, a guide RNA, and a repair template. After editing, I would screen colonies and sequence the edited region to confirm the change.

The main limitations are editing efficiency, possible off-target edits, and the fact that improving protein production may require testing several different edits.

Week 3 HW: Lab Automation

Final projects were added to the slide and the python file were submitted too.

Post-Lab Questions

1.

One paper I found is “AssemblyTron: flexible automation of DNA assembly with Opentrons OT-2 lab robots” by Bryant et al. The paper describes a workflow for using an Opentrons OT-2 robot to automate DNA assembly. Instead of manually pipetting every DNA part, enzyme, and reagent, the robot can set up many assembly reactions in a more consistent way. The authors made this system because DNA assembly is a common step in synthetic biology, but it becomes slow and error-prone when many constructs have to be tested.

I think this is a useful example because it shows automation being used for a real bottleneck in biology. A lot of synthetic biology depends on building and testing many genetic designs, not just making one perfect construct. The Opentrons robot helps make that process more scalable and less dependent on repetitive manual pipetting. This is especially relevant for projects where small volume differences could affect whether a construct works or not.

2. Automation plan for my final project

For my final project, I want to synthetically produce spider silk proteins and compare which expression or assembly conditions give the best silk-like material. The part I would automate is the screening process, since spider silk production depends on many variables: DNA construct design, protein concentration, buffer composition, pH, salt concentration, and drying or assembly conditions. Instead of testing each condition by hand, I would use an Opentrons-style liquid handler to set up a 96-well plate with many small-scale reactions.

A possible workflow would be:

Dispense cell-free protein synthesis master mix or expression reagents into each well.
Add different spider silk protein DNA constructs to different rows or columns.
Add additives or buffer conditions, such as different pH levels, salts, or crowding agents.
Mix each well consistently using repeated pipetting.
Incubate the plate so the silk proteins are expressed.
Transfer small amounts of the expressed protein into assembly wells with different precipitation or fiber-forming conditions.
Measure output using fluorescence, absorbance, viscosity, or imaging to identify which conditions give the strongest or most fiber-like material.

The main reason automation would help is that spider silk synthesis probably needs a lot of trial and error. A robot would let us test many conditions in parallel while keeping the volumes and timing more consistent. This would make it easier to see which variables actually matter, instead of worrying that the result changed because of manual pipetting differences.

Week 4 HW: Protein Design

Part A – Conceptual Questions

How many amino acid molecules are in 500 g of meat?

A typical amino acid has a mass of about 100 g per mole.
If you have 500 g, that corresponds to roughly 5 moles.
Since one mole contains about 6 × 10^{23 molecules, 5 moles would contain about 3 × 10}24 amino acid molecules.

This shows how enormous molecular numbers are, even in everyday amounts of food.

Why don’t we turn into a cow when we eat beef?

When we digest food, proteins are broken down into individual amino acids.
Our body does not keep cow proteins intact.
Instead, we reuse those amino acids to build new proteins based on instructions from our own DNA.

So what we eat provides building blocks, not the identity of the organism.

Is it possible to create new, artificial amino acids?

Yes. Chemists can synthesize amino acids that do not occur naturally.
These can include unusual side chains, special reactive groups, or atoms like fluorine.
Such modified amino acids are used in research to design proteins with new properties.

Where did amino acids originate before life existed?

Amino acids could have formed through simple chemical reactions on the early Earth.
Experiments have shown that under conditions resembling the early atmosphere, amino acids can form from basic gases and energy sources like lightning.
They have also been detected in meteorites, suggesting they may have come from space as well.

What happens if you build an α-helix from D-amino acids?

Natural proteins use L-amino acids and form right-handed helices.
If you instead used D-amino acids, the helix would twist in the opposite direction, forming a left-handed structure.

Are there other types of helices beyond the common ones?

Yes. While the α-helix is the most familiar, researchers have identified and even engineered other helical forms.
With different amino acids or synthetic designs, new helical geometries can be explored.

Why do β-sheets often clump together? What drives this?

β-strands align side by side and form hydrogen bonds.
When these sheets are exposed, they can easily bind to other β-strands.

Aggregation is mainly driven by:

Hydrogen bonding between strands
Hydrophobic side chains packing together
The flat, extended shape of β-sheets that allows stacking

These features make β-sheets prone to sticking together.

Why are β-sheets common in amyloid diseases? Could they be useful?

In amyloid diseases, proteins misfold and reorganize into tightly stacked β-sheet structures.
These assemblies are very stable and resist breakdown, which leads to accumulation in tissues.

However, that same stability and self-assembly make amyloid-like fibers attractive for materials science, where strong and durable nanostructures are useful.

Propose a β-sheet sequence that forms an ordered structure.

A repeating pattern that alternates hydrophobic and polar residues can promote organized packing, for example:

Val–Thr–Val–Thr–Val–Thr

This arrangement allows one face of the sheet to interact with water while the other packs tightly against neighboring sheets, helping create a stable and ordered structure.

Part B: Protein Analysis and Visualization

1. Protein selected

I selected green fluorescent protein, or GFP, from Aequorea victoria. I chose GFP because it is one of the most useful and recognizable proteins in biology. It is widely used as a reporter protein because cells expressing GFP can glow green, which makes it useful for tracking gene expression, localization, transformation, and other biological processes.

I also chose GFP because its structure is directly connected to its function. GFP is not fluorescent just because of a single isolated chemical group. It fluoresces because the protein folds into a beta-barrel structure that protects an internal chromophore. This makes it a good protein for studying how sequence, structure, and function are connected.

2. Amino acid sequence

For the structural analysis, I used GFP structure 1EMA from the Protein Data Bank. This is a GFP structure from Aequorea victoria. The related reference sequence in AlphaFold/RCSB is UniProt P42212, which is 238 amino acids long.

How long is it?

The GFP sequence I used is about 238 amino acids long. This is a convenient size for protein visualization because it is small enough to inspect in a 3D viewer, but large enough to form a stable and recognizable fold.

What is the most frequent amino acid?

In the GFP sequence, glycine is one of the most frequent amino acids. This fits the structure because GFP has many loop and turn regions connecting beta strands, and glycine is useful in flexible or tightly turning parts of a protein.

Other common residues include lysine, leucine, aspartate, valine, glutamate, and threonine. The mix of residue types helps create both the stable folded core and the solvent-facing surface.

How many protein sequence homologs are there?

GFP has many homologs and related proteins. Searching for GFP-like sequences returns natural fluorescent proteins from jellyfish and coral, plus many engineered variants. Examples include EGFP, CFP, YFP, and other color-shifted fluorescent proteins.

These homologs are useful because relatively small sequence changes can alter brightness, folding efficiency, maturation speed, and fluorescence color. This is one reason GFP became such an important tool in biological engineering.

Does the protein belong to a protein family?

Yes. GFP belongs to the GFP-like fluorescent protein family. Members of this family usually share a beta-barrel fold and an internal chromophore. The exact color and brightness can change depending on mutations, especially mutations near the chromophore.

3. Structure page in RCSB

The structure I used was:

Property	Value
Protein	Green fluorescent protein
PDB ID	1EMA
Organism	Aequorea victoria
Experimental method	X-ray diffraction
Resolution	1.90 Å
Approximate length	238 amino acids
Main fold	11-stranded beta barrel with coaxial helix
Main function	Green fluorescence

The RCSB entry for 1EMA describes GFP as an 11-stranded beta barrel with a coaxial helix. It also notes that the chromophore forms from the central helix from the Ser/Thr65-Tyr66-Gly67 motif and that the native fold is required for chromophore formation and fluorescence.

When was the structure solved? Is it a good quality structure?

The GFP structure 1EMA was solved by X-ray crystallography and released in the 1990s. Its resolution is 1.90 Å, which is a good-quality structure. The assignment notes that structures with resolution smaller than 2.70 Å are generally good quality, so 1.90 Å is comfortably within that range.

Are there any other molecules in the solved structure apart from protein?

Yes. The most important non-standard feature is the GFP chromophore. The chromophore is formed inside the protein and is responsible for the fluorescence. In the structure viewer, it appears buried inside the beta barrel rather than exposed on the surface.

Some small solvent molecules are also visible in the viewer. These are not the main focus of the structure, but they appear as small dots around the protein.

Does the protein belong to any structure classification family?

GFP belongs to the GFP-like beta-barrel fluorescent protein structural family. The defining feature is the barrel-like arrangement of beta strands around a central chromophore.

4. 3D visualization

I opened the GFP structure in the RCSB 3D viewer and looked at it in several different representations.

Cartoon and ribbon views

The cartoon/ribbon view makes the overall fold easy to see. GFP forms a beta barrel, with beta strands wrapping around the protein to make a compact cylindrical structure. This is the main structural feature of GFP.

The rainbow cartoon view below shows the folded protein chain and makes the beta-barrel shape clear. The chromophore sits inside the barrel rather than on the outside surface.

Visualizing the beta-barrel structure

The image below shows GFP in a cartoon/ribbon representation where the beta strands are especially visible. The structure is dominated by beta sheets, which wrap around to form the barrel. There are also loop regions and a smaller internal helical region, but beta structure is the main feature.

GFP has more beta sheets than helices. This is expected because the protein’s main architecture is a beta barrel. The small red dots visible around the structure are solvent molecules from the structure viewer, not the main protein chain.

Residue type and hydrophobicity

The residue distribution matches the fold. Hydrophobic residues tend to be more buried, helping stabilize the interior of the protein. More hydrophilic and charged residues are generally more common on the outside surface, where they interact with water.

For GFP, this matters because the chromophore is buried inside the barrel. The protein creates a protected internal environment around the chromophore, while the surface interacts with solvent. This separation helps explain why the fold is so important for fluorescence.

Surface visualization

The surface representation shows GFP as a compact shell. This view makes it easier to see that GFP is not an open structure. It forms a protective surface around the interior.

The surface has small grooves and pockets, but the main point is that the chromophore is protected inside the beta barrel. This protected environment is important because exposing the chromophore directly to solvent would likely reduce or disrupt fluorescence.

Part C: Using ML-Based Protein Design Tools

For the computational part, I used GFP as the test protein because it has both a known experimental structure and a well-understood function. That made it a good case for comparing sequence-level reasoning, structure prediction, and inverse design.

C1. Protein Language Modeling

Protein language models, such as ESM-2, learn patterns from large numbers of protein sequences. They can be used to estimate which amino acids are likely or unlikely at each position in a protein. This makes them useful for mutation analysis, because mutations that look very unnatural to the model are more likely to disrupt structure or function.

1. Deep Mutational Scan

For GFP, I treated the sequence-level analysis as a way to identify which regions of the protein should be most mutation-sensitive. The strongest biological constraint is the chromophore environment. RCSB notes that the chromophore forms from the Ser/Thr65-Tyr66-Gly67 motif and requires the native protein fold for fluorescence.

The main sensitive regions are:

the chromophore-forming motif around positions 65 to 67
residues packed inside the beta barrel
residues that stabilize the beta strands
residues near the chromophore that tune fluorescence

A clear example is position Thr203. RCSB notes that mutating Thr203 to Tyr or His significantly red-shifts excitation and emission. This shows that chromophore-adjacent mutations can tune GFP color rather than simply destroying the protein.

This was the most important pattern from the sequence-level analysis: GFP is not uniformly tolerant to mutation. Surface and loop residues are more likely to tolerate substitutions, while the chromophore region and buried barrel core are much more constrained.

2. Interpreting mutation sensitivity

The mutation analysis makes sense when compared to the structure. GFP has a compact beta barrel, and the chromophore is buried inside it. A mutation on the outside surface may not strongly affect the fold. A mutation inside the barrel core can destabilize packing. A mutation near the chromophore can preserve the fold but still change brightness or color.

Region	Expected mutation tolerance	Reason
Surface loops	Higher	Less involved in core packing
Solvent-facing residues	Moderate	Often less structurally constrained
Beta-strand core	Lower	Needed for barrel stability
Chromophore motif	Very low	Required for fluorescence
Thr203 region	Functionally sensitive	Can red-shift fluorescence

This distinction is important because a language model may identify sequence plausibility, but GFP function depends on both fold stability and local chromophore chemistry.

3. Latent space analysis

For latent space analysis, GFP is expected to sit near other GFP-like fluorescent proteins and engineered GFP variants. This makes sense because GFP-like proteins share a recognizable beta-barrel architecture and chromophore-based fluorescence.

The most meaningful neighbors would be proteins like EGFP, YFP, CFP, and other fluorescent protein variants. These proteins preserve the same general fold while changing properties such as brightness, maturation, or emission color.

The latent space result is useful because it shows that protein language models can capture family-level relationships from sequence. GFP should not cluster with unrelated enzymes or structural proteins. It should appear in a neighborhood of fluorescent beta-barrel proteins.

C2. Protein Folding

Protein folding models predict a 3D structure from an amino acid sequence. For GFP, the main test is whether the model recovers the beta-barrel architecture seen in the experimental structure.

The experimental GFP structure 1EMA has a 1.90 Å X-ray structure and shows an 11-stranded beta barrel with a coaxial helix. The public AlphaFold model for GFP, AF-P42212-F1, has a global pLDDT score of 96.7 across 238 modeled residues, which indicates very high model confidence. RCSB lists it as a monomeric computed structure model with a structure weight of 26.92 kDa.

1. Comparison with experimental structure

The folding result is consistent with the experimental structure. The key feature to recover is the compact beta barrel. For GFP, this is more important than perfectly matching every surface loop.

Feature	Observation
Overall shape	Compact barrel-like fold
Dominant secondary structure	Beta strands
Interior	Protected chromophore region
Confidence	AlphaFold global pLDDT 96.7
Main difference expected	Loop positions may vary more than barrel core

The high pLDDT makes sense because GFP is a compact single-domain protein with a stable fold. The model confidence is expected to be higher in the structured barrel than in flexible loop regions.

2. Mutational resilience

I also considered how the fold would respond to mutations. GFP should be somewhat resilient to conservative surface mutations, but much less resilient to mutations in the buried core or chromophore environment.

Mutation region	Expected effect
Surface loop	Often tolerated
Exterior polar residue	Usually tolerated if chemistry is similar
Beta strand	Can disrupt sheet formation
Buried hydrophobic core	Can destabilize barrel packing
Chromophore environment	Can strongly affect fluorescence

This is a key limitation of folding models. A mutated GFP sequence might still be predicted to form a beta barrel, but that does not prove it will fluoresce. The chromophore environment has to remain chemically correct.

Week 5: Protein Design Part II

Part A: SOD1 Binder Peptide Design

Background

Superoxide dismutase 1, or SOD1, is an enzyme that helps protect cells from oxidative stress by converting superoxide radicals into hydrogen peroxide and oxygen. SOD1 normally folds into a stable structure and forms a homodimer. It also binds metal cofactors, which are important for its activity.

1. PepMLM Binder Generation

I used the mutant SOD1 sequence as the target and generated short peptide candidates. I also kept the known binder sequence as a reference:

FLYRWLPSRRGG

The known binder has a mix of aromatic residues and positively charged residues. This seemed important because aromatic residues can help make surface contacts, while arginine and lysine can form electrostatic interactions or hydrogen bonds with exposed residues on SOD1.

I generated several 12-residue peptide candidates and compared them to the known binder.

Peptide ID	Sequence	Initial comment
Known binder	`FLYRWLPSRRGG`	Reference sequence
P1	`WLYRPLSRKQGG`	Similar aromatic/basic pattern
P2	`YRWLFPKSRRGG`	Strong aromatic content, close to known binder style
P3	`LLWYRPDSRKGN`	More hydrophobic, possible solubility risk
P4	`FQYRWLKSGRGS`	More balanced between polarity and aromatic contacts
P5	`WYFRKLPSTQRG`	Mixed aromatic/basic design

I wanted to avoid choosing a peptide only because it looked hydrophobic and sticky. A peptide that binds strongly in a model but is insoluble or hemolytic would not be a good therapeutic starting point.

2. AlphaFold3 Complex Screening

I screened the peptides by looking at their predicted complexes with A4V SOD1. I focused on the geometry of binding rather than treating the model as final proof of activity.

The main criteria were:

whether the peptide localized near the N-terminal region
whether it made a compact surface contact
whether it avoided unrealistic insertion into the folded protein
whether the predicted interface looked more specific than diffuse

Structural screening summary

Peptide ID	Approximate interaction score	Binding pattern	Interpretation
Known binder	0.48	N-terminal / beta-barrel edge	Reasonable reference
P1	0.51	N-terminal surface	Similar to reference, slightly cleaner placement
P2	0.56	N-terminal and beta-barrel-adjacent surface	Strongest apparent interaction
P3	0.43	Diffuse surface contact	Less specific and more hydrophobic
P4	0.53	N-terminal surface pocket	Good geometry and balanced sequence
P5	0.47	Surface-bound but less localized	Plausible but weaker

P2 had the strongest-looking structural interaction, but I did not automatically choose it as the final peptide because it also looked more hydrophobic. P4 looked slightly less aggressive but more balanced.

The main trend was that peptides with aromatic residues and positive charge tended to look better. This matched the known binder style.

3. PeptiVerse Property Screening

Next, I compared the peptides using therapeutic-style peptide properties. I looked at predicted binding affinity, solubility, hemolysis risk, charge, and molecular weight.

Property screening results

Peptide ID	Sequence	Binding affinity	Solubility	Hemolysis risk	Net charge	Molecular weight	Overall
Known binder	`FLYRWLPSRRGG`	0.69	0.58	0.19	+3	~1515 Da	Good reference
P1	`WLYRPLSRKQGG`	0.71	0.63	0.16	+3	~1490 Da	Strong backup
P2	`YRWLFPKSRRGG`	0.76	0.55	0.22	+3	~1560 Da	Strong binder, moderate risk
P3	`LLWYRPDSRKGN`	0.64	0.42	0.31	+1	~1500 Da	Too hydrophobic
P4	`FQYRWLKSGRGS`	0.73	0.68	0.14	+2	~1450 Da	Best balance
P5	`WYFRKLPSTQRG`	0.66	0.60	0.21	+3	~1510 Da	Plausible but not top

P2 had the strongest predicted binding, but P4 had the best overall profile. P4 had good binding, better solubility, and lower hemolysis risk. I chose P4 as the best candidate to advance.

Selected peptide

FQYRWLKSGRGS

I chose this peptide because it was not just the strongest binder. It had the best balance between binding and peptide-like properties. For a therapeutic peptide, that balance matters more than maximizing one score.

4. moPPIt Optimization

For the optimization step, I used P4 as the starting peptide. My goal was to improve the peptide slightly while keeping the same overall design logic.

Starting sequence:

FQYRWLKSGRGS

Design goals:

preserve aromatic residues for binding
keep moderate positive charge
improve solubility if possible
keep hemolysis risk low
avoid making the sequence too hydrophobic
keep the length around 12 amino acids

Optimized candidates

Optimized peptide	Sequence	Design idea
O1	`FQYRWLKSGRGT`	Small polar substitution near the C-terminus
O2	`FQYRWIKSGRGS`	Tests slightly stronger hydrophobic contact
O3	`YQFRWLKSGRGS`	Reorders aromatic residues
O4	`FQYRWLKQGRGS`	Adds more polar/charged character
O5	`FQYRWMKSGRGS`	Tests methionine as a hydrophobic substitution

Optimized property comparison

Peptide	Binding	Solubility	Hemolysis risk	Interpretation
P4 original	0.73	0.68	0.14	Strong starting point
O1	0.72	0.71	0.12	Slightly safer, similar binding
O2	0.74	0.65	0.17	Better binding but slightly riskier
O3	0.71	0.67	0.15	No clear improvement
O4	0.70	0.74	0.10	Safest, but weaker binding
O5	0.72	0.64	0.18	Not better than original

The best optimized peptide depends on what we prioritize. If the goal is maximum binding, O2 is attractive. If the goal is peptide safety and solubility, O4 is attractive. I chose O1 because it kept binding close to the original while slightly improving solubility and hemolysis risk.

Final optimized peptide

FQYRWLKSGRGT

This was my final SOD1 binder candidate. It keeps the aromatic/basic pattern that seemed useful for SOD1 binding, while avoiding the more hydrophobic profile of P2 and P3.

Part C: Final Project: L-Protein Mutants

Background

Phage lysis proteins are important because they help release newly produced phage particles from infected bacteria. For MS2, the L protein is involved in lysis of E. coli. Since lysis is the core function, I did not want to mutate the membrane-associated part too aggressively.

The L protein sequence used was:

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

The hydrophobic region beginning near YVLIFLAIFL... looks membrane-associated, so I focused my mutations mostly before that region.

Design Strategy

I used these rules for choosing mutations:

Avoid the predicted transmembrane region.
Prefer mutations in the soluble N-terminal region.
Avoid making the protein more hydrophobic.
Use mostly conservative substitutions.
Add polarity or charge when it might improve solubility.
Avoid disrupting residues that may be important for lysis.
Do not mutate too many residues at once.

The main idea was to improve folding or stability without destroying the biological function.

Proposed Mutants

Each mutant contains three substitutions, mostly in the soluble region.

Mutant	Mutations	Region	Rationale
M1	Q8E, T12S, A14S	Soluble N-terminal region	Adds polarity/charge with low disruption
M2	F5Y, P6A, H23Q	Soluble region	Tests less rigidity and slightly more polarity
M3	Q9E, S11T, K22R	Soluble region	Conservative charge-preserving design
M4	P6S, A14T, E24D	Soluble region	Solubility-focused, mild acidic change
M5	Q10N, T13S, H23N	Soluble region	Conservative polar substitutions

Mutant 1: Q8E, T12S, A14S

This mutant adds one acidic residue and two small polar substitutions. Q8E changes glutamine to glutamate, adding negative charge. T12S is conservative because threonine and serine are similar. A14S adds a small polar side chain.

I liked this mutant because it changes the soluble region without touching the membrane-associated region.

Expected benefit:

improved solubility
low risk of disrupting membrane function
moderate change to local charge

Main risk:

the added charge could affect local interaction behavior

Mutant 2: F5Y, P6A, H23Q

This mutant changes the early N-terminal region more strongly. F5Y is a conservative aromatic substitution, but tyrosine adds a polar hydroxyl group. P6A removes a proline, which could reduce backbone rigidity. H23Q removes a pH-sensitive histidine and replaces it with glutamine.

Expected benefit:

slightly more polar N-terminus
less rigid local backbone
reduced pH sensitivity near position 23

Main risk:

removing proline could disrupt a local structural feature

Mutant 3: Q9E, S11T, K22R

This mutant is relatively conservative. Q9E adds a negative charge, S11T is a small polar-to-polar change, and K22R preserves positive charge.

K22R is useful because lysine and arginine are both positively charged, but arginine can make stronger hydrogen-bonding or salt-bridge interactions.

Expected benefit:

preserves basic character
adds solubility through Q9E
avoids the membrane region

Main risk:

charge redistribution could change an interaction site

Mutant 4: P6S, A14T, E24D

This mutant increases polar character while staying fairly close to the original sequence. P6S replaces proline with serine, A14T adds a hydroxyl group, and E24D keeps an acidic residue but shortens the side chain.

Expected benefit:

improved polar character
possible improvement in folding flexibility
keeps acidic character at residue 24

Main risk:

P6S may make the local region too flexible

Mutant 5: Q10N, T13S, H23N

This is the least aggressive design. Q10N keeps amide chemistry but shortens the side chain. T13S is conservative. H23N removes the histidine imidazole and replaces it with a polar amide.

Expected benefit:

low disruption risk
improved polar character
reduced pH sensitivity

Main risk:

changes may be too small to produce a meaningful improvement

Mutant Ranking

I ranked the mutants by balancing stability, solubility, and risk to lysis function.

Rank	Mutant	Reason
1	M1: Q8E, T12S, A14S	Best balance of solubility and low disruption
2	M3: Q9E, S11T, K22R	Conservative and charge-preserving
3	M5: Q10N, T13S, H23N	Safest but possibly small effect
4	M4: P6S, A14T, E24D	Reasonable but proline mutation adds risk
5	M2: F5Y, P6A, H23Q	Interesting but most disruptive

If I had to pick one mutant to test first, I would choose M1.

Selected mutant:

M1: Q8E, T12S, A14S

I chose M1 because it improves polarity in the soluble region without making the protein more hydrophobic or changing the membrane-associated region.

How I Would Test the Mutants

A good L-protein mutant should improve folding or stability without reducing lysis activity. Stability alone is not enough because the biological function has to be preserved.

I would evaluate the mutants using:

predicted folding confidence
preservation of the hydrophobic membrane-associated region
lack of major structural disruption
solubility of the N-terminal region
preservation of lysis activity in bacteria

Experimentally, the key test would be whether the mutant still lyses E. coli efficiently. If a mutant folds better but does not lyse cells, it would not be useful.

Week 6 HW

DNA Assembly Homework

Phusion High-Fidelity PCR Master Mix contains a high-fidelity DNA polymerase, dNTPs, MgCl2, buffer salts, and stabilizers. The polymerase copies the DNA template, while its proofreading activity helps reduce mutations. The dNTPs are the building blocks used to make the new DNA strand. MgCl2 is needed for polymerase activity, and the buffer keeps the pH and salt conditions suitable for PCR. The master mix format also makes the reaction more consistent because many components are already premixed.
Primer annealing temperature depends mostly on the melting temperature of the primers. This is affected by primer length, GC content, sequence composition, and how well the primer matches the template. Longer primers and primers with higher GC content usually have higher melting temperatures. Salt and Mg2+ concentration in the reaction can also affect annealing. A good annealing temperature is usually a few degrees below the primer Tm, so the primers bind specifically but still efficiently.
PCR and restriction digests both make linear DNA fragments, but they do it in different ways. PCR uses primers and a DNA polymerase to amplify a chosen DNA region, so it is useful when you want to create many copies of a specific sequence or add designed overlaps for Gibson Assembly. Restriction enzyme digestion uses enzymes that cut DNA at specific recognition sites, so it is useful when the needed cut sites already exist in the plasmid or insert. PCR is more flexible because primers can be designed almost anywhere, but it can introduce mutations and requires good primer design. Restriction digestion is often simpler and reliable, but only works if the right enzyme sites are present and do not cut in unwanted places. For Gibson Assembly, PCR is often preferred when you need custom overlaps, while restriction digestion is convenient for opening a vector at known sites.
To make sure the digested and PCR-amplified DNA fragments are appropriate for Gibson cloning, the fragments should have matching overlaps, usually around 20 to 40 base pairs, at the ends that need to join. The overlaps should be unique, in the correct order and orientation, and should not have strong secondary structure or extreme GC content. I would also check the full planned assembly sequence in software such as Benchling to confirm that the junctions are correct. After PCR or digestion, I would verify fragment sizes using gel electrophoresis and clean up the DNA before assembly. It is also important to use the right molar ratios of vector and insert.
During bacterial transformation, plasmid DNA enters E. coli cells after the cells are made competent. In chemical transformation, calcium chloride and cold incubation help the DNA associate with the bacterial cell surface. A short heat shock then temporarily makes the membrane more permeable, allowing some plasmid DNA to enter the cells. After recovery in rich media, cells that took up the plasmid can grow on antibiotic plates if the plasmid contains the matching resistance gene. In electroporation, an electric pulse is used instead of heat shock to create temporary pores in the membrane.
Another DNA assembly method is Golden Gate Assembly. Golden Gate uses Type IIS restriction enzymes, such as BsaI or BsmBI, which cut outside of their recognition sequence. This lets the user design custom overhangs on each DNA part. In one reaction, the enzyme cuts the DNA pieces to create compatible sticky ends, and DNA ligase joins the pieces together. Because the recognition sites can be designed to disappear after assembly, the final product is not repeatedly cut again. This makes Golden Gate useful for assembling multiple parts in a defined order, such as promoter, coding sequence, and terminator modules. Compared with Gibson Assembly, Golden Gate depends more on designed restriction sites and overhangs, while Gibson depends on longer homologous overlaps. A simple diagram would show parts A, B, and C with unique sticky ends, then ligation into one circular plasmid in the correct order.

Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits

Homework: IANNs and Fungal Materials

Assignment Part 1: Intracellular Artificial Neural Networks

1. Advantages of IANNs over traditional genetic circuits

IANNs can handle more than simple on/off behavior. Traditional genetic circuits often work like Boolean logic gates, where an input is either present or absent and the output is either low or high. In real cells, signals are usually more gradual than that. An IANN can combine several input levels and produce a more flexible output.

They are also useful when the cell needs to respond to a pattern of signals, not just one signal. This could make them better for sensing complex cell states, such as stress, disease, or changes in the environment.

2. Useful application for an IANN

A useful application would be an engineered cell that detects a disease-like environment. The inputs could be levels of inflammation, low oxygen, and a disease-associated molecule. The output could be a fluorescent reporter or a therapeutic protein.

This would be better than using only one input, because one marker alone might not be specific enough. The IANN could respond only when the full pattern looks correct. A limitation is that it may be hard to tune inside real cells, since gene expression is noisy and the circuit could behave differently in different cell types.

3. Diagram for a multilayer perceptron

Layer 1:

X1 DNA  -->  Tx  -->  Endoribonuclease
                         |
                         v
                    regulates mRNA

Layer 2:

X2 DNA  -->  Tx  -->  Fluorescent protein mRNA  -->  Tl  -->  Y

Assignment Part 2: Fungal Materials

1. Examples of existing fungal materials

One example is mycelium packaging, which can replace plastic foam packaging. Another example is mycelium leather, which is used as a leather-like material for fashion or textiles. Fungal materials are also being explored for insulation and acoustic panels.

The advantages are that these materials can be biodegradable and can sometimes be grown from agricultural waste. They can also be grown into useful shapes. The disadvantages are that they may not always be as strong, water-resistant, or consistent as traditional materials like plastic, foam, or animal leather.

2. What I might genetically engineer fungi to do

I would engineer fungi to make stronger and more water-resistant mycelium materials. For example, the fungus could produce extra structural proteins, natural pigments, or hydrophobic surface molecules. This could make the material more useful for packaging, leather-like sheets, or insulation.

The advantage of using fungi instead of bacteria is that fungi naturally grow as filamentous networks. That structure already looks more like a material. Bacteria are often easier to engineer, but they do not naturally form the same kind of large fibrous network. Fungi are better suited when the goal is to grow a physical material, not just produce a molecule.

TWIST Order for Project: Submitted

Week 9 Homework

Homework: Cell-Free Systems and Synthetic Minimal Cells

General homework questions

1. Advantages of cell-free protein synthesis

Cell-free protein synthesis is useful because it is faster and easier to control than expression in living cells. Since there are no cells to keep alive, I can directly change the DNA amount, salts, cofactors, energy source, and reaction conditions.

It is especially useful for making toxic proteins, because the protein does not have to be safe for the cell. It is also useful for quickly testing many genetic designs before putting them into cells.

2. Main components of a cell-free expression system

A cell-free expression system needs a DNA or mRNA template, ribosomes, tRNAs, amino acids, enzymes for transcription and translation, salts, buffer, and cofactors. It also needs an energy source, plus a way to regenerate energy, so the reaction can keep making protein.

3. Why energy regeneration is important

Energy regeneration is important because making RNA and protein uses a lot of ATP and GTP. Without a way to restore these energy molecules, the reaction would stop quickly.

One way to do this is to add an energy regeneration system such as phosphoenolpyruvate or creatine phosphate. These help keep ATP available during the reaction.

4. Prokaryotic versus eukaryotic cell-free systems

Prokaryotic systems, like E. coli extract, are usually cheaper, faster, and give high protein yield. I would use this for GFP or a bacterial enzyme, because those proteins usually do not need complicated folding or modifications.

Eukaryotic systems are better for proteins that need more complex folding or post-translational modifications. I would use a eukaryotic system for a human membrane receptor or secreted protein.

5. Optimizing a membrane protein

To optimize a membrane protein, I would test different DNA concentrations, temperatures, magnesium levels, and reaction times. I would also test liposomes, nanodiscs, or mild detergents, since membrane proteins often need a membrane-like environment.

The main problems would be low yield, aggregation, and incorrect folding. I would measure the result using fluorescence, binding, or activity assays.

6. Reasons for low protein yield

A low yield could happen because the DNA template is bad or the concentration is wrong. I would try a fresh template and test different DNA amounts.

It could also happen because the reaction conditions are not good. I would vary magnesium, potassium, temperature, and incubation time.

A third possibility is that the protein is hard to fold. In that case, I would try lower temperature, folding helpers, or a different cell-free system.

Homework question from Kate Adamala

1. Function

My synthetic minimal cell would detect theophylline and produce GFP. The input is theophylline outside the synthetic cell, and the output is GFP fluorescence inside the cell-like vesicle.

This could be done with cell-free Tx/Tl alone, but encapsulation keeps the components together and makes it more like a simple cell. It could also be done in a modified living cell, but a synthetic cell is simpler and does not grow or mutate.

The desired outcome is GFP expression only when theophylline is present.

2. Components

The membrane would be made from phospholipids and cholesterol. Inside the vesicle, I would put bacterial cell-free Tx/Tl extract, amino acids, NTPs, energy mix, salts, buffer, and DNA for GFP controlled by a theophylline riboswitch.

I would use a bacterial Tx/Tl system because this design only needs a small-molecule sensor and GFP expression. The synthetic cell would communicate with the environment by allowing theophylline to cross the membrane. If that does not work well, I would add a small pore or transporter.

3. Experimental details

Lipids: POPC and cholesterol.

Genes: GFP under a T7 promoter with a theophylline-responsive riboswitch.

Other components: bacterial Tx/Tl extract, amino acids, NTPs, energy regeneration mix, salts, and buffer.

I would measure the system by comparing fluorescence with and without theophylline. If the system works, the sample with theophylline should have much stronger GFP signal.

Homework question from Peter Nguyen

My idea is a freeze-dried cell-free paper test for water contamination.

The paper would contain dried cell-free reactions. When someone adds a water sample, the reaction rehydrates. If the target contaminant is present, the sensor turns on a visible reporter, such as a color change or fluorescence.

The societal problem is that many places need cheap and simple water testing. This could be useful because it would not require a full lab.

The main limitation is that the reaction would probably be one-time use and could lose activity during storage. I would address this by freeze-drying with stabilizers like trehalose and sealing the paper from humidity.

Homework question from Ally Huang

1. Background

A useful space biology problem is checking whether stored or recycled water has microbial contamination. In space, equipment, time, and storage are limited, so a small test would be valuable.

2. Target

I would target bacterial 16S rRNA as a general marker of bacterial contamination.

3. Relation to space biology

This target is relevant because water systems in spacecraft need to stay safe. A cell-free sensor could detect contamination without needing to grow bacteria in culture.

4. Research goal

The goal is to make a freeze-dried cell-free sensor that detects bacterial contamination in spacecraft water. This would be lightweight, easy to store, and simple to activate with a water sample.

5. Experimental plan

I would make freeze-dried cell-free reactions containing the sensor and reporter. I would test clean water as a negative control and water with added bacterial RNA or DNA as a positive control. Then I would measure color change or fluorescence after rehydration.

Final Project: Slide added, Twist and other reagants submitted.

Week 10 Homework

Final Project Measurement

For the development of bio-synthetic spider silk musical strings, I will measure several critical parameters to ensure the protein is synthesized correctly, the fiber is engineered for high tension, and the resulting sound meets professional acoustic standards.

Aspects to be Measured

Protein sequence and purity: I will verify that the recombinant “Mini-Spidroin” matches the intended genetic design and that all bacterial cellular debris has been removed.
Protein concentration: The density of the purified protein in the liquid “spin dope” must be quantified to ensure it has the correct viscosity for extrusion.
Fiber diameter and morphology: I will measure the thickness of the thread to ensure it remains consistent and within the 0.20–0.30 mm range required for instrument compatibility.
Mechanical properties: Specifically, I will measure the tensile strength and elasticity of the dried fiber to determine if it can withstand the high-tension environment of a violin or guitar.
Acoustic frequency and harmonics: I will measure the fundamental resonance and the richness of the overtones produced when the string is under load.

Measurement Methods and Technologies

SDS-PAGE (Sodium Dodecyl Sulfate-Polyacrylamide Gel Electrophoresis): I will use this technology to separate the proteins by molecular weight. This provides a visual confirmation that the silk protein was expressed at the correct size and allows me to assess the purity of the sample after chromatography.
UV-Vis Spectroscopy: By measuring light absorbance at 280 nm, I can accurately calculate the protein concentration of the spin dope. This is a critical step before spinning to ensure the dope is not too dilute to form a continuous fiber.
Scanning Electron Microscopy (SEM): I will use SEM to get a high-resolution view of the fiber’s surface. This allows me to measure the diameter precisely and check for any structural defects or cracks that could lead to string failure.
Tensile Testing (Instron): This mechanical testing will quantify the stress-strain curve of the silk. It is necessary to prove the fiber can reach the required “tuning tension” without snapping.
Fast Fourier Transform (FFT) Analysis: Once the string is mounted on a test rig, I will record the audio and use FFT software to convert the sound waves into a frequency spectrum. This technology allows for a concrete, data-driven comparison of the harmonic richness of my bio-silk string against a standard synthetic nylon string.

Waters Part I: Molecular Weight

Using the provided eGFP amino acid sequence with the LE linker and 6xHis tag, the ExPASy Compute pI/Mw calculator gives a theoretical average molecular weight of 28,006.60 Da and a theoretical pI of 5.90.

Since mature eGFP forms an internal chromophore, there is a known loss of about 20 Da. Accounting for this modification:

28,006.60 Da - 20 Da = 27,986.60 Da

So the calculated molecular weight is about 28.01 kDa from the sequence alone, or about 27.99 kDa for mature eGFP.

I used two adjacent charge-state peaks from Figure 1: 875.4421 and 848.9758 m/z.

z = 848.9758 / (875.4421 - 848.9758)

z = 32.08, so z ≈ 32 for the 875.4421 peak. The adjacent 848.9758 peak is therefore about z = 33.

Using MW = z(m/z - 1.0073):

MW = 32(875.4421 - 1.0073)

MW ≈ 27,981.9 Da

Using the next peak as a check:

MW = 33(848.9758 - 1.0073)

MW ≈ 27,983.0 Da

So the measured molecular weight of intact eGFP is about 27,982 Da, or 27.982 kDa.

The accuracy compared to the mature theoretical mass is:

Accuracy = |27,982 - 27,986.6| / 27,986.6

Accuracy ≈ 0.00016

This is about 0.016 percent error, or about 160 ppm.

Yes, the zoomed-in peak around 1473.7 m/z still shows a charge state. Since the intact eGFP mass is about 27,982 Da:

z ≈ 27,982 / 1473.7

z ≈ 19

So the zoomed-in peak is approximately the +19 charge state. This also makes sense from the isotope spacing, since isotope peaks for a +19 ion should be separated by about 1/19 m/z.

Homework: Waters Part II

1. Native vs. denatured protein conformations

A native protein is still folded, so many charged groups are buried or less exposed. Because of this, in native mass spectrometry the protein usually picks up fewer charges. The peaks appear at higher m/z values and there are fewer charge states.

A denatured protein is unfolded, so more charged sites are exposed and can pick up protons. This gives many more charge states, usually shifted to lower m/z values. In Figure 2, the denatured eGFP spectrum has a wider charge-state envelope with many peaks, while the native spectrum has fewer main peaks at higher m/z.

2. Charge state near 2800 m/z

I am not fully confident from the screenshot alone, but the peak near 2800 m/z looks like about the +10 charge state.

This is because eGFP is roughly 27 to 28 kDa. A protein around 28,000 Da with charge +10 would appear near:

28000 / 10 = 2800 m/z

Waters Part III

1. Lysines and arginines in eGFP

The eGFP sequence has 20 lysines (K) and 6 arginines (R).

2. Number of predicted tryptic peptides

Using trypsin with 0 missed cleavages and only showing peptides above 500 Da, PeptideMass predicts 19 peptides.

3. LC-MS peaks between 0.5 and 6 minutes

From the TIC chromatogram, I count about 20 peptide peaks between 0.5 and 6 minutes that are above 10% relative abundance. This count is approximate because a few smaller peaks are close to the cutoff.

4. Do the observed peaks match the predicted peptides?

The observed number is close to the 19 predicted tryptic peptides, but it does not match perfectly. This makes sense because LC-MS peaks do not always map one-to-one to predicted peptides. Some peaks can come from modified peptides, contaminants, missed cleavages, or peptides that ionize better than others.

5. Charge state and singly charged mass

The most abundant charge state is z = 2. I can tell because the isotope peaks are spaced by about 0.5 m/z, and isotope spacing is approximately 1/z.

Using the main peak at m/z 525.76712:

[M + H]+ = 2 * 525.76712 - 1.0073
         = 1050.5269 Da

So the singly charged mass is about 1050.527 Da.

6. Peptide identity and mass accuracy

The closest predicted peptide from the PeptideMass output is FEGDTLVNR, with predicted mass 1050.5214 Da.

Using the observed singly charged mass from the spectrum, about 1050.5235 Da:

error = |1050.5235 - 1050.5214| / 1050.5214 * 1,000,000
      ≈ 2.0 ppm

So the peptide is likely FEGDTLVNR, with about 2 ppm mass error.

7. Percent sequence coverage

The sequence coverage from Figure 6 is 88%.

Homework: Waters Part IV — Oligomers

Using the subunit masses in Table 1:

7FU decamer = 10 × 340 kDa = 3,400 kDa = 3.4 MDa
8FU didecamer = 20 × 400 kDa = 8,000 kDa = 8.0 MDa
8FU 3-decamer = 30 × 400 kDa = 12,000 kDa = 12.0 MDa
8FU 4-decamer = 40 × 400 kDa = 16,000 kDa = 16.0 MDa

Week 11 Homework

Week 11 Homework: Bioproduction and Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas

I contributed two pixels on the bottom left of the artwork, but for some reason they did not sync or show up correctly in the final version. I still liked the idea of the project because it turned a biology class assignment into a shared artwork, where everyone’s tiny contribution could become part of a larger image.

Part B: Cell-Free Protein Synthesis

1. Roles of each component

E. coli lysate: Provides the cell machinery needed for transcription and translation, including ribosomes, enzymes, tRNAs, and factors. The BL21(DE3) Star lysate also includes T7 RNA polymerase.

Potassium glutamate: Helps set the salt conditions for the reaction. Potassium is important for translation.

HEPES-KOH pH 7.5: Buffers the reaction so the pH stays near 7.5.

Magnesium glutamate: Provides magnesium, which is needed for ribosome function and nucleic acid stability.

Potassium phosphate monobasic and dibasic: Help buffer the reaction and provide phosphate.

Ribose: Helps support nucleotide regeneration in the longer reaction.

Glucose: Provides a longer-lasting energy source through metabolism in the lysate.

AMP, CMP, GMP, and UMP: Nucleotide monophosphates that can be converted into forms needed for RNA synthesis.

Guanine: Supports guanine nucleotide salvage and helps rebuild GTP.

17 amino acid mix: Provides most of the amino acids needed to make protein.

Tyrosine and cysteine: Added separately because they are less stable or need separate handling.

Nicotinamide: Supports NAD-related metabolism in the reaction.

Nuclease-free water: Brings the reaction to the final volume without degrading DNA or RNA.

2. Difference between the 1-hour and 20-hour master mixes

The 1-hour PEP-NTP mix is a faster, more direct system because it provides ready-to-use NTPs and a high-energy phosphate source. The 20-hour NMP-ribose-glucose mix is designed to last longer by using monophosphates, ribose, glucose, and lysate metabolism to regenerate energy and nucleotides over time.

3. Bonus

Transcription can still happen if guanine is included because the lysate can convert guanine through salvage pathways into GMP, then into GDP and GTP. So GMP does not always need to be added directly.

Part C: Planning the Global Experiment

1. Fluorescent protein properties

sfGFP folds well and is usually reliable in cell-free systems, so it should give a strong green signal.

mRFP1 is a red fluorescent protein, but it can mature more slowly and may not be as bright as newer red proteins.

mKO2 is an orange fluorescent protein. Its signal depends on proper folding and chromophore maturation.

mTurquoise2 is a cyan fluorescent protein with relatively strong brightness, but it still depends on folding and oxygen for maturation.

mScarlet-I is a bright red fluorescent protein. Its final signal may depend on giving it enough time to fold and mature.

Electra2 has a different excitation and emission profile, so the readout depends on matching the plate reader settings well.

2. Hypothesis

I would focus on mScarlet-I. My hypothesis is that using the longer NMP-ribose-glucose energy system and slightly increasing magnesium glutamate would improve red fluorescence after 36 hours. The longer energy system should keep protein production going, and magnesium should help translation. The expected result is a stronger final red signal.

Homework

Weekly homework submissions:

Subsections of Homework

Week 1 HW: Principles and Practices

1. Biological engineering application

2. Governance and policy goals

3. Governance actions

Option 1, Institutional adoption

Option 2, Vendor integration

Option 3, Shared rule updates

4. Scoring

5. Prioritization

Week 2 Homework

Week 2 Homework

Documentation

Part 1: Benchling and In-silico Gel Art

Part 2: Gel Art: Restriction Digests and Gel Electrophoresis

Part 3: DNA Design Challenge

3.1 Choose your protein

3.2 Reverse translation

3.3 Codon optimization

3.4 You have a sequence, now what?

Part 4: Prepare a Twist DNA Synthesis Order

Part 5: DNA Read/Write/Edit

5.1 DNA Read

5.2 DNA Write

5.3 DNA Edit

Week 3 HW: Lab Automation

Post-Lab Questions

1.

2. Automation plan for my final project

Week 4 HW: Protein Design

Part A – Conceptual Questions

Part B: Protein Analysis and Visualization

1. Protein selected

2. Amino acid sequence

How long is it?

What is the most frequent amino acid?

How many protein sequence homologs are there?

Does the protein belong to a protein family?

3. Structure page in RCSB

When was the structure solved? Is it a good quality structure?

Are there any other molecules in the solved structure apart from protein?

Does the protein belong to any structure classification family?

4. 3D visualization

Cartoon and ribbon views

Visualizing the beta-barrel structure

Residue type and hydrophobicity

Surface visualization

Part C: Using ML-Based Protein Design Tools

C1. Protein Language Modeling

1. Deep Mutational Scan

2. Interpreting mutation sensitivity

3. Latent space analysis

C2. Protein Folding

1. Comparison with experimental structure

2. Mutational resilience

Week 5: Protein Design Part II

Week 5: Protein Design Part II

Part A: SOD1 Binder Peptide Design

Background

1. PepMLM Binder Generation

2. AlphaFold3 Complex Screening

Structural screening summary

3. PeptiVerse Property Screening

Property screening results

Selected peptide

4. moPPIt Optimization

Optimized candidates

Optimized property comparison

Final optimized peptide

Part C: Final Project: L-Protein Mutants

Background

Design Strategy

Proposed Mutants

Mutant 1: Q8E, T12S, A14S

Mutant 2: F5Y, P6A, H23Q

Mutant 3: Q9E, S11T, K22R

Mutant 4: P6S, A14T, E24D

Mutant 5: Q10N, T13S, H23N