Subsections of ANDREA CARRILLO — HTGAA Spring 2026

Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    First, describe a biological engineering application or tool you want to develop and why. I am interested in developing a biological engineering approach that uses living organisms to help us understand and preserve archaeological materials and sites. Specifically, I want to explore how microorganisms could be used to study how materials such as stone, soil, or ceramics change over time, or how biological growth can be guided to protect fragile archaeological surfaces.

  • Week 2 HW: DNA Read, Write, & Edit

    Part 1: Benchling & In-silico Gel Art Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. Part 2: Gel Art - Restriction Digests and Gel Electrophoresis Optional for Committed Listeners Part 3: DNA Design Challenge 3.1 Choose your protein

  • Week 3 HW: Lab Automation

    Assignment: Python Script for Opentrons Artwork This design is inspired by traditional Inca geometric art, particularly the tocapu textile patterns of the Inca Empire. The composition features a symmetrical stepped cross motif enclosed within a square, referencing the Andean worldview and the symbolic structure of the Chakana (Andean cross). The use of straight lines and geometric repetition reflects the mathematical precision and cosmological symbolism characteristic of Inca visual culture. (https://opentrons-art.rcdonovan.com/?id=6ef1d0494o5n1p7)

  • Week 4 HW: Protein Design Part I

    Part A. Conceptual Questions Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) To estimate how many amino acid molecules are in 500 grams of meat, we start with the fact that the average mass of one amino acid is about 100 Daltons.

Subsections of Homework

Week 1 HW: Principles and Practices

  1. First, describe a biological engineering application or tool you want to develop and why.

I am interested in developing a biological engineering approach that uses living organisms to help us understand and preserve archaeological materials and sites. Specifically, I want to explore how microorganisms could be used to study how materials such as stone, soil, or ceramics change over time, or how biological growth can be guided to protect fragile archaeological surfaces.

This idea is interesting to me because archaeological materials are shaped by long-term interactions between the environment and living systems. Instead of seeing biology only as a source of damage, I am curious about how biological processes could become a tool for analysis or conservation. For an HTGAA project, I want to explore how growth, decay, and environmental conditions can be treated as design variables to better understand the past and develop new, more sustainable conservation methods.

  2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

One key governance and policy goal for this application is to ensure that biological tools used in archaeological contexts do not cause harm to people, sites, or cultural heritage. Because archaeological materials are fragile and often irreplaceable, it is important that biological interventions are carefully controlled and ethically guided.

A first sub-goal is non-malfeasance, meaning preventing physical or biological damage. This includes ensuring that any microorganisms used cannot spread uncontrollably, alter archaeological materials in irreversible ways, or disrupt surrounding ecosystems. Strict containment, reversibility, and testing protocols would be essential before any real-world application.

A second sub-goal is cultural and community respect. Archaeological sites are often connected to living communities and cultural identities. Governance frameworks should ensure that local stakeholders are informed, consulted, and involved in decisions about the use of biological technologies on heritage sites. This helps prevent extractive or colonial practices and supports ethical collaboration.

A third sub-goal is responsible knowledge use. Research outcomes, data, and tools should be shared transparently for conservation and educational purposes, while avoiding misuse, commercialization without consent, or applications that prioritize novelty over preservation. Together, these goals help ensure that biological engineering contributes to an ethical, respectful, and sustainable future for archaeology.

  3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

Action 1: Ethical Review Requirement for Biological Interventions in Archaeology

(New rule / requirement — actors: universities, museums, heritage authorities)

  • Purpose: Currently, ethical review processes mainly focus on research involving humans, while biological interventions on archaeological sites are often evaluated only for scientific merit. I propose creating a specific ethical review requirement for any use of living organisms in archaeological contexts, focused on protecting sites, materials, and surrounding ecosystems.

  • Design: This action would require interdisciplinary review committees including archaeologists, biologists, conservation experts, and ethicists. Universities, museums, and heritage authorities would require approval from these committees before allowing biological tools to be tested or deployed at archaeological sites. Researchers would opt in by agreeing to this process as a condition of site access.

  • Assumptions: This proposal assumes that such committees can be formed with sufficient expertise and that ethical review will meaningfully guide research rather than becoming a purely bureaucratic step. It also assumes researchers will accept additional oversight.

  • Risks of Failure & “Success”: This approach could fail if reviews become symbolic or overly slow, discouraging exploratory research. If highly successful, it could unintentionally favor large institutions with more resources, making it harder for smaller or community-based projects to participate.

Action 2: Incentives for Reversible and Low-Risk Biological Methods

(Incentive — actors: funding agencies, research sponsors)

  • Purpose: At present, research funding often prioritizes novelty and impact over safety, reversibility, or long-term risk. I propose creating funding incentives that prioritize biological methods which are reversible, low-risk, and environmentally contained when used in archaeological contexts.

  • Design: Funding agencies and foundations would include ethical and safety criteria in grant calls, explicitly rewarding projects that minimize ecological and cultural risk. Researchers would voluntarily design projects to meet these criteria, and reviewers would need guidance on how to evaluate risk and reversibility alongside scientific merit.

  • Assumptions: This proposal assumes that funding incentives can meaningfully influence research behavior and that risk can be reasonably assessed in advance. It also assumes that safer approaches will still allow for meaningful scientific insight.

  • Risks of Failure & “Success”: The action may fail if incentives are too weak or applied superficially. If overly successful, it could discourage more experimental or unconventional approaches, potentially slowing innovation in the field.

Action 3: Community Co-Governance of Bio-Archaeological Applications

(Governance strategy — actors: local communities, researchers, heritage organizations)

  • Purpose: Decisions about technological interventions at archaeological sites are often made by researchers or institutions, with limited involvement from local or descendant communities. I propose a co-governance approach in which communities connected to archaeological sites participate directly in decisions about the use of biological tools.

  • Design: This would involve early consultation processes, accessible communication (non-technical language), and shared decision-making authority. Researchers and institutions would need to allocate time and resources to support meaningful participation and be willing to adapt or halt projects based on community input.

  • Assumptions: This approach assumes that communities wish to participate, that diverse perspectives can be reconciled, and that scientific and local knowledge can productively inform each other.

  • Risks of Failure & “Success”: Co-governance could fail if participation is symbolic rather than meaningful or if internal conflicts arise. If highly successful, it may slow down research or limit certain projects, but this may be an acceptable trade-off in contexts involving irreplaceable cultural heritage, where damage cannot be undone.

  4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

Does the option:                                      Option 1   Option 2   Option 3
Enhance Biosecurity
  • By preventing incidents                           1          2          3
  • By helping respond                                2          1          3
Foster Lab Safety
  • By preventing incident                            1          2          n/a
  • By helping respond                                2          1          n/a
Protect the environment
  • By preventing incidents                           1          1          2
  • By helping respond                                2          2          1
Other considerations
  • Minimizing costs and burdens to stakeholders      3          1          2
  • Feasibility?                                      1          2          3
  • Not impede research                               3          1          2
  • Promote constructive applications                 2          1          1

  5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

Based on the scoring, I would prioritize a combination of Option 1 (Ethical Review Requirement) and Option 2 (Incentives for Reversible and Low-Risk Methods). This recommendation is directed to international organizations such as UNESCO and the United Nations, which play a key role in setting global norms for cultural heritage protection and emerging technologies.

Option 1 should function as a global baseline. It scores best on preventing biosecurity, lab safety, and environmental incidents, which is especially important for archaeological sites, where materials are fragile and damage is irreversible. An international ethical review framework, supported by UNESCO and the UN, could guide national and local authorities while allowing for contextual adaptation. The main trade-off is the potential increase in administrative complexity and slower research approval processes.

Option 2 should complement this baseline by encouraging safer and reversible biological methods through funding priorities and international research programs. This incentive-based approach preserves flexibility and innovation while reinforcing ethical behavior.

Option 3 (Community Co-Governance) should be promoted by UNESCO and the UN as a guiding principle, particularly in culturally sensitive contexts. While it may reduce speed and scalability, it strengthens legitimacy, equity, and long-term trust in the governance of biological tools applied to archaeology.

This approach assumes that UNESCO and the UN can influence national policies through standards, funding, and guidance, but there is uncertainty around consistent adoption and enforcement across different regions.

Assignment (Week 2 Lecture Prep)

Homework Questions from Professor Jacobson:

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

DNA polymerase is the enzyme that copies DNA. It is very accurate, but not perfect:

  • It makes about 1 mistake in every 10 million DNA letters.

  • With correction systems, the final error rate is about 1 mistake in 1 billion letters.

The human genome has about 3 billion DNA letters, so without correction there would be many mistakes every time DNA is copied. How does biology fix this?

  • DNA polymerase checks its own work (proofreading).

  • Cells have repair systems that fix mistakes.

  • Harmful mutations are reduced over time by natural selection.
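
As a rough check of these numbers, the expected number of mistakes per genome copy is simply the error rate multiplied by the genome length. The sketch below uses only the rounded figures quoted above (1 in 10 million, 1 in 1 billion, 3 billion letters); the function and variable names are illustrative.

```python
# Expected copying errors per genome replication = error rate x genome length.
# All figures are the rounded values quoted in the answer above.
GENOME_LENGTH = 3_000_000_000  # ~3 billion DNA letters in the human genome

ERROR_RATES = {
    "polymerase (as quoted above)": 1e-7,  # ~1 mistake per 10 million letters
    "with correction systems": 1e-9,       # ~1 mistake per 1 billion letters
}


def expected_errors(rate_per_base: float, length: int = GENOME_LENGTH) -> float:
    """Mean number of mistakes expected when copying `length` letters once."""
    return rate_per_base * length


for label, rate in ERROR_RATES.items():
    print(f"{label}: ~{expected_errors(rate):.0f} errors per genome copy")
```

With these figures, the polymerase alone would leave on the order of hundreds of mistakes per copy, while the corrected rate leaves only a few.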

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Because the genetic code is made of three-letter codons and there are 64 possible codons but only 20 amino acids, most amino acids can be encoded by more than one codon. This means that the same protein can be written in many different DNA sequences. For an average human protein, the number of possible DNA sequences that could code for it is extremely large.

However, in practice, not all of these DNA sequences work equally well. Some codons are translated more efficiently in human cells, while others slow down protein production. Certain DNA sequences can form mRNA structures that interfere with translation or make the mRNA unstable. In addition, some sequences can disrupt regulatory signals or affect how the protein folds during synthesis. As a result, only a subset of possible DNA codes is actually effective for producing the desired protein in cells.
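
To get a feeling for how large "extremely large" is, the number of possible DNA sequences is the product of the number of codons available for each amino acid. The sketch below assumes the standard genetic code and, as an example input, uses the 60-residue collagen segment from my Week 2 homework; the function name is illustrative.

```python
import math

# Codons per amino acid in the standard genetic code (61 sense codons in total).
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences that encode `protein` (stop codon ignored)."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein)


print(count_encodings("MGPR"))  # 1 * 4 * 4 * 6 = 96

# Example input: the 60-residue segment used in the Week 2 homework.
segment = "MTLRLLVAALCAGILAEAPRVRAQHRERVTCTRLYAADIVFLLDGSSSIGRSNFREVRSF"
n = count_encodings(segment)
print(f"{len(segment)} residues -> about 10^{len(str(n)) - 1} possible DNA sequences")
```

The count grows multiplicatively with every residue, so a full-length protein has far more possible encodings than this short segment.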

Homework Questions from Dr. LeProust:

  1. What’s the most commonly used method for oligo synthesis currently?

Oligos are made using a chemical method where DNA is built one letter at a time on a solid surface. This method is called phosphoramidite synthesis, and it is the standard method used today.

  2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Each time a new DNA letter is added, there is a small chance of error. As the oligo gets longer, these small errors add up. Because of this:

  • Many DNA strands end up incomplete
  • Many have mistakes
  • Very few are perfect full-length oligos

After about 200 nucleotides, the number of correct oligos becomes very low.

  3. Why can’t you make a 2000bp gene via direct oligo synthesis?

To make a 2000 bp gene, you would need to add 2000 DNA letters in a row. With so many steps, almost every DNA molecule will contain errors. So, direct synthesis does not work for long genes. Instead, scientists make short oligos and then join them together to build long genes.
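
The way errors add up with length can be sketched with a simple model: if every addition step succeeds independently with the same probability, the fraction of full-length strands is that probability raised to the number of steps. The 99% per-step efficiency below is an assumed, illustrative value, not a number from the lecture.

```python
# Fraction of strands that are still full length after n addition steps,
# assuming each step succeeds independently with the same probability.
ASSUMED_EFFICIENCY = 0.99  # illustrative per-step value (assumption)


def full_length_fraction(n_steps: int, efficiency: float = ASSUMED_EFFICIENCY) -> float:
    """Probability that a single strand survives all `n_steps` additions."""
    return efficiency ** n_steps


for n in (20, 200, 2000):
    print(f"{n:>5} nt: {full_length_fraction(n):.2e} of strands are full length")
```

Under this assumption, most 20 nt strands are full length, only a small fraction of 200 nt strands are, and essentially no 2000 nt strand would be, which is why long genes are assembled from short oligos.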

Homework Question from George Church:

  1. What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Animals need 20 amino acids to make proteins, but they cannot make all of them. The 10 essential amino acids (they must come from the diet) are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine (arginine is essential mainly during growth).

Because animals cannot make these amino acids themselves, they depend on their food to get them.

The “Lysine Contingency” is the safeguard from Jurassic Park: the dinosaurs were engineered to be unable to make lysine, so that they would die without supplements from their keepers. Because lysine is already essential and cannot be synthesized by any animal, this changes my view of the contingency: it adds no real containment, since an escaped animal could obtain lysine from its diet like every other animal. Lysine is often low in plant-based foods, so a lack of it can restrict protein synthesis and growth, but that is a nutritional limit rather than a kill switch. More broadly, it highlights how animal biology depends on plants and microbes, which can make lysine.

Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Optional for Committed Listeners

Part 3: DNA Design Challenge

3.1 Choose your protein

For this assignment, I chose Collagen Type I (alpha 1 chain). I selected this protein because collagen is the main structural protein found in bone, teeth, and connective tissues. In archaeology, collagen is extremely important because it can survive for thousands of years in skeletal remains and artifacts made from bone or leather. It is widely used in radiocarbon dating and paleoproteomics to identify species and study ancient diets. Since I am interested in archaeology, this protein connects molecular biology with archaeological research.

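Using UniProt, I obtained the protein sequence of a human collagen alpha-1 chain. The entry retrieved (Q02388, gene COL7A1) is the alpha-1 chain of collagen type VII; the type I alpha-1 chain is encoded by COL1A1.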

>sp|Q02388-1|CO7A1_HUMAN Isoform 1 of Collagen alpha-1(VII) chain OS=Homo sapiens OX=9606 GN=COL7A1
MTLRLLVAALCAGILAEAPRVRAQHRERVTCTRLYAADIVFLLDGSSSIGRSNFREVRSF
LEGLVLPFSGAASAQGVRFATVQYSDDPRTEFGLDALGSGGDVIRAIRELSYKGGNTRTG
AAILHVADHVFLPQLARPGVPKVCILITDGKSQDLVDTAAQRLKGQGVKLFAVGIKNADP
EELKRVASQPTSDFFFFVNDFSILRTLLPLVSRRVCTTAGGVPVTRPPDDSTSAPRDLVL
SEPSSQSLRVQWTAASGPVTGYKVQYTPLTGLGQPLPSERQEVNVPAGETSVRLRGLRPL
TEYQVTVIALYANSIGEAVSGTARTTALEGPELTIQNTTAHSLLVAWRSVPGATGYRVTW
RVLSGGPTQQQELGPGQGSVLLRDLEPGTDYEVTVSTLFGRSVGPATSLMARTDASVEQT
LRPVILGPTSILLSWNLVPEARGYRLEWRRETGLEPPQKVVLPSDVTRYQLDGLQPGTEY
RLTLYTLLEGHEVATPATVVPTGPELPVSPVTDLQATELPGQRVRVSWSPVPGATQYRII
VRSTQGVERTLVLPGSQTAFDLDDVQAGLSYTVRVSARVGPREGSASVLTVRREPETPLA
VPGLRVVVSDATRVRVAWGPVPGASGFRISWSTGSGPESSQTLPPDSTATDITGLQPGTT
YQVAVSVLRGREEGPAAVIVARTDPLGPVRTVHVTQASSSSVTITWTRVPGATGYRVSWH
SAHGPEKSQLVSGEATVAELDGLEPDTEYTVHVRAHVAGVDGPPASVVVRTAPEPVGRVS
RLQILNASSDVLRITWVGVTGATAYRLAWGRSEGGPMRHQILPGNTDSAEIRGLEGGVSY
SVRVTALVGDREGTPVSIVVTTPPEAPPALGTLHVVQRGEHSLRLRWEPVPRAQGFLLHW
QPEGGQEQSRVLGPELSSYHLDGLEPATQYRVRLSVLGPAGEGPSAEVTARTESPRVPSI
ELRVVDTSIDSVTLAWTPVSRASSYILSWRPLRGPGQEVPGSPQTLPGISSSQRVTGLEP
GVSYIFSLTPVLDGVRGPEASVTQTPVCPRGLADVVFLPHATQDNAHRAEATRRVLERLV
LALGPLGPQAVQVGLLSYSHRPSPLFPLNGSHDLGIILQRIRDMPYMDPSGNNLGTAVVT
AHRYMLAPDAPGRRQHVPGVMVLLVDEPLRGDIFSPIREAQASGLNVVMLGMAGADPEQL
RRLAPGMDSVQTFFAVDDGPSLDQAVSGLATALCQASFTTQPRPEPCPVYCPKGQKGEPG
EMGLRGQVGPPGDPGLPGRTGAPGPQGPPGSATAKGERGFPGADGRPGSPGRAGNPGTPG
APGLKGSPGLPGPRGDPGERGPRGPKGEPGAPGQVIGGEGPGLPGRKGDPGPSGPPGPRG
PLGDPGPRGPPGLPGTAMKGDKGDRGERGPPGPGEGGIAPGEPGLPGLPGSPGPQGPVGP
PGKKGEKGDSEDGAPGLPGQPGSPGEQGPRGPPGAIGPKGDRGFPGPLGEAGEKGERGPP
GPAGSRGLPGVAGRPGAKGPEGPPGPTGRQGEKGEPGRPGDPAVVGPAVAGPKGEKGDVG
PAGPRGATGVQGERGPPGLVLPGDPGPKGDPGDRGPIGLTGRAGPPGDSGPPGEKGDPGR
PGPPGPVGPRGRDGEVGEKGDEGPPGDPGLPGKAGERGLRGAPGVRGPVGEKGDQGDPGE
DGRNGSPGSSGPKGDRGEPGPPGPPGRLVDTGPGAREKGEPGDRGQEGPRGPKGDPGLPG
APGERGIEGFRGPPGPQGDPGVRGPAGEKGDRGPPGLDGRSGLDGKPGAAGPSGPNGAAG
KAGDPGRDGLPGLRGEQGLPGPSGPPGLPGKPGEDGKPGLNGKNGEPGDPGEDGRKGEKG
DSGASGREGRDGPKGERGAPGILGPQGPPGLPGPVGPPGQGFPGVPGGTGPKGDRGETGS
KGEQGLPGERGLRGEPGSVPNVDRLLETAGIKASALREIVETWDESSGSFLPVPERRRGP
KGDSGEQGPPGKEGPIGFPGERGLKGDRGDPGPQGPPGLALGERGPPGPSGLAGEPGKPG
IPGLPGRAGGVGEAGRPGERGERGEKGERGEQGRDGPPGLPGTPGPPGPPGPKVSVDEPG
PGLSGEQGPPGLKGAKGEPGSNGDQGPKGDRGVPGIKGDRGEPGPRGQDGNPGLPGERGM
AGPEGKPGLQGPRGPPGPVGGHGDPGPPGAPGLAGPAGPQGPSGLKGEPGETGPPGRGLT
GPTGAVGLPGPPGPSGLVGPQGSPGLPGQVGETGKPGAPGRDGASGKDGDRGSPGVPGSP
GLPGPVGPKGEPGPTGAPGQAVVGLPGAKGEKGAPGGLAGDLVGEPGAKGDRGLPGPRGE
KGEAGRAGEPGDPGEDGQKGAPGPKGFKGDPGVGVPGSPGPPGPPGVKGDLGLPGLPGAP
GVVGFPGQTGPRGEMGQPGPSGERGLAGPPGREGIPGPLGPPGPPGSVGPPGASGLKGDK
GDPGVGLPGPRGERGEPGIRGEDGRPGQEGPRGLTGPPGSRGERGEKGDVGSAGLKGDKG
DSAVILGPPGPRGAKGDMGERGPRGLDGDKGPRGDNGDPGDKGSKGEPGDKGSAGLPGLR
GLLGPQGQPGAAGIPGDPGSPGKDGVPGIRGEKGDVGFMGPRGLKGERGVKGACGLDGEK
GDKGEAGPPGRPGLAGHKGEMGEPGVPGQSGAPGKEGLIGPKGDRGFDGQPGPKGDQGEK
GERGTPGIGGFPGPSGNDGSAGPPGPPGSVGPRGPEGLQGQKGERGPPGERVVGAPGVPG
APGERGEQGRPGPAGPRGEKGEAALTEDDIRGFVRQEMSQHCACQGQFIASGSRPLPSYA
ADTAGSQLHAVPVLRVSHAEEEERVPPEDDEYSEYSEYSVEEYQDPEAPWDSDDPCSLPL
DEGSCTAYTLRWYHRAVTGSTEACHPFVYGGCGGNANRFGTREACERRCPPRVVQSQGTG
TAQD

3.2 Reverse Translation

According to the Central Dogma, DNA is transcribed into RNA and then translated into protein. Since each amino acid is encoded by a three-nucleotide codon, we can work backwards from a protein sequence to determine a possible DNA sequence.

For the partial collagen sequence shown previously:

  • Protein sequence (partial): MTLRLLVAALCAGILAEAPRVRAQHRERVTCTRLYAADIVFLLDGSSSIGRSNFREVRSF

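Using NCBI, I retrieved a nucleotide sequence for this gene. The fragment below does not start with ATG, and translating it in frame gives residues from the C-terminal end of the protein (…CSLPLDEGSCTAYTLRWYHRAVTGSTEACHPFVYGGCGGNANRFGTRE…) rather than the N-terminal segment above. It is therefore an example of the natural coding sequence near the 3′ end of the transcript, not a reverse translation of the segment above:

  • DNA sequence (fragment of the NCBI record):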

      1 aattcccaca aaccctgctg acttgacccc attggcccag acccctgttc cctgccactg
     61 gatgagggct cctgcactgc ctacaccctg cgctggtacc atcgggctgt gacaggcagc
    121 acagaggcct gtcacccttt tgtctatggt ggctgtggag ggaatgccaa ccgttttggg
    181 acccgtgagc ctgcgagcgc cgctgcccac cccgggtgtc cagagccagg ggacaggtac
    241 tgcccaggac tgaggcccag ataatgagct gagattcagc atcccctgga ggacgtcggg
    301 gtctcagcag aaccccactg tccctcccct tggtgctaga ggcttgtgtg cacgtgagcg
    361 tcggttgtgc agttcccgtt atttcagtga cttggtcccg tgggtctaac cttcccccct
    421 gtggacaaac ccccattgtg gctccn
    

Explanation:

ATG → Methionine (M)

GGT → Glycine (G)

CCT → Proline (P)

CGT → Arginine (R)

Because the genetic code is degenerate (multiple codons can encode the same amino acid), this is only one possible DNA sequence. Many other nucleotide sequences could produce the exact same collagen protein segment.
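
A reverse translation of the N-terminal segment can be sketched by picking one codon per amino acid. The table below is an arbitrary but valid choice (it reuses the ATG, GGT, CCT, and CGT codons from the explanation above); it is only one of many possible tables and is not tuned for any organism.

```python
# One arbitrarily chosen codon per amino acid (standard genetic code).
# Any synonymous codon would be equally valid; this table is an illustrative choice.
ONE_CODON = {
    "A": "GCT", "C": "TGT", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGT", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAT", "P": "CCT", "Q": "CAA", "R": "CGT",
    "S": "TCT", "T": "ACT", "V": "GTT", "W": "TGG", "Y": "TAT",
}


def reverse_translate(protein: str) -> str:
    """Return one possible DNA sequence that encodes `protein`."""
    return "".join(ONE_CODON[aa] for aa in protein)


print(reverse_translate("MGPR"))  # ATGGGTCCTCGT

segment = "MTLRLLVAALCAGILAEAPRVRAQHRERVTCTRLYAADIVFLLDGSSSIGRSNFREVRSF"
dna = reverse_translate(segment)
print(len(dna), dna[:30])
```

Because the segment starts with methionine, any reverse translation of it must start with ATG, which is a quick way to check that a candidate DNA sequence matches the intended protein segment.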

3.3 Codon Optimization

After obtaining a possible DNA sequence from reverse translation, the next step is codon optimization. Although multiple DNA sequences can encode the same protein, different organisms prefer certain codons over others. This is known as codon bias. If a gene contains many codons that are rarely used in the host organism, protein production may be slow or inefficient. For this assignment, I chose to optimize the collagen sequence for Escherichia coli because it is widely used in biotechnology. E. coli grows quickly, is inexpensive to culture, and is commonly used for recombinant protein production. Using an online codon optimization tool, the DNA sequence was adjusted to:

  • Use codons that are frequently used in E. coli
  • Improve translation efficiency
  • Avoid problematic sequences (such as strong secondary structures or unwanted restriction sites)

Importantly, codon optimization does not change the amino acid sequence of the protein. It only changes the nucleotide sequence to improve expression in the chosen organism. By optimizing the codons for E. coli, the collagen gene should be translated more efficiently, which can lead to higher protein yield.

  • Optimized DNA (tool output): GCCGCAACCACCTGCTGCTGCGCCTGCGCCGCGGCCTGCTGTTGCACCGGCTGCACCGGCGCGTGTACCACCGGCGCATGCTGCTGCTGCGCGACCACCGGTGGTTGCTGTTGTGCGGGTGCCTGCTGCTGTTGCACCGGCACCACCTGTTGCTGCACCGGCTGCTGCGCATGTACCGGCGGAGCCACCGGCGCGGGCGGTGGCTGCACCTGTTGCACCGGCTGCGCATGTACCGGCTGCTGCACCGCGTGCGCCTGCTGCTGCACCGGCTGCGGCTGCACTGGCGGTACTGCATGCTGCGCGACCTGCGGCGGCGGCTGCACCGGCACCGGTGCCTGTGCCGGCGGCTGCGCGGGCTGCGCGTGCGCGGGCGCCGGTGGCTGCTGCACCGGCACCTGCGCCTGTTGCTGCACCACCACCACCGGCACCTGTACAGCGACCGGCGGCACCGGCGGCTGCACCGGCACCGGCGGTGCGGGTGGCGGCGCGGCGACCGGTTGTTGTGCAGCGTGCTGCGGCACCACCACCACAGGTGGTGGTGCGTGTTGCTGCGGCACCGGCGCGGGCTGCTGCACCGGCTGTGGTGCGGGCTGCGGCTGTTGCGGCTGTACCGGCTGTTGCTGCGCGTGTTGCTGTTGTGGCGGTGGCACCGGTACCTGCTGCGCGGGCGCGGGTTGCTGTGCCGGCGGCGGCGGCGCCTGTGCGGGTGGCACCGCGTGCACCGGCTGCTGCTGCGCCGGAGGCGCGTGCACCGGCGCCGGTGGTTGCTGTTGCGCGGGTGCCACCGCGGCAACCGGCGCGGGCTGTACCGGCGCAGGCGCAACCACCTGCGCAGGTTGTGCGACGTGCTGCTGTTGTACAGGTGGCGCGGGCGGCGCCTGTGGCACGTGTGGCGGCGGCGGTACCTGCACCTGTGCCGGCTGCGCGGGCGCAGCGTGTTGTTGCTGCGCGTGCACGGGCACCTGCTGTTGCACCTGCTGTTGCTGTACCACCGGGGGCACCGGCTGCACCGCAGGTGCCGGCGGTTGTACCACCGGTACCGGTACCGGTTGTGCGTGCGGCACCGGCGCGGGCTGCGGCACCTGCGGCGGTACCACCGGTACCGGCTGCGCGGGTACCACCTGCTGCTGCGGCACCACCGCGACGACCACCTGCGCAGGTACCGGTGCGTGCACCACCGGTGGCACCTGCTGCTGTGGCACCGGCGGTGGTACCTGCACAGCGGCCTGCTGCACCACCTGCTGTTGCTGCTGTTGCACCGGCACCGGTGGTGCGTGTGCGGCGGCCTGCTGCTGCTGTTGCGCGACGACCGGCACCGGCGGATGTACCTGCTGCAATTAA

Apart from its final AAT and TAA, this output contains only alanine, threonine, cysteine, and glycine codons: the tool read the letters a, t, c, and g of the NCBI nucleotide fragment as amino acid codes, so this sequence does not encode collagen. The optimization needs to be rerun with the protein sequence as input.
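
Since codon optimization must not change the amino acid sequence, a useful sanity check is to translate the tool output back and compare it with the intended protein. The sketch below builds the standard genetic code table and applies it to the first 30 nucleotides of the sequence above; `target` is the start of the N-terminal segment from section 3.2, and the function name is illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon. Trailing bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


target = "MTLRLLVAAL"                         # first 10 residues of the intended segment
candidate = "GCCGCAACCACCTGCTGCTGCGCCTGCGCC"  # first 30 nt of the tool output above

print(translate(candidate))            # AATTCCCACA
print(translate(candidate) == target)  # False
```

The translation spells the first ten letters of the NCBI nucleotide fragment rather than the protein, so a check like this is worth running before ordering any synthesized DNA.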

3.4 You have a sequence! Now what?

Now that I have a codon-optimized DNA sequence for collagen, the next step is to produce the protein. One common method is a cell-dependent system. In this approach, the optimized DNA sequence is inserted into a plasmid (a small circular DNA molecule). The plasmid is then introduced into Escherichia coli cells through transformation. Once inside the cell:

  • The DNA is transcribed into mRNA.
  • The mRNA is translated by ribosomes into the collagen protein.
  • The bacterial cell essentially acts as a biological factory, producing the protein as it grows.

Another option is a cell-free system. In this method, instead of using living cells, the DNA is added to a solution containing the necessary molecular machinery (ribosomes, enzymes, nucleotides, amino acids). The transcription and translation processes occur in a test tube, producing the protein directly. This method is faster and more controlled, but usually more expensive. In both cases, the DNA sequence follows the Central Dogma: DNA → RNA → Protein, resulting in the production of the collagen protein.

Part 4: Prepare a Twist DNA Synthesis Order

This project uses E. coli and collagen to create reproducible patterns that simulate organic components of ancient artifacts, such as textiles or adhesives. Collagen acts as a structural scaffold to hold proteins in place, while engineered E. coli produce proteins that form visible patterns. Automation ensures precision and repeatability, allowing us to study how these materials might degrade or be preserved over time, providing insights into experimental archaeology and conservation.

Week 3 HW: Lab Automation

Assignment: Python Script for Opentrons Artwork

This design is inspired by traditional Inca geometric art, particularly the tocapu textile patterns of the Inca Empire. The composition features a symmetrical stepped cross motif enclosed within a square, referencing the Andean worldview and the symbolic structure of the Chakana (Andean cross). The use of straight lines and geometric repetition reflects the mathematical precision and cosmological symbolism characteristic of Inca visual culture. (https://opentrons-art.rcdonovan.com/?id=6ef1d0494o5n1p7)

Post-Lab Questions

  1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

A relevant example is the paper “An open-source automated platform for high-throughput RT-qPCR testing” developed during the COVID-19 pandemic. In this work, researchers used the Opentrons OT-2 liquid handling robot to automate RNA extraction and RT-qPCR setup.

The system enabled scalable, low-cost diagnostic testing by reducing manual pipetting steps, minimizing human error, and increasing reproducibility. This study demonstrated how open-source automation tools can expand access to molecular diagnostics, especially in resource-limited settings.

The novelty of this application lies in democratizing laboratory automation—allowing smaller labs to perform high-throughput testing without expensive proprietary systems.

LINK: https://pubmed.ncbi.nlm.nih.gov/34260637/

  2. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more.

For my final project, I plan to use laboratory automation tools to develop controlled collagen-based biomaterials inspired by ancient Andean techniques. Collagen will serve as a structural matrix that mimics organic components found in archaeological artifacts such as textiles, adhesives, or composite materials. By using automated liquid handling, I aim to precisely control mixing ratios and spatial deposition of biological components within the collagen scaffold. This will allow the creation of reproducible material samples that can be used to study degradation processes, conservation strategies, or experimental archaeology models.

Automation ensures precision and repeatability, which are essential when comparing material behavior under different environmental conditions.

To do this, I plan to use two main pieces of equipment to create collagen-based materials inspired by archaeological patterns. First, I will use the Opentrons OT-2 liquid handling robot. This robot allows precise mixing of liquids and can deposit the mixture in exact locations with consistent volumes. In my project, I will prepare different mixtures of collagen and pigments that mimic the colors and textures of ancient textiles or other organic components found in archaeological artifacts. The robot will then deposit these mixtures according to a predetermined pattern, such as geometric motifs inspired by Inca textiles. Using the robot ensures that each replica is precise and reproducible, allowing me to create multiple samples under the same conditions without human error.

Second, I will use 3D-printed holders or molds to support the materials during deposition. These molds will be designed to match the shape of specific archaeological patterns, such as squares or other geometric compartments. The robot will deposit the collagen mixtures into these molds, and once the collagen sets, the molds can be removed to reveal a precise replica of the intended pattern. This combination of automation and custom molds allows me to accurately reproduce complex designs and study how these materials behave, degrade, or can be conserved, providing a controlled and repeatable approach to experimental archaeology.
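
As a first piece of pseudocode for this plan, the sketch below turns a small stepped-cross motif into a list of deposition targets. It assumes, purely for illustration, that the mold is laid out like a 96-well plate (rows A to H, columns 1 to 12); the motif, the offsets, and the function name are hypothetical. On the robot, each target could then be passed to a dispense step in the Opentrons Python API, which is not run here.

```python
# Stepped-cross (chakana-like) motif: '#' = deposit collagen mixture, '.' = leave empty.
CHAKANA = [
    "..###..",
    ".#####.",
    "#######",
    "###.###",
    "#######",
    ".#####.",
    "..###..",
]

ROWS = "ABCDEFGH"  # assumed 96-well-style layout: 8 rows ...
N_COLS = 12        # ... by 12 columns


def pattern_to_wells(pattern, row_offset=0, col_offset=0):
    """Map each '#' cell of `pattern` to a well name such as 'A5'."""
    wells = []
    for r, line in enumerate(pattern):
        for c, cell in enumerate(line):
            if cell != "#":
                continue
            row, col = r + row_offset, c + col_offset
            if row >= len(ROWS) or col >= N_COLS:
                raise ValueError("pattern does not fit on the plate")
            wells.append(f"{ROWS[row]}{col + 1}")
    return wells


# Shift the 7-column motif two columns to the right to roughly center it.
wells = pattern_to_wells(CHAKANA, col_offset=2)
print(len(wells), wells[:3])
```

Keeping the motif as plain text makes it easy to swap in other geometric designs and to reuse exactly the same targets for every replicate.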

Final Project Ideas

Week 4 HW: Protein Design Part I

Part A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

To estimate how many amino acid molecules are in 500 grams of meat, we start with the fact that the average mass of one amino acid is about 100 Daltons.

One Dalton equals 1.66 x 10^(-24) grams. So, 100 Daltons equals:

100 x 1.66 x 10^(-24) = 1.66 x 10^(-22) grams per amino acid.

Now we divide the total mass of meat (500 grams) by the mass of one amino acid, treating the whole piece as if it were made of amino acids:

500 divided by 1.66 x 10^(-22) = 3 x 10^(24)

Therefore, 500 grams of meat contain approximately 3 x 10^(24) amino acid molecules.

This shows that even a small amount of food contains an enormous number of molecular building blocks.
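
The same estimate can be cross-checked in moles: 100 Daltons per amino acid corresponds to 100 grams per mole, so 500 grams is 5 moles, and multiplying by Avogadro's number gives the count. The sketch below also shows how the answer scales if only part of the meat is protein; the 20% protein fraction is an assumed, illustrative value and is not part of the question.

```python
AVOGADRO = 6.022e23        # molecules per mole
MEAT_MASS_G = 500.0        # grams of meat, from the question
AA_MASS_G_PER_MOL = 100.0  # ~100 Da per amino acid, from the question


def amino_acid_count(mass_g: float, protein_fraction: float = 1.0) -> float:
    """Number of amino acid molecules in `mass_g` grams, given a protein fraction."""
    moles = mass_g * protein_fraction / AA_MASS_G_PER_MOL
    return moles * AVOGADRO


print(f"{amino_acid_count(MEAT_MASS_G):.1e}")       # 3.0e+24 (whole mass counted)
print(f"{amino_acid_count(MEAT_MASS_G, 0.2):.1e}")  # 6.0e+23 (assumed 20% protein)
```

Either way the result is on the order of 10^24 molecules, so the conclusion above does not depend on the exact protein content.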

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Humans eat beef or fish, but we do not become a cow or a fish because our bodies break down food into basic molecular components before using it.

When we eat meat, our digestive system breaks proteins into amino acids, fats into fatty acids, and carbohydrates into simple sugars. These small molecules are absorbed into the bloodstream and then reused by our cells to build human proteins, human tissues, and human cells according to our own DNA instructions.

The key reason we do not become what we eat is that our genetic information controls how these molecules are assembled. A cow’s DNA builds cow proteins and tissues, while human DNA builds human proteins and tissues. Even though the raw materials are similar, the instructions are different.

In short, we do not become a cow or a fish because our body does not copy their structure — it only reuses their molecular building blocks to maintain and build our own human body.

3. Why are there only 20 natural amino acids?

The reason there are 20 natural amino acids is that those 20 are enough to build all the proteins that life needs.

Think of amino acids like Lego pieces. You do not need thousands of different pieces to build something complex; with a well-designed set, you can create almost any structure. The 20 amino acids have different sizes, electrical charges, and properties (some are hydrophobic, others are hydrophilic, some are positive, others negative). This variety is enough for proteins to fold into many different shapes and perform many different functions.

Another important reason is evolution. Early in the origin of life, more types of amino acids may have existed. However, these 20 worked well together within the genetic system. Once the genetic code was established using these 20 amino acids, changing it would have been very risky for organisms. For that reason, the system remained stable.

In summary, there are 20 amino acids because this number provides enough chemical diversity to create the complexity of life, and evolution fixed this set as the standard.
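To make the "well-designed Lego set" point concrete, the sketch below counts how many distinct linear sequences a 20-letter alphabet allows for a given chain length. The function name `sequence_space` and the example lengths are illustrative assumptions, not part of the original assignment.

```python
# Illustrative only: size of sequence space for the 20 standard amino acids.
ALPHABET_SIZE = 20


def sequence_space(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Return the number of distinct linear sequences of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (2, 10, 100):
        count = sequence_space(n)
        # len(str(count)) - 1 is the order of magnitude (power of ten).
        print(f"length {n}: roughly 10^{len(str(count)) - 1} possible sequences")
```

Even a modest 100-residue chain has on the order of 10^130 possible sequences, which is why 20 building blocks are more than enough combinatorially.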

4. Can you make other non-natural amino acids? Design some new amino acids

Yes, it is possible to create non-natural amino acids. Scientists can design new amino acids by modifying the part of the molecule known as the side chain, or R group. All amino acids share the same basic structure: an amino group, a carboxyl group, a hydrogen atom, and a variable side chain. The side chain is what determines the chemical properties of each amino acid. By changing this side chain, new amino acids with new properties can be created.

For example, one could design a modified version of phenylalanine by adding fluorine atoms to its aromatic ring. This change could make the amino acid more chemically stable and resistant to degradation, which would be useful in biomaterials. Another possibility would be designing a photo-responsive amino acid whose side chain changes shape when exposed to light. This could allow scientists to control protein activity using specific wavelengths of light. A third example could be a metal-binding amino acid with a side chain designed to strongly interact with metals such as copper or iron, which could be useful in environmental or material science applications.

Although the standard genetic code specifies only 20 amino acids (selenocysteine and pyrrolysine are rare natural additions), synthetic biology has made it possible to expand the genetic code. By engineering orthogonal pairs of transfer RNAs and aminoacyl-tRNA synthetases, often reassigning a stop codon such as the amber codon (UAG), researchers can incorporate non-natural amino acids into proteins at chosen positions. This allows the creation of proteins with properties that do not exist in nature.

In summary, non-natural amino acids can be designed by modifying the chemical structure of existing ones, particularly their side chains, enabling the development of new biological functions and materials.

5. Where did amino acids come from before enzymes that make them, and before life started?

Before life began, amino acids likely formed through natural chemical reactions on the early Earth. At that time, there were no enzymes or living cells. Instead, simple molecules such as water (H2O), methane (CH4), ammonia (NH3), hydrogen (H2), and carbon dioxide (CO2) were present in the atmosphere and oceans. Energy from lightning, volcanic activity, ultraviolet radiation from the sun, and geothermal heat provided the energy needed to drive chemical reactions between these small molecules.

In 1953, the Miller-Urey experiment showed that when simple gases thought to exist on early Earth were exposed to electrical sparks (simulating lightning), amino acids formed spontaneously. This demonstrated that the building blocks of proteins can arise from non-living chemical processes. In addition, amino acids have been found in meteorites, suggesting that some may have formed in space and arrived on Earth through asteroid impacts.

After amino acids were present, some of them began to link together, forming short chains called peptides. Over time, certain peptides may have developed the ability to speed up chemical reactions slightly. These primitive catalytic molecules would have provided an advantage, because they made useful reactions happen more efficiently.

Before true protein enzymes existed, many scientists believe that RNA molecules played an important role. RNA can both store information and act as a catalyst (these catalytic RNAs are called ribozymes). This idea is known as the “RNA world” hypothesis. Eventually, as biological systems became more complex, proteins replaced most RNA catalysts because proteins are more versatile and efficient. These protein catalysts are what we now call enzymes.

In summary, amino acids likely formed through natural chemical reactions powered by environmental energy sources. Over time, they combined into peptides, some of which gained catalytic abilities. Through gradual evolution—possibly beginning with catalytic RNA—modern enzymes eventually emerged, allowing life to develop increasingly complex biochemical systems.

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

If you make an alpha-helix using D-amino acids, you would expect it to form a left-handed helix.

In nature, proteins are made almost entirely from L-amino acids. These naturally form right-handed alpha-helices because of their specific three-dimensional geometry. The spatial arrangement of atoms in L-amino acids favors a right-handed twist when they fold into an alpha-helix structure.

D-amino acids are mirror images of L-amino acids. Because their geometry is reversed, the preferred helix direction is also reversed. As a result, a chain made entirely of D-amino acids would form a left-handed alpha-helix.

In summary, L-amino acids form right-handed alpha-helices, while D-amino acids form left-handed alpha-helices due to their mirror-image stereochemistry.
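The mirror-image argument can be checked numerically. The sketch below builds an idealized right-handed helix, reflects it through a plane, and measures handedness from the sign of a scalar triple product. The function names are hypothetical, and the geometry (about 100° per residue, 1.5 Å rise, 2.3 Å radius) uses approximate textbook alpha-helix values as assumptions. It illustrates only the geometric consequence of mirroring, not the energetics that make L-amino acids prefer the right-handed form.

```python
import math


def helix_points(n_points=12, radius=2.3, rise=1.5, turn_deg=100.0, handed=1):
    """Points on an idealized helix along z; handed=+1 is right-handed."""
    points = []
    for i in range(n_points):
        angle = handed * math.radians(turn_deg) * i
        points.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return points


def mirror(points):
    """Reflect every point through the y-z plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in points]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def handedness(points):
    """Classify a helix by the summed triple product of consecutive steps."""
    total = 0.0
    for i in range(len(points) - 3):
        v1 = _sub(points[i + 1], points[i])
        v2 = _sub(points[i + 2], points[i + 1])
        v3 = _sub(points[i + 3], points[i + 2])
        total += _dot(_cross(v1, v2), v3)  # positive for a right-handed twist
    return "right" if total > 0 else "left"


if __name__ == "__main__":
    l_helix = helix_points()
    print(handedness(l_helix), handedness(mirror(l_helix)))
```

Reflection flips the sign of the triple product, so the mirror image of a right-handed helix is necessarily left-handed.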

7. Can you discover additional helices in proteins?

Yes, additional helices can be discovered in proteins. Proteins are very flexible molecules that can fold into many different shapes depending on their amino acid sequence. While the alpha-helix is one of the most common helical structures found in nature, it is not the only possible one.

Scientists have identified other types of helices, such as the 3₁₀ helix and the pi-helix. They differ in how tightly they are wound and in the hydrogen-bond pattern that stabilizes them: backbone hydrogen bonds connect residue i to residue i+3 in the 3₁₀ helix, i to i+4 in the alpha-helix, and i to i+5 in the pi-helix. Both are less common than the alpha-helix but still occur naturally in some proteins.

In addition, researchers in protein engineering and synthetic biology can design entirely new helical structures by changing amino acid sequences or by incorporating non-natural amino acids. Advances in computational tools and artificial intelligence now allow scientists to predict and design novel protein folds that may not exist in nature.

In summary, additional helices can be discovered or designed because protein structure depends on the chemical properties and arrangement of amino acids, and these combinations allow for many possible folding patterns.

8. Why are most molecular helices right-handed?

Most molecular helices are right-handed because the building blocks of life are not symmetrical. In living organisms, amino acids are almost always in the L-form, which has a specific three-dimensional orientation. This asymmetry, called chirality, influences how molecules fold and assemble.

When many L-amino acids link together to form a protein, their geometry naturally favors a right-handed twist when forming structures like the alpha-helix. The specific angles between chemical bonds and the way hydrogen bonds stabilize the structure make the right-handed version more stable for L-amino acids.

In general, once life selected L-amino acids as the standard building blocks, the structures that formed from them (such as protein helices) also followed a consistent handedness. The same logic applies to nucleic acids: the right-handed double helix of B-form DNA reflects the D-sugars in its backbone. This biological preference became universal because it was energetically favorable and evolutionarily fixed.

In summary, most molecular helices are right-handed because life uses L-amino acids, and their three-dimensional structure naturally leads to right-handed helical folding.

9. Why do β-sheets tend to aggregate?

Beta-sheets tend to aggregate because their structure allows strong and repeated interactions between neighboring protein strands. In a beta-sheet, the backbone of the protein forms many hydrogen bonds in a very regular and extended pattern. This creates flat surfaces that can easily align with other beta-strands from nearby molecules.

When these flat regions come close together, they can form additional hydrogen bonds between different protein molecules. This stacking effect is energetically favorable, meaning it lowers the system’s energy and makes aggregation more stable. Because the pattern of hydrogen bonding is repetitive and strong, beta-sheets can “zip up” with each other, leading to large aggregates.

This behavior is especially important in diseases like Alzheimer’s, where proteins misfold and form beta-sheet–rich aggregates known as amyloid fibrils. The beta-sheet structure makes it easy for many copies of the same protein to stick together in an ordered way.

In summary, beta-sheets tend to aggregate because their flat, hydrogen-bonded structure allows them to align and form stable intermolecular interactions with other beta-sheets, promoting stacking and aggregation.

  • What is the driving force for β-sheet aggregation?

The main driving force for beta-sheet aggregation is the formation of hydrogen bonds between protein backbones, combined with hydrophobic interactions.

In a beta-sheet structure, the protein backbone is extended and forms hydrogen bonds in a very regular pattern. When multiple beta-strands from different protein molecules come close together, they can form additional hydrogen bonds between each other. This creates a very stable, repetitive “zipper-like” structure.

At the same time, many beta-sheet–forming regions contain hydrophobic (water-repelling) amino acids. Exposing these hydrophobic surfaces to water is energetically unfavorable because water molecules must order themselves around them. Aggregation buries these regions away from water and releases the ordered water, which lowers the overall free energy of the system. This hydrophobic effect strongly promotes aggregation.

So, the driving forces are:

  1. Backbone hydrogen bonding between strands.
  2. The hydrophobic effect, which pushes nonpolar regions to cluster together.
  3. Overall free-energy minimization, making the aggregated state more stable.

In summary, beta-sheet aggregation is driven by strong hydrogen bonding and hydrophobic interactions that stabilize stacked protein structures.

Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it

I selected collagen as the protein for this assignment. Collagen is a structural protein that is the main component of connective tissues such as skin, tendons, cartilage, and bone. It is especially important in archaeology because collagen is one of the primary organic materials preserved in ancient bones, textiles, and artifacts. Archaeologists often analyze collagen to reconstruct diet, obtain radiocarbon dates, and assess preservation conditions.

Collagen has a unique three-dimensional structure known as a triple helix, formed by three polypeptide chains tightly wound around each other. This structure gives collagen its strength and stability. I selected collagen because it is directly relevant to archaeological research and material preservation, and its distinctive structure makes it an excellent example for understanding how protein structure relates to function and long-term stability.

2. Identify the amino acid sequence of your protein.

  • How long is it? What is the most frequent amino acid?

The protein is 2993 amino acids long. The most frequent amino acid is glycine (G), which appears 627 times (about 21% of the sequence).
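Length and composition figures like these can be reproduced from a one-letter sequence string with a few lines of Python. This is a minimal sketch: the function name `composition` and the short toy sequence are illustrative assumptions, not the actual sequence analyzed above.

```python
from collections import Counter


def composition(sequence: str):
    """Return (length, most frequent residue, its count, its fraction)."""
    # Strip whitespace and newlines so a pasted FASTA body also works.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    residue, count = Counter(seq).most_common(1)[0]
    return len(seq), residue, count, count / len(seq)


if __name__ == "__main__":
    toy = "GPPGAPGKQG"  # hypothetical collagen-like fragment, for illustration
    length, residue, count, fraction = composition(toy)
    print(f"{length} residues; most frequent: {residue} ({count}, {fraction:.0%})")
```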

  • How many protein sequence homologs are there for your protein?

To determine the number of homologs, I used UniProt’s BLAST tool to search for sequences similar to human collagen type I (COL1A1). The BLAST results showed thousands of homologous protein sequences across many different organisms, particularly vertebrates such as mammals, birds, reptiles, and fish.

Collagen is a highly conserved structural protein, meaning its sequence has remained relatively similar throughout evolution. Because it plays a critical role in connective tissues such as bone and skin, it is present in nearly all multicellular animals. As a result, BLAST identifies a very large number of homologs with significant sequence similarity.

This high number of homologous sequences reflects the essential structural role of collagen and its evolutionary conservation across species.

  • Does your protein belong to any protein family?

Yes, collagen Type I alpha 1 chain (COL1A1) belongs to the collagen protein family. More specifically, it is part of the fibrillar collagen family.

Collagens are a large family of structural proteins that form the extracellular matrix in connective tissues. They share a characteristic triple-helix structure composed of three polypeptide chains and a repeating Gly-X-Y amino acid sequence, where glycine appears every third residue. This repeating pattern is essential for forming the stable triple helix.

Within the collagen superfamily, Type I collagen belongs to the fibrillar collagens, which also include types II, III, V, and XI. These collagens form rope-like fibers that provide tensile strength to tissues such as bone, skin, and tendons.

In summary, COL1A1 is a member of the collagen superfamily and specifically part of the fibrillar collagen family, which is responsible for structural support in connective tissues.
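The Gly-X-Y repeat described above can be checked directly on a sequence by measuring how often glycine falls on every third position. The sketch below is a hedged illustration: `gly_periodicity` is a hypothetical helper, and the toy fragment is invented rather than taken from COL1A1.

```python
def gly_periodicity(sequence: str):
    """Fraction of glycine at positions offset, offset+3, ... for offset 0, 1, 2."""
    seq = sequence.upper()
    fractions = []
    for offset in range(3):
        every_third = seq[offset::3]
        if every_third:
            fractions.append(every_third.count("G") / len(every_third))
        else:
            fractions.append(0.0)
    return fractions


if __name__ == "__main__":
    toy = "GPPGPAGKDGEP"  # invented Gly-X-Y-like fragment
    print(gly_periodicity(toy))
```

A perfect triple-helical region gives a fraction of 1.0 in one of the three frames and values near zero in the other two. Non-helical regions such as propeptides lower that value when the full-length chain is analyzed.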

3. Identify the structure page of your protein in RCSB

  • When was the structure solved? Is it a good quality structure?

The representative collagen triple-helix structure I used is PDB entry 1CAG, a collagen-like peptide whose crystal structure was published in 1994. It was solved by X-ray crystallography at a resolution of approximately 1.9 Å. Because smaller values indicate finer structural detail, this is a good-quality structure in which atomic positions can be determined with reasonable accuracy. LINK: https://www.rcsb.org/structure/1CAG

  • Are there any other molecules in the solved structure apart from protein?

Yes. In addition to the collagen protein chains, X-ray crystallography structures often include water molecules and sometimes small ions or stabilizing molecules. These are commonly found in crystal structures because they help stabilize packing interactions in the crystal lattice.

  • Does your protein belong to any structure classification family?

Yes. The collagen triple helix belongs to the fibrous protein structural family. Collagens are classified as structural proteins with a unique triple-helix motif, consisting of three polypeptide chains wound around each other. This distinct arrangement classifies collagen as a structural superfamily separate from globular proteins.

4. Open the structure of your protein in any 3D molecule visualization software:

  • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

When visualized as cartoon and ribbon representations, the structure clearly shows the characteristic triple-helix arrangement of collagen. The protein consists of three polypeptide chains tightly wound around each other. When displayed in ball-and-stick representation, individual atoms and the repeating Gly-X-Y pattern can be clearly observed.

  • Color the protein by secondary structure. Does it have more helices or sheets?

When colored by secondary structure, the protein shows predominantly helical structure. Collagen does not form beta-sheets like many globular proteins. Instead, it forms a unique triple-helix composed of three left-handed helices wrapped into a right-handed superhelix. Therefore, the structure contains more helical content and essentially no beta-sheets.

  • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

When colored by residue type, a clear pattern appears. Glycine residues are distributed regularly throughout the structure because glycine occurs at every third position in collagen (the Gly-X-Y repeat). Unlike a globular protein, the triple helix has no large hydrophobic core: the small glycine residues pack along the central axis where the three chains meet, while the side chains at the X and Y positions (frequently proline and hydroxyproline) point outward toward the solvent. This arrangement supports tight chain packing and leaves the outer residues available to interact with the extracellular environment.

  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

When visualizing the surface of the protein, collagen does not display deep binding pockets like many enzymes. Since it is a structural fibrous protein rather than a globular enzyme, it lacks large internal cavities or active-site pockets. Instead, the surface is elongated and repetitive, consistent with its mechanical structural role.

Part C. Using ML-Based Protein Design Tools

For this project, I selected the collagen triple helix model (PDB ID: 1CAG). Collagen is a structural protein that forms the extracellular matrix in connective tissues such as bone and skin. I chose collagen because of its biological relevance and its characteristic Gly-X-Y repeating motif, which is essential for triple-helix formation. As an archaeologist interested in biomaterials and preservation, I find collagen particularly meaningful because of its importance in bone structure and archaeological remains.

1. Deep Mutational Scans

Using ESM2, I generated an unsupervised deep mutational scan of the collagen triple helix model (PDB ID: 1CAG) based on language model likelihood scores. The heatmap reveals clear positional constraints across the sequence.

A particularly striking pattern appears at glycine positions. For example, at position 13, glycine shows a strongly positive score (yellow), while most alternative amino acids show strongly negative scores (blue/purple). This indicates that substitutions at this position are highly unfavorable.

This pattern reflects the structural constraint of the Gly-X-Y repeating motif characteristic of collagen. Glycine is required every third residue to allow tight packing of the triple helix. Replacing glycine with a bulkier amino acid would introduce steric clashes and destabilize the structure. The language model successfully captures this evolutionary constraint without being explicitly trained on structural data.

Overall, the deep mutational scan demonstrates that structurally critical residues, particularly glycine, are strongly conserved and intolerant to mutation.
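One common way to turn masked-language-model outputs into a mutational scan is to score each substitution as a log-odds value relative to the wild-type residue. Whether the notebook used for this homework scores variants exactly this way is an assumption. The sketch below takes per-position log-probabilities as plain dictionaries (a stand-in for real ESM2 output) and uses invented toy numbers over a three-letter alphabet.

```python
import math


def mutational_scan(wild_type, log_probs):
    """Score every substitution as log p(mutant) - log p(wild type).

    log_probs is a list with one dict per position, mapping each amino acid
    to its log-probability (toy stand-in for language-model output).
    """
    if len(wild_type) != len(log_probs):
        raise ValueError("one log-probability table is needed per residue")
    return [
        {aa: lp - table[wt] for aa, lp in table.items()}
        for wt, table in zip(wild_type, log_probs)
    ]


def least_tolerant_position(wild_type, scan):
    """Index of the position whose substitutions score lowest on average."""
    def mean_mutant_score(i):
        scores = [s for aa, s in scan[i].items() if aa != wild_type[i]]
        return sum(scores) / len(scores)
    return min(range(len(scan)), key=mean_mutant_score)


if __name__ == "__main__":
    wt = "GP"
    toy = [  # invented probabilities, not ESM2 predictions
        {"G": math.log(0.90), "A": math.log(0.05), "P": math.log(0.05)},
        {"G": math.log(0.20), "A": math.log(0.30), "P": math.log(0.50)},
    ]
    scan = mutational_scan(wt, toy)
    print(least_tolerant_position(wt, scan))
```

With this scoring the wild-type residue always scores zero, so a column of strongly negative values, like the one at the glycine positions, marks a position that the model considers intolerant to substitution.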

Part D. Group Brainstorm on Bacteriophage Engineering

Project Objective

  • Engineer the L protein of the MS2 phage to increase structural stability.
  • Disrupt or reduce its interaction with the bacterial chaperone DnaJ.
  • Preserve the C-terminal lysis domain to maintain lytic function.
  • Avoid mutations that interfere with structurally or evolutionarily coupled residues.

Phase 1: Mapping the DnaJ Interaction Interface

Since the exact binding interface between the L protein and DnaJ is unknown, the first step is to identify it computationally rather than introducing arbitrary mutations.

  • Use AlphaFold-Multimer to model the complex between L protein and DnaJ.
  • Generate multiple structural predictions and select the top-ranked models.
  • Identify consensus interface residues that consistently appear in the predicted binding interface.
  • Perform in silico alanine scanning of the N-terminal residues in the complex to determine which residues significantly contribute to binding energy (ΔΔG).
  • Analyze whether the N-terminal region resembles known DnaJ-binding motifs, typically hydrophobic residues flanked by basic amino acids.

This phase defines which residues are critical for interaction and should not be mutated randomly.
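The consensus step in this phase can be sketched as simple set counting, assuming the contact residues have already been extracted from each predicted model. The function name `consensus_interface`, the 80% threshold, and the residue numbers are all hypothetical placeholders, not results from an actual AlphaFold-Multimer run.

```python
from collections import Counter


def consensus_interface(models, min_fraction=0.8):
    """Residues found at the interface in at least min_fraction of the models.

    models is a list of sets, one per predicted complex, each holding the
    L-protein residue numbers in contact with DnaJ in that model.
    """
    if not models:
        return []
    counts = Counter(res for contacts in models for res in set(contacts))
    return sorted(
        res for res, n in counts.items() if n / len(models) >= min_fraction
    )


if __name__ == "__main__":
    predicted = [  # invented contact sets for five ranked models
        {3, 5, 8, 12},
        {3, 5, 9, 12},
        {3, 5, 8},
        {3, 6, 8, 12},
        {3, 5, 8, 12},
    ]
    print(consensus_interface(predicted))
```

Residues that appear in only one or two models are dropped, which reduces the influence of a single low-confidence prediction.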

Phase 2: Targeted N-Terminal Redesign

Instead of deleting regions or performing extensive random substitutions, introduce controlled chemical modifications to disrupt interaction while preserving structural stability.

  • Focus on charge inversion strategies:

    • Basic residues (K, R) → Acidic residues (E, D)
    • Acidic residues (E, D) → Basic residues (K, R)
  • Disrupt hydrophobic interaction patches:

    • Hydrophobic residues (L, I, V, F) → Polar residues (S, T, N, Q)
    • Aromatic residues (F, Y, W) → Aliphatic or small residues
  • Generate a graded library of variants:

    • Minor charge modifications
    • Moderate interface perturbations
    • Strong hydrophobic disruption

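Scoring each variant in this library for predicted DnaJ interaction and predicted stability then makes it possible to identify a Pareto front: the variants for which DnaJ interaction cannot be reduced further without sacrificing protein stability.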
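A Pareto front over two objectives can be extracted with a short dominance check. The sketch below assumes each variant already has a predicted DnaJ interaction score and a predicted folding ΔΔG, with lower values better for both. The variant names and numbers are invented placeholders, not computed results.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    binding: float  # predicted DnaJ interaction score; lower = weaker binding
    ddg: float      # predicted folding ddG (kcal/mol); lower = less destabilizing


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = a.binding <= b.binding and a.ddg <= b.ddg
    better = a.binding < b.binding or a.ddg < b.ddg
    return no_worse and better


def pareto_front(variants):
    """Keep only the variants that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


if __name__ == "__main__":
    library = [  # hypothetical scores for illustration only
        Variant("wild_type", 1.0, 0.0),
        Variant("minor_charge", 0.6, 0.2),
        Variant("moderate_interface", 0.3, 1.1),
        Variant("poor_tradeoff", 0.5, 1.5),
        Variant("strong_hydrophobic", 0.1, 2.5),
    ]
    for v in pareto_front(library):
        print(v.name)
```

In this toy library the "poor_tradeoff" variant is discarded because "moderate_interface" binds more weakly and is also more stable. The remaining variants represent different trade-offs worth testing.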

Phase 3: Stability and Functional Filtering

To ensure that redesigned variants remain structurally viable and functionally relevant:

  • Use Rosetta or FoldX to calculate ΔΔG and verify that mutations do not destabilize the overall protein fold.

  • Confirm that mutations in the N-terminal region do not propagate structural stress toward the C-terminal lysis domain.

  • Perform co-evolutionary analysis (e.g., EVcouplings):

    • Identify residue pairs that co-evolved between the N-terminal and C-terminal regions.
    • Avoid mutating co-evolved residues independently to prevent functional disruption.
  • Evaluate aggregation propensity using tools such as Aggrescan3D to ensure that mutations do not create exposed hydrophobic patches leading to cytoplasmic aggregation.

  • Assess sequence plausibility using protein language models such as ESM to filter out unlikely or non-natural variants.
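The filters listed above can be combined into a single pass over the variant library. The sketch below assumes the scores have already been computed by the respective tools and stored in plain dictionaries. The field names, thresholds, residue numbers, and coupled pair are hypothetical, and the convention that a lower aggregation score is better is also an assumption.

```python
def violates_coupling(mutated_positions, coupled_pairs):
    """True if exactly one residue of any co-evolving pair is mutated."""
    mutated = set(mutated_positions)
    return any((i in mutated) != (j in mutated) for i, j in coupled_pairs)


def passes_filters(variant, coupled_pairs,
                   max_ddg=1.0, max_aggregation=0.0, min_lm_score=-2.0):
    """Apply stability, aggregation, plausibility, and coupling filters."""
    return (
        variant["ddg"] <= max_ddg                      # fold not destabilized
        and variant["aggregation"] <= max_aggregation  # no new sticky patch
        and variant["lm_score"] >= min_lm_score        # plausible sequence
        and not violates_coupling(variant["positions"], coupled_pairs)
    )


if __name__ == "__main__":
    pairs = [(4, 60)]  # invented N-terminal / C-terminal coupled pair
    library = {
        "a": {"positions": [7], "ddg": 0.4, "aggregation": -0.3, "lm_score": -1.0},
        "b": {"positions": [4], "ddg": 0.2, "aggregation": -0.5, "lm_score": -0.5},
        "c": {"positions": [4, 60], "ddg": 0.8, "aggregation": -0.1, "lm_score": -1.5},
        "d": {"positions": [9], "ddg": 2.3, "aggregation": -0.2, "lm_score": -0.8},
    }
    kept = [name for name, v in library.items() if passes_filters(v, pairs)]
    print(kept)
```

Variant "b" is rejected because it changes only one member of the coupled pair, whereas "c" mutates both members together and is kept for further evaluation.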

Key Limitations

  • The DnaJ binding mode may be transient or dynamic, reducing AlphaFold-Multimer accuracy.
  • Protein language model scores do not guarantee in vivo functionality.
  • Intrinsically disordered regions may not be accurately modeled.
  • Computational predictions must ultimately be validated experimentally.

Subsections of Labs

Week 1 Lab: Pipetting


Subsections of Projects

Individual Final Project


Group Final Project
