ANDREA CARRILLO — HTGAA Spring 2026

About me
I am an undergraduate archaeologist from Peru
Contact info
Homework
- Week 1 HW: Principles and Practices
- Week 2 HW: DNA Read, Write, & Edit
- Week 3 HW: Lab Automation
- Week 4 HW: Protein Design Part I

Week 1 HW: Principles and Practices
First, describe a biological engineering application or tool you want to develop and why. I am interested in developing a biological engineering approach that uses living organisms to help us understand and preserve archaeological materials and sites. Specifically, I want to explore how microorganisms could be used to study how materials such as stone, soil, or ceramics change over time, or how biological growth can be guided to protect fragile archaeological surfaces.
Week 2 HW: DNA Read, Write, & Edit
Part 1: Benchling & In-silico Gel Art Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. Part 2: Gel Art - Restriction Digests and Gel Electrophoresis Optional for Committed Listeners Part 3: DNA Design Challenge 3.1 Choose your protein
Assignment: Python Script for Opentrons Artwork This design is inspired by traditional Inca geometric art, particularly the tocapu textile patterns of the Inca Empire. The composition features a symmetrical stepped cross motif enclosed within a square, referencing the Andean worldview and the symbolic structure of the Chakana (Andean cross). The use of straight lines and geometric repetition reflects the mathematical precision and cosmological symbolism characteristic of Inca visual culture. (https://opentrons-art.rcdonovan.com/?id=6ef1d0494o5n1p7)
Week 4 HW: Protein Design Part I
Part A. Conceptual Questions Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

I am interested in developing a biological engineering approach that uses living organisms to help us understand and preserve archaeological materials and sites. Specifically, I want to explore how microorganisms could be used to study how materials such as stone, soil, or ceramics change over time, or how biological growth can be guided to protect fragile archaeological surfaces.
This idea is interesting to me because archaeological materials are shaped by long-term interactions between the environment and living systems. Instead of seeing biology only as a source of damage, I am curious about how biological processes could become a tool for analysis or conservation. For an HTGAA project, I want to explore how growth, decay, and environmental conditions can be treated as design variables to better understand the past and develop new, more sustainable conservation methods.
One key governance and policy goal for this application is to ensure that biological tools used in archaeological contexts do not cause harm to people, sites, or cultural heritage. Because archaeological materials are fragile and often irreplaceable, it is important that biological interventions are carefully controlled and ethically guided.
A first sub-goal is non-malfeasance, meaning preventing physical or biological damage. This includes ensuring that any microorganisms used cannot spread uncontrollably, alter archaeological materials in irreversible ways, or disrupt surrounding ecosystems. Strict containment, reversibility, and testing protocols would be essential before any real-world application.
A second sub-goal is cultural and community respect. Archaeological sites are often connected to living communities and cultural identities. Governance frameworks should ensure that local stakeholders are informed, consulted, and involved in decisions about the use of biological technologies on heritage sites. This helps prevent extractive or colonial practices and supports ethical collaboration.
A third sub-goal is responsible knowledge use. Research outcomes, data, and tools should be shared transparently for conservation and educational purposes, while avoiding misuse, commercialization without consent, or applications that prioritize novelty over preservation. Together, these goals help ensure that biological engineering contributes to an ethical, respectful, and sustainable future for archaeology.
Action 1: Ethical Review Requirement for Biological Interventions in Archaeology
(New rule / requirement — actors: universities, museums, heritage authorities)
Purpose: Currently, ethical review processes mainly focus on research involving humans, while biological interventions on archaeological sites are often evaluated only for scientific merit. I propose creating a specific ethical review requirement for any use of living organisms in archaeological contexts, focused on protecting sites, materials, and surrounding ecosystems.
Design: This action would require interdisciplinary review committees including archaeologists, biologists, conservation experts, and ethicists. Universities, museums, and heritage authorities would require approval from these committees before allowing biological tools to be tested or deployed at archaeological sites. Researchers would opt in by agreeing to this process as a condition of site access.
Assumptions: This proposal assumes that such committees can be formed with sufficient expertise and that ethical review will meaningfully guide research rather than becoming a purely bureaucratic step. It also assumes researchers will accept additional oversight.
Risks of Failure & “Success”: This approach could fail if reviews become symbolic or overly slow, discouraging exploratory research. If highly successful, it could unintentionally favor large institutions with more resources, making it harder for smaller or community-based projects to participate.
Action 2: Incentives for Reversible and Low-Risk Biological Methods
(Incentive — actors: funding agencies, research sponsors)
Purpose: At present, research funding often prioritizes novelty and impact over safety, reversibility, or long-term risk. I propose creating funding incentives that prioritize biological methods which are reversible, low-risk, and environmentally contained when used in archaeological contexts.
Design: Funding agencies and foundations would include ethical and safety criteria in grant calls, explicitly rewarding projects that minimize ecological and cultural risk. Researchers would voluntarily design projects to meet these criteria, and reviewers would need guidance on how to evaluate risk and reversibility alongside scientific merit.
Assumptions: This proposal assumes that funding incentives can meaningfully influence research behavior and that risk can be reasonably assessed in advance. It also assumes that safer approaches will still allow for meaningful scientific insight.
Risks of Failure & “Success”: The action may fail if incentives are too weak or applied superficially. If overly successful, it could discourage more experimental or unconventional approaches, potentially slowing innovation in the field.
Action 3: Community Co-Governance of Bio-Archaeological Applications
(Governance strategy — actors: local communities, researchers, heritage organizations)
Purpose: Decisions about technological interventions at archaeological sites are often made by researchers or institutions, with limited involvement from local or descendant communities. I propose a co-governance approach in which communities connected to archaeological sites participate directly in decisions about the use of biological tools.
Design: This would involve early consultation processes, accessible communication (non-technical language), and shared decision-making authority. Researchers and institutions would need to allocate time and resources to support meaningful participation and be willing to adapt or halt projects based on community input.
Assumptions: This approach assumes that communities wish to participate, that diverse perspectives can be reconciled, and that scientific and local knowledge can productively inform each other.
Risks of Failure & “Success”: Co-governance could fail if participation is symbolic rather than meaningful or if internal conflicts arise. If highly successful, it may slow down research or limit certain projects, but this may be an acceptable trade-off in contexts involving irreplaceable cultural heritage.
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 2 | 3 |
| • By helping respond | 2 | 1 | 3 |
| Foster Lab Safety | | | |
| • By preventing incidents | 1 | 2 | n/a |
| • By helping respond | 2 | 1 | n/a |
| Protect the environment | | | |
| • By preventing incidents | 1 | 1 | 2 |
| • By helping respond | 2 | 2 | 1 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 3 | 1 | 2 |
| • Feasibility? | 1 | 2 | 3 |
| • Not impede research | 3 | 1 | 2 |
| • Promote constructive applications | 2 | 1 | 1 |
Based on the scoring, I would prioritize a combination of Option 1 (Ethical Review Requirement) and Option 2 (Incentives for Reversible and Low-Risk Methods). This recommendation is directed to international organizations such as UNESCO and the United Nations, which play a key role in setting global norms for cultural heritage protection and emerging technologies.
Option 1 should function as a global baseline. It receives the best score for preventing biosecurity, lab safety, and environmental incidents, which matters especially for archaeological sites because they are fragile and damage to them is irreversible. An international ethical review framework, supported by UNESCO and the UN, could guide national and local authorities while allowing for contextual adaptation. The main trade-off is the potential increase in administrative complexity and slower research approval processes.
Option 2 should complement this baseline by encouraging safer and reversible biological methods through funding priorities and international research programs. This incentive-based approach preserves flexibility and innovation while reinforcing ethical behavior.
Option 3 (Community Co-Governance) should be promoted by UNESCO and the UN as a guiding principle, particularly in culturally sensitive contexts. While it may reduce speed and scalability, it strengthens legitimacy, equity, and long-term trust in the governance of biological tools applied to archaeology.
This approach assumes that UNESCO and the UN can influence national policies through standards, funding, and guidance, but there is uncertainty around consistent adoption and enforcement across different regions.
- ASSIGNMENT(Week 2 Lecture Prep) -
Homework Questions from Professor Jacobson:
DNA polymerase is the enzyme that copies DNA. It is very accurate, but not perfect:
It makes about 1 mistake in every 10 million DNA letters.
With correction systems, the final error rate is about 1 mistake in 1 billion letters.
The human genome has about 3 billion DNA letters, so without correction there would be many mistakes every time DNA is copied. How does biology fix this?
DNA polymerase checks its own work (proofreading).
Cells have repair systems that fix mistakes.
Harmful mutations are reduced over time by natural selection.
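The comparison between error rate and genome size can be checked with a few lines of arithmetic. This is a minimal sketch that only uses the rounded figures quoted above (1 in 10 million, 1 in 1 billion, 3 billion letters); the function name is illustrative.

```python
# Rounded figures taken from the answer above.
GENOME_LENGTH = 3_000_000_000  # ~3 billion DNA letters in the human genome


def expected_errors(error_rate: float, length: int = GENOME_LENGTH) -> float:
    """Expected number of mistakes when copying `length` letters once."""
    return error_rate * length


raw = expected_errors(1e-7)        # polymerase: 1 mistake per 10 million letters
corrected = expected_errors(1e-9)  # with correction: 1 mistake per 1 billion letters

print(f"polymerase alone: ~{raw:.0f} errors per genome copy")
print(f"with correction:  ~{corrected:.0f} errors per genome copy")
```

With these numbers, correction reduces the expected mistakes per copy of the genome from hundreds to a handful.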
Because the genetic code is made of three-letter codons and there are 64 possible codons but only 20 amino acids, most amino acids can be encoded by more than one codon. This means that the same protein can be written in many different DNA sequences. For an average human protein, the number of possible DNA sequences that could code for it is extremely large.
However, in practice, not all of these DNA sequences work equally well. Some codons are translated more efficiently in human cells, while others slow down protein production. Certain DNA sequences can form mRNA structures that interfere with translation or make the mRNA unstable. In addition, some sequences can disrupt regulatory signals or affect how the protein folds during synthesis. As a result, only a subset of possible DNA codes is actually effective for producing the desired protein in cells.
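To make "extremely large" concrete, the number of DNA sequences that encode a given protein can be counted by multiplying, for each amino acid, the number of codons that encode it. The sketch below builds the standard genetic code from its usual compact string form; the short peptide used as input is only an illustration, not a real human protein.

```python
from collections import Counter
from math import log10, prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# How many codons encode each amino acid (e.g. 1 for M, 6 for R).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences that translate to `protein`."""
    return prod(DEGENERACY[aa] for aa in protein)


peptide = "MGPR"  # illustrative four-residue peptide
print(count_encodings(peptide))               # 1 * 4 * 4 * 6 = 96
print(log10(count_encodings(peptide * 100)))  # digits needed for a 400-residue chain
```

Even four residues already allow 96 encodings, and the count multiplies with every additional residue.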
Homework Questions from Dr. LeProust:
Oligos are made using a chemical method where DNA is built one letter at a time on a solid surface. This method is called phosphoramidite synthesis, and it is the standard method used today.
Each time a new DNA letter is added, there is a small chance of error. As the oligo gets longer, these small errors add up. Because of this:
After about 200 nucleotides, the number of correct oligos becomes very low.
To make a 2000 bp gene, you would need to add 2000 DNA letters in a row. With so many steps, almost every DNA molecule will contain errors. So, direct synthesis does not work for long genes. Instead, scientists make short oligos and then join them together to build long genes.
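The way small per-step errors add up can be sketched numerically. The per-step success rate of 99% used here is an assumed round number for illustration, not a figure from the lecture; real coupling efficiencies vary by chemistry and instrument.

```python
def full_length_fraction(length_nt: int, step_success: float = 0.99) -> float:
    """Fraction of molecules with no error after `length_nt` additions.

    `step_success` is an assumed per-letter success rate (hypothetical value).
    Each addition must succeed independently, so the fractions multiply.
    """
    return step_success ** length_nt


for length in (20, 200, 2000):
    print(f"{length:>5} nt: {full_length_fraction(length):.2e} correct")
```

Under this assumption, roughly one molecule in eight is correct at 200 nucleotides, and essentially none are at 2000, which is why long genes are assembled from short oligos.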
Homework Question from George Church:
Animals need 20 amino acids to make proteins, but they cannot make all of them. The 10 essential amino acids (they must come from the diet) are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine (Arginine is essential especially during growth.)
Because animals cannot make these amino acids themselves, they depend on their food to get them.
The “Lysine Contingency” comes from Jurassic Park, where the dinosaurs were engineered so that they could not make lysine and would die without supplements from the park. Since lysine is already an essential amino acid that no animal can synthesize, this is a weak safeguard: animals normally get their lysine from food, even though it is often low in plant-based foods. A lack of lysine does restrict protein synthesis and growth, but an animal can avoid that simply by eating lysine-rich food. This highlights how animal biology depends on plants and microbes, which can make lysine, and how animal nutrition is constrained by the availability of lysine in the environment.
Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.

Optional for Committed Listeners
3.1 Choose your protein
For this assignment, I chose Collagen Type I (alpha 1 chain). I selected this protein because collagen is the main structural protein found in bone, teeth, and connective tissues. In archaeology, collagen is extremely important because it can survive for thousands of years in skeletal remains and artifacts made from bone or leather. It is widely used in radiocarbon dating and paleoproteomics to identify species and study ancient diets. Since I am interested in archaeology, this protein connects molecular biology with archaeological research.
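Using UniProt, I obtained the protein sequence for a human collagen alpha-1 chain. The entry retrieved is Q02388 (gene COL7A1), which is the collagen alpha-1(VII) chain; the type I alpha 1 chain described above is encoded by COL1A1.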
sp|Q02388-1|CO7A1_HUMAN Isoform 1 of Collagen alpha-1(VII) chain OS=Homo sapiens OX=9606 GN=COL7A1 MTLRLLVAALCAGILAEAPRVRAQHRERVTCTRLYAADIVFLLDGSSSIGRSNFREVRSF LEGLVLPFSGAASAQGVRFATVQYSDDPRTEFGLDALGSGGDVIRAIRELSYKGGNTRTG AAILHVADHVFLPQLARPGVPKVCILITDGKSQDLVDTAAQRLKGQGVKLFAVGIKNADP EELKRVASQPTSDFFFFVNDFSILRTLLPLVSRRVCTTAGGVPVTRPPDDSTSAPRDLVL SEPSSQSLRVQWTAASGPVTGYKVQYTPLTGLGQPLPSERQEVNVPAGETSVRLRGLRPL TEYQVTVIALYANSIGEAVSGTARTTALEGPELTIQNTTAHSLLVAWRSVPGATGYRVTW RVLSGGPTQQQELGPGQGSVLLRDLEPGTDYEVTVSTLFGRSVGPATSLMARTDASVEQT LRPVILGPTSILLSWNLVPEARGYRLEWRRETGLEPPQKVVLPSDVTRYQLDGLQPGTEY RLTLYTLLEGHEVATPATVVPTGPELPVSPVTDLQATELPGQRVRVSWSPVPGATQYRII VRSTQGVERTLVLPGSQTAFDLDDVQAGLSYTVRVSARVGPREGSASVLTVRREPETPLA VPGLRVVVSDATRVRVAWGPVPGASGFRISWSTGSGPESSQTLPPDSTATDITGLQPGTT YQVAVSVLRGREEGPAAVIVARTDPLGPVRTVHVTQASSSSVTITWTRVPGATGYRVSWH SAHGPEKSQLVSGEATVAELDGLEPDTEYTVHVRAHVAGVDGPPASVVVRTAPEPVGRVS RLQILNASSDVLRITWVGVTGATAYRLAWGRSEGGPMRHQILPGNTDSAEIRGLEGGVSY SVRVTALVGDREGTPVSIVVTTPPEAPPALGTLHVVQRGEHSLRLRWEPVPRAQGFLLHW QPEGGQEQSRVLGPELSSYHLDGLEPATQYRVRLSVLGPAGEGPSAEVTARTESPRVPSI ELRVVDTSIDSVTLAWTPVSRASSYILSWRPLRGPGQEVPGSPQTLPGISSSQRVTGLEP GVSYIFSLTPVLDGVRGPEASVTQTPVCPRGLADVVFLPHATQDNAHRAEATRRVLERLV LALGPLGPQAVQVGLLSYSHRPSPLFPLNGSHDLGIILQRIRDMPYMDPSGNNLGTAVVT AHRYMLAPDAPGRRQHVPGVMVLLVDEPLRGDIFSPIREAQASGLNVVMLGMAGADPEQL RRLAPGMDSVQTFFAVDDGPSLDQAVSGLATALCQASFTTQPRPEPCPVYCPKGQKGEPG EMGLRGQVGPPGDPGLPGRTGAPGPQGPPGSATAKGERGFPGADGRPGSPGRAGNPGTPG APGLKGSPGLPGPRGDPGERGPRGPKGEPGAPGQVIGGEGPGLPGRKGDPGPSGPPGPRG PLGDPGPRGPPGLPGTAMKGDKGDRGERGPPGPGEGGIAPGEPGLPGLPGSPGPQGPVGP PGKKGEKGDSEDGAPGLPGQPGSPGEQGPRGPPGAIGPKGDRGFPGPLGEAGEKGERGPP GPAGSRGLPGVAGRPGAKGPEGPPGPTGRQGEKGEPGRPGDPAVVGPAVAGPKGEKGDVG PAGPRGATGVQGERGPPGLVLPGDPGPKGDPGDRGPIGLTGRAGPPGDSGPPGEKGDPGR PGPPGPVGPRGRDGEVGEKGDEGPPGDPGLPGKAGERGLRGAPGVRGPVGEKGDQGDPGE DGRNGSPGSSGPKGDRGEPGPPGPPGRLVDTGPGAREKGEPGDRGQEGPRGPKGDPGLPG APGERGIEGFRGPPGPQGDPGVRGPAGEKGDRGPPGLDGRSGLDGKPGAAGPSGPNGAAG KAGDPGRDGLPGLRGEQGLPGPSGPPGLPGKPGEDGKPGLNGKNGEPGDPGEDGRKGEKG 
DSGASGREGRDGPKGERGAPGILGPQGPPGLPGPVGPPGQGFPGVPGGTGPKGDRGETGS KGEQGLPGERGLRGEPGSVPNVDRLLETAGIKASALREIVETWDESSGSFLPVPERRRGP KGDSGEQGPPGKEGPIGFPGERGLKGDRGDPGPQGPPGLALGERGPPGPSGLAGEPGKPG IPGLPGRAGGVGEAGRPGERGERGEKGERGEQGRDGPPGLPGTPGPPGPPGPKVSVDEPG PGLSGEQGPPGLKGAKGEPGSNGDQGPKGDRGVPGIKGDRGEPGPRGQDGNPGLPGERGM AGPEGKPGLQGPRGPPGPVGGHGDPGPPGAPGLAGPAGPQGPSGLKGEPGETGPPGRGLT GPTGAVGLPGPPGPSGLVGPQGSPGLPGQVGETGKPGAPGRDGASGKDGDRGSPGVPGSP GLPGPVGPKGEPGPTGAPGQAVVGLPGAKGEKGAPGGLAGDLVGEPGAKGDRGLPGPRGE KGEAGRAGEPGDPGEDGQKGAPGPKGFKGDPGVGVPGSPGPPGPPGVKGDLGLPGLPGAP GVVGFPGQTGPRGEMGQPGPSGERGLAGPPGREGIPGPLGPPGPPGSVGPPGASGLKGDK GDPGVGLPGPRGERGEPGIRGEDGRPGQEGPRGLTGPPGSRGERGEKGDVGSAGLKGDKG DSAVILGPPGPRGAKGDMGERGPRGLDGDKGPRGDNGDPGDKGSKGEPGDKGSAGLPGLR GLLGPQGQPGAAGIPGDPGSPGKDGVPGIRGEKGDVGFMGPRGLKGERGVKGACGLDGEK GDKGEAGPPGRPGLAGHKGEMGEPGVPGQSGAPGKEGLIGPKGDRGFDGQPGPKGDQGEK GERGTPGIGGFPGPSGNDGSAGPPGPPGSVGPRGPEGLQGQKGERGPPGERVVGAPGVPG APGERGEQGRPGPAGPRGEKGEAALTEDDIRGFVRQEMSQHCACQGQFIASGSRPLPSYA ADTAGSQLHAVPVLRVSHAEEEERVPPEDDEYSEYSEYSVEEYQDPEAPWDSDDPCSLPL DEGSCTAYTLRWYHRAVTGSTEACHPFVYGGCGGNANRFGTREACERRCPPRVVQSQGTG TAQD
3.2 Reverse Translation
According to the Central Dogma, DNA is transcribed into RNA and then translated into protein. Since each amino acid is encoded by a three-nucleotide codon, we can work backwards from a protein sequence to determine a possible DNA sequence.
For the partial collagen sequence shown previously:
Using NCBI, one possible nucleotide sequence that encodes this amino acid sequence is:
DNA sequence (one possible version):
1 aattcccaca aaccctgctg acttgacccc attggcccag acccctgttc cctgccactg
61 gatgagggct cctgcactgc ctacaccctg cgctggtacc atcgggctgt gacaggcagc
121 acagaggcct gtcacccttt tgtctatggt ggctgtggag ggaatgccaa ccgttttggg
181 acccgtgagc ctgcgagcgc cgctgcccac cccgggtgtc cagagccagg ggacaggtac
241 tgcccaggac tgaggcccag ataatgagct gagattcagc atcccctgga ggacgtcggg
301 gtctcagcag aaccccactg tccctcccct tggtgctaga ggcttgtgtg cacgtgagcg
361 tcggttgtgc agttcccgtt atttcagtga cttggtcccg tgggtctaac cttcccccct
421 gtggacaaac ccccattgtg gctccn
Explanation:
ATG → Methionine (M)
GGT → Glycine (G)
CCT → Proline (P)
CGT → Arginine (R)
Because the genetic code is degenerate (multiple codons can encode the same amino acid), this is only one possible DNA sequence. Many other nucleotide sequences could produce the exact same collagen protein segment.
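The idea of working backwards from protein to DNA can be sketched in a few lines. This is a minimal illustration, not the NCBI tool: it picks one arbitrary codon per amino acid (the first in standard table order), and the helper names are hypothetical.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One codon per amino acid: simply the first one met in table order.
# This choice is arbitrary; any synonymous codon would be equally valid.
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)


def reverse_translate(protein: str) -> str:
    """Return one possible DNA sequence encoding `protein`."""
    return "".join(FIRST_CODON[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(reverse_translate("MGPR"))                    # ATGGGTCCTCGT
print(translate("gatgagggctcctgcactgcctacaccctg"))  # DEGSCTAYTL
```

The first call reproduces the four example codons listed above. The second translates the first 30 letters of the row numbered 61 in the DNA sequence, which gives a stretch found near the end of the protein sequence in section 3.1.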
3.3 Codon Optimization
After obtaining a possible DNA sequence from reverse translation, the next step is codon optimization. Although multiple DNA sequences can encode the same protein, different organisms prefer certain codons over others. This is known as codon bias. If a gene contains many codons that are rarely used in the host organism, protein production may be slow or inefficient. For this assignment, I chose to optimize the collagen sequence for Escherichia coli because it is widely used in biotechnology. E. coli grows quickly, is inexpensive to culture, and is commonly used for recombinant protein production. Using an online codon optimization tool, the DNA sequence was adjusted to:
Importantly, codon optimization does not change the amino acid sequence of the protein. It only changes the nucleotide sequence to improve expression in the chosen organism. By optimizing the codons for E. coli, the collagen gene should be translated more efficiently, which can lead to higher protein yield.
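A naive version of codon optimization can be sketched as a codon-by-codon swap. The `ECOLI_PREFERRED` table below is an illustrative assumption listing one commonly favored E. coli codon per amino acid; it is not the output of the online tool used for this assignment, and real tools also balance GC content, repeats, and mRNA structure.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed "preferred" codon per amino acid for E. coli (illustrative only).
ECOLI_PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def optimize(dna: str) -> str:
    """Swap every codon for the assumed preferred synonym."""
    return "".join(ECOLI_PREFERRED[aa] for aa in translate(dna))


original = "ATGGGTCCTCGT"       # encodes M-G-P-R
optimized = optimize(original)
print(optimized)                # ATGGGCCCGCGT
print(translate(original) == translate(optimized))  # True: same protein
```

The final check is the key property: the DNA letters change, but the translated protein does not.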
3.4 You have a sequence! Now what?
Now that I have a codon-optimized DNA sequence for collagen, the next step is to produce the protein. One common method is a cell-dependent system. In this approach, the optimized DNA sequence is inserted into a plasmid (a small circular DNA molecule). The plasmid is then introduced into Escherichia coli cells through transformation. Once inside the cell, the host machinery transcribes the gene into mRNA, and ribosomes translate that mRNA into the collagen protein.
Another option is a cell-free system. In this method, instead of using living cells, the DNA is added to a solution containing the necessary molecular machinery (ribosomes, enzymes, nucleotides, amino acids). The transcription and translation processes occur in a test tube, producing the protein directly. This method is faster and more controlled, but usually more expensive. In both cases, the DNA sequence follows the Central Dogma: DNA → RNA → Protein, resulting in the production of the collagen protein.
This project uses E. coli and collagen to create reproducible patterns that simulate organic components of ancient artifacts, such as textiles or adhesives. Collagen acts as a structural scaffold to hold proteins in place, while engineered E. coli produce proteins that form visible patterns. Automation ensures precision and repeatability, allowing us to study how these materials might degrade or be preserved over time, providing insights into experimental archaeology and conservation.
This design is inspired by traditional Inca geometric art, particularly the tocapu textile patterns of the Inca Empire. The composition features a symmetrical stepped cross motif enclosed within a square, referencing the Andean worldview and the symbolic structure of the Chakana (Andean cross). The use of straight lines and geometric repetition reflects the mathematical precision and cosmological symbolism characteristic of Inca visual culture. (https://opentrons-art.rcdonovan.com/?id=6ef1d0494o5n1p7)

A relevant example is the paper “An open-source automated platform for high-throughput RT-qPCR testing” developed during the COVID-19 pandemic. In this work, researchers used the Opentrons OT-2 liquid handling robot to automate RNA extraction and RT-qPCR setup.
The system enabled scalable, low-cost diagnostic testing by reducing manual pipetting steps, minimizing human error, and increasing reproducibility. This study demonstrated how open-source automation tools can expand access to molecular diagnostics, especially in resource-limited settings.
The novelty of this application lies in democratizing laboratory automation—allowing smaller labs to perform high-throughput testing without expensive proprietary systems.
LINK: https://pubmed.ncbi.nlm.nih.gov/34260637/
For my final project, I plan to use laboratory automation tools to develop controlled collagen-based biomaterials inspired by ancient Andean techniques. Collagen will serve as a structural matrix that mimics organic components found in archaeological artifacts such as textiles, adhesives, or composite materials. By using automated liquid handling, I aim to precisely control mixing ratios and spatial deposition of biological components within the collagen scaffold. This will allow the creation of reproducible material samples that can be used to study degradation processes, conservation strategies, or experimental archaeology models.
Automation ensures precision and repeatability, which are essential when comparing material behavior under different environmental conditions.
For my final project, I plan to use two main pieces of equipment to create collagen-based materials inspired by archaeological patterns. First, I will use the Opentrons OT-2 liquid handling robot. This robot allows precise mixing of liquids and can deposit the mixture in exact locations with consistent volumes. In my project, I will prepare different mixtures of collagen and pigments that mimic the colors and textures of ancient textiles or other organic components found in archaeological artifacts. The robot will then deposit these mixtures according to a predetermined pattern, such as geometric motifs inspired by Inca textiles. Using the robot ensures that each replica is precise and reproducible, allowing me to create multiple samples under the same conditions without human error.
Second, I will use 3D-printed holders or molds to support the materials during deposition. These molds will be designed to match the shape of specific archaeological patterns, such as squares or other geometric compartments. The robot will deposit the collagen mixtures into these molds, and once the collagen sets, the molds can be removed to reveal a precise replica of the intended pattern. This combination of automation and custom molds allows me to accurately reproduce complex designs and study how these materials behave, degrade, or can be conserved, providing a controlled and repeatable approach to experimental archaeology.
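As a planning aid, the predetermined pattern could be generated as a list of grid positions before it is turned into robot moves. The sketch below is plain Python and does not use the Opentrons API; the stepped-cross rule, the grid size, and the 3 mm spacing are all assumptions chosen for illustration.

```python
def stepped_cross_cells(half_size: int = 3) -> list[tuple[int, int]]:
    """Grid cells of a symmetrical stepped cross centred on (0, 0).

    A cell belongs to the motif when |x| + |y| <= half_size + 1, which
    clips the corners of the enclosing square into steps.
    """
    cells = []
    for y in range(half_size, -half_size - 1, -1):
        for x in range(-half_size, half_size + 1):
            if abs(x) + abs(y) <= half_size + 1:
                cells.append((x, y))
    return cells


def to_offsets_mm(cells, spacing_mm: float = 3.0):
    """Convert grid cells to x/y offsets in millimetres (assumed spacing)."""
    return [(x * spacing_mm, y * spacing_mm) for x, y in cells]


def render(cells, half_size: int = 3) -> str:
    """Text preview of the motif: '#' is a droplet, '.' is empty."""
    filled = set(cells)
    rows = []
    for y in range(half_size, -half_size - 1, -1):
        rows.append("".join(
            "#" if (x, y) in filled else "."
            for x in range(-half_size, half_size + 1)
        ))
    return "\n".join(rows)


cells = stepped_cross_cells()
print(render(cells))
print(len(cells), "droplets")
```

Each offset could then become one dispense step relative to the centre of a mold, with a different collagen and pigment mixture assigned to different cells.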

Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)
1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
To estimate how many amino acid molecules are in 500 grams of meat, we start with the fact that the average mass of one amino acid is about 100 Daltons.
One Dalton equals 1.66 x 10^(-24) grams. So, 100 Daltons equals:
100 x 1.66 x 10^(-24) = 1.66 x 10^(-22) grams per amino acid.
Now we divide the total mass of meat (500 grams) by the mass of one amino acid:
500 divided by 1.66 x 10^(-22) = 3 x 10^(24)
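Therefore, if we treat the whole 500 grams as amino acids, the meat contains approximately 3 x 10^(24) amino acid molecules. Real meat is mostly water, so the true number is several times lower.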
This shows that even a small amount of food contains an enormous number of molecular building blocks.
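The same estimate can be reproduced in code. This is a minimal sketch of the calculation above; the optional `protein_fraction` parameter is an added assumption (the estimate in the text treats the whole mass as amino acids, which corresponds to a fraction of 1.0).

```python
DALTON_IN_GRAMS = 1.66e-24  # mass of one Dalton, as used above


def amino_acid_count(mass_g: float,
                     residue_mass_da: float = 100.0,
                     protein_fraction: float = 1.0) -> float:
    """Estimate the number of amino acid molecules in `mass_g` grams.

    `protein_fraction` is the share of the mass that is actually protein;
    1.0 reproduces the simple estimate in the text.
    """
    grams_per_residue = residue_mass_da * DALTON_IN_GRAMS
    return mass_g * protein_fraction / grams_per_residue


print(f"{amino_acid_count(500):.2e}")                        # whole mass as amino acids
print(f"{amino_acid_count(500, protein_fraction=0.2):.2e}")  # assuming ~20% protein
```

The first line reproduces the value of about 3 x 10^(24); the second shows how the estimate scales if only part of the meat is counted as protein.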
2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Humans eat beef or fish, but we do not become a cow or a fish because our bodies break down food into basic molecular components before using it.
When we eat meat, our digestive system breaks proteins into amino acids, fats into fatty acids, and carbohydrates into simple sugars. These small molecules are absorbed into the bloodstream and then reused by our cells to build human proteins, human tissues, and human cells according to our own DNA instructions.
The key reason we do not become what we eat is that our genetic information controls how these molecules are assembled. A cow’s DNA builds cow proteins and tissues, while human DNA builds human proteins and tissues. Even though the raw materials are similar, the instructions are different.
In short, we do not become a cow or a fish because our body does not copy their structure — it only reuses their molecular building blocks to maintain and build our own human body.
3. Why are there only 20 natural amino acids?
The reason there are 20 natural amino acids is that those 20 are enough to build all the proteins that life needs.
Think of amino acids like Lego pieces. You do not need thousands of different pieces to build something complex; with a well-designed set, you can create almost any structure. The 20 amino acids have different sizes, electrical charges, and properties (some are hydrophobic, others are hydrophilic, some are positive, others negative). This variety is enough for proteins to fold into many different shapes and perform many different functions.
Another important reason is evolution. Early in the origin of life, more types of amino acids may have existed. However, these 20 worked well together within the genetic system. Once the genetic code was established using these 20 amino acids, changing it would have been very risky for organisms. For that reason, the system remained stable.
In summary, there are 20 amino acids because this number provides enough chemical diversity to create the complexity of life, and evolution fixed this set as the standard.
4. Can you make other non-natural amino acids? Design some new amino acids
Yes, it is possible to create non-natural amino acids. Scientists can design new amino acids by modifying the part of the molecule known as the side chain, or R group. All amino acids share the same basic structure: an amino group, a carboxyl group, a hydrogen atom, and a variable side chain. The side chain is what determines the chemical properties of each amino acid. By changing this side chain, new amino acids with new properties can be created.
For example, one could design a modified version of phenylalanine by adding fluorine atoms to its aromatic ring. This change could make the amino acid more chemically stable and resistant to degradation, which would be useful in biomaterials. Another possibility would be designing a photo-responsive amino acid whose side chain changes shape when exposed to light. This could allow scientists to control protein activity using specific wavelengths of light. A third example could be a metal-binding amino acid with a side chain designed to strongly interact with metals such as copper or iron, which could be useful in environmental or material science applications.
Although natural organisms use only 20 standard amino acids, synthetic biology has made it possible to expand the genetic code. By engineering specialized transfer RNAs and modifying translation systems, researchers can incorporate non-natural amino acids into proteins. This allows the creation of proteins with entirely new properties that do not exist in nature.
In summary, non-natural amino acids can be designed by modifying the chemical structure of existing ones, particularly their side chains, enabling the development of new biological functions and materials.
5. Where did amino acids come from before enzymes that make them, and before life started?
Before life began, amino acids likely formed through natural chemical reactions on the early Earth. At that time, there were no enzymes or living cells. Instead, simple molecules such as water (H2O), methane (CH4), ammonia (NH3), hydrogen (H2), and carbon dioxide (CO2) were present in the atmosphere and oceans. Energy from lightning, volcanic activity, ultraviolet radiation from the sun, and geothermal heat provided the energy needed to drive chemical reactions between these small molecules.
In 1953, the Miller-Urey experiment showed that when simple gases thought to exist on early Earth were exposed to electrical sparks (simulating lightning), amino acids formed spontaneously. This demonstrated that the building blocks of proteins can arise from non-living chemical processes. In addition, amino acids have been found in meteorites, suggesting that some may have formed in space and arrived on Earth through asteroid impacts.
After amino acids were present, some of them began to link together, forming short chains called peptides. Over time, certain peptides may have developed the ability to speed up chemical reactions slightly. These primitive catalytic molecules would have provided an advantage, because they made useful reactions happen more efficiently.
Before true protein enzymes existed, many scientists believe that RNA molecules played an important role. RNA can both store information and act as a catalyst (these catalytic RNAs are called ribozymes). This idea is known as the “RNA world” hypothesis. Eventually, as biological systems became more complex, proteins replaced most RNA catalysts because proteins are more versatile and efficient. These protein catalysts are what we now call enzymes.
In summary, amino acids likely formed through natural chemical reactions powered by environmental energy sources. Over time, they combined into peptides, some of which gained catalytic abilities. Through gradual evolution—possibly beginning with catalytic RNA—modern enzymes eventually emerged, allowing life to develop increasingly complex biochemical systems.
6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
If you make an alpha-helix using D-amino acids, you would expect it to form a left-handed helix.
In nature, proteins are made almost entirely from L-amino acids. These naturally form right-handed alpha-helices because of their specific three-dimensional geometry. The spatial arrangement of atoms in L-amino acids favors a right-handed twist when they fold into an alpha-helix structure.
D-amino acids are mirror images of L-amino acids. Because their geometry is reversed, the preferred helix direction is also reversed. As a result, a chain made entirely of D-amino acids would form a left-handed alpha-helix.
In summary, L-amino acids form right-handed alpha-helices, while D-amino acids form left-handed alpha-helices due to their mirror-image stereochemistry.
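The mirror-image argument above can be checked numerically. The following is a minimal sketch, not a molecular model: it builds an idealized helix from approximate alpha-helix parameters (about 100° of rotation and 1.5 Å of rise per residue, radius about 2.3 Å, all assumed round values), reflects it, and reads off the handedness from the sign of a triple product. The function names are illustrative only.

```python
import math

def ideal_helix(n=8, radius=2.3, rise=1.5, turn_deg=100.0):
    """Points on an idealized right-handed helix along +z (approximate alpha-helix values)."""
    step = math.radians(turn_deg)
    return [(radius * math.cos(k * step), radius * math.sin(k * step), k * rise)
            for k in range(n)]

def mirror(points):
    """Reflect through the yz-plane; a reflection turns every L center into a D center."""
    return [(-x, y, z) for x, y, z in points]

def handedness(points):
    """Classify a helix by the sign of the triple product of three consecutive steps."""
    steps = [tuple(b[i] - a[i] for i in range(3))
             for a, b in zip(points, points[1:4])]
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = steps
    triple = (ax * (by * cz - bz * cy)
              - ay * (bx * cz - bz * cx)
              + az * (bx * cy - by * cx))
    return "right" if triple > 0 else "left"

l_helix = ideal_helix()
d_helix = mirror(l_helix)
print(handedness(l_helix), handedness(d_helix))
```

Because a reflection flips the sign of any determinant, the mirrored helix is always classified with the opposite hand, whatever helix parameters are chosen.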
7. Can you discover additional helices in proteins?
Yes, additional helices can be discovered in proteins. Proteins are very flexible molecules that can fold into many different shapes depending on their amino acid sequence. While the alpha-helix is one of the most common helical structures found in nature, it is not the only possible one.
Scientists have identified other types of helices, such as the 3₁₀ helix and the pi-helix. These structures differ in how tightly they are wound and in the pattern of hydrogen bonds that stabilize them: backbone hydrogen bonds link residue i to residue i+3 in the 3₁₀ helix, i to i+4 in the alpha-helix, and i to i+5 in the pi-helix. They are less common than the alpha-helix but still occur naturally in some proteins.
In addition, researchers in protein engineering and synthetic biology can design entirely new helical structures by changing amino acid sequences or by incorporating non-natural amino acids. Advances in computational tools and artificial intelligence now allow scientists to predict and design novel protein folds that may not exist in nature.
In summary, additional helices can be discovered or designed because protein structure depends on the chemical properties and arrangement of amino acids, and these combinations allow for many possible folding patterns.
8. Why are most molecular helices right-handed?
Most molecular helices are right-handed because the building blocks of life are not symmetrical. In living organisms, amino acids are almost always in the L-form, which has a specific three-dimensional orientation. This asymmetry, called chirality, influences how molecules fold and assemble.
When many L-amino acids link together to form a protein, their geometry naturally favors a right-handed twist when forming structures like the alpha-helix. The specific angles between chemical bonds and the way hydrogen bonds stabilize the structure make the right-handed version more stable for L-amino acids.
In general, once life selected L-amino acids as the standard building blocks, the structures that formed from them (such as protein helices) also followed a consistent handedness. This biological preference became universal because it was energetically favorable and evolutionarily fixed.
In summary, most molecular helices are right-handed because life uses L-amino acids, and their three-dimensional structure naturally leads to right-handed helical folding.
9. Why do β-sheets tend to aggregate?
Beta-sheets tend to aggregate because their structure allows strong and repeated interactions between neighboring protein strands. In a beta-sheet, the backbone of the protein forms many hydrogen bonds in a very regular and extended pattern. This creates flat surfaces that can easily align with other beta-strands from nearby molecules.
When these flat regions come close together, they can form additional hydrogen bonds between different protein molecules. This stacking effect is energetically favorable, meaning it lowers the system’s energy and makes aggregation more stable. Because the pattern of hydrogen bonding is repetitive and strong, beta-sheets can “zip up” with each other, leading to large aggregates.
This behavior is especially important in diseases like Alzheimer’s, where proteins misfold and form beta-sheet–rich aggregates known as amyloid fibrils. The beta-sheet structure makes it easy for many copies of the same protein to stick together in an ordered way.
In summary, beta-sheets tend to aggregate because their flat, hydrogen-bonded structure allows them to align and form stable intermolecular interactions with other beta-sheets, promoting stacking and aggregation.
The main driving force for beta-sheet aggregation is the formation of hydrogen bonds between protein backbones, combined with hydrophobic interactions.
In a beta-sheet structure, the protein backbone is extended and forms hydrogen bonds in a very regular pattern. When multiple beta-strands from different protein molecules come close together, they can form additional hydrogen bonds between each other. This creates a very stable, repetitive “zipper-like” structure.
At the same time, many beta-sheet–forming regions contain hydrophobic (water-repelling) amino acids. When these hydrophobic surfaces are exposed to water, it is energetically unfavorable. Aggregation helps bury these hydrophobic regions away from water, which lowers the overall energy of the system. This hydrophobic effect strongly promotes aggregation.
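So, the driving forces are:
- Hydrogen bonding between the backbones of neighboring beta-strands
- The hydrophobic effect, which buries water-repelling side chains away from water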
In summary, beta-sheet aggregation is driven by strong hydrogen bonding and hydrophobic interactions that stabilize stacked protein structures.
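As a rough illustration of the hydrophobic contribution, the sketch below averages the Kyte-Doolittle hydropathy scale over a sequence. This is only a crude proxy chosen for illustration, not an aggregation predictor, and the two example fragments are made up.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(seq):
    """Average Kyte-Doolittle hydropathy over a sequence of one-letter codes."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

# Hypothetical fragments: one rich in branched hydrophobic residues, one charged
print(mean_hydropathy("VIVIV"), mean_hydropathy("DKDKD"))
```

A strongly positive average suggests a segment that would prefer to be buried away from water, which is the situation described above for aggregation-prone beta-strands.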
1. Briefly describe the protein you selected and why you selected it
I selected collagen as the protein for this assignment. Collagen is a structural protein that is the main component of connective tissues such as skin, tendons, cartilage, and bone. It is especially important in archaeology because collagen is one of the primary organic materials preserved in ancient bones, textiles, and artifacts. Archaeologists often analyze collagen to reconstruct diet, obtain radiocarbon dates, and assess preservation conditions.
Collagen has a unique three-dimensional structure known as a triple helix, formed by three polypeptide chains tightly wound around each other. This structure gives collagen its strength and stability. I selected collagen because it is directly relevant to archaeological research and material preservation, and its distinctive structure makes it an excellent example for understanding how protein structure relates to function and long-term stability.
2. Identify the amino acid sequence of your protein.
The length of the protein is 2993 amino acids. The most common amino acid is glycine (G), which appears 627 times.
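Length and residue counts like these can be computed with a few lines of Python. The sketch below is one possible way to do it, not necessarily the script used for this homework; the FASTA record is a short made-up fragment rather than the real sequence.

```python
from collections import Counter

def summarize_sequence(fasta_text):
    """Return (length, most common residue, its count) for a single FASTA record."""
    # Drop header lines (starting with '>') and join the remaining sequence lines
    lines = [ln.strip() for ln in fasta_text.splitlines()
             if ln.strip() and not ln.startswith(">")]
    seq = "".join(lines).upper()
    residue, count = Counter(seq).most_common(1)[0]
    return len(seq), residue, count

# Hypothetical collagen-like fragment, used only to show the call
toy_fasta = ">toy_fragment\nGPPGAPGE\nKGSPGADG\n"
length, residue, count = summarize_sequence(toy_fasta)
print(f"Length: {length}; most common: {residue} ({count} times)")
```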
To determine the number of homologs, I used UniProt’s BLAST tool to search for sequences similar to human collagen type I (COL1A1). The BLAST results showed thousands of homologous protein sequences across many different organisms, particularly vertebrates such as mammals, birds, reptiles, and fish.
Collagen is a highly conserved structural protein, meaning its sequence has remained relatively similar throughout evolution. Because it plays a critical role in connective tissues such as bone and skin, it is present in nearly all multicellular animals. As a result, BLAST identifies a very large number of homologs with significant sequence similarity.
This high number of homologous sequences reflects the essential structural role of collagen and its evolutionary conservation across species.
Yes, collagen Type I alpha 1 chain (COL1A1) belongs to the collagen protein family. More specifically, it is part of the fibrillar collagen family.
Collagens are a large family of structural proteins that form the extracellular matrix in connective tissues. They share a characteristic triple-helix structure composed of three polypeptide chains and a repeating Gly-X-Y amino acid sequence, where glycine appears every third residue. This repeating pattern is essential for forming the stable triple helix.
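The Gly-X-Y rule can be tested directly on a sequence. The following sketch splits the sequence into triplets in each of the three possible frames and reports the frame with the highest fraction of triplets that start with glycine. The function name and the example fragments are hypothetical.

```python
def gly_xy_fraction(seq):
    """Return (best_frame, fraction of triplets with glycine in the first position)."""
    best_frame, best_fraction = 0, 0.0
    for frame in range(3):
        # Complete triplets only, starting at this frame offset
        triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        if not triplets:
            continue
        fraction = sum(t[0] == "G" for t in triplets) / len(triplets)
        if fraction > best_fraction:
            best_frame, best_fraction = frame, fraction
    return best_frame, best_fraction

print(gly_xy_fraction("GPPGAPGEKGSPGADG"))  # collagen-like toy fragment
print(gly_xy_fraction("MKTAYIAKQR"))        # toy fragment without glycine
```

A fraction close to 1.0 in one frame is what a triple-helical region would be expected to show, while a typical globular sequence should score much lower.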
Within the collagen superfamily, Type I collagen belongs to the fibrillar collagens, which also include types II, III, V, and XI. These collagens form rope-like fibers that provide tensile strength to tissues such as bone, skin, and tendons.
In summary, COL1A1 is a member of the collagen superfamily and specifically part of the fibrillar collagen family, which is responsible for structural support in connective tissues.
3. Identify the structure page of your protein in RCSB
One representative collagen triple-helix structure (PDB ID: 1CAG) was solved using X-ray crystallography at a resolution of approximately 1.9 Å. This is considered good quality, since smaller values indicate higher structural detail. At this resolution, atomic positions can be determined with reasonable accuracy. LINK: https://www.rcsb.org/structure/1CAG
Yes. In addition to the collagen protein chains, X-ray crystallography structures often include water molecules and sometimes small ions or stabilizing molecules. These are commonly found in crystal structures because they help stabilize packing interactions in the crystal lattice.
Yes. The collagen triple helix belongs to the fibrous protein structural family. Collagens are classified as structural proteins with a unique triple-helix motif, consisting of three polypeptide chains wound around each other. This distinct arrangement classifies collagen as a structural superfamily separate from globular proteins.
4. Open the structure of your protein in any 3D molecule visualization software:
When visualized as cartoon and ribbon representations, the structure clearly shows the characteristic triple-helix arrangement of collagen. The protein consists of three polypeptide chains tightly wound around each other. When displayed in ball-and-stick representation, individual atoms and the repeating Gly-X-Y pattern can be clearly observed.
When colored by secondary structure, the protein shows predominantly helical structure. Collagen does not form beta-sheets like many globular proteins. Instead, it forms a unique triple-helix composed of three left-handed helices wrapped into a right-handed superhelix. Therefore, the structure contains more helical content and essentially no beta-sheets.
When colored by residue type, a clear pattern appears. Glycine residues are distributed regularly throughout the structure because glycine occurs at every third position in collagen (Gly-X-Y repeat). Glycine, the smallest residue, packs at the crowded center of the triple helix, while the side chains at the X and Y positions (often proline and hydroxyproline) point outward toward the solvent. This distribution supports structural integrity and interaction with the extracellular environment.
When visualizing the surface of the protein, collagen does not display deep binding pockets like many enzymes. Since it is a structural fibrous protein rather than a globular enzyme, it lacks large internal cavities or active-site pockets. Instead, the surface is elongated and repetitive, consistent with its mechanical structural role.
For this project, I selected the collagen triple helix model (PDB ID: 1CAG). Collagen is a structural protein that forms the extracellular matrix in connective tissues such as bone and skin. I chose collagen because of its biological relevance and its characteristic Gly-X-Y repeating motif, which is essential for triple-helix formation. As an archaeologist interested in biomaterials and preservation, I find collagen particularly meaningful because of its importance in bone structure and archaeological remains.
1. Deep Mutational Scans
Using ESM2, I generated an unsupervised deep mutational scan of the collagen triple helix model (PDB ID: 1CAG) based on language model likelihood scores. The heatmap reveals clear positional constraints across the sequence.
A particularly striking pattern appears at glycine positions. For example, at position 13, glycine shows a strongly positive score (yellow), while most alternative amino acids show strongly negative scores (blue/purple). This indicates that substitutions at this position are highly unfavorable.
This pattern reflects the structural constraint of the Gly-X-Y repeating motif characteristic of collagen. Glycine is required every third residue to allow tight packing of the triple helix. Replacing glycine with a bulkier amino acid would introduce steric clashes and destabilize the structure. The language model successfully captures this evolutionary constraint without being explicitly trained on structural data.
Overall, the deep mutational scan demonstrates that structurally critical residues, particularly glycine, are strongly conserved and intolerant to mutation.
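One common way to turn language-model outputs into a mutational-scan heatmap is to score each substitution by its log-likelihood relative to the wild-type residue. Whether the ESM2 notebook used exactly this scoring is an assumption. The sketch below shows only the arithmetic, on made-up probabilities for a single glycine position; it does not call ESM2.

```python
import math

def log_odds_scores(probs, wild_type):
    """Score each residue as log p(residue) - log p(wild type) at one position.

    probs: dict mapping one-letter residue codes to model probabilities.
    Negative scores mean the model considers the substitution less likely
    than the wild-type residue.
    """
    reference = math.log(probs[wild_type])
    return {aa: math.log(p) - reference for aa, p in probs.items()}

# Hypothetical probabilities at a glycine position (not real ESM2 output)
toy_probs = {"G": 0.90, "A": 0.05, "S": 0.03, "P": 0.02}
scores = log_odds_scores(toy_probs, wild_type="G")
for aa, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(aa, round(score, 2))
```

Under this convention the wild-type residue scores exactly zero and disfavored substitutions are negative, so a heatmap that shows a positive value for wild-type glycine would have been drawn with a different convention.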

Project Objective
Phase 1: Mapping the DnaJ Interaction Interface
Since the exact binding interface between the L protein and DnaJ is unknown, the first step is to identify it computationally rather than introducing arbitrary mutations.
This phase defines which residues are critical for interaction and should not be mutated randomly.
Phase 2: Targeted N-Terminal Redesign
Instead of deleting regions or performing extensive random substitutions, introduce controlled chemical modifications to disrupt interaction while preserving structural stability.
Focus on charge inversion strategies:
Disrupt hydrophobic interaction patches:
Generate a graded library of variants:
This creates a Pareto front of variants balancing reduced DnaJ interaction and preserved protein stability.
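The Pareto front mentioned above can be extracted with a simple dominance check. The following is a minimal sketch with hypothetical variant names and made-up scores; it assumes each variant carries a predicted DnaJ interaction score and a predicted ΔΔG, and that lower is better for both.

```python
def pareto_front(variants):
    """Keep variants that no other variant beats or ties on both objectives.

    variants: list of (name, interaction_score, ddg) tuples, lower is better.
    """
    front = []
    for v in variants:
        dominated = any(
            o is not v
            and o[1] <= v[1] and o[2] <= v[2]
            and (o[1] < v[1] or o[2] < v[2])
            for o in variants
        )
        if not dominated:
            front.append(v)
    return front

# Hypothetical scores for illustration only: (name, interaction, ddG in kcal/mol)
toy_variants = [
    ("variant_A", 0.9, 0.1),
    ("variant_B", 0.5, 0.8),
    ("variant_C", 0.2, 2.5),
    ("variant_D", 0.6, 1.5),  # worse than variant_B on both objectives
    ("variant_E", 0.9, 0.5),  # same interaction as variant_A, less stable
]
for name, interaction, ddg in pareto_front(toy_variants):
    print(name, interaction, ddg)
```

The variants that survive represent different trade-offs between weakening the interaction and keeping the fold stable, which is the set a graded library would aim to sample.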
Phase 3: Stability and Functional Filtering
To ensure that redesigned variants remain structurally viable and functionally relevant:
Use Rosetta or FoldX to calculate ΔΔG and verify that mutations do not destabilize the overall protein fold.
Confirm that mutations in the N-terminal region do not propagate structural stress toward the C-terminal lysis domain.
Perform co-evolutionary analysis (e.g., EVcouplings):
Evaluate aggregation propensity using tools such as Aggrescan3D to ensure that mutations do not create exposed hydrophobic patches leading to cytoplasmic aggregation.
Assess sequence plausibility using protein language models such as ESM to filter out unlikely or non-natural variants.
Key Limitations