Week 2 HW: DNA Read, Write, & Edit

Homework — DUE BY FEB 17 2PM MIT TIME

👨‍🦰Part 0: Basics of Gel Electrophoresis

Keypoint: Gel Electrophoresis: Used for separating, identifying, and purifying fragments of DNA, RNA, or proteins.

Gel Preparation: Add agarose powder to the buffer, heat until melted, pour the solution into the gel tray, insert the comb, and allow it to cool and solidify.

Sample Loading: Remove the comb, place the gel into the electrophoresis tank, and add buffer until the gel is covered. Mix the DNA sample with loading buffer, then load the mixture into the wells.

Electrophoresis: Connect the power supply, set the voltage, and start running the gel. The tracking dye (e.g., bromophenol blue) can be seen moving downward with the naked eye.

Staining and Visualization: After electrophoresis, stain the gel by immersing it in a staining solution (e.g., a nucleic acid dye), or add the dye to the gel beforehand during preparation. Finally, observe the bands under UV light or on a blue-light transilluminator.
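Once the bands are visible, fragment sizes are usually read off against a DNA ladder. The sketch below assumes that migration distance is roughly linear in log10(fragment size) between neighboring ladder bands. The ladder values and the function name `estimate_size_bp` are made up for illustration and are not measurements from this homework.

```python
import math

def estimate_size_bp(distance_mm, ladder):
    """Estimate a fragment size by interpolating log10(bp) against distance."""
    # Sort ladder bands by distance travelled (largest fragments travel least).
    points = sorted((dist, math.log10(bp)) for bp, dist in ladder)
    for (d1, log1), (d2, log2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            frac = (distance_mm - d1) / (d2 - d1)
            return 10 ** (log1 + frac * (log2 - log1))
    raise ValueError("distance lies outside the ladder range")

# Hypothetical ladder: (size in bp, migration distance in mm).
LADDER = [(10000, 10.0), (3000, 20.0), (1000, 30.0), (500, 36.0), (100, 50.0)]

print(round(estimate_size_bp(25.0, LADDER)))
```

A band halfway between the 3000 bp and 1000 bp markers is estimated at their geometric mean rather than their arithmetic mean, which is the point of working on a log scale.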

👲Part 1: Benchling & In-silico Gel Art

cover image

🧒Part 2: Gel Art - Restriction Digests and Gel Electrophoresis (Optional, for those with lab access)

🎅Part 3: DNA Design Challenge

3.1. Choose your protein.

Photosystem II: The structural analysis of Photosystem II (PSII) matters for three main reasons: it clarifies the water-splitting mechanism, it elucidates the complex’s own biogenesis and repair, and it inspires next-generation bio-inspired energy technologies.

Fundamental Understanding of the Water-Splitting Mechanism

The primary significance of PSII structural studies lies in unraveling how nature performs the energy-demanding and chemically complex reaction of water oxidation.

Atomic-Level Resolution of the Catalytic Core: Recent breakthroughs, such as the 1.7 Å resolution cryo-EM structure of PSII, have allowed scientists to visualize for the first time the positions of hydrogen atoms and the detailed water network within this massive membrane complex. This level of detail is crucial because it reveals how water molecules are channeled to the catalytic Mn₄CaO₅ cluster and how protons are guided out after water is split. Understanding these precise pathways is essential for comprehending the enzyme’s near-perfect efficiency (Hussein et al., 2024).

Hussein, R., Graça, A., Forsman, J., Aydin, A.O., Hall, M., Gaetcke, J., & Schröder, W.P. (2024). Cryo–electron microscopy reveals hydrogen positions and water networks in photosystem II. Science, 384(6702), 1349-1355.

Capturing Reaction Dynamics: Beyond static snapshots, research is now focused on the dynamic process. For instance, serial femtosecond crystallography (SFX) using XFELs has enabled the capture of intermediate states (like the S₂ and S₃ states) in the catalytic cycle, revealing structural changes during O-O bond formation. Furthermore, studies on specific mutants, such as the D2-Lys317Ala substitution, have shown how alterations in the hydrogen-bonding network can disrupt proton egress and slow down oxygen release, providing direct experimental evidence for the role of specific amino acids and channels (Flesher et al., 2025).

Flesher, D.A., Shin, J., Debus, R.J., & Brudvig, G.W. (2025). Structure of a mutated photosystem II complex reveals changes to the hydrogen-bonding network that affect proton egress during O–O bond formation. Journal of Biological Chemistry, 301(3).

Elucidating Biogenesis, Repair, and Regulation

PSII is uniquely vulnerable to light-induced damage, particularly its D1 reaction center protein. Understanding how it is repaired is a research area of immense biological importance.

Unveiling the Repair Cycle: Structural biology has been pivotal in revealing the assembly and repair mechanisms of PSII. For example, research on green algae (Chlamydomonas reinhardtii) has solved the structures of four PSII-repair intermediates associated with the protein TEF30. These near-atomic resolution structures provide a working model for how different modules are reassembled during the mid-to-late stages of the repair cycle, a process vital for sustaining oxygenic photosynthesis under constant light stress (Wang et al., 2025).

Wang, Y., Wang, C., Li, A., & Liu, Z. (2025). Roles of multiple TEF30-associated intermediate complexes in the repair and reassembly of photosystem II in Chlamydomonas reinhardtii. Nature Plants, 11(7), 1455-1469.

A Model System for Membrane Proteins: PSII is proving to be an excellent system for studying the general principles of how large, multi-subunit membrane protein complexes are assembled and maintained in the thylakoid membrane. Insights from PSII repair, such as the synchronization of chlorophyll synthesis with protein synthesis, have broader implications for cell biology and plant physiology (Komenda et al., 2024).

Komenda, J., Sobotka, R., & Nixon, P. J. (2024). The biogenesis and maintenance of PSII: recent advances and current challenges. The Plant Cell, 36(10), 3997-4013.

Future Value: Bio-inspired and Semi-Artificial Applications

The knowledge gained from PSII structures is a treasure trove for bioengineers and chemists aiming to create sustainable technologies. The future value lies in translating this biological blueprint into real-world applications.

Blueprint for Artificial Catalysts: A major barrier to scalable renewable energy, such as producing hydrogen as a fuel, is the reliance on rare and expensive metals (like platinum) to split water. PSII achieves this using cheap and abundant manganese and calcium. By understanding the precise atomic structure and mechanism of the oxygen-evolving complex, scientists hope to design synthetic catalysts that mimic nature’s solution for efficient water oxidation with earth-abundant materials (Hussein et al., 2024).

Creating Semi-Artificial Photosynthetic Devices: A more direct application is the integration of isolated PSII proteins into bio-photoelectrochemical cells. A landmark study has successfully created a scalable “artificial leaf” by spray-coating PSII from spinach onto a specially designed protonated macroporous carbon nitride (MCN) support. This large-area photoanode (33 cm²) generated milliampere-level photocurrents with nearly 100% faradaic efficiency for oxygen production. The device was stable enough to power an LED when eight units were connected in series, demonstrating the potential of PSII-based biophotovoltaics for powering low-consumption electronic devices (Zhang et al., 2025).

Zhang, H., Tian, W., Lin, J., Zhang, P., Shao, G., Ravi, S. K., … & Wang, S. (2025). Photosystem II‐Carbon Nitride Photoanodes for Scalable Biophotoelectrochemistry. Advanced Materials, e08813.

https://www.uniprot.org/blast

sp|Q39195|PST2_ARATH Photosystem II 5 kDa protein, chloroplastic OS=Arabidopsis thaliana OX=3702 GN=PSBT PE=1 SV=2
MASMTMTATFFPAVAKVPSATGGRRLSVVRASTSDNTPSLEVKEQSSTTMRRDLMFTAAA
AAVCSLAKVAMAEEEEPKRGTEAAKKKYAQVCVTMPTAKICRY
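As a quick sanity check on the record above, the following sketch parses it as FASTA and reports the accession and the protein length. The leading `>` is added here because the FASTA format requires it, the header is shortened, and `parse_fasta` is a hypothetical helper rather than part of any library.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.replace(" ", ""))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

FASTA = """>sp|Q39195|PST2_ARATH Photosystem II 5 kDa protein, chloroplastic
MASMTMTATFFPAVAKVPSATGGRRLSVVRASTSDNTPSLEVKEQSSTTMRRDLMFTAAA
AAVCSLAKVAMAEEEEPKRGTEAAKKKYAQVCVTMPTAKICRY
"""

header, protein = parse_fasta(FASTA)[0]
accession = header.split("|")[1]
print(accession, len(protein))
```

The length of 103 residues is what makes the 309-base reverse translations in section 3.2 come out right (103 × 3 = 309).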

cover image

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

Reverse translation of the sample sequence to a 309-base sequence of most likely codons:

atggcgagcatgaccatgaccgcgaccttttttccggcggtggcgaaagtgccgagcgcg
accggcggccgccgcctgagcgtggtgcgcgcgagcaccagcgataacaccccgagcctg
gaagtgaaagaacagagcagcaccaccatgcgccgcgatctgatgtttaccgcggcggcg
gcggcggtgtgcagcctggcgaaagtggcgatggcggaagaagaagaaccgaaacgcggc
accgaagcggcgaaaaaaaaatatgcgcaggtgtgcgtgaccatgccgaccgcgaaaatt
tgccgctat
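The "most likely codons" output amounts to a lookup from each amino acid to a single preferred codon. Here is a minimal sketch of that idea, with the table read off the sequence above. Histidine does not occur in this protein, so its entry (`cat`) is an assumption; tryptophan is also absent but has only one codon (`tgg`). `reverse_translate` is a hypothetical name.

```python
# One preferred codon per amino acid, as observed in the sequence above.
# "H" is not in the protein, so "cat" is an assumed choice.
PREFERRED = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}

def reverse_translate(protein):
    """Map each residue to its preferred codon and join the codons."""
    return "".join(PREFERRED[aa] for aa in protein.upper())

print(reverse_translate("MASMTMT"))
```

Because most amino acids have several synonymous codons, this is only one of many DNA sequences that encode the same protein, which is why a separate codon-optimization step follows.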

Reverse translation of the sample sequence to a 309-base sequence of consensus codons:

atggcnwsnatgacnatgacngcnacnttyttyccngcngtngcnaargtnccnwsngcn
acnggnggnmgnmgnytnwsngtngtnmgngcnwsnacnwsngayaayacnccnwsnytn
gargtnaargarcarwsnwsnacnacnatgmgnmgngayytnatgttyacngcngcngcn
gcngcngtntgywsnytngcnaargtngcnatggcngargargargarccnaarmgnggn
acngargcngcnaaraaraartaygcncargtntgygtnacnatgccnacngcnaarath
tgymgntay
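The consensus sequence uses IUPAC ambiguity codes (for example `n` = any base, `y` = c or t, `w` = a or t, `s` = c or g). The sketch below checks whether a concrete DNA sequence is compatible with a consensus string and expands a degenerate codon into the concrete codons it covers. The helper names `matches` and `expand` are made up for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}

def matches(consensus, dna):
    """True if every base of dna is allowed by the consensus at that position."""
    if len(consensus) != len(dna):
        return False
    return all(b in IUPAC[c] for c, b in zip(consensus.lower(), dna.lower()))

def expand(codon):
    """List every concrete codon covered by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in codon.lower()))]

print(matches("atggcnwsn", "ATGGCATCT"))
print(len(expand("wsn")))
```

Note that the consensus is lossy: `wsn` covers 16 codons, but only six of them encode serine, so a consensus string describes a superset of the valid coding sequences.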

3.3. Codon optimization.

1 ATGGCATCTA TGACTATGAC TGCTACATTC TTTCCTGCTG TAGCGAAGGT ACCAAGTGCT ACTGGGGGTA
71 GAAGGCTTAG CGTTGTTCGA GCGTCGACTT CGGATAACAC ACCTTCCTTA GAGGTGAAGG AGCAGTCATC
141 CACTACCATG AGAAGAGATC TGATGTTCAC TGCTGCTGCA GCAGCCGTAT GTTCCTTGGC CAAAGTCGCA
211 ATGGCTGAGG AAGAAGAACC TAAGAGAGGA ACTGAGGCGG CTAAGAAGAA GTATGCCCAA GTTTGTGTTA
281 CGATGCCTAC CGCGAAGATA TGCCGATAC
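Codon optimization should change the DNA but not the protein. A minimal sketch of that check follows: it translates the first line of the optimized sequence and the matching stretch of the "most likely codons" sequence with the standard genetic code, then compares the GC content of the two. The function names are illustrative, and only the first 23 codons are used to keep the example short.

```python
from itertools import product

# Standard genetic code in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def clean(dna):
    """Uppercase and strip whitespace used for display grouping."""
    return "".join(dna.split()).upper()

def translate(dna):
    """Translate complete codons; a trailing partial codon is ignored."""
    dna = clean(dna)
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def gc_content(dna):
    dna = clean(dna)
    return (dna.count("G") + dna.count("C")) / len(dna)

optimized = ("ATGGCATCTA TGACTATGAC TGCTACATTC TTTCCTGCTG "
             "TAGCGAAGGT ACCAAGTGCT ACTGGGGGTA")
most_likely = ("atggcgagcatgaccatgaccgcgaccttttttccggcggtggcg"
               "aaagtgccgagcgcgaccggcggc")

print(translate(optimized))
print(translate(optimized) == translate(most_likely))
print(round(gc_content(optimized), 2), round(gc_content(most_likely), 2))
```

Running the same comparison over all 309 bases is a cheap way to catch copy-paste errors before ordering a synthesis.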

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Cell-Dependent Methods (In Vivo)

This is the most common approach: the gene of interest is inserted into a host organism, turning it into a tiny protein factory.

Recombinant DNA & Cloning: The first step is to insert your gene of interest into a small, circular piece of DNA called a plasmid. This plasmid acts as a vector or delivery vehicle. It’s engineered to contain all the necessary control elements for the host cell to read the gene: a promoter (to start transcription), a ribosome binding site (to start translation), and often a selectable marker (like an antibiotic resistance gene) to help us find cells that have taken up the plasmid.

Transformation/Transfection: This recombinant plasmid is then introduced into the host cells. For bacteria like E. coli, this is called transformation. For animal cells, it’s often called transfection.

Selection and Growth: The host cells are grown on a special medium (e.g., containing an antibiotic). Only the cells that successfully took up the plasmid will survive and grow, forming colonies. Each colony is a clone of cells all producing your protein.

Induction and Harvesting: Once we have a large culture of these cells, we can add a chemical to induce the promoter, turning on high-level production of our target protein. After the cells have grown and produced the protein, they are harvested, and the protein is purified away from all the host cell’s components.

Common Host Organisms:

E. coli (Bacteria): The workhorse of the industry. It’s fast, cheap, and easy to grow. Best for simple proteins that don’t require complex modifications.

Yeast (e.g., S. cerevisiae): A single-celled fungus that is also easy to grow but can perform some more complex protein processing tasks than bacteria.

Mammalian Cells (e.g., CHO cells): The gold standard for complex human therapeutic proteins (like antibodies). They can perform all the necessary human-like modifications (like glycosylation) to make the protein fully functional and safe.

Insect Cells: A good middle-ground, using a virus (baculovirus) to infect insect cells, which then produce the protein. They offer more complex processing than yeast but are easier to handle than mammalian cells.

Cell-Free Methods (In Vitro)

These systems produce proteins without using living cells. Instead, they use the cellular machinery (ribosomes, tRNAs, enzymes) extracted from cells.

How it works: A cell lysate is created by breaking open cells (like E. coli, wheat germ, or rabbit reticulocytes) and removing the cell debris. What’s left is a “soup” containing all the components needed for transcription and translation: ribosomes, amino acids, tRNA, and energy-generating molecules. To this soup, you add your DNA template (containing your gene) and the necessary nucleotides.

Transcription and Translation: If you add a DNA template, the system will begin transcribing it into mRNA and immediately translating that mRNA into protein, all in the same test tube.

Advantages:

Speed: Protein production can happen in hours, not days or weeks.

Toxicity: You can produce proteins that would be toxic to a living cell, as there’s no cell to kill.

Simplicity: It bypasses the need for cloning, transformation, and maintaining cell cultures.

Labeling: It’s very easy to add modified amino acids (e.g., with fluorescent tags) for research purposes.
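Whether it happens in a cell or in a lysate, the first step is the same: RNA polymerase reads the template strand and produces an mRNA that matches the coding strand with U in place of T. The sketch below illustrates this with the first five codons of the optimized sequence followed by the TAA stop codon; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def transcribe(coding_strand):
    """mRNA has the same sequence as the coding strand, with U replacing T."""
    return coding_strand.upper().replace("T", "U")

def codons(mrna):
    """Split an mRNA into complete codons, as the ribosome reads it."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

coding = "ATGGCATCTATGACTTAA"          # ATG GCA TCT ATG ACT + stop
template = reverse_complement(coding)  # strand read by RNA polymerase
mrna = transcribe(coding)

print(template)
print(" ".join(codons(mrna)))
```

The ribosome then reads the mRNA codon by codon from AUG until it reaches a stop codon such as UAA.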

3.5. [Optional] How does it work in nature/biological systems?

Alternative Splicing

A single gene can give rise to more than one protein through several mechanisms. Alternative splicing is the most common and well-studied of them, particularly in complex eukaryotes like humans.

The Basic Process: Genes in eukaryotic cells contain coding sequences called exons and non-coding intervening sequences called introns. When a gene is transcribed, the entire region (both exons and introns) is copied to create a pre-mRNA molecule. Before this pre-mRNA can be used to make a protein, the introns must be removed and the exons joined together in a process called splicing.

The Alternative Part: In alternative splicing, the cell’s splicing machinery doesn’t always join the exons together in the same way. It can selectively include or exclude different exons from the final, mature mRNA molecule.

Imagine a gene with exons 1, 2, 3, and 4.

In one cell type, splicing might join all four exons: Exon 1 - Exon 2 - Exon 3 - Exon 4. This creates mRNA “Version A,” which codes for Protein A.

In another cell type, or at a different developmental stage, the splicing machinery might skip exon 2: Exon 1 - Exon 3 - Exon 4. This creates mRNA “Version B,” which codes for a different Protein B.

It could also include an extra exon (Exon 2a) that isn’t always used, leading to Protein C.

Examples:

The DSCAM gene in fruit flies can generate over 38,000 different mRNA isoforms through alternative splicing!

The Calcitonin/CGRP gene produces a hormone (calcitonin) in the thyroid gland and a neuropeptide (CGRP) in the brain by using different sets of exons.
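The exon-selection example above can be sketched as a lookup over exon sequences. The exon sequences here are toy placeholders, not real gene sequences. The last line reproduces the DSCAM figure from its four clusters of mutually exclusive exons (12, 48, 33, and 2 alternatives).

```python
# Toy exon sequences, chosen only for illustration.
EXONS = {
    "1": "ATGGCA",
    "2": "TCTATG",
    "2a": "GGTGGT",
    "3": "ACTGCT",
    "4": "ACATAA",
}

# Which exons each isoform keeps, in order.
ISOFORMS = {
    "A": ["1", "2", "3", "4"],        # all four exons
    "B": ["1", "3", "4"],             # exon 2 skipped
    "C": ["1", "2", "2a", "3", "4"],  # optional exon 2a included
}

def splice(exon_order):
    """Join the selected exons into a mature mRNA (written as DNA)."""
    return "".join(EXONS[name] for name in exon_order)

for name, order in ISOFORMS.items():
    print(name, splice(order))

# DSCAM: one exon is chosen from each of four alternative clusters.
dscam_isoforms = 12 * 48 * 33 * 2
print(dscam_isoforms)
```

The product of independent choices is what lets a single gene reach tens of thousands of isoforms.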

Alternative Promoters

A gene can have more than one promoter, the “start here” signal for RNA polymerase to begin transcription.

The Mechanism: Depending on which promoter is used, transcription will start at a different point in the gene. This can lead to pre-mRNAs that have different “first exons.”

The Result: These different starting points can result in mature mRNAs with different 5’ ends. This often means the resulting proteins will have different N-termini (the beginning of the protein). This can affect where the protein is located within the cell or what its function is.

Alternative Polyadenylation

At the end of transcription, the pre-mRNA is cleaved, and a string of adenine nucleotides (the poly-A tail) is added to the 3’ end. This process is called polyadenylation and is signaled by a specific sequence in the RNA called the polyadenylation signal.

The Mechanism: Some genes have multiple polyadenylation signals. If the cell’s machinery uses the first signal, it will cleave the RNA there, resulting in a shorter mRNA. If it uses a downstream signal, it will produce a longer mRNA.

The Result: This affects the 3’ end of the mRNA. Since the 3’ untranslated region (3’ UTR) often contains signals for mRNA stability, localization, and how efficiently it’s translated, different polyadenylation choices can dramatically affect how much protein is made and where. In some cases, it can also alter the very end of the protein-coding sequence itself.
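A toy model of alternative polyadenylation: scan a transcript for the canonical signal (AAUAAA, written AATAAA in DNA) and cut a fixed distance downstream of each occurrence. Real cleavage sites fall roughly 10 to 30 nt downstream and depend on additional sequence elements, so the fixed `offset` of 20 is a simplifying assumption, as are the toy sequence and function names.

```python
def poly_a_sites(dna, signal="AATAAA", offset=20):
    """Return candidate cleavage positions downstream of each signal."""
    dna = dna.upper()
    sites = []
    start = dna.find(signal)
    while start != -1:
        cut = start + len(signal) + offset
        if cut <= len(dna):
            sites.append(cut)
        start = dna.find(signal, start + 1)
    return sites

def transcript_isoforms(dna, **kwargs):
    """One transcript per usable signal; the poly-A tail itself is not added."""
    return [dna[:cut] for cut in poly_a_sites(dna, **kwargs)]

# Toy pre-mRNA with a proximal and a distal signal.
pre_mrna = "ATGGCATCT" + "AATAAA" + "C" * 30 + "AATAAA" + "G" * 30

print(poly_a_sites(pre_mrna))
print([len(t) for t in transcript_isoforms(pre_mrna)])
```

Using the proximal signal yields the shorter transcript, which loses whatever regulatory elements lie in the longer 3’ UTR.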

👩‍🦰Part 4: Prepare a Twist DNA Synthesis Order

4.1. Create a Twist account and a Benchling account √

4.2. Build Your DNA Insert Sequence

Copy: https://benchling.com/s/seq-GWC7bWlMPkMgqEnihkxn?m=slm-ny5MpJ1N9FsOATmyQMAl

De novo design: https://benchling.com/s/seq-PmLRVhHnWcUpDyCXJjHa?m=slm-TapXq6UoRnBTOtZCWyOT

Promoter: Arabidopsis thaliana chloroplast psbA gene promoter. This is a core promoter region of approximately 620 bp upstream of the start codon ATG (containing the -35 box and -10 box regions).

ATTGCTTGAT TTAATTTTTC AATTTTCTTG TTTTTATTTT GAATAAAGGA AAATAAATAA AAATAAATAA AATTTTTTTA AAAAGAATTT AATTTTCTAA CTTTTTTTAT TTTATCAACA AAAATATCTT ATTTTATTTC GATTTTATTT AGATTTTAGT ATCTATTTTT GGTTGATATA TATGGTTTTA TATTTGATAG GTATATTTGT TTTGATTGAA ATTTTCTGAA AAATATTTTT AAATAAATGA TTATTCTTTT CTCTCTAGAT CTTATATGTA GAATCTTTAT ATTTTGATAA TATTTTTTGA TTTTGATTTT TGTTTGTTTG TTTTTTATAC ATATATTTTT GGGGATTTTT TTTTTGTTTT TCAATTTCAA TTTCTCTAGA AAAAAGAGGA GAAAATTAAT ATG

RBS (Ribosome Binding Site): AGGAGG

Coding Sequence (your codon optimized DNA for a protein of interest, psii for example):

7×His Tag: CAT CAC CAT CAC CAT CAC CAC (length: 21 bp). Mixed histidine codons (CAT and CAC) are used to avoid long repetitive sequences, which facilitates synthesis and cloning stability.

Stop Codon: TAA

Terminator (BBa_B0015): CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
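The parts listed above are joined in a fixed order: promoter, RBS, coding sequence, His tag, stop codon, terminator. The sketch below concatenates them and runs a few basic checks before ordering. The promoter and terminator strings in the usage example are short placeholders rather than the real parts, and `build_insert` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS_CODONS = {"CAT", "CAC"}

def clean(seq):
    """Uppercase and remove the spaces used for display grouping."""
    return "".join(seq.split()).upper()

def codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def build_insert(promoter, rbs, cds, his_tag, stop, terminator):
    """Concatenate the parts in order after validating the coding region."""
    promoter, rbs, cds, his_tag, stop, terminator = map(
        clean, (promoter, rbs, cds, his_tag, stop, terminator))
    if not cds.startswith("ATG") or len(cds) % 3 != 0:
        raise ValueError("CDS must start with ATG and be a multiple of 3 nt")
    if any(c in STOP_CODONS for c in codons(cds)):
        raise ValueError("CDS contains a stop codon before the His tag")
    if len(his_tag) % 3 != 0 or not all(c in HIS_CODONS for c in codons(his_tag)):
        raise ValueError("His tag must consist of CAT/CAC codons only")
    if stop not in STOP_CODONS:
        raise ValueError("unknown stop codon")
    return promoter + rbs + cds + his_tag + stop + terminator

insert = build_insert(
    promoter="TTGACA",        # placeholder, not the psbA promoter
    rbs="AGGAGG",
    cds="ATG GCA TCT",        # first three codons of the optimized CDS
    his_tag="CAT CAC CAT CAC CAT CAC CAC",
    stop="TAA",
    terminator="GGGCCC",      # placeholder, not BBa_B0015
)
print(len(insert), insert)
```

The check for an internal stop codon matters here because the His tag sits downstream of the coding sequence and would otherwise never be translated.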

💡 Design Strategy: Screening for Strong Terminators from the Chloroplast Genome

In higher plant chloroplasts, the transcription termination mechanism is similar to that of prokaryotic systems, typically relying on a stem-loop structure located at the 3’ end of a gene. You can construct an efficient standard part by following the two steps below:

Identifying Candidate Sequences: The transcript of the psbA gene (encoding the PSII core protein D1) in the chloroplast is highly abundant and stable, and its 3’ UTR usually contains efficient termination and processing signals. You can obtain the complete chloroplast genome sequence of Arabidopsis thaliana from public databases like NCBI (e.g., GenBank accession: NC_000932.1), then locate the psbA gene and extract its 3’ UTR region (approximately 100-200 bp) as the core candidate sequence.
Engineering for High Efficiency: To pursue near 100% termination efficiency, you can refer to the design logic of BBa_B0015 and construct a dual-terminator tandem element:

First Unit: Clone the 3’ UTR of the psbA gene from the Arabidopsis chloroplast.
Second Unit: Clone another strong terminator, such as the 3’ UTR of the chloroplast rps16 gene or a strong termination signal from other chloroplast genes.
Combination: Link these two units in tandem with a short spacer sequence. This “belt-and-suspenders” structure can maximally prevent read-through by RNA polymerase.
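Since termination here relies on a stem-loop, a first screening pass over a candidate 3’ UTR could simply look for inverted repeats. The sketch below reports places where a short stem is followed, after a small loop, by its reverse complement. It only finds perfect stems and does not estimate folding energy. The stem and loop sizes, the toy sequence, and the function name are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def find_hairpins(dna, stem=6, min_loop=3, max_loop=10):
    """Return (left_start, right_start, stem_sequence, loop_length) tuples."""
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - stem + 1):
        left = dna[i:i + stem]
        right = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            # A slice running past the end is shorter than the stem and never matches.
            if dna[j:j + stem] == right:
                hits.append((i, j, left, loop))
    return hits

# Toy sequence: GGCTCG, a TTTT loop, then CGAGCC (its reverse complement).
toy = "AAGGCTCGTTTTCGAGCCTT"
for hit in find_hairpins(toy):
    print(hit)
```

Candidates found this way would still need a proper secondary-structure prediction and an experimental read-through assay before being treated as terminators.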

PDF: content/homework/week-02-hw-dna-read-write-and-edit/constitutive_psII_Arabidopsis-thaliana-sequence.pdf

FASTA: content/homework/week-02-hw-dna-read-write-and-edit/constitutive_psii_arabidopsis-thaliana.fasta

4.3. On Twist, Select The “Genes” Option √

4.4. Select “Clonal Genes” option√

Keypoints: An advantage of gene fragments is that, if designed with the appropriate exonuclease protection, they can be used directly in cell-free expression.

4.5. Import your sequence √

content/homework/week-02-hw-dna-read-write-and-edit/constitutive_sfGFP_his_tag.gb

Building your first plasmid! √

content/homework/week-02-hw-dna-read-write-and-edit/first plasmid.png

🤴Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

If I were to explore the possibility of extraterrestrial life and its evolution through DNA sequencing, I would focus on the following DNA targets and contexts, each offering unique insights into how life might arise and adapt beyond Earth:

DNA from Extraterrestrial Samples (e.g., Mars, Europa, Enceladus)

What to sequence: Any organic or genetic material recovered from soil, ice plumes, or subsurface oceans of celestial bodies.

Why:

To determine if life elsewhere uses the same genetic code (DNA/RNA) or something entirely novel.

To compare sequences with Earth life to test theories of panspermia (whether life spreads via meteorites) or convergent evolution (whether life independently evolves similar solutions).

To identify biosignatures—patterns in DNA that indicate biological activity, such as non-random sequence complexity or metabolic genes.

Extremophile Genomes (Earth Analogs for Space Environments)

What to sequence: Complete genomes of organisms like Deinococcus radiodurans (radiation-resistant), tardigrades (space-tolerant), or psychrophiles (cold-loving) from Antarctica.

Why:

These organisms serve as models for how life might survive in space or on harsh planets like Mars (low pressure, radiation, cold).

Their DNA repair mechanisms, desiccation tolerance genes, and metabolic pathways can be compared with hypothetical extraterrestrial life to predict survival strategies.

Ancient or “Shadow Biosphere” DNA on Earth

What to sequence: Environmental DNA (eDNA) from extreme, isolated niches (e.g., deep subsurface mines, high-altitude lakes, or Atacama Desert soils).

Why:

To search for a “second genesis” of life on Earth—organisms with different biochemistry or genetic codes—which would profoundly impact how we search for life elsewhere.

To understand the limits of life’s evolutionary paths and identify universal constraints that might apply anywhere in the cosmos.

Synthetic DNA for Life-Detection Instruments

What to sequence: Engineered DNA sequences designed as controls or standards for space missions (e.g., for an instrument such as the Signs of Life Detector, SOLID, on a rover).

Why:

To calibrate instruments (like nanopore sequencers) for detecting non-standard or damaged DNA that might be found on other planets.

To test whether our detection methods are biased toward Earth-like life, ensuring we don’t miss “weird” life with different base pairs or chirality.

Genomes of Organisms in Simulated Space Environments (ISS or Lab)

What to sequence: DNA of bacteria, fungi, or plants exposed to microgravity, cosmic radiation, or Mars-like conditions on the International Space Station or in simulation chambers.

Why:

To study real-time evolutionary adaptation to space conditions.

To identify mutations or horizontal gene transfer events that occur under extraterrestrial stress, revealing how life might evolve during interplanetary travel.

Universal Genetic Code Variations (Bioinformatics)

What to sequence: Not physical DNA, but in silico simulations of genetic codes and proteins that could function in exotic solvents (e.g., methane or ammonia) or at extreme temperatures.

Why:

To expand our concept of “possible life” beyond carbon-water-DNA constraints.

To guide the search for alien genes by predicting what sequences might look like in environments like Titan’s hydrocarbon lakes.

(ii) For exploring extraterrestrial life and its evolution, I would choose Oxford Nanopore Technologies (ONT) sequencing, a third-generation sequencing platform. A breakdown follows.

Technology Selection and Rationale

Oxford Nanopore Technologies (ONT) sequencing is my preferred choice for extraterrestrial life exploration.

Technology: Oxford Nanopore Technologies (ONT) sequencing

Generation: Third-generation (single-molecule, long-read sequencing)

Input: Extracted DNA from extraterrestrial samples (soil, ice, plumes, etc.)

Output: Real-time electrical current signals (stored as FAST5 files) that are basecalled into base sequences

5.2 DNA Write

The idea is to engineer a living biomaterial inspired by both the fictional character Baymax (from Big Hero 6) and the real-life sea slug Costasiella kuroshimae (commonly known as the “Leaf Sheep” or “Solar-Powered Sea Slug”). The leaf sheep is one of the few animals capable of kleptoplasty: it steals chloroplasts from the algae it eats and incorporates them into its own cells, enabling it to photosynthesize for months.

If I were to synthesize DNA for a “Baymax-like self-healing, photosynthetic biomaterial,” it would involve designing a synthetic genetic circuit that could be introduced into a compatible host (e.g., mammalian cells, skin cells, or even a cell-free system) to create a living material with the following properties:

Self-powering via photosynthesis (like the leaf sheep)

Self-healing (like Baymax’s inflatable skin)

Biocompatible and responsive to the body

🧬 DNA to Synthesize: A Photosynthetic & Self-Healing Genetic Circuit

I would synthesize a multi-gene synthetic construct containing the following modules:

| Module | Gene(s) | Function |
| --- | --- | --- |
| Photosynthesis Module | psbA, psbD, rbcL, rbcS | Enables light capture, electron transport, and carbon fixation (chloroplast function) |
| Self-Healing / Repair Module | DPS (DNA protection during starvation), sodB (superoxide dismutase), katE (catalase) | Protects cells from oxidative damage during light exposure; promotes tissue repair |
| Adhesion & Matrix Module | COL1A1 (human collagen), FN1 (fibronectin) | Provides structural scaffold for tissue integration and healing |
| Regulatory / Synthetic Circuit | Light-inducible promoter (e.g., pDawn), GFP reporter | Allows photosynthesis genes to be activated only in the presence of light |

🔬 Full DNA Sequence Concept (Simplified Example)

Here is a simplified, conceptual construct combining parts of the above ideas. It includes:

A light-inducible promoter (pDawn system, driven by the YF1/FixJ two-component system; YF1 carries the light-sensing domain of YtvA)

The psbA gene (PSII core protein) for photosynthesis

The DPS gene for oxidative stress protection

A collagen fragment for tissue integration

A terminator (BBa_B0015)

🧠 Why Synthesize This DNA?

Baymax-Inspired Self-Healing Material: Baymax’s skin is soft, inflatable, and can repair itself. By incorporating collagen and fibronectin genes, the material could integrate with human tissue and promote wound healing. The DPS and catalase genes would protect cells from oxidative stress (common in damaged tissue), enabling longer-lasting repair.

Photosynthesis for Self-Powering (Leaf Sheep Model): The leaf sheep is a solar-powered animal. If we can engineer mammalian cells (or a skin substitute) to stably incorporate and maintain functional chloroplasts (via genes like psbA and rbcL), the material could generate its own energy from light, reducing the need for external power or nutrient supply in medical implants or wearables.

Potential Applications:

Medical Implants: Self-healing, light-powered skin grafts or patches for chronic wounds.

Wearable Biosensors: Living tattoos that change color in response to inflammation or UV exposure.

Space Exploration: Living materials for astronauts that require minimal resources (just light and water).

Eco-Friendly Biomaterials: Photosynthetic fabrics or coatings that capture CO₂ and produce oxygen.

Next Steps for Synthesis

If Twist Bioscience were to synthesize this, I would:

Codon-optimize each gene for the target host (e.g., human cells or E. coli for prototyping).

Add RBS, linkers, and terminators between modules.

Clone into a delivery vector (e.g., lentivirus for mammalian cells or plasmid for bacterial expression).

Test in a chassis like E. coli first to verify photosynthesis and oxidative protection, then move to mammalian cell lines.

To synthesize the complex, multi-gene “Baymax-Meets-Leaf-Sheep” DNA construct, I would use a hybrid approach that leverages the strengths of different synthesis technologies. Given the length (~2,000+ bp), the complexity (multiple genes from different sources), and the goal of creating a functional genetic circuit, the strategy is:

High-throughput silicon-based DNA synthesis (e.g., Twist Bioscience platform) for fragment generation, followed by enzymatic assembly (e.g., Gibson Assembly or Golden Gate) for final construct assembly.

Technology Selection and Why

Primary Technology: Silicon-Based High-Throughput DNA Synthesis (e.g., Twist Bioscience)

Why: The construct is large and contains multiple genes (psbA, DPS, COL1A1, etc.) with varying GC content and potential secondary structures. Traditional column-based synthesis would be slow, expensive, and error-prone for this complexity. Twist’s platform miniaturizes the chemical synthesis (phosphoramidite chemistry) by performing reactions in nanowells on a silicon chip. This allows for the parallel synthesis of thousands of oligos at once, dramatically increasing throughput and reducing cost. They can routinely synthesize oligonucleotides up to 500 nt in length, which serve as the building blocks for larger genes.

Generation: This is a first-generation (chemical) method but with a modern, high-throughput twist. The core chemistry is the established phosphoramidite method developed in the 1980s, but the delivery system (silicon chip) is a 21st-century innovation that addresses scalability issues.

Secondary Technology: Enzymatic DNA Assembly (e.g., Gibson Assembly® or Golden Gate Assembly)

Why: The 500 nt fragments from the chip need to be stitched together to create the final multi-gene construct (~2-5 kb). Enzymatic assembly methods are well suited to this. Gibson Assembly, for example, uses enzymes to join multiple DNA fragments with overlapping ends in a single reaction, which is far more efficient than traditional step-by-step restriction-ligation cloning.

Essential Steps of the Chosen Method

The workflow combines the synthesis steps with the assembly steps.

Part A: DNA Synthesis (The “Writing” of Fragments)

Sequence Design and Upload: You provide the digital DNA sequences for your photosynthetic module, repair module, etc., to the synthesis provider (e.g., Twist).

Silicon Chip Manufacturing: A silicon chip with thousands of nanowells is prepared. Each well is designated for the synthesis of a specific oligonucleotide.

Cyclic Nucleotide Addition (Phosphoramidite Chemistry): The chip undergoes repeated cycles to build the oligos base-by-base from the 3’ end to the 5’ end. Each cycle for each base consists of four core chemical steps:

Deprotection (Detritylation): Acid removes a protecting group (DMT) from the 5’ hydroxyl of the last nucleotide, making it reactive.

Coupling: The next nucleotide (phosphoramidite monomer) is activated and added, forming a bond with the exposed 5’ hydroxyl.

Capping: Any unreacted 5’ hydroxyls are acetylated to prevent them from reacting in future cycles, which would cause deletions.

Oxidation: Iodine and water are used to stabilize the newly formed bond into a natural phosphate backbone.

Cleavage and Deprotection: After all cycles are complete, the synthesized oligos are cleaved from the chip, and all remaining protecting groups are removed using ammonium hydroxide.

Amplification and QC: The single-stranded oligos are amplified (often via PCR) to create double-stranded DNA fragments. These fragments are then purified and quality-controlled to ensure the correct sequence.

Part B: DNA Assembly (Building the Final Construct)

Fragment Design: You design the ~500 bp fragments so that their ends have short, overlapping sequences (20-40 bp) that are complementary to the adjacent fragment.

Assembly Reaction (e.g., Gibson Assembly): All fragments, along with a linearized vector backbone, are mixed in a single tube with an enzyme master mix containing three activities:

Exonuclease: chews back nucleotides from the 5’ ends of the fragments, creating single-stranded overhangs that allow the complementary overlapping regions to anneal.

DNA Polymerase: fills in any gaps in the annealed regions.

DNA Ligase: seals the nicks in the sugar-phosphate backbone, creating a fully circular plasmid.

Transformation: The assembled plasmid is transformed into competent E. coli cells.

Screening and Verification: Colonies are screened for the correct insert, and the final plasmid is verified by Sanger sequencing to ensure 100% accuracy.
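The fragment-design step in Part B can be sketched as cutting the construct into pieces of at most 500 bp, where each piece starts with the last 30 bp of the previous one. The sizes are taken from the ranges mentioned above. The stand-in construct, the function names, and the in-silico `reassemble` check are illustrative assumptions, not the provider's actual design tool.

```python
def split_with_overlaps(seq, frag_len=500, overlap=30):
    """Cut seq into fragments of at most frag_len that share `overlap` bases."""
    if not 0 < overlap < frag_len:
        raise ValueError("overlap must be positive and shorter than frag_len")
    step = frag_len - overlap
    fragments, start = [], 0
    while True:
        fragments.append(seq[start:start + frag_len])
        if start + frag_len >= len(seq):
            break
        start += step
    return fragments

def reassemble(fragments, overlap=30):
    """Join fragments in silico, checking that each overlap really matches."""
    assembled = fragments[0]
    for frag in fragments[1:]:
        if assembled[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not share the expected overlap")
        assembled += frag[overlap:]
    return assembled

# Stand-in 1,200 bp construct (a repeated toy motif, not a real design).
construct = ("ATGGCATCTATGACTGCTACA" * 60)[:1200]
fragments = split_with_overlaps(construct)

print([len(f) for f in fragments])
print(reassemble(fragments) == construct)
```

A repeated motif like this stand-in would be a poor real design, because identical overlaps let fragments anneal in the wrong order. Real fragment boundaries are chosen so that every overlap is unique.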

Limitations of the Method (Speed, Accuracy, Scalability)

While this hybrid approach is the best available, it has inherent limitations.

| Aspect | Limitation | Explanation |
| --- | --- | --- |
| Speed | Not real-time. | The entire process, from design to receiving a verified plasmid, typically takes 2-4 weeks. This is due to synthesis run times, shipping, assembly, cloning, and final sequencing verification. It is a batch process, not an instantaneous one. |
| Accuracy | Error accumulation in long, complex sequences. | While the synthesis coupling efficiency is high (>99.5% per step), errors (deletions, insertions, substitutions) are inevitable. For a long construct like this one, the probability of having at least one error in the final assembled product is significant. High-GC content, repetitive sequences, and strong secondary structures (like those found in some photosynthetic genes) can further increase error rates. This often necessitates sequencing multiple clones to find a perfect one. |
| Scalability | Assembly becomes a bottleneck. | While silicon-chip synthesis is highly scalable for making millions of oligos, assembling them into many different, large, and complex constructs remains a manual and low-throughput process. Scaling up to make hundreds or thousands of different versions of the Baymax circuit is currently a significant bioengineering challenge. |
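To make the accuracy limitation concrete, here is a back-of-the-envelope sketch. It assumes every coupling step succeeds independently with the same efficiency, so the fraction of full-length product for an n-nt oligo is efficiency^(n-1). This is a simplified model that ignores other error sources such as depurination, and the function name is illustrative.

```python
def full_length_fraction(coupling_efficiency, length_nt):
    """Fraction of molecules that completed every coupling step."""
    # An n-nt oligo needs n - 1 couplings after the first, support-bound base.
    return coupling_efficiency ** (length_nt - 1)

for length in (60, 200, 500):
    fraction = full_length_fraction(0.995, length)
    print(f"{length} nt: {fraction:.1%} full length")
```

Even at 99.5% per step, fewer than one in ten 500-nt molecules is full length under this model, which is why amplification, QC, and sequencing several clones are part of the workflow.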

5.3 DNA Edit

In principle, gene editing has produced many advantageous genes, and it is consistent with the Darwinian principle of “survival of the fittest”, much as genes are silenced or lost over long periods of natural selection. Even so, I feel that natural random mutation shows greater respect for the individual will of living beings than human-directed evolution does. For that reason, I do not like gene editing.

With those ethical reservations acknowledged, I will still work through the technical analysis that the assignment asks for.

If I were to perform DNA edits—specifically to create the photosynthetic, self-healing “Baymax” biomaterial described earlier—I would choose the following technology:

Technology Selection: CRISPR-Cas9

CRISPR-Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats, paired with the CRISPR-associated protein 9 nuclease) is the most suitable technology for this application because it offers precision, flexibility, and efficiency for introducing multiple genes into a target genome.

Why CRISPR-Cas9?

| Requirement | Why CRISPR-Cas9 Fits |
|---|---|
| Multi-gene insertion | Can target multiple loci simultaneously or sequentially |
| Mammalian cell compatibility | Well-established protocols for human cell lines |
| Precision | Can insert genes at specific “safe harbor” loci (e.g., AAVS1 in human cells) |
| Efficiency | High editing rates in many cell types |

How CRISPR-Cas9 Edits DNA: Essential Steps

Mechanism Overview

CRISPR-Cas9 uses a guide RNA (gRNA) to direct the Cas9 nuclease to a specific DNA sequence, where it creates a double-strand break (DSB). The cell’s natural repair mechanisms then introduce edits:

DNA Recognition: The gRNA contains a 20-nucleotide spacer complementary to the target DNA. The target site must lie immediately upstream of a PAM sequence (NGG for the commonly used Streptococcus pyogenes Cas9), which Cas9 requires for binding.

Double-Strand Break: Cas9 cuts both DNA strands, creating a DSB.

DNA Repair: The cell repairs the break via one of two pathways:

Non-Homologous End Joining (NHEJ): Error-prone repair that creates insertions/deletions (indels) to disrupt genes.

Homology-Directed Repair (HDR): Precise repair using a DNA template, allowing gene insertion or correction.
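The DNA recognition rule (a 20-nt spacer followed by an NGG PAM) is simple enough to scan for directly. The sketch below is only a first-pass candidate finder with a hypothetical function name, `find_spcas9_targets`; it does no off-target or efficiency scoring, which is what dedicated tools such as CRISPOR or Benchling add. Positions for "-" strand hits are reported on the reverse-complemented sequence, an assumption made to keep the example short.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_targets(seq: str) -> list[dict]:
    """List every 20-nt spacer that is immediately followed by an NGG PAM.

    Both strands are scanned. `start` is the 0-based spacer position on the
    strand that was scanned (for "-" hits, on the reverse complement).
    """
    seq = seq.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead lets overlapping candidate sites all be reported.
        for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", strand_seq):
            hits.append({
                "strand": strand,
                "start": m.start(),
                "spacer": m.group(1),
                "pam": m.group(2),
            })
    return hits


if __name__ == "__main__":
    demo = "ACGTACGTACGTACGTACGTTGG"  # one spacer plus a TGG PAM
    for hit in find_spcas9_targets(demo):
        print(hit)
```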

Essential Steps for Your Project

Design Phase (Preparation)

Input Required: target genome sequence (e.g., human cell line reference), donor DNA template (containing your photosynthetic genes), and gRNA design tools.

Design Steps:

Select Target Locus: Choose a “safe harbor” site (AAVS1, CCR5, or HPRT) where gene insertion won’t disrupt essential genes.

Design gRNA: Use tools (CRISPOR, Benchling) to select 20-nt sequences adjacent to PAM sites with minimal off-target matches.

Design Donor Template: Create a DNA fragment containing your photosynthetic gene cassette (psbA, rbcL, etc.), left and right homology arms (500-800 bp each) matching the sequences flanking the cut site, and an optional selection marker (e.g., GFP or puromycin resistance).

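The donor template layout described in the design steps (left homology arm, cassette, right homology arm) can be sketched as plain string slicing. This is an illustration under assumptions, not a validated design tool: the function names are hypothetical, the locus is given as a top-strand string, the target site is on the "+" strand, and the cut is placed 3 bp upstream of the PAM, where SpCas9 normally cleaves. The 600 bp default arm length is simply a value inside the 500-800 bp range given above.

```python
def cut_index_for_plus_strand_site(spacer_start: int) -> int:
    """Index of the first base after the cut for a + strand site.

    The 20-nt spacer occupies spacer_start .. spacer_start + 19 and the cut
    falls 3 bp upstream of the PAM, i.e. after the 17th spacer base.
    """
    return spacer_start + 17


def build_donor(locus: str, cut_index: int, cassette: str, arm_length: int = 600) -> str:
    """Return left arm + cassette + right arm for insertion at cut_index."""
    if cut_index - arm_length < 0 or cut_index + arm_length > len(locus):
        raise ValueError("locus sequence is too short for the requested arm length")
    left_arm = locus[cut_index - arm_length:cut_index]
    right_arm = locus[cut_index:cut_index + arm_length]
    return left_arm + cassette + right_arm


if __name__ == "__main__":
    # Toy locus and cassette, with very short arms so the output is readable.
    toy_locus = "A" * 10 + "C" * 10
    print(build_donor(toy_locus, cut_index=10, cassette="GGG", arm_length=5))
```

Because the cassette lands inside the original protospacer, the intact target site is no longer present in a correctly edited allele. A real design would still need to check the arms for repeats and the cassette for size limits.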
Delivery Phase

Input Required: Cas9 protein or mRNA, gRNA (synthetic or expressed from a plasmid), donor DNA template (for HDR), and target cells (e.g., human fibroblasts or induced pluripotent stem cells).

Delivery Methods:

Transfection: Lipofection or electroporation of Cas9-gRNA ribonucleoprotein (RNP) complexes, preferred for efficiency and reduced off-target effects.

Viral Delivery: Lentivirus or AAV for hard-to-transfect cells.

Nucleofection: Electroporation-based method for primary cells.

Editing Phase

Cellular Process: The RNP complex enters the nucleus, and the gRNA guides Cas9 to the target DNA, where Cas9 creates a DSB. If a donor template is present, the cell may use HDR to insert your gene cassette. If no template is present, NHEJ repairs the break instead, typically leaving small indels.

Screening and Validation

PCR Screening: Test for correct integration using primers flanking the insertion site.

Sanger Sequencing: Verify the precise sequence of the edited locus.

Functional Assays: Confirm photosynthetic protein expression and activity.
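For the PCR screen with flanking primers, the expected band sizes follow from simple arithmetic: an unedited allele gives the original amplicon, and a correctly integrated allele gives that amplicon plus the insert length. The sketch below is a hypothetical helper with illustrative names and an arbitrary 50 bp sizing tolerance. It only interprets band sizes, so small NHEJ indels look unedited here, which is why Sanger sequencing is still needed.

```python
def expected_amplicons(wild_type_bp: int, insert_bp: int) -> dict:
    """Expected PCR product sizes for primers flanking the insertion site."""
    return {"unedited": wild_type_bp, "integrated": wild_type_bp + insert_bp}


def call_genotype(bands_bp: list[int], wild_type_bp: int, insert_bp: int,
                  tolerance_bp: int = 50) -> str:
    """Classify a clone from the band sizes seen on the gel."""
    expected = expected_amplicons(wild_type_bp, insert_bp)
    has_unedited = any(abs(b - expected["unedited"]) <= tolerance_bp for b in bands_bp)
    has_integrated = any(abs(b - expected["integrated"]) <= tolerance_bp for b in bands_bp)
    if has_unedited and has_integrated:
        return "heterozygous"
    if has_integrated:
        return "integrated"
    if has_unedited:
        return "unedited"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical sizes: 1200 bp wild-type amplicon, 3000 bp insert.
    print(call_genotype([1200, 4180], wild_type_bp=1200, insert_bp=3000))
```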

This approach, while technically challenging, represents the current state-of-the-art for introducing complex synthetic circuits into human cells. The limitations, particularly low HDR efficiency for large inserts, mean that success would require significant optimization and screening, but the technology exists to make our vision possible.