I’m a synthetic biology researcher with a strong interest in genome engineering, programmable biological systems, and responsible innovation. My background combines wet lab work, computational biology, and project leadership across academic research, iGEM, and early-stage biotech.
I’m currently completing a Master’s in Synthetic Biology, where my thesis focuses on developing an RNA-only gene writing platform based on engineered retroelements (LINE-1 / ORF2p) for precise genome insertion in mammalian cells. I’m particularly interested in how we can design powerful biological tools while embedding safety, controllability, and ethical considerations from the start.
Alongside research, I’m actively involved in building the synthetic biology ecosystem through community building, conferences, and entrepreneurship initiatives. I’m motivated by bridging fundamental biology with real-world applications in human health.
Week 1 HW: Principles and Practices
Class Assignment Week 1 HW
1. First, describe a biological engineering application or tool you want to develop and why.
Programmable LINE-1–based gene insertion for safe, locus-specific genome engineering
The biological engineering tool I am developing as part of my research is a programmable LINE-1 (L1)–based gene insertion system that enables targeted, large-payload integration into mammalian genomes without relying on double-strand breaks. This system builds on the natural target-primed reverse transcription (TPRT) mechanism of LINE-1 retrotransposase, combined with programmable Cas9 nickases to direct integration toward predefined genomic loci.
Current genome engineering methods face trade-offs between payload size, efficiency, and genomic safety. Viral vectors have limited cargo capacity and integration risks, while CRISPR-based homology-directed repair struggles with efficiency and cell-type specificity. A controlled, RNA-mediated L1 platform could enable safer gene insertion for cell therapy, synthetic circuits, and long-term recording applications, particularly in post-mitotic or hard-to-edit cells.
However, because L1 elements are naturally mobile, mutagenic, and evolutionarily active, engineering them raises legitimate concerns around genomic instability, misuse, and unintended propagation. This makes the technology a strong candidate for proactive governance design alongside technical development.
2. Governance and Policy Goals
Primary goal: Enable beneficial use while preventing harm
To ensure that engineered LINE-1–based genome writing technologies contribute to an ethical future, I focus on three overarching governance goals. Each goal is broken down into concrete, actionable sub-goals.
Goal 1: Prevent harmful or uncontrolled genomic integration
The technology should minimize biological risks associated with unintended genome modification.
Sub-goal 1.1: Reduce off-target insertions and genomic instability
Sub-goal 1.2: Prevent autonomous, self-sustaining, or uncontrolled retrotransposition activity
Goal 2: Reduce misuse or repurposing for harmful applications
Safeguards should limit the potential for misuse while maintaining accountability.
Sub-goal 2.1: Limit the use of engineered LINE-1 systems to clearly defined research, therapeutic, or diagnostic contexts
Sub-goal 2.2: Ensure traceability and responsibility for engineered constructs and their downstream use
Goal 3: Preserve accessibility for constructive applications
Governance should support innovation rather than restrict legitimate research.
Sub-goal 3.1: Encourage use within regulated therapeutic, academic, and diagnostic settings
Sub-goal 3.2: Avoid governance frameworks that unnecessarily block basic research, creativity, or responsible innovation
3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
Below, I outline three complementary governance actions that involve technical, institutional, and regulatory actors. Each action is described in terms of its Purpose, Design, Assumptions, and Risks of Failure or Success.
Option 1: Technical containment in the design
Purpose
Currently, much of the safety of engineered retroelements relies on best practices and trust. I propose embedding molecular containment features directly into L1-based systems to reduce the risk of uncontrolled genome modification.
Design
Technical safeguards could include:
Splitting ORF2p into components that only function together
Making retrotransposition activity dependent on exogenous factors
Encoding sequence barcodes or self-limiting elements (e.g. kill switches)
Actors involved
Companies developing clinical or industrial applications
Assumptions
Molecular safeguards remain stable over time
Researchers implement safeguards faithfully
Risks of failure or “success”
Failure: Safeguards could fail or be bypassed, particularly if there is no selective pressure to maintain them
Success risk: Effective containment may create false confidence and reduce oversight
Option 2: Tiered access and licensing for engineered retrotransposon systems
Purpose
At present, access to genome engineering tools is largely unrestricted once published. I propose a tiered access model, similar to approaches used in pathogen research or dual-use chemicals.
Design
Open publication of concepts and safety data
Controlled access to full constructs via material transfer agreements (MTAs)
Licensing tied to institutional biosafety approval and defined project scope
Actors involved
Universities and technology transfer offices
Funding bodies
Journals and data repositories
Assumptions
Access restrictions meaningfully reduce misuse
Institutions enforce licensing and access rules consistently
Risks of failure or “success”
Failure: Informal or “black-market” sharing persists
Success risk: May disadvantage low-resource laboratories or institutions
Option 3: Traceability and post-use monitoring
Purpose
Most governance efforts focus on prevention rather than what happens after deployment. I propose post-use monitoring to detect unintended spread or misuse of engineered L1 systems.
Design
Standardized sequence barcodes embedded in engineered L1 constructs
Genomic monitoring in clinical or industrial settings
Shared reporting standards for insertion sites and activity
Actors involved
Clinical developers
Regulatory agencies
Standards organizations
Assumptions
Barcodes remain stable and detectable over time
Monitoring data are shared transparently
Risks of failure or “success”
Failure: Monitoring may be incomplete or inconsistently applied
Success risk: Privacy, consent, or data governance concerns in clinical contexts
4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.
| Policy goal / Criterion | Option 1: Technical containment | Option 2: Tiered access & licensing | Option 3: Traceability & monitoring |
|---|---|---|---|
| Enhance biosecurity | 1 | 2 | 2 |
| • Prevent misuse or unintended spread | 1 | 2 | 2 |
| • Enable response to incidents | 2 | 2 | 1 |
| Foster laboratory safety | 1 | 2 | 2 |
| • Prevent laboratory incidents | 1 | 2 | 2 |
| • Support post-incident investigation | 2 | 2 | 1 |
| Protect the environment | 1 | 2 | 2 |
| • Prevent unintended persistence or spread | 1 | 2 | 2 |
| • Detect environmental release | 3 | 3 | 1 |
| Minimizing costs & burden to stakeholders | 1 | 3 | 2 |
| Feasibility & scalability | 2 | 2 | 2 |
| Does not impede legitimate research | 1 | 3 | 2 |
| Promotes constructive applications | 1 | 2 | 1 |
5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.
Based on this analysis, I would prioritize Option 1 (technical containment) as the basic governance strategy, complemented by Option 3 (traceability and post-use monitoring) in higher-risk or clinical contexts.
I prioritize technical containment because it integrates safety directly into the biological system, reducing reliance on user behaviour or downstream oversight. This approach fits well with how SynBio is actually practiced in academic labs, where most control exists at the design stage. It is also scalable and low-burden, allowing safety features to propagate naturally as tools are shared and reused.
Because containment alone is not sufficient, Option 3 addresses what happens after deployment by enabling detection and accountability if unintended outcomes occur. This is particularly important in clinical or industrial contexts, where engineered systems may persist long-term. The main trade-off is added complexity and potential privacy concerns, but these can be managed with clear standards and limited monitoring scopes.
I would apply Option 2 (tiered access and licensing) more selectively. While appropriate for especially powerful systems, broad restrictions risk slowing basic research and disadvantaging smaller labs, and may not effectively prevent misuse.
Overall, this recommendation assumes that molecular safeguards remain stable over time. It is primarily aimed at academic labs, funding agencies, and the broader SynBio community, where early incentives and expectations can shape responsible development without impeding innovation.
Personal Reflection
One thing that stood out to me this week is how easy it is for governance to become either too abstract or too restrictive. As someone working directly on developing genome engineering tools, I feel the tension between wanting freedom to innovate and recognizing that some tools need extra regulation.
A governance action I’d add is encouraging or requiring short ethical impact statements alongside new genome writing tools, especially at the funding or publication stage, as a way to slow down and think.
Assignment (Week 2 Lecture Prep)
Professor Jacobson
1. Error rate of DNA polymerase, genome size, and how biology deals with it
The intrinsic error rate of DNA polymerase is approximately 10⁻⁶ errors per nucleotide incorporated. With proofreading activity (3’→5’ exonuclease), this improves to around 10⁻⁷, and after post-replicative mismatch repair, the final error rate is reduced to roughly 10⁻⁹–10⁻¹⁰ per base per replication. (Source: https://www.nature.com/articles/cr20084)
The human genome is approximately 3 × 10⁹ base pairs, meaning that without error correction, thousands of mutations would occur every time a cell divides. Biology resolves this through multiple layers of quality control:
Polymerase proofreading during replication
Mismatch repair systems acting after replication
Cell-cycle checkpoints and apoptosis to eliminate heavily damaged cells
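To make the scale concrete, the short sketch below multiplies the per-base error rates quoted above by the genome size. It is a back-of-the-envelope calculation that assumes one haploid genome copy (3 × 10⁹ bp) per replication and a uniform error rate across the genome; the variable and function names are illustrative only.

```python
# Expected replication errors per genome copy at each layer of quality control.
# Rates are the approximate per-base figures quoted in the text above.
GENOME_SIZE_BP = 3e9  # approximate haploid human genome (assumption: one copy)

error_rates = {
    "polymerase alone": 1e-6,
    "with proofreading": 1e-7,
    "with mismatch repair (upper)": 1e-9,
    "with mismatch repair (lower)": 1e-10,
}


def expected_errors(rate_per_base, genome_size=GENOME_SIZE_BP):
    """Expected number of errors in one pass over the genome."""
    return rate_per_base * genome_size


for label, rate in error_rates.items():
    print(f"{label:30s} ~{expected_errors(rate):g} errors per replication")
```

Under these assumptions the expected count falls from about 3000 errors per genome copy to somewhere between 0.3 and 3.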
2. How many different DNA codes can specify an average human protein, and why most don’t work in practice
An average human protein is approximately 350 amino acids long. Because many amino acids are encoded by multiple codons, there are many DNA sequences that could encode the same protein sequence.
In practice, however, most of these sequences do not work well. Codon choice affects translation efficiency, mRNA folding and stability, ribosome pausing, and co-translational protein folding. Some sequences can create splice sites, polyadenylation signals, or regulatory motifs that disrupt expression. Although the genetic code is redundant, biological constraints strongly limit which DNA sequences are functionally viable.
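A rough way to put a number on "many" is to multiply the count of synonymous codons at each position. The sketch below does this with the standard genetic code; for the 350-residue estimate it assumes a uniform amino-acid composition, which real proteins do not have, so the result is an order-of-magnitude figure only.

```python
import math
from collections import Counter

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of synonymous codons per amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def log10_coding_sequences(protein):
    """log10 of the number of distinct DNA sequences encoding `protein`."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


# Assumption: uniform composition, so use the mean log10 degeneracy per residue.
mean_log10 = sum(math.log10(n) for n in DEGENERACY.values()) / len(DEGENERACY)
estimate = 350 * mean_log10
print(f"~10^{estimate:.0f} synonymous DNA sequences for a 350-residue protein")
```

With these assumptions the estimate is on the order of 10^149 possible coding sequences, of which only a small fraction would be expected to express well.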
Dr. LeProust
3. Most commonly used method for oligo synthesis
The most widely used method for oligonucleotide synthesis is solid-phase phosphoramidite DNA synthesis, in which DNA is built one nucleotide at a time on a solid support through sequential chemical coupling steps.
4. Why oligos longer than ~200 nt are difficult to make
Each chemical coupling step has a small probability of failure. As oligos increase in length, these small errors accumulate. Beyond approximately 200 nucleotides, the fraction of molecules that are full-length and error-free drops sharply, making synthesis inefficient and error-prone.
5. Why a 2000 bp gene cannot be synthesized directly
Direct synthesis of a 2000 bp gene would require thousands of consecutive chemical reactions. The cumulative error rate would be so high that almost none of the molecules would be correct. Instead, long genes are typically assembled enzymatically from shorter oligos using methods such as PCR-based assembly or Gibson assembly.
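The length limit in questions 4 and 5 follows from compounding the per-step coupling efficiency. The sketch below assumes an illustrative coupling efficiency of 99% (and 99.5% for comparison); these values are my assumptions, not measured figures, and the model ignores other error sources such as depurination.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of molecules in which every coupling step succeeded.

    An oligo of n nucleotides needs n - 1 couplings after the first,
    support-bound base.
    """
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 200, 2000):
    for efficiency in (0.99, 0.995):  # assumed per-step efficiencies
        fraction = full_length_fraction(length, efficiency)
        print(f"{length:5d} nt at {efficiency:.1%} per step: {fraction:.3g} full length")
```

At the assumed 99% efficiency, roughly 13 to 14% of 200-mers are full length, while for a 2000-mer the fraction drops to the order of 10⁻⁹, which is why long genes are assembled from shorter oligos instead.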
George Church (chosen question)
What are the 10 essential amino acids in animals, and how does this affect the “lysine contingency”?
The ten essential amino acids in animals are:
Histidine
Isoleucine
Leucine
Lysine
Methionine
Phenylalanine
Threonine
Tryptophan
Valine
Arginine
Animals cannot synthesize these amino acids de novo and must obtain them through their diet.
In Jurassic Park, the lysine contingency is presented as a biological fail-safe: the engineered dinosaurs are unable to synthesize lysine and are therefore assumed to be unable to survive outside a controlled environment where lysine is supplemented. However, in my opinion this logic does not hold up biologically. All animals, including mammals and birds, are naturally lysine-auxotrophic and survive perfectly well by acquiring lysine through their diet. Herbivores obtain lysine from plants or plant-associated microbes, and carnivores obtain it by consuming herbivores.
Given this, lysine auxotrophy does not constitute a meaningful biocontainment strategy. Rather than creating a biological dependency, it simply makes the dinosaurs metabolically equivalent to normal animals. In this sense, the lysine contingency treats a normal dietary limitation as if it were a true safety mechanism, when in reality it offers no real containment. (Source: https://jurassicpark.fandom.com/wiki/Lysine_contingency)
Week 2 HW: DNA Read, Write, & Edit
Part 1: Benchling & In-silico Gel Art
For this exercise, I imported the lambda DNA reference sequence on Benchling, and performed in-silico restriction digests using the required enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI.
Using Benchling’s digestion simulation tool, I iteratively tested different enzyme combinations to generate fragment patterns with distinct size distributions.
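The same kind of digest can be approximated outside Benchling. The sketch below is my own minimal version, not Benchling's algorithm: it treats the sequence as linear, ignores methylation sensitivity and star activity, and uses a short toy sequence rather than the lambda genome.

```python
import re

# Recognition sequence and top-strand cut offset for each required enzyme.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def digest(sequence, enzyme_names):
    """Return fragment lengths (bp), in order, from cutting a linear sequence."""
    sequence = sequence.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        # All sites are palindromic, so scanning one strand is enough.
        # The lookahead ensures overlapping matches are not skipped.
        for match in re.finditer(f"(?={site})", sequence):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy sequence with one EcoRI and one BamHI site (not lambda DNA).
toy = "A" * 10 + "GAATTC" + "C" * 20 + "GGATCC" + "T" * 5
print(digest(toy, ["EcoRI", "BamHI"]))  # [11, 26, 10]
```

Sorting the returned lengths in descending order gives the band order expected on a gel, from the top of the lane to the bottom.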
I attach below the image of the resulting gel electrophoresis:
Part 3: DNA Design Challenge
3.1 Choose your protein
For this assignment, I selected the human LINE-1 retrotransposable element ORF2 protein (ORF2p) (UniProt: O00370). ORF2p is the catalytic core of LINE-1 retrotransposition and contains both an endonuclease domain and a reverse transcriptase domain. These enzymatic activities enable target-primed reverse transcription and genomic integration, making ORF2p central to LINE-1 mobility in the human genome.
I chose this protein because my research focuses on engineering LINE-1–based gene writing systems. ORF2p is the key effector molecule that determines integration efficiency, specificity, and safety. Understanding its sequence, structure, and functional constraints is therefore essential for developing programmable genome engineering tools in mammalian systems.
The full-length ORF2p protein is approximately 1275 amino acids in length.
3.2 Reverse Translation
To determine a nucleotide sequence corresponding to the ORF2p amino acid sequence, I used Benchling’s reverse translation tool guided by human codon usage preferences. Because the genetic code is degenerate, multiple codons can encode the same amino acid, meaning that many distinct DNA sequences can theoretically encode ORF2p.
Reverse translation produces one valid open reading frame that corresponds to the protein sequence. Given the length of ORF2p (~1275 amino acids), the coding DNA sequence is approximately 3.8 kilobases in length. At this stage, the sequence is functionally correct but not yet optimized for efficient expression in a specific host system.
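As an illustration of the principle, and not of how Benchling implements it, reverse translation can be sketched as a lookup that picks one codon per amino acid. The codon choices below are assumed to be frequently used human codons and should be checked against a codon-usage table; the peptide is an arbitrary toy sequence, not ORF2p.

```python
# One codon per amino acid. Assumption: these are frequently used human codons;
# they are illustrative choices, not taken from Benchling.
PREFERRED_CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def reverse_translate(protein, stop_codon="TAA"):
    """Return one possible coding sequence for `protein`, ending in a stop codon."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper()) + stop_codon


toy_peptide = "MKWVTFISLL"  # arbitrary example peptide
toy_cds = reverse_translate(toy_peptide)
print(toy_cds, len(toy_cds))

# Coding length for a 1275-residue protein, including the stop codon.
orf2_cds_length = 3 * 1275 + 3
print(orf2_cds_length)  # 3828
```

The last line reproduces the ~3.8 kb figure: 1275 codons plus one stop codon gives 3828 nucleotides.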
3.3 Codon Optimization
Although multiple DNA sequences can encode the same protein, not all synonymous codons are used equally across organisms. Codon optimization is therefore necessary to improve expression efficiency in a chosen host.
I optimized the ORF2 coding sequence for expression in human cells, specifically HEK293T cells, which are commonly used in LINE-1 retrotransposition assays. Codon optimization enhances translation efficiency, improves mRNA stability, and reduces the likelihood of ribosomal pausing caused by rare codons. It also allows removal of undesirable sequence features such as cryptic splice sites, extreme GC content, internal polyadenylation signals, or problematic restriction enzyme recognition sites.
In addition, optimization can eliminate Type IIS restriction enzyme sites (such as BsaI or BsmBI), facilitating modular cloning strategies and downstream synthetic assembly.
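Some of these checks are easy to automate after optimization. The sketch below scans a sequence for BsaI and BsmBI recognition sites on either strand and reports GC content; the helper names and the toy sequence are hypothetical, and a real pipeline would add checks for splice sites and polyadenylation signals.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Type IIS recognition sequences (5'->3' on the top strand).
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}


def find_sites(seq, sites=TYPE_IIS_SITES):
    """Return {enzyme: sorted 0-based start positions} across both strands."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        # These sites are not palindromic, so also search the reverse complement.
        for pattern in {site, reverse_complement(site)}:
            start = seq.find(pattern)
            while start != -1:
                positions.append(start)
                start = seq.find(pattern, start + 1)
        hits[name] = sorted(positions)
    return hits


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy sequence containing one BsaI site and one BsmBI site (bottom strand).
toy_cds = "ATGGGTCTCAAAGAGACGTTTTAA"
print(find_sites(toy_cds))  # {'BsaI': [3], 'BsmBI': [12]}
print(gc_content(toy_cds))  # 0.375
```

Any hit would then be removed by swapping in a synonymous codon at that position and re-running the scan.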
3.4 From DNA to Protein
Once synthesized, the codon-optimized ORF2 sequence can be expressed using either cell-dependent or cell-free systems.
In a cell-dependent system, the ORF2 coding sequence would be cloned into a mammalian expression plasmid under a strong promoter, such as the CMV promoter. Following transfection into HEK293T cells, the host RNA polymerase II transcribes the DNA into mRNA. The mRNA is then exported to the cytoplasm, where ribosomes translate it into ORF2 protein. The protein folds and, when expressed in the context of a full LINE-1 element, participates in retrotransposition via target-primed reverse transcription.
Alternatively, the optimized ORF2 sequence could be expressed in a cell-free transcription–translation system. In this case, the DNA template is transcribed in vitro into mRNA, which is subsequently translated into protein in a controlled biochemical environment. This approach enables direct characterization of ORF2 enzymatic activity, such as reverse transcriptase function, without the complexity of genomic integration.
Together, these approaches illustrate how a protein sequence can be reverse translated, optimized, synthesized, and ultimately expressed for functional investigation.
Part 4: Prepare a Twist DNA Synthesis Order
For this exercise, I designed a bacterial expression construct to express the human LINE-1 ORF2 protein in E. coli.
Insert Design (Benchling)
In Benchling, I created a linear DNA insert containing:
Constitutive promoter (BBa_J23106)
RBS (BBa_B0034)
Start codon (ATG)
Codon-optimized ORF2 coding sequence
C-terminal 7×His tag
Stop codon (TAA)
Double terminator (BBa_B0015)
Each region was annotated, and the final sequence was exported as a FASTA file.
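Before exporting, it is worth confirming that the start codon, ORF, His tag, and stop codon are all in one reading frame. The sketch below is a minimal check under stated assumptions: the ORF is a three-codon placeholder rather than the real ORF2 sequence, and the tag uses a single histidine codon (CAC) repeated seven times, which may differ from the codons chosen in the actual design.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

HIS_TAG = "CAC" * 7  # 7xHis; assumption: one histidine codon used throughout


def build_cds(orf_without_start_or_stop):
    """Assemble start codon + ORF + C-terminal His tag + stop codon."""
    return "ATG" + orf_without_start_or_stop + HIS_TAG + "TAA"


def translate(cds):
    """Translate complete codons only; trailing bases are ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def check_cds(cds):
    """Return a list of problems found in a coding sequence (empty if none)."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    protein = translate(cds)
    if not protein.startswith("M"):
        problems.append("does not start with ATG")
    if not protein.endswith("*"):
        problems.append("does not end with a stop codon")
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    return problems


placeholder_orf = "AAGTGGGTG"  # hypothetical stand-in for the ORF2 sequence
cds = build_cds(placeholder_orf)
print(translate(cds))  # MKWVHHHHHHH*
print(check_cds(cds))  # []
```

An empty list means the tag and stop codon are in frame with the start codon; a frameshift in the ORF would show up as a length or stop-codon problem.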
Twist Order
On Twist, I selected:
Genes → Clonal Genes
I uploaded the nucleotide sequence and chose the pTwist Amp High Copy backbone, which provides:
Ampicillin resistance
High-copy origin of replication
This results in a circular plasmid ready for transformation into E. coli for ORF2 expression and purification.
Final Construct
The final plasmid contains the ORF2 expression cassette cloned into the pTwist Amp High Copy vector.
Plasmid map attached below:
5.1 DNA Read
(i) What DNA would you want to sequence and why?
I would aim to sequence LINE-1 (L1) retrotransposon insertions in engineered human cell lines expressing synthetic L1-based gene writing systems. The objective would be to map insertion sites genome-wide, quantify insertion frequency, and detect potential off-target integrations or structural rearrangements. Because L1 retrotransposition can generate full-length insertions, 5′ truncations, and genomic rearrangements, detailed genomic characterization is essential to assess both efficiency and safety. Understanding the integration landscape at high resolution would provide insights into insertion bias, genomic context preferences, and risks associated with programmable genome writing systems.
(ii) What technology would you use and why?
To achieve this, I would use a hybrid sequencing strategy combining third-generation long-read sequencing (Oxford Nanopore Technologies) with second-generation short-read sequencing (Illumina). Nanopore sequencing is particularly suitable for detecting large insertions and structural variants because it generates long reads capable of spanning entire retrotransposon insertions together with their flanking genomic regions. Illumina sequencing would complement this approach by providing high base-calling accuracy and reliable validation of insertion breakpoints and small variants.
For Nanopore sequencing, the input would consist of high molecular weight genomic DNA extracted from HEK293T cells transfected with L1 constructs. Preparation includes DNA extraction preserving long fragments, optional size selection, end repair and dA-tailing, adapter ligation, and loading onto a flow cell. As DNA passes through a nanopore, changes in ionic current are measured and decoded by neural network-based base-calling algorithms.
For Illumina sequencing, genomic DNA would be fragmented, end-repaired, ligated to adapters, PCR amplified, and loaded onto a flow cell for cluster generation. Sequencing-by-synthesis uses fluorescently labeled reversible terminator nucleotides, and fluorescence signals are recorded at each cycle to determine the incorporated base.
The output includes FASTQ files containing base calls and quality scores, alignment files (SAM/BAM), and downstream analyses identifying insertion sites and structural variants.
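For reference, a FASTQ record is four lines: a header, the bases, a separator, and one quality character per base. The sketch below is a minimal reader that assumes unwrapped four-line records and Phred+33 encoding, the convention used by current Illumina and Nanopore output; the function name and example read are illustrative.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        quality = handle.readline().strip()
        # Phred+33 encoding: quality score = ASCII code - 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


example = io.StringIO("@read1\nGATTACA\n+\nIIIIII5\n")
for read_id, seq, scores in read_fastq(example):
    mean_q = sum(scores) / len(scores)
    print(read_id, seq, scores, f"mean Q = {mean_q:.1f}")
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 is a 1% and Q40 a 0.01% chance that the base call is wrong.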
5.2 DNA Write
(i) What DNA would you want to synthesize and why?
I would synthesize a full-length LINE-1 retrotransposition reporter construct designed to compare the activity of species-specific ORF2 sequences (for example, primate versus whale variants) in HEK293T cells. The construct would contain the canonical L1 architecture, including the 5′ UTR with its promoter, ORF1, and a species-specific ORF2 encoding the endonuclease and reverse transcriptase domains, followed by the 3′ UTR.
Within the 3′ UTR, in antisense orientation, I would incorporate an eGFP-based retrotransposition cassette disrupted by a γ-globin intron under the control of a CMV promoter. In this system, GFP expression occurs only if the L1 transcript is correctly spliced, reverse transcribed, and integrated into chromosomal DNA, allowing quantitative measurement of retrotransposition events. The plasmid backbone would include bacterial propagation elements and a puromycin resistance marker for mammalian cell selection.
The goal of synthesizing this construct is to systematically compare retrotransposition efficiency across different ORF2 variants while keeping the reporter system constant, isolating the contribution of ORF2 sequence variation.
(ii) What technology would you use and why?
Given the length and repetitive nature of the L1 construct, I would use solid-phase phosphoramidite oligonucleotide synthesis combined with enzymatic DNA assembly methods such as Gibson Assembly or Golden Gate cloning. Because multi-kilobase constructs cannot be synthesized directly due to cumulative error rates, the sequence would be divided into smaller fragments for chemical synthesis. These fragments would be assembled enzymatically into a plasmid backbone, transformed into bacteria for amplification, sequence-verified by full plasmid sequencing, purified, and transfected into HEK293T cells.
Limitations of this approach include increased error rates with fragment length, instability of repetitive elements in bacterial hosts, assembly challenges in highly repetitive regions, and increasing cost with construct size and complexity.
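The fragmentation step described above can be sketched as follows. This is a simplified illustration for overlap-based assembly such as Gibson: the fragment size (1500 bp) and overlap (30 bp) are hypothetical defaults, and the function does not check whether an overlap falls in a repetitive region, which would matter for an L1 construct.

```python
def split_with_overlaps(sequence, max_fragment=1500, overlap=30):
    """Split `sequence` into fragments that each share `overlap` bases with the next."""
    if overlap >= max_fragment:
        raise ValueError("overlap must be shorter than max_fragment")
    fragments = []
    start = 0
    while True:
        end = min(start + max_fragment, len(sequence))
        fragments.append(sequence[start:end])
        if end == len(sequence):
            break
        # The next fragment starts inside the current one to create the overlap.
        start = end - overlap
    return fragments


def reassemble(fragments, overlap=30):
    """Rejoin fragments by dropping the duplicated overlap from each later fragment."""
    return fragments[0] + "".join(frag[overlap:] for frag in fragments[1:])


construct = "ACGTTGCA" * 500  # 4000 bp placeholder sequence
fragments = split_with_overlaps(construct)
print([len(frag) for frag in fragments])  # [1500, 1500, 1060]
```

In practice the junction positions would be adjusted by hand or by a design tool so that each overlap is unique within the construct.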
5.3 DNA Edit
(i) What DNA would you want to edit and why?
I would edit both engineered LINE-1 ORF2 sequences and specific genomic loci in human cells to investigate determinants of retrotransposition efficiency and integration specificity. At the construct level, I would introduce targeted mutations into ORF2, including alterations within the endonuclease and reverse transcriptase domains or reconstructed ancestral variants, to identify residues and motifs that modulate activity.
At the genomic level, I would edit defined safe-harbor loci, such as AAVS1-like regions, to introduce landing pads or sequence contexts that might bias L1 integration. This would allow controlled testing of whether retrotransposition can be redirected or made more predictable, contributing to the development of programmable genome writing systems.
(ii) What technology would you use and why?
I would primarily use CRISPR-based genome editing technologies. For precise insertions or locus engineering, I would use CRISPR-Cas9 combined with homology-directed repair (HDR). For subtle sequence modifications within ORF2, base editing or prime editing would be preferable due to their ability to introduce targeted changes without generating double-strand breaks.
In the CRISPR-Cas9 approach, a guide RNA is designed to target a specific genomic locus. Cas9 is delivered together with the guide RNA, creating a double-strand break. If a donor DNA template is provided, the cell repairs the break via HDR, incorporating the desired sequence. Base editors or prime editors use modified Cas9 variants fused to enzymatic domains that directly convert or rewrite specific nucleotides.
Preparation includes guide RNA design with off-target analysis, donor template design when required, preparation of CRISPR components, transfection into HEK293T cells, and validation using PCR and sequencing.
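The first step of guide RNA design, enumerating candidate target sites, can be sketched in a few lines. The example assumes SpCas9 with an NGG PAM and a 20-nt protospacer immediately 5′ of the PAM; it lists candidates only, with no off-target scoring or efficiency prediction, and the target sequence is an arbitrary toy example.

```python
import re


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_guides(target, guide_len=20):
    """List (strand, protospacer, PAM) for every NGG PAM on either strand."""
    target = target.upper()
    guides = []
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        # The lookahead keeps overlapping candidates that a plain search would skip.
        pattern = rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))"
        for match in re.finditer(pattern, seq):
            guides.append((strand, match.group(1), match.group(2)))
    return guides


# Toy target: 4 nt of flank, a 20-nt protospacer, a TGG PAM, 4 nt of flank.
toy_target = "TTTT" + "GACCTGAAGCTCATTCGTAA" + "TGG" + "AAAA"
for strand, protospacer, pam in find_spcas9_guides(toy_target):
    print(strand, protospacer, pam)
```

Each candidate would then be screened against the genome for near-matches before ordering, which is the off-target analysis mentioned above.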
Limitations of CRISPR-based editing include variable HDR efficiency, potential off-target effects, locus-dependent variability, and technical challenges in achieving high efficiency for large insertions. Despite these constraints, CRISPR technologies offer the most versatile and precise framework for engineering and interrogating retrotransposon-based genome systems.