Zeynep Begüm Kara — HTGAA Spring 2026

About me
computer science master’s student in intelligent user interfaces: enthusiast of intelligent and delightfully non-intelligent things. ((still figuring out why (almost) EVERYTHING excites me))

Week 1 HW: Principles and Practices
Class assignment 1. First, describe a biological engineering application or tool you want to develop and why. My brother has autism, which is why this area is personal for me. As a CS/AI master’s student, I find it exciting that I can use AI protein design tools like AlphaFold to work on something that actually matters to me and my family.
Week 2 HW: DNA Read, Write and Edit
Part 1: Benchling & In-silico Gel Art For this exercise, the full genome of Bacteriophage Lambda (GenBank accession J02459.1, 48,502 bp) was imported into Benchling from NCBI. A virtual restriction enzyme digestion was performed using seven enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI. Each enzyme was applied individually to identify its recognition sites across the Lambda genome. The digest results were visualized using Benchling’s simulated gel electrophoresis tool. To get a nice visual, I tried different ladders and different lane orderings. The final output, showing each enzyme’s fragment pattern, is below.
Part 1: Python Script Opentrons Artwork
Opentrons Colab for Source Code
My Jelly Smiley Design
Part 2: Post-Lab Questions
Question 1: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
My brother has autism, which is why this area is personal for me. As a CS/AI master’s student, I find it exciting that I can use AI protein design tools like AlphaFold to work on something that actually matters to me and my family.
A recent study by Trudler et al. (2024) at Scripps Research used patient-derived brain organoids (“mini-brains”) to show that mutations in the MEF2C gene — responsible for a severe form of ASD — disrupt the expression of specific microRNAs (miR-9, miR-124, miR-128). These miRNAs normally guide developing brain cells to become the right type of neuron; when they are dysregulated, the balance between excitatory and inhibitory neurons is lost, leading to hyperexcitability associated with autism.
Reference: Trudler, D. et al. “Dysregulation of miRNA expression and excitation in MEF2C autism patient hiPSC-neurons and cerebral organoids.” Molecular Psychiatry 30, 1479–1496 (2025). DOI: 10.1038/s41380-024-02761-9
I want to use AI protein design tools (AlphaFold, ESMFold, Rosetta — covered in HTGAA Weeks 4-5) to design a small protein or peptide that can bind to and modulate these autism-associated miRNAs. The designed protein would be expressed using cell-free synthesis (Week 9) and characterized with mass spectrometry (Week 10). The gene would be ordered from Twist Bioscience. I want to develop this because it sits right at the intersection of my CS/AI background and the wet lab techniques taught in HTGAA, and because AI-designed proteins targeting neurodevelopment could open up new research directions for autism.
My primary goal is to make sure AI-designed proteins targeting neurodevelopment are safe, responsibly used, and beneficial to the autism community. The sub-goals are:
AI-designed proteins could be of great benefit in understanding and eventually treating neurodevelopmental conditions, but they could also be misused or cause harm if not properly governed.
Action 1: Mandatory biosafety screening for AI-designed proteins at DNA synthesis providers.
Action 2: Institutional ethics review for AI-designed therapeutics targeting neurodevelopment.
Action 3: Open sharing of AI protein design methods and results through conferences and preprints.
| Does the action: | Action 1: Synthesis Screening | Action 2: Ethics Review | Action 3: Open Sharing |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 1 | 3 | 1 |
| Foster Lab Safety | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 2 | 1 |
| Protect the environment | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 2 | 1 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 3 | 2 | 1 |
| • Feasibility | 2 | 1 | 1 |
| • Not impeding research | 3 | 2 | 1 |
| • Promoting constructive applications | 2 | 2 | 1 |
I think a combination of Action 2 (ethics review) and Action 3 (open sharing) would work best as immediate steps, with Action 1 (synthesis screening) as a longer-term goal.
Action 1 scores best on biosecurity and safety (1 is the best score in the table), but it is the most burdensome and could slow down student research. I would not want my own Twist order for this course to get delayed or rejected by an overly aggressive screening algorithm. This is better pursued as a long-term industry-wide initiative rather than something individual institutions can do on their own.
Action 2 is the most feasible to implement right now. MIT already has IBC infrastructure, and adding a lightweight neurodevelopment checklist requires no new funding or legislation. Action 3 is also easy to start and promotes the kind of interdisciplinary collaboration that makes AI protein design safer through community oversight.
This analysis rests on several assumptions, mainly that AI-designed proteins pose meaningfully different risks from traditionally designed ones, which may not be true yet but will likely become more relevant as the tools improve. There is also a tension between accessibility and oversight. As a CS student entering biology for the first time through HTGAA, I benefit a lot from open tools and low barriers. Over-regulation could discourage exactly the kind of interdisciplinary work this course promotes. But as someone with a family member with autism, I also understand why this community can be wary of researchers who study autism without engaging with autistic people. These uncertainties can be mitigated by keeping review processes lightweight and making sure they include input from the autism community itself.
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
DNA polymerase has a raw error rate of about 10⁻⁴ to 10⁻⁶ per base pair during nucleotide insertion. With the built-in 3’→5’ proofreading exonuclease, accuracy improves to around 10⁻⁷ to 10⁻⁸. The human genome is roughly 3.2 × 10⁹ base pairs (about 6.3 billion for the diploid genome), so even at that rate there would still be roughly 30 to 300 errors per copy of the haploid genome. Biology solves this through mismatch repair mechanisms (such as MutS) that catch errors proofreading missed, bringing the final error rate down to about 1 per 10⁹ to 10¹⁰ nucleotides, which is low enough to reliably copy a genome this large.
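The comparison between error rate and genome size is a single multiplication, so it can be checked quickly. The sketch below is only a back-of-the-envelope calculation using the order-of-magnitude rates quoted in the answer; the stage labels and the function name `expected_errors` are my own, not from the course material.

```python
# Expected replication errors per genome copy = per-base error rate x genome length.
HAPLOID_GENOME_BP = 3.2e9  # approximate human haploid genome size quoted in the text

# Order-of-magnitude per-base error rates from the answer above (not measurements).
ERROR_RATES = {
    "polymerase alone (upper bound)": 1e-4,
    "polymerase alone (lower bound)": 1e-6,
    "with proofreading (upper bound)": 1e-7,
    "with proofreading (lower bound)": 1e-8,
    "with mismatch repair (upper bound)": 1e-9,
    "with mismatch repair (lower bound)": 1e-10,
}


def expected_errors(rate_per_base: float, genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Return the expected number of errors in one copy of the genome."""
    return rate_per_base * genome_bp


if __name__ == "__main__":
    for stage, rate in ERROR_RATES.items():
        print(f"{stage:36s} ~{expected_errors(rate):,.2f} errors per copy")
```

Only the last stage brings the expected count down to around one error per copy or fewer, which is the point of the layered repair systems.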
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
The genetic code is degenerate: of the 64 codons, 61 encode the 20 amino acids and 3 are stop signals. The slides mention the average human protein is about 1036 base pairs, which is roughly 345 codons / 345 amino acids. With about 3 possible codons per amino acid on average, that gives roughly 3³⁴⁵ ≈ 10¹⁶⁵ possible DNA sequences for one protein, which is a huge number.
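The size of this number is easier to handle on a log scale. The sketch below computes the rough estimate (a fixed number of codons per residue) and, for comparison, an exact count for a specific peptide using the codon degeneracy of the standard genetic code. The function names and the short example peptide are hypothetical, chosen only for illustration.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def log10_rough_estimate(n_residues: int, codons_per_residue: float = 3.0) -> float:
    """log10 of codons_per_residue ** n_residues (the rough estimate)."""
    return n_residues * math.log10(codons_per_residue)


def log10_exact_count(protein: str) -> float:
    """log10 of the exact number of DNA sequences encoding a given protein."""
    return sum(math.log10(CODONS_PER_AA[aa]) for aa in protein.upper())


if __name__ == "__main__":
    print(f"Rough estimate for 345 residues: 10^{log10_rough_estimate(345):.1f}")
    peptide = "MKWL"  # hypothetical example peptide
    print(f"Exact count for {peptide}: {round(10 ** log10_exact_count(peptide))}")
```

Working in logarithms avoids printing a number with more than a hundred digits and makes it easy to see how sensitive the estimate is to the assumed average number of codons per residue.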
In practice most don’t work because organisms prefer certain codons that match their tRNA pools (codon usage bias), and using rare codons slows the ribosome and drops protein yield. Other issues include mRNA secondary structure and GC content blocking transcription or translation, accidental creation of regulatory signals like splice sites, and changes in translation speed that alter co-translational protein folding — affecting structure, solubility, or stability even when the amino acid sequence is identical. This is why codon optimization is standard practice in protein engineering.
What’s the most commonly used method for oligo synthesis currently?
The most common method is phosphoramidite synthesis, a solid-phase chemical method originally developed by Caruthers. Oligos are built stepwise on a solid support like controlled pore glass (CPG), adding one nucleotide at a time through cycles of detritylation, coupling, capping, and oxidation. It is highly automatable and forms the basis of most commercial oligo synthesis platforms.
Why is it difficult to make oligos longer than 200 nt via direct synthesis?
Each coupling step has an efficiency of about 99%, but these small errors compound exponentially. A 200-mer at 99% efficiency gives only ~13% theoretical full-length yield, and in practice it’s much lower. Longer sequences accumulate more deletions, truncations, and depurination from the repeated harsh chemical cycles. Longer strands also form secondary structures that hinder reagent diffusion and coupling on porous supports like CPG. The result is that failure sequences (n-1 mers, mutations) dominate the output, and purifying the correct full-length product becomes impractical.
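The compounding effect can be made concrete with a short calculation. This is a minimal sketch that assumes every coupling succeeds independently with the same efficiency and that an n-mer needs n − 1 couplings (the first base is taken to be pre-attached to the support); real syntheses lose additional product to depurination and other side reactions. The function name is my own.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float = 0.99) -> float:
    """Theoretical fraction of strands that reach full length.

    Assumes length_nt - 1 independent coupling steps, each succeeding
    with probability coupling_efficiency.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    for length in (20, 60, 100, 200, 2000):
        for efficiency in (0.99, 0.995):
            fraction = full_length_yield(length, efficiency)
            print(f"{length:5d} nt at {efficiency:.1%} coupling: {fraction:.3e}")
```

Running the loop shows how quickly the full-length fraction falls off with length, and how much a small gain in coupling efficiency helps for long oligos.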
Why can’t you make a 2000 bp gene via direct oligo synthesis?
The above answer explains why longer oligos become dramatically harder to make. At 2000 bp, cumulative coupling inefficiency and side reactions cause full-length product yield to approach zero. Unlike enzymatic replication, chemical synthesis has no proofreading, so deletions, insertions, and substitutions build up rapidly. Long sequences also form stable hairpins that block reagent access. Instead, modern gene synthesis assembles shorter oligos (~60–200 nt) into longer fragments using enzymatic methods like Gibson Assembly, followed by sequence verification.
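To illustrate the assembly idea, the sketch below tiles a long sequence into shorter overlapping pieces of the kind that could be synthesized directly. It is a simplification: real designs usually alternate strands and tune overlaps for melting temperature, and the 200 nt length and 30 nt overlap are illustrative assumptions rather than values from any specific protocol. The function names and the random demo gene are hypothetical.

```python
import random


def tile_into_oligos(gene: str, oligo_len: int = 200, overlap: int = 30) -> list[str]:
    """Split a sequence into same-strand oligos that overlap their neighbours."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and smaller than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(gene[start:start + oligo_len])
        if start + oligo_len >= len(gene):
            break  # the last oligo reaches the end of the gene
        start += step
    return oligos


def reassemble(oligos: list[str], overlap: int = 30) -> str:
    """Join tiled oligos back together by dropping each shared overlap."""
    return oligos[0] + "".join(oligo[overlap:] for oligo in oligos[1:])


# Random 2000 nt demo sequence (fixed seed so the example is reproducible).
DEMO_GENE = "".join(random.Random(0).choice("ACGT") for _ in range(2000))

if __name__ == "__main__":
    pieces = tile_into_oligos(DEMO_GENE)
    print(f"{len(pieces)} oligos, longest {max(len(p) for p in pieces)} nt")
    print("Reassembles correctly:", reassemble(pieces) == DEMO_GENE)
```

Each piece stays within a length where direct synthesis still gives usable yield, and the overlaps are what an enzymatic assembly step would use to join neighbouring fragments.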
What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
The 10 essential amino acids are: lysine, methionine, tryptophan, threonine, valine, isoleucine, leucine, arginine, histidine, and phenylalanine. Animals cannot synthesize these and must obtain them from dietary sources.
The “Lysine Contingency” is from Jurassic Park — a genetic modification to make the dinosaurs unable to produce lysine so they would die without human-provided supplements. However, lysine is already one of the 10 essential amino acids, so animals cannot produce it anyway. The dinosaurs could simply get lysine by eating plants, meat, or bacteria, making this a scientifically dubious plot point.
That said, the concept highlights something real about food security. Lysine is a limiting amino acid in cereal-based diets — staple crops like maize, rice, and wheat are all lysine-deficient relative to animal nutritional needs (Galili G, 2002). Growth and health can be constrained by lysine availability even when total protein intake is sufficient. This is why biotechnological interventions like microbial lysine production or high-lysine crops can have outsized impacts on food security and animal productivity. The lysine contingency, while fictional, illustrates how molecular-level biochemical constraints shape global food systems and ecological dependencies.
Applied AI support for early-stage project ideation through informal, conversational brainstorming (including: “My background is [X]. I’m interested in autism in the HTGAA course—what can I do?”), as well as subsequent formatting, structural organization, and language refinement of written outputs.
For this exercise, the full genome of Bacteriophage Lambda (GenBank accession J02459.1, 48,502 bp) was imported into Benchling from NCBI. A virtual restriction enzyme digestion was performed using seven enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI. Each enzyme was applied individually to identify its recognition sites across the Lambda genome. The digest results were visualized using Benchling’s simulated gel electrophoresis tool. To get a nice visual, I tried different ladders and different lane orderings. The final output, showing each enzyme’s fragment pattern, is below.
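The same kind of single-enzyme digest can be approximated in a few lines of Python. This is a minimal sketch, not what Benchling does internally: it scans one strand of a linear sequence for each recognition site (all seven sites are palindromic, so one strand is enough) and returns fragment lengths. The function names and the toy sequence are hypothetical; to reproduce the actual lanes, the Lambda sequence would need to be loaded from the GenBank record.

```python
# Recognition site and top-strand cut offset (bases before the cut) per enzyme.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "EcoRV": ("GATATC", 3),    # GAT^ATC
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "SalI": ("GTCGAC", 1),     # G^TCGAC
}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Return top-strand cut coordinates for every occurrence of site."""
    seq = seq.upper()
    positions = []
    index = seq.find(site)
    while index != -1:
        positions.append(index + offset)
        index = seq.find(site, index + 1)
    return positions


def digest_linear(seq: str, enzyme: str) -> list[int]:
    """Return fragment lengths from digesting a linear sequence with one enzyme."""
    site, offset = ENZYMES[enzyme]
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"  # toy sequence, not Lambda
    for name in ENZYMES:
        print(name, sorted(digest_linear(toy, name), reverse=True))
```

Sorting the fragment lengths from largest to smallest gives the order in which bands would appear from the top of a gel lane.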

Experiments with the DNA Gel Art interface. The figure shows my initials, B. K.

Note: As a possible extension, it could be cool to build an interactive version of Ronan’s website where you can pick genome, enzymes, drag lanes around, swap ladders, add timing, etc. Could be a nice tool for future HTGAA classes.
I chose PIGH (Phosphatidylinositol N-acetylglucosaminyltransferase subunit H), a human protein encoded by the PIGH gene on chromosome 14q24.1. PIGH is a 188-amino-acid subunit of the GPI-GnT complex, which catalyzes the first step of GPI-anchor biosynthesis — transferring N-acetylglucosamine to phosphatidylinositol on the cytoplasmic side of the endoplasmic reticulum. GPI anchors tether many important proteins to the cell surface.
I chose this protein because mutations in PIGH cause GPIBD17 (Glycosylphosphatidylinositol Biosynthesis Defect 17), a rare autosomal recessive disorder characterized by developmental delay, seizures, and autistic features. The gene was only linked to disease in 2018, so it is likely underdiagnosed. Its small size (188 aa) also makes it practical for the synthesis and expression exercises in this homework.
I obtained the FASTA sequence from UniProt.
Using the Sequence Manipulation Suite reverse translation tool from bioinformatics.org, I converted the 188-amino-acid protein sequence into a 564 bp DNA sequence using the most likely codons:
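Reverse translation with one fixed codon per amino acid is a simple table lookup. The sketch below shows the idea; the codon table is an illustrative one-codon-per-residue choice of my own (an assumption, not the table the Sequence Manipulation Suite uses), and the function name and example peptide are hypothetical.

```python
# Illustrative one-codon-per-residue table (assumption: not the tool's actual table).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def reverse_translate(protein: str) -> str:
    """Map each amino acid to its single chosen codon and join the codons."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


if __name__ == "__main__":
    peptide = "MKWHL"  # hypothetical example, not the PIGH sequence
    dna = reverse_translate(peptide)
    print(dna, f"({len(dna)} bp for {len(peptide)} residues)")
```

Because every residue maps to exactly three bases, a 188-residue protein always yields 564 bp regardless of which codons are chosen; a stop codon would add three more.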

I used the GenSmart Codon Optimization Tool to optimize the PIGH coding sequence for expression in E. coli and Staphylococcus aureus. Codon optimization is necessary because organisms differ in which codons they translate most efficiently, which reflects tRNA availability: each organism has its own pool of tRNA molecules, some abundant and others rare. When the ribosome hits a codon whose matching tRNA is scarce, translation slows down or stalls, so a human gene expressed directly in E. coli may be translated poorly. Optimization replaces rare codons with ones preferred by the host organism, improving expression levels without changing the protein sequence. Optimization report available here.
Optimized sequence:
To produce the PIGH protein, the codon-optimized DNA sequence is incorporated into an expression construct containing regulatory elements — a promoter (BBa_J23106), ribosome binding site (BBa_B0034), start codon, coding sequence, 7×His tag for purification, stop codon, and terminator (BBa_B0015). This can be ordered as a synthetic plasmid from Twist Bioscience in a backbone like pTwist Amp High Copy.
Protein production follows the central dogma in two stages. During transcription, RNA polymerase binds the promoter and synthesizes a complementary mRNA strand. During translation, ribosomes assemble at the RBS and read the mRNA in three-nucleotide codons, with tRNA molecules delivering amino acids from the start codon (AUG/methionine) until the UAA stop codon is reached. This produces the PIGH polypeptide (188 amino acids plus the His tag), which then folds.
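The two stages can be mimicked on a toy sequence. This is a simplified sketch: transcription is modeled as copying the coding strand with U in place of T, and translation starts at the first AUG rather than at a position set by the RBS, which is an assumption made to keep the example short. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    toy_cassette = "GGATGAAATGGCATCATTAAGG"  # toy sequence, not the PIGH construct
    mrna = transcribe(toy_cassette)
    print(mrna, "->", translate(mrna))
```

The toy sequence ends with two histidine codons before the stop, echoing how a His tag placed before the stop codon becomes part of the translated product.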
Cell-Dependent Production: The plasmid could be transformed into E. coli BL21 via heat shock or electroporation, with ampicillin selection to identify successful transformants. The host cell’s machinery and metabolic resources would then drive protein expression. Since PIGH is relatively small and does not appear to require complex eukaryotic modifications, E. coli may be a suitable host. The His-tagged protein could then be purified via Ni-NTA chromatography. This approach tends to be scalable but requires time for cell growth and purification.
Cell-Free Production: Alternatively, a TX-TL cell-free system could be used, where the linear expression cassette is mixed directly with ribosomes, polymerase, tRNAs, and amino acids. These components come either from lysed cells (lysate-based systems) or from purified, reconstituted parts (e.g., PURExpress). PURExpress relies on T7 RNA polymerase, so the construct would need a T7 promoter in place of BBa_J23106 for that kit. This would likely produce protein within hours without living cultures, making it potentially ideal for rapid prototyping. PIGH’s small size (~564 bp) should be well within the efficient range for cell-free systems, though yields are generally lower and cost per reaction tends to be higher.

I would want to sequence the PIGH gene in patients presenting with developmental delay, seizures, and autistic features who lack a molecular diagnosis. Sequencing PIGH in such individuals could potentially help identify pathogenic variants and improve diagnostic rates for GPI-anchor biosynthesis disorders.
Since the PIGH coding region is relatively short (~564 bp), Sanger sequencing would likely be a suitable approach. Sanger is generally considered a first-generation method, reading one sequence at a time rather than in parallel like next-gen methods. I would extract genomic DNA from a patient sample, PCR-amplify the PIGH coding region using forward and reverse primers (exon by exon, since the coding sequence is interrupted by introns in genomic DNA; alternatively, the full coding sequence could be amplified from cDNA), and use the purified product as input for the Sanger reaction. A forward and reverse read per amplicon should be sufficient to cover the sequence with overlap.
The Sanger reaction mix contains normal dNTPs along with fluorescently labeled ddNTPs. During extension, the polymerase occasionally incorporates a ddNTP, terminating the chain at that position. This produces fragments of various lengths, each capped with a color for A, T, G, or C. Capillary electrophoresis separates them by size, and a detector reads the color at each position. The output is a chromatogram with color-coded peaks and a derived nucleotide sequence that could be compared against the PIGH reference to look for mutations.
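The read-out logic can be sketched as a toy simulation. It idealizes the chemistry by assuming exactly one terminated fragment per position, each tagged with the colour of its final base; sorting by length then plays the role of capillary electrophoresis. All function names and the short sequences are hypothetical.

```python
import random


def sanger_fragments(synthesized_strand: str) -> list[tuple[int, str]]:
    """Return (fragment length, terminal labelled base) for every position."""
    return [(i + 1, base) for i, base in enumerate(synthesized_strand)]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by size and read the terminal label of each one."""
    return "".join(base for _, base in sorted(fragments))


def find_variants(read: str, reference: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, observed base) for mismatches."""
    return [
        (pos + 1, ref, obs)
        for pos, (ref, obs) in enumerate(zip(reference, read))
        if ref != obs
    ]


if __name__ == "__main__":
    reference = "ATGGCC"  # toy reference, not the real PIGH sequence
    sample = "TTGGCC"     # toy sample carrying a change at position 1
    fragments = sanger_fragments(sample)
    random.Random(0).shuffle(fragments)  # fragments start out mixed in the tube
    read = read_sequence(fragments)
    print(read, find_variants(read, reference))
```

The final comparison step corresponds to aligning the derived sequence against the reference to look for mutations.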
I would want to synthesize the codon-optimized PIGH expression cassette from Parts 3–4. It could also be useful to order variant versions carrying known disease-associated mutations alongside wild-type to compare their effects on protein function. I would likely use phosphoramidite oligonucleotide synthesis as offered by companies like Twist Bioscience. Short oligos are built base-by-base on a solid support through repeated cycles of de-protection, coupling, capping, and oxidation, then assembled into the full-length gene and sequence-verified. Error rate tends to increase with length, so longer constructs may need to be built from smaller fragments. For PIGH (~564 bp), this should be well within a comfortable range.
I would want to correct the M1L mutation (c.1A>T) in the PIGH gene. This change appears to disrupt the start codon and may prevent normal protein production. Restoring the original base means changing T back to A, which is a transversion. Current adenine and cytosine base editors only make transitions (ABEs convert A•T to G•C, CBEs convert C•G to T•A), so a prime editor would likely be a better fit for this correction. A prime editor generally uses a nickase Cas9 fused to a reverse transcriptase. A prime editing guide RNA (pegRNA) directs the complex to the target site and carries a template encoding the corrected base; the reverse transcriptase writes the correction into the nicked strand without creating a double-strand break, and the cell’s repair machinery completes the correction. Inputs would include the prime editor protein or mRNA, a pegRNA targeting the mutation region, and cells carrying the variant. Some limitations include dependence on a nearby PAM sequence, editing efficiency that varies with pegRNA design and cell type, possible unintended edits at or near the target site, and variable delivery efficiency across cells.
Opentrons Colab for Source Code

Question 1: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
Fushimi, K., Nakai, Y., Nishi, A., Suzuki, R., Ikegami, M., Nimura, R., Tomono, T., Hidese, R., Yasueda, H., Tagawa, Y., & Hasunuma, T. (2025). Development of the autonomous lab system to support biotechnology research. Scientific Reports, 15, 6648. https://doi.org/10.1038/s41598-025-89069-y
This paper by Fushimi, Nakai et al. (2025) tackles the problem of optimizing how engineered bacteria produce glutamic acid, a commercially valuable amino acid used in MSG, medicine, and cosmetics. The researchers engineered E. coli by knocking out four genes and overexpressing ten others to redirect its metabolism toward glutamic acid production. However, finding the right growth medium recipe is extremely difficult because the bacteria tightly regulate their own glutamic acid levels to protect themselves from stress like pH changes and osmotic pressure. With dozens of possible nutrient combinations to test, manual experimentation is too slow and the biology is too complex for human intuition alone.
To solve this, they built the Autonomous Lab (ANL) using an Opentrons OT-2 liquid handler, an incubator, centrifuge, plate reader, mass spectrometer, and a robotic arm connecting them all. A Bayesian optimization AI suggests which combinations of calcium, magnesium, cobalt, and zinc concentrations to test. The robots then autonomously prepare the media, culture the bacteria, and measure both growth and glutamic acid output, and the results are fed back to the AI for the next round.
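The suggest, measure, and update cycle can be sketched in code. This is not the authors' implementation: a random proposer stands in for the Bayesian optimizer, and a made-up scoring function stands in for the robots and plate reader. The concentration levels, the target medium, and all function names are hypothetical and exist only to show the shape of the loop.

```python
import random

COMPONENTS = ["calcium", "magnesium", "cobalt", "zinc"]
CONCENTRATION_LEVELS = [0.0, 0.1, 1.0, 10.0]  # hypothetical levels, arbitrary units


def measure_growth(medium: dict[str, float]) -> float:
    """Stand-in for preparing the medium, culturing, and reading the plate."""
    target = {"calcium": 1.0, "magnesium": 10.0, "cobalt": 0.1, "zinc": 0.1}  # made up
    return -sum(abs(medium[c] - target[c]) for c in COMPONENTS)


def propose(history: list, rng: random.Random) -> dict[str, float]:
    """Pick an untested condition at random.

    A Bayesian optimizer would instead fit a model to the history and
    choose the condition it expects to be most informative or most promising.
    """
    tested = {tuple(m[c] for c in COMPONENTS) for m, _ in history}
    while True:
        candidate = {c: rng.choice(CONCENTRATION_LEVELS) for c in COMPONENTS}
        if tuple(candidate[c] for c in COMPONENTS) not in tested:
            return candidate


def run_closed_loop(n_experiments: int = 24, seed: int = 0) -> list:
    """Run the propose -> measure -> record cycle and return the full history."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_experiments):
        medium = propose(history, rng)
        history.append((medium, measure_growth(medium)))
    return history


if __name__ == "__main__":
    results = run_closed_loop()
    best_medium, best_score = max(results, key=lambda item: item[1])
    print(f"Best of {len(results)} experiments: {best_medium} (score {best_score:.2f})")
```

The key design point is that every proposal sees the full experiment history, so swapping the random proposer for a model-based one changes only the `propose` function and leaves the rest of the loop untouched.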
The system found a medium that nearly doubled E. coli growth using only 24 AI-picked experiments, outperforming a brute-force search of 256 conditions. This is the first fully autonomous closed-loop system for culture medium optimization. It demonstrates that affordable tools like the Opentrons can power self-driving biology labs capable of navigating complex biological regulation faster than manual experimentation, opening the door to optimizing production of many other valuable biomolecules the same way.


