Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

  • Week 2 HW: DNA Reading, writing and editing


Subsections of Homework

Week 1 HW: Principles and Practices


1) Describe a biological engineering application or tool you want to develop and why

In my lab, we have run into a problem: after analyzing and selecting the best peptides of interest, we are unable to correctly assess the complete presentation of those peptides in the trained T cells. We do not know whether this is due to a lack of translation (the peptide is not being produced) or a lack of presentation (the peptide is not going through the ER to an MHC, or not going through the Golgi apparatus to the outside of the cell). It would therefore be very useful to have a circuit that expresses fluorescence when the peptide is presented and that also lets us determine, by fluorescence microscopy, where the peptide is located inside the cell or whether it is not translated at all. This is a general problem in T cell therapy, and such a tool could also help where we usually lack standards for mass spectrometry analysis. In view of this problem, and after a little research, I found a good inspiration paper.

I would like to develop a circuit similar to the one described by Kohlgruber, Dezfulian et al. (2025) in their paper "High-throughput discovery of MHC class I- and II-restricted T cell epitopes using synthetic cellular circuits". My ideal circuit would use the peptides of interest that have passed the first utility filters before being incorporated into a cancer vaccine for clinical trials. I could also use APCs, since we are interested in understanding MHC II expression as well; this would help us better understand the self-regulation and long-term memory of the immune system.

2) Describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

1. Non-malfeasance
  • Standardization and validation of the circuit
  • Confirmation with orthogonal methods
  • Traceability and quality control of the circuit
2. Responsible use of personalized medicine
  • Protection of each patient's genetic and immunological data
  • Specific informed consent
3. Access equity
  • Promote technology transfer
  • Look for licensing and collaboration models between labs
4. Prevention of dual use of the technology
  • Define limits on the types of antigens that can be evaluated
  • Ethical review by outside institutions

Governance of this technology should prioritize patient safety, responsible handling of biological data, equitable access, and prevention of misuse.

3) Describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

Action 1: Regulatory requirement for validation prior to clinical impact

Purpose:

Currently, there is not always a required, standard validation method before the results of epitope research inform clinical decisions. Requiring multicenter validation of the peptide prior to clinical impact would help tackle this problem.

Design:

Main actors would be the FDA, Health Canada or EMA, as well as ethics committees, hospitals and the private sector.

  • Define benchmarks to serve as minimum performance indicators (sensitivity, specificity, reproducibility, robustness, etc.).
  • Use reference datasets.
  • Require lab-to-clinic translational evidence.
  • Include the tool in guidelines as a "companion tool for diagnostics/treatments".

Assumptions:

  • Regulators have the capacity for rapid analysis.
  • Standards do not stop innovation.
  • There is a consensus about what counts as “enough evidence” of success for translating epitopes into vaccines.

Risk of failure and success:

  • Failure: Too much bureaucracy.
  • Success: Can create barriers to entry that make it difficult for small research groups to participate.

Action 2: Incentives for open-source data and proof of reproducibility

Purpose:

Promote data sharing, generate open standards and promote transparent reporting.

Design:

Main actors would be funding agencies (NIH, CIHR), NGOs, foundations, journals and universities.

  • Require an open-source data repository holding data and genetic constructs.
  • Fund groups that generate open data.
  • Reward open data during grant evaluation.

Assumptions:

  • Researchers are willing to share.
  • Openness enhances quality and trust.
  • Intellectual property won’t be affected.

Risk of failure and success:

  • Failure: Some companies might step back or patent aggressively before sharing.
  • Success: Actors with bad intentions could use the shared information to optimize harmful peptides.

Action 3: Ethics-by-design

Purpose:

Incorporate ethical limits from the design of the project, instead of after the technology has been developed.

Design:

Main actors would be bioengineers, companies, institutional committees and providers.

  • Restricted libraries of approved antigens.
  • Require registration to have access to more information.
  • Auditable use log.

Assumptions:

  • Technical filters really limit misuse of information.
  • Users won’t look for alternative access.

Risk of failure and success:

  • Failure: Systems might be easy to circumvent, creating a false sense of security.
  • Success: Could limit research.

4) Score governance actions

Does the option:                                   Action 1   Action 2   Action 3
Enhance Biosecurity
  • By preventing incidents                        1          2          2
  • By helping respond                             1          2          2
Foster Lab Safety
  • By preventing incidents                        1          2          2
  • By helping respond                             1          2          2
Protect the environment
  • By preventing incidents                        3          3          1
  • By helping respond                             3          3          1
Other considerations
  • Minimizing costs and burdens to stakeholders   2          2          3
  • Feasibility?                                   2          2          3
  • Not impede research                            1          2          3
  • Promote constructive applications              2          2          3

5) Describe which governance option, or combination of options, you would prioritize, and why.

For the proposed actions mentioned above, I would combine the first and the third, and then implement the second option: 1) require peptide validation prior to the clinical stage and promote experimental design alongside the development of ethical guidelines to be met, and 2) encourage transparency.

To design experiments in the best possible way and ensure compliance with all necessary regulations, it is imperative to develop the research in parallel with ethical requirements. Within these ethical requirements, the importance of peptide validation prior to clinical trials should also be emphasized. If ethical standards and validation are met, moving on to clinical and industry stages should be faster and more appropriate, while also facilitating the bureaucratic steps for acceptance.

Finally, implementing transparency incentives would reduce time in peptide/antigen development, as it would prevent redundancy in research (i.e., repeating tests on antigens already shown to be ineffective) and would also save significant resources. Moreover, making this information publicly available could improve accessibility for laboratories with limited capacity for antigen discovery, enabling the application of already known antigens.

------------------------------------------------------------------------------

Homework questions: Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

The error rate of DNA polymerase is approximately 1 error per 10⁵ nucleotides. This demonstrates that DNA polymerase is incredibly accurate; however, it does have its limitations.

The human genome contains about 3.2 billion base pairs. At that error rate, copying it once would introduce around 32,000 errors (3.2 × 10⁹ / 10⁵) every time a cell divides. This level of mutation, for a complex organism such as a human being, would be concerning.

However, to prevent this, nature has developed mechanisms to avoid such high levels of error. First, there is base selection, in which only the correct nucleotide fits geometrically into the active site. Second, there is a proofreading system in which a 3′→5′ exonuclease activity detects errors, backs up, removes the incorrect base, and replaces it. Finally, there is also the mismatch repair system, in which specialized proteins scan the newly synthesized DNA after replication to correct errors that may have escaped base selection and proofreading.
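The arithmetic above can be sketched in a few lines. The base-selection rate and genome size are the ones used in this answer; the rates after proofreading and after mismatch repair are rough order-of-magnitude values that I am assuming for illustration, not measured figures.

```python
GENOME_BP = 3.2e9  # human genome size used in the text

# Approximate per-base error rates.
# Only the first value comes from the answer above; the other two are
# illustrative order-of-magnitude assumptions.
ERROR_RATES = {
    "base selection only": 1e-5,
    "plus proofreading (assumed)": 1e-7,
    "plus mismatch repair (assumed)": 1e-9,
}


def expected_errors(rate, genome_bp=GENOME_BP):
    """Expected number of errors when copying the genome once."""
    return rate * genome_bp


for stage, rate in ERROR_RATES.items():
    print(f"{stage}: ~{expected_errors(rate):,.0f} errors per genome copy")
```

Under these assumptions, each extra layer of error correction lowers the expected number of errors per replication by roughly two orders of magnitude.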

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

A human protein can contain around 400 amino acids. There are 64 possible codons, 61 of which encode only 20 amino acids, so most amino acids have multiple codon options; on average, there are about three codons per amino acid. A 400-residue protein can therefore be encoded in roughly 3^400 (on the order of 10^190) different ways. In practice, not all of these sequences work equally well because of codon usage bias, mRNA secondary structures, cryptic splice sites, effects on folding kinetics, and regulatory signals.
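As a rough check of how large this number is, the sketch below counts the synonymous DNA encodings of a peptide from the degeneracy of the standard genetic code. The function names are my own and the example peptides are arbitrary.

```python
import math

# Number of codons per amino acid in the standard genetic code.
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def count_encodings(protein):
    """Exact number of DNA sequences that encode the given protein."""
    return math.prod(DEGENERACY[aa] for aa in protein.upper())


def log10_encodings(protein):
    """Same count on a log10 scale, easier to read for long proteins."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein.upper())


print(count_encodings("MASSRC"))   # short example peptide
print(len(str(3 ** 400)))          # number of digits in the 3^400 estimate
```

The exact count depends on the amino acid composition, so 3^400 is only an average-case estimate for a 400-residue protein.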

Homework questions: Dr. LeProust

What’s the most commonly used method for oligo synthesis currently?

The most widely used method today is solid-phase phosphoramidite synthesis. This approach was developed by Caruthers in the 1980s. Unlike DNA polymerases, which synthesize DNA in the 5′→3′ direction, it builds the chain in the reverse 3′→5′ direction.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

There are several main reasons:

Exponential yield problem. Even with a coupling efficiency of 99.5%, the final yield of the correct full-length product decreases exponentially as the chain grows. For a 20-mer oligonucleotide, the full-length fraction is approximately 90%; for a 100-mer, about 60%; and for a 200-mer, around 37%. Beyond 200 nucleotides, most of what comes out of the synthesizer is not the desired full-length sequence, but rather a mixture of truncated sequences that are nearly impossible to separate in order to obtain the intended final product.
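The yield figures above follow from raising the coupling efficiency to the power of the chain length. A minimal sketch, assuming a constant 99.5% efficiency per cycle and counting one coupling per nucleotide as the text does:

```python
def full_length_yield(length, coupling_efficiency=0.995):
    """Fraction of chains that received every coupling correctly.

    Simplification: one coupling per nucleotide, constant efficiency.
    """
    return coupling_efficiency ** length


for n in (20, 100, 200):
    print(f"{n}-mer: {full_length_yield(n):.1%} full-length product")
```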

Depurination. This is an acid-induced form of damage that can accidentally break the bond between the adenine or guanine base and the DNA sugar. Since each synthesis cycle uses acid to remove the DMT protecting group, by the time a strand reaches 200 bases and has been exposed to acid 200 times, the risk that at least one base has undergone depurination becomes high. This can lead to mutations.

Accumulation of mutations from failed coupling events. Some chains that fail to couple properly can become “reactivated” in later cycles. This generates mutations similar to deletions and, in long sequences, purification techniques have greater difficulty separating the affected strands.

Physical limitations of the solid support. The pores in controlled pore glass (CPG) beads used for these reactions can become filled or occupied as the strand grows. This can block the pores and cause entanglement, leading to decreased purity and reduced reaction efficiency.

Why can’t you make a 2000bp gene via direct oligo synthesis?

There are physical and chemical limitations that make this effectively impossible. The first is yield collapse: with approximately 99.5% efficiency per step, only about 0.004% of the molecules in a 2,000-base synthesis (0.995^2000 ≈ 4.4 × 10⁻⁵) would be the correct full-length product, meaning that about 99.996% of the product would be a mixture of erroneous sequences.

This leads to the next limitation: purification. At that point, it would be virtually impossible to purify the obtained molecules and isolate the correct full-length product.
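To see how far a 2,000-base synthesis is from being practical, the same model can be inverted: what per-step efficiency would a given yield require, and how long a chain can a given efficiency support? This is a sketch under the constant-efficiency assumption; the 50% target yield is an arbitrary choice of mine.

```python
import math


def required_efficiency(length, target_yield):
    """Per-step efficiency needed to reach target_yield at this length."""
    return target_yield ** (1.0 / length)


def max_length(coupling_efficiency, target_yield):
    """Longest chain whose full-length yield is still >= target_yield."""
    return math.floor(math.log(target_yield) / math.log(coupling_efficiency))


print(f"2000-mer yield at 99.5%: {0.995 ** 2000:.2e}")
print(f"Efficiency needed for 50% yield at 2000 nt: {required_efficiency(2000, 0.5):.5f}")
print(f"Longest chain with >= 50% yield at 99.5%: {max_length(0.995, 0.5)} nt")
```

This is why longer genes are assembled from shorter oligos rather than synthesized in one pass.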

Homework Question: George Church

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in all animals (i.e., those that animals cannot synthesize de novo and must obtain from their diet) are:

  • Histidine (His, H)
  • Isoleucine (Ile, I)
  • Leucine (Leu, L)
  • Lysine (Lys, K)
  • Methionine (Met, M)
  • Phenylalanine (Phe, F)
  • Threonine (Thr, T)
  • Tryptophan (Trp, W)
  • Valine (Val, V)
  • Arginine (Arg, R) (essential in many animals and conditionally essential in adults, but required in all animals at some stage)

The Lysine Contingency is the idea that making an organism dependent on an amino acid like lysine, or on a lysine analog, for survival can work as a biocontainment strategy. Knowing that lysine is already essential in all animals makes this an interesting point of view.

If we use lysine as a contingency factor when doing genetic engineering, lysine becomes metabolically central. Because animals cannot synthesize lysine, its availability is already controlled by diet. In principle, it could therefore be used to keep engineered cells or organisms from escaping the lab, since outside a lab environment they would not be able to find the lysine intake they need. One caveat is that lysine is present in ordinary food sources, so dependence on a lysine analog that is not found in nature would be a stricter barrier.

Week 2 HW: DNA Reading, writing and editing

Part 1

DNA reading, editing and design

Gel Art: Restriction Digests and Gel Electrophoresis

Opening an account in Benchling and importing Lambda

To open my account in Benchling I used my institutional e-mail.

To Import the Lambda DNA, I visited the website provided and looked at the FASTA file. Then I selected DNA/RNA Sequence in Benchling and copy-pasted the FASTA information in there.

[Figure: Lamda_benchling]

Simulating Restriction Enzyme Digestion with the following Enzymes

For this exercise, I ran each digestion individually and then all of them together, in order to understand the whole process and how different digestions create different patterns in the digital gel. The results are the following:

[Figure: Digital_digestion]
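The logic behind a virtual digest can be sketched as follows. This is not how Benchling implements it; it is a minimal illustration on a linear molecule. The three enzymes and the toy sequence are my own choices for the example (the recognition sites and cut positions are the standard ones for these enzymes), and the function names are hypothetical.

```python
# Recognition site and cut offset on the top strand (e.g. G^AATTC).
# The enzyme selection here is an assumption for illustration.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
}


def cut_positions(seq, enzymes):
    """Sorted top-strand cut positions for the chosen enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    return sorted(cuts)


def digest(seq, enzymes):
    """Fragment lengths of a linear molecule, largest first (gel order)."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Toy 47-nt sequence with one EcoRI site and one HindIII site.
toy = "A" * 10 + "GAATTC" + "C" * 20 + "AAGCTT" + "T" * 5
for combo in (["EcoRI"], ["HindIII"], ["EcoRI", "HindIII"]):
    print(combo, digest(toy, combo))
```

Running single and combined digests on the same sequence shows why each enzyme combination gives a different band pattern: every added enzyme splits existing fragments into smaller ones.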

Part 2

Digestion art

I really like butterflies, and I thought that something symmetrical might be the easiest way to generate a proper art piece with gels. So I created this butterfly using the Lambda sequence:

[Figure: GelArt]

Part 3

DNA design challenge

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

Protein selection

For this, I selected the TAP1 protein, as it is important in the context of my work. I'm currently working with MHC I- and MHC II-associated peptides in AML (acute myeloid leukemia), and I'm trying to understand how two different drugs affect the transport of these peptides for their presentation on the cell surface.

The sequence is:

>tr|A0A8V8TM76|A0A8V8TM76_HUMAN ABC-type antigen peptide transporter OS=Homo sapiens OX=9606 GN=TAP1 PE=1 SV=1

MASSRCPAPRGCRCLPGASLAWLGTVLLLLADWVLLRTALPRIFSLLVPTALPLLRVWAV

GLSRWAVLWLGACGVLRATVGSKSENAGAQGWLAALKPLAAALGLALPGLALFRELISWG

APGSADSTRLLHWGSHPTAFVVSYAAALPAAALWHKLGSLWVPGGQGGSGNPVRRLLGCL

GSETRRLSLFLVLVVLSSLGEMAIPFFTGRLTDWILQDGSADTFTRNLTLMSILTIASAV

LEFVGDGIYNNTMGHVHSHLQGEVFGAVLRQETEFFQQNQTGNIMSRVTEDTSTLSDSLS

ENLSLFLWYLVRGLCLLGIMLWGSVSLTMVTLITLPLLFLLPKKVGKWYQLLEVQVRESL

AKSSQVAIEALSAMPTVRSFANEEGEAQKFREKLQEIKTLNQKEAVAYAVNSWTTSISGM

LLKVGILYIGGQLVTSGAVSSGNLVTFVLYQMQFTQAVEVLLSIYPRVQKAVGSSEKIFE

YLDRTPRCPPSGLLTPLHLEGLVQFQDVSFAYPNRPDVLVLQGLTFTLRPGEVTALVGPN

GSGKSTVAALLQNLYQPTGGQLLLDGKPLPQYEHRYLHRQVAAVGQEPQVFGRSLQENIA

YGLTQKPTMEEITAAAVKSGAHSFISGLPQGYDTEVDEAGSQLSGGQRQAVALARALIRK

PCVLILDDATSALDANSQLQSLMKQRVCGEVLRMGNVGVLGVVSRASSDPVRWSSSCTKA

LSGTPAQCFSSPSTSAWWSRLTTSSFWKEALSGRGEPTSSSWRKRGATGPWCRLLQMLQN

ESLLRPAHSISLPFLLSVVENHSCRVGSCLQDELLEICLECVTSFPSSS

Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The step-by-step process can be found in the image below; in short, I used the NCBI protein resources and selected BLAST. In BLAST, I selected tblastn, which searches a protein query against a nucleotide database translated in all six reading frames. I pasted the TAP1 FASTA sequence shown above and selected the Human RefSeqGene sequences (RefSeq_Gene) database. I ran the search and selected the first hit, as it had the best match. The matching nucleotide region, in GenBank ORIGIN format, is:

1 ctcaggtgga gcagctcctg tacgaaagcc ctgagcggta ctcccgctca gtgcttctca

61 tcacccagca cctcagcctg gtggagcagg ctgaccacat cctctttctg gaaggaggcg

121 ctatccggga ggggggaacc caccagcagc tcatggagaa aaaggggtgc tactgggcca

181 tggtgcaggc tcctgcagat gctccagaat gaaagccttc tcagacctgc gcactccatc

241 tccctccctt ttcttctctc tgtggtggag aaccacagct gcagagtagg cagctgcctc

301 caggatgagt tacttgaaat ttgccttgag tgtgttacct cctttccaag ctcctcg

[Figure: ReverseTrans]
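As a sanity check on the region above, the ORIGIN-formatted text can be cleaned and translated with the standard genetic code. The sketch below is my own check, not part of the BLAST workflow. It shows that the region is 357 nt (119 codons) and that, in the first reading frame, it ends with the same residues as the protein sequence (QDELLEICLECVTSFPSSS), so it covers only the C-terminal part of the protein.

```python
import re

ORIGIN_TEXT = """
  1 ctcaggtgga gcagctcctg tacgaaagcc ctgagcggta ctcccgctca gtgcttctca
 61 tcacccagca cctcagcctg gtggagcagg ctgaccacat cctctttctg gaaggaggcg
121 ctatccggga ggggggaacc caccagcagc tcatggagaa aaaggggtgc tactgggcca
181 tggtgcaggc tcctgcagat gctccagaat gaaagccttc tcagacctgc gcactccatc
241 tccctccctt ttcttctctc tgtggtggag aaccacagct gcagagtagg cagctgcctc
301 caggatgagt tacttgaaat ttgccttgag tgtgttacct cctttccaag ctcctcg
"""

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_origin(text):
    """Drop position numbers and whitespace, return an uppercase sequence."""
    return re.sub(r"[^acgtACGT]", "", text).upper()


def translate(dna):
    """Translate in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


sequence = parse_origin(ORIGIN_TEXT)
protein = translate(sequence)
print(len(sequence), "nt ->", len(protein), "aa")
print(protein)
```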

Codon optimization

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

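Codon optimization is needed because organisms prefer some synonymous codons over others, so matching the codon preferences of the host generally improves expression of the protein. For this optimization, I chose human (Homo sapiens) as the organism, because the cells I'm currently working with are cell lines derived from human patients, which makes the result more useful in the context of my current work. The tool I used was Benchling, and the step-by-step process is in the figure below.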

[Figure: CodonOp]
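The core idea of codon optimization can be sketched as replacing every codon with a preferred synonymous codon and checking that the protein is unchanged. This is a deliberately naive version, not Benchling's algorithm: the PREFERRED table below is my own pick of one codon per amino acid that is commonly reported as frequent in human genes, and should be treated as an assumption.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One preferred codon per amino acid (illustrative assumption).
PREFERRED = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def translate(dna):
    """Translate a sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def optimize(dna):
    """Swap each codon for the preferred synonymous codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    protein = translate(dna)
    optimized = "".join(PREFERRED[aa] for aa in protein)
    # The encoded protein must not change.
    assert translate(optimized) == protein
    return optimized


original = "ctcaggtggagcagctcctgt"   # first 7 codons of the region above
print(original.upper(), "->", optimize(original))
```

Real tools do more than this: they typically sample codons according to usage frequencies instead of always taking the top one, and they also watch GC content, repeats and unwanted restriction sites.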

You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

To produce the selected protein (TAP1) well, I would choose a cell-dependent approach: recombinant expression in mammalian cells such as HEK293, CHO or HeLa. I would clone the codon-optimized cDNA into an expression vector with a strong promoter such as CMV, deliver it by transfection (lipofection or electroporation), and use either stable selection or transient expression.

There are other approaches, such as the baculovirus expression system or bacterial expression in E. coli, but they have drawbacks, as some of the extraction practices are incompatible with further experimentation with AML cell lines.

How does the DNA sequence become TAP1? The first step is transcription. The optimized TAP1 DNA sequence is inserted into an expression vector under the control of a strong promoter (CMV, for example). Inside the cell, RNA polymerase binds to the promoter, reads the DNA template strand in the 3′→5′ direction and synthesizes a complementary mRNA strand in the 5′→3′ direction. In eukaryotic systems, the pre-mRNA undergoes 5′ capping, polyadenylation and splicing, and the mature mRNA is then exported to the cytoplasm. The second step is translation. The ribosome binds to the 5′ cap and scans for the start codon (AUG), tRNAs match codons with anticodons, and peptide bonds are formed in the peptidyl transferase center of the ribosome. The growing polypeptide emerges from the ribosome as it is synthesized. After this step, the protein folds, associates with TAP2 and becomes part of the antigen-processing machinery.
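The two steps described above can be sketched in code. This is a simplified model that ignores capping, splicing and polyadenylation. The example coding sequence is hypothetical: a short leader followed by codons for the first six residues of the TAP1 sequence shown earlier (MASSRC) and a stop codon.

```python
# Standard genetic code on RNA codons, bases ordered U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(coding_5to3):
    """Template strand written 3'->5', aligned base by base to the coding strand."""
    return "".join(DNA_COMPLEMENT[b] for b in coding_5to3.upper())


def transcribe(template_3to5):
    """RNA polymerase reads the template 3'->5' and builds mRNA 5'->3'."""
    return "".join(DNA_TO_RNA_PAIR[b] for b in template_3to5)


def translate(mrna):
    """Start at the first AUG and read codons until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


coding = "GCCACCATGGCCAGCAGCAGATGCTGA"   # hypothetical example construct
mrna = transcribe(template_strand(coding))
print(mrna)
print(translate(mrna))
```

The resulting mRNA has the same sequence as the coding strand with U in place of T, which is why the coding strand is the one usually written down.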

Part 4

Prepare a Twist DNA Synthesis Order

By following the instructions, I created my expression cassette. I then followed the instructions to create the Twist file for the protein of interest and generated the final product. You can find the final construct here. Also, you can see the results in the figures below.

[Figure: Twist]

[Figure: costitutive_TAP1_Twist]

Part 5

DNA Read/Write/Edit

DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

My main DNA of interest consists of genes encoding tumor-associated antigens, in order to be able to quickly diagnose cancer in some patients. At the same time, I would also love to do rapid bacterial DNA sequencing (coliforms) in remote places such as the countryside of my country (Guatemala), for rapid diagnosis of infections. Diarrhea is still one of the top causes of infant death in my country and normally results from undiagnosed infections.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions:

  • Is your method first-, second- or third-generation or other? How so?
  • What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
  • What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
  • What is the output of your chosen sequencing technology?

Because I am interested in two different applications — (1) tumor-associated antigen (TAA) genes for cancer diagnostics and (2) rapid detection of coliform bacteria in remote areas of Guatemala — I would choose two complementary sequencing technologies, each optimized for a different context: high-accuracy clinical genomics and portable field diagnostics.

For cancer-related sequencing, I would use Illumina sequencing-by-synthesis (SBS), widely implemented in platforms like NovaSeq or NextSeq. This is a second-generation sequencing technology. The input would be genomic DNA from a tumor biopsy or cfDNA from a liquid biopsy.

The steps needed to do this are:

  • DNA extraction
  • Fragmentation (mechanical or enzymatic)
  • End repair and A-tailing
  • Adapter ligation
  • PCR amplification (optional depending on protocol)
  • Cluster generation on flow cell (bridge amplification)

Illumina technology works by sequencing-by-synthesis with reversible terminators. Each nucleotide carries a fluorescent tag (a different color for each base) and a reversible terminator that blocks further extension. In each cycle, a single base is incorporated and the flow cell is imaged; the base is identified from its fluorescent signal, and then the terminator and dye are cleaved. This cycle repeats until the read is complete, so base calling is performed by detecting the fluorescence at each cycle.
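The cycle-by-cycle base calling described above can be illustrated with a toy model. This sketch assumes one fluorescence channel per base and made-up intensity values; real instruments use more elaborate signal processing (and some use fewer colors), so this shows the principle only.

```python
CHANNELS = "ACGT"  # assumed: one fluorescence channel per base


def call_bases(intensities):
    """Pick the brightest channel in each cycle.

    intensities: list of cycles, each a list of 4 values ordered as CHANNELS.
    Returns the called sequence and a per-cycle purity score
    (brightest channel divided by total signal).
    """
    calls = []
    purity = []
    for cycle in intensities:
        best = max(range(4), key=lambda i: cycle[i])
        calls.append(CHANNELS[best])
        total = sum(cycle)
        purity.append(cycle[best] / total if total else 0.0)
    return "".join(calls), purity


# Hypothetical signal from one cluster over three cycles.
signal = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.10, 0.70, 0.10],
    [0.00, 0.80, 0.10, 0.10],
]
read, scores = call_bases(signal)
print(read, [round(s, 2) for s in scores])
```

A low purity score means the channels were hard to tell apart in that cycle, which is the kind of information that ends up encoded as a lower base quality.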

As an output, we obtain millions to billions of short reads in FASTQ files, which support high-accuracy variant detection. This is what allows this technology to identify mutations in TAA genes: it is well suited to detecting SNVs and low-frequency tumor mutations with high-quality sequencing.
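Each FASTQ record stores a read together with one quality character per base. A minimal sketch of reading one record, assuming the common Phred+33 encoding; the record itself is made up for the example.

```python
def parse_fastq_record(lines):
    """Parse a 4-line FASTQ record into (name, sequence, Phred scores)."""
    header, sequence, plus, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33: quality score = ASCII code of the character minus 33.
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, scores


def error_probability(phred):
    """Probability that a base call is wrong: 10^(-Q/10)."""
    return 10 ** (-phred / 10)


record = ["@read1", "GATTACA", "+", "IIIIII#"]   # hypothetical read
name, seq, quals = parse_fastq_record(record)
print(name, seq, quals)
print([f"{error_probability(q):.4f}" for q in quals])
```

Q40 corresponds to a 1 in 10,000 chance of a wrong call, which is why high-quality reads matter when looking for low-frequency tumor mutations.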

For rapid bacterial DNA sequencing in remote areas, I think the best technology to use is the Oxford Nanopore Technologies MinION. The input for this technology is bacterial genomic DNA from water samples, either as 16S amplicons or as whole metagenomic DNA, as needed. This is a third-generation sequencing technology, as it sequences single molecules, doesn't require clonal amplification, can read very long DNA fragments and works in real time.

To do this, a library must first be prepared (the details depend on the protocol used). Library preparation involves extracting DNA with a rapid field kit, optionally amplifying the 16S region by PCR, ligating adapters and loading the library onto a flow cell.

The steps for this sequencing technology are:

  • A DNA molecule passes through a protein nanopore.
  • An electric current runs across the pore.
  • Each k-mer alters the ionic current in a characteristic (unique) way and this current change is measured, which leads to the detection of the specific base.
  • Machine learning algorithms are used to decode the current signals into base sequences, which allows a direct reading of the DNA in real time, detection of modified bases, and generating very long reads.

As an output, we will obtain long-read sequences, real-time data, FASTQ files and immediate taxonomic classification. This is ideal for coliform detection, as it allows rapid pathogen detection and field use, and it can work in low-infrastructure environments.

DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :).

In line with what I would like to accomplish in the future, I would like to synthesize a genetic sensing circuit capable of detecting if a tumor-associated-antigen peptide in an RNA-based cancer vaccine is being correctly expressed and presented on MHC class I (or II) molecules. The specific purpose of this construct would be to function as a biological validation system and confirm the correct antigen processing and presentation of the antigen of interest. This would help labs (like mine) as a quality-control step and quantify the functional antigen presentation of the peptide of interest.

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions: 1. What are the essential steps of your chosen sequencing methods? 2. What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

DNA Edit