Pre Week 2 Lecture Questions

Professor Jacobson’s Questions

Q1: Polymerase Error Rate vs. the Human Genome

Raw polymerase error rate: DNA polymerase III (the main replicative polymerase in bacteria; the eukaryotic replicases Pol δ and Pol ε have comparable raw fidelity) misincorporates roughly 1 in 10⁴ to 10⁵ nucleotides during synthesis.

If you factor in the built-in 3′→5′ exonuclease proofreading activity, this error rate drops to about 1 in 10⁷.

After mismatch repair (MMR) and other post-replicative repair pathways, the final observed mutation rate drops to approximately 1 in 10⁹ - 10¹⁰ per base pair per cell division.

The human genome is ~3.2 × 10⁹ bp (diploid: ~6.4 × 10⁹ bp). So even with all of these correction systems, the above rate predicts roughly 0.6 to 6 new mutations per human cell division.
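The arithmetic behind that estimate can be checked directly (a sketch using the diploid genome size and the post-repair error rates quoted above):

```python
# Expected new mutations per human cell division, given the final
# post-repair error rate of ~1e-9 to 1e-10 per bp per division.
diploid_bp = 6.4e9  # ~3.2e9 bp haploid, doubled for the diploid genome

low = diploid_bp * 1e-10   # best-case residual error rate
high = diploid_bp * 1e-9   # worst-case residual error rate

print(f"{low:.1f} to {high:.1f} new mutations per division")
```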

Q2: How Many DNA Sequences Can Code for an Average Human Protein?

Average human protein length: ~480 amino acids; round to 400 for a back-of-the-envelope estimate.

Number of possible DNA sequences: For a 400-AA protein, assuming ~3 synonymous codons per amino acid:

~3⁴⁰⁰ ≈ 10¹⁹¹ different DNA sequences

Codon degeneracy: The genetic code has 61 sense codons encoding 20 amino acids, giving an average redundancy of ~3 codons per amino acid. The geometric mean of the degeneracy factors across all 20 amino acids is approximately 2.8-3.2.
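The order of magnitude above follows directly from the average degeneracy (a sketch; the figures of 400 residues and ~3 codons per amino acid are the ones used in the text):

```python
import math

# Order-of-magnitude count of synonymous DNA sequences for a
# 400-AA protein, assuming ~3 codons per amino acid on average.
avg_degeneracy = 3
protein_length = 400

# log10(3^400) = 400 * log10(3)
exponent = protein_length * math.log10(avg_degeneracy)
print(f"~10^{exponent:.0f} possible coding sequences")
```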

Why don’t all of these “synonymous” sequences work in practice?

  • Codon usage bias: Every organism has preferred codons matched to its tRNA abundance. Rare codons cause ribosome stalling, reduced translation rate, and lower protein yield.
  • mRNA secondary structure: Certain sequences fold into stable hairpins or structures that block ribosome scanning or translation initiation.
  • GC content effects: Extreme GC or AT content affects transcription efficiency, mRNA stability, and chromatin structure.
  • Cryptic regulatory signals: Random sequences may inadvertently create splice sites, polyadenylation signals, transcription factor binding sites, or promoter elements.
  • CpG dinucleotide methylation: In mammals, CpG sites are targets for methylation and subsequent deamination, leading to mutational hotspots.
  • Codon pair bias: Adjacent codon combinations affect translation speed and accuracy beyond individual codon frequency.
  • mRNA half-life: Sequence composition influences mRNA decay rates via AU-rich elements or other destabilizing motifs.

This is why codon optimization is a critical step in synthetic biology and heterologous gene expression.
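The simplest form of codon optimization is a greedy reverse-translation that picks each amino acid's most-preferred codon. A minimal sketch (the `PREFERRED` table is a tiny hypothetical excerpt, not any real organism's usage table, and real optimizers must also avoid the cryptic signals, secondary structure, and GC extremes listed above):

```python
# Hypothetical most-preferred codons for a handful of amino acids.
PREFERRED = {
    "M": "ATG", "K": "AAG", "L": "CTG", "F": "TTC", "*": "TAA",
}

def naive_optimize(peptide: str) -> str:
    """Reverse-translate a peptide using each residue's top codon."""
    return "".join(PREFERRED[aa] for aa in peptide)

print(naive_optimize("MKLF*"))  # ATGAAGCTGTTCTAA
```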


Dr. LeProust’s Questions

Q1: Most Commonly Used Method for Oligo Synthesis

Phosphoramidite chemistry on controlled-pore glass (CPG) solid supports, performed in the 3’→5’ direction. Developed by Marvin Caruthers in the early 1980s, this method remains the standard for commercial oligonucleotide synthesis.


Q2: Why Is It Difficult to Make Oligos Longer Than 200 nt?

The fundamental problem is compounding coupling inefficiency. Even with an excellent per-step coupling efficiency of ~99.5%, the yield of full-length product drops exponentially with length: overall yield ≈ (step efficiency)^(number of couplings), so a 200-mer requires 199 couplings and gives 0.995¹⁹⁹ ≈ 37% full-length product, before any other losses.
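The compounding loss can be tabulated for a few lengths (a sketch using the 99.5% per-step efficiency quoted above):

```python
# Full-length yield after (n - 1) couplings at a fixed per-step
# coupling efficiency: pure compounding arithmetic, not a process model.
def full_length_yield(n_nt: int, step_eff: float = 0.995) -> float:
    return step_eff ** (n_nt - 1)

for n in (50, 100, 200):
    print(f"{n:>4} nt: {full_length_yield(n):.1%} full-length")
```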

Beyond ~200 nt, the full-length product becomes a minority species in a sea of truncation products. Additional failure modes compound the problem:

  • Depurination accumulates with each acid-catalyzed detritylation step, creating abasic sites.
  • Branching and deletion mutations increase with sequence length.
  • Steric hindrance: Synthesis is usually performed on solid supports such as controlled-pore glass (CPG). As the oligonucleotide grows longer, it can clog the pores of the support, inhibiting diffusion of reagents to the reactive 5’-end and decreasing coupling efficiency.
  • Purification becomes intractable: it is nearly impossible to separate the target sequence from similarly sized failure sequences (n−1 or n−2 products).

Q3: Why Can’t You Make a 2000 bp Gene via Direct Oligo Synthesis?

At 99.5% coupling efficiency, a 2000-mer requires 1999 couplings, so the overall yield of full-length product would be 0.995¹⁹⁹⁹ ≈ 0.004%. Even at an optimistic 99.7% per step, the yield would be only ~0.25%.

Failure sequences: The vast majority of the product in a 2000 bp synthesis would be truncated sequences capped at the growing end, making them extremely difficult to separate from the desired full-length product.

So you would recover essentially zero full-length product. The synthesis would just yield a soup of truncated fragments.
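A quick check of the arithmetic at both per-step efficiencies quoted above:

```python
# Full-length yield for a 2000-mer (1999 couplings) at two per-step
# coupling efficiencies: illustrative compounding arithmetic only.
yields = {eff: eff ** 1999 for eff in (0.995, 0.997)}

for eff, y in yields.items():
    print(f"{eff:.1%} per step -> {y:.4%} full-length")
```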


Prof. Church’s Questions

Q1: The 10 Essential Amino Acids & the “Lysine Contingency”

The 10 essential amino acids (those that animals cannot synthesize and must obtain from diet):

| # | Amino Acid | 3-Letter | 1-Letter |
|---|---|---|---|
| 1 | Histidine | His | H |
| 2 | Isoleucine | Ile | I |
| 3 | Leucine | Leu | L |
| 4 | Lysine | Lys | K |
| 5 | Methionine | Met | M |
| 6 | Phenylalanine | Phe | F |
| 7 | Threonine | Thr | T |
| 8 | Tryptophan | Trp | W |
| 9 | Valine | Val | V |
| 10 | Arginine* | Arg | R |

*Arginine is semi-essential: required during growth and stress, but synthesizable in limited quantities by adults.
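The one-letter codes in the table make a convenient lookup set, e.g. for asking what fraction of a peptide's residues an animal cannot synthesize itself (the peptide string below is arbitrary example data):

```python
# One-letter codes for the 10 essential amino acids from the table
# above (Arg included as semi-essential).
ESSENTIAL = set("HILKMFTWVR")

def essential_fraction(peptide: str) -> float:
    """Fraction of residues that must come from the diet."""
    return sum(aa in ESSENTIAL for aa in peptide) / len(peptide)

print(f"{essential_fraction('MKTAYIAKQR'):.0%}")  # 60%
```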

The “Lysine Contingency” (from Jurassic Park): As a plot device for a biological “kill switch,” the park engineered its dinosaurs to be lysine-deficient, so the animals would die without exogenous lysine supplementation.

But this would not actually work as a real-world biocontainment strategy:

  • All vertebrates are already lysine-auxotrophs. Lysine is essential for every animal on the planet. Making the dinosaurs “lysine-dependent” is no different from their natural state.
  • Lysine is abundant in the environment. Meat, fish, insects, and many plants are rich in lysine. Any escaped dinosaur with a carnivorous or omnivorous diet could get plenty of lysine from its food.
  • A true contingency would require dependence on something genuinely unavailable in the wild, or at least not present at useful levels in the natural environment. A synthetic or unnatural cofactor, or an engineered dependence on a hormone such as insulin, would be a far more realistic approach.

Q3: The DARPA GO project

This is an exceedingly interesting mission, as it seems to require template-free nucleotide synthesis with orthogonally light-activated, polymerase-like complexes for each nucleotide, or perhaps a highly responsive complex differentially activated by wavelength or pulse pattern. I wonder if it could be a large multi-subunit complex whose activation is under optogenetic control by a secondary system, and whether that would be fast enough.

It’s a very cool problem, and I am still deep in the rabbit hole of it; if you have recommended papers on this, do send them my way!