Exam 2: FINAL: Topics 14-23
NUCLEIC ACIDS
Unit C
NUCLEIC STRUCTURE
Topic 14
Nitrogenous Bases
- two types of bases
1. Purine
- adenine
- guanine
2. Pyrimidine
- cytosine
- thymine/uracil (differ only by a methyl group)
- the ring structures of all nitrogenous bases are essentially flat
- flat surfaces of the bases are relatively hydrophobic
- edges carry groups that can form hydrogen bonds
Nucleoside
- a molecule in which a nitrogenous base is attached to a sugar
- the base is attached at the anomeric carbon
- two sugars used are ribose and deoxyribose
Nucleotides
- add one or more phosphate groups to a nucleoside to obtain a nucleotide
- up to 3 phosphates are attached to the 5' carbon
Nomenclature
Base | Nucleoside | Nucleotide (monophosphate) |
Adenine (A) | Adenosine | Adenosine monophosphate (AMP) |
Cytosine (C) | Cytidine | Cytidine monophosphate (CMP) |
Guanine (G) | Guanosine | Guanosine monophosphate (GMP) |
Thymine (T) | Thymidine | Thymidine monophosphate (TMP) |
Uracil (U) | Uridine | Uridine monophosphate (UMP) |
Polynucleotides
- individual nucleotides can be joined together to make polynucleotides
- individual nucleotides are joined together by a single phosphate that connects the 5' carbon of one nucleotide to the 3' carbon of the next
Phosphodiester bond
- 5' end - the 5' carbon of the terminal sugar is not attached to another nucleotide
- 3' end - the 3' carbon of the terminal sugar is not attached to another nucleotide
Watson-Crick Base Pairing
- hydrogen-bonding groups on the edges can be brought together in different combinations
- this allows bases to form base pairs
- adenine-thymine and guanine-cytosine pairs have complementary H-bond groups
- each base pair contains one purine and one pyrimidine
- bases can also come together in other combinations and orientations to form pairs with at least two H bonds
- Watson-Crick base pairing is the most common in DNA
- assume all base pairing in DNA is Watson-Crick
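A minimal Python sketch of Watson-Crick pairing, assuming DNA sequences are plain strings of A/C/G/T written 5' to 3'. Because the two strands of a double helix are antiparallel, the partner strand read 5' to 3' is the reversed complement. The H-bond counts (2 per A-T pair, 3 per G-C pair) are standard values not stated in these notes, and the function names are illustrative only.

```python
# Watson-Crick partners in DNA
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# H bonds contributed by each base's pair (A-T = 2, G-C = 3)
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def complementary_strand(seq):
    """Return the complementary strand, also written 5'->3'.

    The strands are antiparallel, so the complement is read in reverse.
    """
    return "".join(PAIR[base] for base in reversed(seq.upper()))

def duplex_h_bonds(seq):
    """Count Watson-Crick H bonds in the duplex formed by seq and its complement."""
    return sum(H_BONDS[base] for base in seq.upper())

print(complementary_strand("ATGCGT"))  # ACGCAT
print(duplex_h_bonds("ATGCGT"))        # 15 (2+2+3+3+3+2)
```

Taking the complement of the complement returns the original sequence, which is a quick way to check the reversal is handled correctly.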
DNA Double Helix
- two polynucleotides with complementary sequences come together, forming H bonds between the bases of the two strands
- no matter what the sequence, the sugar-phosphate backbones stay the same distance apart
- if the backbones were completely straight, base pairs would be held too far apart for their flat surfaces to touch each other
- therefore the backbones curve into a double helix
- this brings the surfaces of the base pairs into contact with each other
- base pairs are on the inside of the double helix and the sugar-phosphate backbones are on the outside
- the helix is right-handed
- be able to tell the difference between left- and right-handed helices
B-DNA Double Helix
- base pairs are almost, but not quite, perpendicular to the long helix axis
- stacked bases come into contact with each other
- sugar-phosphate groups are on the outside
- this avoids bringing the negative phosphate groups into close contact
- the two strands are antiparallel
- one runs 5' to 3' and the other 3' to 5'
- each complete turn of the helix contains about 10 base pairs and spans 34 Å (3.4 nm)
- double helix diameter - 20 Å (2.0 nm)
- sugar-phosphate backbones form ridges that spiral around the center of the double helix
- the spaces between ridges, where base pairs are exposed to solvent, are called grooves
- one groove is wider than the other
- wider groove - major groove
- narrower groove - minor groove
- proteins can interact with the edges of the bases in both grooves, but the bases are more easily accessed through the major groove
- the diameter of a protein α-helix is just right to fit into the major groove
- proteins make base-specific contacts in the major groove more often than in the minor groove
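A quick worked check of the B-DNA numbers above, using the notes' rounded values of 10 bp and 3.4 nm per turn (so 0.34 nm per base pair). This is an idealized model, and `helix_geometry` is a hypothetical helper name.

```python
BP_PER_TURN = 10                        # rounded value from the notes
NM_PER_TURN = 3.4                       # 34 angstroms per complete turn
NM_PER_BP = NM_PER_TURN / BP_PER_TURN   # rise of 0.34 nm per base pair

def helix_geometry(n_bp):
    """Return (turns, length in nm) of an idealized B-DNA helix of n_bp base pairs."""
    return n_bp / BP_PER_TURN, n_bp * NM_PER_BP

turns, length_nm = helix_geometry(146)  # DNA in one nucleosome core particle
print(f"{turns:.1f} turns, {length_nm:.1f} nm of B-DNA")  # 14.6 turns, 49.6 nm of B-DNA
```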
Reason for Major and Minor Grooves
- sugar-phosphate backbones are not symmetrically oriented relative to the base pairs
Factors Stabilizing Double Helix (DH)
1. Base Stacking*
- major force holding the double helix together
- acts between the flat surfaces of stacked base pairs
- because of the twist of the double helix, base pairs don't overlap completely on top of each other
- they do partially overlap
- pi electron clouds of the bases form transient dipoles that attract each other (London forces)
- stacking interactions hide the hydrophobic surfaces of the bases from water and are the strongest forces holding the helix together
2. Base Pairing*
- H bonds within base pairs contribute to the stability of the helix
- the helix is most stable if the two strands are perfectly complementary, which maximizes the number of H bonds that can be made
*RNA-DNA hybrid helices adopt the A-DNA form, not B-DNA*
*These forces apply to RNA helices too*
Nucleic Acid Hybridization
- during natural biochemical processes (replication, transcription) the strands of DNA must separate
- denaturation must happen
- then the strands come back together (renature)
- many biotechnology techniques rely on denaturing and renaturing DNA
- when DNA is heated to a high temperature, the non-covalent interactions between strands are broken and the strands separate
- denaturation of DNA can be followed by monitoring how strongly a DNA solution absorbs light at a wavelength of 260 nm
- single-stranded DNA absorbs light more strongly
- as DNA denatures, absorbance increases
- the change in absorbance is not instantaneous
- it occurs over a range of temperatures
- the temperature at which DNA is 50% double stranded (native) and 50% single stranded (melted) is the melting temperature (Tm)
- the process is reversible
- if DNA is cooled, the two strands can anneal to each other (renature)
- if single-stranded DNA is cooled too quickly, bases on one strand might not find their proper base-pairing partners on the other strand
- this results in a jumbled mess of improperly annealed DNA
- for proper base pairing, cool DNA slowly, or at first to a temperature a bit below the melting temperature
- this allows improperly base-paired strands to separate from each other now and then, giving them a chance to find their correct base-pairing partners
- we often want to anneal small fragments of DNA to each other or to longer pieces of single-stranded DNA
- DNA synthesis machines have been invented that make it easy to produce oligonucleotides several dozen bases in length
- long sections of double-stranded DNA can be denatured and then cooled in the presence of shorter synthetic oligonucleotides
- some of the shorter ones will anneal to the longer single-stranded DNA
- the process also works between DNA and RNA
- if you synthesize DNA that is complementary to an RNA molecule, you can create a DNA-RNA hybrid
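The Tm definition above (50% native, 50% melted) can be applied to absorbance data with a short sketch. It assumes readings are taken at increasing temperatures, that the first reading is fully double stranded and the last fully melted, and it linearly interpolates between the two readings that bracket 50% melted. The readings are made-up example values.

```python
def melting_temperature(temps, a260):
    """Estimate Tm as the temperature at which the sample is 50% melted.

    Assumes temps increase, a260[0] is fully double stranded and
    a260[-1] is fully single stranded.
    """
    low, high = a260[0], a260[-1]
    # fraction melted: 0 = native double helix, 1 = fully single stranded
    fraction = [(a - low) / (high - low) for a in a260]
    for i in range(1, len(temps)):
        f0, f1 = fraction[i - 1], fraction[i]
        if f0 <= 0.5 <= f1:
            if f1 == f0:
                return temps[i - 1]
            # linear interpolation between the two bracketing readings
            return temps[i - 1] + (0.5 - f0) * (temps[i] - temps[i - 1]) / (f1 - f0)
    raise ValueError("curve never crosses 50% melted")

# hypothetical readings: absorbance rises as the strands separate
temps = [60, 65, 70, 75, 80, 85]
a260 = [1.00, 1.02, 1.10, 1.26, 1.36, 1.40]
print(round(melting_temperature(temps, a260), 3))  # 73.125
```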
Factors Affecting Tm
- major factors that influence the temperature at which oligonucleotides anneal to a longer nucleic acid
Factors | Higher Tm |
1. Length of oligonucleotide | Longer |
2. GC Content: GC pairs contribute more base stacking and H bonding (3 H bonds vs 2 for AT) | GC-Rich |
3. Degree of Complementarity | No Mismatches |
4. Salt Concentration: increases the polarity of the environment, making it less favourable for hydrophobic base surfaces to be exposed (melted) | High Salt |
5. Organic Solvent: decreases polarity, making single strands more favourable | No Organics |
6. H-bonding Compounds: urea and formamide interfere with base pairing (destabilize the helix) | No Compounds |
7. pH: extreme pH changes the protonation states of the bases, hindering their ability to base pair | Neutral pH |
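For short oligonucleotides, a common rule of thumb (often called the Wallace rule, not part of these notes) estimates Tm as 2 °C per A or T plus 4 °C per G or C. It only captures the length and GC-content factors from the table; salt, organic solvents, H-bonding compounds, pH and mismatches are all ignored, so treat it as a rough sketch.

```python
def wallace_tm(oligo):
    """Rough Tm (deg C) of a short DNA oligo: 2 per A/T plus 4 per G/C.

    Rule of thumb only; ignores salt, solvent, pH and mismatches.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc

print(wallace_tm("ATATATATAT"))    # 20 (AT-rich)
print(wallace_tm("GCGCGCGCGC"))    # 40 (GC-rich, same length)
print(wallace_tm("ATATATATATAT"))  # 24 (longer)
```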
Differences Between RNA and DNA
1.
- RNA contains ribose
- DNA contains 2'-deoxyribose
- the extra 2' OH on ribose prevents double helices containing RNA from adopting the B-DNA structure
- RNA forms double helices, but their geometry is not the same
- the 2' OH makes RNA unstable in solution at high pH
- because the OH can be deprotonated and attack the backbone phosphate, breaking the backbone
2.
- RNA contains Uracil
- DNA contains Thymine
3.
- RNA molecules in cells are much shorter than DNA
- typically a few thousand nucleotides
- DNA molecules are hundreds of thousands to millions of nucleotides long
4.
- DNA exists as complementary base-paired strands, while RNA is synthesized as a single strand
- areas of base pairing form within the RNA strand
- the sequence (encoded by genes) dictates which areas pair
- this strongly influences the 3D structure
5.
- RNA bases are modified more frequently
- DNA has just a few modifications
RNA Structure
tRNA
- example of specific folding
- three stem-loop structures: self-complementary regions separated by a region that is not base paired
- stem loops are very common in RNA
- base-paired stem loops make up the "secondary structure" of RNA
- 3D folding - "tertiary structure"
- tRNA adopts an L shape (hair-dryer)
- unusual symbols in tRNA diagrams = modified bases
- secondary and tertiary structures of longer RNAs are more complicated
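Stem loops form where a stretch of RNA is complementary to a downstream stretch read in the opposite direction. The sketch below searches for such a stem, assuming Watson-Crick pairs only (A-U, G-C; no G-U wobble), a fixed stem length and a minimum loop of 3 unpaired bases. These parameters and the function name are illustrative assumptions; real secondary-structure prediction is far more involved.

```python
RNA_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def find_stem_loop(rna, stem=4, min_loop=3):
    """Return (i, j) where rna[i:i+stem] pairs with rna[j:j+stem] folded
    back on itself (antiparallel), with at least min_loop unpaired bases
    between the two arms; None if no such stem exists.
    """
    n = len(rna)
    for i in range(n - 2 * stem - min_loop + 1):
        for j in range(i + stem + min_loop, n - stem + 1):
            # 5' arm read left to right pairs with the 3' arm read right to left
            if all((rna[i + m], rna[j + stem - 1 - m]) in RNA_PAIR for m in range(stem)):
                return i, j
    return None

print(find_stem_loop("GGACUUCGGUCC"))  # (0, 8): GGAC pairs with GUCC, loop UUCG
```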
rRNA
- complex
- 1500+ nucleotides
- many stem loops and bulges
- regions very far apart in the sequence come together
- the sequence of an RNA strongly influences its overall shape
- DNA usually has the same structure regardless of sequence
DNA must be condensed but accessible
- the human genome is 3 billion+ base pairs
- each base pair is 0.34 nm thick
- 3 × 10^9 bp × 0.34 nm/bp ≈ 1.02 × 10^9 nm = 102 cm (total)
- and there are two copies of each chromosome
- nucleus diameter is about 6 μm = 6 × 10^-4 cm
- DNA must be condensed yet accessible for transcription
- highly ordered system
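The packing arithmetic above, written out as a short calculation. All values come from the notes; the diploid total simply doubles the haploid length because there are two copies of each chromosome.

```python
BP_HAPLOID = 3e9    # human genome, base pairs
NM_PER_BP = 0.34    # thickness (rise) of one base pair in B-DNA
NUCLEUS_CM = 6e-4   # nucleus diameter, 6 um

haploid_cm = BP_HAPLOID * NM_PER_BP * 1e-7  # 1 nm = 1e-7 cm
diploid_cm = 2 * haploid_cm                 # two copies of each chromosome
ratio = diploid_cm / NUCLEUS_CM             # DNA length vs nucleus diameter

print(f"haploid: {haploid_cm:.0f} cm, diploid: {diploid_cm:.0f} cm")  # 102 cm, 204 cm
print(f"DNA is ~{ratio:,.0f} times longer than the nucleus is wide")  # ~340,000
```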
Heterochromatin: highly packed DNA
Euchromatin: less condensed regions
- transcription more likely takes place here
Chromatin: complex of chromosomal DNA plus all associated packing proteins
Model Chromosome Packing
- basic unit of packing - the nucleosome core particle
- consists of DNA wrapped around an octamer of eight histone proteins
- 4 types of histones, each present in 2 copies
- 1. H2A
- 2. H2B
- 3. H3
- 4. H4
- H1 exists but is not part of the octamer
- histones are small proteins, each 100-200 amino acids long
Histones Rich in Arginine and Lysine
- both occur more often than in typical proteins
- both are positively charged at neutral pH
- this makes them well suited to interact with the negatively charged backbone of DNA
Conservation of Histones
- histones show a high degree of conservation through evolution
- sequences barely differ between humans, flies and yeast
- the few differences are mostly conservative substitutions (the new amino acid is chemically similar to the original)
- this degree of conservation is unusual for proteins but common for histones
- conserved residues carry important functions
- nearly every position in histone proteins is important for function
Nucleosome Core Particles
- crystal structures of DNA wrapped around the histone octamer show that 146 base pairs of DNA make 1.65 turns around the outside of the protein complex
- most of each histone protein is compact within the structure, but the ends (tails) of some histones protrude from the middle of the nucleosome core particle
- these ends are important in the regulation of gene expression
- packing DNA into nucleosomes tends to make promoter regions inaccessible, so transcription of genes is generally inhibited by the presence of nucleosome core particles
Beads on String
- if chromatin is treated so that all proteins except the histone octamers are released
- the DNA looks like a series of beads on a string
- each bead is a nucleosome core particle (NCP)
- the string is linker DNA: regions of naked DNA that exist between NCPs
Nucleosome = NCP + one linker
- linker regions range in length from 3-80 base pairs
- each nucleosome is about 200 base pairs on average
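A rough worked estimate from the numbers above, assuming (hypothetically) that the whole haploid genome is packed into nucleosomes with the average 200 bp repeat.

```python
CORE_BP = 146     # DNA wrapped around each histone octamer
REPEAT_BP = 200   # average nucleosome: core particle + linker
GENOME_BP = 3e9   # haploid human genome

linker_bp = REPEAT_BP - CORE_BP         # average linker, within the 3-80 bp range
nucleosomes = GENOME_BP / REPEAT_BP     # nucleosomes per haploid genome
fraction_wrapped = CORE_BP / REPEAT_BP  # share of DNA wrapped on histone cores

print(linker_bp, f"{nucleosomes:.1e}", fraction_wrapped)  # 54 1.5e+07 0.73
```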
30nm Chromatin Fiber
- requires H1
- one H1 binds to each NCP
- it binds at the point where the double helix leaves the core particle, controlling the angle of the DNA relative to the core particle
- H1 allows the nucleosome core particles to be organized into a regular structure that is 30 nm thick
Higher Order Packaging
- beyond the 30 nm fiber, structure is less regular
- 30 nm fibers form loops ranging in size from about 30 000 to 200 000 base pairs
- loops are organized by the nuclear matrix (a scaffold containing RNA and proteins)
DNA REPLICATION
Topic 15
- DNA replication is very important
- DNA replication is semiconservative
- each double helix produced contains one strand from the original double helix and one newly synthesized strand
- each original strand is a template that is copied to make a new complementary strand
- the sequence of one strand dictates what the sequence of the other should be
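A toy model of semiconservative replication: each double helix is a pair of strand labels, and in every round each strand is kept and paired with a newly made strand. The labels ("old" for an original parental strand, "new" for any strand made during replication) and the helper names are illustrative only.

```python
def replicate(helices):
    """One round of semiconservative replication.

    Each helix is a (strand, strand) tuple. Every existing strand is kept
    intact and becomes the template for one newly synthesized strand.
    """
    return [(strand, "new") for helix in helices for strand in helix]

def count_with_original(helices):
    """Number of helices that still contain an original parental strand."""
    return sum("old" in helix for helix in helices)

helices = [("old", "old")]  # the original double helix
for generation in range(1, 4):
    helices = replicate(helices)
    print(generation, len(helices), count_with_original(helices))
# 1 2 2
# 2 4 2
# 3 8 2
```

The two original strands are never broken up or lost; they simply end up in different helices, which is what "semiconservative" means.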
DNA Replication Four Steps
- Initiation
- Priming
- DNA Synthesis - proofreading
- Ligation
Initiation in E. coli
- bacteria like E. coli have small circular genomes
- E. coli has 1 replication origin with a defined sequence that is recognized by a set of initiator proteins
- when it is time to begin replication, initiator proteins separate the two strands in an AT-rich region
- AT-rich regions are easier to pull apart because A-T pairs have fewer H bonds
- this forms a replication bubble
- the points at which double-stranded DNA becomes single stranded are called replication forks
- after the region is unwound, single-stranded DNA-binding proteins (SSBPs) associate with the DNA to keep the strands separate
- one copy of helicase binds at each replication fork
- Helicase Function
- separates the strands further in each direction, moving the replication fork along the DNA and enlarging the replication bubble
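To illustrate why initiation favours AT-rich DNA, here is a sliding-window scan that finds the most AT-rich stretch of a sequence. This is a sketch only: real origins are recognized by initiator proteins binding defined sequences, not by this calculation, and the window size and example sequence are made up.

```python
def most_at_rich_window(seq, window):
    """Return (start index, AT fraction) of the most AT-rich window."""
    seq = seq.upper()
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])  # AT count in the first window
    best_start, best_count = 0, count
    for start in range(1, len(seq) - window + 1):
        # slide by one base: add the entering base, drop the leaving base
        count += is_at[start + window - 1] - is_at[start - 1]
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count / window

print(most_at_rich_window("GCGCGCATATTATAGCGCGC", 8))  # (6, 1.0)
```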
Initiation in Eukaryotes
- eukaryotes generally have more DNA than bacteria, and multiple chromosomes
- for DNA replication to be completed in the proper time, multiple replication origins are needed
- replication can happen in many places at the same time
- sequences of replication origins in eukaryotes are generally less well defined than those of bacteria
- it is hard to locate eukaryotic replication origins just by analyzing DNA sequence
- helicase binds replication origins along with initiator proteins
- upon activation, helicase and initiator proteins separate the strands at the origin to create a replication bubble
- many copies of SSBPs also exist in eukaryotes to prevent DNA from re-annealing
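A back-of-the-envelope sketch of why multiple origins matter. Each origin opens a bubble with two forks moving apart, so n origins give 2n forks sharing the work. It assumes evenly spaced origins that all fire at once and a constant fork speed; the fork speed and origin counts below are hypothetical illustrative values, not measurements.

```python
def replication_time_s(genome_bp, n_origins, fork_speed_bp_s):
    """Idealized time (s) to copy a genome when all origins fire at once.

    Each origin opens one bubble with two forks moving in opposite
    directions, so n_origins give 2 * n_origins forks sharing the work.
    """
    return genome_bp / (2 * n_origins * fork_speed_bp_s)

# hypothetical fork speed of 50 bp/s, same in both cases
one_origin = replication_time_s(3e9, 1, 50)
many_origins = replication_time_s(3e9, 30_000, 50)
print(f"1 origin: ~{one_origin / 86400:.0f} days; 30 000 origins: ~{many_origins / 60:.0f} minutes")
```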
After initiation structure
- after initiation, eukaryotes and bacteria have essentially the same structure
- a replication bubble with helicase at both replication forks
- ready to use the separated strands as templates for DNA synthesis
Priming
- during DNA replication, an existing strand of DNA is used as a template that dictates which nucleotides are joined together in the new strand
- DNA polymerase is unable to begin a new strand where no strand exists
- the enzyme can add nucleotides to the end of an existing strand but cannot join the first few nucleotides together to start a new strand
- to start synthesis, an enzyme called primase creates a short RNA strand from individual nucleotides, using DNA as the template
- the primer is 10-20 nucleotides in length
- primase is not very accurate and often incorporates the wrong base (not a problem because the RNA primer is later replaced by DNA)
- after the primer is made, DNA polymerase adds nucleotides to the 3' end of the primer
- because the 5' end of the new strand is made first and nucleotides are added at the 3' end
- synthesis proceeds 5' to 3'
- DNA synthesis is always 5' to 3'
DNA polymerase
- all DNA polymerases have similar structures
- two bacterial polymerases are commonly shown
- "palm" region
- contains the catalytic site for DNA synthesis
- DNA sits in a cleft between the fingers and thumb regions
- incoming nucleotides bind to the fingers, and the thumb helps hold the DNA in place
- DNA polymerase by itself tends to dissociate from the template during synthesis
Sliding Clamp
- to keep DNA polymerase from dissociating and make sure it stays on the template, a protein complex is used
- the sliding clamp
- it binds to the polymerase while encircling the double-stranded DNA created by the polymerase
- DNA passes through the hole in the sliding clamp; the clamp is said to make DNA polymerase more processive, making it possible for the polymerase to synthesize very long DNA strands without dissociating from the template
DNA Synthesis
- nucleotides added to 3' end of growing strand
- deoxynucleoside triphosphates (dNTPs) bind at the active site in a position to base pair with the base on the template strand
- if the structure of the base pair is wrong, catalysis will usually not occur and the dNTP is released
- if the base pair is correct, the 3' OH on the end of the strand attacks the first (α) phosphate of the incoming nucleotide, removing pyrophosphate from the nucleotide
- and forming a new phosphodiester bond that incorporates the incoming base into the strand
- the polymerase moves one base along the template and uses the 3' OH on the new nucleotide to add the next one
- release of pyrophosphate in this reaction is energetically favourable
- so the energy that drives DNA synthesis forward comes from cleaving pyrophosphate off the nucleoside triphosphate substrate
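The 5' to 3' rule can be sketched with strings. The template is written 3' to 5' so that position i pairs with position i of the new strand, and bases are only ever appended to the 3' (right) end of a primer. DNA letters are used for the primer for simplicity (a real primer is RNA), and the function name is hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_primer(template_3to5, primer_5to3):
    """Extend a primer annealed to the start of a template.

    template_3to5 is written 3'->5', so template position i pairs with
    position i of the new strand (written 5'->3'). Nucleotides are only
    ever appended to the 3' end (right end) of the growing strand.
    """
    for i, base in enumerate(primer_5to3):
        if PAIR[template_3to5[i]] != base:
            raise ValueError(f"primer does not pair with template at position {i}")
    new_strand = primer_5to3
    for template_base in template_3to5[len(primer_5to3):]:
        # incoming dNTP pairs with the template base; PPi is released
        new_strand += PAIR[template_base]
    return new_strand

print(extend_primer("TACGGATC", "ATG"))  # ATGCCTAG
```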
Proofreading
- occasionally DNA polymerase makes a mistake
- it can incorporate a mismatched base, insert more bases than it should, or skip a base
- any of these mistakes makes the structure of the DNA double helix abnormal
- this causes DNA polymerase to stall or pause
- often the incorrect base(s) dissociate from the template and enter a nearby catalytic site on DNA polymerase with 3'-5' exonuclease activity
- nuclease - an enzyme that degrades nucleic acids; there are two main types
- Exonucleases - cut nucleotides off the end of a nucleic acid strand one at a time
- Endonucleases - cut the sugar-phosphate backbone in the middle of a strand
- the exonuclease activity of DNA polymerase is a proofreading function that allows the enzyme to correct its mistakes
- after the misincorporated nucleotide has been removed, the growing strand can re-enter the DNA synthesis active site and try again
- the proofreading function is not perfect
- sometimes misincorporated bases remain after DNA synthesis
- DNA replication produces an incorrect base about once every 10^6 to 10^7 bases synthesized
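A toy model of exonuclease proofreading, followed by the error-rate arithmetic. The first part assumes a hypothetical map of bases the polymerase gets wrong on its first attempt and that the retry after excision is always correct. The second part multiplies the per-base error rates given in these notes (10^-6 to 10^-7, and 10^-9 after mismatch repair) by a 3 × 10^9 bp genome.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize(template_3to5, first_tries, proofread=True):
    """Toy model of DNA synthesis with 3'-5' exonuclease proofreading.

    first_tries: hypothetical {position: base} map of bases inserted on
    the first attempt; unlisted positions are inserted correctly.
    Simplification: the retry after excision is always correct.
    Returns (new strand 5'->3', number of bases removed by the exonuclease).
    """
    strand, excised = [], 0
    for i, template_base in enumerate(template_3to5):
        correct = PAIR[template_base]
        strand.append(first_tries.get(i, correct))
        if proofread and strand[-1] != correct:
            strand.pop()            # mispaired 3' end moves to the exonuclease site and is cut off
            excised += 1
            strand.append(correct)  # back in the polymerase site: try again
    return "".join(strand), excised

print(synthesize("TACGGATC", {2: "T"}))                   # ('ATGCCTAG', 1)
print(synthesize("TACGGATC", {2: "T"}, proofread=False))  # ('ATTCCTAG', 0)

# what the per-base error rates in the notes mean for a 3e9 bp genome
GENOME_BP = 3e9
for label, rate in [("1 in 10^6", 1e-6), ("1 in 10^7", 1e-7),
                    ("1 in 10^9 (after mismatch repair)", 1e-9)]:
    print(f"{label}: ~{GENOME_BP * rate:,.0f} errors per replication")
```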
Some Antivirals Target Viral DNA Synthesis
- some drugs target the viral enzymes that carry out replication
- acyclovir and zidovudine (AZT) both work by the same mechanism
- AZT resembles a nucleoside except it has an N3 (azide) group at the 3' carbon in place of the 3' OH
- acyclovir looks a bit like a nucleoside
- except the 2' and 3' carbons are completely missing
- when our cells take up these compounds, they phosphorylate them on the -OH of the carbon that is in the position of the 5' carbon of deoxyribose
- the viral DNA polymerase then uses the drug during DNA synthesis
- when either compound is incorporated into a DNA strand, replication must stop because there is no 3' OH onto which the next nucleotide can be added
- these are called chain terminators
- they prevent the virus from copying its genome and stop its proliferation in our cells
- human DNA polymerases are more selective and do not incorporate these compounds nearly as frequently when copying our own genome
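Chain termination can be sketched as a loop that stops extending once an analog without a 3'-OH is incorporated. The positions where the drug is used are hypothetical inputs; this models only the logic described above, not drug chemistry or polymerase selectivity.

```python
def extend_until_terminated(bases_to_add, analog_positions):
    """Toy model of chain termination by a nucleoside analog drug.

    bases_to_add: nucleotides the viral polymerase would add, in order (5'->3').
    analog_positions: hypothetical positions where the phosphorylated drug
    is incorporated instead of the normal nucleotide.
    Returns the strand made before synthesis stops; '*' marks the analog.
    """
    strand = []
    for i, base in enumerate(bases_to_add):
        if i in analog_positions:
            strand.append(base + "*")  # analog incorporated in place of this base
            break                      # no 3'-OH, so no further phosphodiester bond
        strand.append(base)
    return "".join(strand)

print(extend_until_terminated("ATGCCTAG", {3}))    # ATGC*
print(extend_until_terminated("ATGCCTAG", set()))  # ATGCCTAG
```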
DNA Synthesis: Leading and Lagging Strands
- 5' ends of strands cannot be extended
- as helicase unwinds the double helix to make more single-stranded DNA
- new primers are periodically created on the problematic template strand
- as more and more DNA is unwound, the replication fork moves along the DNA toward the still-zipped DNA
- on one strand, DNA polymerase simply follows helicase, synthesizing DNA in the 5' to 3' direction
- synthesis of the new strand whose 5' to 3' direction matches fork movement is continuous; this is the leading strand
- DNA synthesis on the other strand proceeds in the direction opposite to the one in which helicase is moving
- as helicase unwinds DNA, it exposes single-stranded template that can't be copied using the first primer that was made
- therefore new primers are made on this strand that allow DNA polymerase to synthesize the complementary strand
- the process is repeated many times, resulting in discontinuous DNA synthesis along the lagging strand
- the result is many relatively short pieces of nucleic acid base paired to the template strand
- these pieces are called Okazaki fragments
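A toy layout of lagging-strand synthesis, assuming (hypothetically) a fixed fragment spacing. Positions are measured along the lagging-strand template from the origin toward the fork; each fragment starts at a primer near the fork and is extended back toward the origin until it reaches the previous fragment's primer, where a nick is left.

```python
def okazaki_fragments(unwound_bp, spacing):
    """Toy layout of lagging-strand synthesis.

    Each time helicase exposes `spacing` more bases, primase lays a primer
    near the fork and DNA polymerase extends it back toward the origin
    (away from the fork) until it meets the previous fragment.
    Returns fragments in the order made, as (primer_position, end_position).
    `spacing` is a hypothetical fixed fragment length.
    """
    fragments = []
    previous_primer = 0
    for fork_position in range(spacing, unwound_bp + 1, spacing):
        fragments.append((fork_position, previous_primer))
        previous_primer = fork_position
    return fragments

print(okazaki_fragments(1000, 250))
# [(250, 0), (500, 250), (750, 500), (1000, 750)]
```

By contrast, the leading strand in this picture would be a single fragment running from the origin to the fork.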
E. coli Replication Fork
- helicase unwinds DNA moving left to right (as drawn)
- DNA polymerase III (DNA pol III) does most of the DNA synthesis
- DNA pol III makes the new polynucleotide on each template
- the sliding clamp helps keep DNA polymerase associated with the template
- on the leading strand, DNA synthesis is easy because the new strand is made in the same direction as helicase is moving
- on the lagging strand, primase must repeatedly create new primers so that DNA polymerase can make a strand complementary to the template unwound by helicase
- helicase, primase and DNA polymerase (2-3 copies) are part of a large protein complex at the replication fork
- the lagging-strand DNA polymerase starts out bound to helicase as well
- sometimes it remains bound while making DNA, but sometimes it is released from helicase
- sometimes a third copy of DNA polymerase binds to helicase, ready to start synthesis on the lagging strand when the next primer is ready
- single-stranded binding proteins (SSBPs) keep the DNA from re-annealing or forming stem-loop structures
- SSBPs bind loosely enough that DNA polymerase can push them out of the way when it uses the template to synthesize a new strand
- DNA polymerase I replaces the RNA primers and DNA ligase seals the remaining nicks
Lagging Strand Synthesis Discontinuous
- results in a series of Okazaki fragments that must be joined (ligated) together to create a single strand complementary to the original
- the RNA primer that starts each Okazaki fragment must be removed
- removal of primers is accomplished differently in bacteria and eukaryotes
- Bacteria
- DNA pol III is the enzyme that adds DNA to the 3' end
Ligation in Bacteria
- DNA pol III adds DNA to the 3' end; when it meets the primer from the previous Okazaki fragment, it dissociates
- leaving behind a nick (a break in one of the backbones)
- DNA is fully base paired on either side of the nick, but the sugar-phosphate backbone is not intact
- after DNA pol III dissociates, DNA pol I takes over
- DNA pol I can add nucleotides to 3' ends and has 3'-5' proofreading activity, but also has 5'-3' exonuclease activity that allows it to degrade the primer from the previous Okazaki fragment
- as DNA pol I moves, it replaces the primer with DNA, moving (translating) the nick from one place to another along the lagging strand
- when the primer has been replaced, DNA pol I stops degrading and synthesizing DNA and dissociates from the template, leaving behind a nick
- the nick is sealed by the enzyme DNA ligase
- this joins the more recently synthesized Okazaki fragment to the rest of the new strand
Primer Removal in Eukaryotes
- many organisms have an enzyme called RNase H that degrades RNA that is base paired with DNA
- as newer Okazaki fragments are synthesized, RNase H can degrade the primer on older fragments, so that by the time the replicative DNA polymerase fills in the gap, the RNA primer is completely gone
- DNA polymerase then dissociates, leaving a nick
- in an alternative mechanism, the RNA primer is not degraded in advance
- the replicative DNA polymerase doesn't dissociate from the template strand when it reaches the primer
- it pushes forward and displaces the primer, synthesizing DNA in its place
- the displaced primer exists as a single-stranded flap; a flap endonuclease can then specifically cleave the backbone at the point where the flap emerges from the double helix
- this also leaves behind a nick that must be sealed
DNA ligase seals nick
- closing a nick with DNA ligase requires energy input
- in bacteria, the energy is obtained by cleaving the phosphoanhydride bond in the middle of NAD+, splitting the molecule into two mononucleotides
- in humans and other eukaryotes, DNA ligase uses ATP
- after ligase is done, both daughter double helices are intact and replication is complete
When Replication Bubbles Meet
- remember that in eukaryotes each chromosome has multiple origins
- when replication forks from adjacent replication bubbles meet
- on each template, the leading strand from one fork meets the lagging strand from the other fork
- removal of RNA primers and ligation occur as normal
End of Replication Problem
- at the very 3' end of a template strand there are always nucleotides that can't be copied into the new strand
- even if a primer could be synthesized at the very end of the chromosome, the RNA would have to be removed, leaving some of the template strand uncopied
- this is called a 3' overhang, because the 3' end extends beyond the 5' end
- it is impossible to copy each template all the way to its extreme 3' end
- linear chromosomes shorten with each cell generation -> resulting in loss of DNA
Telomerase
- to deal with DNA shortening, embryonic cells express an enzyme called telomerase
- the purpose of the enzyme is to extend the 3' ends of chromosomes
- if the 3' overhang is long enough, there is room to add another primer during replication, creating double-stranded DNA and lengthening the chromosome
- telomerase has both protein and RNA subunits
- a short section of the RNA is used as the template for synthesis of DNA at the 3' end
- the enzyme then shifts position and adds another 6 bases (TTAGGG)
- the cycle repeats many times
- lengths of chromosomes are maintained during embryo development
- in most cells, telomerase expression stops shortly after birth, so DNA replication from that point on results in loss of DNA from the ends of chromosomes
- cells that have divided many times have shorter chromosomes; chromosome length is one factor that contributes to cells entering a senescent state, after which they don't divide
- shorter chromosomes may play some role in the ageing process
- in most cancer cells, telomerase expression has been reactivated, which helps remove limits on how many times cancer cells can divide
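A toy model of telomere shortening with and without telomerase. The starting length is taken from the 2000-10 000 bp range given under Telomeres; the loss per division, the critical length and the number of repeats added are hypothetical inputs chosen only to illustrate the logic.

```python
REPEAT = "TTAGGG"  # human telomeric repeat added by telomerase

def divisions_before_senescence(telomere_bp, loss_per_division, critical_bp,
                                repeats_added=0):
    """Count divisions a cell can complete before its telomere would drop
    below critical_bp.

    loss_per_division: bp lost each replication (end replication problem).
    repeats_added: TTAGGG repeats telomerase adds per division (0 = none).
    Returns None when telomerase keeps up, i.e. there is no division limit.
    """
    net_loss = loss_per_division - repeats_added * len(REPEAT)
    if net_loss <= 0:
        return None
    divisions = 0
    while telomere_bp - net_loss >= critical_bp:
        telomere_bp -= net_loss
        divisions += 1
    return divisions

print(divisions_before_senescence(10_000, 100, 5_000))                    # 50
print(divisions_before_senescence(10_000, 100, 5_000, repeats_added=10))  # 125
print(divisions_before_senescence(10_000, 100, 5_000, repeats_added=20))  # None
```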
Telomeres
- are protein-DNA structures formed at the ends of chromosomes
- because of the way the ends of chromosomes are extended by telomerase, the DNA sequence of telomeres consists of many repeats of the same six nucleotides
- in humans, telomeres extend for 2000 to 10 000 base pairs of repeated sequence
- telomeres do not contain genes or other functional DNA elements, so nothing important is lost when they shorten
- chromosomes end in a 3' overhang
- to prevent the ends of chromosomes from being mistaken for unwanted double-strand breaks, the 3' overhang folds back and displaces an earlier section of the same strand
- creating a structure called a T loop
- the structure is stabilized by proteins
- the T loop is a way of arranging the natural ends of chromosomes so that cells can distinguish them from double-strand breaks, which the cell generally wants to repair as quickly as possible
DNA REPAIR
Topic 16
DNA Damage
- any unintended physical or chemical change in DNA
- the information contained in the sequence of DNA represents genetic instructions
- RNA is continually being synthesized (transcription) and degraded
- DNA in cells is permanent (it is not replaced)
- chromosomes are large and subject to damage
- sometimes changes to DNA are beneficial, but often they are damage
- effects of DNA damage
- harmful to proper function
- reduced cell function
- can even cause cancer
- for these reasons, cells strive to preserve/protect DNA
- cells have evolved repair mechanisms to identify and correct damage to DNA
Types of DNA Damage
1. Copying Mistakes
- DNA polymerase makes an error about once in every 10^6 nucleotides
- it could incorporate a wrong base or an extra base, or skip one or more bases
- any of these mistakes will give the DNA double helix a distorted structure
- changes in sequence could have functional consequences
Result of Unrepaired Replication Error
- once the mutated base is paired with its complementary base (after the next round of replication), the cell has no reason to fix it
- the progeny cell will contain a point mutation
- mutations have different effects depending on mutation type and location
Types of Mutation in Protein Coding Sequence
- a mutation in a protein-coding region could make large changes to the protein that is produced
- protein-coding genes are organized into triplet codons that specify different amino acids
Silent:
- some mutations don't change the amino acid coded for, due to the redundancy of the genetic code
Missense:
- mutations that change the amino acid at the affected codon
Nonsense:
- mutations that introduce a stop codon, causing premature termination of translation of the mRNA
Frame Shift:
- an insertion or deletion that shifts the reading frame, affecting all downstream codons
- an insertion or deletion of a multiple of 3 bases is said to be in frame, since it does not change the reading frame
2. Depurination: loss of A or G
- spontaneous hydrolysis of the bond joining a guanine or adenine base to the deoxyribose sugar
- creates an abasic site in the DNA (a site with no base)
- the sugar-phosphate backbone remains intact
- when DNA polymerase encounters an abasic site, it stalls (but replication must continue for cell division)
- to solve the problem, DNA polymerase dissociates
- a translesion DNA polymerase is recruited
- to synthesize DNA past sites of damage
- the catalytic sites of translesion polymerases are less demanding in terms of structural requirements
- so they can continue synthesis even if the structure of the DNA is abnormal
- but translesion polymerases make more mistakes than replicative DNA polymerases
- at an abasic site, a translesion polymerase will either
- skip the site entirely - resulting in a deletion
or
- put a random base in the new strand opposite the abasic site
- therefore translesion polymerases are more likely to introduce mutations into the DNA
- the cell would rather use a translesion polymerase and risk mutation than not replicate at all
- replication is needed for cells to divide
3. Deamination: conversion of amine to carbonyl
- most often occurs at cytosine residues to produce uracil
- the change does not hinder DNA replication but could affect the sequence of a protein or the regulation of gene expression
4. Pyrimidine Dimers
- UV light causes a ring to form between adjacent pyrimidines
- a 4-membered carbon ring forms between adjacent pyrimidine residues, resulting in pyrimidine dimers
- most frequent between adjacent thymines
- blocks DNA replication, so a translesion polymerase must be recruited to synthesize DNA past the site of damage
- mutations arising from pyrimidine dimers explain why high exposure to sunlight can increase your risk of skin cancer
5. Other Base Modifications
- many other base modifications arise from exposure to ionizing radiation and chemical mutagens
- examples of chemical mutagens:
- oxidizing agents, polycyclic aromatic hydrocarbons (benzo[a]pyrene)
6. Strand Breaks
- single- or double-stranded
- caused by ionizing radiation
- or mechanical stress
- both kinds of break prevent complete transcription or replication, but double-strand breaks are more dangerous
- double-strand breaks can lead to serious chromosomal abnormalities
- e.g. rearrangements that cause disease states
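The mutation categories listed under Types of Mutation in Protein Coding Sequence can be illustrated with a small classifier. It uses only a handful of codons from the standard genetic code (coding-strand DNA), so it is a sketch rather than a complete translation tool; the function names are illustrative.

```python
# small subset of the standard genetic code (coding-strand DNA codons)
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu", "GAT": "Asp", "GAC": "Asp",
    "GTG": "Val", "AAA": "Lys", "AAG": "Lys",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}

def classify_substitution(codon, mutant):
    """Classify a single-codon substitution as silent, missense or nonsense."""
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "silent"
    if after == "Stop":
        return "nonsense"
    return "missense"

def classify_indel(length):
    """Insertions/deletions shift the frame unless their length is a multiple of 3."""
    return "in frame" if length % 3 == 0 else "frame shift"

print(classify_substitution("GAA", "GAG"))  # silent (Glu -> Glu)
print(classify_substitution("GAG", "GTG"))  # missense (Glu -> Val)
print(classify_substitution("AAG", "TAG"))  # nonsense (Lys -> Stop)
print(classify_indel(1), "/", classify_indel(3))  # frame shift / in frame
```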
DNA Repair Systems
1. Proofreading during DNA Replication
2. Mismatch Repair
- repairs replication mistakes
- problem: when a mismatch arises, which base is the wrong one?
Mismatch Repair - Bacteria
- the solution stems from the fact that most bacteria methylate certain bases in their DNA after it is synthesized
- mature DNA is methylated
- new strands take time to be methylated
- a time window exists after the replication fork has passed, during which the template strand is methylated and the new strand is not
- mismatch repair operates during this time
- when a mismatch is found, proteins of the repair machinery scan the DNA in either direction looking for a methylated site
- this identifies which strand contains the mistake (the unmethylated, newly made strand)
- an endonuclease cuts the strand to be repaired at the site of methylation
- an exonuclease then digests the DNA from the newly created end, from the site of methylation back past the mismatch
- DNA polymerase fills in the gap using the intact strand as the template
- DNA ligase seals the backbone to create intact double-stranded DNA
- this process is energetically costly for the cell, since the site of methylation can be hundreds of nucleotides away from the mismatch
Mismatch Repair - Eukaryotes
- the process by which eukaryotes distinguish the template strand from the new strand is not well understood
- may rely on nicks in the new strand rather than methylation
- eukaryotes do have mismatch repair
- in humans, mismatch repair improves the replication error rate to about one in 10^9 bases
- 3 billion base pairs in humans
- about 3 mistakes per genome every replication
3. Direct Repair - repair specific modified bases
- for a small number of common types of damage, enzymes exist that specifically repair that type of damage and no others
- e.g. the methyltransferase that removes the methyl group from O6-methylguanine will not repair any other type of damage
- because of the wide range of chemical modifications, it is impractical to have specific enzymes for each type of damage
- direct repair is a helpful mechanism for a handful of common types
4. Base Excision Repair (BER)
- fixes damage that is fairly localized in DNA, generally affecting only a single base
- repairs cytosine deamination and other modified bases, abasic sites and single-strand breaks
- Two variants:
- short patch
- long patch
Short Patch Repair
- regular or default pathway
- one damaged base
- break the N-glycosidic bond joining the base and deoxyribose
- this removes the damaged base and creates an abasic site
- for depurination the abasic site already exists, so skip this step
- an endonuclease cuts the sugar-phosphate backbone
- if the damage is a single-strand break, this step is not needed
- cutting the backbone creates a free 3' hydroxyl
- DNA polymerase can add a new base at this site using the undamaged strand as the template
- after the base is added, DNA ligase seals the backbone
- short patch requires synthesis of only a single base, but if the damage affected the sugar-phosphate backbone, it may not be possible for DNA ligase to seal the backbone
- in such cases a longer stretch of DNA is replaced, which moves the nick further away from the site of damage and allows DNA ligase to work
Long Patch Repair
- involves replacement of no more than 10 bases on the damaged strand
- this is a variant of the base excision repair pathway
5. Nucleotide Excision Repair (NER)
- this repair pathway handles damage on a single strand that extends beyond one nucleotide, such as pyrimidine dimers
- the damage covers at least two nucleotides or is bulky enough to disrupt the double helix
- a helicase is involved in NER
- it unwinds the strands around the site of damage
- an endonuclease cuts the damaged strand in two places about 30 bases apart
- this removes that stretch of nucleotides
- a repair DNA polymerase uses the 3' end on one side of the gap to synthesize DNA to replace the missing piece
- DNA ligase seals the backbone and repair is complete
- the key step in this pathway is removal of a stretch of nucleotides all at once
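Short-patch BER, long-patch BER and NER share the same cut, fill and seal logic and differ mainly in how many bases are replaced (1, up to about 10, or about 30). A toy sketch with strings, assuming the damaged strand and its undamaged partner are aligned base by base and that "X" marks a damaged base; the helper name and sequences are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def excision_repair(damaged, template, start, length):
    """Toy excision repair on one strand of a duplex.

    damaged (5'->3') and template (3'->5') are aligned so template[i]
    pairs with damaged[i]. A stretch of `length` bases starting at `start`
    is cut out, refilled by copying the undamaged template, and sealed.
    """
    left, right = damaged[:start], damaged[start + length:]            # incisions
    patch = "".join(PAIR[b] for b in template[start:start + length])  # repair polymerase
    return left + patch + right                                        # DNA ligase seals

template = "TACGGATCCA"
damaged = "ATGCXTAGGT"  # 'X' marks a damaged base at position 4
print(excision_repair(damaged, template, 4, 1))  # ATGCCTAGGT (short patch)
print(excision_repair(damaged, template, 2, 6))  # ATGCCTAGGT (longer patch)
```

Both patch sizes restore the same sequence because the undamaged strand carries all the information needed; the longer patch just replaces more of the surrounding DNA.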
Double Stranded Break
- double stranded breaks particularly serious events
- disrupt gene function
- if uncorrected they will result in incorrect distribution of genome to daughter cells during cell division
- two paths used to repair
1.Non-Homologous End-Joining
- most common in mammals
- goal is to join two ends of DNA back together
- by whatever means necessary without worrying about preserving or recovering segments
- protein Ku
- binds both ends of the broken DNA and recruits a nuclease and a polymerase
- DNA ends are usually trimmed back
- short regions of complementarity are found or created, and the two strands are made to anneal to each other
- polymerase fills any gaps and DNA ligase joins the backbones on both strands
- no other DNA molecule is involved, so it does not take long
- can repair quickly
- con: the DNA is usually not the same as before, and mutations are likely
2. Homologous Recombination
- requires a second double helix with a sequence homologous or identical to the broken one
- a nuclease trims the ends of the broken DNA, and the resulting single-stranded overhangs invade the intact copy
- they base pair with the complementary sequence
- for both strands, the intact DNA is used as the template for new DNA synthesis
- the sequence across the entire area is copied, then the crossed-over strands are resolved to give two complete double helices
- the sequence of the broken strands is properly restored
- the cell must find the appropriate section to copy from
- in organisms with large genomes, finding the right one is very time consuming (except after DNA replication has taken place, when an identical sister copy is right next to it)
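The trade-off between the two pathways can be shown with strings. In this sketch (hypothetical function names, arbitrary trim length of 2 bases per end), NHEJ trims the broken ends and ligates them, losing sequence, while homologous recombination copies across the break from an intact homolog and restores it, even when bases were lost at the break.

```python
def nhej(left: str, right: str, trim: int = 2) -> str:
    """Toy non-homologous end joining: trim both broken ends, then ligate."""
    return left[:len(left) - trim] + right[trim:]


def homologous_recombination(left: str, right: str, homolog: str) -> str:
    """Toy homologous recombination: copy across the break from a homolog.

    Assumes both broken ends occur in `homolog`, left end before right end.
    """
    start = homolog.find(left)
    if start < 0:
        raise ValueError("left end not found in homolog")
    right_pos = homolog.find(right, start + len(left))
    if right_pos < 0:
        raise ValueError("right end not found in homolog")
    return homolog[start:right_pos + len(right)]


original = "AAACCCGGGTTT"
left, right = original[:6], original[6:]          # break in the middle
print(nhej(left, right))                           # bases lost at the junction
print(homologous_recombination(left, right, original))
```

NHEJ returns `AAACGTTT` (4 bases lost), whereas homologous recombination returns the original 12-base sequence.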
MOLECULAR BASIS OF CANCER
Topic 17
- cancer can strike victims of all ages
- cancer is more prevalent as the population ages, which is attributed to the accumulation of errors in our cells' genomes
- cancer is characterized by genetic and biochemical defects
Cancer as a Genetic Disease
- susceptibility to cancer can be inherited
- retinoblastoma (eye cancer)
- xeroderma pigmentosum (skin cancer)
- some forms of breast, prostate, ovarian and intestinal cancer
- this is why doctors take your family history
Genetic Predisposition
- means a carrier of a certain mutation is more likely, but not certain, to develop the cancer associated with that mutation
DNA Damage
- inherited genes cannot be changed, but additional DNA damage must be prevented
- RNA and protein can be replaced, but DNA must be preserved
- different causes of DNA damage
- UV light, x-rays and replication errors
- unrepaired DNA damage has dire consequences
Apoptosis (cell death)
- used as a last resort to get rid of cells with DNA beyond repair
- if apoptosis fails cells can become cancerous
Mutations by Errors in Replication
- DNA polymerase sometimes introduces errors, which are normally repaired
- if repair fails, mutation may alter gene function and potentially cause cancer
Avoidable Causes of Cancer
1. Radiation/Chemical Mutagens
- increase the chance of mutations, which in turn increases the chance of cancer
- the most common source is cigarette smoke (69 cancer-causing chemicals)
- arsenic (rat poison)
- benzene, formaldehyde (denaturant)
- radioactive polonium-210 (in cigarette smoke)
2. Viruses
- hepatitis B and C linked to liver cancer
- HPV linked to cervical cancer and other cancers
- HIV linked to Kaposi sarcoma and non-Hodgkin lymphoma
- vaccines are not available for some viruses
- vaccination is linked to a reduction in cancer prevalence
Cancer Risk Increases with Age
- probably due to a combination of genetic predisposition and the accumulation of more than one mutation over time
- exposure to chemicals, radiation and viruses over your lifetime increases the risk of cancer
- one mutation is usually not enough to cause cancer
- over time we accumulate more and more
Several Mutations Cause Cancer
- tumour progression involves successive rounds of mutation and selection
- at each round, a descendant cell acquires another mutation allowing it to grow faster or in abnormal places
- 1st mutation: cell grows faster
- 2nd mutation: descendant grows even faster or in abnormal places
Cancers Derive from a Single Cell
- most cancers arise this way
- accidental production of one mutant cell
- or the genome already carries one (inherited) mutation
- 1st mutation allows cells to grow more quickly
- cells proliferate (more cells with the mutation)
- one of these cells then acquires a second mutation
- the 2nd mutation can happen through chemical damage, radiation or errors that occur when cells replicate
- the second mutation may allow cells to grow in the absence of growth factors
- cells can grow in a different place
- a third mutation can occur due to further damage
- cells can now metastasize elsewhere (cancerous growth)
Properties of Cancer Cells
- many different mutations in many different genes can cause cancer
- there is no single cancer cell or type of cancer
- therefore cancer is harder to solve
- a treatment for one cancer can be ineffective for another
- Divide in the absence of growth factors
- do not need growth factors
- Cancer Cells are Immortal
- do not respond to signals that normally trigger cell death
- normal cells are able to undergo apoptosis, but cancer cells do not respond to those signals
- Have Lost Control over their Cell Cycle
- healthy cells will stop cell division if DNA damage is detected
- cancer cells do not stop at normal cell cycle checkpoints
- divide even though they are damaged
- Cancer Cells are Genetically Unstable
- have more point mutations
- more copy number variations
- have major chromosome abnormalities
- even a gene without a mutation, if duplicated, is expressed at double the level (which may cause cancer)
- Cancer Cells Multiply in Abnormal Places
- which causes metastasis
- cancer can spread from the original site (which could be easily removed surgically) to other sites where it is hard to target
Cancer Causing Genes
1. Oncogenes
- mutant form of a normal gene
- whose presence causes cancer
- examples: myc, ras, fos, jun and abl
A) Overactivity Mutation (gain of function)
- oncogenes can cause cancer when they carry a mutation that causes overactivity
- normal cells have two active copies of the proto-oncogene
- a single mutation event now creates an oncogene (one mutated copy is enough)
- normal gene = proto-oncogene
- mutated gene = oncogene
- the activating mutation enables the oncogene to stimulate, for example, proliferation
Genetic Changes that Create Oncogenes
- Mutation in Protein Coding Region
- the proto-oncogene sequence is mutated in the DNA
- the mutation is carried into the RNA and leads to a hyperactive protein
- the hyperactive protein is made in normal amounts but has a gain of activity, increasing activity in the cell
- Gene Amplification
- protein is exactly the same but made in larger amounts
- instead of having 1 copy of the gene, the chromosome has 3
- 3x the amount of RNA and protein
- protein is normal, but overproduction leads to increased activity
- Chromosomal Rearrangement
- protein coding sequence unchanged
- no change in copy number
- location different
- the normal protein may be overproduced when a nearby regulatory DNA sequence drives high expression
- in another case, the gene may be fused to a nearby actively transcribed gene, producing a hyperactive fusion protein
2. Tumor Suppressors (loss of function mutation)
- genes whose absence causes cancer
- mutations usually recessive and are loss of function
- normal cells have two active copies of a given tumor suppressor gene
- a mutation event that inactivates one copy usually has no effect
- a 2nd mutation event then inactivates the second copy
- 2 inactivating mutations eliminate the tumor suppressor function, and the cell then proliferates
- example: apoptosis/cell cycle genes
- tumor suppressors and proto-oncogenes have opposite functions
- mutation in either can cause cancer
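The difference between a dominant gain-of-function oncogene and a recessive loss-of-function tumor suppressor comes down to how many of the two copies must be hit. A minimal sketch (the function name is hypothetical, and real cells are of course more complicated than a yes/no answer):

```python
def growth_control_lost(gene_type: str, mutated_copies: int) -> bool:
    """Toy model: does a diploid cell with 0, 1 or 2 mutated copies lose control?"""
    if not 0 <= mutated_copies <= 2:
        raise ValueError("a diploid cell has 0, 1 or 2 mutated copies")
    if gene_type == "proto-oncogene":
        # gain of function: one hyperactive copy is enough (dominant)
        return mutated_copies >= 1
    if gene_type == "tumor suppressor":
        # loss of function: the remaining good copy still works (recessive)
        return mutated_copies == 2
    raise ValueError(f"unknown gene type: {gene_type}")


for gene in ("proto-oncogene", "tumor suppressor"):
    print(gene, [growth_control_lost(gene, n) for n in range(3)])
```

One hit is enough for the proto-oncogene, while the tumor suppressor needs the "second hit" described above.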
Functions of Cancer Causing Genes
- most oncogenes and tumor suppressors code for proteins that act in or regulate: 1) cell division 2) cell differentiation
- oncogenes are the accelerator pedal (when mutated, they accelerate cancer growth)
- tumor suppressors are the brakes (when mutated, the brakes fail)
1. Growth Factors and Cell Receptors for Growth
- oncogenes stuck in the on position
- example: epidermal growth factor receptor (EGFR)
- receptor that depends on an external growth factor to be active
- in the absence of growth factor, the receptor is inactive
- upon binding growth factor, EGFR is activated and starts cell division
- when the receptor is mutated and stuck in the on position, it becomes an oncogene
- now active even in the absence of growth factor
- promoting unneeded cell division
2. Molecules in Cell-Cell Interactions
- example: a protein that promotes cell adhesion to the basal lamina may be mutated to stabilize interactions with other cells
- making cell growth independent of the cell's position near the basal lamina
3. Regulators of Normal/Programmed Cell Death - p53
- example: p53
- called the guardian of the genome
- in response to DNA damage, p53 can promote cell cycle arrest, apoptosis or DNA repair, among other processes
- if p53 is mutated, it cannot fulfill this function
- it is an example of a tumor suppressor protein
- cancer cells do not respond to normal signals that trigger cell death
4. Transcription Factors
- proteins that initiate transcription of other genes
- a mutation in one transcription factor can affect expression of many other proteins
- cancer can be caused by too much or too little expression of genes that regulate cell growth, differentiation and apoptosis
- transcription factors are master regulators of gene expression
- have large effect on cell proteome
5. DNA Repair Proteins
- if DNA repair proteins are mutated, cells quickly accumulate a large number of mutations
- Example: Xeroderma Pigmentosum (XPC)
- cancer of the skin
- melanomas develop from exposure of the skin to UV light
- UV light causes pyrimidine dimers
- predisposition in xeroderma pigmentosum comes from defects in the DNA repair mechanism that removes them (nucleotide excision repair)
Breast Cancer - tumour suppressors BRCA1/2
- tumour suppressor proteins BRCA1/2
- 10% of breast cancer is hereditary, from mutations in BRCA1 and 2
- BRCA1 - tumour suppressor that normally functions in DNA repair
- repairs double-stranded breaks
- women with certain BRCA1 mutations have an 80% risk of developing breast cancer and a 40% risk of ovarian cancer
- involved in non-homologous end joining and homologous recombination
- loss of BRCA function leads to accumulation of DNA mutations in the genome
- known as BRCA mutations
- many known BRCA mutations
- some germline (present when the egg is first fertilized)
- some somatic (acquired later)
- some mutations cause early onset cancer, some cause late onset cancer
- mutations in the promoter sequences of BRCA1 and 2 are associated with high risk of cancer
Classic Cancer Treatments
1. Surgery
- excises the tumor
- capable of taking out large amounts of cells at a time
- some cancer cells may be left behind
2. Radiation
- damages DNA so badly that cells stop replicating
- also damages adjacent healthy cells
- sometimes radiation causes secondary cancers
3. Chemotherapy
- stops the replication of cells by damaging DNA or by interfering with the mitotic machinery
- sometimes reduces replication substrates
- tries to target fast-dividing cells
- most human cells divide rarely, cancer cells constantly
- by targeting DNA replication (for example) we mostly damage cancer cells
Breast Cancer Treatments
- Poly(ADP-ribose) Polymerase 1 (PARP1)
- repairs single-stranded breaks in DNA
- these are more common than double-stranded breaks (which BRCA helps repair)
- cells with just BRCA mutations are still viable
- cells with just PARP mutations are still viable
- cells with both mutations die due to the accumulation of a lot of mutations
- BRCA1-mutated cells die when exposed to PARP1 inhibitors
- if the cancer is not caused by a BRCA1 mutation, inhibiting PARP1 does not kill it (cells lacking only PARP1 remain viable)
- in order to use PARP1 inhibitors, BRCA1 has to be sequenced first
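The PARP1/BRCA logic is an example of synthetic lethality: a cell survives as long as at least one of the two repair routes works. The boolean sketch below is a deliberate simplification under that assumption (function names hypothetical); it shows why a PARP1 inhibitor only kills BRCA-mutant cells and why BRCA1 status must be known first.

```python
def cell_survives(brca_works: bool, parp_works: bool) -> bool:
    """Toy synthetic-lethality model: one working repair route is enough."""
    return brca_works or parp_works


def parp_inhibitor_kills(brca_mutant: bool) -> bool:
    """The inhibitor removes PARP1 activity; survival then hinges on BRCA."""
    return not cell_survives(brca_works=not brca_mutant, parp_works=False)


# truth table for the four genotypes: BRCA ok, PARP ok, survives
for brca in (True, False):
    for parp in (True, False):
        print(brca, parp, cell_survives(brca, parp))

print("BRCA-mutant tumour killed:", parp_inhibitor_kills(brca_mutant=True))
print("BRCA-normal tumour killed:", parp_inhibitor_kills(brca_mutant=False))
```

Only the genotype with both routes broken is lethal, which the inhibitor creates in BRCA-mutant cells but not in cells with working BRCA.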
Difficulties with Cancer Treatments
- the specificity of treatment is a large problem in radiation and chemotherapy, as these affect normal cells as well
- different types of cancer have different causes, genes and locations
- every cancer is treated as a different disease
- even within one person, tumors are heterogeneous
- cancer cells are always changing and mutating, so treatments have to change too
Resistance
- cancer cells develop resistance to drugs
- a drug that worked may lose effectiveness over time
- example:
- cells may overexpress transmembrane transporters that pump drugs out of the cells
- cells evolve under selection
- only cells with mutations that protect against the drug keep growing
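Selection explains how a drug can lose effectiveness even if resistant cells start out extremely rare. The sketch below uses made-up numbers (10 resistant cells among a million, a drug that kills 90% of sensitive cells each generation, one division per generation) purely to illustrate the trend; the function name is hypothetical.

```python
def resistant_fraction(generations: int, sensitive: float = 1e6,
                       resistant: float = 10.0,
                       kill_rate: float = 0.9) -> list[float]:
    """Toy selection model under continuous drug treatment.

    Each generation the drug kills `kill_rate` of sensitive cells, resistant
    cells are untouched, and all survivors divide once.
    All default numbers are hypothetical.
    """
    fractions = []
    for _ in range(generations):
        sensitive *= 1 - kill_rate  # drug kills most sensitive cells
        sensitive *= 2              # surviving sensitive cells divide
        resistant *= 2              # resistant cells divide unhindered
        fractions.append(resistant / (sensitive + resistant))
    return fractions


for gen, frac in enumerate(resistant_fraction(6), start=1):
    print(f"generation {gen}: {frac:.4f} of the tumour is resistant")
```

With these assumptions the resistant share grows about tenfold relative to the sensitive cells each generation, reaching roughly half of the tumour by generation 5.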
Drug Delivery
- some cancers are difficult to reach, and drugs often do not pass the blood-brain barrier
- making brain tumors especially difficult to treat
- chemotherapeutics may also be quickly metabolized and become ineffective
- problem addressed by combination therapy
- greater drug specificity is being developed to avoid side effects
- genome sequencing helps with a personalized approach
- prevention
- ads about smoking and sunscreen
PROKARYOTIC (BACTERIAL) TRANSCRIPTION
Topic 18
- bacteria and archaea are both prokaryotes
- transcription and translation are very different
Transcription
- purpose of RNA
- first step in expression of any gene (highly regulated)
- RNA is a copy of the information found in DNA that can be modified
- RNA production is an important 1st step in defining how much of a certain protein is made at a given time
- RNA can assume many functions that DNA cannot
- transfer RNAs
- ribosomal RNAs
- micro RNAs
- provides amplification of material available for protein synthesis
- contributes to differential gene expression
- RNA can be degraded when not needed
- gene expression can be stopped quickly in response to environmental or developmental triggers
- also provides additional regulation steps
- ex. delaying or modifying RNA processing or transport
Consensus Sequence
- sequence that depicts the most frequent base at each position in a group of functionally related DNA elements
- the closer a sequence is to the consensus sequence, the better the DNA element functions
- ex. if the sequence is meant to be bound by a certain protein
- the closer the sequence is to the consensus, the tighter the binding
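A consensus sequence is simply the most frequent base at each position of aligned, functionally related elements, and "closeness" can be counted as the number of matching positions. A minimal sketch using made-up -10 boxes (the sequences and function names are hypothetical):

```python
from collections import Counter


def consensus(sequences: list[str]) -> str:
    """Most frequent base at each position of equal-length aligned sequences.

    Ties go to the base encountered first (Counter.most_common ordering).
    """
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*sequences))


def match_score(seq: str, cons: str) -> int:
    """Number of positions where seq agrees with the consensus."""
    return sum(a == b for a, b in zip(seq, cons))


# hypothetical -10 boxes; none of them is identical to the consensus
boxes = ["TACAAT", "TATGAT", "TAAAAT", "GATACT", "TATATT"]
cons = consensus(boxes)
print(cons)                                             # TATAAT
print({box: match_score(box, cons) for box in boxes})
```

Note that the consensus `TATAAT` emerges even though no individual input matches it exactly, just as few real promoters match the consensus perfectly.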
Bacterial Transcription
- Gene and Promoter Structure
- RNA polymerase
- Mechanism and Regulation
- Lac Operon
Promoter
- DNA sequence required to initiate transcription of gene or operon
Terminator
- DNA sequence required to stop transcription
Operon
- found in bacteria, not eukaryotes
- set of bacterial genes transcribed from a single promoter and thus expressed on a common mRNA
- the proteins are then translated from that single mRNA
Bacterial Operon Structure
- double stranded DNA present
- green: promoter sequence and then transcription start site
- +1: transcription start site
- XYZ: 3 protein coding sequences
- red: terminator sequence (transcription ends)
- regular genes not organized in operons have only one coding sequence
Key Features of a Bacterial Promoter
- usually about 100 bases long and located upstream
- before the transcriptional start site
- +1 is where transcription starts
- two sections, -35 and -10 (named for their approximate distance from +1)
- consensus sequences for these sites differ between bacteria
- E. coli consensus sequence
- -35 is TTGACA
- -10 is TATAAT
- can have any nucleotide in between
- -35 and -10 sites were found by comparing different E. coli promoters
- highly conserved sequences suggest importance
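Finding a promoter amounts to locating the best-scoring pair of -35 and -10 boxes. The sketch below scores every candidate pair against the E. coli consensus sequences; the allowed spacer length between the boxes (16 to 18 bases here) is an assumption of this sketch, and the function names and test sequence are hypothetical.

```python
MINUS35 = "TTGACA"   # E. coli -35 consensus
MINUS10 = "TATAAT"   # E. coli -10 (Pribnow box) consensus


def score(window: str, consensus: str) -> int:
    """Number of positions matching the consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def best_promoter(seq: str, spacers: range = range(16, 19)) -> tuple[int, int, int]:
    """Return (total matches out of 12, -35 start, -10 start) for the best pair.

    The 16-18 base spacer range is an assumption of this sketch.
    """
    best = (-1, -1, -1)
    for i in range(len(seq) - 5):
        s35 = score(seq[i:i + 6], MINUS35)
        for gap in spacers:
            j = i + 6 + gap
            if j + 6 > len(seq):
                break  # longer spacers would run off the end as well
            total = s35 + score(seq[j:j + 6], MINUS10)
            if total > best[0]:
                best = (total, i, j)
    return best


promoter = "GCGCGCGCGC" + MINUS35 + "C" * 17 + MINUS10 + "GCGC"
print(best_promoter(promoter))                 # (12, 10, 33)
mutant = promoter.replace(MINUS35, "GGGCCC")   # -35 box mutated to GGGCCC
print(best_promoter(mutant)[0])
```

The mutated -35 box lowers the best score, mirroring the drop in transcription seen when conserved promoter sequences are mutated.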
The Pribnow box (-10 box)
- is AT rich and ideal for unwinding DNA
Initiation Site
- can be any nucleotide
Determining Sequence Importance
- done experimentally
- make mutations to see if the promoter still works
- example: -35 box changed from TTGACA (original) to GGGCCC (mutation)
- test if DNA is still transcribed
- transcription from the original was 100 units and from the mutant 2 units
- efficiency goes way down
- not many promoters have sequences that are identical to the consensus sequence
- this is one way transcription is regulated
Bacterial RNA Polymerase
- transcribes DNA to RNA using nucleoside triphosphates (NTPs) as substrates
- core enzyme is the minimal composition needed to transcribe RNA
- consists of two alpha, one beta and one beta prime subunit
- core enzyme is always the same
- adds nucleotides and forms an RNA strand using DNA as a template
- cannot recognize the promoter region
- needs another subunit to recognize the promoter
- sigma subunits are specificity factors that recognize specific promoters
Holoenzyme is the core enzyme plus a sigma subunit
- when bound to the core enzyme, sigma recognizes the -35 and -10 sequences
- protein - DNA interaction (sigma and DNA)
- protein - protein interaction (core and sigma)
*promoter specificity of RNA polymerase determined by sigma subunit*
Steps of Transcription Initiation
1. RNAP Holoenzyme Binds Promoter
- closed complex - RNA polymerase on double-stranded DNA
2. DNA Unwound
- happens at the -10 site
- consensus sequence is AT rich (easier to unwind, ideal location)
- open complex - DNA opened with RNA polymerase bound
3. First Nucleotide Brought to Template at +1
- no primer needed to initiate
4. Using ATP, GTP, CTP and UTP as Substrates, Chain Elongation Proceeds 5' to 3'
- the polymerase reads the DNA template 3' to 5'
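The direction rules can be checked with a short sketch: the template strand is read 3' to 5' while the RNA grows 5' to 3', so the RNA is the reverse complement of the template (with U in place of T). The function name is hypothetical and the template is written 5' to 3', as sequences usually are.

```python
# RNA base added opposite each DNA template base
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand written 5'->3'.

    The polymerase reads the template 3'->5', so we walk it from its 3' end
    and add the complementary ribonucleotide to the growing 3' end of the RNA.
    """
    rna = []
    for base in reversed(template_5_to_3):  # read template 3' -> 5'
        rna.append(RNA_PAIR[base])          # RNA grows 5' -> 3'
    return "".join(rna)


template = "TGGCAT"           # 5'-TGGCAT-3'
print(transcribe(template))   # AUGCCA, i.e. 5'-AUGCCA-3'
```

The product matches the coding (non-template) strand, 5'-ATGCCA-3', with T replaced by U.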
Transcription Bubble
- DNA unwound for transcription but closes again afterwards
- entire bubble is covered by RNA polymerase
- no single-stranded DNA exposed
- contains 2 DNA strands and a DNA-RNA hybrid
- new RNA released as single-stranded RNA
- when RNA polymerase is bound to DNA and RNA, a small rudder in the polymerase separates the two strands
- there are also channels to bring in new nucleotides and one for the RNA to exit
RNA Synthesis
- nucleotides joined together
- new phosphodiester bond formed
- similar to DNA replication but using ribose, not deoxyribose
- pyrophosphate released (as in DNA replication)
- energy for the reaction comes from breaking the phosphate bond of the incoming nucleotide
5. After Addition of 5-10 Nucleotides, Sigma Falls Off the Holoenzyme
- transitioning from initiation to elongation
- elongation is the process of transcribing the rest of the RNA
6. Transcription Bubble Moves Downstream along the DNA Template
- DNA template re-anneals after a section is transcribed
7. RNA Synthesis Proceeds until Terminator Reached
- RNA polymerase falls off
8. Sigma Rebinds Core and Cycle Repeats
Mechanisms of Transcription Termination in Prokaryotes
- two types: Rho-dependent and Rho-independent
- Rho is a protein that binds to RNA
1. Rho Dependent
- Rho travels along the RNA
- when polymerase stalls at terminator sequence, Rho catches up with the polymerase and causes it to fall off
2. Rho Independent
- relies on the polymerase falling off on its own
- usually caused by a terminator sequence in the DNA
Transcription at Different Levels from Different Promoters
- first point of regulation is at the promoter level
1. Some Genes Have Better -10 and -35 Sequences
- how closely the sequence resembles the consensus sequence
- optimal promoter
- some -10 and -35 sequences are better than others
- some are more easily recognized by sigma (better)
- regulation is not dynamic (cannot react to external factors)
- it is gene specific
2. More than one Sigma Factor
- each recognizes different promoter sequences
- σ70 : housekeeping genes
- σ54 : nitrogen metabolism
- σ38 : starvation
- σ32 : heat shock
- example: Unexpected Nutrient Shortage
- the cell needs many proteins to help transport the few remaining nutrients inside
- and to metabolize nutrients it would otherwise not use
- the cell then expresses a lot of σ38
- σ38 is then more likely to bind the RNA polymerase core enzyme
- can upregulate many genes from the same category
3. Gene Specific Regulatory Proteins
- this regulation is dynamic
1) negative regulation -> represses transcription
2) positive regulation -> activates transcription
- factors must be able to efficiently bind to DNA
DNA-Protein Interactions
- happens a lot and is very important
- essential components
- highly compatible
- one alpha helix fits nicely in the major groove of DNA
- allows for sequence-specific interactions
- proteins can also interact with the DNA backbone, which is not sequence specific
- example: Zinc Finger Protein
- zinc is required for the protein to efficiently interact with DNA
- example: Leucine Zipper
- two helices interact with DNA
- named because the region that zips the two alpha helices together is highly leucine rich
Lac Operon
- three lactose-metabolizing genes are encoded in one mRNA
- protein Z: beta-galactosidase
- protein Y: permease
- protein A: transacetylase
- in front of the genes are three regulatory elements
- I: regulatory gene (lacI, encodes the lac repressor)
- P: promoter
- O: operator
Obscuration of Transcription
- if a protein (repressor) binds to the region where the promoter is
- RNA polymerase cannot find its promoter, and there is little to no expression
Lac Operon Expression Control
- negative control leads to repression: the repressor binds the operator and blocks RNA polymerase
- positive control leads to activation by catabolite activator protein (CAP)
- the activator binds to the CAP site upstream of the promoter and helps RNA polymerase bind more efficiently
Expression of Lac Operon
- E. coli only needs the lac genes when lactose is available
- E. coli much prefers glucose over lactose because glucose yields more energy for less work
- if glucose is present, the cell does not need to spend energy making lactose-metabolizing proteins
- so transcription of these genes is shut off
Catabolite Repression
- cells prefer to metabolize glucose
- they shut off genes required to metabolize other carbon sources in the presence of glucose
- glucose is the catabolite
Players in Regulation of the Lac Operon
Lac Repressor: DNA binding protein
CAP: catabolite activator protein
- transcriptional activator protein
Lac Operator: DNA element that binds the lac repressor
CAP binding site: DNA element that binds CAP
Lac Repressor Binding to Lac Operator
- lac repressor binds a metabolite of lactose (allolactose)
- when allolactose binds, the repressor no longer represses gene expression
CAP Protein Binding CAP Binding Site
- CAP protein senses glucose levels
- only binds under low glucose conditions
- helps RNA polymerase bind to the promoter region
Operon Activity Under Different Conditions
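The two control systems combine into a simple truth table. The sketch below (function name hypothetical) treats the repressor and CAP as on/off switches: lactose (as allolactose) removes the repressor, and low glucose lets CAP bind. "low" stands for basal expression without CAP's help.

```python
def lac_operon_expression(glucose: bool, lactose: bool) -> str:
    """Toy truth table for lac operon transcription."""
    repressor_on_operator = not lactose  # allolactose pulls the repressor off
    cap_bound = not glucose              # CAP binds only when glucose is low
    if repressor_on_operator:
        return "off"                     # RNA polymerase is blocked
    return "high" if cap_bound else "low"  # CAP boosts polymerase binding


for glucose in (True, False):
    for lactose in (True, False):
        print(f"glucose={glucose!s:<5} lactose={lactose!s:<5} -> "
              f"{lac_operon_expression(glucose, lactose)}")
```

Expression is high only when lactose is present and glucose is absent, matching the preference for glucose described above.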
EUKARYOTIC TRANSCRIPTION
Topic 19
- much of bacterial and eukaryotic transcription is the same
- DNA that is being replicated is not transcribed at the same time
- replication and transcription spatially separated
Differences: Transcription in Eukaryotes and Bacteria
- In Proteins
- bacteria have one main RNA polymerase
- eukaryotes have 3, some have 5
- eukaryotes don't need a sigma factor but use transcription factors
- Rho not required in eukaryotes
- No Operons in Eukaryotes
- every gene in eukaryotes has its own promoter
- Promoter Structure
- eukaryotes have a highly complex promoter structure
- no -10, -35 and no sigma
- eukaryotes have transcription factor binding elements, initiator motifs and a TATA box
- in eukaryotes, promoter recognition is determined by a set of proteins, one of which recognizes TATA
- very complicated and stepwise (in eukaryotes)
- Eukaryotic regulatory proteins bind DNA several thousand base pairs from the start site
- usually on the same chromosome
- in bacteria, binding is adjacent to the gene
- How do these proteins influence transcription?
- eukaryotic "mediator" protein complex
- mediates communication between enhancers and polymerase
- DNA loops over, and the mediator complex facilitates communication between enhancers and the initiation complex
- DNA has to be looped for proximity
- mediator protein binds the polymerase as well as activator and repressor proteins
- Combinatorial Control
- groups of proteins work together to determine the expression of the gene
- many enhancers or silencers work together to fine-tune it
- a range of as much as 1000-fold regulation can be achieved
- Accessibility of DNA (chromatin structure in eukaryotes)
- to be accessed, eukaryotic DNA needs to be freed from its highly organized structure
- the promoter of a gene may be covered by nucleosomes
- to expose DNA, the cell uses 2 major mechanisms
- 1. Chromatin-Remodelling Complex→ mediates ATP-dependent conformational changes in nucleosome structure
- 2. Histone-Modifying Enzymes → introduce covalent modifications to the N-terminal tails of the histone core
Chromatin Remodelling Complex (Nucleosome Sliding)
- nucleosome position changed so promoter sequence can be exposed for transcription
- does not dissociate nucleosomes from DNA
- repositions them
- very large complex
Histone Modification
- goal is to release DNA from histones
- the affinity between histones and DNA must be loosened
- DNA (-), Histone (+)
- histone acetylases - add acetyl groups, main enzyme needed
- acetylations, methylations and phosphorylations
- acetylation and phosphorylation change the charge of histones
- Ex. Lysine Acetylation
- histone acetylases cause lysine to lose its positive charge, which loosens its affinity for DNA
- histone deacetylases remove the acetyl group
RNA PROCESSING
Topic 20
- primary transcript is capped at the 5' end
- cleaved (shortened) and polyadenylated at the 3' end
- and spliced to take out unneeded sections
5' Capping
- protects the 5' end of mRNA from degradation
- necessary for export from the nucleus
- a methylated G nucleotide is added to the 5' end in reverse orientation (5'-to-5' linkage)
- can't be easily cleaved by exonucleases, so it protects against degradation
3' Polyadenylation
- a tail of A nucleotides is added
- these tails are usually around 300 bases long
- can be longer or shorter
- a special protein, poly-A binding protein, binds to this poly-A tail
- binding of the protein to the tail prevents degradation
- many poly-A binding proteins can bind a single tail and extend the lifespan of the RNA
- polyadenylation happens in concert with transcription
- after the stop codon, which specifies the end of the coding message
- an AU-rich sequence is found
- when RNA polymerase passes this AU-rich sequence, another enzyme, poly-A polymerase, attaches to the mRNA and cleaves the growing RNA chain approximately 30 bases downstream of the AU-rich signal
- poly-A polymerase then adds non-templated A nucleotides to produce the poly-A tail
- Poly-A-Polymerase Functions
- Cleaving RNA
- adding A's
Importance of the 5' Cap and 3' Poly-A Tail
- mark the 5' and 3' ends of the mRNA as intact
- Needed for mRNA export → translation
- Protect mRNA from degradation
RNA Splicing
- protein-coding sequences (exons) are interrupted by noncoding sequences called introns
- after capping and polyadenylation, noncoding RNA sequences are excised from the pre-mRNA → "splicing"
- introns are spliced out to give mature mRNA
- Exons → expressed, Introns → interruptions
Human Beta Globin/Factor 8 gene
- is a 200 000 nucleotide pre-mRNA
- spliced into a 2 000 nucleotide-long coding sequence
- a lot of the DNA is not needed in the final message
Purpose of Splicing
- splicing increases the coding capacity of our genome
- one pre-mRNA can be spliced into different variants
- making a number of related, but not identical, proteins
Splicing Patterns often Differ in Different Tissues
- mRNA is first transcribed as pre-mRNA and then spliced into different forms of mature mRNA
- the first (5') exon is included in all splicing variants
- the exon combination differs between tissues, and so does the 3' end
- the resulting proteins are similar but not the same
- within one cell, usually only one variant is found
Junctions Between Exons and Introns
- at the 5' splice site, the junction is flanked by an AG sequence on the exon side
- and a GUAAGU sequence on the intron side
- at the 3' splice site, the junction is flanked by CAG on the intron side
- and G on the exon side
- joining two exons results in formation of an AGG sequence
- branch point adenine
- located 20-50 bases upstream of the 3' splice site
Intron Removal Reactions
- introns are removed in two consecutive transesterification reactions
- First reaction between the branch point and the 5' splice site
- the 2' hydroxyl group of the branch-point adenosine attacks the phosphate at the 5' end of the intron
- frees the 3' end of the first exon and generates a lariat-shaped intermediate
- Second reaction: the freed 3' end of the first exon attacks the 3' splice site
- the free 3' hydroxyl group of the first exon attacks the first phosphate of the second exon
- the reaction forms a phosphodiester bond between the two exons to unite them
- the excised intron diffuses away and is degraded
Yeast Spliceosome
- consists of RNAs and proteins
- binds to splice sites and facilitates the splicing process
Alternative Splicing
- exons can be skipped to give variants
- the general order of exons can't be changed
- exons can only be omitted
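Because exons can be skipped but never reordered, every variant is an order-preserving subset of the exons. The sketch below enumerates these with `itertools.combinations`, which keeps the input order. As a simplifying assumption (not a general rule), it keeps the first and last exons in every variant; the function name and exon sequences are hypothetical.

```python
from itertools import combinations


def exon_skipping_variants(exons: list[str]) -> list[str]:
    """All mRNAs obtainable by skipping internal exons, order preserved.

    Assumes the first and last exons are always kept (hypothetical);
    needs at least two exons.
    """
    first, *internal, last = exons
    variants = []
    for k in range(len(internal), -1, -1):       # keep k internal exons
        for kept in combinations(internal, k):   # combinations keep input order
            variants.append(first + "".join(kept) + last)
    return variants


for mrna in exon_skipping_variants(["AAA", "CCC", "GGG", "UUU"]):
    print(mrna)
```

Two internal exons give 2 x 2 = 4 variants, and a reordered product such as `AAAGGGCCCUUU` never appears.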
Example of Alternative Splicing: Branch Point A Modified
- if the branch-point A can't be recognized, the spliceosome can't recognize that 3' splice site and skips to the next downstream site, omitting one exon
- correct splicing is essential; if not → disease
- splicing, capping and polyadenylation happen while the RNA is being transcribed
mRNA Export
- RNA is transcribed in the nucleus
- transported to the cytoplasm
- where the translation machinery decodes the message into protein
- translation happens only in the cytoplasm, so the RNA must go there
- export happens AFTER processing
- requires proteins that interact with the 5' cap and poly-A tail and with specific protein carriers
- a transport receptor binds to the mRNA
Nuclear Pore Complex
- the transport receptor is recognized by this complex
- huge multi-subunit complex that acts as a funnel through the nuclear envelope
- doesn't recognize the mRNA itself, but the proteins associated with it
- mRNA, with cap-binding protein, transport receptor and poly-A binding proteins, passes through the pore
- proteins dissociate from the mRNA
- transport proteins return to the nucleus
- new proteins → initiation factors bind to the mRNA to start translation
mRNA Decay
- at end of lifespan, mRNA degraded
- can start decay from 5' or 3'
- deadenylase shortens the poly-A tail
- decapping enzyme removes the 5' cap
- without cap and tail, RNA is quickly degraded by 5' and 3' exonucleases
- there are more ways to degrade mRNA (e.g. microRNA-mediated decay)
Eukaryotic rRNA Processing
- most rRNAs are transcribed from a single pre-rRNA
- the pre-rRNA is cut into three separate rRNAs
- processing yields separate RNA products
- three RNAs encoded in one message
- sort of an exception to the no-operon rule
- rRNA is not capped or polyadenylated
- protected from degradation by tight binding to ribosomal proteins
- rRNAs are highly modified
- nucleotide building blocks are altered after being chained together
Example: Pseudouridine
- the uracil base is attached to the ribose through a different atom (a carbon instead of a nitrogen), so the base is effectively rotated
- very common in tRNAs
tRNAs → also highly modified
- nucleotide modifications are very important in tRNAs and rRNAs, which can't function without them
mRNAs → rarely modified
TRANSLATION
Topic 21
- during translation, the genetic code is deciphered, with 3 RNA bases coding for one amino acid
Genetic Code
- is the dictionary that allows us to translate the four RNA bases into the 20 standard amino acids (21 counting selenocysteine)
- the genetic code spells out amino acids in 3-letter words
- codons
- RNA is read in sections of three bases → each section is one amino acid
- every organism uses the 20 standard amino acids
- not every organism uses these additional ones:
- 21 → selenocysteine (in humans)
- 22 → pyrrolysine
How many Codons? 4 x 4 x 4 = 64 codons
- with 3-letter codes and 4 letters, 64 combinations are possible
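The codon count follows from simple counting: with 4 bases, words of length n give 4^n combinations. Two-letter words would give only 16, too few for 20 amino acids, so three letters is the shortest word length that works. A quick check in Python:

```python
from itertools import product

BASES = "UCAG"

# how many distinct "words" can 4 bases make for each word length?
for length in (1, 2, 3):
    print(length, len(BASES) ** length)   # prints 1 4, then 2 16, then 3 64

codons = ["".join(c) for c in product(BASES, repeat=3)]
print(len(codons), codons[:4])            # 64 ['UUU', 'UUC', 'UUA', 'UUG']
```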
Genetic Code is Universal → used by most organisms
- there are always some exceptions, where organisms have reassigned codons
Non-Overlapping
- the genetic code is non-overlapping
- 3-letter words in sentences that don't share letters
- if it were overlapping → each base would belong to up to three codons, so one base change would alter several amino acids
- an overlapping code would be more compact (shorter)
- overlap would place significant restrictions on which amino acid residues could follow each other
No Gaps
- codons are never skipped
Redundancy
- some codons specify the same amino acid
- 64 codons for 20 amino acids
- often occurs in the third position
- first two bases usually the same
- third base can be any of the other letters
- Wobble → reason the third base is usually the one that differs
- when tRNA and mRNA base pair, the third (3') base of the mRNA codon
- and the first (5') base of the tRNA anticodon
- don't bind as tightly and allow wobbling
- so one tRNA can decode codons with different third bases
5' anticodon base | 3' codon base |
C | G |
A | U |
U | A,G |
G | U,C |
I (inosine, a modified A) | U,C,A |
Amino Acids are NOT Equally Used in Proteins
- some are used more than others
- generally, amino acids found less frequently in proteins have fewer codons
- Ex. Met and Trp have only one codon each
Stop Codons → UAA, UAG, UGA
- signal the end of the protein
- do not have matching tRNAs
Start Codon → AUG
- all proteins start with methionine
Functionally Related Amino Acids have Similar Codons
- this increases the chance of a functional protein in case of a single-base mutation
- codons starting with GA encode aspartate and glutamate
- both negatively charged
- often substituted for each other
- glutamate and glutamine codons differ only in the first position
- same side chain length
Types of Mutation
- we can deduce what DNA mutations will result in when translated
- silent mutations → due to redundancy of the genetic code (several codons can encode the same amino acid)
- three-base codons read in sequence are called the reading frame
- frameshift (insertion/deletion) usually results in a nonfunctional protein, as all downstream bases have been moved over and the frame has changed
- an insertion/deletion (not a multiple of 3 bases) changes the reading frame from that base forward
- nonsense mutations → codon changed to a stop codon
- every amino acid downstream of the mutation is lost
- disease formation depends on what the mutation affects (catalysis? can the residue be replaced?)
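The mutation types can be compared by translating a short mRNA and its mutants. The sketch below encodes the standard genetic code compactly (bases in the order U, C, A, G; `*` marks stop codons) and assumes translation starts at the first base; the example sequences are made up. The "missense" example shows the Glu to Asp substitution between the functionally related codons mentioned above.

```python
from itertools import product

# Standard genetic code, bases in the order U, C, A, G ("*" = stop)
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


examples = {
    "wild type":  "AUGGAAUUUGGCUAA",   # Met-Glu-Phe-Gly-stop
    "silent":     "AUGGAGUUUGGCUAA",   # GAA -> GAG, still Glu
    "missense":   "AUGGAUUUUGGCUAA",   # GAA -> GAU, Glu -> Asp
    "nonsense":   "AUGUAAUUUGGCUAA",   # GAA -> UAA, stop
    "frameshift": "AUGAGAAUUUGGCUAA",  # one A inserted after AUG
}
for name, mrna in examples.items():
    print(f"{name:10} {translate(mrna)}")
```

The silent mutant gives the same protein (`MEFG`), the nonsense mutant stops after `M`, and the single inserted base changes every downstream codon (`MRIWL`) and reads straight past the original stop.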
tRNAs (transfer RNAs) → decode the genetic code
- the adapter molecules between mRNA and peptides
- they decode the message at the anticodon, where they base pair with the mRNA
- at the 3' end (CCA end)
- amino acid attached
- tRNAs physically link anticodons to amino acids
- bring amino acids to the growing polypeptide chain
- all have a similar structure
- because they must be recognized by the translation machinery
- have different features
- to be triplet- and amino acid-specific
- have highly stable stem-loop structures specified by base pairing
- have 5' and 3' ends
- D and T loops
- anticodon loop
- 2D structure → cloverleaf
- 3D structure → L shape
- highly modified molecules (bases)
- the modifications are required for function
Two Key Single-Stranded Regions
- 3' end → binds amino acid
- anticodon loop → base pairs with mRNA codons
Wobble
- most organisms encode fewer than 45 different tRNAs
- have to decode 61 sense codons (64 minus the 3 stop codons)
- some tRNAs only have to match two of the three bases between codon and anticodon
- always read mRNA 5'-3'
Molecular Basis for Wobble
- Example: only one type of tRNA for Phe
- anticodon GAA → decodes UUC and UUU
- sometimes the first (5') anticodon base and the third (3') codon base do not form a standard base pair
- wobble → strict base pairing only at the first two codon bases of the mRNA
- does not work for all tRNAs
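The wobble table above can be applied mechanically to list the codons one tRNA reads. This is an illustrative sketch: `codons_read_by` is a hypothetical helper, anticodons and codons are both written 5'→3', and it assumes the anticodon's middle and 3' bases always form standard Watson-Crick pairs.

```python
# Watson-Crick partners in RNA, used for the first two codon positions
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Wobble rules from the table: 5' anticodon base -> 3' codon bases ("I" = inosine)
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "UC", "I": "UCA"}


def codons_read_by(anticodon: str) -> list[str]:
    """Codons (5'->3') decoded by an anticodon (5'->3').

    Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    base 2 with base 2, and anticodon base 1 (the wobble base) with codon base 3.
    """
    wobble_base, middle, last = anticodon.upper()
    first = WATSON_CRICK[last]
    second = WATSON_CRICK[middle]
    return sorted(first + second + third for third in WOBBLE[wobble_base])


print(codons_read_by("GAA"))  # Phe tRNA from the example: ['UUC', 'UUU']
print(codons_read_by("IGC"))  # inosine wobble base: ['GCA', 'GCC', 'GCU']
```

The second call shows why inosine is useful: one tRNA covers three codons that all encode alanine.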
Aminoacyl-tRNA Synthetases
- ensure accuracy of decoding the three-letter code
- couple the 3' end of a tRNA to the correct amino acid
- at least one synthetase per amino acid
- amino acid ligated to the 3' CCA end of the tRNA
- covalent link between tRNA and amino acid
Aminoacyl-tRNA Synthetase Rxn
- it is a 2-part rxn that requires ATP
- amino acid activated by ligating it to ATP (forming aminoacyl-AMP)
- then transferred to tRNA
- requires energy
Charging tRNAs
- amino acid linked to ATP (as AMP), then linked to tRNA
- results in a high-energy bond
- energy from ATP is now stored in the high-energy ester link between tRNA and amino acid
- this is the energy needed to form the peptide chain in the ribosome
- many synthetases also have a proofreading function
- tRNA is quite large and binds a large section of the synthetase
- synthetases → pick out the correct tRNA and amino acid from all available variants in the cell and put them together
Ribosomes
- charged tRNA and mRNA come together in the ribosome
- other proteins help deliver tRNA
- ribosomes are made of two major subunits
Ribosome Subunits → large and small
- each subunit is composed of RNA and protein molecules
- bacterial and eukaryotic subunits differ in their complexity
- exploited by antibiotics that only target bacterial subunits
- most of the core of each subunit is made of RNA, but protein is an important part
- mRNA bound tightly; three tRNA binding sites:
- A → site that binds aminoacyl-tRNA
- P → site that binds peptidyl-tRNA
- E → site from which tRNA exits
Initiator tRNA
- in bacteria the first Met is in a special form called fMet (N-formylmethionine)
- all other Mets are regular
Step 1 of Translation → Initiation
- in bacteria fMet aminoacyl-tRNA binds the P site
- requires base pairing between tRNA and codon
- next aminoacylated tRNA binds the A site
Step 2
- amino acid from the fMet-tRNA is transferred to the second amino acid
- tRNAs migrate through the ribosome into the P and E sites
- energy from the ester bond of the peptidyl-tRNA in the P site is used to form the new peptide bond between the amino acids in the A and P sites and move them along
Step 3 → Movement of tRNAs through the Ribosome
- happens with energy provided from step 2
- peptide bond formation coupled to conformation change in ribosome
- also shifts large subunit forward in the ribosome
Peptidyl Transfer Rxn
- amino acid(s) from the P-site tRNA moved onto the amino acid of the A-site tRNA → forming a peptide bond
- product is a tRNA bound to the growing peptide chain and an uncharged tRNA, which can be released from the ribosome
- tRNAs move through the ribosome, vacating the A site
- small subunit moves forward too → exactly 3 bases (one codon)
Step 4
- uncharged tRNA leaves ribosome from E site
Eukaryotic Translation
Circularization
- mRNA circularizes prior to translation
- factors binding the poly-A tail and other initiation factors are required to initiate translation
- ensures only complete mRNAs are translated
Key Points
- Translation occurs in the cytoplasm
- in eukaryotes, transcription occurs in the nucleus
- and translation in the cytoplasm
- Translation occurs in the 5'-3' direction along the mRNA, making protein from N to C terminus
- mRNA decoded one codon at a time
- Energy for peptide bond synthesis comes from the high-energy aminoacyl-tRNA ester bond
- indirectly from ATP
- Translation is a complex rxn involving:
- both RNA and proteins
- conformational (shape) changes in ribosome
- Specificity comes from:
- aminoacyl tRNA synthetases
- requirement for base pairing in A site of ribosome
Determining the Translation Start Site
- differs between bacteria and eukaryotes
- in eukaryotes no fMet but regular Met
- initiation factors and small subunit bind to the 5' cap of mRNA
- move along the RNA searching for the first AUG start codon
- when AUG is found, initiation factors dissociate
- large ribosomal subunit then completes the ribosome
- next amino acid then recruited into the A site and translation begins
Long Hairpin Loop → if one is present between the 5' cap and the AUG codon, it will stall or completely inhibit translation
Bacteria use a mechanism other than 5' scanning because multiple proteins can be encoded on the same mRNA in bacteria
- bacteria have no 5' cap
- ribosomes recognize internal ribosome binding sites that are upstream of the AUG start codon
Shine-Dalgarno Sequence
- specifies start site of translation and recognized by ribosomes
- it is just upstream of the AUG start codon
- recognized by part of rRNA
Elongation
- ribosome moves along mRNA adding amino acids
- several elongation factors required
Terminating Translation
- message stops at one of three stop codons
- stop codons don't have tRNAs
- ribosome stops and waits (stalls)
- instead of a tRNA, a specific release factor is found
- looks a bit like a tRNA and goes into the A site
- release factor causes the peptide chain to be transferred to H2O instead of the next amino acid, terminating the peptide chain
- requires GTP as a cofactor to catalyze this step
After Translation
- proteins need to be folded
- proteins need to be modified
- proteins are degraded
Protein Folding
- Primary → amino acid sequence
- Secondary → α-helices, β-sheets and coils
- Tertiary → 3D fold
- Quaternary → complex with other proteins
- after translation proteins folded to an active shape
Three Main Ways Proteins Fold
- ways 1 and 2 work for 85% of all proteins:
- (1) many proteins fold properly without help as soon as they are released from the ribosome
- (2) some proteins need help from heat shock protein 70 (Hsp70)
- Hsp70 binds to the growing peptide chain to allow delayed folding
- way 3 works for 15% of proteins:
- (3) significant help needed to fold
- first bound by a heat shock protein
- then handed over to the chaperonin GroEL/GroES
- GroEL is like a big tub: the protein is put inside and folds properly before being released
GroEL/GroES Complex (chaperonin)
- provides an enclosed folding chamber that lets proteins with exposed hydrophobic regions fold properly without aggregating
Chaperonin Rxn Cycle
- GroEL ring binds 7 ATP and an unfolded polypeptide, which associates with hydrophobic patches of the GroEL subunits
- GroES cap binds, triggering conformational changes that retract the hydrophobic patches
- releasing the polypeptide into the GroEL chamber, where it can fold
- within 10 seconds the cis GroEL ring hydrolyzes its 7 ATP
- a second polypeptide substrate and 7 ATP bind to the trans GroEL ring
- cis ring releases the GroES cap, 7 ADP and the folded substrate polypeptide
- trans GroEL ring can now bind a GroES cap, and the cap binding, ATP hydrolysis and release steps repeat
- this process requires lots of ATP and is very expensive for the cell
Chaperones Help Some Proteins Fold
- chaperone proteins are found in all cells
- many of these are designated heat shock proteins (HSPs)
- principal chaperones: Hsp70, Hsp60 (chaperonins), Hsp90
- nascent proteins emerging from the ribosome are met by ribosome-associated chaperones, including:
- trigger factor (TF) in E. coli
- nascent chain-associated complex (NAC) in eukaryotes
Prions → infectious agents made of misfolded proteins
- known to cause mad cow disease and other spongiform encephalopathies
- the infectious agent is not a virus or bacterium but a misfolded protein
- examples also include Creutzfeldt-Jakob disease and new variant Creutzfeldt-Jakob disease, which is related to bovine spongiform encephalopathy
- prions can't be transmitted through air, touching, or most other forms of casual contact
- may be transmitted through contact with infected tissue, bodily fluids or contaminated medical instruments
Prion Propagation
- major prion protein (PrP, for prion protein or protease-resistant protein)
- expression of protein is most predominant in nervous system but occurs in many other tissues throughout body
- the protein can exist as multiple isoforms: normal PrPC, disease-causing PrPSc, and an isoform located in the mitochondria
- at first only the healthy form PrPC is present
- prion propagation:
- interaction between PrPC and the diseased form PrPSc
- causes PrPC to misfold
- misfolded prions can cause healthy prions to misfold
In Creutzfeldt-Jakob Disease
- the faulty protein is PrP (PrPC misfolded into PrPSc)
- normal PrPC → consists mainly of α-helices
- diseased PrPSc → consists of β-sheets and α-helices
- regions that are α-helical in the normal form are β-sheets in the infectious form
- although the proteins have the same sequence (primary structure), they are structurally different enough to cause infection
Post-translational Protein Processing
- many proteins undergo covalent alterations before they become functional
- in these post-translational modifications, the primary structure of protein may be altered
- and/or novel derivatives may be introduced into its side chains
- hundreds of different post-translational amino acid modifications are known
- Examples: glycosylation and phosphorylation
- proteolytic cleavage is the most common form of post-translational processing
R Groups in Proteins Can Be Modified
- after protein is translated
- these modifications are called post-translational modifications (PTMs)
- Examples:
- phosphorylation, glycosylation, hydroxylation, carboxylation, acetylation and ubiquitination
Modified Proteins Can Be Targeted
- glycoproteins → adding sugars can help targeting
- glycosylation helps bring proteins to destinations
Human p53 PTMs (Guardian of the Genome)
- it's highly modified
- many modifying enzymes interact with p53 and add PTMs
- if mis-modified → same effect as a mutation
- loss of function
- can't detect DNA damage, can't stop cell division and can't induce apoptosis
- p53 is mutated/mis-modified in 50% of all known cancers
Phosphorylation is often required to switch a protein from the off to the on position
Modifications can change the activity, location, and interactions of a protein in cells.
Functions of Proteins
- recognize external signals
- can be located to any place in cell
- growth and maintenance
- facilitate biochemical rxns (enzymes)
- act as messenger
- provide structure
- maintain pH
- targeted to functional site
- balance fluids
- boost immune health
- transports and stores nutrients
Signal Recognition Particle (SRP)
- proteins can recognize signals
- they can bind signal peptides, changing their conformation
- starts a cascade of downstream rxns
- the signal recognition particle binds the ribosome and recognizes the signal peptide in the growing peptide chain
Membrane Translocation of Eukaryotic Secretory Proteins
- during translation, when the signal peptide is recognized
- translation is halted until the ribosome is directed to the ER membrane
- SRP delivers the ribosome to the ER membrane
- ribosome binds to the translocon (Sec61) and the growing peptide is directly injected into the ER lumen during translation
- signal peptide cleaved after its function is done
Insulin
- translated as a long primary peptide chain
- to convert proinsulin → insulin
- disulfide bridges between the A chain and B chain are formed
- middle section (C-peptide) of the chain is cleaved and removed to generate active insulin
Amount of Active Protein Is Regulated at Many Levels
- disruption of one regulatory node can sometimes, but not always, be compensated for at a different node
- any disruption can lead to disease
Significance of Gene Expression
- most diseases involve altered expression of one or more genes
- therapeutically manipulating gene expression
- has the potential to prevent/reverse diseases
- producing molecules with the tools of recombinant DNA technology or synthetic biology requires regulated gene expression
Gene Expression → highly regulated in humans
- human genome: ~21 000 protein-coding genes
- fewer than 10 000 expressed in any given cell
- genes expressed at different levels
- correct expression essential for growth and differentiation
- varies between developmental stages, tissues and in response to environmental triggers
- can use technologies to show how different proteins are expressed in different parts of the body
Regulation
1) Transcription
- can be regulated at any point
- initiation
- elongation
- termination
4) RNA processing (pre-mRNA → mRNA) (regulated)
- RNA editing
- 5' capping
- splicing
- 3' polyadenylation
8) mRNA export from nucleus to cytoplasm
- RNA not exported can't be translated
9) mRNA degradation
- dictates how long an mRNA is available to be translated
- many pathways lead to quick degradation of unneeded or unwanted mRNAs
- amount of mRNA in the cell depends on synthesis and decay
10) Translation
- highly regulated
- initiation
- elongation
- termination
13) Protein Modification
- common modes of regulation
- phosphorylation
- acetylation
- cleavage
- many signals transduced via phosphorylation cascades
- Ex. the kinase Akt1 → only active when phosphorylated
- dephosphorylated → no function
14) Protein Inhibition or Degradation
- can be inhibited by other proteins or small molecules
- NSAIDs (ibuprofen/aspirin) → inhibit a protein (COX enzymes)
- can also be degraded when no longer needed → process usually slow
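Point 9 says the amount of mRNA depends on synthesis and decay. A minimal sketch, assuming a constant synthesis rate `k_syn` and first-order decay with rate constant `k_deg` (a common simplification, not stated in these notes): dM/dt = k_syn − k_deg·M, which levels off at M = k_syn / k_deg. The rates used below are hypothetical.

```python
import math


def simulate_mrna(k_syn: float, k_deg: float, hours: float,
                  dt: float = 0.01, m0: float = 0.0) -> float:
    """Euler integration of dM/dt = k_syn - k_deg * M; returns M after `hours`."""
    m = m0
    for _ in range(int(round(hours / dt))):
        m += (k_syn - k_deg * m) * dt
    return m


def half_life(k_deg: float) -> float:
    """Time for an mRNA population to fall to half under first-order decay."""
    return math.log(2) / k_deg


# Hypothetical rates: 10 transcripts made per hour, decay constant 0.5 per hour
print(simulate_mrna(10, 0.5, hours=50))  # approaches 10 / 0.5 = 20
print(simulate_mrna(10, 1.0, hours=50))  # faster decay, lower level (about 10)
print(half_life(0.5))                    # about 1.39 hours
```

Doubling the decay constant halves the steady-state level even though synthesis is unchanged, which is why degradation pathways are an effective control point.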
RECOMBINANT DNA TECH
Topic 22
- these technologies allow us to engineer living organisms to have desirable properties or abilities
- these technologies have moved us into the era of synthetic biology
FoxP2 Gene → Guided Example
- patients with FoxP2 mutations struggle to form complete sentences and with tongue movements
- gene located on chromosome 7
- want to clone gene to:
- determine sequence
- purify protein and investigate properties
- clone → copy of original
- E. coli commonly used for cloning
Expressing FoxP2 in E. coli
- determine which tissues express FoxP2
- isolate mRNA
- synthesize cDNA
- clone it into expression construct
- purify FoxP2 from E. coli
- two common methods to check gene expression
- Quantitative Real-Time PCR (qRT-PCR for short)
- Northern Blotting
Blotting → separating DNA, RNA or proteins by size, then transferring them onto a membrane
- probing for the species of interest and visualizing
Blot | Looking For | Using |
Southern | DNA | DNA |
Northern | RNA | DNA |
Western | protein | antibodies |
Northern Blot
- Isolate RNA
- there are ways of isolating it → not important here
- Separate RNA by gel electrophoresis
- RNA → contains all mRNAs from the cell
- mRNAs are different sizes
- gel electrophoresis → separates by size
- same principle as SDS-PAGE (proteins), except RNA is already (-) charged (because of its phosphate backbone)
- another difference → don't need detergent to denature; can just use warm temperature
- can use polyacrylamide or agarose (polysaccharide) gels to separate RNA/DNA
- To Run Agarose Gel:
- need to put RNA samples in individual wells at one end of the gel
- cover gel with buffer → then apply electric current
- (-) charged RNA moves
- away from (-)
- towards (+) electrode
- smaller pieces of RNA move faster than larger ones
- after some time fragments will be separated
- can't see them with the naked eye
- have to add dye (Ex. ethidium bromide)
- when UV light is shone on it, dye-bound RNA/DNA glows
- Transfer RNA to membrane
- for Northern blotting, don't add dye
- instead transfer RNA from gel → membrane
- take gel with separated RNA and lay a nitrocellulose or nylon membrane flat on top
- put gel and membrane on a sponge sitting in salt solution
- stack some dry paper towels on top
- salt solution moves up paper towels by capillary action
- RNA moves along with it out of gel → membrane
- eventually, all RNA will be transferred to the membrane
- Probe Membrane for FoxP2
- strands denature → when heated
- strands anneal (renature) → when cooled
- once on the membrane, the RNA is no longer denatured and can re-form secondary structures
- first, denature any RNA secondary structures
- anneal a DNA probe complementary to the sequence of interest to the RNA
- forming a DNA/RNA hybrid
- to form the hybrid, synthesize an oligonucleotide probe that's complementary to part of the mRNA you're looking for
- assume sequence known (human genome sequenced)
- add a radioactive phosphate group to the 5' end of the probe
- incubate the membrane (with RNA on it) with the probe under conditions that allow specific annealing of the probe to complementary strands but not others
- wash away unbound probe and detect radioactivity on the membrane
- if you see a band in a lane, the RNA of the desired gene is present
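The probe design step above comes down to writing the antiparallel DNA complement of the target mRNA stretch. A minimal sketch with made-up sequences; `dna_probe_for` and `probe_detects` are hypothetical helpers, and exact string matching is a simplification, since real hybridization stringency depends on temperature and salt.

```python
# DNA base that pairs with each RNA base (RNA U pairs with DNA A)
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}
# RNA base that pairs with each DNA base
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def dna_probe_for(rna_target: str) -> str:
    """DNA probe (5'->3') that anneals antiparallel to an RNA target (5'->3')."""
    return "".join(RNA_TO_DNA_PARTNER[b] for b in reversed(rna_target.upper()))


def probe_detects(probe: str, rna: str) -> bool:
    """True if the RNA contains a perfect complement of the probe (exact match only)."""
    target = "".join(DNA_TO_RNA_PARTNER[b] for b in reversed(probe.upper()))
    return target in rna.upper()


probe = dna_probe_for("AUGGCUUAC")              # hypothetical stretch of the mRNA
print(probe)                                    # GTAAGCCAT
print(probe_detects(probe, "GGAUGGCUUACAAA"))   # True: band in this lane
print(probe_detects(probe, "GGAUGCCUUACAAA"))   # False: one mismatch
```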
Once you have identified the tissue type the mRNA is expressed in
- use that tissue as the source to make many copies of your mRNA
Microarray
- another way of looking for specific mRNAs
- allows looking at mRNAs of different genes at the same time
- on a 2D surface, attach oligonucleotides complementary to different mRNAs
- each oligonucleotide in a different spot on the surface
- tag the RNA prep with fluorescent dye
- RNA fluorescently labelled
- mRNA labelled, not the probe
- incubate surface with your RNA prep
- if a certain mRNA is present, it will bind the oligonucleotides in the spot that corresponds to that specific mRNA
- each bright spot → specific mRNA present
- each dark spot → mRNA not present
cDNA → Complementary DNA to an RNA Strand
- DNA copy of mRNA, needed for cloning
- allows amplification of the sequence
- start with total mRNA prep
- synthesize cDNA using reverse transcriptase and a poly(T) primer
- reverse transcriptase is found in retroviruses (e.g., HIV)
- synthesizes DNA from an RNA template, which DNA polymerase can't do
- use a poly-T oligonucleotide as a primer, which is complementary to the poly-A tails of mRNAs
- end up with a DNA/RNA hybrid
- to get double-stranded DNA → add RNase enzyme to degrade the original mRNA
- in the same tube, include DNA polymerase and deoxynucleotide triphosphates
- polymerase uses the partially digested mRNA as a primer to synthesize the second DNA strand
- to the first mRNA strand, add poly-T primer
- it can anneal to the 3' end (poly-A tail) of the mRNA
- including reverse transcriptase in the mix results in synthesis of a DNA strand complementary to the original mRNA strand
- reverse transcribe all mRNAs
- not gene specific
Second-Strand Synthesis (still not gene specific)
- we add 3 enzymes
- RNase H specifically degrades RNA bound to a complementary DNA strand
- will degrade RNA at random spots
- leaves partial segments of RNA bound to the DNA
- RNA fragments act as primers for DNA polymerase
- DNA polymerase → synthesizes the new strand
- DNA ligase joins together all nicks once the RNA has been replaced by DNA
- end up with double stranded cDNA
- representing all mRNAs in starting tissue
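The strand relationships in cDNA synthesis can be traced with base-pairing rules alone. A minimal sketch, assuming a hypothetical 12-base poly-T primer and a made-up mRNA; the RNase H, DNA polymerase and ligase steps are collapsed into the single helper `second_strand` (both helper names are illustrative).

```python
# Base-pairing partner in DNA for each RNA or DNA base
DNA_PARTNER = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def first_strand(mrna: str, primer_len: int = 12) -> str:
    """Reverse-transcribe an mRNA (5'->3') primed by poly-T at its poly-A tail.

    Returns the first cDNA strand written 5'->3'; it begins with the poly-T primer.
    """
    if not mrna.endswith("A" * primer_len):
        raise ValueError("no poly-A tail for the poly-T primer to anneal to")
    return "".join(DNA_PARTNER[b] for b in reversed(mrna))


def second_strand(first: str) -> str:
    """Second DNA strand (5'->3'), complementary to the first cDNA strand."""
    return "".join(DNA_PARTNER[b] for b in reversed(first))


mrna = "AUGGCUUAA" + "A" * 20          # made-up coding sequence plus poly-A tail
cdna_1 = first_strand(mrna)
cdna_2 = second_strand(cdna_1)
print(cdna_1)
print(cdna_2)                           # same as the mRNA, with T in place of U
```

The check on the poly-A tail mirrors why this method reverse-transcribes all polyadenylated mRNAs and is not gene specific.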
PCR → Polymerase Chain Reaction
- amplifies → many copies made from very small amounts of starting DNA
- as little as a single copy of target DNA
- How it works:
- template DNA (cDNA or genomic) contains the target region to be amplified
- here the target region is the entire coding sequence of the gene (FoxP2)
- rxn mix → has two primers
- they are complementary to the template strands at either end of the region to be amplified
- must include DNA polymerase to synthesize new DNA
- polymerase → must be thermostable
- include deoxynucleotide triphosphates
- as substrates for DNA polymerase
- appropriate buffer to provide conditions under which polymerase will be active
To Perform PCR, Change Temp → 3 Stages:
- and repeat for many cycles
- raise temp very high, around 95 °C (Denature, 95 °C)
- causes all DNA in solution to denature, i.e., separate into two strands
- polymerase needs to withstand this
- cool DNA to a moderate temp (Anneal, 50-60 °C)
- allows primers to anneal specifically to complementary sequences
- 3' ends of both primers point toward the region to be amplified
- raise temperature to the optimum for polymerase activity (Extend, around 72 °C)
- adds nucleotides to the 3' ends of both primers, duplicating the DNA in the region to be amplified
- programmable thermocyclers automate the temperature cycling
- DNA strands synthesized in the first cycle become templates for new DNA synthesis in the second cycle
- if everything is perfect → the number of copies doubles each cycle
- starting in cycle 3
- products of the precise target length appear and grow in abundance until they soon are the major product
- after 30 cycles there will be 2^30 (about 1 billion) copies of target DNA per starting copy
- if you analyze the PCR product using gel electrophoresis
- you see a single band representing the cDNA you want
- can cut the band out of the gel and extract DNA from it
- gives you a fairly pure sample of cDNA
- next step cloning
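The cycle arithmetic above can be written out directly. This sketch assumes ideal PCR starting from one double-stranded template; the `efficiency` parameter below 1.0 is an illustrative assumption not in the notes, and 2^n − 2n is the standard count of exact-length products under ideal doubling.

```python
def pcr_copies(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Double-stranded copies after `cycles`; efficiency 1.0 means perfect doubling."""
    return start_copies * (1 + efficiency) ** cycles


def exact_length_products(cycles: int) -> int:
    """Double-stranded products of exactly target length from one template.

    Ideal PCR: 2**n - 2n, which is 0 after cycles 1 and 2 and first
    becomes non-zero (2 copies) in cycle 3.
    """
    return 2 ** cycles - 2 * cycles


for n in (1, 2, 3, 4, 30):
    print(n, pcr_copies(n), exact_length_products(n))

print(pcr_copies(30, efficiency=0.9))  # imperfect doubling gives far fewer copies
```

By cycle 30 the exact-length products make up all but 60 of the 2^30 molecules, which is why the gel shows a single band.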
Applications of PCR
- Quantitative PCR (qPCR)
- a molecule that becomes fluorescent in the presence of double-stranded DNA is added
- fluorescence gets brighter with more amplification
- can be used to quantify the amount of a specific mRNA
- DNA Fingerprinting
- humans have repetitive regions in DNA that contain repeats of short sequences
- number of short repeats depends on the person
- the more regions you check, the less likely two unrelated people match
- used at crime scenes and in paternity tests
- Site-Directed Mutagenesis
- used to introduce desired changes in a DNA sequence
- design a primer that is not completely complementary to the template sequence
- contains one or more mismatches
- ensure the 3' end of the primer anneals to the template so it can be used by DNA polymerase
- as PCR progresses, the altered version becomes the dominant product
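Two of these applications rest on simple arithmetic. A minimal sketch: for qPCR it uses the "threshold cycle" (Ct, the cycle at which fluorescence crosses a set threshold), a term not in these notes, and assumes perfect doubling; for fingerprinting it assumes the loci are independent. All numbers are hypothetical.

```python
import math


def fold_difference(ct_sample: float, ct_reference: float, efficiency: float = 1.0) -> float:
    """Starting-amount ratio sample/reference from threshold cycles.

    Each cycle multiplies product by (1 + efficiency), so a sample that
    crosses the threshold one cycle earlier started with (1 + efficiency)x more.
    """
    return (1 + efficiency) ** (ct_reference - ct_sample)


def random_match_probability(locus_frequencies: list[float]) -> float:
    """Chance an unrelated person matches at every locus, assuming independent loci."""
    return math.prod(locus_frequencies)


print(fold_difference(ct_sample=20, ct_reference=23))  # 8.0: sample had 8x more mRNA
print(random_match_probability([0.1] * 5))             # about 1e-05 with 5 loci
```

Each extra locus multiplies the match probability by another small factor, which is why checking more regions makes a chance match between unrelated people less likely.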
Plasmids
- relatively small pieces of DNA found in microorganisms
- not part of the organism's genome
- usually circular, double-stranded, and a few thousand base pairs in size
- multiple copies of a single plasmid can be present in a bacterium
- convenient vehicles/vectors → a gene can be inserted
Plasmid Features
- must contain an origin of replication
- very helpful to have a selectable marker
- to help distinguish cells that carry the plasmid
- often a gene that confers resistance to an antibiotic
- if cells are grown on media containing the antibiotic, cells that don't have the gene die, and ones that do have it survive
- many have multiple cloning sites
- a section of DNA containing recognition sequences for many restriction endonucleases (restriction enzymes)
Restriction Endonucleases
- help with the ability to manipulate DNA
- usually recognize specific motifs in double-stranded DNA
- usually between 4-8 bases long
- common ones generate sticky ends, which
- have a 5' or 3' overhang; others generate blunt ends
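The end types can be read straight off a recognition site and its cut position. A minimal sketch, assuming palindromic sites cut at mirror positions on the two strands; BamHI (G^GATCC) and EcoRI (G^AATTC) appear in these notes, while PstI and SmaI are added only to show a 3' overhang and a blunt end. `end_type` and `digest_linear` are hypothetical helpers and the DNA is made up.

```python
# Recognition site and top-strand cut position (bases after the site start)
ENZYMES = {
    "BamHI": ("GGATCC", 1),   # G^GATCC
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "PstI": ("CTGCAG", 5),    # CTGCA^G
    "SmaI": ("CCCGGG", 3),    # CCC^GGG
}


def end_type(site: str, cut: int) -> tuple[str, str]:
    """End produced by a palindromic site; the bottom strand is cut at n - cut."""
    n = len(site)
    if cut < n - cut:
        return "5' overhang", site[cut:n - cut]
    if cut > n - cut:
        return "3' overhang", site[n - cut:cut]
    return "blunt", ""


def digest_linear(seq: str, site: str, cut: int) -> list[str]:
    """Top-strand fragments of a linear DNA cut at every occurrence of `site`."""
    cuts = [i + cut for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


for name, (site, cut) in ENZYMES.items():
    print(name, end_type(site, cut))
print(digest_linear("AAGGATCCTTTGGATCCAA", *ENZYMES["BamHI"]))
```

Both BamHI fragments that start at a cut begin with the same GATC overhang, which is what makes ends cut by the same enzyme compatible.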
Creating Recombinant DNA
- multiple cloning site has many restriction endonuclease recognition sites
- allows you to cut open circular DNA and paste in exogenous/foreign DNA (compatible ends)
- cut plasmid and gene DNA with restriction enzymes
- using overhangs/sticky ends
- incubate fragments together to allow sticky ends to anneal
- add DNA ligase → backbone repaired
- needs a 5' phosphate present in order to join strands
- result: intact circular DNA that contains the desired insert
Example: Using a BamHI Site (enzyme from Bacillus amyloliquefaciens H) on a Plasmid to Insert a cDNA Gene
- add a BamHI cut site onto the ends of the cDNA by ligating on short pieces of DNA that contain the site → linker DNA
- OR design PCR primers containing the BamHI recognition site
- cutting with BamHI gives rise to compatible sticky ends
- purify fragments away from restriction enzymes
- mix fragments
- add DNA ligase
- at some frequency, the insert is ligated into the plasmid, giving rise to the desired recombinant plasmid
- plasmid introduced into E. coli for protein production
Transformation
- process by which a plasmid is introduced into bacteria
- process by which bacteria take up exogenous DNA
Two Main Ways to Transform Bacteria:
- Electroporation
- mix plasmid and bacteria under certain conditions
- then expose cells to an electric current
- causes some cells to take up the plasmid
- like shooting little holes into the bacterial cell wall
- Treating Bacteria with Chemicals
- often divalent cations (ex. calcium) to make them:
- competent → cells capable of taking up foreign DNA
- mix cells and plasmid together
- quickly warming the cells up in a "heat shock" induces some cells to take up the plasmid
*for both methods: must be circular DNA*
- to be stable in bacteria
- linear DNA doesn't survive in bacteria
Selection
- incubate cells on an agar plate containing the antibiotic
- that corresponds to the selectable marker on the plasmid
- cells containing the plasmid → resistant to the antibiotic
- will grow on the plate
- no guarantee that the plasmid contains the insert wanted
- maybe the ends of the cut plasmid were ligated back together without the insert
- would be resistant but not have the cDNA wanted
- must test colonies to find one that contains the recombinant plasmid
- for each method, grow up a culture of a colony and isolate plasmid DNA from the cells
Restriction Mapping (technique to check for insert)
- cut plasmid with particular restriction enzymes, then run products on a gel and check sizes
- if you know the plasmid sequence → can predict fragment sizes
- can tell if the insert is about the right size
Example: Two EcoRI sites (500 bases apart)
- add a 1000-base insert between them
- if you cut the recombinant plasmid → expect a 1500-base fragment
- visualize by separating pieces by size (like Northern blot)
- have to stain the gel to see DNA
- use a dye called ethidium bromide
- the dye binds DNA and glows under UV light
Expect to See: 500 (EcoRI sites), 1000 (insert) and 2500 (rest of plasmid; total was 4000) bases
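Predicting band sizes for a circular plasmid only needs the cut positions and total length. A minimal sketch with hypothetical numbers: EcoRI sites at 0, 500 and 1500 in a 4000 bp recombinant plasmid is one layout consistent with the expected sizes listed above, and the 3000 bp empty vector is likewise an assumption for comparison. `circular_fragments` is an illustrative helper.

```python
def circular_fragments(length: int, cut_sites: list[int]) -> list[int]:
    """Fragment sizes (bp, sorted) from cutting a circular plasmid at the given positions."""
    if not cut_sites:
        return [length]  # uncut circle
    sites = sorted(cut_sites)
    sizes = [b - a for a, b in zip(sites, sites[1:])]
    sizes.append(length - sites[-1] + sites[0])  # fragment that wraps past position 0
    return sorted(sizes)


# Hypothetical recombinant plasmid: EcoRI sites at 0, 500 and 1500 in 4000 bp
print(circular_fragments(4000, [0, 500, 1500]))  # [500, 1000, 2500]
# Hypothetical empty vector without the 1000 bp insert: sites at 0 and 500 in 3000 bp
print(circular_fragments(3000, [0, 500]))        # [500, 2500]
```

Comparing the two lanes, the extra 1000 bp band is the evidence that the insert is present; a plasmid cut only once gives a single band of full length.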
Clones Must Be Verified
- PCR → can be used to check for correct insertion
- use primers that flank the predicted insert
- if you see a PCR product of the appropriate size, that is evidence your plasmid contains the insert
Third Way to Check for Insert Presence in Clones
- denature plasmid and incubate it with an oligonucleotide probe specific to the insert sequence
- can be done by Southern blotting
- if probe binds to plasmid → suggests insert is present
*none of these three methods is a guarantee*
Sanger DNA Sequencing (dideoxy)
- chain termination sequencing
- best way to test and verify plasmid is what you want it to be
- set up a DNA synthesis rxn with a primer that anneals to the DNA to be sequenced
- and DNA polymerase and deoxynucleotide triphosphates
- include dideoxy NTPs → "chain terminators"
- lack a hydroxyl at both the 2' and 3' carbons of the sugar
- effect on DNA replication
- when a dideoxy NTP is incorporated into the growing chain
- DNA synthesis stops at that point
- no 3' hydroxyl to add to
- include a bit of all four dideoxy NTPs
- DNA replication can stop anywhere
- each dideoxy NTP has a different fluorescent label
*Method is useful for a relatively small number of samples, analyzed over a limited sequence length*
ddATP RXN
RXN Mix
- template, one primer (anneals to template), DNA polymerase and deoxy NTPs (so synthesis occurs)
- include 0.2% dideoxy ATP compared to regular deoxy ATP
- dideoxy ATP modified by a fluorescent label
- DNA polymerase binds to the template and starts adding nucleotides to the primer
- when it comes to a T on the template, it incorporates an A
- 0.2% chance that dideoxy ATP is incorporated
- DNA synthesis stops if that happens
- millions of DNA templates
- newly synthesized strands are NOT uniform
- all fluorescently labelled at the end (A)
- end up with a series of newly synthesized DNA fragments
- each stopping at a different point in the sequence and each fluorescently tagged according to the last base added (differs)
Analysis
- denature strands and separate by capillary gel electrophoresis → in a tiny tube
- smaller and quicker
- monitor the bottom of the capillary for fluorescence
- reads the complement of the DNA strand you want to sequence
- get an output with peaks
- each peak represents the signal from one strand
- computer converts peaks → sequence
- confirms whether the cDNA is in the plasmid and gives the sequence to analyze
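Reading a Sanger trace amounts to sorting the terminated fragments by length and noting each one's dye. A minimal sketch, assuming (as the notes say) that with millions of templates every position is represented, and ignoring peak heights; the template is made up and given in the order the polymerase reads it (3'→5'). `terminated_products` and `read_capillary` are hypothetical helpers.

```python
import random

PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_products(template_3_to_5: str, primer_len: int) -> list[tuple[int, str]]:
    """(length, dye) for every chain-terminated product.

    A ddNTP can end synthesis at any position, so there is one product per
    position, tagged by the base (complement of the template) it ended on.
    """
    products = []
    for k, base in enumerate(template_3_to_5, start=1):
        products.append((primer_len + k, PARTNER[base]))
    return products


def read_capillary(products: list[tuple[int, str]]) -> str:
    """Shortest fragments reach the detector first; their dyes spell the new strand."""
    return "".join(dye for _, dye in sorted(products))


products = terminated_products("TACGGT", primer_len=20)
random.shuffle(products)          # fragments are made in no particular order
print(read_capillary(products))   # ATGCCA: new strand 5'->3', complement of template
```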
Purifying Protein from E.Coli
- Create Expression Plasmid
- Transform cells with plasmid
- Grow Cell Culture → Induce Overexpression
- grow cells in liquid culture and induce high-level expression of the gene
- Lyse Cells
- when you have a lot of cells, lyse (break) them
- use physical methods to disrupt cell wall/membrane
- Remove Cellular Debris
- use centrifugation to remove remnants
- Chromatography
- separates by size, charge properties, or an affinity matrix
- left with complicated mixture
- to separate protein from other molecules use chromatography
- Check Purity
- of protein preparation
Anatomy of a Protein Expression Vector
- Vector → something used to move DNA of interest into a desired location
- means introducing cargo DNA into a system
- plasmid must have an origin of replication
- some origins give many copies, some only a few copies
- Promoter
- that bacterial RNA polymerase can recognize
- contains regulatory sequences that allow transcription to be turned on
- mRNA has a ribosome binding site
- so the mRNA can be translated
- Affinity Tags
- simplify purification
- placed at the 5' or 3' end of the coding sequence, so found at the N or C terminus
- tags are usually a few residues long
- His tags bind nickel ions immobilized in a chromatography column
- Tag Removal Sequence
- sometimes the tag doesn't have to be removed because it's so small
- usually includes a sequence encoding a protease cleavage site
- Terminator
- tells RNA polymerase when to stop
- Selectable Marker
- to identify cells with plasmid
Affinity Purification
- equilibrate the material that binds the affinity tag, to ensure everything is in the same buffer
- polyHis tag → nickel affinity chromatography
- load sample onto column
- only the protein of interest binds the column
- elute the protein of interest by adding a competitive binder (imidazole)
- protein now relatively pure
SDS-PAGE Analysis of an Example Purification
- Cell Extract
- cells overexpressing the protein, broken open
- Flow through Fractions from Affinity Column
- Purified Protein
- analyze structure/function to see how it works
Transgenic Organisms
- organisms whose genomes have been permanently altered by genetic engineering
- carrying a plasmid doesn't make an organism transgenic, because the plasmid is not part of the genome
- vital for studying genes and proteins in the context of a whole organism
- Three main types of changes to a gene:
- gene replacement
- gene knockout
- gene addition (higher expression)
- observing changes → valuable information for function
CRISPR-Cas9 → Clustered Regularly Interspaced Short Palindromic Repeats - Endonuclease
- Cas9 → creates double-stranded breaks
- cuts specific sites in DNA specified by a guide RNA (which binds the enzyme)
- without a guide, no cuts
- researchers can cut a specific site → by providing the right guide
- guide → 30 nucs long, 20 nucs specify cutting
- very unlikely the targeted 20 nucs are present more than once in the genome, because there are 4^20 (about 10^12) possible 20-base sequences
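The 4^20 argument can be checked numerically, along with a naive scan for exact target matches. A minimal sketch assuming a random, uniform genome sequence (real genomes are not random, and the PAM requirement is ignored); 3.1e9 bp is an approximate human genome size, and the 20-nt target is made up. All function names are illustrative.

```python
def possible_guides(length: int = 20) -> int:
    """Number of distinct DNA sequences of a given length."""
    return 4 ** length


def expected_random_hits(genome_bp: float, length: int = 20) -> float:
    """Expected chance matches on both strands of a random-sequence genome."""
    return 2 * genome_bp / possible_guides(length)


def find_target(genome: str, protospacer: str) -> list[tuple[int, str]]:
    """Exact matches of a target on either strand: (position, strand)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    rev = "".join(comp[b] for b in reversed(protospacer))
    n = len(protospacer)
    hits = []
    for i in range(len(genome) - n + 1):
        window = genome[i:i + n]
        if window == protospacer:
            hits.append((i, "+"))
        if window == rev:
            hits.append((i, "-"))
    return hits


print(possible_guides())            # 1099511627776 (about 1.1e12)
print(expected_random_hits(3.1e9))  # about 0.0056 for a human-sized genome
guide = "GATTACAGATTACAGATTAC"      # hypothetical 20-nt target
print(find_target("AAAA" + guide + "CCCC", guide))  # [(4, '+')]
```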
After Double-Strand Breaks:
- double-strand break repair machinery tries to fix the break
- non-homologous end joining:
- bases likely lost at the repair site
- if near the start of a gene, good chance the gene is inactivated → creates a gene knockout
- homologous recombination:
- if replacement DNA is provided, there is a chance the organism will copy it during repair
- altered version of the gene created
- alternatively, an entire gene could be added at the repair site
Mice with Altered FOXP2 genes
- Homozygous Knockout
- developmental delay, motor abnormalities → premature death at 3-9 weeks
- fewer vocalizations upon separation from mother
- Heterozygous Knockout
- fewer vocalizations upon separation from mother
- different male courtship songs
SEQUENCING GENOMES
Topic 23
Steps in Genome Sequencing
- Create genomic DNA library
- Generate many independent sequencing reads
- Align independent reads into contiguous sequences (contigs)
- Fill gaps between contigs
- Annotate genome
Example: Haemophilus influenzae
- does not cause influenza
- it is a human pathogen
- Gram-negative, free-living, self-replicating bacterium
Genome Sequencing Strategies
Sequence Genome Consecutively
- get the sequence of one section of the genome (limited number of bases per read)
- back then 500 bases (one run)
- sequence a specific 500 base pair section
- use known sequence → design a sequencing primer that would extend from the known fragment into the neighbouring unknown region
- allows sequencing of the next 500 base pair stretch
- design another primer → allows us to extend another 500 bases, and so on
- 1.8 million base pairs → need 3 600 sequencing runs
- Advantages:
- know where each sequence fragment fits in the genome
- Disadvantages:
- have to wait for the previous sequence first
- takes a lot of time
- if done in both directions at 7 runs/week → 5 years at least
- Sequence Random Fragments Simultaneously (Shotgun Sequencing), so called because the location of each sequence is random
- need a lot of sequences to have a good chance of covering a particular spot of the genome
- need more than 3 600 sequencing runs
- Advantages:
- don't have to wait for any particular sequence
- simultaneous → saves time
- Disadvantages:
- have to figure out how pieces fit together
- like puzzle
- must assemble to genome
- computer can be used
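The run counts for both strategies follow from genome size and read length. A minimal sketch: reading "7/week" as 7 runs per week in each direction is an assumption, and the shotgun estimate uses the Poisson (Lander-Waterman) coverage approximation, a standard model swapped in here rather than taken from these notes.

```python
import math


def runs_needed(genome_bp: int, read_len: int) -> int:
    """Sequencing runs to cover a genome once, end to end (primer walking)."""
    return math.ceil(genome_bp / read_len)


def walking_years(runs: int, runs_per_week: int, directions: int = 2) -> float:
    """Years for primer walking if each direction adds `runs_per_week` runs."""
    return runs / (runs_per_week * directions) / 52


def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of bases never read by random shotgun reads (Poisson model)."""
    return math.exp(-coverage)


runs = runs_needed(1_800_000, 500)
print(runs)                                  # 3600
print(round(walking_years(runs, 7), 1))      # 4.9
for c in (1, 3, 8):
    # c-fold coverage needs c * 3600 random reads
    print(c, c * runs, round(fraction_unsequenced(c), 4))
```

With only 3 600 random reads (1x coverage) about 37% of the genome would be missed, which is why shotgun sequencing needs many more runs than primer walking.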
Create Genomic DNA Library
- grow a culture of source cells and isolate genomic DNA from the cells
- generate smaller fragments of the genome
- done by digesting with a restriction enzyme that gives fragments of appropriate size
- for H. influenzae:
- ran DNA fragments on an agarose gel
- extracted fragments between 1600 and 2000 bases long
- method gives many fragments in that size range
- then ligated DNA fragments into a vector to create a collection of circular plasmids
- each plasmid has a different fragment
- collecting enough vectors containing different genomic inserts to cover the genomic DNA → library
- transformed the genomic DNA library into E. coli
- obtained 20 000 clones
- each clone containing a plasmid with a different genomic insert
- E. coli clones can be grown separately and stored in a freezer
- any desired plasmid in the library can easily be duplicated
- researchers obtain a lot of each plasmid in the library by growing each clone and extracting plasmid DNA
Sequence Inserts
- plasmid DNA sequenced → Sanger (dideoxy) method
- sequencing parts of these fragments creates a large number of independent sequencing reads
- each read → about 500 bases
- a computer is used to handle the reads
Overlapping Reads Generate Contigs
- a computer aligns the sequence reads → longer continuous stretches of DNA sequence → contigs
- by looking for overlaps between sequences
- each contig represents a stretch of genome sequence built from aligned reads
- a contig is a data file, not a plasmid
- without a reference genome, there is no way to know how to order the contigs
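The overlap idea can be illustrated with a minimal greedy assembler. It repeatedly merges the two reads with the longest suffix/prefix overlap until nothing overlaps. This is a teaching sketch with made-up reads. It ignores sequencing errors, repeats and reads from the opposite strand, all of which real assemblers must handle.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain; return the contigs."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Hypothetical reads from two separate regions of a genome
reads = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCATGA", "CCCGGGAAAT", "GGAAATTTCC"]
print(greedy_assemble(reads))   # ['CCCGGGAAATTTCC', 'ATGGCGTACGTTAGCCATGA']
```

The output is two contigs with no overlap between them. Nothing in the reads says which contig comes first or how large the gap between them is, which is exactly the ordering problem described above.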
Gaps between Contigs
- many orderings of the contigs are possible
- gaps range from a few to thousands of bases
- contigs have to be linked together
Fill Gaps
- if the gap is small → check clones in the genomic library
- can design primers to sequence the middle of inserts
- inserts are 1600-2000 bp, but each read covers only ~500 bases → the middle of an insert may not have been read
- only works when the gap is small
H. influenzae
- 98 of the 140 gaps were sequence gaps
- 42 gaps: sequence not present in the library
Physical Gaps → work with the real genome
- missing sequence is NOT present in the genomic library; solution:
- design PCR primers pointing into the gaps
- perform PCR on genomic DNA with each possible pair of primers
- see which primer pair gives a product
- Example:
- 6 PCRs with primer 1
- only 1 reaction will give a product
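The combinatorial PCR strategy can be sketched as follows. The setup is assumed: four contigs, all in the same orientation, with one outward-facing primer at each end (8 primers). If the pair of primers on the same contig is skipped, each primer is tested against 6 others, matching the example's numbers. The primer names and the `gives_product` function, which stands in for the wet-lab result, are hypothetical.

```python
from itertools import combinations

def contig_end_primers(n_contigs: int) -> list[str]:
    """One outward-facing primer at each end of each contig (hypothetical naming)."""
    return [f"c{i}_{end}" for i in range(1, n_contigs + 1) for end in ("L", "R")]

def primer_pairs(primers: list[str]) -> list[tuple[str, str]]:
    """Every pairwise PCR to try, skipping the two primers on the same contig."""
    return [(a, b) for a, b in combinations(primers, 2)
            if a.split("_")[0] != b.split("_")[0]]

def gives_product(pair: tuple[str, str], true_order: list[str]) -> bool:
    """Stand-in for the real experiment: a product forms only when the primers
    flank one gap, i.e. the R end of a contig and the L end of the next contig
    around the circular chromosome."""
    a, b = pair
    for k, contig in enumerate(true_order):
        nxt = true_order[(k + 1) % len(true_order)]
        if {a, b} == {f"{contig}_R", f"{nxt}_L"}:
            return True
    return False

primers = contig_end_primers(4)
pairs = primer_pairs(primers)
print(len(pairs))                           # 24 reactions in total
print(sum("c1_R" in p for p in pairs))      # 6 reactions involve c1_R ("primer 1")
true_order = ["c1", "c3", "c2", "c4"]       # unknown in a real experiment
hits = [p for p in pairs if gives_product(p, true_order)]
print(hits)   # [('c1_L', 'c4_R'), ('c1_R', 'c3_L'), ('c2_L', 'c3_R'), ('c2_R', 'c4_L')]
```

Each product identifies two neighbouring contig ends, so the four hits together give the contig order. The PCR products themselves can then be sequenced to fill the gaps.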
Illumina Sequencing
- no bacterial cloning required
- DNA amplified on a solid surface by a PCR-like process
- billions of simultaneous parallel reactions possible
- Sanger methods: hundreds
- electrophoresis not needed for analysis
- automated, real-time detection
- shorter read lengths
- Results:
- sequencing is much faster but only generates very short fragments of 150-300 bases
- makes it hard to build contigs afterwards
- very good method when a sequenced genome already exists and you are looking for mutations in a specific genome
Genome Annotation
Want to Find:
- open reading frames (ORFs) encoding proteins
- RNA genes
- transcriptional regulatory regions
- origins of replication
- telomeres
- repeat sequences
Finding ORFs
- almost all ORFs start with ATG (methionine)
- end with one of 3 stop codons (TAA, TAG, TGA)
- in between are triplet codons that specify which amino acid goes at each position
- will have associated regulatory regions (promoter)
ORFs that Encode Proteins
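A minimal ORF scan follows directly from these rules. In each of the three reading frames of a strand, the sketch opens an ORF at an ATG and closes it at the first in-frame stop codon. The other strand is scanned by taking the reverse complement. Promoters and regulatory regions are not considered, and the toy sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Scan the three reading frames of one strand for ATG...stop ORFs.

    Returns (frame, start, end): 0-based, end exclusive, stop codon included.
    min_codons counts the codons before the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                           # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                        # look for the next ATG
    return orfs

toy = "CCATGAAATTTGGGTAACC"
print(find_orfs(toy, min_codons=3))                      # [(2, 2, 17)]
print(find_orfs(reverse_complement(toy), min_codons=3))  # [] (no complete ORF)
```

The default cut-off of 100 codons reflects a simple probability argument. In random sequence, 3 of the 64 codons are stops, so 100 consecutive codons without a stop occur with probability (61/64)^100, which is about 0.008. Long ORFs are therefore unlikely to arise by chance.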
- Usually contain > 100 codons
- the longer a possible ORF extends without a stop codon, the more likely it is to be protein coding
- Show codon bias
- most amino acids are encoded by more than one codon
- not all codons are used with the same frequency in mRNA
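- Example:
- leucine has six codons
- in one organism, TTA accounts for almost half of all leucine codons, the other codons less
- in another organism, CTG is the most common
- when looking at an ORF → its codon usage should reflect the organism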
- Preceded by a promoter and have associated regulatory elements
- must have a promoter nearby
- bacterial promoters are easier to find than eukaryotic ones
- Related genes in other organisms
- due to the evolutionary relatedness of different organisms and the many genomes now available
- for a protein-coding ORF, it is often possible to identify similar genes in other organisms
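Codon bias can be measured by counting how often each synonymous codon appears in an in-frame coding sequence. The sketch below does this for the six leucine codons of the standard genetic code. The example ORF is made up. In practice, a candidate ORF's usage would be compared with the usage across the organism's known genes.

```python
from collections import Counter

# The six leucine codons in the standard genetic code.
LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")

def leucine_codon_usage(cds: str) -> dict[str, float]:
    """Fraction of leucines encoded by each Leu codon in an in-frame coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in LEUCINE_CODONS)
    total = sum(counts.values())
    return {c: counts[c] / total if total else 0.0 for c in LEUCINE_CODONS}

orf = "ATG" + "TTA" * 3 + "CTG" + "TAA"      # made-up ORF with 4 leucines
print(leucine_codon_usage(orf))
# {'TTA': 0.75, 'TTG': 0.0, 'CTT': 0.0, 'CTC': 0.0, 'CTA': 0.0, 'CTG': 0.25}
```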
Full genome annotation involves looking for other important sequences too, not just protein-coding ORFs
- annotation is an ongoing process
BLAST → Basic Local Alignment Search Tool
- allows comparison of a possible gene or protein sequence with the nucleotide or amino acid sequences of known genes/proteins from other organisms
- the program looks for similarities and reports back the best matches found
- Expectation value (E-value) → estimate of how often a match this good would occur by chance
- the lower the number, the less likely the two sequences are actually unrelated
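The E-value comes from Karlin-Altschul statistics, E = K * m * n * exp(-lambda * S). Here m and n are the query and database lengths, and S is the alignment score. The default values for K and lambda below are assumed, illustrative numbers. Real values depend on the scoring matrix and gap penalties, and BLAST also corrects m and n for edge effects, so this is a simplified sketch.

```python
import math

def e_value(score: float, query_len: int, db_len: int,
            lam: float = 0.267, k: float = 0.041) -> float:
    """Expected number of chance alignments scoring >= score
    (Karlin-Altschul form; lam and k are assumed illustrative values)."""
    return k * query_len * db_len * math.exp(-lam * score)

for s in (40, 80, 120):
    print(s, f"{e_value(s, query_len=300, db_len=1_000_000_000):.2e}")
```

E falls exponentially as the score rises, but it grows in proportion to the database size. A given score is therefore less convincing when a larger database is searched.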
Multiple Sequence Alignment
- compare related proteins from different organisms
- can align many sequences simultaneously
- more closely related (by evolution) → more similarities
- parts of proteins particularly important for function are conserved across different organisms
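Once sequences are aligned, conservation is easy to read off. This sketch does not perform the alignment itself (dedicated tools such as Clustal Omega do that). It takes made-up, already-aligned fragments ("-" marks a gap), finds the columns that are identical in every sequence, and computes pairwise percent identity.

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """0-based columns where every aligned sequence has the same residue (no gaps)."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must all be the same length")
    return [i for i in range(length)
            if alignment[0][i] != "-" and all(s[i] == alignment[0][i] for s in alignment)]

def percent_identity(a: str, b: str) -> float:
    """Identical residues as a percentage of columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100 * sum(x == y for x, y in pairs) / len(pairs)

# Made-up, already-aligned protein fragments
aln = ["MKT-HLGE",
       "MKSAHLGE",
       "MRTAHLGD"]
print(conserved_columns(aln))                      # [0, 4, 5, 6]
print(round(percent_identity(aln[0], aln[1]), 1))  # 85.7
print(round(percent_identity(aln[0], aln[2]), 1))  # 71.4
```

The conserved columns are candidate functionally important positions. In this sketch, the pair with higher percent identity would be treated as the more closely related pair.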
5. Produce mRNA
- if protein coding, the ORF should be transcribed into mRNA in the organism
- may only be expressed under certain conditions and in certain cell types
6. Produce Protein → check with mass spec
- may only be present under certain conditions/in certain cell types
- commonly use mass spec to analyze