Saturday, February 10, 2007

Am I Repeating Myself? (corrected)

Having committted to the Week of Science I feel, well, committed. Every day a post must go out. With other presssures & poor sleep habits, the danger of quality slippping is always present. Last night, I got paraaranoid I had already done a post on multiple test corrrrrection.

There are other risks. The Director of Ergonomics (though most people still refer to her by her previous position, Director Of Genomics) sometimes feels inspired to contribute. Since the keyboard isn't exactly optimized for her digits, the result can be a mess. If I leave the computer and then come back, I might overlooook some rather strange output.

Losss of sleep might also lead to non-sequiturs, especially with careless or accidental cut-and-paste operations. With other pressures & poor sleep habits, the danger of quality slipppping is always present.

I do write well, I think. Think! I, well write. Do I? My ambitions might not be those of Teddy Roosevelt -- "A man, a plan, a canal: Panama!" -- but I do try to do things well. There's always the risk I will have writer's block & nothing to say. Was the first sentence ever spoken "Madam, I'm Adam!"?

Welcome to the crazy world of repeats: your genome is full of them. When we talk about repeats there are really a bunch of related but different phenomena.

Simple repeats are the repetition of a simple pattern -- or is that repetitititition of a simple pattttttttern? Mono-, di-, tri-, tetra- & so on nucleotide patterns occur in large arrays. These probably originate through various slippage mechanisms during DNA copying -- the polymerase loses track of where it is and copies the same stretch multiple times. Simple nucleotide repeats have been used extensively as genetic markers (though SNPs have largely supplanted them), since there is variation across populations in the number of repeats AND one can measure the length of these repeats in a PCR assay. Repeats are also phenotypically relevant -- changes in repeat number can alter what is coded. The most spectacular, and devastating, versions of these are the trinucleotide repeat expansions in diseases such as Huntington's Disease. Over several generations, the normal repeat of around a dozen CAG can expand to over a thousand, with devastating effects: somehow the resulting string of glutamines is a problem.
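
For the computationally inclined, a rough way to spot simple repeats is a regular expression with a backreference. The Python sketch below is only an illustration: the function name find_simple_repeats and the thresholds (units of up to 4 bases, at least 3 copies) are assumptions made for this example, and real repeat finders are far more tolerant of imperfect copies.

```python
import re

def find_simple_repeats(seq, max_unit=4, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem run of a short unit.

    Hypothetical helper: the thresholds are arbitrary illustration values.
    """
    # ([ACGT]{1,n}?) captures the shortest unit that works; \1{k,} demands
    # at least k further back-to-back copies of that same unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

print(find_simple_repeats("TTCAGCAGCAGCAGTTA"))  # [(2, 'CAG', 4)]
```

Comparing the copy count for the same locus between two individuals is, in spirit, what the PCR length assay measures.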

Repeats can be direct or inverted. Direct repeats are just that -- direct repeats direct repeats direct repeats. Palindromes, such as the Panama and Adam bits above, are the closest analog to inverted repeats that we have in language, but there is a difference. When a biologist talks palindromes, they mean that the forward and reverse strands are the same -- the reverse complement of GATC is GATC.
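
The GATC claim is easy to check by hand, but a few lines of Python make the definition concrete. This is a minimal sketch; the function names are made up for the example and it assumes plain uppercase A, C, G, T with no ambiguity codes.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then read the result backwards."""
    return seq.translate(COMPLEMENT)[::-1]

def is_biological_palindrome(seq):
    """True when both strands read the same in the 5'-to-3' direction."""
    return seq == reverse_complement(seq)

print(reverse_complement("GATC"))          # GATC
print(is_biological_palindrome("GATC"))    # True
print(is_biological_palindrome("GATTC"))   # False
```

Note that "MADAM" style letter-by-letter symmetry is neither required nor sufficient here; what matters is that the sequence equals its own reverse complement.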

Other repeats are dispersed throughout the genome. Some of these are functional: in order to generate a lot of ribosomal RNA, there are multiple rRNA genes distributed across the genome (often in tandem arrays). Others are remnants, pieces of genes that have been copied. This is sometimes through reverse transcription of mRNA, so the new copy lacks introns and possesses a poly-A region (a simple repeat!). Most of these copies are non-functional, but occasionally the new copy can be expressed. Those that do not acquire a new job decay from mutations, until they are no longer recognizable.

Another class of repeated sequences are various transposable elements, genetic sequences which make copies of themselves. Some of these, such as Alu elements, are present in gazillions of copies. Now each copy is not identical, as most were replicated long ago and have since acquired mutations. But, certain hallmarks remain. Some of these elements replicate through an mRNA mechanism, but there are also purely DNA transposons which copy themselves within the genome.

Still other repeated elements are copies of other genomes -- such as pieces of the mitochondrial genome which have been copied over. This is the same process which moved genes out of the mitochondria and into the nucleus, though no genes have retained function after the move in a long, long time. There are also oodles of copies of defunct retroviruses.

Because of Watson-Crick basepairing, repeats can do all sorts of things. In a messenger RNA, an inverted repeat structure (palindromes) can lead to a hairpin structure, as the first repeat binds to the second repeat. Recombination between repeats, if not paired carefully, can create even more interesting effects; in the diagram below, start at 1a, trace through to the x and then end at 2b; then do the same from 2a to 1b. The - characters are for display and do not represent genes.

1a >AB-CCD> 1b
x
2a >ABCC-D> 2b

yields

>ABCCCD>
>ABCD>

Voilà! One repeat has expanded, while the other has shrunk.
Even more spectacular results can ensue if the repeats are on different chromosomes:

>ABCDE>
x
>RSCTU>

yields

>ABCTU>
+
>RSCDE>

-- now you have translocations of pieces of one chromosome to another. Suppose S contained the centromere of one chromosome and D the centromere for the other one. Centromeres are required for correct chromosome segregation during cell division, and only one per customer. Now one chromosome has two & will be yanked apart; the other has zero and will be lost.
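
Both diagrams above boil down to the same operation: cut two strings and swap their tails. Here is a small Python sketch of that idea, treating each letter as a gene the way the diagrams do. The function names and the cut positions are assumptions chosen to reproduce the diagrams, not a model of real recombination machinery.

```python
def crossover(chrom1, cut1, chrom2, cut2):
    """Swap the tails of two chromosomes at the given cut positions."""
    return chrom1[:cut1] + chrom2[cut2:], chrom2[:cut2] + chrom1[cut1:]

def centromere_count(chrom, centromeres="SD"):
    """Count genes that carry a centromere (S and D in the example above)."""
    return sum(gene in centromeres for gene in chrom)

# Misaligned pairing of two copies of ABCCD: the first C of one copy
# pairs with the second C of the other.
print(crossover("ABCCD", 3, "ABCCD", 4))   # ('ABCD', 'ABCCCD')

# The same repeat C on two different chromosomes gives a translocation.
products = crossover("ABCDE", 3, "RSCTU", 3)
print(products)                                  # ('ABCTU', 'RSCDE')
print([centromere_count(p) for p in products])   # [0, 2]
```

The last line is the segregation problem in miniature: one product carries no centromere and the other carries two.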

Here's what I get for making jokes about making mistakes: a mistake. My inverted repeat example really shows direct repeats.

Recombination between direct repeats on the same chromosome can also be radical:

>ABCDECF>
   +--+ recombine

yields

>ABCF>

Now we've lost DE -- if those were critical genes, or worse the centromere (required for cell division), things are going to go haywire.
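
In string terms, this kind of deletion just splices out everything from one copy of the repeat up to the next. A minimal Python sketch, with a made-up function name and the simplifying assumption that the recombination happens between the first two copies of the repeat:

```python
def recombine_direct_repeats(chrom, repeat):
    """Delete the stretch between the first two copies of a direct repeat.

    One copy of the repeat survives. Raises ValueError if the repeat
    does not occur at least twice.
    """
    first = chrom.index(repeat)
    second = chrom.index(repeat, first + len(repeat))
    return chrom[:first] + chrom[second:]

print(recombine_direct_repeats("ABCDECF", "C"))  # ABCF
```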


Okay, here is the correct version for an inverted repeat:


>ABCDEcF> top strand
<abcdeCf< bottom strand
   +--+

which can be rewritten as single strands as

>ABCDEcF>
x
>f-Cedcba>

yielding

>ABCedcba>
and
>fCDEcF>
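
The rewriting step is the part that is easy to fumble, so here is a short Python sketch that follows the notation of the diagram: lowercase means the complement of the uppercase gene, so the other strand is the top strand reversed with the case swapped. The function names are invented for this example.

```python
def other_strand(top):
    """Write the bottom strand left-to-right: reverse it and swap case."""
    return top[::-1].swapcase()

def crossover_at(repeat, strand1, strand2):
    """Cut both strands just after the first copy of repeat and swap tails."""
    cut1 = strand1.index(repeat) + len(repeat)
    cut2 = strand2.index(repeat) + len(repeat)
    return strand1[:cut1] + strand2[cut2:], strand2[:cut2] + strand1[cut1:]

top = "ABCDEcF"
bottom = other_strand(top)
print(bottom)                           # fCedcba
print(crossover_at("C", top, bottom))   # ('ABCedcba', 'fCDEcF')
```

The matching is case-sensitive, which is what lets the uppercase C in the top strand pair with the uppercase C that appears once the bottom strand is rewritten.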


Let's hope the repair is better than the problem!

All of these effects can also happen on a micro scale. Direct or inverted repeats cloned into E. coli can often trigger recombination or select for deletion of the repeats (E. coli contains far fewer repeats & is less tolerant of them), turning your beautiful plasmid into genetic hash.

There are worse things than to repeat yourself, and there are lots of ways to repeat yourself. But I'll still worry about repeating myself: With other pressssssures & poor sleep habits, the danger of quality slippping is always present...
