Rediscovering Life

Pervasive Transcription: Using Genomes “Every Which Way”

Stephen L. Talbott

This article is part of “Rediscovering Life”, a news portal for the Biology Worthy of Life Project. Copyright 2013 The Nature Institute. All rights reserved.
Posted: July 4, 2013   (Article 6)

“There appears to be a whole universe out there in the genome to be explored” (Dinger et al. 2009*).

In the beginning (1953), the DNA double helix was an elegant linear sequence of computer-like code. It would, in the ensuing years, be seen to consist of neatly discrete elements with consistent functions — functions that could be added one to the other rather like the functions of a machine. The two strands of the double helix spiraled around each other in a geometrically satisfying way, with genes and some regulatory sequences laid out along the structure’s length. The genes were transcribed from one of the strands by a specialized enzyme, producing messenger RNAs (mRNAs), each of which was in turn translated by a sophisticated “molecular machine” into a protein, which finally folded into a precise, three-dimensional structure in order to perform its well-defined function.

In this way, exactly one protein was produced from each gene. And everyone — without worrying too much about what the details would eventually look like or whether the machine and computational models could possibly encompass what we already knew about organic processes — simply assumed that all the parts would somehow come neatly together at the right times, in the right places, and in the right quantities to produce the basic architecture, the living substance, and the behavior of the organism.

It was a clean and nearly perfect picture in an airily abstract sort of way. Perfect, that is, apart from the oddity that researchers would subsequently declare all but a minuscule portion (one or two percent) of DNA to be parasitic gibberish (“junk”) of no particular use to the organism — and apart from the fact that genes had as yet only the vaguest connection to any of the larger processes or full-bodied traits they were supposed to explain. But it seemed there were no obstacles to future understanding; DNA, as the master engine of the machine-organism’s logic, promised all necessary answers. How could such an intricate and beautiful code fail to contain the instructions for building an organism?

The surprises would unfold one by one over a few decades, as the new field of molecular genetics gained in technical sophistication. For a long while the unexpected discoveries were outshone by the blinding light and the triumphant certainties of the new science. But more recently, brought into ever clearer focus by the Human Genome Project and its aftermath, the gathering revelations have been coalescing into a discipline-reshaping “field” of new awarenesses.

One sign of the change has been the hopeless unraveling of the old picture of the double helix with its genes. Among other things: the “gibberish” has proven to be differentiated, versatile, and profoundly significant for the organism; and, on the other hand, the clean logic has become a total mess — but only in the way a Shakespearean play is (at least for some minds) a total mess compared to Principia Mathematica.

To put it a little differently: both the parasitic nonsense and the computational program are being transformed by current research into richness of meaning — but a richness that is realized only in the context of the larger organism. DNA is, however slowly, being brought alive by being linked to wider organic processes, thereby losing its misconceived character as a bearer of self-contained logic.

Some Technical Terms

Transcription is the process by which RNA molecules are produced from DNA sequences. An RNA that codes for proteins is called a “messenger RNA” (mRNA), while an RNA that does not code for protein is a noncoding RNA. The mRNAs and other types of RNA resulting from transcription are known as transcripts. The process by which mRNAs are used for producing proteins is called translation.

Gene expression refers most immediately to the transcription of (protein-coding) genes. But it can also embrace the later consequences of transcription, all the way to production of protein via translation. And “expression of DNA sequences” is a broader phrase, necessitated by the kind of findings discussed in this article, and embracing the transcription of any and all sequences, including non-genic ones.

A promoter is a regulatory DNA sequence where elaborate and variable protein complexes assemble as one of the prerequisites for initiation of transcription. An enhancer is a regulatory sequence, more or less distinct from a promoter and often located much further from the transcription sites it helps to regulate.

Here I will briefly mention just one aspect of what’s been going on — an aspect captured in one of the hottest phrases of the contemporary literature: “pervasive transcription”.

A good place to begin is with the recent disclosure that even an organism as simple as yeast manages to generate from its 8000 or so genes, not merely 8000 unique messenger RNAs (mRNAs), but rather hundreds of thousands of different mRNAs. These mRNA variants, participating with many other molecules and processes, can produce vastly more than 8000 proteins, but this is not all. Each variant may also, in a distinctive way, influence other things: its own stability, its localization within the cell, and the different sorts of molecular processing to which it will be subject on down the line.

Moreover, all this variable use of genomic material in yeast was discovered by looking at just one of the many methods by which cells orchestrate different outcomes from the “same” genes. This method involves varying the starting and ending points of gene transcription, so that what we used to think of as a single gene now effectively becomes dozens or even hundreds of distinct genes, depending on the organism’s choices. Lars Steinmetz, a researcher at the European Molecular Biology Laboratory (EMBL) and the Stanford Genome Technology Center, led the yeast project, and had this to say about the new finding:

We knew that transcription could lead to a certain amount of diversity, but we were not expecting it to be so vast. ... Based on this diversity, we would expect that no yeast cell has the same set of messenger RNA molecules as its neighbour. (Quoted in EMBL press release 2013*)
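To make the arithmetic behind this diversity concrete, here is a toy sketch in Python. It assumes a made-up 40-letter "locus" and invented start and end positions (none of these come from the yeast study), and simply enumerates every transcript that can be bounded by one start site and one later end site. It is an illustration of the combinatorics only, not a model of how cells actually choose transcript boundaries.

```python
from itertools import product


def enumerate_isoforms(sequence, start_sites, end_sites):
    """List (start, end, sub-sequence) for every start site paired with a later end site."""
    isoforms = []
    for start, end in product(start_sites, end_sites):
        if start < end:
            isoforms.append((start, end, sequence[start:end]))
    return isoforms


# Hypothetical locus; the letters and the site positions are invented for illustration.
locus = "ATGGCGTACCTGAAGTTCGGATCCAAGCTTGGCATTAGCA"
starts = [0, 3, 6]    # assumed candidate start positions
ends = [30, 34, 40]   # assumed candidate end positions

variants = enumerate_isoforms(locus, starts, ends)
print(len(variants), "transcript variants from one 'gene'")
```

Even three possible starts and three possible ends give nine variants of one "gene"; with dozens of usable boundaries per gene, the count multiplies quickly across a whole genome.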

If genes have been losing their distinct identities, the loss is due to much more than the selection of different places to start or stop transcription. Further, the idea of pervasive transcription extends far beyond the gene itself. The greater part of “parasitic” or “junk” DNA is now known to be transcribed despite the fact that it does not code for proteins. In humans some 75% of it is transcribed according to current estimates, and those estimates have been rising fast.

Biologists have only recently begun to search in earnest for the functions of this unexpectedly transcribed DNA. The discoveries are now coming at a furious pace, and it would be hard to find any aspect of the organism’s life that is not in one way or another affected by the regulatory activities of the long and short noncoding RNAs being brought to light — microRNAs, small interfering RNAs, piwi-interacting RNAs, long intervening noncoding RNAs, small nuclear RNAs, promoter-associated RNAs, enhancer RNAs ... and the list seems to grow with every passing month.

But there is much more, which I can only vaguely allude to now. For example, DNA sequences such as promoters and enhancers, which regulate the transcription of protein-coding genes, may themselves be transcribed — often along with more or less of the downstream DNA sequence. The resultant RNA transcripts play into further processes of genome management.

Another remarkable discovery has given us extensive antisense transcription — that is, transcription from the “wrong” strand of the double helix. Such transcription is extremely unlikely to yield proteins, but can produce a whole array of RNAs, with all their regulatory implications. And not only can the antisense transcripts carry out important functions in their own right, but the very act of transcription, with transcribing enzymes moving in opposite directions along the two strands, can enhance or repress the expression of particular protein-coding or other sequences.
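The relation between a sense and an antisense transcript can be shown with a small Python sketch. The nine-letter sequence and the function names are invented for illustration; the only biology assumed is standard base pairing and the fact that an RNA is built as the complement of whichever strand serves as template. Nothing here captures the regulatory effects described above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the opposite strand of a DNA sequence, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(template):
    """RNA made from a template strand: its reverse complement, with U in place of T."""
    return reverse_complement(template).replace("T", "U")


# Hypothetical stretch of the coding ("sense") strand, written 5' to 3'.
coding_strand = "ATGGCCTAA"
template_strand = reverse_complement(coding_strand)

sense_rna = transcribe(template_strand)    # conventional transcript
antisense_rna = transcribe(coding_strand)  # transcript read from the "wrong" strand

print(sense_rna)      # AUGGCCUAA
print(antisense_rna)  # UUAGGCCAU
```

The two transcripts are complementary to each other, which is one reason an antisense RNA is able to pair with, and so interact with, its sense counterpart.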

To conclude this much-too-brief litany: it’s been found, rather startlingly, that protein-coding RNAs derived from different genes can be stitched together to form a single mRNA. Equally surprising, the discarded portions of a “single” gene, portions once consigned to the noncoding “junk” category, participate in functions we have hardly begun to understand, ranging from the production of various noncoding RNAs, to the regulation of alternative splicing (whereby a single precursor mRNA may lead to as many as a thousand or more different proteins), to the harboring of distinct genes within genes. And, finally, once a protein-coding gene has been transcribed into a messenger RNA, that mRNA may then be cleaved, yielding independent functional fragments.

In sum, as molecular biologists Alex Tuck and David Tollervey wrote a couple of years ago (2011*): “A single locus [of the genome] can produce multiple transcripts that use shared sequences in distinct ways to fulfill a spectrum of fundamentally different biological functions”. In fact, “sequences with a single function might ... be the exception rather than the rule, and the ability of a single sequence to encode multiple layers of information permits an almost unimaginable overall level of regulatory complexity and transcriptome diversity”.

And once again, speaking more specifically of gene regulation, Tuck and Tollervey wrote: “At the DNA level, numerous layers of regulatory information pervade the transcribed region, blurring the distinction between regulatory and transcribed sequences and refuting the notion of a modular gene.”

It is hard to put in sufficiently graphic terms what all this means for our understanding of the genome. Once-definite outlines dissolve amid countless interleaved, overlapping, cross-talking activities of the whole cell — activities that come dynamically together from all directions in order to coordinate and integrate the mind-numbing complexities, the mutually entangled processes, of pervasive transcription. Rigid functional distinctions that once made the double helix a model fulfillment of mechanistic expectation are obscured beyond recognition.

And everything takes place in the context of that three-dimensional, balletic performance of chromosomes, choreographed by the cell, that is so often mentioned nowadays in the literature of gene regulation: chromosomes are compacted or decompacted; they reposition themselves within the nucleus; they send out loops for interaction with other loci, or withdraw them; the double helix coils more tightly here and uncoils there; the two strands separate from each other at particular places and then reunite (“breathing”); distinctive structures of many sorts are formed, such as variations upon a four-stranded structure known as a “G-quadruplex”... In all this we see some of the contextually organized performance at the DNA end of the formula that once disarmed us with its simplicity: “DNA produces RNA and RNA produces protein”.

When a logical code was all that seemed to matter, it was easy to gloss over the subtlety that would have to reign in actual organisms. One needed only to say that the code was for the production of such-and-such a protein. But that is rather like saying our legs are for (let’s say) hunting animals; it omits almost everything that counts. In reality, our two legs are woven into all aspects of our morphology, physiology, and behavior. Everything legs are and do — whether it is their muscles acting as a kind of “second heart” to aid in blood circulation, or their bone marrow producing red blood cells, or the bones and muscles together helping us to hold our balance and view distant horizons from an upright stance — belongs to our life. All their activity, whether internal or directed toward an external goal, counts, and none of it can be separated off as the function of legs.

A similar perspective is demanded from us at the molecular level, where no mere logic will enable us to understand chromosomes with their DNA. The only way to learn about their significance is to observe their complex, embodied, three-dimensional performance — a performance that, among many other things, makes pervasive transcription possible. This spatial, temporal, and architectural drama is what DNA expression, in its fullest sense, is. DNA can indeed, in its own way, reveal the whole organism to us — but only because, according to a long-recognized principle of organic wholeness, the whole itself comes to expression in every part. There could hardly be a better illustration of this than the pervasively transcribed genome.

Tags: DNA/noncoding; plasticity/genome; holism; transcription/pervasive
Sources: Dinger, Marcel E., Paulo P. Amaral, Timothy R. Mercer and John S. Mattick (2009). “Pervasive Transcription of the Eukaryotic Genome: Functional Indices and Conceptual Implications”, Briefings in Functional Genomics and Proteomics vol. 8, no. 6, pp. 407-423. doi:10.1093/bfgp/elp038

European Molecular Biology Laboratory (2013). “Pushing the Boundaries of Transcription”, Press Release (April 24). Downloaded from on April 24, 2013.

Tuck, Alex C. and David Tollervey (2011). “RNA in Pieces”, Trends in Genetics vol. 27, no. 10 (Oct.), pp. 422-432. doi:10.1016/j.tig.2011.06.001

Further information: See “Getting Over the Code Delusion: Biology’s Awakening” and the much shorter article, “Indefinable Genes and the ‘Wild West’ Genomic Landscape”.
