How do you sequence a genome?

Wednesday, August 5, 2009 at 2:23 AM
Have you ever wondered how genes or whole genomes are sequenced, or how DNA sequencing plays into our understanding of how life has evolved here on Earth?

Without getting into all the details, here's some great video footage that provides a glimpse into this world. The video also includes discussion (towards the end) of the role science plays in society and what that implies for the future of our species and of life on Earth. At any rate, I think it's well worth watching despite its length!

Here, Richard Dawkins is interviewing Craig Venter while getting a tour of his sequencing facilities - an impressive example of "industrial biology" driving the cutting edge of science. Even in the first 15 minutes or so, you get a feel for how heavily these sequencing technologies rely on contributions from many scientific fields, including mathematics, statistics, computer science, chemistry and engineering.

If you pay attention, you can also glean some other interesting tidbits of information. For example, according to the interview, individual humans seem to differ not by the tenths of a percent we've been told, but probably more like 1% or 2%. Humans and chimps? Well, we differ by more than the 1.2% so often quoted on such public forums as YouTube -- apparently the difference is "more like 5-6%" (not to mention our differing number of chromosomes, etc.).

But, getting back to the original question: How do you sequence a genome?

To get a feel for how this sort of sequencing works, let's first consider the 23 chromosome pairs in a single human cell, which together hold a little over 6 billion DNA base pairs. We could imagine lining them up end-to-end as one single sequence - for example, a book with 46 chapters, each corresponding to an individual chromosome.

Next, imagine taking millions and millions of random snapshots of that DNA, each snapshot capturing only a very short sequence (e.g. a few hundred base pairs, or maybe a page or two if we stick to our "genome-as-a-book" analogy). Taken together, these tiny "snapshot" sequences (actually called "reads") cover the whole genome, with some of those reads overlapping one another. After taking enough snapshots, we know these reads contain all the information in the original genome sequence, but how can we put it all back together?
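
For the programmers out there, here is a rough sketch of the "random snapshots" idea in Python. It is a toy, not how a real sequencing machine works: the genome is a made-up random string of 1,000 letters, the reads are error-free and all the same length, and the names `sample_reads` and `average_coverage` are just ones I picked for illustration.

```python
import random

def sample_reads(genome, read_length, num_reads, seed=0):
    """Take random fixed-length 'snapshots' (reads) from a genome string."""
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    reads = []
    for _ in range(num_reads):
        # randint is inclusive on both ends, so every start position is valid
        start = rng.randint(0, last_start)
        reads.append(genome[start:start + read_length])
    return reads

def average_coverage(genome_length, read_length, num_reads):
    """Average number of reads covering each position of the genome."""
    return num_reads * read_length / genome_length

# Toy stand-in for a genome (assumption: random letters, no biology here).
rng = random.Random(1)
toy_genome = "".join(rng.choice("ACGT") for _ in range(1000))

reads = sample_reads(toy_genome, read_length=50, num_reads=200)
print(len(reads), "reads, average coverage:",
      average_coverage(len(toy_genome), 50, 200))
```

With 200 reads of 50 letters each over a 1,000-letter genome, every position is covered about ten times on average, which is why many of the reads end up overlapping their neighbors.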

The answer hinges on the fact that these reads overlap in many places. By using various mathematical and computational methods, the overlapping pieces can be matched up, eventually yielding the full genome sequence.
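
To make the overlap idea concrete, here is a minimal sketch of one simple approach, a greedy merge: repeatedly find the two reads with the longest suffix-to-prefix overlap and glue them together. I am not claiming this is the method used in the facility shown in the video. It assumes error-free reads, ignores the fact that DNA has two strands, and gets confused by repeated sequences; the function names and the `min_length` parameter are my own.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_length.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_length - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_length=3):
    """Merge reads greedily by longest overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that sit entirely inside another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_length)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; stop with several pieces
        merged = reads[best_i] + reads[best_j][best_k:]
        # Remove the two merged reads and anything the merged piece now contains.
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j) and r not in merged]
        reads.append(merged)
    return reads

# Three overlapping reads from the toy sequence ATGGCGTACCTGA, out of order.
print(greedy_assemble(["TACCTGA", "ATGGCGT", "GCGTACC"]))
```

On this tiny example the three reads merge back into the single original sequence. Real genomes are full of repeats and real reads contain errors, so production assemblers rely on more careful graph-based methods, but the core intuition of matching up overlaps is the same.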

You can find more details here and here.

