Know Your Enemy: Determining the Genetic Blueprint of Disease-Causing Microorganisms
Advances in molecular biology have led to remarkably fast and accurate methods for analyzing the genetic instructions of disease-causing organisms. A process called "sequencing" reveals the lineup of paired chemical bases that make up the pathogen's genome, the total complement of its DNA. The order of these bases provides the instructions for making each organism a unique life form.
Although less well known than the $3 billion project to decode the entire human genome, efforts to sequence the complete genomes of pathogenic microbes have been under way in research centers, academic institutions, and private companies. These efforts are enabling scientists to locate genes, to identify potential drug targets, and to understand mutations that contribute to drug resistance. In addition, by comparing the genomes of variant strains, researchers are able to note differences that may explain the relative ability of different strains to cause disease or to evade the immune system.
When scientists identify microbial genes that play a role in disease, drugs can be designed to block the activities controlled by those genes. Because most genes contain the instructions for making proteins, drugs can be designed to inhibit specific proteins. Beyond drug design, genome analysis can be used to identify microbial proteins that are candidates for vaccine testing.
Genetic variations detected in different strains of the same pathogen can be used to study the population dynamics of these strains, such as the spread of a virulent or drug-resistant form of a pathogen in a susceptible population. Finally, understanding the genetic basis for both virulence and drug resistance may help predict disease progression and influence the type and extent of patient care and treatment.
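As a rough illustration of how differences between strains can be listed, the sketch below compares two short sequences position by position. The sequences are made up for the example, the function name is hypothetical, and the code assumes the two sequences have already been aligned to the same length; real strain comparisons rely on dedicated alignment software.

```python
def point_differences(strain_a, strain_b):
    """Return (position, base_in_a, base_in_b) for every mismatch.

    Assumes both sequences are already aligned and equally long.
    Positions are 1-based, as is usual when describing sequences.
    """
    if len(strain_a) != len(strain_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(strain_a, strain_b), start=1)
        if a != b
    ]


# Two invented sequences standing in for a reference and a variant strain.
reference = "ATGGCTAAGCTT"
variant = "ATGGCTGAGCTA"

differences = point_differences(reference, variant)
for position, ref_base, var_base in differences:
    print(f"position {position}: {ref_base} -> {var_base}")
```

Each reported position is a candidate difference that could then be examined for a possible role in virulence or drug resistance.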
Because microbial genomes are small, they can be sequenced relatively quickly. A bacterium's genome typically comprises 1 to 4 million base pairs of DNA, that is, pairs of the chemical structures represented by the letters A, C, T, and G, and most of this DNA encodes genes. At one-half to one-hundredth the size of the smallest bacteria, viruses have even smaller genomes. In contrast, the human genome contains 3 billion base pairs of DNA, approximately 95 percent of which do not code for any genes. The Human Genome Project began in 1990, and a first draft of the genome sequence was completed in 2000.
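The size comparison can be worked through as simple arithmetic. The sketch below uses only the approximate figures quoted in this fact sheet; the variable names are illustrative.

```python
# Approximate figures quoted in the text.
HUMAN_GENOME_BP = 3_000_000_000
HUMAN_NONCODING_PERCENT = 95
BACTERIAL_GENOME_BP = (1_000_000, 4_000_000)  # typical lower and upper sizes

# Roughly 5 percent of the human genome codes for genes.
human_coding_bp = HUMAN_GENOME_BP * (100 - HUMAN_NONCODING_PERCENT) // 100

# How many times larger the human genome is than a typical bacterial genome.
size_ratios = tuple(HUMAN_GENOME_BP // bp for bp in BACTERIAL_GENOME_BP)

print(f"human coding DNA: about {human_coding_bp:,} base pairs")
print(f"human genome is about {size_ratios[1]:,} to {size_ratios[0]:,} "
      "times the size of a typical bacterial genome")
```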
When the first DNA sequencing methods were developed in the mid-1970s, an individual scientist could sequence only a few hundred DNA base pairs per year. Today, teams of scientists at giant sequencing centers depend on computers, automation, robotics, and other advanced technologies to sequence and assemble millions of bases of DNA annually. For organisms with larger genomes, sequencing may require collaboration and coordination among several such centers.
The speed with which the first microbial genome, that of Haemophilus influenzae, was sequenced stunned scientists when the project was completed in July 1995. H. influenzae is a bacterium that is a common cause of upper respiratory infections, particularly in children. Applying newly developed techniques, investigators used a shotgun approach to sequence thousands of fragments of the bacterium's genome. Special computer programs read these sequences and stitched them together by comparing overlapping sequences. The result was one complete circle of DNA containing all of the genetic information of this bacterium.
Encouraged by this success, NIAID has funded projects to sequence the full genomes of many medically important microbes, including the bacteria that cause tuberculosis, gonorrhea, chlamydia, and cholera (see Table). Many of these microbes have been completely sequenced and are now being annotated and analyzed. During annotation, the position and function of each gene are predicted using sophisticated computer methods. This information serves as the basis for experimental studies that will help identify the important features of the genome that determine the biology of the microbe and its ability to cause disease.
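One very simple ingredient of gene prediction is a scan for open reading frames (ORFs): stretches that begin with the start codon ATG and continue to the first stop codon in the same reading frame. The sketch below is a toy version offered under stated assumptions. It examines only the strand it is given, the function name and `min_codons` parameter are hypothetical, and it is not the method used by any particular annotation project, which would weigh many additional signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end) pairs for ORFs on the given strand.

    Coordinates are 0-based and half-open, so sequence[start:end] is the ORF
    including its start and stop codons. `min_codons` counts both of those.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):  # three possible reading frames on one strand
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and keep scanning this frame
    return sorted(orfs)


# Invented toy sequence containing one short ORF.
toy_sequence = "CCATGAAATTTTAGGC"
toy_orfs = find_orfs(toy_sequence)
for start, end in toy_orfs:
    print(start, end, toy_sequence[start:end])
```

Predicted ORFs such as these are only candidates; assigning a likely function to each one requires comparison with known genes and, ultimately, experimental study.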
NIAID grantees deposit sequence data, as it is acquired, in specialized and public databases such as GenBank, run by the National Center for Biotechnology Information, where it can be accessed by anyone through the Internet. Access to the sequence data before its publication in peer-reviewed journals enables the broader research community to jump-start and accelerate their experimental studies. NIAID is working with the World Health Organization, the United Kingdom's Wellcome Trust, and others to identify ways the research community and funding agencies can coordinate efforts and capitalize on the data accrued by these sequencing projects.
Sequencing Basics
DNA consists of two entwined, helical strings of chemical units represented by the letters A, C, T, and G. Depending on the genome's size, investigators generally take one of two approaches to determining a pathogen's genes.
For relatively small genomes, every base pair of DNA can be sequenced. In this approach, the entire genome of an organism is cut into fragments, and the fragments are inserted into DNA carriers for easier handling. A "clone" refers to a single carrier that contains an inserted DNA fragment, and the collection of clones carrying different DNA fragments is referred to as a "library." The type of carrier used depends in part on the size of the genomic fragments used in each project. Overlapping sequences contained in the different clones enable investigators, using computer software, to assemble, or stitch together, the smaller sequence fragments into one complete DNA molecule.
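The idea of stitching fragments together through their overlaps can be sketched in a few lines. The example below is a toy greedy strategy, assumed here purely for illustration: it repeatedly merges the two fragments that share the longest overlap. The fragments are invented, the function names and the `min_len` parameter are hypothetical, and real assembly software must also cope with sequencing errors, repeated regions, both DNA strands, and fragments contained entirely inside others.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until no overlaps of min_len remain."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Invented overlapping fragments, listed out of order.
fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
assembled = greedy_assemble(fragments)
print(assembled)
```

If the fragments do not overlap enough to be joined, the function returns several separate pieces rather than one molecule, which mirrors the gaps that remain when parts of a genome are not represented in the library.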
A complement to whole-genome sequencing, especially for relatively large genomes, is the ability to analyze only the parts of the genome that contain protein-coding genes. In cells, DNA is activated and copied into intermediate molecules called RNA. The base sequence of this expressed RNA directs the synthesis of proteins. In a method called expressed sequence tagging (EST), researchers isolate these RNA templates from cells and convert them back into a DNA form (complementary DNA, or cDNA). These cDNA segments can then be sequenced and analyzed in an effort to determine the function of the proteins that are expressed by the organism.
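The relationship between a gene, its RNA copy, and the cDNA made from that RNA can be shown with simple string operations. This is a minimal sketch with an invented nine-base "gene" and hypothetical function names; it ignores RNA processing and every laboratory step, and only illustrates the base-pairing rules involved.

```python
# Base-pairing rules written as translation tables.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_strand):
    """RNA carries the coding strand's sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def reverse_transcribe(rna):
    """First-strand cDNA, written 5' to 3': the reverse complement of the RNA."""
    return rna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand(cdna):
    """Reverse complement of a DNA strand, also written 5' to 3'."""
    return cdna.upper().translate(DNA_COMPLEMENT)[::-1]


gene = "ATGGCCTAA"  # invented coding sequence
rna = transcribe(gene)
cdna = reverse_transcribe(rna)
recovered = second_strand(cdna)

print(rna)        # RNA copy of the gene
print(cdna)       # complementary DNA made from the RNA
print(recovered)  # second strand, matching the original coding sequence
```

Because the second cDNA strand reproduces the coding sequence, sequencing cDNA reports the expressed portion of a gene without the need to sequence the surrounding genome.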
NIAID is a component of the National Institutes of Health (NIH). NIAID supports basic and applied research to prevent, diagnose, and treat infectious and immune-mediated illnesses, including HIV/AIDS and other sexually transmitted diseases, tuberculosis, malaria, autoimmune disorders, asthma, and allergies.
Prepared by:
Office of Communications and Public Liaison
National Institute of Allergy and Infectious Diseases
National Institutes of Health
Bethesda, MD 20892
U.S. Department of Health and Human Services
March 2001
Last Updated 04.19.01 (dlb)