The Human Genome Project
The century that opened with rediscoveries of Gregor Mendel's studies on patterns of inheritance in peas closed with a research project in molecular biology heralded as the initial and necessary step for attaining a complete understanding of the hereditary nature of humankind. Both basic science and technological feat, the Human Genome Project (HGP) brought to biology a “big science” model previously confined to physics. Although originating and centered in the U.S., laboratories across the globe contributed to the mapping and sequencing of the haploid human genome's 22 autosomes and 2 sex chromosomes.
The official date of completion was timed to coincide with celebrations of the 50th anniversary of James D. Watson and Francis Crick's discovery of the double-helical structure of DNA. On 14 April 2003, heads of government of the six countries that contributed to the sequencing efforts (the U.S., the U.K., Japan, France, Germany, and China) issued a joint proclamation that the “essential sequence of three billion base pairs of DNA of the Human Genome, the molecular instruction book of human life,” had been achieved (Dept. of Trade 2003). HGP researchers compared their feat to the Apollo moon landing and splitting the atom, foreseeing the dawn of a new era, “the era of the genome” (NHGRI 2003). What does the “era of the genome” promise?
Bruce Alberts, president of the National Academy of Sciences, characterized the completed human genome sequence as a “tremendous foundation on which to build the science and medicine of the 21st century” (NHGRI 2003). The statement released by the six world leaders in April 2003 expressed the hope that this progress in science and medicine would establish “a healthier future for all the peoples of the globe” (Dept. of Trade 2003).
Philosophical interest in the HGP centers on claims and hopes of this sort and raises a number of questions: How can DNA sequence information provide foundations for scientific and medical knowledge? Who will have access to the potential benefits arising from this research, and will such benefits be justly distributed? What possible harms lie ahead?
This article provides a brief history of the HGP and discusses a range of associated issues that gained the attention of philosophers during the project's planning stages and as it unfolded. Prominent among philosophical concerns are the conceptual foundations of the project and its ethical implications.
1. Brief History of the Human Genome Project
HGP at the start
The HGP began officially in October 1990, but its origins go back earlier. In the mid-1980s, three scientists independently came up with the idea of sequencing the entire human genome: Robert Sinsheimer, then chancellor of the University of California, Santa Cruz, as a way to spend $30 million that had been donated to his institution to build a telescope, after that project fell through; Salk Institute researcher Renato Dulbecco as a way to understand the genetic origins of cancer and other diseases; and the Department of Energy's (DOE's) Charles DeLisi as a way to detect radiation-induced mutations, an interest of that agency since the atomic bombings of Hiroshima and Nagasaki. Such a project had become technically feasible due to advances made during the previous decade or two: in the early 1970s, recombinant DNA technologies (use of restriction enzymes to splice DNA, reverse transcriptase to make DNA from RNA, viral vectors to carry bits of DNA into cells, bacterial cloning to multiply quantities of DNA); in the late 1970s, DNA sequencing and use of RFLP (restriction fragment length polymorphism) markers for gene mapping; and in the early to mid-1980s, DNA synthesis, pulsed-field gel electrophoresis, polymerase chain reaction (PCR), and automated DNA sequencing.
Sinsheimer's, Dulbecco's, and DeLisi's idea found supporters among a number of prominent molecular biologists and human geneticists—for example, Walter Bodmer, Walter Gilbert, Leroy Hood, Victor McKusick, and James D. Watson. However, many molecular biologists expressed misgivings. Especially through 1986 and 1987, there were concerns that sequencing was routine work, that much of what would be sequenced was “junk DNA,” that the expense and big science approach would drain resources from smaller and more worthy projects, and that knowledge of gene sequence was inadequate to yield knowledge of gene function.[1] In September 1986, committees were established to study the feasibility of a publicly-funded project to sequence the human genome: one by the National Research Council (NRC) on scientific merit, and one by the Office of Technology Assessment (OTA) as a matter of public policy. Both committees released reports in 1988. The OTA report, Mapping Our Genes: Genome Projects: How Big, How Fast? downplayed the concerns of scientist critics by emphasizing that there was not one but many genome projects, that these were not on the scale of the Manhattan or Apollo projects, that no agency was committed to massive sequencing, and that the study of other organisms was needed to understand human genes. The NRC report, Mapping and Sequencing the Human Genome, sought to accommodate the scientists’ concerns by formulating recommendations that genetic and physical mapping and the development of cheaper, more efficient sequencing technologies precede large-scale sequencing, and that funding be provided for the mapping and sequencing of nonhuman (“model”) organisms as well.
It was the DOE that made the first push toward a “Big Science” genome project: DeLisi advanced a five-year plan in 1986, $4.5 million was allocated from the 1987 budget, and, recognizing the boost the endeavor would provide to national weapons laboratories, Senator Pete Domenici from New Mexico introduced a bill in Congress. The DOE undertaking produced consternation among biomedical researchers who were traditionally supported by the NIH's intramural and extramural programs—for example, MIT's David Botstein referred to the initiative as “DOE's program for unemployed bomb-makers” (in Cook-Deegan 1994, p. 98). James Wyngaarden, head of the NIH, was persuaded to lend his agency's support to the project in 1987. Funding was in place in time for fiscal year (FY) 1988, with Congress awarding the DOE $10.7 million and the NIH $17.2 million.[2] The DOE and NIH coordinated their efforts with a Memorandum of Understanding in 1988 that agreed on an official launch of the HGP on October 1, 1990 and an expected date of completion of 2005. The total cost estimated by the NRC report was $3 billion.
The project's specific goals at the outset were: (i) to identify all genes of the human genome (initially estimated to be 100,000); (ii) to sequence the approximately 3 billion nucleotides of the human genome; (iii) to develop databases to store this information; (iv) to develop tools for data analysis; (v) to address ethical, legal, and social issues; and (vi) to sequence a number of “model organisms,” including the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, the roundworm Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the mouse Mus musculus. The DOE established three genome centers in 1988–89 at Lawrence Berkeley, Lawrence Livermore, and Los Alamos National Laboratories; as Associate Director of the DOE Office of Health and Environmental Research (OHER), David Galas oversaw the DOE's genome project from April 1990 until he left for the private sector in 1993. The NIH instituted a university grant-based program for human genome research and placed Watson, co-discoverer of the structure of DNA and director of Cold Spring Harbor Laboratory, in charge in 1988. In October 1989, the Department of Health and Human Services established the National Center for Human Genome Research (NCHGR) at the NIH with Watson at the helm. During 1990 and 1991, Watson expanded the grants-based program to fund seven genome centers for five-year periods to work on large-scale mapping projects: Washington University, St. Louis; University of California, San Francisco; Massachusetts Institute of Technology; University of Michigan; University of Utah; Baylor College of Medicine; and Children's Hospital of Philadelphia.
As the HGP got underway, a number of philosophers weighed in on its scientific merit—in terms of cost, potential impact on other areas of research, ability to lead to medical cures, and the usefulness of sequence data (Kitcher 1995; Rosenberg 1995; Tauber and Sarkar 1992; Vicedo 1992). However, of particular interest to philosophers is goal (v) concerning ethical, legal, and social issues. At an October 1988 news conference called to announce his appointment, Watson, in an apparently off-the-cuff response to a reporter who asked about the social implications of the project, promised that a portion of the funding would be set aside to study such issues (Marshall 1996c). The result was the NIH/DOE Joint Working Group on Ethical, Legal, and Social Implications (ELSI) of Human Genome Research, chaired by Nancy Wexler, which began to meet in September 1989.[3] The Joint Working Group identified four areas of high priority: “quality and access in the use of genetic tests; fair use of genetic information by employers and insurers; privacy and confidentiality of genetic information; and public and professional education” (Wexler in Cooper 1994, p. 321). The NIH and DOE each established ELSI programs: philosopher Eric T. Juengst served as the first director of the NIH-NCHGR ELSI program from 1990 to 1994. ELSI was funded initially to the tune of three percent of the HGP budget for both agencies; this was increased to four and later five percent at the NIH.
Map first, sequence later
As the NRC report had recommended, priority at the outset of the project was given to mapping rather than sequencing the human genome. HGP scientists sought to construct two kinds of maps. Genetic maps order polymorphic markers linearly on chromosomes; the aim is to have these markers densely enough situated that linkage relations can be used to locate chromosomal regions containing genes of interest to researchers. Physical maps order collections (or “libraries”) of cloned DNA fragments that cover an organism's genome; these fragments can then be replicated in quantity for sequencing. The joint NIH-DOE five-year plan released in 1990 set specific benchmarks: a resolution of 2 to 5 centimorgans (cM) for genetic linkage maps and physical maps with sequence-tagged site (STS) markers (unique DNA sequences 100–200 base pairs long) spaced approximately 100 kilobases (kb) apart and 2-megabase (Mb) contiguous overlapping clones (“contigs”) assembled for large sections of the genome. Sequencing needed to be made more efficient and less costly: aims were to reduce sequencing costs to $.50 per base and to complete 10 million bases of contiguous DNA (0.3 percent of the human genome) but otherwise to focus efforts on the smaller genomes of less complex model organisms (Watson 1990). HGP goals were facilitated by a number of technological developments during this initial period. For physical mapping, yeast artificial chromosomes (YACs) introduced in 1987 (Burke et al. 1987) permitted much larger segments of DNA to be ordered and stored for sequencing than was possible with plasmid or cosmid libraries. A new class of genetic markers, microsatellite repeats, was identified in 1989 (Litt and Luty 1989; Tautz 1989; Weber and May 1989); because these sets of tandem repeats of short (either dinucleotide, trinucleotide, or tetranucleotide) DNA sequences are more highly polymorphic and detectable by PCR, microsatellites quickly replaced RFLPs as markers of choice for genetic linkage mapping and furnished the STS markers which facilitated the integration of genetic and physical maps. Another technological achievement—the combined use of reverse transcription, PCR, and automated sequencing to map expressed genes—led to administrative changes at the NIH when, in April 1992, Watson resigned from his position as director of the NCHGR following a conflict with NIH director Bernadine Healy over gene patenting. In 1991, while working at the NIH, J. Craig Venter sequenced small portions of cDNAs from existing libraries to provide identifying expressed sequence tags (ESTs) of 200–300 bases which he then compared to already identified genes from various species found in existing databases (Adams et al. 1991).[4] Watson disagreed with Healy's decision to approve patent applications for the ESTs despite lack of knowledge of their function.[5] Soon after Watson's departure, Venter left NIH for the private sector.[6]
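The centimorgan benchmarks that figure in these mapping goals have a concrete interpretation in terms of recombination frequencies. The Python sketch below is a toy illustration only, using the textbook Haldane mapping function and invented recombination fractions (neither is something the HGP plans themselves specify), to show how observed recombination between linked markers translates into additive genetic-map distance:

```python
# Toy illustration of genetic-map distance (not an HGP pipeline): the Haldane
# mapping function converts an observed recombination fraction r between two
# linked markers into an additive map distance in centimorgans (cM).
import math

def haldane_cM(r):
    """Map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    return -50.0 * math.log(1.0 - 2.0 * r)

# Hypothetical recombination fractions between neighboring markers.
for r in (0.01, 0.02, 0.05):
    print(f"r = {r:.2f}  ->  {haldane_cM(r):.1f} cM")

# Small fractions give distances close to 100*r, which is why markers spaced
# every 2-5 cM correspond to recombination in roughly 2-5 percent of meioses.
```

Physical-map spacing, by contrast, is measured directly in base pairs (the 100 kb STS target above); the correspondence between the two scales is only rough, on the order of one centimorgan per megabase on average in humans, and varies considerably across the genome with local recombination rates.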
Francis Collins, an MD-PhD whose lab at the University of Michigan co-discovered genes associated with cystic fibrosis and neurofibromatosis and contributed to efforts to isolate the gene for Huntington's disease, was appointed by Healy as Watson's replacement, and he began at the NCHGR in April 1993. Collins established an intramural research program at the NCHGR to complement the extramural program of grants for university-based research which already existed; ELSI remained a grant-funded program. The original NIH-DOE five-year plan was updated in 1993. The new five-year plan, in effect through 1998, accommodated progress that had been made in mapping, sequencing, and technological development (Collins and Galas 1993). The goal of a 2–5 cM genetic map was expected to be met by the 1995 target date. The deadline for a physical map with STS markers at intervals of 100 kb was extended to 1998; a map with intervals averaging 300 kb was expected by 1995 or 1996. Although the goal of $.50 per base cost of sequencing was projected to be met by 1996, it was recognized that this would be insufficient to meet the 2005 target date. The updated goal was to build up to a collective sequencing capacity of 50 Mb per year and to have 80 Mb of DNA (from both human and model organism genomes) sequenced by the end of 1998. This would be achieved by increasing the number of groups working on large-scale sequencing and heightening efforts to develop new sequencing technologies. Accordingly, in November 1995, the U.K.'s Wellcome Trust launched a $75 million, seven-year concentrated sequencing effort at the Sanger Centre near Cambridge, and in April 1996, the NCHGR awarded grants totaling $20 million per year for six centers (Houston's Baylor College of Medicine, Stanford University, The Institute for Genomic Research [TIGR], University of Washington-Seattle, Washington University School of Medicine in St. Louis, and Whitehead Institute for Biomedical Research—MIT Genome Center) to pilot high-volume sequencing approaches (Marshall 1996a).
Although the HGP's inception was in the U.S., it had not taken long for mapping and sequencing the human genome to become an international venture (see Cook-Deegan 1994). France began to fund genome research in 1988 and had developed a more centralized, although not very well-funded, program by 1990. More significant were the contributions of the Centre d’Étude du Polymorphisme Humain (CEPH) and Généthon. CEPH, founded in 1983 by Jean Dausset, maintained a collection of DNA donated by intergenerational families to help in the study of hereditary disease; Jean Weissenbach led an international effort to construct a complete genetic map of the human genome using the CEPH collection; later, with funding from the French muscular dystrophy association (AFM), director Daniel Cohen set out to construct a YAC clone library for physical mapping and oversaw the launching of Généthon in 1991 as an industrial-sized mapping and sequencing operation funded by the AFM. The U.K.'s genome project received its official start in 1989, although Sydney Brenner had commenced genome research at the Medical Research Council (MRC) laboratory several years before this. MRC funding was supplemented with private monies from the Imperial Cancer Research Fund, and later, the Wellcome Trust. The Sanger Centre, led by John Sulston and funded by Wellcome and the MRC, opened in October 1993. A combined four-year, 15-million-ECU genome program by the European Community (E.C.) commenced in 1990. Germany, its citizens all too aware of abuses in the name of genetics, lagged behind other European countries: although individual researchers received government funds for genome research in the late 1980s and participated in the E.C. initiative, no actual national genome project was undertaken until 1995 (Kahn 1996). Japan, ahead of the U.S. in having funded the development of automated sequencing technologies since the early 1980s, was the major genome player outside the U.S. and Europe, with several government agencies beginning small-scale genome projects in the late 1980s and early 1990s, but it was a frequent target of U.S. criticism for the size of its investment relative to GNP.[7] China was the latecomer on the international scene: with 250 million yuan ($30 million) over three years from government and industry, the Chinese National Human Genome Center, with branches in Beijing and Shanghai, opened in July 1998 and was followed in 1999 by the Beijing Genomics Institute.[8]
As 1998, the last year of the revised five-year plan and midpoint of the project's projected 15-year span, approached, many mapping goals had been met. In 1994, Généthon completed a genetic map with more than 2000 microsatellite markers at an average spacing of 2.9 cM and only one gap larger than 20 cM (Gyapay et al. 1994), though the genetic mapping phase of the project did not finally come to a close until March 1996 with publication of comprehensive genetic maps of the mouse and human genomes in Nature: the mouse map produced by scientists at the Whitehead-MIT Center for Genome Research contained 7,377 genetic markers (both microsatellites and RFLPs) with an average spacing of 0.2 cM (Dietrich et al. 1996); the human map produced by scientists at Généthon contained 5,264 microsatellite markers located to 2335 positions with an average spacing of 1.6 cM (Dib et al. 1996). Physical mapping was on track: in 1995, a physical map with 94 percent coverage of the genome and STS markers at average intervals of 199 kb was published (T. Hudson et al. 1995), as was CEPH's updated physical map of 225 YAC contigs covering 75 percent of the genome (Chumakov et al. 1995); however, bacterial artificial chromosomes (BACs), developed in DOE-funded research at Caltech in 1992 (Shizuya et al. 1992), soon replaced YACs because of their greater stability in propagating DNA for sequencing. Sequencing itself presented more of a challenge. The genomes of the smallest model organisms had been sequenced. In April 1996, an international consortium of mostly European laboratories published the sequence of S. cerevisiae, the first eukaryote to be completed, with 12 million base pairs and 5,885 genes, at a cost of $40 million (Goffeau et al. 1996). In January 1997, University of Wisconsin researchers completed the sequence of E. coli with 4,638,858 base pairs and 4,286 genes (Blattner et al. 1997). However, despite ramped-up sequencing efforts over the preceding several years at the Sanger Centre and the NHGRI-funded centers (the NCHGR had been elevated to the status of a research institute in 1997 and renamed the National Human Genome Research Institute), only three percent of the human genome had been sequenced, sequencing costs hovered at $.40 per base, the centers had yet to achieve the desired high output, and about $1.8 billion had been spent; doubts existed about whether the target date of 2005 could be met.
Suddenly, the HGP found itself challenged by sequencing plans from the private sector. In May 1998, TIGR's Venter announced he would partner with Michael Hunkapiller's company Applied Biosystems (ABI), a division of Perkin-Elmer Corporation which manufactured sequencing machines, to form a new company which would sequence the entire genome in three short years and for a fraction of the cost. The foreseen profits rested in the construction of a “definitive” database that would outdo GenBank by integrating medical and other information with the basic sequence and polymorphisms. The company, based in Rockville, MD and later named Celera Genomics, planned to use “whole-genome shotgun” (WGS) sequencing, an approach different from the HGP's. The HGP confined the shotgun method to cloned fragments already mapped to specific chromosomal regions: these are broken into smaller bits and amplified in bacterial clones, sequences are generated at random by automated machines, and computational resources are used to reassemble the sequence from the overlaps among the bits. Shotgunning is followed by painstaking “finishing” to fill in gaps, correct mistakes, and resolve ambiguities. What Celera proposed was to break the organism's entire genome into millions of pieces of DNA with high-frequency sound waves, sequence these pieces using hundreds of ABI's new capillary model machines, and reassemble the sequences with one of the world's largest civilian supercomputers, without the assistance provided by the preliminary mapping of clones to chromosomes. When WGS sequencing was considered as a possibility by the HGP, it was rejected because of the risk that repeat sequences would yield mistakes in reassembly.[9] But Venter by this time had successfully used the technique to sequence the 1.83 million nucleotide bases of the bacterium Haemophilus influenzae—the first free-living organism to be completely sequenced—in a year's time (Fleischmann et al. 1995).[10]
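The reassembly step at the heart of any shotgun strategy, and the repeat problem that worried HGP planners, can be illustrated with a toy example. The Python sketch below is only a schematic illustration under invented data: the tiny "genome", the reads, and the greedy longest-overlap merging rule are all assumptions made for exposition, not the algorithms actually used by the HGP centers or Celera.

```python
# Toy greedy overlap assembler (illustration only; real assemblers used far
# more sophisticated statistics, quality scores, and paired-read information).

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i == j:
                    continue
                n = overlap(a, b)
                if n > best_n:
                    best_n, best_i, best_j = n, i, j
        if best_n == 0:                      # nothing overlaps any more
            return "".join(reads)
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads[0]

# Invented 22-base "genome" in which the six-base stretch ATTAGG occurs twice.
genome = "CCGATTAGGTTCAATTAGGACG"
reads = ["CCGATTAGG", "ATTAGGACG", "ATTAGGTTCA", "TTCAATTAGG"]

print(greedy_assemble(reads))  # prints a 26-base misassembly, not the genome:
                               # the first greedy merge joins the two copies
                               # of the repeat and garbles what lies between.
```

Because the six-base repeat occurs twice, two different merges look equally good to the assembler, and the greedy choice here jumps across the repeat and reconstructs the wrong sequence. This is the kind of ambiguity that the HGP's clone-by-clone mapping, and in Celera's case additional information such as paired reads of known spacing, was meant to resolve.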
Race to the genome
The race to sequence the genome was on. Over the next couple of years the publicly-funded scientists often downplayed the media image of a race, but they were certainly propelled by worries that funding would dry up before the sequence was complete, given the private sector's willingness to take over, and that the sequence data would become proprietary information—the Bermuda Accord, agreed to in February 1996 by the world's major sequencing laboratories, which at the time included Venter's TIGR, required the public release of sequence data every 24 hours. Wellcome more than doubled its funds to the Sanger Centre (to £205 million) and the Centre changed its goal from sequencing one-sixth of the genome to sequencing one-third, and possibly one-half (Dickson 1998). The NHGRI and DOE published a new five-year plan for 1998–2003 (Collins et al. 1998). The plan moved the final completion date up from 2005 to 2003 and aimed for a “working draft” of the human genome sequence to be completed by December 2001. This would be achieved by delaying the finishing process, no longer going clone-by-clone to shotgun, reassemble, and finish the sequence of one clone before proceeding to the next. A physical map of 41,664 STS markers was soon published (Deloukas et al. 1998), and so the physical mapping goal was met, but with only six percent of the human genome sequence completed, the plan called for new and improved sequencing technologies which could increase the sequencing capacity from 90 Mb per year at about $.50 per base to 500 Mb per year at no more than $.25 per base. Goals for completing the sequencing of the remaining model organisms were also set: December 1998 for C. elegans, which was 80 percent complete; 2002 for D. melanogaster, which was nine percent complete; and 2005 for M. musculus, which was still at the physical mapping stage.
An interim victory for the publicly-funded project followed when, on schedule, the first animal sequence, that of C. elegans with 97 million bases and 19,099 genes, was published in Science in December 1998 (C. elegans Sequencing Consortium 1998). This was the product of a 10-year collaboration between scientists at Washington University (headed by Bob Waterston) and the Sanger Centre (headed by John Sulston), carried out at a semi-industrial scale with more than 200 people employed in each lab working around the clock. In March 1999, the main players—the NHGRI, Sanger Centre, and DOE—advanced the date of completion of the “working draft”: five-fold coverage of at least 90 percent of the genome was to be completed by the following spring (Pennisi 1999; Wadman 1999). This change reflected improved output of the new model of automated sequencing machines, diminished sequencing costs at $.20 to $.30 per base, and the desire to speed up the release of medically relevant data. NHGRI would take responsibility for 60 percent of the sequence, concentrating these efforts at only three centers with Baylor, Washington University, and Whitehead-MIT sharing $81.6 million over the ensuing 10 months; 33 percent of the sequence would be the responsibility of the Sanger Centre whose funds from Wellcome increased from $57 million to $77 million for the year; and the remaining sequence would be supplied by the DOE's Joint Genome Institute (JGI) in Walnut Creek, CA into which its three centers had merged in January 1997. The smaller international centers involved in sequencing were not consulted on this restructuring, but were later brought on board on the condition that they could keep up with the pace. The first chromosomes to be completed (and this was to finished, not working draft, standards) were the two smallest: the sequence for chromosome 22 was published by scientists at the Sanger Centre and partners at University of Oklahoma, Washington University in St. Louis and Keio University in Japan in December 1999 (Dunham et al. 1999); the sequence for chromosome 21 was published by an international consortium of mostly Japanese and German labs—half at RIKEN—in May 2000 (Hattori et al. 2000). The remaining chromosomes lagged behind, though the DOE announced completion of working drafts of chromosomes 5, 16, and 19 with three-fold coverage in April 2000. The progress made by the publicly-funded project could be monitored because sequence data were released at 24-hour intervals, but Celera's progress was more difficult to assess. HGP scientist Maynard Olson charged that Celera was doing “science by press conference” (in Davies 2002, p. 153). Certainly, Celera's press conferences gave the impression it was ahead in the race: on 10 January 2000 the company announced completion of 90 percent of the human genome sequence, and on 6 April 2000 the company announced completion of three-fold coverage of the DNA of one male donor. But there was also evidence that Celera did remain a threat: the validity of the WGS sequencing approach was demonstrated in March 2000 when Celera and the (publicly-funded) Berkeley Drosophila Genome Project published the sequence of D. melanogaster of about 180 Mb (Adams et al. 2000).
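The coverage figures that recur in these plans ("five-fold coverage of at least 90 percent of the genome", the DOE's "three-fold coverage" drafts) have a rough statistical interpretation. As a back-of-the-envelope illustration only (this is the standard Lander-Waterman idealization of randomly placed reads, not a reconstruction of the centers' own planning calculations), the expected fraction of a genome covered at least once by c-fold sequence redundancy is about 1 - exp(-c):

```python
# Back-of-the-envelope Lander-Waterman estimate: if reads land on the genome
# uniformly at random, c-fold redundancy leaves a fraction exp(-c) uncovered.
import math

for c in (1, 3, 5, 8):
    covered = 1 - math.exp(-c)
    print(f"{c}-fold redundancy -> about {covered:.1%} of the genome covered")

# Roughly 95% at 3-fold and 99.3% at 5-fold; actual drafts fell short of these
# idealized figures because reads are not placed uniformly at random and
# repeats, cloning biases, and unclonable regions complicate assembly.
```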
In June 2000, the contest ended in what appeared to be a tie for the prize, but was more an arranged truce. On 26 June 2000, Collins, Venter, and the DOE's Ari Patrinos joined U.S. President Bill Clinton (and British Prime Minister Tony Blair by satellite link) at a White House press conference to announce that the human genome had been sequenced. That Collins and Venter even shared the limelight on that day was itself a tremendous feat. The negotiated draw at the finish line permitted HGP scientists to save face and their upstart competitor to minimize the risk of alienating university-based researchers and losing their business. The agreement between the parties included eventual simultaneous publication of their results. However, not only had results not yet been readied for publication, but neither of the two sequence maps was complete (Pennisi 2000). The HGP had not met its previous year's goal of a working draft covering 90 percent of the genome: Collins reported that ordered BACs existed for 97 percent of the genome and that BACs for 85 percent of the genome had been sequenced, with 24 percent of the genome sequence in finished form, 22 percent in near-finished form, and 38 percent in provisional form. Celera, whose researchers had access to the HGP data stored in public databases, was accepted as being further along: the company's press release that day announced completion of the “first assembly” of the human genome with 99 percent coverage. An editorial in Nature described the fanfare of 26 June as an “extravagant” example—one reaching “an all-out zenith or nadir, according to taste”—of scientists making public announcements not linked to peer-reviewed publication, here to bolster share prices (Celera) and for political effect (the HGP) given the “months to go before even a draft sequence will be scientifically useful” (Anonymous 2000, p. 981). The peer-reviewed publications came almost eight months later. Plans for joint publication in Science broke down when terms of agreement over data release could not be negotiated: Science's editors were willing to publish Celera's findings without Venter meeting the standard requirement that the sequence data be submitted to GenBank; Celera would instead make the data available on its own website. Press conferences in London and Washington, D.C. on 12 February preceded publications that week—by HGP scientists in Nature on 15 February 2001 and by Venter's team in Science on 16 February 2001. The HGP draft genome sequence, prepared based on map and sequence data available on 8 October 2000, covered about 94 percent of the genome, with about 25 percent in the finished form already attained for chromosomes 21 and 22. Indeed, the authors themselves described it as “an incomplete, intermediate product” which “contains many gaps and errors” (International Human Genome Sequencing Consortium 2001, p. 871). The results published by Celera, based on assemblies completed on 1 October 2000 using two different computational methods, had 84–90 percent of the genome covered by scaffolds at least 100 kb in length, with the composition of the scaffolds averaging 91–92 percent sequence and 8–9 percent gaps, leaving 93,857–105,264 gaps in total (Venter et al. 2001). In the end, Celera's published genome assembly made significant use of the HGP's publicly available map and sequence data, which left open the question whether WGS sequencing alone would have worked.[11]
Since the gaps in the sequence were unlikely to contain genes, and only genes as functional segments of DNA have potential commercial value, Celera was happy to move on and leave these gaps for the HGP scientists to fill in. Celera was faced with deciding what sort of company it would be: sequences from three different mouse strains were added to help attract subscribers to its database, and a brief foray was made into proteomics, but Venter resigned as CEO in January 2002 with the company's decision to focus on drug discovery rather than information (Davies 2002). Although the official date of completion of the HGP in April 2003 was timed to coincide with celebrations of the 50th anniversary of the Watson-Crick discovery of the double-helical structure of DNA, there was less fanfare surrounding it, even though completion came two years earlier than had been anticipated at the time of the official launch in October 1990 and several months earlier than called for in the most recent five-year plan. Americans had terrorism and war on their minds. In the end, sequencing—the third phase of the publicly-funded project—was carried out at 16 centers in six countries by divvying up sections of chromosomes among them. Eighty-five percent of the sequencing, however, was done at the five major sequencing centers (Baylor, Washington University, Whitehead-MIT, the Sanger Centre, and the DOE's JGI), with the Sanger Centre responsible for nearly one-third. The cost was lower than anticipated, with $2.7 billion spent by U.S. agencies and £150 million spent by the Wellcome Trust. The “finished” reference DNA sequence for Homo sapiens—all 3.1 billion nucleotide bases—is publicly accessible on the Internet (NCBI Human Genome Resources). If the As, Ts, Cs, and Gs of the genome sequence were printed in standard type, they would fill 75,490 pages of the New York Times (Wade 2003).[12]
In the project's early years, Norton Zinder, who chaired the NIH's Program Advisory Committee for the Human Genome, characterized it in this way: “This Project is creating an infrastructure for doing science; it's not the doing of the science per se. It will provide the biological community with the basic materials for doing research in human biology” (in Cooper 1994, p. 74). The published human genome reference sequences are part of that infrastructure, serving as tools for investigating human genetic variation. So far gene identification has been successful for the single genes of large effect implicated in rare Mendelian disorders. Difficulties arise in identifying the multiple genes of variable effect that interact with nongenetic factors in more common, complex conditions and in understanding the physiological processes associated with the development of these phenotypes. One approach to overcoming these difficulties focuses on relatively genetically homogeneous populations with members for whom extensive clinical data are available, as in the case of the Icelandic genome project, and basically extends the methods used for linkage mapping of diseases within families. Another approach is to conduct large-scale case-control association studies between phenotypes of interest and genetic markers. For both these approaches, single nucleotide polymorphisms (SNPs) are preferred as markers over the microsatellites used for genetic and physical mapping by the HGP. Worried that the private sector's efforts to patent SNPs would make them costly to use for research, the NHGRI and DOE included in their five-year plan for 1998–2003 the goal of mapping 100,000 SNPs by 2003 (Collins et al. 1998). The development of a public database of SNPs received a $138 million push from the International HapMap Project, a three-year public-private partnership completed in 2005 that mapped variation in four population groups. The NHGRI's involvement in the HapMap Project was part of the continuing leadership role in genome research it envisioned for itself upon completion of the HGP (Collins et al. 2003). Other projects include ENCODE, which began as a pilot project to study gene function by analyzing one percent of the genome and is now looking at the remaining 99 percent, and, more recently, clinENCODE, in which disease risk is being calculated for 400 people based on the corresponding one percent of the genome as a step toward personalized medicine. However, the infrastructure of mapping and sequencing technologies developed as part of the HGP—especially the ability to sequence entire genomes of organisms—has changed the way biology, not just human biology, is done. It is now recognized that genome structure by itself tells us only so much. In functional genomics, the interest is in how genomes—not just individual genes anymore—function. By studying the coordinated expression of the genome's various segments in different tissues at different times, researchers are coming to better understand organismal development. In comparative genomics, the study of genomic structure and function in different species is bringing about similar gains in understanding evolution. And genomics is now complemented by the field of proteomics, which studies the structure and function of all of an organism's proteins, called the proteome.
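To make the case-control association approach mentioned above a bit more concrete, the sketch below runs a simple allelic chi-square test for a single SNP. Everything in it is a made-up illustration (the allele counts, the two-group design, and the 5 percent threshold are assumptions for exposition); real genome-wide studies test hundreds of thousands of SNPs and must correct for multiple testing and population stratification.

```python
# A minimal allelic case-control association test for a single SNP, using
# hypothetical allele counts (not data from any real study).
# Rows: cases and controls; columns: counts of the two alleles (A, a).
cases    = (230, 170)   # A alleles, a alleles among 200 cases (2 per person)
controls = (180, 220)   # A alleles, a alleles among 200 controls

a, b = cases
c, d = controls
n = a + b + c + d

# Pearson chi-square statistic for a 2x2 table of allele counts.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(f"chi-square = {chi2:.2f}")   # 1 degree of freedom

# 3.84 is the conventional 5% critical value for 1 df; genome-wide studies
# require far stricter thresholds because so many markers are tested.
print("suggestive association at this marker" if chi2 > 3.84
      else "no evidence of association at this marker")
```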
2. Philosophy and the Human Genome Project
At the June 2000 White House press conference, President Clinton compared the feat of mapping and sequencing the human genome to the mapping of the Northwest Passage by early-nineteenth century explorers Meriwether Lewis and William Clark:
Nearly two centuries ago, in this room, on this floor, Thomas Jefferson and a trusted aide spread out a magnificent map, a map Jefferson had long prayed he would get to see in his lifetime. The aide was Meriwether Lewis and the map was the product of his courageous expedition across the American frontier, all the way to the Pacific. It was a map that defined the contours and forever expanded the frontiers of our continent and our imagination.
Today the world is joining us here in the East Room to behold a map of even greater significance. We are here to celebrate the completion of the first survey of the entire human genome. Without a doubt, this is the most important, most wondrous map ever produced by humankind.
Clinton continued on to say that he considered this “epoch-making triumph of science and reason” to be merely a starting point. Three “majestic horizons” lay immediately ahead: by 2003, production of a final version of the sequence map that would be complete and accurate; biotechnological development in the private sector based on the identification of all human genes and their functions; and ethical respect for “our oldest and most cherished human values” to ensure that genome science benefits “all citizens of the world,” protects privacy, and prevents discrimination (White House 2000).
Clinton's comparison of the human genome sequence map to Lewis and Clark's map of the Northwest Passage is perhaps less gratuitous than it might appear. Some members of the 1804–1806 expedition, the “Corps of Discovery,” sought to obtain natural scientific and anthropological knowledge over the course of their travels. The HGP shares the Enlightenment ideals of this period, especially the faith in scientific progress, the goal of systematic knowledge, and the confidence that universal benefits for humanity would ensue from the scientific pursuit of truth. Scientist Leroy Hood expressed the belief that “we will learn more about human development and pathology in the next twenty-five years than we have in the past two thousand” (1992, p. 163). He predicted that the HGP would facilitate movement from a reactive to a preventive mode of medicine which would “enable most individuals to live a normal, healthy, and intellectually alert life without disease” (p. 158). The Lewis and Clark journey was an important symbol for encouraging Americans to move westward (the frontier was declared gone by 1890); similarly, “getting” the genome was represented by HGP proponents as a “frontier” of knowledge that, like the moon landing, needed to be conquered. But most important are the colonialist and economic aims associated with this early nineteenth-century “voyage of discovery.” Jefferson sought to establish a U.S. presence beyond its borders, in lands long inhabited by peoples indigenous to the Americas and to which Spain had already staked its claim. He made clear to Lewis that the principal aim of the journey was commercial: “The Object of your mission is to explore the Missouri river & such principal stream of it as by it's course and communication with the waters of the Pacific ocean, whether the Columbia, Oregon, Colorado or any other river may offer the most direct & practicable water communication across this continent for the purpose of commerce” (Discovering Lewis & Clark). The infrastructure to be developed with the HGP was similarly presented as an opportunity to “secure the leadership of the United States in biotechnology and present U.S. industry with a wealth of opportunities” (Hood 1992, p. 163). Legislative changes were enacted in the 1980s to encourage the commercial development of federally funded research: universities and other nonprofit institutions were allowed to apply for patents on such research and tax incentives were provided to the private sector to encourage investment. Although Lewis and Clark depended extensively throughout their journey on the assistance of Indians and French traders they encountered, they regarded the lands they covered as “virgin territory” that awaited the arrival of “civilized men” to be named and claimed. Similar attitudes are implicated in controversies over the commercialization of genomics research and intellectual property and patent rights: organizations representing indigenous peoples charge that the patenting of human genes and cell lines is a continuation of the “bioprospecting” and “biopiracy” carried out by multinational corporations in securing patents on medicinal and food uses of plants which have long been a part of traditional knowledge (Shiva 1996).
In the early years of the HGP, the DOE's David Galas expressed skepticism that ELSI sorts of concerns were anything new: “there are no new problems. Issues concerning privacy, confidentiality, and discrimination will become much more pressing once the Genome Project generates the tools to diagnose diseases presymptomatically. The basic problems, however, are not new—they will simply be exacerbated” (in Cooper 1994, p. 167). Although legal scholar George Annas agreed there were no new problems, he argued that the combination and degree of problems involved did make the HGP unique: “there are probably no unique issues raised by the Human Genome Initiative. On the other hand, this project raises all of the issues in a much more focused manner (certainly a difference in degree if not in kind), and the fact that all of these issues are implicated in the project may itself make the project societally unique” (1990, p. 640). Many of the issues are of interest to philosophers: these include conceptual questions pertaining to scientific knowledge itself and the ethical ramifications of such knowledge and related technological developments. Philosophers of science, ethicists, political theorists, and philosophers working in other areas have benefited from ELSI-related funding. There is now a vast literature on human genome-related topics, and this entry can do no more than provide a synopsis regarding what questions have been asked, what range of responses has been offered, and what remains for philosophical attention and debate.
2.1 Conceptual Foundations of the Human Genome Project
Bets placed during the HGP over how many genes would be discovered, as well as surprise expressed when far fewer than the original estimate were found (about 25,000–30,000 rather than 100,000—the rice genome apparently has more genes!) (Normile and Pennisi 2002; Pennisi 2003), suggest that “gene”—a term introduced by Wilhelm Johannsen in 1909—names a well-defined concept. The report that because of alternative splicing each gene is responsible for three or four proteins makes the same assumption, as does drawing distinctions between normal and abnormal genes, or seeking to isolate disease genes. The assumption is not very well substantiated, however. Philosophers of biology recognize that the genes of classical genetics, molecular genetics, evolutionary genetics, and more recently developmental genetics do not necessarily map onto each other.[13] Difficulties arriving at a definitive gene concept arise even when we confine ourselves to contemporary molecular biology. Evelyn Fox Keller (2000) points out an irony which has ensued from the HGP's successes: even though gene-talk is more pervasive than ever in the popular and scientific presses, the concept of the gene, whether defined structurally or functionally, has been “radically undermined” (p. 5). Keller provides this description of current laboratory practices: “As we listen to the ways in which the term is now used by working biologists, we find that the gene has become many things—no longer a single entity but a word with great plasticity, defined only by the specific experimental context in which it is used” (p. 69).
Recent philosophical efforts to define genes have sought to capture these practices. C. Kenneth Waters (1994) recognizes that specific research contexts determine whether genes are considered to include introns as well as exons, or regulatory or promoter regions as well as open reading frames (ORFs), but argues that what remains “fundamental” across these contexts is the concept of a gene as a stretch of DNA the linear sequence of which provides a template for a gene product, whether mRNA transcript or polypeptide. Because of problems posed for Waters’ account by mRNA splicing and editing, Eva Neumann-Held (1999) recommends replacing the “classical molecular gene concept” of a stretch of DNA coding for a single polypeptide with a “molecular process gene concept” which includes not just the relevant stretches of DNA but the entire cellular context in which polypeptides are produced. Lenny Moss (2003) identifies two gene concepts: the preformationist gene, Gene-P, defined by its relationship to a phenotype, is of instrumental utility for molecular geneticists—for example, the BRCA1 gene is used to predict breast cancer risk; the epigenesist gene, Gene-D, defined by its molecular sequence, serves as a “developmental resource” in providing a template for RNA and protein synthesis but is indeterminate with respect to phenotype since this depends on other developmental resources and the cellular and extracellular contexts.[14] Paul Griffiths and Karola Stotz (2006) distinguish three gene concepts: “instrumental genes” remain important in molecular genetics when relationships between genotype and phenotype are under investigation; “nominal molecular genes” are specific DNA sequences annotated by researchers as genes for structural reasons such as presence of ORFs; “postgenomic molecular genes” are not defined by structure but “by the way DNA sequences are used in particular cellular and broader contexts” (p. 515).
Given this context-dependence in what genes are considered to be and do, it seems that pluralism has become the order of the day, for genes as for species. Along these lines, John Dupré (2004) advocates “an atheoretical pluralism” that abandons any pretence to a “theoretical core to the concept”: simply, “a gene is any bit of DNA that anyone has reason to name and keep track of” (pp. 332–333).[15] Keller (2000) agrees that the theoretical importance of genes has faded; she writes: “it seems evident that the primacy of the gene as the core explanatory concept of biological structure and function is more a feature of the twentieth century than it will be of the twenty-first” (p. 9). She forecasts the emergence of new language; this is a situation for which Philip Kitcher believed molecular biology was ripe even 15 years ago when he wrote: “it is hard to see what would be lost by dropping talk of genes from molecular biology and simply discussing the properties of various interesting regions of nucleic acid” (1992, p. 130). Keller believes that gene-talk has served a purpose though, providing a flexibility which permits communication across those specific experimental practices within which “gene” attains precision. Hans-Jörg Rheinberger (2000) takes this argument one step further: gene concepts are not merely useful in spite of their ambiguity, they are useful in virtue of their ambiguity because, as “tools of research, they must reach out into the realm of what we do not yet know” (p. 223). He reminds us that this is nothing new: “The spectacular rise of molecular biology has come about without a comprehensive, exact, and rigid definition of what a gene is” (p. 222). Keller's and Rheinberger's views present an evident challenge to philosophical intuitions that scientific practice is furthered by arriving at precise definitions of basic concepts.
Early in the debates surrounding plans for the HGP, questions arose concerning what it means to map and sequence the human genome—“get the genome,” as Watson (1992) put it. About these concerns, McKusick (1989) wrote: “The question often asked, especially by journalists, is ‘Whose genome will be sequenced?’ The answer is that it need not, and surely will not, be the genome of any one person. Keeping track of the origin of the DNA that is studied will be important, but the DNA can come from different persons chosen for study for particular parts of the genome” (p. 913). The HGP and Celera reference sequences are indeed composites based on chromosomal segments that originate from different individuals: the sequence in any given region of the genome belongs to a single individual, but sequences in different regions of the genome belong to different individuals. However, in both cases, the majority of the sequence originates from just one person. As HGP sequencing efforts accelerated, concerns arose that only four genomes, a couple of which belonged to known laboratory personnel, were being used for physical mapping and sequencing (Marshall 1996b). The decision was made to construct 10 new clone libraries for sequencing, with each library contributing about 10 percent of the total DNA. In the end, 74.3 percent of the total number of bases sequenced was derived from a single clone library—that of a male, presumably from the Buffalo area; seven other clone libraries contributed an additional 17.3 percent of the sequence (International Human Genome Sequencing Consortium 2001, p. 866). A similar proportion—close to 71 percent—of the Celera sequence belongs to just one male even though five ethnically diverse donors were selected; incredibly enough, rumors have been confirmed that this individual is Venter himself (McKie 2002).
The deeper question, of course, is how we might understand a single human genome sequence, a composite that belongs to no actual individual in its entirety and only a handful of individuals in its parts, to be representative of the entire species. This seems to ignore the extensive genetic variability which exists. The functional equivalence of many DNA polymorphisms led two early critics of the HGP to argue that “there simply is no such entity as a ‘representative sequence’ or the human (or any) genome” making it “fallacious and even dangerous to call any one ‘normal’” (Sarkar and Tauber 1991, p. 691). Another critic pointed out that problems with the idea of a representative sequence persist even when consideration is limited to DNA differences that are not functionally equivalent but related to health and disease: the sequence will contain unknown defective genes (since no one, including donors, is free of these), there is a heterogeneity of mutations even in so-called single gene diseases, and it is impossible to identify the genetic basis of a disorder simply by comparing the sequences of sick and well people since there will be many differences between them (Lewontin 2000 [1992]). For Gilbert (1992), these criticisms of representativeness arise from a failure to appreciate the difference between the approaches of molecular biologists who attend to similarities and evolutionary biologists who attend to differences within the species: “The human genome project … is directed toward a molecular biologist's view of a species rather than a population biologist's view. The latter views a species as the envelope of all possible variants that can breed together; the importance of that envelope is that different aspects of a species population will be drawn forth if you change the environment. Molecular biologists generally view the species as a single entity, sharply defined by a set of genes and a set of functions that makes up that entity” (p. 84). Gilbert held that the two approaches are consistent with each other, but many evolutionist critics of the HGP—both scientists and philosophers—did not, deriding the aims of mapping and sequencing the human genome as a throwback to anti-evolutionary, pre-Darwinian, typological, and essentialist thinking.[16] The functional approach of molecular biologists alluded to by Gilbert is said to represent genetic variation in improperly normative ways, whereas “in evolutionary biology, variation is not the same as deviation” (Hull 1994, p. 208). When molecular geneticists view mutations as abnormal, not in the sense that they are rare or a change in form, but as “errors” in the genetic code or “damage” to the genome's proper structure, they impose an arbitrary a priori categorization: “it is genetic ‘errors’ that made us as a biological species: we humans are integrated aggregates of such ‘errors.’ Genetic variation is the source of evolution; it is the reason why there could be primates and not just protists or their precursors” (Limoges 1994, p. 124).
There are related worries that the human genome reference sequence will arbitrate a standard of genetic normality; for example, the application of concepts like “genetic error” and “damage” to the genome institutes a call for correction or repair (Limoges 1994; also Murphy 1994). McKusick (1989) has defended the HGP's approach as “consistent with that of most biological research which depends on a few, and even on single individuals, to represent the whole, and with the fact, recognized by geneticists, that there is no single normal, ideal, or perfect genome” (p. 913). However, the normal-abnormal distinction is fundamental to the structure-function studies of proximate fields of biology like physiology and molecular genetics, and while McKusick is no doubt correct to say that geneticists accept that there is no single normal, ideal, or perfect genome, this does not mean that individual DNA sequences are not constituted as normal or abnormal based on their functional significance or that entire genomes are not deemed to fall inside or outside of an acceptable range. Indeed, the 1988 OTA report on the HGP recommends the “eugenic use of genetic information … to ensure … that each individual has at least a modicum of normal genes” (p. 85). It is little wonder that many worry that as an increasing number of mutations are identified and tested for, the range of what is considered normal may narrow, with diminished tolerance for those people who lie outside this range. And there can be no reassurance that judgments of health and disease, normality and abnormality, manage to escape normativity by being transported to the level of the genome; instead, they carry with them any social values and cultural biases that are implicated at the higher level. Says critic Ruth Hubbard (in Holloway 1995, p. 50): “I have gone out on a limb by saying that most people in our culture are very judgmental about women who terminate a pregnancy because of sex. How different is that from terminating a pregnancy because of Down syndrome?”
With the HGP reference sequence available as a basis for comparison, attention has shifted to the genetic variation within the species that evolutionist critics accused the project at the outset of ignoring. Humans have been found to be 99.9 percent alike, with common sequence variants occurring every 1000 bases. There is interest in identifying the sites of the genome where variation occurs, the frequency of these differences, and their significance. The social significance attaching to such research was foreseen during the early years of the project. Scientist David Baltimore predicted that the HGP would reveal that the belief that “we are all equal, all the same” is a myth: “We are going to have to come to terms with the fact that we are all born with different talents and tendencies” (in Cooper 1994, p. 320). Similarly, philosopher Marc Lappé (1994) raised the possibility that the HGP could reveal group differences—with sequences localized to particular groups or varying in frequency among groups—and that any such differences in the genetic lottery would raise significant ethical implications for health care and social policy.[17] But the conceptualization of this variation also presents challenges—for example, in distinguishing between normal and abnormal genetic variation (Gannett 2003a), or drawing population boundaries in the constitution of individual versus group differences (Gannett 2003b). Pharmaceuticals are the most powerful engine driving post-HGP diversity research, and though “personalized medicine” was touted as a benefit of the HGP, en route, a detour via the study of group genetic differences has been taken. For example, the International HapMap Project, in order to compile a map adequately dense with SNP markers to permit the identification of genes implicated in common diseases and drug responses, sampled the DNA of four populations (European-Americans in Utah, Yoruba in Ibadan, Nigeria, Japanese in Tokyo, and Han Chinese in Beijing).[18] Likely due to lessons learned from the difficulties experienced by the Human Genome Diversity Project (see Reardon 2004), attempts were made to involve representatives of these groups in the planning of research through “community engagement” and “community consultation.” These efforts raise conceptual questions not only about the relations between what are ostensibly distinct social and biological groups (Gannett 2003b, Juengst 1998), but what makes a “community” (Davis 2000). Now that “group” genetic differences have become of interest to more than just evolutionary biologists and population geneticists, impetus is provided to longstanding debates about whether race is biologically real or socially constructed and more recent ones concerning the appropriateness of the use of racial categories in biomedical research (Gannett 2005; Root 2003).
Various HGP proponents told us that we would discover “our human essence” in the genome. According to Dulbecco (1986), “the sequence of the human DNA is the reality of our species” (p. 1056); Gilbert is quoted as saying “sequencing the human genome is like pursuing the holy grail” (in Lee 1991, p. 9); on the topic of his decision to dedicate three percent of HGP funds to ELSI, Watson writes: “The Human Genome Project is much more than a vast roll call of As, Ts, Gs, and Cs: it is as precious a body of knowledge as humankind will ever acquire, with a potential to speak to our most basic philosophical questions about human nature, for purposes of good and mischief alike” (with Berry 2003, p. 172).
There are theological worries about a genetic reductionism that suggests that we are no more than our smallest material parts—the bits of DNA that make up the genome. For example, Leon Kass, chairman of the President's Council on Bioethics from 2001 to 2005, decries, with the arrival of “the age of genetic technology,” “the erosion, perhaps the final erosion, of the idea of man as noble, dignified, precious or godlike, and its replacement with a view of man, no less than of nature, as mere raw material for manipulation and homogenization” (2002, p. 138). Collins, an evangelical Christian, doesn’t share such worries; he is quoted in the Los Angeles Times as saying: “God is not threatened by all this. I think God thinks it's wonderful that we puny creatures are going about the business of trying to understand how our instruction book works, because it's a very elegant instruction book indeed” (Gosselin 2000). Of course, this particular religious world view is countered by an evolutionist one held by other scientists. Gilbert's “holy grail” is not so holy after all; he believes that the HGP reveals our place amidst the interconnectedness of all life forms: “The data base of the human genome, coupled with our knowledge of the genetic makeup of model organisms, promises to reveal patterns of genes and to show us how we ourselves are embedded in the sweep of evolution that created our world” (1992, p. 97).
A more secular philosophical concern about essentialism is tied to longstanding debates in philosophy of biology about species (see Ereshefsky 1992). Gilbert (1992) foresaw from the HGP a DNA-based definition of Homo sapiens: “At the end of the genome project, we will want to be able to identify all the genes that make up a human being. For example, we will compare the sequences of the human and the mouse and be able to determine the genes that define a mammal by this comparison…. So by comparing a human to a primate, we will be able to identify the genes that encode the features of primates and distinguish them from other mammals. Then, by tweaking our computer programs, we will finally identify the regions of DNA that differ between the primate and the human—and understand those genes that make us uniquely human” (p. 94). While it is true that any stretch of DNA that belongs to all and only humans would be among those differences found by comparing a single human genome sequence to a single nonhuman primate or mouse genome sequence, any “uniquely human” differences could not be distinguished from the others without extensive intra- and inter-specific population studies which are not part of the HGP. Even if such population studies were carried out, Gilbert's assumptions about species essentialism—that species can be defined or represented by properties (in this case, certain stretches of DNA) universally shared among, and particular to, their members—have long been challenged by philosophers of biology (Gannett 2003a; Robert and Baylis 2003). Because evolution is a gradual process in which species are constantly undergoing change, Aristotelian (essentialist) definitions of species need to be abandoned; from an evolutionary perspective, in David Hull's (1994) words: “The essence of a particular species is to have no essence” (p. 215). Species should instead be defined as cluster concepts (Hull 1965) or recognized to be individuals (i.e. spatio-temporally restricted, historically contingent particulars) to which organisms belong as parts, and not classes, sets, or natural kinds at all (Ghiselin 1974; Hull 1978).
Besides these attempts to reduce species to beanbags of genes, genetic reductionism enters in attempts to explain cellular or organismal properties solely in terms of genes, or entire organisms in terms of genomes. Gilbert (1992) endorses an essentialism of this sort as well: “The information carried on the DNA, that genetic information passed down from our parents,” he writes, “is the most fundamental property of the body” (p. 83), so much so, in fact, that “one will be able to pull a CD out of one's pocket and say, ‘Here is a human being; it's me!’” (p. 96). The social prevalence of this representation of the genome as the “most fundamental” aspect of the individual means that genetic information has a particularly acute impact on self-identity and self-understanding (Quaid 1994). Another genome scientist, Eric Lander (1996), characterizes the HGP as “the 20th century's version of the discovery and consolidation of the periodic table” with genes as “elements” and gene variants responsible for disease susceptibilities as “isotopes” (pp. 536–537). The probable social consequence of this beanbag conception of the organism, combined with a concept of genetic disease that relocates the locus of disease from organism to genome, is the direction of technological fixes at the genome (Keller 1994). When these technological fixes include prenatal genetic screening and the possible modification of IVF embryos, it is suggested that genetic reductionism contributes to the commodification of children by making them an instrument of parental desire (Darnovsky 2001). The relevant notion of “reduction” at play here is the explanation of wholes in terms of parts. As Sahotra Sarkar (1998) notes, it is important to distinguish between genetic reductionism and physical reductionism: “From the point of view of physical reductionism, DNA enters the molecular milieu on par with proteins or, for that matter, lipids or any other molecules that are found in living organisms. Physical reductionism does not require any assumption about the primacy of DNA or of genes in the explanation of biological behavior” (p. 174). The reduction of organisms to their genomes by molecular geneticists takes yet further molecular biology's—and, more generally, proximate biology's—reduction of organisms to their constituent physical parts in a way that effaces the contexts (provided by populations and environments) in which organisms develop (Griesemer 1994). Definitions of health and disease attach to organisms and their physiological processes in particular environments and cannot simply be relocated to the level of the genome (Limoges 1994; Lloyd 1994). It is wrong to presume that diseases become more objectively defined entities once they receive a genetic basis, since social and cultural values implicated in designations of health and disease can merely become incorporated at the level of the genome, in what counts as a normal or mutant gene.
There is an additional sense in which genetic reductionism is implicated in the HGP, and Gilbert makes reference to this as well. This is the sense, familiar to philosophers of science, of intertheoretic reduction, whereby (usually) higher-level theories are said to be reduced by lower-level ones insofar as these lower-level theories explain/predict the phenomena of the higher level. Gilbert (1992) foresaw that the HGP would furnish the basis for a “theoretical biology” in which from the genome's DNA sequence it would be possible to predict protein sequence, and from protein sequence it would be possible to predict three-dimensional protein structure—either from “first principles” based on energy calculations or from observed structural similarities of the building blocks—and from there make predictions about function, a predictability that Gilbert suggests would extend to individual organisms and their behavior, and might therefore be difficult to accept: “To recognize that we are determined, in a certain sense, by a finite collection of information that is knowable will change our view of ourselves” (p. 96). Gilbert seems to conflate epistemology with ontology, moving from genetic reductionism where genes suffice to predict or explain behavior to genetic determinism where genes are sufficient causes of behavior (see next section), but more importantly, even at the lowest level of organization, his vision faces formidable obstacles. Notwithstanding the protein folding problem and the need to consider gene regulation in order to proceed beyond the level of protein structure, there are significant difficulties in attempting even to predict the linear structure of proteins from sequence data alone: specifically, our ability to recognize transcription initiation sites is limited, as is, in the presence of extensive RNA editing, our ability to identify the boundaries between introns and exons and between coding and noncoding segments of DNA (Sarkar 1998).[19]
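To make the prediction step concrete, the following minimal sketch (my own illustration, not drawn from any source cited here) shows the one genuinely tractable link in Gilbert's envisioned chain: translating an already-identified coding sequence into a protein sequence via the standard genetic code. The toy sequence, the truncated codon table, and the function name are assumptions for illustration; locating transcription initiation sites, splicing out introns, and handling RNA editing, where the real difficulties noted above arise, are not addressed.

```python
# Minimal sketch: translate an already-identified coding sequence (CDS)
# into a protein sequence using (part of) the standard genetic code.
# Toy data only; this is the easy, deterministic step in the chain.

CODON_TABLE = {
    "ATG": "M", "TTT": "F", "TTC": "F", "GGA": "G", "GGC": "G",
    "GCT": "A", "GCC": "A", "TGG": "W", "AAA": "K", "AAG": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",  # '*' marks a stop codon
    # remaining codons of the standard genetic code omitted for brevity
}

def translate_cds(dna: str) -> str:
    """Translate a coding sequence into a one-letter protein sequence."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3].upper(), "?")  # '?' = codon not in toy table
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)

# A clean CDS translates deterministically...
print(translate_cds("ATGTTTGGAAAATAA"))  # -> "MFGK"
# ...but deciding which stretch of genomic DNA counts as the CDS is the hard part.
```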
Molecular biology's technological capacity to manipulate the human genome brings society into something of an existentialist predicament. Science has tended to conceive human essence as a fixed object discoverable in nature. But a human essence embedded in a manipulable genome is not immutable—it is created, not discovered. There is a very real sense in which in making the difficult choices we face—for example, those involved in prenatal genetic testing and germ-line manipulation—we really are choosing ourselves.[20]
Gilbert's reductionist vision of the sequenced human genome as “the grail” upon which a “theoretical biology” can be founded brings to the fore philosophical questions about genetic determinism. One might ask with Richard Lewontin (2000 [1992], p. 139), however rhetorically: “How is it that a mere molecule [DNA] can have the power of both self-reproduction and self-action, being the cause of itself and the cause of all other things?” Getting straight on genetic determinism is important. There is a long, ignoble history of marshalling ideological justification for unjust and oppressive social and political institutions and structures by appealing to the ostensibly scientific assertion that “human nature is fixed by our genes” (Rose et al. 1984; also Lewontin 1993).[21] Critics of the HGP saw it as placing “the seal of approval from mainstream science” on hereditarianism, favoring nature over nurture like the eugenics of the early to mid-20th century, to promote a “technological fix” for social problems (Allen 1994, p. 164).[22] However, with the HGP nearing completion and the availability of entire genome sequences for numerous organisms supporting the movement from structural to functional genomics, Keller—one such early critic—found that the deterministic and reductionistic assumptions underlying the HGP had actually been undermined by the research in molecular biology the HGP made possible: “What is most impressive to me is not so much the ways in which the genome project has fulfilled our expectations but the ways in which it has transformed them…. Contrary to all expectations, instead of lending support to the familiar notions of genetic determinism that have acquired so powerful a grip on the popular imagination, these successes pose critical challenges to such notions” (2000, p. 5).
Yet, DNA is still portrayed as fundamental: in a public lecture held in celebration of the completion of the HGP, Collins characterized the HGP as “an amazing adventure into ourselves, to understand our own DNA instruction book, the shared inheritance of all humankind.”[23] While virtually all biologists disavow genetic determinism today, it is not always so clear what exactly they are denying. Jonathan Kaplan (2000) identifies three different ways in which claims about genetic determinism might be understood: (i) as “complete information” where everything about us is viewed as predictable based on our genes; (ii) as “intervention is useless” where traits are said to be impervious to environmental changes; and (iii) as traits that are in some sense primarily, even if not wholly, genetic. Kaplan argues that when biologists disavow genetic determinism it is (ii) they have in mind (with phenylketonuria—PKU—frequently used as an example).[24] According to Kaplan, (i) is easily seen to be “trivially false,” and therefore not worth disavowing—yet, this resembles Laplacian determinism's concern with predictability, and as we have seen, Gilbert seems to be making such a claim. Despite their disavowals of genetic determinism, Kaplan finds that biologists often adhere to (iii); however, the basis for the primacy of genes remains to be understood. Questions about genetic determinism and Collins’ representation of the sequenced human genome as “our own DNA instruction book”—which suggests an asymmetry between genetic and nongenetic causes—need to be approached at several different levels: cellular, organismal, and societal.[25]
At the cellular level, the book is said to contain “the genetic instructions for the entire repertoire of cellular components” (Collins et al. 2003, p. 3). This genetic determinism at the cellular level is sustained by metaphors of Weismannism and DNA as “code” or “master molecule” (Griesemer 1994; Keller 1994). DNA is accorded causal priority over other cellular components in a couple of ways. One way is to treat DNA as temporally prior. This may be in a physical sense: Weismannism assumes that intergenerational continuity exists only for germ cell nuclei whereas somatic cells and germ cell cytoplasm arise anew in each generation. It may also be in the sense of a point of origin for the transfer of information: the central dogma of molecular biology, which represents a 1950s reformulation of Weismannism in terms of information theory, asserts that information travels unidirectionally from nucleic acids to protein, and never vice versa. The chief difficulty for these claims of temporal priority is of the chicken-and-egg variety: nucleic acids need proteins and other cellular components to make proteins (Smith 1992). Although it is fully accepted that the fertilized ovum contains the cytoplasmic contribution of at least the maternal germ cell, there persists a tendency in developmental genetics to focus on cytoplasmic (mitochondrial) DNA and to ignore the role of cytoplasmic proteins. It is also contentious whether amongst the cell's components only nucleic acids can be said to transmit information: for some philosophers, genetic coding plays a theoretical role at least at this cellular level (Godfrey-Smith 2000); for others, genetic coding is merely (and misleadingly) metaphorical, and all cellular components are potential bearers of information (Griffiths 2001; Griffiths and Gray 1994; Sarkar 1996). An additional way in which DNA is accorded causal priority lies in its treatment as ontologically prior: this is exemplified in Watson's description of DNA as “the most golden of molecules” (in Bodmer and McKie 1994, p. 10). Causal asymmetry provides a possible reason for privileging DNA on an ontological basis: provided all cellular components necessary for protein synthesis are present, modification of the DNA sequence may be followed by a predictable and specifiable change in protein sequence, but the opposite will not occur. This difference could be conceived in terms of the Aristotelian distinction between formal and efficient causation and the accompanying metaphysical preferences for form over matter and mind over body that are deeply embedded in western philosophy. Keller (2000) describes how, in the discourse of “gene action” which arose between the mid-1920s and 1960s and culminated in Francis Crick's “central dogma,” “the gene was bestowed with the properties of materiality, agency, life, and mind” and rendered “[p]art physicist's atom and part Platonic soul” (p. 47).[26]
At the level of the organism, talk of genetic coding, and the asymmetry between genetic and nongenetic causes such talk conveys, is deemed less acceptable even when countenanced at the cellular level (Godfrey-Smith 2000). New research in functional genomics may well lead to less deterministic accounts even of so-called single gene disorders. For these, the concepts of penetrance and expressivity serve to reconcile the one-one genetic determinist model, in which the mutation is necessary and/or sufficient for the presence of the condition, with confounding patterns of phenotypic variability. But the severity of even a fully penetrant condition like Huntington's disease seems to depend not just on genetic factors like the number of DNA repeats in the mutation but also on epigenetic factors like the sex of the parent who transmitted the mutation (Ridley et al. 1991).
At the level of individuals in society, when we consider complex conditions to which both genetic and environmental differences contribute—for example, psychiatric disorders or behavioral differences—gene-centrism persists. The April 1998 cover of Life captures the reader's attention: “WERE YOU BORN THAT WAY? Personality, temperament, even life choices. New studies show it's mostly in your genes.” Leading scientists have said similar things. At the outset of the HGP, Watson told us: “we used to think our fate is in our stars. Now we know, in large measure, our fate is in our genes” (in Jaroff 1989). Post-HGP, Watson seems unaffected by the changes that have so impressed Keller. While he introduces the recent book Behavioral Genetics in the Postgenomic Era by stating confidently that “with the arrival of the human DNA genome sequence and its attendant list of human genes, the experimental procedures will soon be on hand to finally settle the long contentious nature-nurture arguments” (p. xxii), the question seems already settled for him in his assertions that “children come into the world with fixed personalities” and “effective remedies for socially inappropriate behaviors” will best be carried out at the molecular level (in Plomin et al. 2003, p. xxii).
But notice the waffle words used by Watson and on the cover of Life: “in large measure” and “mostly in your genes.” Everyone is an interactionist these days, in some sense of “interaction.” Genes and environment, or nature and nurture, are recognized both to be necessary for development: by themselves, genes can’t determine or do anything. Yet, theorists still seem to give the nod to one or the other, suggesting that it is mostly genes or mostly the environment, mostly nature or mostly nurture, that make us what we are. This implies that it is possible to apportion the relative contributions of each. Gilbert (1992) suggests this in his dismissal of a more simplistic version of genetic determinism: “We must see beyond a first reaction that we are the consequences of our genes; that we are guilty of a crime because our genes made us do it; or that we are noble because our genes made us so. This shallow genetic determinism is unwise and untrue. But society will have to wrestle with the questions of how much of our makeup is dictated by the environment, how much is dictated by our genetics, and how much is dictated by our own will and determination” (pp. 96–97). However, the assertion that the relative contributions of genes and environment, nature and nurture, can be apportioned in this way is misleading if not outright false. As Lewontin argued in his classic paper on heritability, it is impossible to infer causal relations from the analysis of variance. The only legitimate exception is where there is “perfect or nearly perfect additivity between genotypic and environmental effects so that the differences among genotypes are the same in all environments and the differences between environments are the same for all genotypes” (1974, p. 408).[27] Contrary to Watson's assertion, the replacement of quantitative with molecular genetic techniques cannot resolve the nature-nurture controversy because of the same problem that affects heritability measures: the context-dependence of genes as causes given the nonadditivity of gene-gene and gene-environment interactions. Recent work in developmental systems theory (DST) which undermines any such attempts to apportion causal responsibility in organismal development makes clear why: traits are jointly determined by multiple causes, each context-sensitive and contingent (Griffiths and Gray 1994; Griffiths and Knight 1998; Oyama 1985; Oyama et al. 2001; Robert 2004).[28]
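A toy numerical sketch (my own construction, not from Lewontin) may help show why non-additivity blocks any such apportioning: when the genotype difference is not the same in every environment, the “genetic share” of the variation depends on which environments the population happens to occupy. The genotypes, environments, and phenotype values below are invented for illustration.

```python
# Toy illustration of non-additivity (gene-environment interaction).
# Phenotype values for two genotypes (G1, G2) in two environments (E1, E2);
# note the crossover: G1 outscores G2 in E1 but not in E2.
phenotype = {
    ("G1", "E1"): 10, ("G1", "E2"): 4,
    ("G2", "E1"): 6,  ("G2", "E2"): 8,
}

def genotype_gap(env: str) -> int:
    """Difference G1 - G2 within a single environment."""
    return phenotype[("G1", env)] - phenotype[("G2", env)]

print(genotype_gap("E1"))  # +4: in E1, G1 looks "better"
print(genotype_gap("E2"))  # -4: in E2, the ranking reverses

# Because the genotype difference varies with environment (no additivity),
# no single number can say how much of the trait is "due to genes"
# independently of the actual distribution of environments.
```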
When genetics enters philosophical debates about freedom and determinism, questions about moral and legal responsibility are central: if the genes a person happens to inherit can be said in some sense to determine her actions, is it legitimate to praise, blame, reward, or punish that person? Retributivism pulls in opposing directions: genetic predisposition to violent or criminal acts may suggest “a volitional disability that makes blame inappropriate” or “a permanence that invites blame” (Wasserman 2001, p. 304). The HGP is unlikely to enlighten or complicate these longstanding debates, however (Baron 2001). The thesis of universal causation (no uncaused events) and its implications for freedom are unaffected by genetics: it makes no sense to claim that all events have genetic causes, and were it to turn out that certain events—i.e. human behaviors or actions—have genetic causes, these pose no different a threat for freedom than their nongenetic counterparts. In addition, genetic causes of behavior are likely to be tendencies or predispositions which do not necessitate their effects: for incompatibilists, genetic and nongenetic causes are jointly responsible for behaviors, and genetic determinism adds nothing to the challenge determinism already poses for freedom; for compatibilists, since a person's behavioral genetic tendencies or predispositions do not compel her to act in a certain way, they are no different than nongenetic (biological or environmental) tendencies or predispositions not of a person's own making. Of course, the general public—in voting booths, on juries, etc.—may be swayed more by genetic explanations given beliefs in genetic determinism fuelled by media reports of apparent discoveries of genes for this or that behavior.
The gene is a “cultural icon”: in popular culture, from movies to cartoons to Dear Abby, quite apart from its biological and medical contexts, the gene has become “a symbol, a metaphor, a convenient way to define personhood, identity, and relationships in socially meaningful ways” (Nelkin and Lindee 1995, p. 16). Hardly a week goes by when we do not hear about a newly discovered gene for one thing or another. “Geneticization” is a term used to describe this phenomenon marked by an increasing tendency to reduce human differences to genetic ones (Lippman 1991).[29] This tendency is accompanied by critics’ worries that embracing a reductionist approach to medicine, one that conceives of human health and disease in wholly molecular or genetic terms, individualizes health and disease and deflects attention from our shared social and physical environments and the role of toxins, fast food, poverty, lack of access to health care, etc. (Nelkin and Tancredi 1989; Hubbard and Wald 1993). One of the justifications for spending several billion dollars on human genome research is the belief that genes are key determinants of not only rare Mendelian diseases like Huntington's disease or cystic fibrosis but common multi-factorial conditions like cancer, depression, and heart disease. In Watson's words: “Some call New Jersey the Cancer State because of all the chemical companies there, but in fact, the major factor is probably your genetic constitution” (in Cooper 1994, p. 326).
Writes an early critic of the HGP: “Without question, it was the technical prowess that molecular biology had achieved by the early 1980s that made it possible even to imagine a task as formidable as that of sequencing what has come to be called ‘the human genome.’ But it was the concept of genetic disease that created the climate in which such a project could appear both reasonable and desirable” (Keller 1992, p. 293). Given that the development of any trait involves the interaction of both genetic and nongenetic factors, on what bases can genes be privileged as causes in order to claim that a particular disease or nondisease trait is “genetic” or caused by a “genetic susceptibility” or “genetic predisposition”? Does it make sense for HGP proponents like Bodmer to characterize even smoking-induced forms of cancer as genetic? “Cancer, scientists have discovered, is a genetic condition in which cells spread uncontrollably, and cigarette smoke contains chemicals which stimulate those molecular changes” (Bodmer and McKie 1994, p. 89).[30] From the outset, we need to distinguish between genes conceived as causes of a trait's appearance in a given individual (“x is a gene for trait y in organism z” or “My three-pack-a-day Aunt Viv must have the gene that causes cancer”) and genes as causes of differences in traits among individuals (“x is a gene for trait y in population z” or “Lots of people in my family smoke, but only Aunt Viv and Cousin Sal seem to have inherited the gene for cancer”).[31]
The logical interrelatedness of cause and effect—that is, whether a condition is necessary and/or sufficient for a given event to occur—is the approach taken to defining what makes a condition “genetic” in individuals. A strong sense of “genetic disease” is recognized when the genetic factor is both necessary and sufficient for the disease to arise “regardless of environment” (Wulff 1984), or when the genetic factor is sufficient for the disease to present “in all known environments” (Kitcher 1996)—this latter definition recognizes that, in some cases, a disease may have nongenetic as well as genetic origins (since the genetic factor is sufficient but not necessary). “Genetic susceptibility” is defined as an increased probability of disease in all known (strong sense) or some (weak sense) environments (Kitcher 1996). Note that ceteris paribus clauses referring to an assumed background of necessary, though not sufficient, genetic and environmental factors are required by these definitions. Just as striking a match causes it to ignite only if it is dry and in the presence of oxygen, genes, as we saw in the previous section, don’t do anything alone. This is the first of three ways in which genetic explanations are context-dependent.
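These definitions can be put schematically. The sketch below is my own illustration (the environments, probabilities, and function names are invented): the presence of a hypothetical genetic factor is assessed against a set of known environments, with the necessary background factors assumed to be in place.

```python
# Schematic sketch of the definitions above, for a hypothetical condition.
# Toy probabilities of disease with and without the genetic factor, by environment.
known_environments = ["E1", "E2", "E3"]
p_with_factor = {"E1": 1.0, "E2": 1.0, "E3": 1.0}
p_without_factor = {"E1": 0.0, "E2": 0.0, "E3": 0.1}

def strong_genetic_disease() -> bool:
    """Strong sense: the factor is sufficient for disease in all known environments."""
    return all(p_with_factor[e] == 1.0 for e in known_environments)

def genetic_susceptibility(strong: bool = True) -> bool:
    """Increased probability of disease in all (strong) or some (weak) known environments."""
    raised = [p_with_factor[e] > p_without_factor[e] for e in known_environments]
    return all(raised) if strong else any(raised)

print(strong_genetic_disease())              # True for this toy condition
print(genetic_susceptibility(strong=False))  # True: probability raised in some environment
```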
Adopting a population-based approach to genetic causation, where differences in genes are understood to explain differences in traits and not traits themselves, obviates the need for ceteris paribus clauses because such explanations rely on the actual distribution of the necessary genetic and nongenetic background factors in specific populations. The case can be made that the first approach is indebted to the second, and that one never explains a property of an object tout court but only in relation to a reference class of an object or objects that lack the property (but share the necessary background factors). Writes Germund Hesslow (1983), “all explanations of individual facts of the form Fa—that is, where an object a has a certain property F—involve a comparison with other objects which lack the property in question” (p. 91). No trait can be labeled “genetic” in any absolute sense, but only relative to a specific population. For example, lactose intolerance is considered to be a genetic condition in northern European populations where ingestion of milk products is common and lactase deficiency rare, whereas in African populations, where ingestion of milk products is rare and lactase deficiency common, it is considered to be an environmental condition (Hesslow 1984). This is the second way in which genetic explanations are context-dependent.[32]
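The lactose intolerance example can be caricatured in code. The following toy sketch is mine, not Hesslow's; the frequencies and the crude decision rule are invented purely for illustration. Symptoms require both the genetic factor and the environmental exposure, and the factor singled out as “the cause” of differences is simply whichever one actually varies against a common background in the population at hand.

```python
# Toy sketch: the same condition comes out "genetic" in one population and
# "environmental" in another, depending on which background factor varies there.

def explanatory_factor(freq_lactase_deficiency: float, freq_milk_drinking: float) -> str:
    """Crude rule: the rarer factor is the one that distinguishes affected
    individuals from the reference class sharing the common background."""
    if freq_lactase_deficiency < freq_milk_drinking:
        return "genetic (lactase deficiency distinguishes the affected)"
    return "environmental (milk ingestion distinguishes the affected)"

# Population where milk drinking is common and lactase deficiency rare.
print(explanatory_factor(freq_lactase_deficiency=0.05, freq_milk_drinking=0.95))
# Population where milk drinking is rare and lactase deficiency common.
print(explanatory_factor(freq_lactase_deficiency=0.90, freq_milk_drinking=0.10))
```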
The third, and final, way in which genetic explanations are context-dependent is that they are a function of the present state of knowledge. Huntington's disease is deemed a genetic condition on both the individual and population accounts: a single mutant gene is necessary, and arguably sufficient given necessary (and standard) background conditions, for symptoms to appear in a given person; the presence and absence of disease symptoms in members of the population is accounted for in terms of the presence and absence of the mutation. This is nevertheless an epistemically relative claim. Even once the relevant gene is mapped and sequenced, the mechanisms by which genetic and nongenetic factors interact to produce symptoms of the disease remain to be understood. Such causal knowledge is often obtained through the experimental manipulation of conditions beyond “normal” limits, and what conditions are exploited as possible causes in the laboratory and what conditions are kept constant as necessary background, along with pragmatic decisions about how research efforts should be expended more generally, are influenced by clinical and social, as well as scientific, contexts (Gannett 1999).[33]
Behind philosophical attempts to seek objective, nonevaluative foundations for designations of diseases as “genetic” or “environmental” lie positivist assumptions that theoretical understanding furnishes the basis for rational action. One concern with geneticization and the trend to label an increasing number of diseases and conditions “genetic” is that this provides normative support for directing future research and therapeutic interventions in particular ways, that is, at the level of the genome (Cranor 1994). Watson's (1992) colorful metaphor makes this normative support explicit: “Ignoring genes is like trying to solve a murder without finding the murderer. All we have are victims” (p. 167). But this is fallacious reasoning, as the context-dependence of genetic explanations shows. We might instead understand geneticization to be the consequence of an increased capacity to manipulate DNA in the laboratory and (potentially) the clinic and not an advancement in theoretical understanding. Genetic explanations, on such a view, are pragmatic: there is a practical context in which genes are singled out as causes not only because they are amenable to technological control but because they are increasingly perceived to be more tractable than their nongenetic counterparts and therefore the best means to a variety of ends (Gannett 1999).
Many of the model organisms chosen for the HGP had already enjoyed illustrious careers in the history of genetics: the fruitfly D. melanogaster was the organism that started it all in T. H. Morgan's lab at Columbia University in the 1910s, ushering in the era known today as classical genetics; with discoveries of spontaneous mutation and recombination in the 1940s, the bacterium E. coli helped to take genetics molecular, serving also as host for the phage studied by Max Delbrück's group; the nematode worm C. elegans was Sydney Brenner's choice to model the development of the nervous system in the mid-1960s at Cambridge University.[34] It was, in fact, these histories that recommended them: “the experimental organisms that became ‘model organisms’ were not selected and constructed mainly on the basis of principles of universality or even typicality of their biological characteristics and processes, though it was hoped that many features would prove to be shared or common to other organisms, particularly humans. Instead they were primarily chosen for ease of experimental tractability and due to the availability of some background information on basic genetic composition and relation to phenotype” (Ankeny 2001, pp. S253-S254).
Philosophical questions arise about the senses in which these various organisms serve as “models.” Potentially, models may embody a range of characteristics: as typical or representative; as ideal or perfect; as convenient, tractable, or manipulable; as homologous (conserved evolutionarily); as analogous; as exemplars; as abstractions. Models may also be used in a variety of ways: to model disease processes; to model normal processes; as structural models; as type organisms representative of the species or higher phylogenetic level; as heuristic tools; as mathematical devices.[35] In a recent article examining researchers’ use of the flowering plant Arabidopsis thaliana as a model organism, Sabina Leonelli (2008) points out that models can be abstract (vs. concrete) in different ways: absolutely, in terms of their sense perceptibility; or relatively, in terms of their physical meaning with respect to the phenomena represented or the range of phenomena they are taken to represent. She then shifts the philosophical focus from models themselves to modeling practices: abstraction, for example, becomes a component of the activity of producing a model rather than solely an attribute of the model. This approach emphasizes the need to attend not just to the relationship between model and phenomenon modeled but the material, social, and institutional settings and varied commitments of researchers.
Rachel Ankeny (2000, 2001) considers several ways in which organisms might plausibly be considered models. She suggests that mapped and sequenced genomes for these various HGP model organisms serve as “descriptive models.” A genome reference sequence is a model because it is, first, “an idealized, abstract entity constructed from the natural organism” and, second, “‘model’ is used to indicate a promissory note about this organism providing a framework for pursuing explanatory questions and ultimately serving as a prototype for understanding more complex organisms” (Ankeny 2000, p. S267). Genome reference sequences are descriptive because they are “constructed largely without motivation by hypotheses to be tested or traditional explanatory questions” but rather as preliminary to such work (Ankeny 2000, p. S267). In this, they are similar to earlier efforts to use wiring diagrams for C. elegans to model neural structure: these diagrams were based on data from several worms but were presented as canonical; the worms were “wild type” and presumed to exhibit species-typical structure; the diagrams served as a tool to investigate abnormalities (Ankeny 2000). Nonhuman genome reference sequences become tools as their corresponding organisms are used as experimental models for understanding basic biological processes common to many species or disease processes found in humans—for example, by knocking out genes in mice. Here, analogical reasoning is at work; “[m]odels in this sense of the term seem to provide what might be termed strong causal analog models” (Ankeny 2001, p. S255). According to Kenneth Schaffner (1998a), this is quite typical of biological explanation: unlike physicists, biologists frame explanations “around a few exemplar subsystems in specific organisms … used as (interlevel) prototypes to organize information about other similar (overlapping) models” (p. 278).
Monod famously once said that what is true of E. coli is true of the elephant, but just as famously, this proved not at all to be the case in moving from prokaryotic to eukaryotic gene regulation. More recently, biologist Bruce Alberts writes: “we can say with confidence that the fastest and most efficient way of acquiring an understanding of ourselves is to devote an enormous effort trying to understand … relatively ‘simple’ organisms” (in Schaffner 1998a, p. 277). What inspires such confidence when a simple organism like C. elegans with its 302 neurons and repertoire of behaviors (movement in response to touch and chemical stimuli, egg laying, and mating) is used to model human behavior in all its complexity? Most importantly, the model organism must be representative of the systems being modeled: genomic sequences must be similar in model and modeled organisms; there must be a known cause-effect relationship between the model organism's genotype and phenotype; there cannot be any “causally relevant disanalogies” between model and modeled organisms, for example, due to differences in complexity (Ankeny 2001, p. S257). Schaffner (1998b) examines molecular geneticists’ use of C. elegans as a behavioral model. Even in these simple organisms, relations between genes, neurons, and behaviors are complex (many-many), with one gene-one behavior associations rare exceptions and their intervening causal chains yet to be understood. While such complexity is to be expected in humans with their more complicated nervous systems, Schaffner believes that there may be a small number of single gene effects on behavior where these genes are highly homologous and strongly conserved—hence, the usefulness of simple models like C. elegans combined with others for investigating basic mechanisms and psychiatric disorders.
The model organism approach faces challenges, however. As Schaffner recognizes, model organisms are also idealizations: organisms are selected for features, such as rapid development, short generation time, small adult size, and insensitivity to environmental variation, that are not generalizable even to close relatives (Wimsatt 1998); strains are inbred to remove genetic diversity. Context-sensitivity diminishes expectations that similar mechanisms operate in simple and complex systems; multiple realizability creates doubts that similar explanations will be found across taxa (Wimsatt 1998). Evolution is a branching process; as Richard Burian (1993) emphasizes: “At (virtually?) all levels of the biological world—including the biochemical—it is an open question how general the findings produced by the use of a particular organism are” (p. 365). Consequently, support for theoretical hypotheses requires the experimental findings to be placed within a comparative and evolutionary framework attentive to how widely the relevant nucleotide sequences and traits are distributed phylogenetically: “detailed knowledge of (historical) biological contingencies constrains—and ought to constrain—the evaluation of experimental work in biology and the knowledge claims based on that work” (Burian 1993, p. 366).[36]