Researchers identify biochemical functions for most of the human genome

Only about 1 percent of the human genome contains gene regions that code for proteins, raising the question of what the rest of the DNA is doing. Scientists have now begun to discover the answer: About 80 percent of the genome is biochemically active, and likely involved in regulating the expression of nearby genes, according to a study from a large international team of researchers.

Manolis Kellis, an associate professor in the Department of Electrical Engineering and Computer Science and a CSAIL principal investigator.

The consortium, known as ENCODE (which stands for "Encyclopedia of DNA Elements"), includes hundreds of scientists from several dozen labs around the world. Using genetic sequencing data from 140 types of cells, the researchers were able to identify thousands of DNA regions that help fine-tune genes' activity and influence which genes are expressed in different kinds of cells.

Just as the sequencing of the human genome helped scientists learn how mutations in protein-coding genes can lead to disease, the new map of non-coding regions should provide some answers on how mutations in the regulatory elements lead to diseases such as lupus and diabetes, says Manolis Kellis, an associate professor of computer science at MIT and an associate member of the Broad Institute. Kellis is an author of a paper describing the findings in the Sept. 5 online edition of Nature.

"Humans are 99.9 percent identical to each other, and you only have one difference in every 300 to 1,000 nucleotides," Kellis says. "What ENCODE allows you to do is provide an annotation of what each nucleotide of the genome does, so that when it's mutated, we can make some predictions about the consequences of the mutation."

Kellis, who leads MIT's Computational Biology Group, is one of the principal investigators involved in the Nature paper. The ENCODE collaboration is publishing about two dozen additional papers this week detailing the new results.

Mapping non-coding DNA