Deciphering the language of transcription factors

A new method identifies the precise binding sites of transcription factors - proteins that regulate the production of other proteins - with 10 times the accuracy of its predecessors.

Transcription factors are proteins that bind to DNA to promote or suppress protein production. Since almost all diseases involve disruption of the protein-production process, transcription factors are promising biological targets for drugs - and could even serve as drugs themselves.

But there are likely thousands of transcription factors in humans, each of which might bind to the genome at tens of thousands of different locations. Previously, there was no cost-effective way to determine exactly where transcription factors bind - that is, which DNA letters in a given stretch of genome each of them attaches to. Biologists thus relied on approximate methods to identify the general vicinity of binding sites.

In the August issue of the online journal PLoS Computational Biology, a team of researchers from MIT's Computer Science and Artificial Intelligence Laboratory presented a new analytic technique that identifies binding sites with much greater accuracy. As a consequence, the researchers were able to infer previously unknown relationships among transcription factors, which could provide clues to the roles they play in biological processes.

The researchers initially tested their technique on two sets of experimental data, which they say represent both "relatively easy and difficult cases" for analysis. In the easy case, their new technique identified the precise locations at which transcription factors bound to the genome with more than 90 per cent accuracy, while the accuracy of existing techniques was about 10 per cent or less. In the difficult case, the new method was more than 55 per cent accurate, compared to about 5 per cent for existing techniques.

The leading method for determining how transcription factors behave in living cells is to chop up the DNA from millions of cells and use protein antibodies to extract the fragments that have a particular transcription factor attached to them.