By intensively and systematically comparing the human X
chromosome to genetic information from chimpanzees, rats
and mice, a team of scientists from the United States and
India has uncovered dozens of new genes, many of which are
located in regions of the chromosome already tied to
disease.
Regions of the X chromosome, one of the two sex
chromosomes (Y is the other), have been linked to mental
retardation and numerous other disorders, but finding the
particular genetic abnormalities involved has been
difficult.
The team's accomplishment, described in the April
issue of Nature Genetics, should speed research into
diseases associated with the X chromosome and encourage
similar analyses of other chromosomes.
"To our knowledge, this is the first time critical
analysis of an entire chromosome has been done by a group
that wasn't involved in determining the chromosome's
genetic sequence," said study leader Akhilesh Pandey, an
assistant professor in the McKusick-Nathans Institute of
Genetic Medicine at Johns Hopkins and chief scientific
adviser to the Institute of Bioinformatics (IOB) in Bangalore, India,
where the analyses took place. "We didn't start small. We
wanted to prove that complete annotation can be done, and
done in a way that lets you find new and unexpected
things."
For 18 months, 26 Indian scientists pored over the
publicly available sequence of the X chromosome
(information generated by the Wellcome Trust Sanger
Institute in England and others) to identify genes and
other important parts of its DNA.
But unlike other efforts, the team didn't just "mine
the data" by using computers to search for known patterns
in the genetic sequence. Instead, Pandey decided they would
look for similarities between the human X chromosome's
protein-encoding instructions and corresponding regions in
the mouse. Regions that were identical or nearly so were
then examined carefully by IOB biologists.
"We didn't want to start out by saying that genes had
to look a certain way," Pandey said. "So our only initial
assumption was that if a genetic region is important and
codes for a protein, the sequence will be conserved at the
protein level. Thus, even if the genetic sequence is
different here and there, the protein sequence could still
be the same."
Essentially, the researchers took advantage of the
redundancy inherent in the genetic code. DNA's four
building blocks (A, T, C and G) act as
instructions for proteins in select three-block sets. These
three-block sets each "code" for just one of the 20
possible protein building blocks, or amino acids, but some
of the sets code for the same amino acid. For example, the
DNA sequences TTGAGGAGC and CTACGATCA are quite different,
but both specify the same three amino acids:
leucine, arginine and serine, in that order.
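The redundancy can be checked directly. The following is a minimal sketch in Python, not the team's software: it builds the standard genetic code table, translates the two example sequences, and compares them at the DNA and protein levels. The function names translate and identity are illustrative choices, not taken from the study.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
# "*" marks a stop codon. Names below are illustrative, not from the study.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA into one-letter amino acid codes, three bases at a time."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


seq_a = "TTGAGGAGC"
seq_b = "CTACGATCA"

print(translate(seq_a), translate(seq_b))            # LRS LRS
print(identity(seq_a, seq_b))                        # DNA level: 2 of 9 match
print(identity(translate(seq_a), translate(seq_b)))  # protein level: 1.0
```

In one-letter codes, L is leucine, R is arginine and S is serine. Only two of the nine DNA letters agree, yet the translated proteins are identical.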
"Instead of telling the computer what to look for, we
let nature tell the computer what was important," Pandey
said. "When you align the protein-encoding instructions of
the human and mouse, the genes jump out at you."
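As a rough illustration of comparing sequences at the protein level, the toy Python sketch below translates two short DNA strings in each of the three forward reading frames and reports the short peptides they share. It is a simplified stand-in, not the team's actual pipeline: the sequences human_like and mouse_like are invented for the example, and the helper names and the window size k are assumptions.

```python
from itertools import product

# Standard genetic code in T, C, A, G codon order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA into one-letter amino acid codes, three bases at a time."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def peptide_kmers(dna, k):
    """Collect every k-residue peptide from the three forward reading frames."""
    found = set()
    for offset in range(3):
        protein = translate(dna[offset:])
        for i in range(len(protein) - k + 1):
            found.add(protein[i:i + k])
    return found


def shared_peptides(dna_a, dna_b, k=3):
    """Peptides of length k encoded somewhere in both DNA sequences."""
    return peptide_kmers(dna_a, k) & peptide_kmers(dna_b, k)


# Hypothetical sequences made up for this example (not real genomic data).
human_like = "GGTTGAGGAGCAT"
mouse_like = "ACTACGATCAG"

print(shared_peptides(human_like, mouse_like))
```

The two strings share little at the DNA level, but the common peptide stands out once both are translated. Real comparative annotation works on far longer sequences, both DNA strands and approximate rather than exact matches.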
In the regions that were the same between species, the
scientists found 43 new "gene structures" that encode
proteins. Some of the newly identified genes sit in regions
long tied to X-linked mental retardation syndromes, which
appear only in boys, or other disorders. Quite remarkably,
Pandey said, almost half the new genes don't look like any
previously known genes, nor do they look like each
other.
"These would not be found any other way because no one
knew to look for them," he said. "No one had ever
identified any aspect of their sequences as being
important."
The IOB scientists and the U.S. members of the team
experimentally investigated a few of the new genes to
confirm the comparative approach's validity. Their results,
as well as data created by other scientists since the
U.S.-India team started working, confirm the existence of
some of the newly identified genes. The team's work also
showed that some so-called pseudogenes on the X chromosome
are actually expressed, or transcribed, which contradicts
the widespread idea that they are functionless.
"We're really trying to show that complete annotation
of chromosomes can be done, and that doing it this way
means you can find things you don't expect to find," Pandey
said. "It's long, painstaking work, but it's worth it."
Pandey said he hopes that researchers will take the
initiative to annotate sequenced genetic information and
validate regions used in their work.
The research at the Institute of Bioinformatics was
funded internally. Co-first authors from Johns Hopkins are
Ramars Amanchy, Salil Sharma, Jose Badano, Suraj Peri,
Nicholas Katsanis and Pandey. Sharma is also affiliated
with the IOB.
The terms of Pandey's arrangement as chief scientific
adviser to the Institute of Bioinformatics are being
managed by Johns Hopkins in accordance with its conflict of
interest policies.