News Release
Office of News and Information
Johns Hopkins University
3003 N. Charles Street, Suite 100
Baltimore, Maryland 21218-3843
Phone: (410) 516-7160 / Fax (410) 516-5251
September 24, 1999
FOR IMMEDIATE RELEASE
MEDIA CONTACT:
Michael Purdy
(410) 516-7906
Hopkins-led Team Developing New Ways to Handle Data Deluge
The fountain of information at the heart of science has become a
fire hose, and an increase to river-like volumes is on the
way.
The CERN particle collider in Geneva, Switzerland, for instance,
currently
produces more than 1 petabyte, or about 1,000,000,000,000,000
bytes, of
information every year. The words and other text in all the books
in the
Library of Congress, in contrast, add up to only about
one-thousandth of
that information, or one terabyte (1 trillion bytes). And CERN is
just one
example of the tremendous information-generating powers of modern
science.
"Our current ways of doing science are very much based on the
concept that
our data sets are so small that we can sort of 'eyeball' the
whole thing
and locate the interesting data," says Alexander Szalay
(pictured at right), Alumni
Centennial Professor of Physics and Astronomy at The Johns
Hopkins
University. "And with the data sets we are getting in an
increasing number
of areas of science, this is just not going to be feasible. So we
have to
do something drastically different."
Szalay leads an interdisciplinary team of researchers developing
new ways
to store, access and search large volumes of data. Participants
in the
Hopkins-led collaborative include scientists from Caltech, the
U.S.
Department of Energy's Fermilab and Microsoft Corp. They have
been
working together for several years already; this month they
will
receive the first formal support for their efforts in a 3-year,
$2.5
million grant from the National Science Foundation.
"This problem is of course much bigger than astronomy or particle
physics,"
Szalay says. "I think this is actually becoming more a problem
for the
whole society. We are choking on information, and we have to sort
out the
relevant from the irrelevant. So I think what we're doing is a
very
interesting test bed for experimenting with new technologies that
could
have broader applications elsewhere."
Particle physicists were among the first to have to deal with
huge
quantities of information. Their work to manage that information
led to the
development of tools and techniques that found uses beyond the
realm of the
physics lab, notes Aihud
Pevsner (pictured at left), Jacob P. Hain Professor of Physics and Astronomy at
Johns
Hopkins and a member of the collaborative.
"To help work with large data sets at CERN, Tim Berners-Lee
invented in
1989 what later became the World Wide Web," says Pevsner. "He did
it
because the tools that they had at the time were inadequate for
the
distribution of the data sets they were working with."
Pevsner, a particle physicist, will be one of 500 American
physicists
working at the Large Hadron Collider (LHC) at CERN, the world's
most
powerful particle collider. The LHC is expected to produce
100-petabyte
data sets.
Szalay is a researcher for the
Sloan
Digital Sky Survey (SDSS), an effort he calls the "cosmic
genome
project," which will map everything visible in several large
chunks of the
northern and southern sky. SDSS starts next year, and before it
is over he
estimates that it will produce 40 terabytes of data with a
2-terabyte catalog.
Such a high volume of data reduces the chances that astronomers
will miss
gathering important information, but it also makes it harder to
find that
information among what's been gathered. "When you have so much
data that it
chokes you, you have to keep breaking it up into smaller chunks
until it no
longer chokes you," Szalay says.
Developing better ways to break down large quantities of
information is the
first major component of research under the NSF grant. The SDSS
information, for example, might be broken up both by the area of
the sky
that the data comes from and by the color of the objects observed
in the
sky. The challenge, though, is to make sure that this process of
partitioning the data improves the scientists' abilities to see
important
patterns and irregularities in the data.
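The idea of partitioning can be sketched in a few lines of code. The sketch below is purely illustrative and is not the collaborative's software: the field names (`ra`, `dec`, `color_index`), the region size, and the number of color buckets are all hypothetical stand-ins for how survey records might be grouped by sky position and color so that similar queries touch only a few chunks.

```python
# Illustrative sketch: group survey records into chunks by sky region and
# color bucket, so a query over nearby, similar objects reads few chunks.
# All field names and bucket sizes here are hypothetical.
from collections import defaultdict

def partition_key(record, region_size_deg=10, color_bins=4):
    """Assign a record to a chunk keyed by (sky region, color bucket)."""
    region = (int(record["ra"] // region_size_deg),
              int(record["dec"] // region_size_deg))
    color_bucket = max(0, min(int(record["color_index"] * color_bins),
                              color_bins - 1))
    return (region, color_bucket)

def partition(records, **kwargs):
    """Break a flat list of records into keyed chunks."""
    chunks = defaultdict(list)
    for rec in records:
        chunks[partition_key(rec, **kwargs)].append(rec)
    return chunks
```

A search for, say, red objects in one patch of sky then needs to open only the chunks whose keys match that region and color bucket, instead of scanning the whole data set.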
"We want to try to make it possible for data that will be of
interest to
the same kinds of queries to be 'located' close together so they
are easier
to find," says Ethan
Vishniac, director of the Johns Hopkins Center for
Astrophysical
Sciences, also a collaborative member.
Another concern is that these huge chunks of information will
probably be
stored at geographically different locations. Some
next-generation science
projects involve so much information, according to Szalay, that
it cannot
be brought to researchers across computer networks. Arranging
ways to
simultaneously access data in these different locations without
ever
bringing it together in one database, a technique called
"distributed
processing," is the second major component of research supported
by the NSF
grant.
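The principle behind distributed processing can be illustrated with a toy sketch: the query travels to each site, and only the small answers travel back, so the bulk data never crosses the network. The site names, record fields, and filter below are hypothetical, not part of any actual archive.

```python
# Illustrative sketch of distributed processing: run the query "at" each
# site and merge only the small result sets; the raw data never moves.
# Site names and record fields are hypothetical.

def run_at_site(site_data, predicate):
    """Executed at one site: scan locally, return only matching records."""
    return [rec for rec in site_data if predicate(rec)]

def distributed_query(sites, predicate):
    """Fan the query out to every site and combine the answers."""
    results = []
    for site_data in sites.values():
        results.extend(run_at_site(site_data, predicate))
    return results

sites = {
    "site_a": [{"id": 1, "energy": 5.2}, {"id": 2, "energy": 0.3}],
    "site_b": [{"id": 3, "energy": 7.1}],
}
high_energy = distributed_query(sites, lambda r: r["energy"] > 1.0)
```

The design point is that `run_at_site` stands in for computation shipped to where the data lives; only its filtered output, a tiny fraction of the stored volume, is ever combined.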
The third component of the NSF grant will improve a technique
called
"parallel" querying. This involves searching in different
locations at the
same time, not unlike sending out an army of librarians to search
several different, large libraries at once. Researchers will
strive to make these search agents smarter and more independent
by improving the software they use.
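The librarian analogy maps naturally onto a small sketch: one worker searches each "library" at the same time. This is only an illustration of parallel querying in general, not the project's software; the catalogs and search term are invented.

```python
# Illustrative sketch of parallel querying: like an army of librarians,
# one worker searches each catalog simultaneously. Catalogs are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def search_catalog(catalog, term):
    """One 'librarian' scanning a single catalog for a term."""
    return [title for title in catalog if term in title]

def parallel_search(catalogs, term):
    """Search every catalog at once and pool the hits."""
    with ThreadPoolExecutor(max_workers=len(catalogs)) as pool:
        futures = [pool.submit(search_catalog, c, term) for c in catalogs]
        return [hit for f in futures for hit in f.result()]

catalogs = [
    ["galaxy survey atlas", "quasar catalog"],
    ["particle physics handbook", "galaxy formation notes"],
]
hits = parallel_search(catalogs, "galaxy")
```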
To test their efforts at dealing with these challenges,
researchers will
use data from the SDSS, from the CERN particle collider and from
GALEX, a
sky-mapping survey that covers the same areas as SDSS but
measures
different forms of radiation.
"Data sets that are astronomical in every sense of that word are
great test
beds for computer scientists to experiment with to develop novel
techniques
for visualizing, organizing, and querying information," says
Michael
Goodrich, Hopkins professor of computer science and a member of
the
collaborative.
Additional collaborators include physicist Harvey Newman,
research
scientist Julian Bunn and astronomer Chris Martin of Caltech;
physicist
Thomas Nash of Fermilab; computer scientist Jim Gray of
Microsoft; and
astronomers Ani
Thakar and
Peter Kunszt of Hopkins.
The $2.5 million NSF grant is one of 31 announced by NSF as part
of a new
effort to support "knowledge and distributed intelligence"
projects. The
grants are focused on efforts to apply new computer technology
across
multidisciplinary areas in science and engineering.
Johns Hopkins University news releases can be found on the
World Wide Web at
http://www.jhu.edu/news_info/news/
Information on automatic e-mail delivery
of science and medical news releases is available at the
same address.