Research Topics
The research plan is based on developing domain-specific
solutions to general data management problems, with
the hypothesis that by carefully limiting the scope of
operations to the needs of climate-change research,
we can devise and implement effective and efficient
tools. We may classify the research topics into four
groups.
-
Data integration
- The goal of this component is to
provide an integrated logical view of datasets
relevant to climate-change research, shielding
the researcher from low-level details such as the
data formats, storage location, database schema,
document formats, varying nomenclature, etc.
-
Data mining
- Once we have an integrated view
of data, we will develop a suite of tools
that researchers can use to identify interesting
patterns and features in the integrated data.
-
Provenance
- When consulting an integrated view
of data, it is important to know the source
from which a displayed value or fact is derived.
While this task poses few challenges when the
integration is simple (e.g., the displayed value
being the average of a set of values from different
sources), it is much more complex when the
integration uses more sophisticated operations
(which are necessary for effective integration),
such as extraction of numerical data from
text, semantic mapping of terms, and schema
transformations.
-
Workflows
- Over their lifecycle, from the point of origin
to the scientific discovery or other product that
they enable, data typically go through several
steps that include both human and automated
processing. While our earlier
components are designed to ease and enhance the
individual steps, the goal of this component is
to develop methods for effectively managing the
entire collection of steps (workflow).
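To make the simple provenance case above concrete, the following is a minimal sketch, not the project's implementation, of how an averaged value might carry the identifiers of the sources it was derived from. The names `SourcedValue`, `observe`, and `average`, as well as the station identifiers and readings, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SourcedValue:
    """A number together with the set of sources it was derived from."""
    value: float
    sources: frozenset  # hypothetical dataset identifiers


def observe(value, source):
    """Wrap a raw reading with the single source it came from."""
    return SourcedValue(value, frozenset([source]))


def average(values):
    """Average the readings and union their provenance sets."""
    values = list(values)
    return SourcedValue(
        mean(v.value for v in values),
        frozenset().union(*(v.sources for v in values)),
    )


# Illustrative readings only; not real data.
readings = [
    observe(14.0, "station_a"),
    observe(15.0, "station_b"),
    observe(16.0, "satellite_c"),
]
integrated = average(readings)
print(integrated.value, sorted(integrated.sources))
```

For an average, taking the union of the source sets is enough. The more sophisticated operations named above (text extraction, semantic mapping, schema transformations) would need a richer provenance record than a flat set of identifiers.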
Our initial work has focused on the integration and
interactive analysis of data, as described next.