The amount of data available to climate-change researchers has grown rapidly in the past few years, and the trend is likely to continue. This growth has two components. First, advances in data collection and storage technologies permit us to generate data faster (e.g., from sources such as ice cores). Second, advances in the Web and related technologies permit us to easily access data produced by others. By itself, the first component holds the potential to significantly boost scientific discovery. This potential is amplified manyfold by the second component. The amplification is not only quantitative, due to the larger quantity of data generated by others, but also qualitative, due to the benefits of integrating data from multiple disciplines (e.g., integrating ice-core data with data on the activities and health of prehistoric communities). There are already several examples of the kinds of discoveries enabled by such an integrated view of available data [1].

However, the potential benefits of this integrated approach are currently difficult to realize. Although researchers can integrate disparate data by painstakingly studying them, their ability to do so is limited at the large volumes of data now available. In other words, the conventional method of integrating data, in which all integration decisions are made by a human, simply does not scale.

Certainly, there are many important aspects of the data integration and comprehension process that we cannot realistically expect to automate in the near future. These are aspects that require deep background knowledge, scientific training, and creativity. There are also, however, many aspects of this process that may be partly or fully automated with a focused effort on extending the current body of work on data integration and data mining. It is these latter aspects that are the target of this work. By automating as much of the process as we can, we hope to enable researchers to focus their time and energy on the aspects that truly deserve such attention.