The Nature of Data Used in Climate Research

From ice cores to ocean sediments, from arid deserts to tropical rainforests, climate research seeks to understand the interaction between Earth’s natural forces and its complex, interlinked ecosystems. It illuminates what drives changes in the climate, how those changes will affect human society and wildlife, and what can be done to mitigate harmful trends.

The underlying data are enormously varied. Direct observations come from land-based stations, ships and ocean buoys, aircraft, and satellites orbiting the planet. Other records come from drilling into ancient polar ice, sampling air to see how concentrations of greenhouse gases have changed over time, examining tree rings, and analyzing the composition of seawater and terrestrial vegetation. As a result, data processing is labor-intensive, and the quality of the data can vary dramatically across locations and time.

To make use of this vast array of observations, scientists process and reprocess them using various techniques, resulting in a number of different datasets (see Dee et al. 2016). The three most prominent long-term datasets are the gridded, station-based surface temperature records produced by NASA’s Goddard Institute for Space Studies, the University of East Anglia Climatic Research Unit and the U.S. National Centers for Environmental Information. Each group periodically releases new versions of its dataset, reflecting the accumulation of more data as well as methodological innovations (see Morice et al. 2021).
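
To make the idea of a gridded, station-based dataset concrete, the following is a minimal sketch and not the procedure used by any of the groups named above. It assumes station temperature anomalies are simply averaged within 5-degree latitude-longitude cells and that a global mean is then formed by weighting each filled cell by the cosine of its central latitude. The function names, the cell size and the sample values are hypothetical; operational datasets add quality control, homogenization and infilling steps that are omitted here.

```python
import numpy as np

def grid_station_anomalies(lats, lons, anomalies, cell_deg=5.0):
    """Average station anomalies into lat-lon cells; empty cells are NaN."""
    n_lat = int(180 / cell_deg)
    n_lon = int(360 / cell_deg)
    # Map each station to a cell index, clipping the +90 / +180 edge cases.
    lat_idx = np.clip(((np.asarray(lats) + 90.0) // cell_deg).astype(int), 0, n_lat - 1)
    lon_idx = np.clip(((np.asarray(lons) + 180.0) // cell_deg).astype(int), 0, n_lon - 1)
    total = np.zeros((n_lat, n_lon))
    count = np.zeros((n_lat, n_lon))
    # np.add.at accumulates correctly when several stations share a cell.
    np.add.at(total, (lat_idx, lon_idx), anomalies)
    np.add.at(count, (lat_idx, lon_idx), 1)
    grid = np.full((n_lat, n_lon), np.nan)
    filled = count > 0
    grid[filled] = total[filled] / count[filled]
    return grid

def area_weighted_mean(grid, cell_deg=5.0):
    """Mean over filled cells, weighted by cos(latitude of cell center)."""
    n_lat = grid.shape[0]
    centers = -90.0 + cell_deg * (np.arange(n_lat) + 0.5)
    weights = np.cos(np.deg2rad(centers))[:, None] * np.ones_like(grid)
    filled = ~np.isnan(grid)
    return float(np.sum(grid[filled] * weights[filled]) / np.sum(weights[filled]))

# Hypothetical input: two tropical stations in one cell, one high-latitude station.
grid = grid_station_anomalies([2, 3, 62], [1, 2, 1], [1.0, 3.0, 0.0])
global_mean = area_weighted_mean(grid)
```

The cosine weighting reflects the fact that cells of equal angular size cover less area toward the poles, so the tropical cell counts for more than the high-latitude one in the global mean.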

Observational data can also be used to test scientific hypotheses, such as the hypothesis that humans are altering the planet’s climate by increasing the concentration of greenhouse gases in the atmosphere, which warms the planet’s surface. Such tests are often referred to as “attribution” studies. They employ both qualitative and quantitative methods, including basic physical reasoning and statistical analyses. Some attribution studies use general circulation models (GCMs) or Earth system models (ESMs) to simulate what would happen over a period of time if one or more causal factors were changing while other factors were held constant; the simulated pattern of change that emerges is sometimes called a “fingerprint” of that causal factor.
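
The fingerprint idea can be illustrated with a small regression sketch. It assumes, for illustration only, that the observed record is a linear combination of simulated fingerprints plus noise, and it estimates the amplitude of each fingerprint by ordinary least squares. Published attribution studies typically use more elaborate regressions that also account for internal climate variability, which this sketch ignores. The function name, the two synthetic fingerprints and their amplitudes are all hypothetical.

```python
import numpy as np

def scaling_factors(observed, fingerprints):
    """Least-squares amplitude of each fingerprint in the observed series."""
    X = np.column_stack(fingerprints)  # one column per causal factor
    beta, *_ = np.linalg.lstsq(X, observed, rcond=None)
    return beta

# Synthetic, hypothetical fingerprints over a 100-year period.
years = np.arange(100)
ghg = 0.01 * years                             # steady warming trend
solar = 0.1 * np.sin(2 * np.pi * years / 11)   # 11-year cycle

# Pseudo-observations built from known amplitudes plus random noise.
rng = np.random.default_rng(0)
observed = 1.0 * ghg + 0.5 * solar + rng.normal(0.0, 0.05, size=years.size)

beta = scaling_factors(observed, [ghg, solar])
```

A scaling factor near 1 for a fingerprint suggests the simulated response to that factor matches its amplitude in the observations, while a factor indistinguishable from 0 suggests the factor is not detectable in the record.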