Are the CRU data “suspect” (hacked emails)? An objective assessment by www.realclimate.org
15/12/2009 www.realclimate.org
Filed under: Climate Science, Communicating Climate, IPCC, Instrumental…. Record, Reporting on climate, Tutorials, skeptics
eric @ 15 December 2009
Kevin Wood, Joint Institute for the Study of the Atmosphere and Ocean, University of Washington; Eric Steig, Department of Earth and Space Sciences, University of Washington
In the wake of the CRU e-mail hack, the suggestion that scientists have been hiding the raw meteorological data
that underpin global temperature records has appeared in the media. For example, New York Times science writer John Tierney wrote,
“It is not unreasonable to give outsiders a look at the historical readings and the adjustments made by experts… Trying to prevent skeptics from seeing the raw data
was always a questionable strategy, scientifically.”
The implication is that something secretive and possibly nefarious has been afoot in the way data
have been handled, and that the validity of key data products (especially those produced by CRU) is suspect on these grounds. This is simply not the case.
It may come as a surprise to some that the first compilation of world-wide meteorological data was published
by the Smithsonian Institution in 1927, long before anthropogenic climate change emerged as an important issue (Clayton et al., 1927).
This volume is still widely available on library shelves, as are the updates that were issued periodically. This same data collection provided
the foundation for the World Monthly Surface Station Climatology, 1738-cont. As has been the case for many years, any interested party
can access this from UCAR (http://dss.ucar.edu/datasets/ds570) and other electronic data archives.
Now, it is well known that these data are not perfect. Most records are not as complete
as could be wished. Errors periodically creep in and have to be identified and weeded out. But beyond the simple errors
of the key-entry type there are inevitably discontinuities or inhomogeneities introduced into the records due to changes
in observing practices, station environment, or other non-meteorological factors. It is very unlikely there is any historical record in existence unaffected by this issue.
Filtering inhomogeneities out of meteorological data is a complicated procedure.
Coherent surface air temperature (SAT) datasets like those produced by CRU also require a procedure
for combining different (but relatively nearby) record fragments. However, the methods used to undertake
these unavoidable tasks are not secret: they have been described in an extensive literature over many decades
(e.g. Conrad, 1944; Jones and Moberg, 2003; Peterson et al., 1998, and references therein). Discontinuities may
nevertheless persist in data products, but when they are found they are published (e.g. Thompson et al., 2008).
Furthermore, it is a fairly simple exercise to extract the grid-box temperatures from a CRU dataset
(CRUTEM3v, for example) and compare them to raw data from the World Monthly Surface Station Climatology.
CRU data are available from http://www.cru.uea.ac.uk/cru/data/temperature. One should not expect a perfect match
due to the issues described above, but an exercise like this does provide a simple way to evaluate the extent to which
the CRU data represent the underlying raw data. In particular, it would presumably be of interest to know whether
the trends in the CRU data are very different from the trends in the raw data, since this could be taken as an indication that the methods used by CRU result in an overstatement of the evidence for global warming.
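A comparison of this kind can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function names, the assumption that each record is a dictionary of annual means keyed by year, and the synthetic series are all hypothetical. It computes anomalies relative to the 1961–1990 reference period and an ordinary least-squares trend in degrees per century.

```python
def anomalies(series, ref_start=1961, ref_end=1990):
    """Return anomalies relative to the reference-period mean.

    `series` maps year -> annual mean temperature, with None for a
    missing year (an assumed layout, not the archive's own format).
    """
    ref = [t for y, t in series.items()
           if ref_start <= y <= ref_end and t is not None]
    if not ref:
        raise ValueError("no data in the reference period")
    baseline = sum(ref) / len(ref)
    return {y: t - baseline for y, t in series.items() if t is not None}


def trend_per_century(anoms):
    """Ordinary least-squares slope of anomaly against year, times 100."""
    years = sorted(anoms)
    n = len(years)
    if n < 2:
        raise ValueError("need at least two years to fit a trend")
    mean_year = sum(years) / n
    mean_anom = sum(anoms[y] for y in years) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (anoms[y] - mean_anom) for y in years)
    return 100.0 * sxy / sxx


# Synthetic, made-up records: a "raw" series warming by 0.01 degrees per
# year with one missing year, and a copy shifted by a constant offset.
raw = {year: 10.0 + 0.01 * (year - 1900) for year in range(1900, 2001)}
raw[1950] = None
adjusted = {year: (None if t is None else t + 0.3) for year, t in raw.items()}

raw_trend = trend_per_century(anomalies(raw))
adjusted_trend = trend_per_century(anomalies(adjusted))
```

In this toy case the two trends agree, because a constant offset disappears once anomalies are taken. Real adjustments are not constant offsets, which is exactly why comparing the trends is informative.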
As an example, we extracted a sample of raw land-surface station data and corresponding CRU data. These were arbitrarily
selected based on the following criteria: the length of record should be ~100 years or longer, and the standard reference period
1961–1990 (used to calculate SAT anomalies) must contain no more than 4 missing values. We also selected stations
spread as widely as possible over the globe. We randomly chose 94 out of a possible 318 long records.
Of these, 65 were sufficiently complete during the reference period to include in the analysis. These were split into two groups
of 33 and 32 stations (Set A and Set B), which were then analyzed separately.
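The selection steps can be written down roughly as follows. This is a minimal sketch under stated assumptions: records are dictionaries of annual values keyed by year, "missing values" are counted per year, and the names `is_eligible` and `sample_and_split` are invented for illustration. It also applies the length and completeness checks together, and it makes no attempt to spread stations evenly over the globe.

```python
import random


def is_eligible(series, min_years=100, ref=(1961, 1990), max_missing=4):
    """Check the two stated criteria on one station record.

    `series` maps year -> value, with None (or an absent key) for a
    missing year. This layout is an assumption of the sketch.
    """
    present = [y for y, t in series.items() if t is not None]
    if not present:
        return False
    span = max(present) - min(present) + 1
    missing_in_ref = sum(
        1 for y in range(ref[0], ref[1] + 1) if series.get(y) is None
    )
    return span >= min_years and missing_in_ref <= max_missing


def sample_and_split(stations, n_sample, seed=0):
    """Draw a random sample of station ids, keep the eligible ones,
    and split them into two groups whose sizes differ by at most one."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    ids = sorted(stations)
    chosen = rng.sample(ids, min(n_sample, len(ids)))
    kept = [s for s in chosen if is_eligible(stations[s])]
    half = (len(kept) + 1) // 2
    return kept[:half], kept[half:]
```

With 65 eligible stations, the split above yields groups of 33 and 32, matching the sizes of Set A and Set B.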
Results are shown in the following figures. The key points: both Set A and Set B indicate warming
with trends that are statistically identical between the CRU data and the raw data (>99% confidence);
the histograms show that CRU quality control has, as expected, narrowed the variance (both extreme positive and negative values removed).
Comparison of CRUTEM3v data with raw station data taken from World Monthly Surface Station Climatology.
On the left are the mean temperature anomalies from each pair of randomly chosen time series. On the right
are the distributions of trends in those time series and their means and standard errors. (The standard error provides
an estimate of how well the sampling of ~30 stations represents the full global data set assuming a Gaussian distribution.)
Note that not all the trends are for identical time periods, since not all data sets are the same length.
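For readers who want to reproduce the summary statistics, the mean and standard error of a set of station trends can be computed as below. This is a sketch using only the Python standard library; the function names are hypothetical and the standard error is the usual sample standard deviation divided by the square root of the number of stations, which assumes independent, roughly Gaussian trends.

```python
import math
import statistics


def mean_and_stderr(trends):
    """Return (mean, standard error of the mean) for a list of trends."""
    n = len(trends)
    if n < 2:
        raise ValueError("need at least two trends")
    return statistics.mean(trends), statistics.stdev(trends) / math.sqrt(n)


def difference_in_stderr_units(trends_a, trends_b):
    """Difference between two mean trends, divided by the combined
    standard error. Values well inside +/-2 suggest the two means are
    not distinguishable at conventional significance levels."""
    mean_a, se_a = mean_and_stderr(trends_a)
    mean_b, se_b = mean_and_stderr(trends_b)
    return (mean_a - mean_b) / math.hypot(se_a, se_b)
```

Calling `difference_in_stderr_units` on the raw-data trends and the CRU trends for the same stations gives a quick check of whether the two means differ by more than sampling noise would explain. It is a simpler test than a full paired analysis and is offered only as a starting point.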
Conclusion: There is no indication whatsoever of any problem with the CRU data. An independent study
(by a molecular biologist in Italy, as it happens) came to the same conclusion using a somewhat different analysis.
None of this should come as any surprise of course, since any serious errors would have been found and published already.
It’s worth noting that the global average trend obtained by CRU for 1850-2005, as reported by the
IPCC (http://www.ipcc.ch/pdf/assessment-report/ar4/wg1/ar4-wg1-chapter3.pdf), 0.47 0.54 degrees/century,*
is actually a bit lower (though not by a statistically significant amount) than we obtained on average with our random sampling of stations.