From: Ben Santer <santer1@llnl.gov>
To: "Thomas.R.Karl" <Thomas.R.Karl@noaa.gov>
Subject: Re: [Fwd: sorry to take your time up, but really do need a   scrub of this singer/christy/etc effort]
Date: Fri, 14 Dec 2007 14:31:15 -0800
Reply-to:  santer1@llnl.gov
Cc: carl mears <mears@remss.com>,  SHERWOOD Steven <steven.sherwood@yale.edu>, Tom Wigley <wigley@cgd.ucar.edu>, Frank Wentz <frank.wentz@remss.com>,  "'Philip D. Jones'" <p.jones@uea.ac.uk>, Karl Taylor <taylor13@llnl.gov>, Steve Klein <klein21@mail.llnl.gov>,  John Lanzante <John.Lanzante@noaa.gov>, "Thorne, Peter" <peter.thorne@metoffice.gov.uk>,  "'Dian J. Seidel'" <dian.seidel@noaa.gov>, Melissa Free <Melissa.Free@noaa.gov>,  Leopold Haimberger <leopold.haimberger@univie.ac.at>, "'Francis W. Zwiers'" <francis.zwiers@ec.gc.ca>,  "Michael C. MacCracken" <mmaccrac@comcast.net>, Tim Osborn <t.osborn@uea.ac.uk>, "David C. Bader" <bader2@llnl.gov>,  'Susan Solomon' <ssolomon@al.noaa.gov>

<x-flowed>
Dear Tom,

As promised, I've now repeated all of the significance testing involving 
model-versus-observed trend differences, but this time using 
spatially-averaged T2 and T2LT changes that are not "masked out" over 
tropical land areas. As I mentioned this morning, the use of non-masked 
data facilitates a direct comparison with Douglass et al.

The results for combined changes over tropical land and ocean are very 
similar to those I sent out yesterday, which were for T2 and T2LT 
changes over tropical oceans only:

COMBINED LAND/OCEAN RESULTS (WITH STANDARD ERRORS ADJUSTED FOR TEMPORAL 
AUTOCORRELATION EFFECTS; SPATIAL AVERAGES OVER 20N-20S; ANALYSIS PERIOD 
1979 TO 1999)

T2LT tests, RSS observational data: 0 out of 49 model-versus-observed 
trend differences are significant at the 5% level.
T2LT tests, UAH observational data: 1 out of 49 model-versus-observed 
trend differences are significant at the 5% level.

T2 tests, RSS observational data: 1 out of 49 model-versus-observed 
trend differences are significant at the 5% level.
T2 tests, UAH observational data: 1 out of 49 model-versus-observed 
trend differences are significant at the 5% level.

So our conclusion - that model tropical T2 and T2LT trends are, in 
virtually all realizations and models, not significantly different from 
either RSS or UAH trends - is not sensitive to whether we do the 
significance testing with "ocean only" or combined "land+ocean" 
temperature changes.

With best regards, and happy holidays to all!

Ben

Thomas.R.Karl wrote:
> Ben,
> 
> This is very informative.  One question I raise is whether the results 
> would have been at all different if you had not masked the land.  I 
> doubt it, but it would be nice to know.
> 
> Tom
> 
> Ben Santer said the following on 12/13/2007 9:58 PM:
>> Dear folks,
>>
>> I've been doing some calculations to address one of the statistical 
>> issues raised by the Douglass et al. paper in the International 
>> Journal of Climatology. Here are some of my results.
>>
>> Recall that Douglass et al. calculated synthetic T2LT and T2 
>> temperatures from the CMIP-3 archive of 20th century simulations 
>> ("20c3m" runs). They used a total of 67 20c3m realizations, performed 
>> with 22 different models. In calculating the statistical uncertainty 
>> of the model trends, they introduced sigma{SE}, an "estimate of the 
>> uncertainty of the mean of the predictions of the trends". They defined
>> sigma{SE} as follows:
>>
>> sigma{SE} = sigma / sqrt(N - 1), where
>>
>> "N = 22 is the number of independent models".
>>
>> As we've discussed in our previous correspondence, this definition has 
>> serious problems (see comments from Carl and Steve below), and allows 
>> Douglass et al. to reach the erroneous conclusion that modeled T2LT 
>> and T2 trends are significantly different from the observed T2LT and 
>> T2 trends in both the RSS and UAH datasets. This comparison of 
>> simulated and observed T2LT and T2 trends is given in Table III of 
>> Douglass et al.
>> [As an amusing aside, I note that the RSS datasets are referred to as 
>> "RSS" in this table, while UAH results are designated as "MSU". I 
>> guess there's only one true "MSU" dataset...]
>>
>> I decided to take a quick look at the issue of the statistical 
>> significance of differences between simulated and observed 
>> tropospheric temperature trends. My first cut at this "quick look" 
>> involves only UAH and RSS observational data - I have not yet done any 
>> tests with radiosonde data, UMD T2 data, or satellite results from 
>> Zou et al.
>>
>> I operated on the same 49 realizations of the 20c3m experiment that we 
>> used in Chapter 5 of CCSP 1.1. As in our previous work, all model 
>> results are synthetic T2LT and T2 temperatures that I calculated using 
>> a static weighting function approach. I have not yet implemented 
>> Carl's more sophisticated method of estimating synthetic MSU 
>> temperatures from model data (which accounts for effects of topography 
>> and land/ocean differences). However, for the current application, the 
>> simple static weighting function approach is more than adequate, since 
>> we are focusing on T2LT and T2 changes over tropical oceans only - so 
>> topographic and land-ocean differences are unimportant. Note that I 
>> still need to calculate synthetic MSU temperatures from about 18-20 
>> 20c3m realizations which were not in the CMIP-3 database at the time 
>> we were working on the CCSP report. For the full response to Douglass 
>> et al., we should use the same 67 20c3m realizations that they employed.
>>
>> For each of the 49 realizations that I processed, I first masked out 
>> all tropical land areas, and then calculated the spatial averages of 
>> monthly-mean, gridded T2LT and T2 data over tropical oceans (20N-20S). 
>> All model and observational results are for the common 252-month 
>> period from January 1979 to December 1999 - the longest period of 
>> overlap between the RSS and UAH MSU data and the bulk of the 20c3m 
>> runs. The simulated trends given by Douglass et al. are calculated 
>> over the same 1979 to 1999 period; however, they use a longer period 
>> (1979 to 2004) for calculating observational trends - so there is an 
>> inconsistency between their model and observational analysis periods, 
>> which they do not explain. This difference in analysis periods is a 
>> little puzzling given that we are dealing with relatively short 
>> observational record lengths, resulting in some sensitivity to 
>> end-point effects.
>>
>> I then calculated anomalies of the spatially-averaged T2LT and T2 data 
>> (w.r.t. climatological monthly-means over 1979-1999), and fit 
>> least-squares linear trends to model and observational time series. 
>> The standard errors of the trends were adjusted for temporal 
>> autocorrelation of the regression residuals, as described in Santer et 
>> al. (2000) ["Statistical significance of trends and trend differences 
>> in layer-average atmospheric temperature time series"; JGR 105, 
>> 7337-7356.]
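The trend fit and standard-error adjustment described above can be sketched as follows. This is a minimal illustration, not code from this thread. It assumes a lag-1 autocorrelation adjustment with effective sample size n_eff = n (1 - r1) / (1 + r1), which is taken here to be the approach of Santer et al. (2000). The function names, the synthetic AR(1) series, and its parameters are hypothetical, and no real MSU or model data are used.

```python
import math
import random


def synthetic_ar1_series(n, phi, trend, seed):
    """Hypothetical test data: a linear trend plus AR(1) noise (not real data)."""
    rng = random.Random(seed)
    noise = 0.0
    series = []
    for t in range(n):
        noise = phi * noise + rng.gauss(0.0, 0.1)
        series.append(trend * t + noise)
    return series


def adjusted_trend(y):
    """Least-squares trend of y against the time index 0..n-1.

    Returns (slope, naive standard error, adjusted standard error,
    lag-1 autocorrelation of residuals, effective sample size).
    The adjustment is the assumed lag-1 method described in the lead-in.
    """
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean

    # Regression residuals (zero mean by construction).
    resid = [yi - (intercept + slope * t) for t, yi in enumerate(y)]
    sse = sum(e * e for e in resid)

    # Lag-1 autocorrelation of the residuals.
    r1 = sum(resid[i] * resid[i + 1] for i in range(n - 1)) / sse

    # Effective sample size replaces n in the residual variance estimate.
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    if n_eff <= 2.0:
        raise ValueError("effective sample size too small for this adjustment")

    se_naive = math.sqrt(sse / (n - 2) / sxx)
    se_adjusted = math.sqrt(sse / (n_eff - 2.0) / sxx)
    return slope, se_naive, se_adjusted, r1, n_eff


if __name__ == "__main__":
    y = synthetic_ar1_series(252, phi=0.8, trend=0.002, seed=0)
    slope, se_naive, se_adj, r1, n_eff = adjusted_trend(y)
    print(f"slope={slope:.5f} naive SE={se_naive:.5f} adjusted SE={se_adj:.5f}")
    print(f"r1={r1:.2f} n_eff={n_eff:.1f} of {len(y)} samples")
```

With positively autocorrelated residuals, n_eff falls well below 252 and the adjusted standard error is larger than the naive one. That is why the adjusted and non-adjusted tests can give such different counts of significant differences.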
>>
>> Consider first panel A of the attached plot. This shows the simulated 
>> and observed T2LT trends over 1979 to 1999 (again, over 20N-20S, 
>> oceans only) with their adjusted 1-sigma confidence intervals. For 
>> the UAH and RSS data, it was possible to check against the adjusted 
>> confidence intervals independently calculated by Dian during the 
>> course of work on the CCSP report. Our adjusted confidence intervals 
>> are in good agreement. The grey shaded envelope in panel A denotes the 
>> 1-sigma standard error for the RSS T2LT trend.
>>
>> There are 49 pairs of UAH-minus-model trend differences and 49 pairs 
>> of RSS-minus-model trend differences. We can therefore test - for each 
>> model and each 20c3m realization - whether there is a statistically 
>> significant difference between the observed and simulated trends.
>>
>> Let bx and by represent any single pair of modeled and observed 
>> trends, with adjusted standard errors s{bx} and s{by}. As in our 
>> previous work (and as in related work by John Lanzante), we define the 
>> normalized trend difference d as:
>>
>> d = (bx - by) / sqrt[ (s{bx})**2 + (s{by})**2 ]
>>
>> Under the assumption that d is normally distributed, values of d > 
>> +1.96 or < -1.96 indicate observed-minus-model trend differences that 
>> are significant at the 5% level. We are performing a two-tailed test 
>> here, since we have no information a priori about the "direction" of 
>> the model trend (i.e., whether we expect the simulated trend to be 
>> significantly larger or smaller than observed).
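The paired test defined above can be written out in a few lines. This is a sketch only: the function names are hypothetical, and the trend values and standard errors in the example are made-up numbers chosen for easy arithmetic, not results from any model or satellite dataset.

```python
import math


def normalized_trend_difference(bx, s_bx, by, s_by):
    """d = (bx - by) / sqrt(s{bx}**2 + s{by}**2), as defined in the text."""
    return (bx - by) / math.sqrt(s_bx ** 2 + s_by ** 2)


def differs_at_5_percent(d, critical=1.96):
    """Two-tailed test under the assumption that d is normally distributed."""
    return abs(d) > critical


if __name__ == "__main__":
    # Illustrative numbers only (units would be degrees C per decade).
    d_small = normalized_trend_difference(0.20, 0.08, 0.10, 0.06)
    d_large = normalized_trend_difference(0.40, 0.08, 0.10, 0.06)
    print(d_small, differs_at_5_percent(d_small))
    print(d_large, differs_at_5_percent(d_large))
```

In the first made-up pair the trends differ by 0.10 with a combined standard error of 0.10, so d is about 1 and H0 is not rejected. In the second the difference is 0.30, d is about 3, and H0 is rejected.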
>>
>> Panel C shows values of the normalized trend difference for T2LT trends.
>> The grey shaded area spans the range +1.96 to -1.96, and identifies 
>> the region where we fail to reject the null hypothesis (H0) of no 
>> significant difference between observed and simulated trends.
>>
>> Consider the solid symbols first, which give results for tests 
>> involving RSS data. We would reject H0 in only one out of 49 cases 
>> (for the CCCma-CGCM3.1(T47) model). The open symbols indicate results 
>> for tests involving UAH data. Somewhat surprisingly, we get the same 
>> qualitative outcome that we obtained for tests involving RSS data: 
>> only one of the UAH-model trend pairs yields a difference that is 
>> statistically significant at the 5% level.
>>
>> Panels B and D provide results for T2 trends. Results are very similar 
>> to those achieved with T2LT trends. Irrespective of whether RSS or UAH 
>> T2 data are used, significant trend differences occur in only one of 
>> 49 cases.
>>
>> Bottom line: Douglass et al. claim that "In all cases UAH and RSS 
>> satellite trends are inconsistent with model trends." (page 6, lines 
>> 61-62). This claim is categorically wrong. In fact, based on our 
>> results, one could justifiably claim that THERE IS ONLY ONE CASE in 
>> which model T2LT and T2 trends are inconsistent with UAH and RSS 
>> results! These guys screwed up big time.
>>
>> SENSITIVITY TESTS
>>
>> QUESTION 1: Some of the model-data trend comparisons made by Douglass 
>> et al. used temperatures averaged over 30N-30S rather than 20N-20S. 
>> What happens if we repeat our simple trend significance analysis using 
>> T2LT and T2 data averaged over ocean areas between 30N-30S?
>>
>> ANSWER 1: Very little. The results described above for ocean areas 
>> between 20N-20S are virtually unchanged.
>>
>> QUESTION 2: Even though it's clearly inappropriate to estimate the 
>> standard errors of the linear trends WITHOUT accounting for temporal 
>> autocorrelation effects (the 252 time samples are clearly not 
>> independent; effective sample sizes typically range from 6 to 56), 
>> someone is bound to ask what the outcome is when one repeats the 
>> paired trend tests with non-adjusted standard errors. So here are the 
>> results:
>>
>> T2LT tests, RSS observational data: 19 out of 49 trend differences are 
>> significant at the 5% level.
>> T2LT tests, UAH observational data: 34 out of 49 trend differences are 
>> significant at the 5% level.
>>
>> T2 tests, RSS observational data: 16 out of 49 trend differences are 
>> significant at the 5% level.
>> T2 tests, UAH observational data: 35 out of 49 trend differences are 
>> significant at the 5% level.
>>
>> So even under the naive (and incorrect) assumption that each model and 
>> observational time series contains 252 independent time samples, we 
>> STILL find no support for Douglass et al.'s assertion that: "In all 
>> cases UAH and RSS satellite trends are inconsistent with model trends."
>> Q.E.D.
>>
>> If Leo is agreeable, I'm hopeful that we'll be able to perform a 
>> similar trend comparison using synthetic MSU T2LT and T2 temperatures 
>> calculated from the RAOBCORE radiosonde data - all versions, not just 
>> v1.2!
>>
>> As you can see from the email list, I've expanded our "focus group" a 
>> little bit, since a number of you have written to me about this issue.
>>
>> I am leaving for Miami on Monday, Dec. 17th. My Mom is having cataract 
>> surgery, and I'd like to be around to provide her with moral and 
>> practical support. I'm not exactly sure when I'll be returning to 
>> PCMDI - although I hope I won't be gone longer than a week. As soon as 
>> I get back, I'll try to make some more progress with this stuff. Any 
>> suggestions or comments on what I've done so far would be greatly 
>> appreciated. And for the time being, I think we should not alert 
>> Douglass et al. to our results.
>>
>> With best regards, and happy holidays! May all your "Singers" be carol 
>> singers, and not of the S. Fred variety...
>>
>> Ben
>>
>> (P.S.: I noticed one unfortunate typo in Table II of Douglass et al. 
>> The MIROC3.2 (medres) model is referred to as "MIROC3.2_Merdes"....)
>>
>> carl mears wrote:
>>> Hi Steve
>>>
>>> I'd say it's the equivalent of rolling a 6-sided die a hundred times, and
>>> finding a mean value of ~3.5 and a standard deviation of ~1.7, and
>>> calculating the standard error of the mean to be ~0.17 (so far so
>>> good).  And then rolling the die one more time, getting a 2, and
>>> claiming that the die is no longer 6 sided because the new measurement
>>> is more than 2 standard errors from the mean.
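The arithmetic behind the die analogy can be checked directly. This sketch uses the exact mean and standard deviation of a fair six-sided die rather than simulated rolls, which is an assumption made to keep the numbers reproducible. The function name is hypothetical.

```python
import math


def die_analogy(n_rolls, new_roll):
    """Compare one new roll against the standard error and the standard deviation.

    Uses the exact population moments of a fair six-sided die.
    """
    faces = [1, 2, 3, 4, 5, 6]
    mean = sum(faces) / len(faces)
    variance = sum((f - mean) ** 2 for f in faces) / len(faces)
    sd = math.sqrt(variance)

    # Standard error of the mean after n_rolls independent rolls.
    se = sd / math.sqrt(n_rolls)

    # Distance of the new roll from the mean, in units of SE and of SD.
    dist_in_se = abs(new_roll - mean) / se
    dist_in_sd = abs(new_roll - mean) / sd
    return mean, sd, se, dist_in_se, dist_in_sd


if __name__ == "__main__":
    mean, sd, se, dist_in_se, dist_in_sd = die_analogy(n_rolls=100, new_roll=2)
    print(f"mean={mean:.2f} sd={sd:.2f} se={se:.3f}")
    print(f"a roll of 2 is {dist_in_se:.1f} SE but only {dist_in_sd:.2f} SD from the mean")
```

A roll of 2 lies almost nine standard errors from the mean but less than one standard deviation from it. The standard error describes how well the mean is known, not the spread expected of a single new roll.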
>>>
>>> In my view, this problem trumps the other problems in the paper.
>>> I can't believe Douglass is a fellow of the American Physical Society.
>>>
>>> -Carl
>>>
>>>
>>> At 02:07 AM 12/6/2007, you wrote:
>>>> If I understand correctly, what Douglass et al. did makes the 
>>>> stronger assumption that unforced variability is *insignificant*.  
>>>> Their statistical test is logically equivalent to falsifying a 
>>>> climate model because it did not consistently predict a particular 
>>>> storm on a particular day two years from now.
>>>
>>>
>>> Dr. Carl Mears
>>> Remote Sensing Systems
>>> 438 First Street, Suite 200, Santa Rosa, CA 95401
>>> mears@remss.com
>>> 707-545-2904 x21
>>> 707-545-2906 (fax)
>>
>>
> 
> -- 
> 
> *Dr. Thomas R. Karl, L.H.D.*
> 
> */Director/*
> 
> NOAA's National Climatic Data Center
> 
> Veach-Baley Federal Building
> 
> 151 Patton Avenue
> 
> Asheville, NC 28801-5001
> 
> Tel:  (828) 271-4476
> 
> Fax:  (828) 271-4246
> 
> Thomas.R.Karl@noaa.gov <mailto:Thomas.R.Karl@noaa.gov>
> 


-- 
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel:   (925) 422-2486
FAX:   (925) 422-7675
email: santer1@llnl.gov
---------------------------------------------------------------------------- 
</x-flowed>

