cc: "Thorne, Peter" <peter.thorne@metoffice.gov.uk>, Leopold Haimberger <leopold.haimberger@univie.ac.at>, Karl Taylor <taylor13@llnl.gov>, Tom Wigley <wigley@cgd.ucar.edu>, John Lanzante <John.Lanzante@noaa.gov>, "'Susan Solomon'" <ssolomon@al.noaa.gov>, Melissa Free <Melissa.Free@noaa.gov>, peter gleckler <gleckler1@llnl.gov>, "'Philip D. Jones'" <p.jones@uea.ac.uk>, Thomas R Karl <Thomas.R.Karl@noaa.gov>, Steve Klein <klein21@mail.llnl.gov>, carl mears <mears@remss.com>, Doug Nychka <nychka@ucar.edu>, Gavin Schmidt <gschmidt@giss.nasa.gov>, Frank Wentz <frank.wentz@remss.com>, ssolomon@frii.com
date: Fri, 25 Apr 2008 12:55:28 -0700
from: Ben Santer <santer1@llnl.gov>
subject: Re: [Fwd: JOC-08-0098 - International Journal of Climatology]
to: Steve Sherwood <steven.sherwood@yale.edu>

<x-flowed>
Dear Steve,

Thanks very much for these comments. They will be very helpful in 
responding to Reviewer #1.

Best regards,

Ben

Steve Sherwood wrote:
> Ben,
> 
> It sounds like the reviewer was fair.  If (s)he misunderstood or didn't 
> catch things, the length of the manuscript may have been a factor, and I 
> am definitely sympathetic to that particular complaint.
>>
>> CONCERN #1: Assumption of an AR-1 model for regression residuals.
> I also am no great fan of AR1 models parameterized by the lag-1 
> autocorrelation, because if the time step is too short they can go greatly 
> astray at longer lags where it matters.  But if you choose the 
> persistence parameter to give a good fit to the entire autocorrelation 
> function--i.e. make sure it decays to 1/e at about the right lag--it 
> should work fine.  I suggest trying this to see whether it changes 
> anything much, and if not, leaving it at that.  I think that for simply 
> generating confidence intervals on a scalar measure there is no reason 
> to go to higher-order AR processes, as a matter of principle.
> 
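The fitting procedure suggested above (pick the persistence parameter so the AR(1) autocorrelation phi**k decays to 1/e at the same lag as the empirical ACF) can be sketched in a few lines. The function name and the max_lag cutoff below are illustrative, not from the manuscript:

```python
import numpy as np

def fit_ar1_by_efolding(x, max_lag=50):
    """Pick an AR(1) persistence parameter phi so that the model ACF
    (phi**k) reaches 1/e at the same lag as the data's empirical
    autocorrelation function, rather than taking phi from lag 1 alone."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    acf = np.array([np.dot(x[:n - k], x[k:]) / (n * var)
                    for k in range(1, max_lag + 1)])
    # e-folding lag: first lag at which the empirical ACF drops below 1/e
    below = np.flatnonzero(acf < 1.0 / np.e)
    tau = below[0] + 1 if below.size else max_lag
    # AR(1) ACF is phi**k, so phi**tau = 1/e  =>  phi = exp(-1/tau)
    return np.exp(-1.0 / tau)
```

For a well-behaved AR(1) series the lag-1 estimate and the e-folding estimate nearly coincide; the two diverge when the empirical ACF decays more slowly at long lags than its lag-1 value implies, which is exactly the case Steve is worried about.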
>> CONCERN #2: No "attempt to combine data across model runs."
> The only point of doing this would seem to be to test whether there are 
> any individual models that can be falsified by the data.  It is a 
> judgment call whether to go down this road--my judgment would be, no, 
> that is a subject for a model evaluation/intercomparison paper.  The 
> question at issue here is whether GCMs or the CMIP3 forcings share some 
> common flaw; the implication of the Douglass et al paper is that they 
> do, and that future climate may therefore venture outside the range 
> simulated by GCMs.  The appropriate null hypothesis is that the observed 
> data record could with nonnegligible probability have been produced by a 
> climate model---not that it could be reproduced by every climate model.
> 
>>
>> The Reviewer seems to be arguing that the main advantage of his 
>> approach #2 (use of ensemble-mean model trends in significance 
>> testing) relative to our paired trends test (his approach #1) is that 
>> non-independence of tests is less of an issue with approach #2. I'm 
>> not sure whether I agree. Are results from tests involving GFDL CM2.0 
>> and GFDL CM2.1 temperature data truly "independent" given that both 
>> models were forced with the same historical changes in anthropogenic 
>> and natural external forcings? The same concerns apply to the high- 
>> and low-resolution versions of the MIROC model, the GISS models, etc.
> (S)he seems to have been referring to the fact that all models are 
> tested with the same data.  I also fail to see how any change in 
> approach would affect this issue.
>>
>> I am puzzled by some of the comments the Reviewer has made at the top 
>> of page 3 of his review. I guess the Reviewer is making these comments 
>> in the context of the pair-wise tests described on page 2. Crucially, 
>> the comment that we should use "...the standard error if testing the 
>> average model trend" (and by "standard error" he means DCPS07's 
>> sigma{SE}) IS INCONSISTENT with the Reviewer's approach #3, which 
>> involves use of the inter-model standard deviation in testing the 
>> average model trend.
> I also am puzzled.  The standard error is appropriate if you have a 
> large ensemble of observed time series, but not if you have only one.  
> Computing the standard error of the model mean is useless when you have 
> no good estimate of the mean of the real world to compare it to.  The 
> essential mistake of DCPS was to assume that the single real-world time 
> series was a perfect estimator of the mean.
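That mistake is easy to demonstrate by simulation: if the single observed trend is drawn from the same population as the model trends, a test that compares it to the standard error of the model mean (DCPS07's sigma_SE) rejects far more often than its nominal rate, while a test using the inter-model standard deviation behaves correctly. A sketch with hypothetical numbers (19 models, trend mean 0.2, spread 0.1):

```python
import numpy as np

rng = np.random.default_rng(42)
n_models, n_trials = 19, 100_000
mu, sigma = 0.2, 0.1  # hypothetical ensemble trend mean / spread

# Model trends and a single "observed" trend drawn from the SAME population
models = rng.normal(mu, sigma, size=(n_trials, n_models))
obs = rng.normal(mu, sigma, size=n_trials)

ens_mean = models.mean(axis=1)
s = models.std(axis=1, ddof=1)
se = s / np.sqrt(n_models)               # DCPS07-style sigma_SE
sd = s * np.sqrt(1.0 + 1.0 / n_models)   # allows for the spread of one obs

reject_dcps = np.mean(np.abs(obs - ens_mean) > 2.0 * se)
reject_fair = np.mean(np.abs(obs - ens_mean) > 2.0 * sd)
# reject_dcps lands far above the nominal 5% level; reject_fair stays near it
```

The sigma_SE threshold shrinks like 1/sqrt(N) as models are added, so with enough models even a perfectly consistent observation is "inconsistent", which is the point of Section 6.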
>>
>> And I disagree with the Reviewer's comments regarding the superfluous 
>> nature of Section 6. The Reviewer states that, "when simulating from a 
>> known (statistical) model... the test statistics should by definition 
>> give the correct answer." The whole point of Section 6 is that the 
>> DCPS07 consistency test does NOT give the correct answer when applied 
>> to randomly-generated data!
> Maybe there is a more compact way to show this?
>> In order to satisfy the Reviewer's curiosity, I'm perfectly willing to 
>> repeat the simulations described in Section 6 with a higher-order AR 
>> model. However, I don't like the idea of simulation of synthetic 
>> volcanoes, etc. This would be a huge time sink, and would not help to 
>> illustrate or clarify the statistical mistakes in DCPS07.
> I wouldn't advise any of that.
> 
> -SS
> 


-- 
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel:   (925) 422-2486
FAX:   (925) 422-7675
email: santer1@llnl.gov
---------------------------------------------------------------------------- 

</x-flowed>
