From: Ben Santer <santer1@llnl.gov>
To: Carl Mears <mears@sonic.net>
Subject: Re: Our d3* test
Date: Mon, 02 Jun 2008 09:32:01 -0700
Reply-to: santer1@llnl.gov
Cc: Steven Sherwood <Steven.Sherwood@yale.edu>, "Thorne, Peter" <peter.thorne@metoffice.gov.uk>, Leopold Haimberger <leopold.haimberger@univie.ac.at>, Karl Taylor <taylor13@llnl.gov>, Tom Wigley <wigley@cgd.ucar.edu>, John Lanzante <John.Lanzante@noaa.gov>, "'Susan Solomon'" <ssolomon@al.noaa.gov>, Melissa Free <Melissa.Free@noaa.gov>, peter gleckler <gleckler1@llnl.gov>, "'Philip D. Jones'" <p.jones@uea.ac.uk>, Thomas R Karl <Thomas.R.Karl@noaa.gov>, Steve Klein <klein21@mail.llnl.gov>, carl mears <mears@remss.com>, Doug Nychka <nychka@ucar.edu>, Gavin Schmidt <gschmidt@giss.nasa.gov>, Frank Wentz <frank.wentz@remss.com>

Dear Carl,

This issue is now covered in the version of the manuscript that I sent 
out on Friday. The d2* and d3* statistics have been removed. The new d1* 
statistic DOES involve the standard error of the model average trend in 
the denominator (together with the adjusted standard error of the 
observed trend; see equation 12 in revised manuscript). The slight irony 
here is that the new d1* statistic essentially reduces to the old d1* 
statistic, since the adjusted standard error of the observed trend is 
substantially larger than the standard error of the model average trend...
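
In case it helps, here is a minimal sketch (Python/numpy) of the calculation
described above. The helper names, the lag-1 effective-sample-size adjustment,
and the use of the inter-model spread to estimate the standard error of the
model average trend are illustrative assumptions only; equation 12 in the
revised manuscript is the definitive form.

import numpy as np

def adjusted_trend_se(y):
    """Least-squares trend of y and its standard error, inflated for lag-1
    autocorrelation via an effective sample size n_eff = n*(1-r1)/(1+r1)."""
    n = len(y)
    t = np.arange(n, dtype=float)
    b, a = np.polyfit(t, y, 1)                      # slope, intercept
    resid = y - (a + b * t)
    r = resid - resid.mean()
    r1 = np.sum(r[1:] * r[:-1]) / np.sum(r * r)     # lag-1 autocorrelation
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    s2 = np.sum(resid ** 2) / (n_eff - 2.0)         # adjusted residual variance
    se_b = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))
    return b, se_b

def d1_star(b_obs, se_obs_adj, ensemble_mean_trends):
    """Observed-minus-model-average trend, normalized by the adjusted standard
    error of the observed trend combined with the standard error of the model
    average trend."""
    b_mm = np.mean(ensemble_mean_trends)
    se_mm = np.std(ensemble_mean_trends, ddof=1) / np.sqrt(len(ensemble_mean_trends))
    return (b_obs - b_mm) / np.sqrt(se_obs_adj ** 2 + se_mm ** 2)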

With best regards,

Ben
Carl Mears wrote:
> Hi 
> 
> I think I agree (partly, anyway) with Steve S.
> 
> I think that d3* partly double counts the uncertainty.
> 
> Here is my thinking that leads me to this:
> 
> Assume we have a "perfect model".  A perfect model means, in this context:
>    1.  Correct sensitivities to all forcing terms
>    2.  Forcing terms are all correct
>    3.  Spatial and temporal structure of internal variability is correct.
> 
> In other words, the model output has exactly the correct "underlying"
> trend, but with different realizations of internal variability, and this
> variability has the right structure.
> 
> We now run the model a bunch of times and compute the trend in each case.
> The spread in the trends is completely due to internal variability.
> 
> We compare this to the "perfect" real world trend, which also has 
> uncertainty due
> to internal variability (but nothing else).
> 
> To me either one of the following is fair:
> 
> 1.  We test whether the observed trend is inside the distribution of
> model trends.  The uncertainty in the observed trend is already taken
> care of by the spread in modeled trends, since the representation of
> internal variability is accurate.
> 
> 2.  We test whether the observed trend is equal to the mean model trend,
> within uncertainty.  Uncertainty here is the uncertainty in the observed
> trend s{b{o}}, combined with the uncertainty in the mean model trend,
> SE{b{m}}.
> 
> If we use d3*, I think we are doing both these at once, and thus double 
> counting the internal variability
> uncertainty.  Option 2 is what Steve S is advocating, and is close to 
> d1*, since SE{b{m}} is so small.  
> Option 1 is d2*.  
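
A rough sketch of the two options, for concreteness (the percentile check for
option 1 and all names below are illustrative assumptions, not anything taken
from the manuscript):

import numpy as np

def option1_within_spread(b_obs, model_trends, alpha=0.05):
    """Option 1 (~d2*): is the observed trend inside the central (1 - alpha)
    range of the distribution of individual model trends?"""
    lo, hi = np.percentile(model_trends, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo <= b_obs <= hi

def option2_mean_test(b_obs, se_obs, model_trends):
    """Option 2 (~d1*): compare the observed trend with the mean model trend,
    using the combined uncertainty of s{b{o}} and SE{b{m}}."""
    b_mean = np.mean(model_trends)
    se_mean = np.std(model_trends, ddof=1) / np.sqrt(len(model_trends))
    return (b_obs - b_mean) / np.sqrt(se_obs ** 2 + se_mean ** 2)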
> 
> Of course the problem is that our models are not perfect, and a 
> substantial portion of the spread in 
> model trends is probably due to differences in sensitivity and forcing, 
> and the representation
> of internal variability can be wrong.  I don't know how to separate the 
> model trend distribution into 
> a "random" and "deterministic" part.  I think d1* and d2* above get at 
> the problem from 2 different angles, 
> while d3* double counts the internal variability part of the 
> uncertainty. So it is not surprising that we 
> get some funny results for synthetic data, which only have this kind of 
> uncertainty.  
> 
> Comments?
> 
> -Carl
> 
> On May 29, 2008, at 5:36 AM, Steven Sherwood wrote:
> 
>>
>> On May 28, 2008, at 11:46 PM, Ben Santer wrote:
>>>
>>> Recall that our current version of d3* is defined as follows:
>>>
>>> d3* = ( b{o} - <<b{m}>> ) / sqrt[ (s{<b{m}>} ** 2) + ( s{b{o}} ** 2) ]
>>>
>>> where
>>>
>>> b{o}      = Observed trend
>>> <<b{m}>>  = Model average trend
>>> s{<b{m}>} = Inter-model standard deviation of ensemble-mean trends
>>> s{b{o}}   = Standard error of the observed trend (adjusted for 
>>>                autocorrelation effects)
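
For reference, the same definition written as a short Python sketch (the
names are illustrative only):

import numpy as np

def d3_star(b_obs, se_obs_adj, ensemble_mean_trends):
    """d3* as defined above: the denominator combines the inter-model standard
    deviation of ensemble-mean trends, s{<b{m}>} (not its standard error),
    with the adjusted standard error of the observed trend, s{b{o}}."""
    b_mm = np.mean(ensemble_mean_trends)           # <<b{m}>>
    s_mm = np.std(ensemble_mean_trends, ddof=1)    # s{<b{m}>}
    return (b_obs - b_mm) / np.sqrt(s_mm ** 2 + se_obs_adj ** 2)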
>>
>> Shouldn't the first term under sqrt be the standard deviation of the
>> estimate of <<b{m}>> -- i.e., the standard error of <b{m}> -- rather
>> than the standard deviation of <b{m}>?  d3* would then, I think, be
>> equivalent to a z-score, relevant to the null hypothesis that models
>> on average get the trend right.  As written, I think the distribution
>> of d3* will have less than unity variance under this hypothesis.
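
A quick synthetic check of this point (a sketch only; the Gaussian trend
distributions, the sample sizes, and the exactly-known observed-trend error
are assumptions): draw the "observed" trend and the model trends from the
same distribution and compare the variance of the statistic when the
denominator uses s{<b{m}>} versus the standard error of <<b{m}>>.

import numpy as np

rng = np.random.default_rng(0)
n_models, n_trials = 20, 20000
sigma = 1.0                       # common internal-variability spread of trends

d_sd, d_se = [], []
for _ in range(n_trials):
    b_models = rng.normal(0.0, sigma, n_models)   # perfect-model ensemble-mean trends
    b_obs = rng.normal(0.0, sigma)                # "observed" trend, same distribution
    s_mm = b_models.std(ddof=1)                   # s{<b{m}>}
    se_mm = s_mm / np.sqrt(n_models)              # standard error of <<b{m}>>
    d_sd.append((b_obs - b_models.mean()) / np.hypot(s_mm, sigma))
    d_se.append((b_obs - b_models.mean()) / np.hypot(se_mm, sigma))

# Expect the first variance to be well below 1 and the second close to 1.
print(np.var(d_sd), np.var(d_se))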
>>
>> SS
>>
>>
>> -----
>> Steven Sherwood            Steven.Sherwood@yale.edu
>> Yale University            ph: 203 432-3167
>> P. O. Box 208109           fax: 203 432-3134
>> New Haven, CT 06520-8109   http://www.geology.yale.edu/~sherwood
>>
> 


-- 
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel:   (925) 422-2486
FAX:   (925) 422-7675
email: santer1@llnl.gov
---------------------------------------------------------------------------- 


