From: Phil Jones <p.jones@uea.ac.uk>
To: santer1@llnl.gov
Subject: Re: Our d3* test
Date: Thu May 29 15:13:35 2008

    Ben,
      Hopefully the email to Francis will help to resolve this quickly. It would seem
    from Tom's email that the new d3 approaches the expected result for largish N.
    A test ought to do this as Tom says.
      You'll need to change the response a little because, although you may have
    misinterpreted Francis, you may not have misinterpreted Rev 1.
       Hope this is out of your hair as soon as feasible.
      Climate Audit are an odd crowd. McIntyre is claiming that he spotted the 1945 problem
    in the marine data - and refers to a blog page from late last year! We were
    already on to it by then and he didn't really know what he was talking about anyway.
    Maybe this paper and the various press coverage (especially Dick Reynolds' N&V as he
    spelt it out) will allow them to realize that what is really robust in all this is the
    land record. I suspect it won't though.  One day they may finally realize the concept
    of effective spatial degrees of freedom. John Christy doesn't understand this!
    Cheers
    Phil
   At 04:46 29/05/2008, you wrote:

     Dear folks,
     Just wanted to let you know that I did not submit our paper to IJoC. After some
     discussions that I've had with Tom Wigley and Peter Thorne, I applied our d1*, d2*, and
     d3* tests to synthetic data, in much the same way that we applied the DCPS07 d* test and
     our original "paired trends" test (d) to synthetic data. The results are shown in the
     appended Figure.
     Relative to the DCPS07 d* test, our d1*, d2*, and d3* tests of
     hypothesis H2 yield rejection rates that are substantially
     closer to theoretical expectations (compare the appended Figure with Figure 5 in our
     manuscript). As expected, all three tests show a dependence on N (the number of
     synthetic time series), with rejection rates decreasing to near-asymptotic values as N
     increases. This is because the estimate of the model-average signal (which appears in
     the numerator of d1*, d2*, and d3*) has a dependence on N, as does the estimate of
     s{<b_{m}>}, the inter-model standard deviation of trends (which appears in the
     denominator of d2* and d3*).
     The worrying thing about the appended Figure is the behavior of d3*. This is the test
     which we thought Reviewers 1 and 2 were advocating. As you can see, d3* produces
     rejection rates that are consistently LOWER (by a factor of two or more) than
     theoretical expectations. We do not wish to be accused by Douglass et al. of devising a
     test that makes it very difficult to reject hypothesis H2, even when there is a
     significant difference between the trends in the model average signal and the
     'observational signal'.
     So the question is, did we misinterpret the intentions of the Reviewers? Were they
     indeed advocating a d3* test of the form which we used? I will try to clarify this point
     tomorrow with Francis Zwiers (our Reviewer 2).
     Recall that our current version of d3* is defined as follows:
     d3* = ( b{o} - <<b{m}>> ) / sqrt[ (s{<b{m}>} ** 2) + ( s{b{o}} ** 2) ]
     where
     b{o}      = Observed trend
     <<b{m}>>  = Model average trend
     s{<b{m}>} = Inter-model standard deviation of ensemble-mean trends
     s{b{o}}   = Standard error of the observed trend (adjusted for
                 autocorrelation effects)
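
     The definition above can be written as a minimal Python sketch. The function and
     argument names are hypothetical, and the use of the sample (N-1) standard deviation
     for s{<b{m}>} is an assumption, since the message does not state the divisor.

```python
import math
from statistics import mean, stdev


def d3_star(b_o, s_b_o, model_trends):
    """Current d3* as defined above.

    b_o          -- observed trend, b{o}
    s_b_o        -- standard error of the observed trend, s{b{o}}
                    (assumed already adjusted for autocorrelation)
    model_trends -- ensemble-mean trends, one per model (hypothetical input)
    """
    b_mm = mean(model_trends)    # <<b{m}>>, the model average trend
    s_bm = stdev(model_trends)   # s{<b{m}>}; N-1 divisor is an assumption
    return (b_o - b_mm) / math.sqrt(s_bm ** 2 + s_b_o ** 2)
```

     For example, d3_star(0.1, 0.03, [0.2, 0.3, 0.4]) gives roughly -1.92: the model
     average is 0.3, s{<b{m}>} is 0.1, and the denominator is sqrt(0.0109).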
     In Francis's comments on our paper, the first term under the square root sign is
     referred to as "an estimate of the variance of that average" (i.e., of <<b{m}>> ). It's
     possible that Francis was referring to sigma{SE}, which IS an estimate of the variance
     of <<b{m}>>. If one replaces s{<b{m}>} with sigma{SE} in the equation for d3*, the
     performance of the d3* test with synthetic data is (at least for large values of N) very
     close to theoretical expectations. It's actually even closer to theoretical expectations
     than the d2* test shown in the appended Figure (which is already pretty close). I'll
     produce the "revised d3*" plot tomorrow...
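
     A rough way to see why the two forms of d3* could behave so differently is the
     Monte Carlo sketch below. It is not the synthetic-data procedure used for the
     appended Figure. It rests on several assumptions: trends are drawn directly from a
     single Gaussian rather than fitted to synthetic time series, s{b{o}} is taken as the
     known true standard deviation, sigma{SE} is taken to be s{<b{m}>} / sqrt(N) (the
     message does not define it), and the test is two-sided at a nominal 5% level. All
     names are hypothetical.

```python
import math
import random

Z_CRIT = 1.96  # assumed two-sided 5% critical value for a standard normal


def rejection_rates(n_models, n_trials=20000, sigma=1.0, seed=0):
    """Return (rate_current, rate_revised) for the two forms of d3*.

    H2 is true by construction: the 'observed' trend and the n_models
    'model' trends all come from the same N(0, sigma^2) distribution.
    """
    rng = random.Random(seed)
    reject_current = 0
    reject_revised = 0
    for _ in range(n_trials):
        b_o = rng.gauss(0.0, sigma)
        b_m = [rng.gauss(0.0, sigma) for _ in range(n_models)]

        b_mm = sum(b_m) / n_models                      # <<b{m}>>
        var_bm = sum((x - b_mm) ** 2 for x in b_m) / (n_models - 1)
        s_bm = math.sqrt(var_bm)                        # s{<b{m}>}
        sigma_se = s_bm / math.sqrt(n_models)           # assumed sigma{SE}
        s_bo = sigma                                    # assumed known s{b{o}}

        diff = b_o - b_mm
        d3_current = diff / math.sqrt(s_bm ** 2 + s_bo ** 2)
        d3_revised = diff / math.sqrt(sigma_se ** 2 + s_bo ** 2)

        reject_current += abs(d3_current) > Z_CRIT
        reject_revised += abs(d3_revised) > Z_CRIT
    return reject_current / n_trials, reject_revised / n_trials
```

     Under these assumptions the variance of b{o} - <<b{m}>> is sigma^2 * (1 + 1/N). The
     current denominator estimates about 2 * sigma^2, so it rejects too rarely, while
     the sigma{SE} form estimates about sigma^2 * (1 + 1/N) and should reject at close to
     the nominal rate.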
     The bottom line here is that we need to clarify with Francis the exact form of the test
     he was requesting. The "new" d3* (with sigma{SE} as the first term under the square root
     sign) would lead to a simpler interpretation of the problems with the DCPS07 test. It
     would show that the primary error in DCPS07 was in the neglect of the observational
     uncertainty term. It would also simplify interpretation of the results from Section 6.
     I'm sorry about the delay in submission of our manuscript, but this is an important
     point, and I'd like to understand it fully. I'm still hopeful that we'll be able to
     submit the paper in the next few days. Many thanks to Tom and Peter for persuading me to
     pay attention to this issue. It often took a lot of persuasion...
     With best regards,
     Ben
     ----------------------------------------------------------------------------
     Benjamin D. Santer
     Program for Climate Model Diagnosis and Intercomparison
     Lawrence Livermore National Laboratory
     P.O. Box 808, Mail Stop L-103
     Livermore, CA 94550, U.S.A.
     Tel:   (925) 422-2486
     FAX:   (925) 422-7675
     email: santer1@llnl.gov
     ----------------------------------------------------------------------------

   Prof. Phil Jones
   Climatic Research Unit        Telephone +44 (0) 1603 592090
   School of Environmental Sciences    Fax +44 (0) 1603 507784
   University of East Anglia
   Norwich                          Email    p.jones@uea.ac.uk
   NR4 7TJ
   UK
   ----------------------------------------------------------------------------

