date: Mon, 16 Oct 2000 22:54:31 +0100
from: Tim Osborn <T.Osborn@uea.ac.uk>
subject: progress
to: k.briffa@uea.ac.uk, p.jones@uea.ac.uk

Hi Keith & Phil  (a long one this, as I have an hour to kill!)

We're making slow-ish progress here but it's still definitely v. useful.  I've 
brought them up-to-date with our work and given them reprints.  Mike and Scott 
Rutherford have let me know what they're doing.  I've got a preprint by 
Tapio Schneider describing the new method, and there's a partially completed 
draft paper where they test it using the GFDL long control run (and also the 
perturbed run, to test for the effect of trend and non-stationarities).  The 
results seem impressive - and 'cos they're using model data with lots of 
values set to missing, they can do full verification.  The explained 
verification variances are very high even when they set 95% of all grid-box 
values to missing (leaving about 50 values with data over the globe I think).
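
(In case it's unclear what statistic I mean: it's just the fraction of variance 
of the withheld values that the infilled estimates capture.  A minimal sketch 
of the calculation in Python - my own toy version, not their code, and all the 
names are made up:

    import numpy as np

    def explained_verification_variance(truth, infilled, withheld):
        # fraction of variance of the withheld values captured by the
        # infilled estimates: 1 - SSE / SS about the mean of the truth
        t = truth[withheld]
        e = infilled[withheld]
        return 1.0 - np.sum((t - e) ** 2) / np.sum((t - t.mean()) ** 2)

    # e.g. withhold 95% of a (years x grid-boxes) field, as in their test
    rng = np.random.default_rng(0)
    field = rng.standard_normal((500, 1000))
    withheld = rng.random(field.shape) < 0.95
    infilled = np.where(withheld, 0.0, field)  # stand-in for the method's output
    print(explained_verification_variance(field, infilled, withheld))

)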

In fact the new method (regularized expectation maximization, if that means 
anything to you - the regularization amounts to ridge regression) infills all 
missing values, not just those in the climate data, which is interesting: it 
can infill climate data from climate data, or proxy data from climate data 
(see below).
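
(If the name is opaque: very roughly, the method alternates between estimating 
the mean and covariance from the completed data matrix and re-predicting each 
record's missing entries by ridge-type regression on its available entries, 
until the infilled values stop changing.  A heavily simplified toy version - 
mine, emphatically not Tapio's code, which handles the choice of 
regularization properly:

    import numpy as np

    def toy_regem(X, ridge=0.1, n_iter=50, tol=1e-6):
        # X: 2-D array (records x variables), np.nan marks missing values;
        # assumes every variable has at least some observed values
        miss = np.isnan(X)
        Xf = X.copy()
        col_means = np.nanmean(X, axis=0)
        Xf[miss] = col_means[np.where(miss)[1]]  # crude starting guess
        for _ in range(n_iter):
            mu = Xf.mean(axis=0)
            C = np.cov(Xf, rowvar=False)
            X_prev = Xf.copy()
            for i in range(Xf.shape[0]):
                m, a = miss[i], ~miss[i]
                if not m.any() or not a.any():
                    continue  # nothing to infill, or nothing to infill from
                # ridge-regularized regression of missing on available
                Caa = C[np.ix_(a, a)] + ridge * np.eye(a.sum())
                Cma = C[np.ix_(m, a)]
                Xf[i, m] = mu[m] + Cma @ np.linalg.solve(Caa, Xf[i, a] - mu[a])
            if np.max(np.abs(Xf - X_prev)) < tol:  # stop once stable
                break
        return Xf

)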

As well as the GFDL data, they've also applied the method to the Jones et al. 
data on its own.  The method fills in missing temperatures using the 
non-missing temperatures (i.e., similar to what Kaplan et al. do, or the 
Hadley Centre do for GISST, but apparently better!).  So they have a complete 
data set from 1856-1998 (except that any boxes with fewer than about 90 years 
of data remain missing, which seems fair enough - infilling everything would 
be going too far).
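
(Their "leave very sparse boxes missing" rule is easy to picture - something 
like this, again just my own illustration, with the 90-year cutoff being their 
choice:

    import numpy as np

    def drop_sparse_boxes(X, min_years=90):
        # X: years x grid-boxes, np.nan = missing; blank out any box with
        # fewer than min_years of observed data, so the infilling never
        # has to invent values for near-empty series
        X = X.copy()
        n_obs = np.sum(~np.isnan(X), axis=0)
        X[:, n_obs < min_years] = np.nan
        return X

)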

We're now using the MXD data set with their program and the Jones et al. data 
to see:
(i) whether the missing data from 1856-1960 in the Jones et al. data set can 
be filled in better using the MXD plus the non-missing temperatures than using 
just the non-missing temperatures.  I expect that the MXD must add useful 
information (esp. pre-1900), but I'm not sure how to verify it!  The program 
provides diagnostics estimating the accuracy of infilled values, but it's 
always nice to test with independent data.
(ii) so, a separate run with all pre-1900 temperatures set to missing, relying 
on the MXD alone to infill them - we can then verify directly, but need to 
watch out for the possibly artificial summer warmth early on.
(iii) using the MXD to estimate temperatures back to 1600 (not sure that their 
method will work before 1600 - too few data prevent the iterative method from 
converging); I will then compare with our simpler maps of summer temperature.  
Mike wants winter (Oct-Mar) and annual reconstructions to be tried too.
(iv) setting all post-1960 values to missing in the MXD data set (due to the 
decline), so the method infills them, estimating them from the real 
temperatures - another way of "correcting" for the decline, though maybe not 
defensible!
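
(To make runs (ii) and (iv) concrete, the masking itself is trivial - 
something like the following, my own illustration, with random arrays standing 
in for the Jones et al. temperatures and the gridded MXD:

    import numpy as np

    rng = np.random.default_rng(1)
    years = np.arange(1856, 1999)                   # 1856-1998
    temps = rng.standard_normal((years.size, 100))  # stand-in temperatures
    mxd = rng.standard_normal((years.size, 100))    # stand-in gridded MXD

    # run (ii): withhold all pre-1900 temperatures and let the MXD alone
    # infill them, so the withheld values can be used for verification
    temps_verif = temps.copy()
    temps_verif[years < 1900, :] = np.nan

    # run (iv): withhold post-1960 MXD values (the decline) so the method
    # re-estimates them from the real temperatures
    mxd_decline = mxd.copy()
    mxd_decline[years > 1960, :] = np.nan

    # the infilling sees temperatures and MXD side by side in one matrix,
    # which is how one block gets predicted from the other
    combined = np.hstack([temps_verif, mxd_decline])

)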

They will then try the Mann et al. multi-proxy network with the new method 
(which they've not done till now).  They've given me the full data set, so we 
can do stuff with that later.  I have, I think, all the programs needed for 
his old method, so we could still look at that on our own, but he's not keen 
on spending time on that while I'm here.  I've swapped it for the MXD data set 
(the Hugershoff chronologies, and also the gridded, but uncalibrated, version 
of the Hugershoff chronologies).  The gridded stuff was needed for the 
reconstruction efforts, because otherwise the 387 chronologies would all have 
had equal weight and we wanted a simple way to account for clustered groups - 
the gridded version that I made seemed the easiest way (a rough sketch of the 
gridding is below), even though that is the Osborn et al. paper that is yet to 
be written!  What conditions do I need to place on subsequent use of the MXD 
chronologies/gridded data?  Perhaps: (i) that we be informed of what they're 
doing with it; (ii) that Osborn & Briffa (and Jones?) be co-authors on any 
subsequent papers - and, if the MXD dataset provides the core of the paper, 
Schweingruber too?  We will all be on the paper that comes out of the 
reconstructions I've just described, but I'm thinking about any future stuff 
they use it for.
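
(For completeness, the gridding I did is nothing clever - essentially just 
averaging whichever chronologies fall in each lat/lon box, so a cluster of 
sites counts once rather than many times.  A toy version, mine, with the box 
size made up:

    import numpy as np

    def grid_chronologies(values, lats, lons, box=5.0):
        # values: years x sites; lats/lons: one coordinate pair per site;
        # returns one averaged series per occupied box-degree cell
        cells = {}
        for j, (la, lo) in enumerate(zip(lats, lons)):
            key = (int(np.floor(la / box)), int(np.floor(lo / box)))
            cells.setdefault(key, []).append(j)
        gridded = np.column_stack(
            [np.nanmean(values[:, idx], axis=1) for idx in cells.values()])
        return gridded, list(cells.keys())

)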

I hadn't realised (until just now) that their reconstruction program is so 
slow that it will take about 3-6 days to run each one!  They have about 6-8 
separate processors/machines, so we need to get the various runs going on all 
of them at once.  Even so, results are unlikely to be available before Friday 
morning (I leave Friday midday), so some will only arrive after I've got back 
home - this looks like being an ongoing thing!

No need to reply to all/any of this - just thought I'd bring you up-to-date 
while I had some time to spare.

Cheers

Tim

