cc: t.osborn@uea.ac.uk
date: Wed, 19 Sep 2007 12:52:29 +0100 (BST)
from: "Tim Osborn" <t.osborn@uea.ac.uk>
subject: late review
to: mark.new@ouce.ox.ac.uk

Hi Mark,

sorry for lateness, hope it is useful nevertheless.  Submitted via online
system, but copied here too:

----------
Review of Semenov, Latif and Jungclaus:
"Is the observed NAO variability during the instrumental record unusual?"

This paper compares the observed NAO variability with that simulated
during a 1500-year simulation with a coupled ocean-atmosphere general
circulation model.  Others have reported related comparisons in the past,
finding that at least some recent periods are outside the likely range of
internal variability (whether that range is estimated via empirical
methods or numerical climate models) and typically concluding that there
is probably a contribution from some (natural or anthropogenic or
combined) external forcing in explaining recent observations.  Here
Semenov et al. obtain similar results, namely that the observations
slightly exceed the range of internally-generated climate variability
(perhaps the exceedance is less than from other studies/models), yet they
interpret their results to mean that they "suggest that the observed NAO
variability...including the recent increase can be explained solely by
internal variability".  I don't think criticism of their interpretation
should preclude publication, even though I would interpret them
differently.  If the observations are outside the 95% range of
variability, or, as here, outside the range of the full 1500-yr sample of
variability, but are not far outside that range, then some would simply
conclude that unusual (i.e. externally-forced) variability had been
confidently detected, while others (such as Semenov et al.) might conclude
that there may be very little role for external forcing because internal
variability *might* explain almost all of the observed variations. 
Really, however, it might be better to take into account the likelihood
that strong changes occurred purely by chance during a strongly-forced
period of time.  We might then conclude that internal variability might
explain some, or even a very large part, but not likely all (unless our
models are deficient), of the observed increase in the NAO from the 1960s
to the 1990s, and that there is probably some contribution from external
forcings, though this contribution might be only a small fraction of the
changes.

My recommendation, therefore, is that the manuscript should be published
in GRL, but that it should first be modified to indicate that it isn't
really in any disagreement with previous work -- it should be possible to
do this, while still accommodating the authors' tendency to focus on the
possibility that external forcing may have a limited role to play. 
Certainly, however, the statements that are inconsistent with the results
should be removed, such as the final sentence of the abstract which was
quoted above.  The results are rather difficult to see in Figure 3b, but
the accompanying text states that the observed trend exceeds the range of
simulated trends (and in any case statistical tests should not rest on the
extremes of a sample range: extremes are highly sensitive to sampling
variability, and if the distribution's tails extend towards infinity then
the range is unbounded, even though higher values have vanishingly small
likelihood).  Readers
who are not familiar with all the work in this area should not go away
with the false impression that the observations are "well within the model
variability" when they actually exceed model variability.

Specific comments:

(1) abstract: "observed multi-decadal NAO variations are well within the
model variability" and also final sentence are misleading since the
observed 1960s-1990s trend is bigger than any 30-yr trend in the model.

(2) abstract: "highly non-stationary behaviour" is wrong: the authors have
not tested whether the variations are bigger than expected from sampling
variability, so the series cannot be said to be non-stationary.  Even if
the series is autocorrelated, that does not make it non-stationary: a
stationary autoregressive process still exhibits centennial variability.
Either remove this claim, reword it, or support it with a test!
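The point that a stationary process can still show large slow variations
could be demonstrated with a short simulation (a sketch with an arbitrarily
chosen AR(1) coefficient, not fitted to the NAO):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stationary AR(1) process: x_t = phi * x_{t-1} + white noise, with |phi| < 1
phi, n_years = 0.3, 1500
x = np.zeros(n_years)
for t in range(1, n_years):
    x[t] = phi * x[t - 1] + rng.normal()

# Despite being strictly stationary, the 100-yr means wander around zero,
# i.e. the process exhibits apparent centennial variability
century_means = x.reshape(15, 100).mean(axis=1)
print("spread of centennial means:", century_means.max() - century_means.min())
```

A formal test (e.g. comparing sub-period variances against the sampling
distribution implied by such a fitted stationary process) would be needed
before "non-stationary" can be claimed.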

(3) abstract: any conclusions regarding contribution of internal
variability and external forcing should have the attached caveat or
condition that they depend on this single model (other studies have used
multiple models).

(4) overall: I couldn't see anywhere a statement regarding the season
being analysed.  Surely it isn't annual means?  I expect winter means, but
the paper must make this clear (e.g. Dec-Jan-Feb or Dec-Jan-Feb-Mar).
Some studies use the latter 4-month season, and if the current work uses
only DJF, perhaps the strong contribution of March trends to the NAO would
increase the unusualness of the trend?

(5) Page 6-7: although no significant differences in *local* SLP
variability are found (via the F-test), the authors have chosen (top of
page 7) to compare the observational version (from HadSLP) that has the
lowest interannual variability (s.d. 5.6 hPa) with the model (s.d. 7.1
hPa), and this difference is statistically significant using the F-test.
The authors should note this significant difference and consider whether
it affects the results, or whether the observed and simulated NAO series
are normalised by their own standard deviations and hence whether this
significant over-estimation of interannual SLP gradient variability by the
model affects the overall findings.
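For reference, the F-test comparison mentioned here can be sketched as
follows (synthetic series; only the quoted standard deviations, 5.6 and
7.1 hPa, come from the manuscript -- the sample size of 150 is an
assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-ins for the interannual SLP series (s.d.s as quoted above)
obs = rng.normal(0.0, 5.6, size=150)    # HadSLP-like observations
model = rng.normal(0.0, 7.1, size=150)  # model-like series

# Two-sided F-test for equality of variances
f_stat = np.var(model, ddof=1) / np.var(obs, ddof=1)
dfn = len(model) - 1
dfd = len(obs) - 1
p_one = stats.f.sf(f_stat, dfn, dfd)
p_two = 2 * min(p_one, 1 - p_one)
print(f"F = {f_stat:.2f}, two-sided p = {p_two:.4f}")
```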

(6) Page 7, lines 13-16: it is inconsistent to say that the observed trend
slightly exceeds all trends simulated during 1500 years yet at the same
time to say it is "not unusual".  This would also be the ideal place to
give quantitative values for (a) the strongest observed trend and (b) the
maximum and the 99th and 95th percentiles of the simulated trend
distribution.  How far above the 95th percentile the observations lie will
help readers interpret its unusualness!
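The requested quantities could be reported with something like the
following (illustrative only: `simulated_trends` and `observed_trend` are
placeholders, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the distribution of 30-yr NAO trends in the 1500-yr run
simulated_trends = rng.normal(0.0, 1.0, size=1471)
observed_trend = 2.8  # hypothetical observed 1960s-1990s trend (normalised)

p95, p99 = np.percentile(simulated_trends, [95, 99])
max_sim = simulated_trends.max()
# An empirical exceedance fraction is more robust than comparing to the max
frac_above = np.mean(simulated_trends >= observed_trend)
print(f"max = {max_sim:.2f}, 99th = {p99:.2f}, 95th = {p95:.2f}, "
      f"fraction >= observed = {frac_above:.4f}")
```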

(7) Page 8, line 2: see comment (2) above regarding use of
"non-stationary" without an appropriate statistical test.

(8) Page 8, lines 10-11: the authors seem to imply that because the
"observed variability is rather similar to the simulated", the earlier
rejection (at the 95% confidence level) of a purely internally-generated
trend should itself be overturned.  Not so.  But the authors can of course
choose to focus on the fact that some, or even most, of the trend may be
internally-generated.

(9) Page 8, lines 21-22: the terminology seems strange here.  Usually it
is the null hypothesis that is rejected, yet surely the null hypothesis
was not that the PDFs are different (as implied here), but that they were
the same.  Is there 85% confidence that the PDFs are not different, or
*only* 85% confidence that the PDFs are different?  I don't believe the
former is correct.
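To make the conventional framing concrete: the null hypothesis is that the
two PDFs are the same, and the test either rejects it or fails to reject
it.  A sketch with a two-sample Kolmogorov-Smirnov test (one common choice;
the manuscript's actual test is not specified in this comment, and both
samples here are synthetic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Synthetic stand-ins for the observed and simulated NAO samples
obs = rng.normal(0.0, 1.0, size=120)
sim = rng.normal(0.0, 1.0, size=1500)

# H0: the distributions are identical.  A small p rejects H0 (PDFs differ);
# a large p only means we fail to reject H0 -- it never *confirms* H0.
stat, p = stats.ks_2samp(obs, sim)
print(f"KS statistic = {stat:.3f}, p = {p:.3f}")
```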
----------




