Procedure for updating the databases underlying the
CRU high-resolution grids.
Tim Mitchell, 25.06.03, revised 30.3.04

1. Transform additional station data into CRU time-series
  (.cts) format. The .cts subroutines are in crutsfiles.f90.
   The .cts files should be stored under ~/data/stnmon
   The .f90 programs described are under ~/code/linux/cruts or
      ~/code/alpha/cruts . The programs should be easily portable from
      the Alphas to Linux or vice versa. 
   If the additional data is in the style of ...
(a) GHCNv2 or CLIMAT (the Phil Jones format)
     i.e. one file per time interval,
     use (or modify) makecruts.f90.
(b) MCDW or CLIMAT (original) or CLIMAT (AOPC-Offenbach) format
     i.e. one file per year and month,
     use or modify reformat.f90. The MCDW data must have already
     gone through a two stage process - see the readme file in
     ~/data/stnmon/mcdw/_raw
(c) Jian's Chinese data from Excel
     i.e. a single ASCII table per variable, with one
     line per station/year, use (or modify) fromexcel.f90.
(d) the CRU time-series file format (but not quite right)
     it may be easily convertible using option 1
     in opcruts.f90.

2. The size of the arrays required in the entire procedure can be
   substantially reduced by subdividing the additional station
   data by continent at this stage. Simply subdivide the initial
   raw data into a set of raw files, by continent. Reduced array
   sizes mean programs that run more quickly and reliably.
   
3. Clean up the metadata in the .cts headers. This is done 
  using the information in the master metadata file, which is
  the most recently dated file in /cru/tyn1/f709762/cruts/master. 
   Run cleanmeta.f90 on the transformed .cts file. The sole
   purpose of this program is to make the header line as
   accurate as possible, without adding new information. Thus
   the following steps are included:
     (A) The original station code is stored as a 7-digit code
         in both the main and 'old' code columns.
     (B) The station and country labels are made all-caps and
         any hyphens (etc.) are removed.
     (C) Impossible lat/lon/elv values are set to missing.
     (D) The country label is checked, and made consistent,
         using the master country list. 
     (E) The lat/lon are checked to ensure that they are
         reasonable, using the country information. Each country
         is given a centroid and a 3-sigma distance. Stns
         lying outside this radius are flagged.
     (F) If a corresponding source code file (.src) is available,
         it too is checked, else one is created using information
         about the source supplied by the user. 
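
   Steps (B), (C) and (E) can be sketched roughly as follows. This is
   an illustration in Python, not the code in cleanmeta.f90. The
   missing-value code, the limits on lat/lon/elv, the use of a
   great-circle distance, and all function names are assumptions.
   Replacing punctuation by a space (rather than deleting it) is also
   an assumption.

```python
import math
import re

MISSING = -999.0  # assumed missing-value code, not taken from cleanmeta.f90


def clean_label(label):
    """Step (B): make the label all-caps; punctuation becomes a space."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]", " ", label).upper()
    return " ".join(cleaned.split())


def clean_coords(lat, lon, elv):
    """Step (C): set impossible values to MISSING (limits are assumed)."""
    if not -90.0 <= lat <= 90.0:
        lat = MISSING
    if not -180.0 <= lon <= 180.0:
        lon = MISSING
    if not -500.0 <= elv <= 9000.0:
        elv = MISSING
    return lat, lon, elv


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, by the haversine formula."""
    radius = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def outside_country(lat, lon, centroid, radius_km):
    """Step (E): True if the stn lies beyond the country's 3-sigma radius."""
    if lat == MISSING or lon == MISSING:
        return False  # nothing to check
    return distance_km(lat, lon, centroid[0], centroid[1]) > radius_km
```

   Here centroid is a (lat, lon) pair and radius_km stands for the
   3-sigma distance of the country, both of which would come from the
   master country list.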

4. Check the homogeneity of the additional .cts file. This is 
   done using reference time-series. This is far too complex to
   describe here. Run homogiter.f90 on the cleaned .cts file.

   The program runs iteratively, to maximise the proportion of
   the original data that can be checked and placed in homog.  
   A .cts (and .src) file with best estimates is stored in
   ~/cruts/homog, and stns that could not be
   checked are stored in ~/cruts/retry 
   
   This program may fire up 2 subsidiary xterms on start-up. 
   (If given the option, decline.)
   These can be killed forcibly if necessary once the program has
   finished executing, but not before. One xterm is simply a view
   of the log file, which provides a rough progress meter. The other
   is the IDL window, running an IDL program that waits for prompts
   through a pipe (stored in /cru/scratch2/f709762) to tell it when
   and where there is a data file for it to read and plot, in (f).

5. Merge the additional .cts file with the existing
   database (.dtb and .dts). The most recent version of each
   database is in ~/data/cruts/database, the latest version of
   the master station code file is in ~/data/cruts/master, and the latest 
   accessions file is in ~/data/cruts/accession 

   Use updatedtb.f90 to merge the new file in ~/cruts/homog with
   the existing database. Do this immediately, before going through
   the whole process for a new region, to ensure that as much info as
   possible is available for creating reference series for the new region. 

6. OPTIONAL AT THIS STAGE
   Add normals to the database file. The best time to do this is when all
   the new information has been absorbed into the database (steps 1-5) and
   the database is about to be used for gridding. The program addnorm.f90
   is used to add the 1961-90 normal in a header line in the .dtb files. This
   is then used by the program that calculates anomalies.  

   Where possible, a normal is calculated from the station series itself. 
   Where this is not possible through insufficient data, an attempt is made to
   estimate what the normal would have been if it had been measured. This 
   estimate is made using the reference series construction software used in
   step 4. Neighbouring stations are used to construct a reference 
   time-series that includes 1961-90 wherever possible, and this 
   reference time-series is used to calculate a 1961-90 normal for that stn,
   which is then stored in the 'normal' line in the database file.
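
   The first case (a normal taken from the station series itself) can
   be sketched as below. This is an illustration in Python, not the
   code in addnorm.f90. The missing-value code, the data layout (one
   list of 12 monthly values per year) and the 25 percent limit on
   missing years are assumptions. The fallback to a reference series
   is not shown.

```python
MISSING = -9999  # assumed missing-value code


def monthly_normals(series, first=1961, last=1990, max_missing_pct=25.0):
    """Return 12 monthly normals for the period first..last.

    series maps year -> list of 12 monthly values. A month's normal is
    MISSING when too many years in the period lack a value.
    """
    n_years = last - first + 1
    normals = []
    for month in range(12):
        values = [series[year][month]
                  for year in range(first, last + 1)
                  if year in series and series[year][month] != MISSING]
        missing_pct = 100.0 * (n_years - len(values)) / n_years
        if values and missing_pct <= max_missing_pct:
            normals.append(sum(values) / len(values))
        else:
            normals.append(MISSING)  # insufficient data for this month
    return normals
```

   A month that comes back as MISSING is one for which the normal
   would have to be estimated from neighbouring stations instead.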

7. ADDITIONAL CAPABILITY, REQUIRED FOR GRIDDING
   Transform the database file (with normals added under step 6, if possible)
   from absolute values to anomaly values, prior to gridding, using anomdtb.f90.
   The key output option from this program is (3), which dumps the anomalies
   to a set of .txt files that can be read by the idl gridding software. 
   The other output options provide info - (1) produces the same data in the .cts
   format, (2) only summarises the outputs through data counts, (4) summarises
   the original data discarded as duplicates.

   The usual options to select are:
     normal period = 1961 - 1990
     missing percent permitted = 25
     stdevs to reject = 3
     duplicate stns = 8km
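
   How the normal and the 'stdevs to reject' option might act on one
   calendar month of one station is sketched below. This is an
   illustration in Python, not the code in anomdtb.f90. It assumes
   that anomalies are plain differences from the normal, that the
   standard deviation is taken over all present values for that
   month, and that rejected values are set to an assumed missing
   code. Duplicate-station handling is not shown.

```python
import statistics

MISSING = -9999  # assumed missing-value code


def anomalies(values, normal, n_stdevs=3.0):
    """Convert one calendar month's values (one per year) to anomalies.

    Values further than n_stdevs standard deviations from the normal
    are rejected. Missing or rejected values come back as MISSING.
    """
    present = [v for v in values if v != MISSING]
    if normal == MISSING or len(present) < 2:
        return [MISSING] * len(values)  # no normal, or no usable stdev
    stdev = statistics.stdev(present)
    result = []
    for v in values:
        if v == MISSING:
            result.append(MISSING)
        elif stdev > 0 and abs(v - normal) > n_stdevs * stdev:
            result.append(MISSING)  # rejected as an outlier
        else:
            result.append(v - normal)
    return result
```

   For example, anomalies([MISSING, 12.0, 8.0], 10.0) keeps the gap
   and returns differences of 2.0 and -2.0 for the other two years.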

8. ADDITIONAL CAPABILITY, USE AS REQUIRED
   The program opcruts.f90 is the home for all the little useful routines for
   manipulating the .cts and .dtb (and .src and .dts) files. Option 1 can be used
   to convert from one of these formats to another. The other options can be
   explored to find out what they do.
