homogsplit {climatol} R Documentation

## Apply homogen() on overlapping rectangular areas.

### Description

If the number of series is too big to be homogenized all at a time (normally several thousands, depending on the computer resources), this function can homogenize them by splitting the geographical domain in overlapping rectangular areas.

### Usage

homogsplit(varcli, anyi, anyf, xc=NULL, yc=NULL, xo=.5, yo=.38, maponly=FALSE,
suf=NA, nm=NA, nref=c(10,10,4), swa=NA, std=3, ndec=1, dz.max=5,
dz.min=-dz.max, wd=c(0,0,100), snht1=25, snht2=snht1, tol=.02, maxdif=NA,
mxdif=maxdif, force=FALSE, wz=.001, trf=0, mndat=NA, gp=3, ini=NA,
na.strings="NA", maxite=999, vmin=NA, vmax=NA, nclust=100, grdcol=grey(.4),
mapcol=grey(.4), hires=TRUE, expl=FALSE, metad=FALSE, sufbrk='m', tinc=NA,
tz='UTC', cex=1.2, verb=TRUE, x=NA)


### Arguments

 varcli Acronym of the name of the studied climatic variable, as in the data file name. anyi Initial year of the data present in the file. anyf Final year of the data present in the file. xc Vector of X axis coordinates setting the domain splitting meridians. yc Vector of Y axis coordinates setting the domain splitting parallels. xo Overlapping width in the East-West direction. yo Overlapping width in the North-South direction. maponly Do not homogenize. Only draw a map with stations locations and domain partitioning. suf Optional suffix appended with a '-' to the name of the variable in the input files. nm Number of data per year in each station. (Defaults to NA, and then it will be computed from the total number of data). nref Maximum number of references for data estimation. (Defaults to 10 in the detection stages, and to 4 in the final series adjustments). swa Size of the step forward to be applied to the staggered window application of SNHT. If not set (the default), 365 terms (one year) will be used for daily data, and 60 otherwise. std Type of normalization: 1:deviations from the mean, 2:rates to the mean (only for means greater than 1), 3:standardization (subtract the mean and divide by the sample standard deviation). ndec Number of decimal digits to which the homogenized data must be rounded. dz.max Threshold of outlier tolerance, in standard deviations. (5 by default). dz.min Lower threshold of outlier tolerance if different from the higher one. (By default, they will be the same, with opposite signs). wd Distance (in km) at which reference data will weigh half of that of another located at zero distance. (Defaults to c(0,0,100), meaning that no weighting will be applied in the first two stages, and 100 km in the third). snht1 Threshold value for the stepped SNHT window test applied in stage 1. (25 by default. No SNHT analysis will be performed if snht1=0). snht2 Threshold value for the SNHT test when applied to the complete series in stage 2 (same value as snht1 by default). tol Tolerance factor to split several series at a time. The default is 0.02, meaning that a 2% will be allowed for every reference data. (E.g.: if the maximum SNHT test value in a series is 30 and 10 references were used to compute the anomalies, the series will be split if the maximum test of the reference series is lower than 30*(1+0.02*10)=36. Set tol=0 to disable further splits when any reference series has already been split at the same iteration). maxdif Maximum difference of any data item in consecutive iterations. If not set, defaults to half of the data precision (defined by the number of decimals). mxdif Old maxdif parameter (maintained for compatilibility). force Force break even when only one reference is available. (FALSE by default). wz Scale parameter of the vertical coordinate Z. 0.001 by default, which gives the vertical coordinate (in m) the same weight as the horizontal coordinates (internally managed in km). trf By default, data are not transformed (trf=0), but if the data frequency distribution is very skewed, the user can choose to apply a log(x+1) transformation (trf=1) or any root of index trf>1 (2 for square root, 3 for cubic root, etc. Fractional numbers are allowed). mndat Minimum number of data for a split fragment to become a new series. It defaults to half of the swa value for daily data, or to nm otherwise, with a minimum of 5 terms. gp Graphic parameter: 0:no graphic output, 1:only descriptive graphics of the input data, 2:as with 1, plus diagnostic graphics of anomalies, 3:as with 2, plus graphics of running annual means and applied corrections, 4:as with 3, but running annual totals (instead of means) will be plotted in the last set of graphics. (Better when working with precipitation data). ini Initial date, with format 'YYYY-MM-DD'. If not set, it will be assumed that the series begin the first of January of the initial year anyi. na.strings Character string to be treated as a missing value. (It can be a vector of strings, if more than one is needed). Defaults to 'NA', the standard missing data code in R. maxite Maximum number of iterations when computing the means of the series. (999 by default). vmin Minimum possible value (lower limit) of the studied variable. Unset by default, but note that vmin=0 will be applied if std is set to 2. vmax Maximum possible value (upper limit) of the studied variable. (E.g., for relative humidity or relative sunshine hours it is advisable to set vmax=100). nclust Maximum number of stations for the cluster analysis. (If much greater than 100, the default value, the process may be too long and the graphic too dense). grdcol Color of the graphic background grids. (Gray by default.) mapcol Color of the background map. (Gray by default). hires By default, the background map will be drawn in high resolution. Set this parameter to FALSE if you are studying a big geographical area (>1000 km). expl Set this to TRUE to perform an exploratory analysis. metad Set this to TRUE if a metadata file is provided (see the details). sufbrk Suffix to add to varcli to form the name of the provided metadata file. This parameter is only relevant when metad=TRUE. Its default value 'm' is meant to read the file of break-points detected at the monthly scale. tinc Time increment between data. Not set by default, but can be defined for subdaily data, as in e.g.: tinc='3 hour'. tz Time zone. Only relevant for subdaily data. ('UTC' by default.) cex Character expansion factor for graphic labels and titles. (Defaults to 1.2. Note that if station names are long, they will not fit in titles when increasing this parameter too much.) verb Verbosity. Set to FALSE to avoid messages being output to the console. (They will be in the output log file anyway). x Vector of dates. (To be read from the *.rda file.)

### Details

First of all take into account that this is an experimental function, and will fail if there are time steps completely void of data in any sub-area.

If you have not decided the splitting meridians and parallels, do not set them, and the function will provide a map to help in selecting the areas.

If you set the xc and yc splitting borders, setting maponly=TRUE will also produce a map with the stations, plus the required overlapping areas, without doing any homogenization. In this way you can review the limits of the areas to choose new ones if you are not happy with the current partitioning.

All parameters except xc, yc, xo, yo and maponly are the same as in the homogen function, and will be passed to it to perform the homogenization.

If a rectangular area include less than 10 stations, these will be added to the next area. Warning! If it is the last area, they will not be processed, and the homogenization results will have inconsistent number of stations. In this case the user should try a new set of cutting limits.

One graphic output file will be produce for every area containing stations, but the rest of the output will be merged into single files.

### Value

This function does not return any value.

homogen, dahstat, dahgrid

### Examples

#Set a temporal working directory and write input files:
wd <- tempdir()
wd0 <- setwd(wd)
data(Ptest)
dim(dat) <- c(720,20)
dat[601:720,5] <- dat[601:720,5]*1.8
write(dat[481:720,1:12],'pcp_1991-2010.dat')
write.table(est.c[1:12,1:5],'pcp_1991-2010.est',row.names=FALSE,col.names=FALSE)
#Now run the example:
homogsplit('pcp',1991,2010,2.9,39.6,0,0,std=2)