Hydroclimatic data records contain inhomogeneities, that is, errors of various types,
most commonly shifts, which are detected and corrected mainly with statistical
homogenization methods, usually by comparison with neighbouring time series.
Homogenization is a subject of controversy, because it is suspected of introducing
systematic errors that distort the climatic information.
In this study, the various methods for homogenizing temperature and precipitation data,
and the studies evaluating them in the literature, were systematically examined. It was
found that the methods and their evaluations to date rest on two assumptions. First, that
hydroclimatic data are characterized by weak or no autocorrelation, which disregards a
stochastic characteristic of the data, long-term persistence, that is, an autocorrelation
structure that persists over time and is quantified by the Hurst coefficient. Second, that
the temperature difference and the precipitation ratio of neighbouring stations form a
series of random numbers, whereas it can be shown that they are in fact correlated.
To investigate the effects of homogenization on data with long-term persistence, two
classical homogenization methods, the SNHT and the Double Mass Curve, were applied
within a Monte Carlo framework to annual homogeneous synthetic temperature and
precipitation data, respectively.
It was found that both homogenization methods produce a statistically non-significant
number of time series with spurious shifts for a Hurst coefficient of H = 0.5, which
corresponds to uncorrelated data. An increase of the Hurst coefficient (0.5 < H < 1),
however, leads to an increase in the percentage of time series with spurious shifts.
It was also found that the number of reference time series and the way they are used
strongly affect the percentage of time series in which false shifts are detected.
The application of the SNHT showed that the percentage of time series with false shifts
appears to be affected by the length of the time series and by the minimum distance
between inhomogeneities, but not by the correlation between the reference and the
tested time series.
Finally, correcting the shifts detected by the SNHT suggests that the method does not
distort temperature trends, since the percentages of time series in which trends increase
and decrease are almost equal. In contrast, the correction of shifts tends to reduce the
Hurst coefficient for initial data with moderate to strong persistence (H > 0.65), but not
for H < 0.65.
Hydroclimatic time series contain inhomogeneities, which are errors introduced by
replacements and calibration of instruments, station relocations, changes in the environment
of the stations, etc. The most common form of inhomogeneity is a shift between two
parts of a time series. The identification and correction of inhomogeneities is called
homogenization and is usually done with statistical methods which compare a candidate
station with one or more neighbouring reference stations, assuming that they belong to the
same climatological region and reflect the same weather and climate variations.
The homogenization of hydroclimatic data, mainly of temperature and precipitation time
series, is a procedure of great importance and also a controversial subject because of its
implications for estimates of climate change. This study focuses on an important
characteristic of hydroclimatic time series that is generally ignored by the homogenization
community, the long-term persistence of hydroclimatic data, and on its effects on
homogenization. It has two components: (a) a literature review and (b) a computational approach.
1. Literature review
A systematic study of the scientific literature was made in order to examine the types and
causes of inhomogeneities, to identify and classify the existing homogenization methods, to
understand their stochastic background and to evaluate their output. This literature review
focused mainly on previous evaluation studies of homogenization methods with synthetic data.
A main result of this review is that existing homogenization methods generally ignore the
long-term persistence of hydroclimatic data, expressed by the Hurst coefficient, and examine
only first-order autoregressive series (AR(1)) or series of independent and identically
distributed Gaussian errors. No systematic studies of the relationship between the Hurst
coefficient and the homogenization results were identified.
It was also found that homogenization methods assume that the series of temperature
differences or precipitation ratios between reference and candidate stations constitute series
of random numbers (i.e. white noise). However, a basic stochastic analysis indicates that the
difference and ratio series reproduce the autocorrelation function of the reference and
candidate series, provided that both have the same autocorrelation structure.
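For the difference series, the argument can be sketched as follows. This is only an illustration under assumptions that are not stated in the text: the candidate X_t and the reference Y_t are taken to share the variance \sigma^2 and the autocorrelation function \rho_k, and their lagged cross-correlation is assumed to factorize as \rho_{XY}\rho_k, with \rho_{XY} < 1. The ratio series used for precipitation is not covered by this sketch.

```latex
\begin{aligned}
D_t &= X_t - Y_t \\
\operatorname{Cov}(D_t, D_{t+k})
  &= \operatorname{Cov}(X_t, X_{t+k}) + \operatorname{Cov}(Y_t, Y_{t+k})
   - \operatorname{Cov}(X_t, Y_{t+k}) - \operatorname{Cov}(Y_t, X_{t+k}) \\
  &= 2\sigma^2 \rho_k - 2\sigma^2 \rho_{XY}\,\rho_k
   = 2\sigma^2 (1 - \rho_{XY})\,\rho_k \\
\operatorname{Var}(D_t) &= 2\sigma^2 (1 - \rho_{XY}) \\
\rho_D(k) &= \frac{\operatorname{Cov}(D_t, D_{t+k})}{\operatorname{Var}(D_t)} = \rho_k
\end{aligned}
```

Under these assumptions the cross-correlation only rescales the variance of the difference series; its autocorrelation function is that of the original series, so it is white noise only if the original series are.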
2. Computational approach
A computational approach based on Monte Carlo simulations made it possible to understand
and evaluate the behaviour of selected classical homogenization methods as a function of
various parameters:
a) the Hurst coefficient of hydroclimatic time series (tested values H={0.50, 0.55, ..., 0.90}),
b) the cross-correlation coefficient between candidate and reference time series (tested
values: ρXY={0.5, 0.6, 0.7, 0.8, 0.9, 0.95}),
c) the number of reference time series and the way they were used to locate shifts (reference
systems - shown in Table 1),
d) the length of the time series (50 and 100 years), and
e) the minimum distance between possible inhomogeneities or between inhomogeneities and
the edge of series (tested values 5 and 10 years).
The percentage of time series with false alarms was regarded as the critical factor for this
evaluation.
For the temperature data, the homogenization method selected was the SNHT for shifts
(Alexandersson and Moberg, 1997) in combination with all systems of reference series
summarized in Table 1 except 3/1 (see Figure 1). For multiple reference series a pairwise
comparison of the candidate series with all reference series was applied. The SNHT was
applied using a cutting algorithm described by Domonkos (2011a) and a 95% confidence level.
For the precipitation data the Double Mass Curve (Kohler, 1949; Searcy and Hardison, 1960)
was selected. The original method is subjective and involves the identification of the main
inhomogeneity of the time series on a graph. An objective (automated) version was developed
using a piecewise linear algorithm based on the least squares approach. The reference series
systems used were the most commonly applied, 1/1 and 3/1.
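The single-shift SNHT statistic can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `snht_statistic` and the `min_dist` parameter are hypothetical, the use of the sample standard deviation is an assumption, and neither the cutting algorithm of Domonkos (2011a) nor the 95% critical values are reproduced here.

```python
import math
import statistics


def snht_statistic(q, min_dist=1):
    """Return (T0, k) for the single-shift SNHT applied to the series q.

    q        : series to test, e.g. candidate minus reference temperatures
    min_dist : hypothetical parameter; a shift closer than this many values
               to either edge of the series is not considered
    k        : number of values before the most likely shift
    """
    n = len(q)
    mean = statistics.fmean(q)
    std = statistics.stdev(q)  # sample standard deviation (assumption)
    z = [(v - mean) / std for v in q]  # standardized series
    total = sum(z)

    best_t, best_k = -math.inf, None
    cum = 0.0
    for k in range(1, n):
        cum += z[k - 1]
        if k < min_dist or n - k < min_dist:
            continue
        z1 = cum / k                  # mean of the first k standardized values
        z2 = (total - cum) / (n - k)  # mean of the remaining n - k values
        t = k * z1 * z1 + (n - k) * z2 * z2
        if t > best_t:
            best_t, best_k = t, k
    return best_t, best_k
```

A shift would be flagged when T0 exceeds the critical value for the chosen confidence level and series length; with `min_dist=5` or `min_dist=10` the search is restricted in the spirit of parameter e) above.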
The synthetic series with long-term persistence were normally distributed and were simulated
using a multiple time-scale fluctuation approach proposed by Koutsoyiannis (2002).
Temperature data were generated with zero mean and unit standard deviation, and
precipitation data with a mean of 1000 mm and a standard deviation of 300 mm. For every
candidate series one or multiple correlated reference series with the same characteristics were
generated (see Table 1). All simulations and computations were based on original Matlab
codes.
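As an illustration of how such candidate and reference series could be produced, the following Python sketch generates Gaussian series with a prescribed Hurst coefficient and cross-correlation. It deliberately swaps in a different technique from the one used in the study: an exact Cholesky factorization of the fractional Gaussian noise autocovariance instead of the multiple time-scale fluctuation approach of Koutsoyiannis (2002). All function names are hypothetical.

```python
import math
import random


def fgn_autocov(k, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at lag k."""
    k = abs(k)
    h2 = 2.0 * hurst
    return 0.5 * (abs(k + 1) ** h2 - 2.0 * k ** h2 + abs(k - 1) ** h2)


def cholesky_lower(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def fgn_generator(n, hurst):
    """Return a function that draws one standard fGn series of length n."""
    cov = [[fgn_autocov(i - j, hurst) for j in range(n)] for i in range(n)]
    low = cholesky_lower(cov)

    def draw(rng):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        return [sum(low[i][m] * w[m] for m in range(i + 1)) for i in range(n)]

    return draw


def candidate_and_reference(draw, rng, rho, mean=0.0, std=1.0):
    """Draw a candidate and one reference series with cross-correlation rho."""
    x = draw(rng)
    e = draw(rng)  # independent series with the same autocorrelation
    c = math.sqrt(1.0 - rho * rho)
    y = [rho * a + c * b for a, b in zip(x, e)]
    return [mean + std * v for v in x], [mean + std * v for v in y]
```

For example, `candidate_and_reference(fgn_generator(100, 0.8), random.Random(1), 0.9, mean=1000.0, std=300.0)` would give one 100-year precipitation-like pair. The Cholesky approach is exact but costs O(n^3) per factorization, which is acceptable for series of 50 or 100 values because the factor is computed once and reused for every draw.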
3. Results and conclusions
Some main conclusions of this study are summarized in Figures 1, 2, 3 and 4.
a) For time series with H=0.5 (i.e. characterized by white noise), the false alarm rate is not
significant (below 5%), which is expected because of the design of the homogenization
methods, but the percentage of series with false alarms increases with H.
b) The number of reference series, and the minimum number of them required to locate a shift
in the time series, greatly affect the percentage of series with false alarms.
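The way a false alarm rate responds to autocorrelation can be illustrated with a small Monte Carlo sketch in Python. It simplifies the study's set-up in several ways, all of which are assumptions made for brevity: the test is applied directly to a single homogeneous series rather than to a candidate-reference difference series, persistence is imitated with an AR(1) process instead of a long-term persistent one, and the 95% critical value is estimated empirically from white noise rather than taken from published tables. All names are hypothetical.

```python
import random
import statistics


def snht_t0(q):
    """Maximum of the single-shift SNHT statistic over all split points."""
    n = len(q)
    mean = statistics.fmean(q)
    std = statistics.stdev(q)
    cum, t0 = 0.0, 0.0
    for k in range(1, n):
        cum += (q[k - 1] - mean) / std
        # The standardized series sums to zero, so the second part sums to -cum.
        t0 = max(t0, cum * cum / k + cum * cum / (n - k))
    return t0


def ar1_series(n, phi, rng):
    """Homogeneous unit-variance AR(1) series; phi = 0 gives white noise."""
    x = rng.gauss(0.0, 1.0)
    s = (1.0 - phi * phi) ** 0.5
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + s * rng.gauss(0.0, 1.0)
    return out


def false_alarm_rate(n, phi, critical, runs, rng):
    """Fraction of homogeneous series in which a shift is wrongly detected."""
    hits = sum(snht_t0(ar1_series(n, phi, rng)) > critical for _ in range(runs))
    return hits / runs


rng = random.Random(42)
n = 50
# Empirical 95th percentile of T0 under white noise (calibration runs).
calib = sorted(snht_t0(ar1_series(n, 0.0, rng)) for _ in range(2000))
critical = calib[int(0.95 * len(calib))]

rate_white = false_alarm_rate(n, 0.0, critical, 2000, rng)
rate_persistent = false_alarm_rate(n, 0.5, critical, 2000, rng)
print(f"critical value: {critical:.2f}")
print(f"false alarm rate, white noise: {rate_white:.3f}")
print(f"false alarm rate, AR(1) phi=0.5: {rate_persistent:.3f}")
```

By construction the white-noise rate should stay near 5%, while the autocorrelated series are expected to exceed it, which is the qualitative behaviour described in conclusion a).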
Some further conclusions can be drawn concerning the application of the
SNHT to temperature data and of the Double Mass Curve to precipitation data.
For the temperature data (see Figures 1 and 2) the results suggest that:
a) For a Hurst coefficient H ≥ 0.85, common shifts located by all (minimum number 3)
reference time series correspond to a percentage of series with false alarms higher than 5%
and tend to indicate a real inhomogeneity (e.g. systems 2/2 and 3/3 in Figure 1).
b) In the case of a common shift identified by some of the reference series only, this may
only correspond to a false alarm (e.g. Figure 1). In such cases a possible inhomogeneity
must be confirmed by analysis of the reference time series.
c) The cross-correlation coefficient between reference and candidate series does not seem to
influence the percentage of time series with false alarms.
d) The percentage of series with false alarms for time series with length 50 years was lower
than the percentage for 100 years (Figure 1).
e) The minimum distance between possible inhomogeneities, or between inhomogeneities and
the edge of the series, influences the percentage of series with false alarms, though not
greatly. A minimum distance of 10 years leads to a lower percentage than a minimum
distance of 5 years (Figure 1).
f) For the case of a single reference series, corrections of the located shifts were applied.
After these corrections, the percentage of series with an increased trend was similar to the
percentage with a decreased trend. Therefore it seems that the SNHT does not introduce
significant changes in the temperature trends.
g) For H < 0.65 the percentage of series with an increased Hurst coefficient after
homogenization is similar to that of series with a decreased Hurst coefficient. For
H > 0.65 the picture is different: the percentage of series with an increased Hurst
coefficient exceeds that of series with a decreased one, and the difference grows with the
initial Hurst coefficient of the time series.
For the precipitation data (see Figures 3 and 4) the results suggest that:
a) For all values of the Hurst coefficient examined, the percentage of series with false alarms
decreases with the increase of the ratio of the slopes of the two lines of the Double Mass
Curve.
b) Application of the Double Mass Curve with a reference time series produced by three time
series (3/1) tends to decrease the percentage of false alarms in comparison to the
application of the method with a single reference series (1/1).
c) For the system 1/1 and all the parameters examined, a slope ratio of 1.5 corresponds to a
percentage of series with false alarms lower than 5%. For the system 3/1 the corresponding
ratio is 1.3. These values seem to be indicative of a real inhomogeneity.
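The slope ratio referred to above can be made concrete with a minimal Python sketch of an automated Double Mass Curve. It is an assumption-laden illustration, not the algorithm developed in the study: two independent least-squares lines are fitted on either side of every admissible break point, the break with the smallest total squared error is kept, and the ratio is taken as the larger slope over the smaller one. The names `double_mass_break` and `min_dist` are hypothetical.

```python
from itertools import accumulate


def _ols(x, y):
    """Least-squares line through the points (x, y); returns (slope, sse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    sse = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return slope, sse


def double_mass_break(candidate, reference, min_dist=5):
    """Return (k, slope_ratio) for the best two-segment fit.

    candidate, reference : annual precipitation totals of equal length
    min_dist             : minimum number of years on each side of the break
    k                    : number of years in the first segment
    """
    x = list(accumulate(reference))  # cumulative reference precipitation
    y = list(accumulate(candidate))  # cumulative candidate precipitation
    best = None
    for k in range(min_dist, len(x) - min_dist + 1):
        s1, e1 = _ols(x[:k], y[:k])
        s2, e2 = _ols(x[k:], y[k:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, k, s1, s2)
    _, k, s1, s2 = best
    return k, max(s1, s2) / min(s1, s2)
```

In this sketch a break would be reported as an inhomogeneity only when the returned ratio exceeds a chosen threshold, for example the values of 1.5 (system 1/1) and 1.3 (system 3/1) quoted in conclusion c).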