Reconstruction of late spring phenophases in Poland and their response to climate change , 1951 – 2014

Phenology is primarily seen as an indicator of the impacts of climate change. The strongest biological signal of climatic change is revealed by phenological data from the period after 1990. Unfortunately, the Polish nationwide network of phenological monitoring was terminated in 1992, and was only reactivated in 2005. Here, we attempt to reconstruct late spring phenophases of flowering of Syringa vulgaris L. and Aesculus hippocastanum L. across several sites in Poland from 1951 to 2014 using the GIS-based approach (if observations from neighboring stations were available) and multiple regression modeling with stepwise screening and bootstrap resampling. It was found that the air temperature and its indices explain over 60% of the variance, giving an accuracy of 3.0–3.4 days (mean absolute error) and correlation coefficients of 0.83 and 0.78 for lilac and horse chestnut, respectively. Altogether, both plant species showed a statistically significant advancement in the onset of flowering with an average rate of 1.7 days per decade. We also found that the final trend is the result of rapid acceleration of the increase in air temperature after the 1990s, while most of the trends for late spring were ambiguous before that period.


Introduction
Phenological changes are regarded as an important indicator of climate change.Studies conducted in Europe show the imprint of global warming in the functioning of physical and biological components of ecosystems, changing nature with its characteristic phenological cycles, and hence the great interest in phenological data as important proxies in contemporary climate research.The majority of studies have demonstrated an advance of phenological events as a response to global warming [1][2][3][4][5][6][7].However, only about 40% of the reported trends have proven to be statistically significant [7].
In many countries, high quality phenological data from plants have been successfully used in the reconstruction of long-term temperature time series, which are essential for assessing regional climate change in the past centuries [8][9][10][11][12].This kind of research is of great significance when long-time phenological observations are available (especially for a longer period of time compared to the dense network of instrumental observations).This situation is common in most countries, and is also confirmed in the case of Poland, where disorganized phenological observations started in the late nineteenth century.However, the oldest discovered local records of phenological observations in the Polish territories were found in incunabula from the fifteenth and sixteenth centuries, thus confirming a very long tradition of phenological cycle observation [13][14][15].
Up until today, this type of observations has been independently carried out in Poland by different institutions, mainly botanical gardens, agricultural and forestry departments, which perform them for their own research.Therefore, data collections are not standardized, and each of these institutions uses their own observation methodologies and different data formats [16,17].Although an attempt to integrate or assimilate these isolated networks on a national scale seems to have a very high scientific potential, it must be unequivocally stated that numerous errors might occur when using the most common techniques in order to homogenize the archived dataset.
Unfortunately, historical data often lack continuity and are full of serious unintentional errors (e.g., typographical).On the other hand, the lack of reliable metadata means that this kind of errors cannot be corrected without the development and application of a very specific methodology [18].Therefore, assessing the possibility of data reconstruction in plant phenology, and thus improving the quality of this dataset, is of utmost importance in determining how climate change affects living organisms.
In this study, we attempted to reconstruct late spring phenophases on the example of the flowering of Syringa vulgaris and Aesculus hippocastanum, using the multiple linear regression (MLR) approach.To assess the robustness of regression techniques, which are the most commonly used in studies of similar nature (e.g., [19,20]), the researchers decided to focus on relatively short , but nonetheless more reliable phenological observations.This solution was aimed at obtaining a more accurate, complete and homogenous dataset, which might be more reliable for modeling purposes.

Phenological data
In this study, we used phenological observations carried out by the Polish Institute of Meteorology and Water Management -National Research Institute (IMGW-PIB).It is the only institution which currently conducts nation-wide integrated phenological monitoring with a homogenous methodology of observations.We selected 12 stations which are evenly distributed throughout the country (Fig. 1), covering different climatic regions of Poland.Syringa vulgaris L. and Aesculus hippocastanum L. flowering observations were chosen as the indicator species of the phenological late spring season, which is characterized by a relatively small variation in the onset dates [21].That may suggest a distinguishable physical signal affecting the physiological reaction of both species, and might therefore be suitable for finding robust predictors applied for regression models.In the case of the selected species, the beginning of flowering was recorded as the day when approximately 10% of the first flowers open.
Although our intention was to use the longest possible time series, the historical background of phenological network in Poland after World War II needs to be highlighted.The first version of the modern phenological network became operational in Poland in the years 1951-1992.After a long 13year break, it was partly reactivated in 2005 on ca.40 stations by both professional and voluntary observers.Observations were carried out on selected indicator species in the same neighborhood as previously, even though the very specific area of observations was not precisely delineated (in a spatial sense).Therefore, the chosen observational site was characteristic in terms of climate, soil, topography, and had average conditions for the growth and development of indicator species, which were represented by numerous individuals.
In 2007, an administrative decision was made to include phenological observations in an extended program of meteorological observations run by the IMGW-PIB.Since then, phenological observations have been carried out according to the BBCH-scale (abbr.from German: "Biologische Bundesanstalt, Bundessortenamt und Chemische Industrie"), which was implemented in observation procedures in most European countries with similar growth stages of plant species [22].And so, the 2005 reactivated observational network was completely abandoned by 2007 (or up to 2008 depending on the station), and moved to new locations of the main meteorological stations.It means that the newly established (from 2008 onwards) phenological network, with observations carried out by professional observers, does not cover the archival one in a spatial sense.To fill the gap, the procedures described below were applied in terms of reconstructing the missing data.

Meteorological data
In most cases, the 12 selected phenological stations do not have a complete and quality-controlled archived dataset for air temperature.On the other hand, air temperature and its indices are commonly identified as a reliable proxy for determining phenological changes [21,[23][24][25][26]. Therefore, to ensure reproducibility, researchers have decided to use the nearest node of a freely available ECA&D [27].The ECA&D database is a collection of daily meteorological parameters derived from observations obtained at the stations, assimilated and interpolated to a regular grid of 25 × 25 km.Using gridded instead of raw weather data allows one to proceed with all the calculations independently of the available meteorological data.A daily dataset was next applied to compute the monthly mean air temperature and accumulated growing degree days (GDD) according to the formula given in Eq. 1, which uses the threshold values of 0.0°C, 2.5°C, 5.0°C, 7.5°C, and 10°C.Accumulated GDDs were calculated by adding the specific daily GDD values starting from the beginning of the year.If the daily maximum temperature was not higher than the threshold value, then the GDD value did not change [28].
where: T max -daily maximum temperature T min -daily minimum temperature T threshold -threshold temperature.

Database reconstruction 2008-2014 -kriging with external drift
The phenological network reactivated in the years 2005-2007 was usually run in the same locations as the one in operation until 1992.However, very short time series (3 years only) are problematic in terms of a reliable detection of climate change signal and its impact on the onset date of the late spring phenological season.To fulfill the gap for both phenophases, it was decided to apply GIS algorithms for spatial interpolation.The idea is based on extracting values for the locations of historical stations using interpolation results derived from phenological observations carried out in the IMGW-PIB meteorological stations in the years 2008-2014.To incorporate altitude corrections, we applied the kriging with external drift method using the digital elevation model as an explanatory variable in the linear regression model.A similar approach was adopted by Ustrnul and Czekierda [29], Czernecki and Miętus [30], Szymanowski and Kryza [31] showing good accuracy in terms of recognizing spatial patterns and seasonal dynamics in Poland.

Database reconstruction 1993-2004 -multiple linear regression model
Since no phenological data are present in the period 1993-2004, researchers decided to use the MLR modeling technique, very common in phenological studies, in order to reconstruct these time series [19,20].In this study, it was decided to follow the advice for model building provided by Kuhn and incorporated into the caret R package dedicated to statistical model training and tuning [32,33].First of all, pre-processing procedures were applied, especially in terms of removing outliers which may commonly occur in phenological archives.This step was done with special care mainly by taking into account some ideas suggested by Schaber and Badeck [18], e.g., removal of observation outliers in case of detecting noticeable differences with a simple regression model.Additionally, onset dates in the surrounding stations and differences in flowering dates of Syringa vulgaris and Aesculus hippocastanum were taken into account in doubtful cases.Finally, we manually corrected or excluded 27 observations of S. vulgaris (5% of the whole dataset) and 16 observations of A. hippocastanum (2.9%).
The number of possible predictors that might be applied in MLR means that the final selection is very important in terms of avoiding model overfitting.Keeping in mind Occam's razor principle, to avoid making more assumptions than is really necessary, it was decided to test extensively a relatively large set of available predictors including: (i) monthly mean GDDs from January to May, with a threshold of 0.0°C, 2.5°C, 5.0°C, 7.5°C, and 10°C; (ii) stations' coordinates and altitude; (iii) factorialtype stations, which may require additional corrections if stations represent a very specific kind of local features; (iv) monthly mean air temperatures from November of the previous year up to May.
In each case, datasets were pre-processed using the Yeo-Johnson transformation, which is similar to the popular Box-Cox model, but can accommodate predictors with negative values as well [34,35].Additionally, scaling (i.e., dividing by the standard deviation) and centering (i.e., subtracting the mean of the predictors' data) were applied to transfer raw datasets to a more robust distribution so as to fulfill the theoretical prerequisites for regression techniques (i.e., related to the Gaussian distribution).
The cross-validation procedure was based on the 10-fold cross-validation approach repeated 10 times using bootstrap resampling.After every single calculation of MLR, the Akaike information criteria (AIC) [36] were computed to select the best suited set of predictors and to estimate the final values of model's parameters.This strategy of using linear regression with stepwise selection makes it possible in both cases to exclude half of the possible predictors with redundant information.

Statistical measures
All calculations were carried out using the R programming language [37] and its dedicated packages: "gstat" [38] for incorporating GIS-based spatial interpolation methods and "caret" [35] for the needs of statistical modeling.Evaluation of the MLR's performance was verified against all available observational datasets using: (i) normalized standard deviation -to account for the magnitude of reproduced ranges [39], (ii) Pearson's (r) correlation coefficient -to inform about the level of agreement in reproducing a directional change of time series, (iii) mean absolute error (MAE) -to inform about the mean "distance" from the perfect prognosis, and (iv) root mean square error (RMSE) -to account for the square root of a prognosis' average squared distance from the observations [40].The mathematical background used for verification purposes is stated below in Eq. 2-Eq. 5.
Eq. 2 Eq. 3 where: x j -observational value x avg -the mean of observations y j -predicted value y avg -predicted mean.
The changing trends in the onset dates of the late spring season were examined using the slope coefficient of linear regression.Although in many similar studies significance is usually verified by the F-test statistics, in this particular case we decided to apply the Mann-Kendall non-parametric test [41,42] at the significance level of 1−α = 0.95.This test is recommended in cases where low sensitivity to abrupt breaks due to an inhomogeneous time series is needed.What is even more important is that the Mann-Kendall test does not require a Gaussian distribution of data, which is a common problem when analyzing long-term phenological data with a strong trend and skewness of distribution [43].

Phenology model evaluation
The created MLR models do not differ significantly in terms of a reliable reconstruction of the archived time series.Slightly better verification statistics were obtained for Syringa vulgaris than Aesculus hippocastanum.However, these regularities are not confirmed in every of the analyzed locations, which presumably suggests major variability in the quality of data, which is very common in phenological studies [44] and is usually strongly related with an observers' perception or the reaction of individual plants to external factors.
Since the predictors used for the model building were scaled and preprocessed, it is impossible to interpret literally the contribution of every single variable to the final result.However, some ideas, e.g., suggested by Gevrey et al. [45] were used to estimate the contribution of selected variables so as to determine the most important predictors.It turns out that accurate modeling of the late spring phenophase is most dependent on accumulated GDD indices >5°C in May and monthly mean air temperature in April.GDD indices for April are almost twice less important than those in May.Temporal autocorrelation weakens for all the preceding months, e.g., it is estimated that the monthly mean temperature in February keeps around four times less information compared to some of the calculated GDD indices in May.In both models, geographical location features were excluded from the MLR while applying step-wise screening with the AIC method.Nonetheless, this information was partly preserved as a factorial type of the data from every station, which was one of the most important ingredients in the final model's equation.Out of 12 analyzed stations, five for S. vulgaris and six for A. hippocastanum have significant p-value statistics.Therefore, the data from the analyzed stations confirm that late spring flowering events are not always linearly related to large-scale patterns of temperature.
These site-specific features for individual stations are clearly seen in the model's ability to reproduce the interannual variability.Historical dates of S. vulgaris and A. hippocastanum flowering are on average reproduced with a Pearson correlation of 0.83 and 0.78, respectively.The highest correlation for lilac was obtained in Zbików Duchnice (r = 0.91) and only slightly lower values were calculated for Olza (r = 0.91), Gorzów Śląski (r = 0.88), and Solec nad Wisłą (r = 0.87).The weakest correlation was For the horse chestnut, the statistics were slightly worse.The average MAE for all analyzed stations was 3.43, while RMSE was 4.42.The best reproducibility in terms of MAE was obtained in Solec nad Wisłą (2.53) and Węgorzewo (2.56), while the lowest was found in Dąbrowa Białostocka (4.09).For RMSE statistics, this range varies from 3.06 (Solec nad Wisłą) to 5.57 in Dąbrowa Białostocka.This station also has the lowest correlation values (r = 0.56), while the highest possibility to reconstruct temporal variability was found in the data series from Solec nad Wisłą (r = 0.90).
Since regression-based modeling techniques tend to underestimate variation in modeled time series and focus on reproducing mean values, some evaluation measure inaccuracies are the result of the under-representation of extreme flowering dates.This can be clearly seen in the histogram of residuals presented in Fig. 2, and later on in time series charts in Fig. 3 and Fig. 4. The model's accuracy in 50% of all cases predicted for lilac is in range of −2.5 to 2.3 days, while 90% of model residuals is in the range of −6.3 up to 6.4.All model predictions with a precision of ±5 days make up 84% of the results, even though some outliers of more than 2 or even up to 3 weeks (max.19.2 days) are possible as well.
These same regularities are found for the horse chestnut, even though in this case the histogram of model residuals has a slightly more flattened Gaussian distribution, which has a negative impact especially on the verification statistics in the most probable range (Fig. 2, Tab. 1).Despite this, 50% of all predicted values have residuals in a range of −2.7 to 2.8, while the maximum errors are −21.1 and 20.2 days.The percentiles 0.05 and 0.95 equal −6.8 and 7.6 days, respectively, so it can be roughly interpreted that almost 90% of cases are predicted with an accuracy better than ±1 week.
One of the disadvantages of using a simplistic regression approach is the variance inflation problem [40].Thus, the reconstructed time series were also characterized by the lowest variability relative to the observational dataset (Fig. 3 and Fig. 4).To estimate this rate of losing information, normalized standard deviations were calculated to make it possible to compare between plant species and particular locations.The mean value of standard deviation in terms of lilac is about 18.6% smaller compared to the original time series, while for horse chestnut it is 23.7%.After the recalculation into values given in days, it becomes respectively 1.35 and 1.73 days.

Trends of plant phenology in Poland since the 1950s
In this study, we compared the trends in phenological changes of selected species for a fully reconstructed time series.It means that for the period of 2008-2014, spatially interpolated values were taken into account, while for the period of 1993-2007 the missing data were filled in with reconstructions provided by the statistical modeling procedure.
The linear trends detected for the entire data frame of 1951-2014 (Fig. 3 and Fig. 4) clearly reveal the advancing of late spring phenophases in Poland regardless of the locations of the stations.For both selected plant species, the average trend of advancing phenological late spring (calculated as an average for lilac and horse chestnut) is estimated at −1.7 days per decade and this value varies from −2.4 in Koźla to −1.2 days per decade in Solec nad Wisłą and Mielnik (Tab. 2 and Tab.3).For both plant species, the linear trends in the years 1951-2014 do not vary significantly from the average for the area and do not show any clearly distinguishable regional pattern.
It is worth mentioning that all the detected long-term trends (1951-2014) are statistically significant according to the Mann-Kendall test for both S. vulgaris and A. hippocastanum.In the archival subperiods before the 1990s, most of the calculated trends were negative, but often close to zero, which made them statistically insignificant or significant only occasionally regardless of the analyzed plant species.A remarkable acceleration of late spring phenophase is clearly seen in the slope of linear regressions after the 1990s (Tab. 2 and Tab.3), which strongly affects the Kendall rank correlations and their p-values.
The estimated trend of 1.7 days per decade is also confirmed by differences in the frequencies of the seasons' starting dates in particular subperiods.Before the 1990s, the flowering of Aesculus hippocastanum in the second part of May was quite a common situation (ca.17-26% of all cases), whereas nowadays it is less than 5% of all observations.Similar regularities were found for lilac, which confirms that the detected trends in flowering onset dates have significant shifts in their distributions as well, further corroborating the results of a similar research for Central Europe [26].

Discussion and summary
The reconstruction of late spring phenophases using the MLR approach presented in this study seems to have a high application potential.The constructed models based on coupling phenophases with air temperature and its indices accounted for 60-66% of variance in Syringa vulgaris and Aesculus hippocastanum flowering dates.This result is compatible with other studies which focus on possibilities of reconstructing spring phenophases in Central Europe, also for longer time series [25].The mean accuracy of the model is 3.0-3.4days in terms of MAE values and 3.9-4.4for RMSE, while the mean Pearson correlation is 0.83 and 0.78 for lilac and horse chestnut, respectively (Tab.1).Over 90% of the model residuals are smaller than ±1 week, although phenological observations are in many cases strongly related with an individual species or biased by the individual perception of the observers [46][47][48], and thus some outliers of more than 2 weeks are occasionally possible.The reconstructed phenological time series have slightly smaller standard deviation values, which is typical of regressionbased modeling techniques.Nonetheless, variance inflation described by normalized standard deviation values is on average lower at only 18.6% for lilac and 23.7% for horse chestnut, which is equal to 1.35 and 1.73 days, respectively.All the results clearly show that even a relatively simplistic statistical model may well be a robust tool in reconstructing missing phenological time series.This issue is of special importance in Poland, where phenological observations were abandoned in 1992, with the reactivated network organized in new locations.Therefore, to account for different quality in historical datasets, and the fact that they were collected at different locations, the GIS-based kriging with external drift and statistical modeling techniques were applied.It was also possible to detect observational outliers and to homogenize the database according to some proposals found in literature [18].In the authors' opinion, the proposed statistical modeling procedure might also have a potential in investigating the impact of climate projections [49] on a particular plant species.However, it must be noted that this relation may not be linear in the foreseeable future (e.g., due to the increase in extreme weather events and their harmful impact on some species, etc.), which will require additional testing.
The reconstructed and quality-controlled time series of lilac and horse chestnut have made it possible to recognize shifts in flowering of these plant species in Poland while taking into account their longest possible time series.The length of a time series is crucial for detecting changes in phenological research [50,51], especially when taking into account the 1990s and the decades that followed as being the warmest in the history of instrumental observations [9,49].Phenological observations carried out after the 1990s reveal the strongest biological signal of climatic change [19,26], which is also confirmed by the slopes of linear regression obtained in this study.On average, in the years 1951-2014 the advancing of late spring phenophase is estimated to be about −1.7 days per decade and in all the analyzed cases this trend is statistically significant according to the Mann-Kendall test.
On the other hand, earlier periods (before 1990) were unambiguous when it comes to time and spatial trends in phenological changes (Tab. 2 and Tab.3).This agrees with some discontinuity found in the phenological trends of the late 1980s in many European areas, where almost no such trend was observed [50,52].Studies analyzing long-term phenological records often revealed a heterogeneous pattern of temporal variability with advanced and delayed periods [1,7,24].
All these regularities were also found in Poland in comparison to the phenological trends from the archival period (1951-1992) and the fully reconstructed time series .In the period 1951-1992, negative trends were recognized for Syringa and Aesculus in 67% of cases, and none of them were significant.In the 1951-2014 period, the ratios were quite different: 100% negative trends, and all were significant.This is a result of the rapid increase in air temperature from January to May after the 1990s (Fig. 5), which confirms the statements presented in another research concerning seasonal changes in the Polish climate [30].After the 1990s, all months have positive air temperature anomalies compared to the previous subperiods.As a result, slopes of flowering dates in the 1991-2014 period show acceleration that is twice as fast in comparison to the entire period of analysis (Tab. 2 and Tab.3).Therefore, the termination of the nationwide network in 1992 was extremely unfavorable for phenological and climate change research.In the 1971-1990 period, an increase in monthly air temperature was compensated by a decrease in air temperature for the month of April (Fig. 5), which simultaneously is highly correlated with the late spring phenophase and thus impacts ambiguous plant flowering trends in this period.
The results of the research presented above clearly show the effects of climate change and its seasonal aspects on late spring phenology.Huge differences between phenological trends studied before in the archival period (i.e., 1951-1990) by Jabłońska et al. [17] and the reconstructed one for the entire time series (1951-2014) revealed also the decisive importance of the length of a time series in the study of phenological changes.It also shows a possible direction for further research that needs to link a nationwide phenological research with climate change and fluctuations which might well be significantly affecting the detected long-term trends.Hence, any such studies would be of relevant importance in terms of the following aspects: ecological conservation, possible future scenarios, and plant adaptation strategies.
( ) Fig. 5 Air temperature anomalies and their seasonality in the selected stations since the 1950s.Anomalies for particular subperiods (1951-1970, 1971-1990, 1991-2014) as calculated against the entire period of analysis .

Fig. 1
Fig. 1 Locations of phenological stations used in the study in Poland.Grey dots denote the locations of phenological stations operational in the years 1951-1992.Blue dots denote the stations reactivated in the years 2005-2007.
(r = 0.69), and the same station also has the highest value of RMSE (4.95) as well as the second highest MAE value (3.62).In contrast to the aforementioned data, the best RMSE and MAE statistics were obtained in Zbików Duchnice (MAE = 2.25, RMSE = 3.03) and Gorzów Śląski (MAE = 2.29, RMSE = 3.10).On average, the MAE and RMSE values for lilac were 3.00 and 3.94, respectively.
Linear regression coefficients of Syringa vulgaris flowering dates, divided according to particular subperiods of 1951-2014.Values denote a change in days per year and are bolded if the trend coefficient is significant according to the Mann-Kendall test at 1−α = 0.95.Linear regression coefficients of Aesculus hippocastanum flowering dates, divided according to particular subperiods of 1951-2014.Values denote a change in days per year and are bolded if the trend coefficient is significant according to the Mann-Kendall test at 1−α = 0.95.