Phylogeny of beech in western Eurasia as inferred by approximate Bayesian computation

The Fagus sylvatica L. species complex in Europe and Western Asia comprises two commonly recognized subspecies, F. sylvatica subsp. sylvatica [= F. sylvatica sensu stricto (s. str.)] and F. sylvatica subsp. orientalis (= F. orientalis), and two putatively hybridogenous or intermediate taxa, “F. moesiaca” and “F. taurica”. The present study aimed to examine the demographic history of this species complex using 12 allelic loci of nine allozymes scored in 279 beech populations in western Eurasia. Three sets of phylogenetic scenarios were tested by approximate Bayesian computation: one dealing with the divergence of subspecies and/or regional populations within the whole taxonomical complex, and two others focusing on the potential hybrid origin of “F. moesiaca” and “F. taurica”. The best-supported scenario within the first set placed the time of divergence of regional populations of F. orientalis in the Early Pleistocene (1.18–1.87 My BP). According to this scenario, the Iranian population was the ancestral lineage, whereas F. sylvatica s. str. was the lineage that diverged most recently. “Fagus taurica” was found to have originated from hybridization between the Caucasian population of F. orientalis and F. sylvatica s. str. at 144 ky BP. In contrast, there was no evidence of a hybrid origin of “F. moesiaca”. The bestsupported scenario suggested that the Balkan lineage is a part of F. sylvatica s. str., which diverged early from F. orientalis in Asia Minor (817 ky BP), while both the Italian and Central-European lineages diverged from the Balkan one later, at the beginning of the last (Weichselian) glacial period.


Introduction
Beech trees represent the most important component of montane forests in the temperate zones of Europe and Western Asia.Beech forests with the admixture of local spruce and fir species are the most productive and stable native forest communities in the mountains of Central Europe, as well as the Near and Middle East.As such, various taxa of the genus Fagus L. have long been the focus of the interest of botanists.However, the number and rank of Fagus taxa in western Eurasia have been controversial, and their phylogeny is still not resolved.The state-of-the-art view, accepted by Flora Europaea [1] and recent systematic revisions [2,3], considers all beech populations in western Eurasia as belonging to a single species, Fagus sylvatica L., with two subspecies: F. s. subsp. in most of the species' European distribution [denoted hereafter as F. sylvatica sensu stricto (s.str.) for simplicity] and F. s. subsp.orientalis (denoted hereafter as F. orientalis), whose range begins in the southeastern Balkan Peninsula (Black Sea coast of Bulgaria and northeastern Greece) and continues through Turkey and the Caucasus to the Alborz Mountains in Iran.However, the latter taxon was described by Lipsky [4] as a separate species, F. orientalis Lipsky, and this opinion is generally shared by local botanists [5,6] and forestry practitioners [7].Palibin [8] distinguished beech in the Caucasus and the Alborz Mountains as representing a separate species, F. hohenackeriana, but this view has not been generally accepted, and Greuter and Burdet [9] included it within their F. sylvatica subsp.orientalis.Two other taxa are mostly considered to be intermediates between the above subspecies: "F.taurica" Popl.distributed in the mountains of the Crimean peninsula, and "F.moesiaca" (Domin, Maly) Czeczott of the southern Balkans.Both taxa were originally described as separate species ( [10] and [11], respectively), but again views about their taxonomical statuses are quite controversial, and they are sometimes considered to represent lineages that originated from hybridization between the subspecies [4,8,12,13].The subspecies also differ in the patterns of their geographical distributions: in Europe, the range of beech is almost continuous, except in peripheral regions like the Cantabrian range, Apennine Peninsula, or southern Balkans, whereas the range of oriental beech is much more fragmented into several large populations that are relatively isolated from each other (northern Turkey, the Amanus Mountains, the Caucasus, and the Alborz) [14,15].
Knowledge of the phylogeny of F. sylvatica is also limited.Complex studies based on internal transcribed spacer (ITS) ribosomal DNA sequences and morphology of this taxon plus related Asian and North American Fagus species were carried out by Denk et al. [16,17].Thorough insights into beech phylogeny based on the fossil record, biogeography, and geological history were provided in a study by Denk and Grimm [18].Fagus sylvatica was represented in all three of these studies, but they did not focus specifically on this species.Generally, phylogenetic studies within the genus Fagus have included a broad variety of species covering the whole range of the genus, but they typically lacked the detail necessary to reveal relationships among subspecific taxa or regional populations within F. sylvatica s. l. [19][20][21].
Marker-based studies focusing on beech in western Eurasia, including those based on the same dataset as the current study [22,23] (and see below), usually inferred the evolutionary past of beech populations from the results of clustering methods, which were either distance-based or based on simulations (mostly Bayesian Monte Carlo-Markov chain approaches, such as those of Pritchard et al. [24] or Corander et al. [25]).Recently, more direct and more powerful tools based on coalescent theory have been developed, such as those using approximate Bayesian computation (ABC), which allow simultaneous estimation of the times of divergence of genetic lineages and their effective population sizes [26,27].In this study, we applied the ABC approach to verify the conclusions about the phylogenetic relationships among beech taxa and regional populations from earlier studies [22,23], and to make inferences about the past demography of beech in western Eurasia.
Interpretation of zymograms and genotyping followed Merzeau et al. [28] and Müller-Starck and Starke [29].Alleles at each locus were identified by the relative migration rates of their respective protein fractions, expressed as a percent of the migration rate of the most frequent allele of that gene in the Central-and Western-European beech populations.
Speciation scenarios were tested by applying ABC analyses.However, simulations based on the complete dataset consisting of 15,194 genotypes would last excessively long (estimated duration in the range of several weeks per simulation), so the dataset had to be trimmed in such a way that a certain number of randomly chosen trees per population were retained.In the dataset of F. sylvatica s. str., trimming was the strongest, and only two trees per population were retained except for populations that originated from minor glacial refugia and were underrepresented in the collection, specifically the Apennine Peninsula (all samples retained) and "F.moesiaca" (southern Balkans; 14 samples per population).For F. orientalis, 24 samples per population were retained from the Alborz and Asia Minor regions, 28 from the Caucasus, and 27 of "F.taurica" (Crimea).In this way, the total sample size was reduced to 2,659 trees (resulting in a reduction of computing time to few days per simulation), while all studied taxa and/ or regional populations were represented by comparable sample sizes.
The computations were performed in DIYABC v. 2.05 [30].In all runs, 100,000 datasets were simulated for each scenario.As we have no information on the distribution of mutation rates of allozyme loci in Fagus, these were set to vary between 10 −7 and 10 −5 , which is analogous to observed values in other organisms [31,32].The 1% of the simulated datasets that were the most similar to the observed data were used to estimate relative posterior probabilities.Simulations applied the generalized stepwise mutation (GSM) model.No prior assumptions were made of divergence times.As the ABC algorithms use the number of generations to estimate the divergence time, a generation turnover time of 100 years was chosen for the conversion of generations into a time scale of years.Isolated beech trees may start flowering at 15-20 years of age, and trees in a canopy at approximately 50 years [14,33]; however, the average age of offspring-producing trees is certainly higher in a natural forest ecosystem.At the local scale, abundant regeneration repeatedly occurs every 100-120 years, when the first gaps occur in the closed canopy after a stage of one-layer stand associated with the culmination of the standing stock [34].The lower limit of this interval was taken as the estimated generation time, as the earliest survivors are likely to develop the largest dimensions and thus contribute the most through both male and female gametes to the gene pool of the offspring.
Model checking was done by ranking the summary statistics of the observed data (mean number of alleles and mean heterozygosity across loci, FST) against the posterior predictive distributions, as well as by principal component analysis (PCA), which placed the observed data onto the cloud of datasets simulated with the prior distributions of parameters and datasets from the posterior predictive distribution.
The scenarios tested are shown in Fig. 1, and they are displayed geographically in Fig. S1.The first set of scenarios focused on the phylogeny within F. sylvatica s. l., namely on the divergence of regional populations within F. orientalis (Asia Minor, Caucasus, and Alborz) and the potential paraphyly of F. sylvatica s. str.In the second round of simulations, we tested whether Crimean beech ("F.taurica") is a lineage within F. orientalis or a taxon that originated from hybridization between F. sylvatica s. str.and F. orientalis.Finally, the last set of scenarios addressed the potential hybrid origin of Balkan beech ("F.moesiaca") and the phylogenetic position of beech populations in the Apennine Peninsula.Only scenarios with realistic biogeographic settings were considered.

Results
Allelic variation within F. sylvatica s. l.
were shared between them.In the cases when an allele was absent in one subspecies, it was found at very low frequency in the other one (always <0.2%).In multiallelic loci, relative migration rates of the respective protein fractions mostly changed by regular steps.Moreover, the more the migration rate differed from that of the most common protein fraction, the lower the respective allele frequency was, with a few exceptions (Mdh-A, Mdh-B, and partly Skdh-A).

Phylogeny simulations
ABC simulations were done under three sets of scenarios.The performance of simulations was quite good, as the observed data were placed fairly well in the center of the point clouds produced from simulated datasets in the case of all three selected scenarios on PCA plots (Fig. S2), and (with two exceptions) summary statistics were always within the interval of 5% to 95% of simulated values (Appendix S1).
The parameter estimates for the most probable simulated scenarios are presented in Tab. 2. In most simulations, the posterior estimates of the mutation rate were around 6 × 10 −6 .
The first set of simulations comprised four scenarios, successively testing F. sylvatica s. str.and each of the major regional populations of F. orientalis as the ancestral lineage from which the others diverged.Scenario 3, in which the Alborz population of beech was treated as the ancestral one, received the best support (posterior probability of 0.63).However, the probability of Scenario 1, supporting the current taxonomical classification of the two subspecies, was also relatively high (0.22), which means that the odds for Scenario 3 compared to this one were approximately 3:1.Divergence times for all major regional populations under Scenario 3 ranged between 11,800 and 18,700 generations, placing the divergence events in the Early or Middle Pleistocene.Effective population size estimates for the Caucasus and Asia Minor populations were approximately threefold of those from the Alborz and for F. sylvatica s. str.
Simulations in the second round of scenarios rendered Crimean beech a product of an ancient hybridization between F. sylvatica s. str.and a regional population of F. orientalis.The geographically more plausible Scenario 4, wherein the second parental population was that in the Caucasus, received higher support (posterior probability of 0.69), but again the probability of Scenario 3 (hybridization with the beech population in Asia Minor) was also relatively high (0.25).The timing of hybridization in Scenario 4 was quite late, 1,440 generations, meaning that the event would have occurred at the beginning of the Holocene, but since the upper limit of the 95% confidence interval was high, 6,400 generations, there was much uncertainty in this estimate.The genetic contribution of F. sylvatica s. str. to the gene pools of Crimean beech in Scenario 4 was 30%.
In the last set of simulations, focusing on the origin of Balkan beech, the odds in favor of the best scenario of successive splits were from 16:1 to 74:1 compared to scenarios involving hybridization, so only this scenario was taken into consideration.A hybrid origin of Balkan beech was thus ultimately not supported, and it seemed to rather represent Tab. 1 Allelic frequencies in subspecies of Fagus sylvatica s. l. across all regional populations.Alleles are labeled by their relative migration rates.

F. orientalis
Per

Suitability of the stepwise mutation model for allozymes
The choice of an appropriate mutation model is an essential issue in model-based phylogeny reconstruction.For protein markers such as allozymes, the infinite allele model of Kimura and Crow [35] is mostly considered suitable.Under this model, each mutation is expected to produce a new allele, while all alleles are considered equally different from each other.However, we used a different model from this, which assumed stepwise mutations, i.e., alleles are ranked by some property of themselves or their products and a mutation can only change a particular allelic state into either of the two possible adjacent states.
Stepwise models of mutations are generally preferred in studies employing repetitive sequences, such as minisatellites or microsatellites, among which mutations mostly consist in the addition or deletion of a short sequence or motif and alleles thus differ in their total sequence length (i.e., number of repetitions).
However, the stepwise mutation model was originally developed for electrophoretically detectable variants differing in electric charge [36,37].Brown et al. [38] found good support for the occurrence of stepwise manner of mutations in the available experimental allozyme data.In our case, the relative migration rates of protein fractions produced by different isozyme alleles differed at quite regular intervals.Even though the migration rate is not necessarily a linear function of charge and depends also on the shape and size of a protein molecule, such regularity is a strong indication of stepwise changes in the net charge [39].Moreover, the frequency distributions of protein fractions are unimodal for most allozyme loci, which also suggests that mutations mostly change protein charge by one unit and backward mutations may occur.Taking this into account, we consider the mutation model used in our simulations to be untraditional but adequate.

Phylogeny of beech in western Eurasia
Formulating a priori hypotheses about the phylogeny of F. sylvatica s. l. is not easy: molecular studies have mostly covered only a part of the range of F. orientalis [40,41], and those based on more representative sampling did not yield a clear picture of the relationships among genetic lineages within the species (e.g., the ITS study of Denk et al. [17]).
In earlier studies [22,23] based on the material examined in this one, a much more detailed view of beech phylogeny in terms of the number (24) of small regional populations sampled was attained.However, such a detailed subdivision would result in an enormous number of potential evolutionary scenarios with equal or comparable posterior probabilities.Therefore, the small regions were merged into four geographic areas in the present study: Europe (i.e., F. sylvatica s. str.), Asia Minor (including the Ponthic range, Amanus Mountains, and the southeastern tip of the Balkan Peninsula), the Caucasus (including the Greater and the Lesser Caucasus and the Colchis), and the Alborz.
The results of ABC simulations in this study did not differ substantially from the conclusions of earlier studies derived from distance-based (neighbor-joining tree, reticulogram) and model-based (Bayesian) analyses; however, they were not identical to these either.The Structure analysis [24] yielded two clusters within F. sylvatica s. l.: one corresponding to F. sylvatica s. str., and the other (a more homogeneous one) to F. orientalis, with the gene pools of "F.moesiaca" and "F.taurica" each containing a mixture of samples from both of these clusters [22].In terms of the evolutionary scenarios tested herein by ABC, this corresponds to an early split of the F. sylvatica s. str.clade from some of the regional populations of F. orientalis, and is also in accordance with morphology-based relationships previously determined among taxa and regional populations [17].ABC support for such a scenario in this study was not negligible (posterior probability of 0.22), but was still much lower than the support for the scenario in which the easternmost beech population in Iran was the ancestral one.The best-supported scenario placed the divergence of regional populations in the Early Pleistocene, with F. sylvatica s. str.being the latest-diverging lineage.This timing corresponds with the history of beech in western Eurasia as reflected in the fossil record [18].This would mean that barriers to gene flow (probably due to range fragmentation associated with climate fluctuations) appeared earlier within the Asian range of beech than those between Asia Minor and Europe arose.Fossils reveal a permanent presence of the genus Fagus in Europe and Western Asia since at least the Oligocene [42], and a wide distribution during the Late Tertiary [16,18,43,44].These fossils have been described using many different names, but those that were suggested to be linked with F. sylvatica s. l. can probably all be subsumed under the name F. haidingeri Kováts, which was documented in the Miocene and Pliocene of Europe and Georgia [18].Although the distribution of F. sylvatica-like fossils is not completely identical with the current range of the extant species, their number and geographic coverage suggest that the range of beech in the Pliocene was quite continuous.Fragmentation, which began with the onset of Pleistocene climatic fluctuations, may have disrupted gene flow among regional "islands" and promoted genetic differentiation, while later reestablished contacts during Pleistocene warm phases [16] were too short to homogenize the gene pools again.Such a course of events is compatible with both scenarios tested for the F. sylvatica s. l. complex in this study.Alternatively, beech may have effectively disappeared from Europe at the end of the Pliocene, with a subsequent recolonization of Europe from Asia Minor, while remnant populations surviving in Europe until the Pleistocene (if any) were overlaid by this recolonization event.
In the case of Crimean beech, ABC simulations confirmed a hybrid origin of "F.taurica", with F. sylvatica s. str.as one of the parental lineages and the Caucasus/Colchis regional population of F. orientalis as the second one.Alternatively, beech in Asia Minor may have participated in the hybridization, as the posterior probability of this scenario (0.25) is too large for it to be ignored.The timing of hybridization (mode of 144 ky BP) was placed at the end of the Saalian glacial period, but the confidence interval is broad enough to include the whole Eemian interglacial (130 to 114 ky BP), and is thus not unrealistic.Beech was certainly present in Crimea during the Pleistocene [45,46], and it was also present (although not dominant) in the forests on the other side of the current Kerch Strait [47].A contact between the Crimean and the Caucasian population is thus easy to imagine.The more problematic aspect is a contact with Balkan populations of F. sylvatica s. str.Fagus pollen was discovered in Late Pleistocene sediments along the northern Black Sea coast [48], but a continuous presence of beech in this area is not sufficiently documented.A certain affinity of Moldovan beech populations to the Crimean ones [23,49] may support the existence of such contact, but this would require a permanent presence of beech in southern Romania throughout the Weichselian glacial period, which cannot be excluded but to date has not been corroborated by the fossil record (see [50]).No final conclusions can thus be made concerning the plausibility of a hybrid origin of Crimean beech.
In contrast, there is no support for a hybrid origin of Balkan beech.The simple divergence scenario without reticulations received the best support (posterior probability of 0.92), and the odds against the next-best scenario were 16:1.This scenario again suggests an early divergence of F. sylvatica s. str.from the Turkish lineage of F. orientalis occurring during the early Pleistocene (the mode of 817 ky BP falls into the Cromerian interglacial).On the other hand, the divergence of both the Italian and Central-European lineages from the Balkan one was suggested to occur at the beginning of the last (Weichselian) glacial.In light of the Quaternary history of vegetation in Europe, such a course of events is plausible: the range of beech expanded and contracted along with glacial/interglacial cycles, with a maximum extension during the Holsteinian interglacial [51], and such fluctuations may have contributed to recurrent contacts among local populations and thus have maintained the integrity of gene pools.The expansion during the Eemian was limited to southern Europe, and the onset of the Weichselian glacial may have led to range fragmentation, interruption of gene flow among refugial areas, and, consequently, formation of the present-day genetic lineages.This view was also supported by the discovery of pollen, the chloroplast haplotype of which was identical to that of the Balkan lineage ("F.moesiaca"), in the Venice lagoon from the last pleniglacial [52]; apparently, the formation of the present-day lineages within F. sylvatica s. str.may have begun quite late.

Conclusions
The number of scenarios tested in this study was limited, but not primarily for technical reasons: we chose only those that make sense in terms of biogeography, and which were supported by fossil evidence.For instance, the splitting of the European lineage from the Caucasian one was not considered, as it would suggest the migration of beech into Europe along the northern Black Sea coast, and there is not enough fossil evidence to support such a scenario.Taking this limitation into account, this study partly corroborated earlier views on the phylogeny of beech in the western Eurasian area.ABC simulations confirmed that F. sylvatica subsp.orientalis is a paraphyletic subspecies, as it excludes F. s. subsp.sylvatica, which diverged from the Asia-Minor lineage.A hybrid origin was confirmed for one transitional taxon (Crimean "F.taurica"), but not for the other (Balkan "F.moesiaca").
The applicability of the stepwise mutation model to allozyme data is a crucial point in judging whether the above conclusions are realistic.Posterior model checking indicated that both the model and the choice of prior parameters were appropriate for these data in the present study.It seems that the mutation rates of allozymes fit well within the time frame of the processes under study herein.This does not imply that simulations using this model on longer temporal scales (e.g., phylogeny of the family or the genus as a whole) or shorter scales (e.g., postglacial migration) would be equally successful.Nevertheless, since a broad-scale application of allozyme analysis in the 1970s, plenty of data have been gathered in many plant taxa.Our study suggests that a reanalysis of these data, employing new methodological approaches such as ABC and revision of earlier views of phylogeny, may be meaningful.
Posterior estimates of the parameters of the demographic inference based on the approximate Bayesian computation for the best-supported scenarios in different constellations of taxa and regional populations of Fagus sylvatica s. l.
* Parameters of the alternative scenario.t -timing of the event (number of generations before present); µ -mutation rate.