Heterogeneous patterns of genetic variation at nuclear genes and quantitative traits in a Scots pine provenance trial

We studied genetic variation at a set of nuclear genes in 16 Scots pine populations from a 50-year-old provenance trial in Poland. At the same loci, the pattern of genetic variation was compared with that of several reference populations along a latitudinal gradient in Northern and Central Europe. Levels of nucleotide diversity in the defined groups of Polish populations representing three climatic regions (π total = 0.0040–0.0051) were similar to those in the reference samples (π total = 0.0054–0.0058). Polish populations showed minor but heterogeneous patterns of genetic variation between regional groups (FST up to 6%), which were caused by differentiation at specific loci. When outlier loci were excluded from between-group comparisons, there were no differences between the Polish populations. Loci related to glycosyltransferase and laccase were identified as outliers, and their variation was correlated with phenotypic differentiation using mixed linear models. Moreover, these genes were also found to be potentially under selection across the Scots pine distribution range, as their patterns of nucleotide variation correlated with the latitude and altitude of the maternal stands. The provenance trial has been characterized for a set of growth and developmental traits over 50 years and forms a suitable experimental system for detailed genetic studies.


Introduction
Provenance trial experiments, in which samples from distinct geographical locations are grown under similar climatic conditions, provide useful experimental designs for studying patterns of phenotypic and genetic variation in forest trees. Genetic variation within species is influenced by population history and by the evolutionary processes that drive adaptation [1]. Natural selection changes allele frequencies through various mechanisms that can cause genetic diversity or population differentiation at specific genomic regions to deviate from background variation [2]. Identifying genomic regions and genes under selection is important for a better understanding of variation in phenotypic and adaptive traits among populations from different environments. Because all populations in a provenance trial grow under uniform environmental conditions, such trials are well suited for testing neutral and adaptive genetic variation.
The great adaptive potential of Scots pine (Pinus sylvestris L.) enables the species to grow in a wide range of natural environments. Differentiation has been observed in morphological, physiological, and ecological traits across the species' geographical distribution. At the same time, this large phenotypic differentiation has been associated with generally low genetic diversity at neutral DNA markers across the continuous European range of Scots pine [3,4]. Only some refugial populations, which have served as reservoirs of genetic variation since the last Ice Age and are found in the Balkans, the Iberian Peninsula, and isolated stands of the Apennine Peninsula [5,6], show slightly higher levels of heterozygosity and different haplotype diversity patterns compared with more northern areas of the species range [7,8]. The colonization of Europe by Scots pine could have generated clinal patterns of allele frequencies [9,10]. However, in wind-pollinated forest tree species, gene flow strongly homogenizes allele frequencies over large geographical areas [11,12], and such species generally show weak genetic structure [3,7,13]. Gene flow also stabilizes the level of genetic diversity within populations [14]. Additionally, the patterns of genetic diversity in Scots pine could have been affected by the trade of reforestation material from European forests in the nineteenth and twentieth centuries [6,15].
Because of the high ecological and economic importance of Scots pine, different populations have been extensively studied for breeding value (e.g., [16][17][18][19]) and to determine the influence of forest management on genetic diversity [20][21][22][23]. High phenotypic variation between populations, usually accompanied by low genetic differentiation at neutral markers, indicates that some genomic regions play a role in the development of adaptive traits. Most characters evaluated in provenance trials and seed orchard experiments are complex quantitative traits, such as biomass production, wood quality, and responses to biotic and abiotic stress [24]. Recently, genetic studies have been used to identify associations between these traits and genomic regions or individual loci [25]. In conifers, studies have focused on quantitative trait loci (QTL) [26][27][28][29][30][31][32][33] and on identifying variation in genomic regions potentially related to phenotypic differentiation in forest trees [32][34][35][36]. However, even though some associations with temperature gradients, photoperiod, or water availability were found [26][27][28][29][30][31][32][33], the variation patterns at individual genes could not always be validated in populations from different locations [37,38]. So far, association studies in forest trees have identified some SNPs in genes related to the timing of bud set that show latitudinal clines in allele frequency (e.g., [39]). Some variants in candidate genes have been associated with cold hardiness [40], growth cessation [41], and tree height [42]. Some evidence for selection in Scots pine has also been found in several gene fragments (e.g., [1,2]).
In the current study, genetic variation was analyzed at a set of nuclear gene loci in Scots pine populations from different climatic zones in Poland that were included in a provenance trial experiment. The trial characterized phenotypic traits for growth performance and wood quality and showed significant variation between populations [43]. The differentiation of quantitative traits was accompanied by a relatively homogeneous structure of these populations at a set of neutral microsatellite loci [44,45]. Using this experimental design, we assessed whether patterns of gene variability are similar to those identified at neutral loci and whether any genetic variation departs from neutrality. Furthermore, we used the nucleotide diversity data and the phenotypic differentiation of the studied pine populations to look for correlations between genetic variation and the differentiation of selected phenotypic traits. We compared the patterns of genetic variation in the Polish populations with those of other populations across a broad latitudinal gradient in Europe to determine whether some loci showed deviations from neutrality in the provenance trial and the reference populations.

Study locations, DNA extraction and amplification
Sixteen Scots pine populations from a species provenance trial on an experimental plot in the Carpathian Mountains were used in the study. The provenance trial was established in 1966 with 1-year-old seedlings grown from seeds collected in natural populations in three climatic zones within the Polish distribution range of Scots pine (Tab. 1, Fig. 1). The climatic zones of the parental populations were defined using the length of the vegetation period and meteorological observations from 1931 to 1960 [43,46]. The detailed characteristics of the experimental plot and seed sources were presented in an earlier study [43].
Samples for genetic analyses were collected from randomly selected individuals in each population. In total, 192 samples from the provenance trial, comprising 12 different trees from each population, were analyzed at the molecular level (Tab. 1). Nucleotide and haplotype variation of the Polish pine populations was determined for phenotypically diverse groups of samples within the three climatic zones. This grouping also ensured that similar numbers of individuals were compared in each group.
The reference populations used to compare genetic variation at the candidate genes included stands from Northern and Central Europe (Tab. 1, Fig. 1). Genomic DNA of the provenance trial samples was extracted from foliage using the DNeasy Plant Mini Kit (Qiagen). Fifty-five reference samples were sequenced from megagametophytes, a haploid nutritive tissue that surrounds the embryo in a mature seed, collected from 10 trees in most of the populations (Tab. 1). The selected nuclear genes, which were identified in Scots pine expression studies, are considered potentially important for the species' adaptive variation and phenotypic differentiation. The eight gene fragments used for sequencing included loci related to cellular metabolism, transport, signal transduction, and transcription regulation (Tab. S1). PCR amplification was performed on Thermo MBS thermal cyclers in 15-µL reactions containing about 15 ng of haploid template DNA, 10 µM of dNTP, 0.2 µM of forward primer, 0.2 µM of reverse primer, 0.15 U of Taq DNA polymerase, 1× BSA, 1.5 µM of MgCl2, and 1× PCR buffer (New England Biolabs, USA). Standard amplification conditions were used: initial denaturation at 94°C for 3 min, followed by 35 cycles of 30-s denaturation at 94°C, 30-s annealing at 60°C, and 90-s extension at 72°C, and a final 5-min extension at 72°C. PCR fragments were purified by enzymatic treatment with ExoI-SAP (Exonuclease I, Shrimp Alkaline Phosphatase). About 20 ng of PCR product was used as a template in 10-µL sequencing reactions with the BigDye Terminator DNA Sequencing Kit (Applied Biosystems, Carlsbad, CA, USA). Sequencing was conducted at Genomed (Warsaw, Poland). CodonCode Aligner software ver. 3.7.1 (CodonCode, Dedham, MA, USA) was used to edit the chromatograms and to visually inspect all detected and aligned polymorphic sites. For the evaluation of nucleotide and haplotype variation, the haplotypic phase of samples extracted from diploid tissue was determined using the PHASE haplotype reconstruction option implemented in DnaSP ver. 5 (http://www.ub.edu/dnasp/) [47]. The haplotypes at each locus were compared with those reported for the Scots pine reference samples collected within the European distribution of the species (Tab. S1). The haplotypic phase of the reference samples was known directly, because these samples were sequenced from haploid megagametophytes, which are genetically equivalent to haploid progeny and carry a genotype identical to that of the maternal gamete.

Nucleotide and haplotype polymorphisms
The patterns of nucleotide polymorphism at the loci were analyzed in selected groups from Poland, defined on the basis of climatic zones and phenotypic differentiation of populations [43], and compared with reference geographic locations from Northern and Central Europe (Tab. 1). Nucleotide diversity was measured as the average number of nucleotide differences per site between two sequences (π) [48]. The number of haplotypes (N) and haplotype diversity (Hd) were computed for each gene using DnaSP ver. 5 [47]. Deviations from the site frequency spectrum expected under the standard neutral model of evolution were assessed using frequency spectrum tests and coalescent-based approaches [49,50]. The distributions of the Tajima's D and Fu and Li's D test statistics were investigated for each locus and for selected groups of populations. The significance of these tests was determined from 10,000 coalescent simulations.
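The summary statistics named above have standard closed forms. The following is a minimal Python sketch, not the DnaSP implementation, assuming aligned sequences of equal length with no gaps or missing data. It computes π as mean pairwise differences per site, Hd as Nei's unbiased haplotype diversity, and Tajima's D from the mean pairwise differences (k) and the number of segregating sites (S). Fu and Li's D and the coalescent simulations used for significance testing are not reproduced, and all function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_differences(s1: str, s2: str) -> int:
    """Count sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s1, s2))


def mean_pairwise_differences(seqs: list[str]) -> float:
    """k: average number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def nucleotide_diversity(seqs: list[str]) -> float:
    """pi: average number of nucleotide differences per site between two sequences."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def haplotype_diversity(seqs: list[str]) -> float:
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def segregating_sites(seqs: list[str]) -> int:
    """S: number of alignment columns carrying more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D: scaled difference between k and Watterson's estimate S/a1."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return 0.0  # D is undefined without polymorphism; 0 is returned here by convention
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "AAAA"]  # hypothetical haplotypes, not study data
    print(nucleotide_diversity(toy), haplotype_diversity(toy), tajimas_d(toy))
```

For the hypothetical toy set, π = 7/24 and Hd = 5/6, and D is slightly positive. Negative D values, as reported for most population groups in this study, arise when k falls below S/a1, that is, when rare variants are in excess.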

Outlier detection and tests for population differentiation
The significance of genetic differentiation at the loci, measured as Wright's fixation index FST [51], was evaluated with 1,000 permutations of samples between populations and regional groups using ARLEQUIN ver. 3.5 [52]. The full SNP dataset was used to test for loci under selection with the hierarchical approach of Excoffier et al. [52] in ARLEQUIN ver. 3.5. Simulations estimated the null distribution and confidence intervals around the observed values, which allowed outliers to be identified among locus-specific FST values. Each simulated group consisted of 100 subpopulations, and 20,000 coalescent replicates were used to obtain the expected distribution of FST. Significance thresholds were set at the 95% and 99% quantiles of the simulated FST distribution [52].
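To illustrate the permutation logic and the quantile thresholds, the sketch below computes a per-SNP FST with Hudson's estimator from haploid allele calls coded 0/1, tests it by permuting samples between two groups, and flags loci that exceed the 95% or 99% quantile of a null distribution. This is a simplified stand-in under stated assumptions: ARLEQUIN uses an AMOVA-based FST and a hierarchical coalescent simulation for the null, which is not reproduced here. The `null_fst` values are assumed to come from such simulations, and all function names are hypothetical.

```python
import random
import statistics


def hudson_fst(pop1: list[int], pop2: list[int]) -> float:
    """Hudson's FST for one biallelic SNP; alleles coded 0/1, at least 2 samples per group."""
    n1, n2 = len(pop1), len(pop2)
    p1, p2 = sum(pop1) / n1, sum(pop2) / n2
    # mean within-group pairwise difference, corrected for sample size
    h_within = (2 * p1 * (1 - p1) * n1 / (n1 - 1) + 2 * p2 * (1 - p2) * n2 / (n2 - 1)) / 2
    # expected difference between one sample drawn from each group
    h_between = p1 * (1 - p2) + p2 * (1 - p1)
    return 0.0 if h_between == 0 else 1 - h_within / h_between


def permutation_test(pop1, pop2, n_perm=1000, seed=1):
    """Observed FST and the share of label permutations giving FST >= observed."""
    observed = hudson_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[: len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def flag_outliers(observed: dict[str, float], null_fst: list[float]) -> dict[str, str]:
    """Label loci above the 95% or 99% quantile of a simulated null FST distribution."""
    cuts = statistics.quantiles(null_fst, n=100)  # 99 cut points: 1st..99th percentile
    q95, q99 = cuts[94], cuts[98]
    labels = {}
    for locus, fst in observed.items():
        if fst > q99:
            labels[locus] = "99%"
        elif fst > q95:
            labels[locus] = "95%"
    return labels
```

For example, `permutation_test([0] * 10, [1] * 10)` returns FST = 1.0 with a p-value near 1/1001, because almost no random relabelling reproduces complete separation of the two alleles.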
The hierarchical distribution of multilocus genetic variation among populations and tested groups was estimated using an analysis of molecular variance (AMOVA) in ARLEQUIN ver. 3.5. Standard AMOVA computations for haplotypic data were used, and the significance of population genetic structure was tested using 10,000 permutations. The analyses were performed for SNPs identified in all eight genes and, separately, for the subset of genes in which no consistent signature of selection had been detected. Clustering analysis to examine the relationships between individual Polish populations and reference locations of the species was conducted using BAPS 6.0 software [53]. The genetic mixture analysis, based on all detected polymorphic sites, was run ten times independently for each K to estimate the number of clusters in the combined samples.
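The variance partitioning behind AMOVA can be sketched for the simplest, non-hierarchical case (populations only, without regional groups), using the number of nucleotide differences between haplotypes as the squared distance, as in the standard haplotypic AMOVA. This is a simplified illustration under those assumptions, not ARLEQUIN's hierarchical implementation; each haplotype is treated as an independent unit, and the function names are hypothetical.

```python
import random
from itertools import combinations


def squared_distance(h1: str, h2: str) -> int:
    """Squared distance between haplotypes, taken as the number of differing sites."""
    return sum(a != b for a, b in zip(h1, h2))


def ssd(haplotypes: list[str]) -> float:
    """Sum of squared deviations: total pairwise squared distance divided by sample size."""
    total = sum(squared_distance(a, b) for a, b in combinations(haplotypes, 2))
    return total / len(haplotypes)


def phi_st(populations: list[list[str]]) -> float:
    """One-level AMOVA: proportion of molecular variance among populations."""
    n_total = sum(len(pop) for pop in populations)
    n_pops = len(populations)
    ssd_total = ssd([h for pop in populations for h in pop])
    ssd_within = sum(ssd(pop) for pop in populations)
    msd_among = (ssd_total - ssd_within) / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # n0: weighted average population size used in the expected mean squares
    n0 = (n_total - sum(len(pop) ** 2 for pop in populations) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    var_total = var_among + msd_within
    return 0.0 if var_total == 0 else var_among / var_total


def phi_st_permutation(populations, n_perm=1000, seed=1):
    """Observed PhiST and a p-value from reshuffling haplotypes among populations."""
    observed = phi_st(populations)
    sizes = [len(pop) for pop in populations]
    pooled = [h for pop in populations for h in pop]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:  # refill populations with their original sample sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A ΦST close to zero (or negative) corresponds to nearly all variation lying within populations, which is the pattern reported when the loci showing signs of selection were excluded.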

Correlations between SNP, phenotype, and geographic data
We analyzed and compared the patterns of nucleotide sequence variation with phenotypic and neutral variation in the same plant material. Single-locus tests were performed for selected qualitative and quantitative traits that significantly differentiated the Polish Scots pine populations. Growth characteristics were available from previous research [43] and included: stand volume (Volume, m3 ha−1), calculated from the mean tree volume in a population (m3) divided by the area of each population at the experimental site (0.1275 ha); diameter at breast height (Diameter, cm); and the diameter of the approximately 50 trees per population that had the largest diameters after 47 years of growth in the experiment (Diameter_50 select). Stand volume is a relative measure of productivity that could be closely related to local adaptation, whereas the diameter of selected trees may indicate how natural selection affects the growth of the pine populations used for genetic correlations. Furthermore, the quality traits stem straightness (SS) and crown width (CW) were included in the analysis. These traits were scored on a 5-step scale, where 1 was given for very crooked stems, loss of leader shoots (SS), and very wide crowns (CW), and 5 for very straight stems (SS) and very narrow crowns (CW) (for more details, see Tab. S3 in [43]). Differences in the traits between populations were determined using an analysis of covariance (ANCOVA). Detailed descriptions of the measured and scored traits were presented in an earlier study [43].
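The plot-to-hectare conversion for stand volume and the top-50 diameter summary are simple arithmetic. The sketch below assumes, as an interpretation not stated explicitly in the text, that stand volume is the mean tree volume multiplied by the number of trees in the plot, and that Diameter_50 select is the mean of the 50 largest diameters; only the plot area of 0.1275 ha comes from the text, and the function names are hypothetical.

```python
from statistics import mean

PLOT_AREA_HA = 0.1275  # area of each population plot at the trial site (from the text)


def stand_volume_per_ha(mean_tree_volume_m3: float, n_trees: int,
                        plot_area_ha: float = PLOT_AREA_HA) -> float:
    """Stand volume in m3 per ha, assuming plot volume = mean tree volume * tree count."""
    return mean_tree_volume_m3 * n_trees / plot_area_ha


def diameter_50_select(diameters_cm: list[float], n_largest: int = 50) -> float:
    """Mean diameter of the n largest trees in a population (assumed definition)."""
    return mean(sorted(diameters_cm, reverse=True)[:n_largest])
```

For instance, a hypothetical plot of 100 trees with a mean volume of 0.5 m3 would give 50 / 0.1275 ≈ 392 m3 ha−1.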
Associations between phenotypic and SNP variation at the eight genes were tested using a mixed linear model [54][55][56][57]. In this model, SNP variation was summarized as the mean nucleotide and/or haplotype diversity at each locus. Because phenotypic traits were measured for all trees growing in the experimental trial (about 2,300 individuals), estimated breeding values obtained from the analysis of covariance (ANCOVA) [43] were used as the phenotypic observations in the genetic correlation analyses. The same approach was used to test for a possible dependence of SNP variation on the geographic characteristics (latitude, longitude, and altitude) of the original stands. The provenance trial populations show minimal or no population structure at neutral markers [44,45]; therefore, signatures of selection could be effectively contrasted with the background genetic variation. All correlations were corrected for multiple testing using the Bonferroni method and were computed with the statistical software STATISTICA [58].
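The correlation and multiple-testing step can be illustrated with a Pearson correlation between a population-level diversity value and a trait or geographic coordinate, a permutation p-value, and a Bonferroni adjustment. This is a simplified stand-in, not the mixed linear model or the STATISTICA procedures used in the study; the permutation test replaces a parametric p-value, and all names are hypothetical.

```python
import math
import random


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=10_000, seed=1):
    """Two-sided p-value: share of shuffles of y with |r| at least the observed |r|."""
    observed = abs(pearson_r(x, y))
    rng = random.Random(seed)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bonferroni(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, tuple[float, bool]]:
    """Adjusted p = min(1, p * m) for m tests; significant if adjusted p <= alpha."""
    m = len(p_values)
    adjusted = {}
    for name, p in p_values.items():
        p_adj = min(1.0, p * m)
        adjusted[name] = (p_adj, p_adj <= alpha)
    return adjusted
```

With m tests, the effective per-test threshold shrinks to α/m; for example, one test per gene across eight genes would require p ≤ 0.05/8 ≈ 0.006, which shows why the correction is conservative for traits shaped by many small-effect loci.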

Genetic variation in Polish populations
Over 3,000 nucleotide sites were sequenced from 247 individuals (Tab. 2), yielding a set of 200 single nucleotide polymorphisms (SNPs). Levels of nucleotide polymorphism were similar between the defined groups of Polish Scots pine populations (π total = 0.0040–0.0051) and comparable to those in the reference samples from Europe (π total = 0.0054–0.0058) (Tab. 2). Nucleotide variation differed widely among loci, from the highest polymorphism at Pr1_26 (π total ≈ 0.02; Tab. S2) to the lowest at Pr4_12 (π total ≈ 0.0003; Tab. S2). Relative to neutral expectations, there was an excess of low-frequency variants in almost all regionally defined population groups, as indicated by negative values of Tajima's D and Fu and Li's D (Tab. 2). A significant positive Tajima's D, indicating an excess of intermediate-frequency alleles, was found at gene Pr1_19 (populations PL_N_NE_3; Tab. S2). In contrast, numerous low-frequency variants were found at three genes: Pr4_4 (populations EU_N), Pr4_12 (PL_C_2, PL_N_NE_3, EU_N, and EU_C), and Pr4_19 (PL_C_1 and PL_C_2) (Tab. S2). Haplotype diversity was similar across groups of populations (0.545–0.613; Tab. 2).

Variation at individual loci and populations
In pairwise comparisons, the six groups of Polish pine populations differed significantly at five loci: Pr1_26, Pr4_4, Pr4_12, Pr4_41, and Pr4_19 (Tab. S3). Pines from central Poland (PL_C_1 and PL_C_2) were the most differentiated, differing from every other group of populations at one or more loci. Outlier SNPs with large allele frequency differences between groups of Polish Scots pine populations were found at two loci (Pr4_19 and Pr4_41) (Tab. S4). Polish populations differed from the reference populations in Europe at six loci (Pr1_19, Pr1_26, Pr4_4, Pr4_12, Pr4_19, and Pr4_41) (Tab. S3). At these loci (except Pr4_4 and Pr4_12), 13 SNPs showed significant differences in frequency between the Polish and reference populations (Tab. S4).
For all polymorphic sites, genetic differentiation between selected groups of Polish populations reached up to 6% (Tab. 3). Higher genetic differentiation was found between Polish Scots pine populations and reference groups from Northern Europe (15–33%) (Tab. 3). However, when the loci identified as outliers on the basis of their patterns of genetic variation (Pr1_19, Pr1_26, Pr4_19, and Pr4_41) were excluded from between-group comparisons, there was no differentiation between the Polish populations and the reference samples (Tab. 3). Similarly, in a hierarchical AMOVA, nearly 10% of the variation was found at the between-population level (Tab. S5). However, when loci showing signs of selection were excluded from the analyses, nearly 100% of the genetic variation was found within populations (data not shown). No evidence of population structure was found in the BAPS analyses (data not shown).
Tab. 2 Summary statistics of nucleotide and haplotype variation and frequency distribution across eight nuclear genes in tested groups of Polish Scots pine populations and reference stands in Europe.

Tab. 3 FST between the tested groups of Polish and reference Scots pine populations for all polymorphic sites combined across eight genes (below the diagonal) and across the group of nuclear genes where no consistent signature of selection was detected (Pr1_12, Pr4_4, Pr4_5, and Pr4_12; above the diagonal, grey shading).

Correlations between genetic diversity and quantitative traits and geographical location of pine populations from Poland
A statistically significant positive correlation was found between Diameter_50 select. and nucleotide diversity at gene Pr1_19 (r = 0.565, p ≤ 0.05; Tab. S6). Crown width was negatively correlated with haplotype diversity at gene Pr4_19 (r = −0.515, p ≤ 0.05; Tab. S6). Across the tested Polish pine populations, significant correlations were found between geographic location and the diversity of three genes: Pr1_19, Pr4_19, and Pr4_4 (Tab. S6). Population latitude was positively correlated with nucleotide and haplotype diversity at gene Pr1_19 (r = 0.526 and r = 0.425, respectively, p ≤ 0.05; Tab. S6); the correlation with nucleotide diversity remained significant after a Bonferroni correction for multiple testing (p = 0.007; Fig. 2). Stand altitude was positively correlated with nucleotide and haplotype diversity at gene Pr4_19 (r = 0.626 and r = 0.490, respectively, p ≤ 0.05; Tab. S6), whereas haplotype diversity at gene Pr4_4 was negatively correlated with altitude (r = −0.576, p ≤ 0.05; Tab. S6). However, these other positive and negative correlations were not significant after Bonferroni adjustment.

Nucleotide polymorphisms
In this study, we analyzed genetic differentiation within and among 16 Scots pine populations representative of the species' distribution range in Poland and of its three climatic zones. We studied nucleotide polymorphisms at nuclear loci to investigate patterns of genetic variation in the Polish populations and in reference samples from Northern and Central Europe. Our data indicated no differences in nucleotide diversity at the studied loci between the tested groups of Polish populations. Overall, we observed very similar haplotype and SNP frequencies among Polish populations from different climatic zones. The Polish populations also showed levels of nucleotide polymorphism very similar to those of the reference Scots pine populations from Northern and Central Europe and to those previously reported for 16 nuclear loci (π = 0.0052 [59]) and 13 cold-related genes (π = 0.006 [1]). Similarly low levels of nucleotide diversity and average levels of haplotype diversity have been reported for other conifers, including Picea abies [60] and Pseudotsuga menziesii [40].

Population structure
The structural homogeneity of the Polish populations at the reference loci follows the pattern expected for large populations connected by efficient gene flow [61]. Furthermore, the populations from the provenance trial showed an overall excess of low-frequency mutations, as indicated by negative multilocus Tajima's D statistics. The pattern of genetic variation observed in the Polish populations and in the reference locations from Northern and Central Europe is expected for recently expanded populations [61]. There were no clear signatures of selection for population structure in the European Scots pine distribution at the set of analyzed loci. Low genetic differentiation between populations (~2%) in this part of the Scots pine distribution was previously found at microsatellite [44,62] and nuclear sequence [61] loci. Gene flow by pollen carried over long geographical distances [21] has a general homogenizing effect on genetic variation in outcrossing forest tree species, consistent with the pattern we observed at the set of neutrally evolving loci. Furthermore, forest tree species, including Scots pine, have a stable genetic structure that preserves an appropriate level of population genetic variability and adaptability [14].
Patterns of genetic variation in the analyzed populations were influenced by past demographic factors [59] and also by anthropogenic factors related to historical seed transfer and forest management activities. However, the seeds used to establish the experimental trial in 1966 in the Carpathian Mountains were collected from old trees (aged 108 to 195 years; 130 years on average). Forest management in Poland would recognize such trees as being of native origin, because they predate the intensification of the seed trade that occurred between 1860 and 1910 [63]. As the trial represents half-sib progeny from natural stands, cross-pollination by maladapted foreign breeding populations cannot be completely excluded. However, even if foreign alleles are present at some loci, they should not significantly bias the average estimates obtained at the population level. Thus, the trial is considered representative of the natural distribution of Scots pine in Poland, even though anthropogenic influences on the genetic variability patterns of some populations cannot be completely excluded owing to the complex history of forest management in Europe [6,15].

Signatures of selection and genetic correlations
Given the similar genetic backgrounds of the populations reported in earlier studies, the outlier patterns of differentiation at some loci may be due to non-neutral processes. We identified some genes at which between-population differentiation was correlated with phenotypic variation among the populations. In our dataset, the most consistent patterns of significant population differentiation based on haplotype structure and allele frequencies were found between selected Polish groups from central, eastern, and northern locations (Tab. S2 and Tab. S4). Departures from neutrality were observed at the Pr1_19 locus in populations 4-Ruciane and 5-Rozpuda from northeast Poland. The excellent breeding performance of pines from northeast Poland has been reported in provenance trials [64,65]. Strong outlier patterns of variation in both haplotype structure and allele frequencies were found at four loci (Pr1_19, Pr1_26, Pr4_19, and Pr4_41) in pairwise comparisons between the Polish populations and the reference Northern European stands. Despite the similar genetic background at neutral markers in this part of the species distribution [44], we identified several loci and SNPs at which allele frequencies differed significantly between the tested regions (Tab. S6). When those loci were excluded from the between-group comparisons, there was no differentiation between Polish and Northern European samples (Tab. 3). These results suggest that natural selection may have shaped the patterns of genetic variation in these genes, resulting in departures from neutral expectations.
Our analyses of Polish Scots pine populations showed significant correlations between the diameter of selected trees (Diameter_50 select.) and polymorphism at the Pr1_19 gene, and between crown width and polymorphism at the Pr4_19 gene. Additionally, some correlation trends were identified between the geographic coordinates of populations and SNP diversity. The correlation between latitude and nucleotide diversity at Pr1_19 remained statistically significant after a Bonferroni correction. However, the Bonferroni correction is very conservative and therefore has low power to detect correlations between SNP variation and quantitative traits that are shaped by small effects of many genes. Gene Pr1_19 is involved in glycosyltransferase functions related to the biosynthesis of polysaccharides and glycoproteins in the plant cell wall [66], which have crucial roles in plant growth and in responses to biotic and abiotic stresses [67][68][69][70]. Gene Pr4_19, in turn, encodes enzymatic proteins from the laccase group involved in lignification in conifers [67].
Considering the weak divergence between geographical regions at neutral loci, together with the high differentiation of quantitative traits of adaptive importance, it is possible that differences in the frequency and distribution of polymorphisms at some loci are due to diversifying selection across the Scots pine range. The ability to detect selection in genomic regions depends on the time since selection and the number of loci involved [36]. The large conifer genome size and the complex interactions of many traits are the primary reasons why only a few candidate loci have been associated with phenotypic traits. A few candidate genes, including dehydrin [1] and the ft/tfl1-like and pseudo response regulator 1 genes [2], have been identified in Scots pine as potentially under selection. In loblolly pine (Pinus taeda), four cell wall genes explained about 3% of the variation in wood traits [55]. In Monterey pine (Pinus radiata D. Don), nine genes were associated with wood quality [68]. In maritime pine (Pinus pinaster), González-Martínez et al. [55] reported genetic variation and linked mutations underlying phenotypic variability. Our study identifies a set of new genes whose variation differs from that at neutral markers, making them especially interesting for further investigation.

Conclusions
Our study showed close genetic relationships among populations from a provenance trial. Against a similar genetic background at neutral loci, we identified outlier patterns of nucleotide diversity across the European distribution of the species. Additionally, nucleotide polymorphisms at some of the studied genes were significantly correlated with phenotypic variation among populations from different environments. Because the provenance trial examined here has been well characterized for growth and developmental traits over nearly 50 years of measurements [43], it forms a suitable experimental system for detailed comparative and association genetic studies using available genomic resources [71].

Tab. 1
Location and characteristics of the maternal trees in the provenance trial that provided seeds, and geographic locations of the reference populations. Populations are divided into groups from central Poland (PL_C_1 and 2), north and northeastern Poland (PL_N_NE_1, 2, and 3), southeastern Poland (PL_S), Northern Europe (EU_N), and Central Europe (EU_C) (see "Material and methods" for more details). * Temporary seed production stand. No information is available about the detailed characteristics of its trees. Given the nature of the stand, it should be assumed to be native; the collection included trees about 100 years old, characterized by good quality and abundant seed production.

Fig. 1
Location of the Polish and reference Scots pine populations (geographic details are provided in Tab. 1) within the species' European distribution range (grey). Numbers in circles identify the populations, and numbers adjacent to the circles indicate the climatic subgroup of each population.