Sequence diversity of two chloroplast genes, rps4 and tRNAGly (UCC), in the liverwort Marchantia polymorpha, an emerging plant model system

The primary purpose of this study is to evaluate sequence variation in two regions of chloroplast DNA in a collection of 27 taxonomically well-annotated clonal lines of Marchantia polymorpha sensu lato derived from European populations. We aimed to develop molecular markers for identifying the three taxa usually recognized as subspecies. We sequenced two regions: the rps4 gene together with the rps4-trnT intergenic spacer, and an intron of the tRNAGly (UCC) gene. Samples of Marchantia paleacea ssp. diptera from Japan were used for comparison. Three haplotypes (MA, MB, and MC) were identified within the species, and sequence divergence between subspecies ranged from 0.0023 to 0.0032 substitutions per site. The sequence divergence between M. polymorpha and M. paleacea was tenfold greater (0.0331–0.0340). We did not detect any differences between M. paleacea and the homologous sequences of the reference chloroplast genome of M. polymorpha obtained from GenBank (NC_001319). This confirms that the cell suspension line A-18 used to sequence the complete chloroplast genome in 1986 was incorrectly annotated taxonomically.


Introduction
The liverwort Marchantia polymorpha L. has recently become one of the most important models for plant biology research and evolutionary genomics owing to its relatively simple genome and its unique phylogenetic position as a member of an early land plant lineage. Genomic research in Marchantia began in the late 1980s, when the complete chloroplast and, later, mitochondrial genomes were sequenced [1,2]. These sequences are frequently used as references in comparative studies. After a period in which it lagged behind other model plants (e.g., Physcomitrella patens among the Bryophyta), interest in M. polymorpha has recently been revived following the development of various molecular tools and techniques (see [3][4][5] for recent reviews). Furthermore, several nuclear genome sequencing projects have recently begun, and the first genome of M. polymorpha subsp. ruderalis was published this year [6].
Despite the importance of this liverwort in evolutionary studies, little is known about its genetic diversity. Since the publication of the original description, three taxa of various taxonomic ranks have been recognized within M. polymorpha sensu lato. After an extensive revision of the genus and detailed nomenclatural studies, Bischler-Causse and Boisselier-Dubayle [7] proposed three subspecies, namely M. polymorpha subsp. montivagans, M. polymorpha subsp. polymorpha, and M. polymorpha subsp. ruderalis, in place of the three species M. alpestris, M. aquatica, and M. polymorpha [8]. This new taxonomic solution (a single species comprising three subspecies) has been generally accepted (see, e.g., [9][10][11][12]). However, some authors still argue that the frequent sympatric distribution of at least two of these subspecies in Europe indicates species-level divergence [13,14].
Several morphological characters, together with habitat preference, may be used to distinguish between these three taxa [5,9]. Certain diagnostic isozyme variants, the RFLP profile of 18S rDNA, and specific RAPD markers have also been described [15]. However, these early molecular markers are difficult to apply for identification purposes to some samples, including herbarium specimens and some laboratory culture lines.
The primary purpose of this work was to evaluate the suitability of two commonly studied chloroplast DNA regions, the rps4 gene and an intron of the tRNAGly (UCC) gene, for distinguishing between the three taxa of M. polymorpha sensu lato. This study provides markers for the reliable identification of various samples, including laboratory culture lines.

Plant material
We analyzed 23 carefully selected clones of M. polymorpha from European populations, representing all three subspecies, and a single clone (MC-57) from Japan. In addition, three samples of M. paleacea ssp. diptera from Japan were sequenced and compared with the reference sequence of the M. polymorpha chloroplast genome (GenBank accession NC_001319). The selected clones were obtained from a phytotron collection maintained in Poznań since 2007. Samples were kept in closed 0.5-L containers (perforated to allow ventilation and drainage) on a sterilized mineral substrate at a constant temperature of 15°C under a 16-h photoperiod at 50 μmol photons m⁻² s⁻¹. A list of samples, with details of their geographical locations and GenBank accession numbers, is given in Tab. 1. Information on vouchers deposited in POZW can be found in the corresponding GenBank records.

Subspecies identification
We used certain morphological characters of the thalli (as described by Long [9]) and the electrophoretic pattern of nonspecific esterases (EC 3.1.1.-) to identify subspecies, following Boisselier-Dubayle and Bischler [16], who used isozyme profiles as part of the formal taxonomic description of each of the three subspecies. Enzymes were extracted by homogenizing small apical fragments of young gametophytes in 100 μL of 0.1 M Tris-HCl buffer, pH 7.5, containing 10 mM MgCl2, 1 mM EDTA (Na4 salt), 0.4% Triton X-100, and 14 mM 2-mercaptoethanol. The homogenate was filtered through a strip of Miracloth and transferred onto small 4 × 11 mm Whatman 3ET wicks. Samples were subjected to horizontal starch gel electrophoresis (11.5% starch; Starch Art) in a Tris-citrate/lithium borate buffer system according to Odrzykoski and Szweykowski [17]. The resulting zymograms displayed esterase isozyme profiles similar to those obtained with the Tris-glycine buffer system and acrylamide gels originally used by Boisselier-Dubayle and Bischler [16]. Alpha-esterases were localized following Method 1 of Manchenko [18].

Genomic DNA extraction
Total genomic DNA was extracted from living plants. A fragment of a single gametophyte (approximately 50 mg fresh weight) was placed in a 2-mL tube with two steel beads, frozen in liquid nitrogen, and ground for 60 s in a tissue disruptor (Retsch ball mill). The ground tissue was then processed using a CTAB-based procedure modified from Doyle and Doyle [19]. To each tube, 750 μL of extraction buffer (100 mM Tris-HCl, pH 8.0, 1.4 M NaCl, 20 mM EDTA, and 2% CTAB) was added, and after rapid vortexing, the contents were incubated for 30 min at 65°C in a thermomixer (600 rpm). Subsequently, an equal volume of chloroform/isoamyl alcohol (24:1) was added. Phases were separated by centrifugation for 2 min at 14,000 g, and this extraction step was repeated three times. The final aqueous phase was transferred to a fresh tube, and the DNA was precipitated with an equal volume of isopropanol. The DNA was pelleted by centrifugation for 2 min at 4,000 g, and the pellet was washed twice with 1,000 μL of 70% ethanol. The DNA was dissolved in 100 μL of TE buffer (0.1 M Tris-HCl, 0.01 M EDTA, pH 7.5). For RNase digestion, 10 μg of RNase A (10 mg/mL) was added, and the samples were incubated for 30 min at 37°C. DNA concentration was determined spectrophotometrically (NanoDrop ND-1000), and integrity was checked by 0.8% agarose gel electrophoresis. The extracted DNA samples were stored at −20°C.
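The spectrophotometric step can be illustrated with a minimal sketch. It assumes the conventional conversion of about 50 ng/μL of double-stranded DNA per A260 unit (10 mm path length) and an assumed A260/A280 acceptance window of 1.7–2.0; the function names and example readings are hypothetical and are not measurements from this study.

```python
# Conventional conversion for double-stranded DNA (10 mm path length):
# an A260 of 1.0 corresponds to about 50 ng/uL.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """dsDNA concentration in ng/uL from a blank-corrected A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ok(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """True if A260/A280 lies in the chosen window (1.7-2.0 is an assumed threshold)."""
    ratio = a260 / a280
    return low <= ratio <= high


if __name__ == "__main__":
    # Hypothetical readings for one extract, not data from this study.
    a260, a280 = 0.5, 0.27
    print(f"{dsdna_concentration(a260):.1f} ng/uL, "
          f"A260/A280 = {a260 / a280:.2f}, pass = {purity_ok(a260, a280)}")
```

Instruments such as the NanoDrop report this value directly; the sketch only makes the underlying arithmetic explicit.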

PCR amplification and sequencing
Each 20-μL PCR reaction contained 10.8 μL Millipore H2O, 2.0 μL 10× buffer, 0.05 units HiFi Taq DNA polymerase (Novazyme, Poland), 200 μM of each dNTP, 0.25 μM of each primer, and 2.0 ng of total DNA. PCR was initiated by denaturation at 94°C for 3 min, followed by 35 cycles of 1 min denaturation at 94°C, 1 min annealing at 54°C, and 5 min elongation at 72°C, with a final 5-min elongation at 72°C (Applied Biosystems Veriti Thermal Cycler). These parameters were used for both regions. The rps4 gene region was amplified with the primers RP1F (GCTATGTAGGCTTTTGGTC) and RP1R (CACTTGTAATGCGATGGTC), newly designed based on the reference sequence of M. polymorpha. The primers used for sequencing were RS2F (CTAAACGAATACGATACTGAG) and RS1R (TTTTGTAACATAAAGGAG). The intron of the tRNAGly (UCC) gene was amplified and sequenced with two primers, TG_F (CGGGTACGGGAATCGAAC) and TG_R (GCGGGTATAGTTTAGTGC), following Szweykowska-Kulińska et al. [20]. Prior to sequencing, PCR products were purified by enzymatic treatment with Exonuclease I and Shrimp Alkaline Phosphatase. About 10 ng of PCR product was used as template in 10-μL sequencing reactions with the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems), and the products were run on a 3130xl Genetic Analyzer (Applied Biosystems).
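As a rough check of the primers against the 54°C annealing temperature, the sketch below computes GC content and a Wallace-rule melting temperature, Tm = 2(A+T) + 4(G+C) °C, for each primer listed above. The Wallace rule is a crude approximation chosen here for illustration only; it is not stated to be how the primers were designed, and nearest-neighbor methods give more reliable estimates.

```python
# Primer sequences as listed in the Methods (5'->3').
PRIMERS = {
    "RP1F": "GCTATGTAGGCTTTTGGTC",
    "RP1R": "CACTTGTAATGCGATGGTC",
    "RS2F": "CTAAACGAATACGATACTGAG",
    "RS1R": "TTTTGTAACATAAAGGAG",
    "TG_F": "CGGGTACGGGAATCGAAC",
    "TG_R": "GCGGGTATAGTTTAGTGC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule: 2 degC per A/T and 4 degC per G/C (rough, short oligos only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    ANNEALING_C = 54  # annealing temperature used for both regions
    for name, seq in PRIMERS.items():
        tm = wallace_tm(seq)
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}, "
              f"Tm ~ {tm} C, Tm - Ta = {tm - ANNEALING_C:+d} C")
```

By this estimate the amplification primers have Tm values close to the annealing temperature, which is consistent with, though not proof of, the chosen 54°C.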

Sequence analysis
CodonCode Aligner v. 5.0.1 (CodonCode Corporation, USA) was used for visual inspection and editing of the chromatograms from the forward and reverse sequencing primers, sequence alignment, and detection of polymorphic sites. The corresponding regions of the M. polymorpha chloroplast DNA reference sequence (NC_001319) were used as outgroups. Evolutionary distances were computed in MEGA6 [21] using the number-of-differences method and the Tamura 3-parameter model of nucleotide substitution. No detailed phylogenetic analysis was performed, but to visualize the results, a maximum likelihood (ML) tree was constructed from a matrix combining both investigated regions.
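The two distance measures named above can be sketched for a single pair of aligned sequences. The p-distance is the proportion of differing sites; the Tamura (1992) 3-parameter distance corrects for multiple substitutions using the transition proportion P, the transversion proportion Q, and h = GC₁ + GC₂ − 2·GC₁·GC₂. Skipping sites with gaps or ambiguous bases (pairwise deletion) and computing GC content over compared sites only are assumptions of this sketch and may differ from the options used in MEGA6.

```python
import math
from typing import List, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def comparable_sites(s1: str, s2: str) -> List[Tuple[str, str]]:
    """Pair aligned sites, skipping any site with a gap or ambiguity code (assumption)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a in VALID and b in VALID]


def p_distance(s1: str, s2: str) -> float:
    """Number of base differences divided by the number of compared sites."""
    sites = comparable_sites(s1, s2)
    return sum(a != b for a, b in sites) / len(sites)


def tamura3p_distance(s1: str, s2: str) -> float:
    """Tamura (1992): d = -h ln(1 - P/h - Q) - 0.5 (1 - h) ln(1 - 2Q)."""
    sites = comparable_sites(s1, s2)
    n = len(sites)
    # A<->G and C<->T changes are transitions; all other changes are transversions.
    transitions = sum(
        a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES) for a, b in sites
    )
    transversions = sum(a != b for a, b in sites) - transitions
    P, Q = transitions / n, transversions / n
    gc1 = sum(a in "GC" for a, _ in sites) / n
    gc2 = sum(b in "GC" for _, b in sites) / n
    h = gc1 + gc2 - 2 * gc1 * gc2
    if h == 0:
        raise ValueError("GC-content term h is zero; distance undefined")
    return -h * math.log(1 - P / h - Q) - 0.5 * (1 - h) * math.log(1 - 2 * Q)


if __name__ == "__main__":
    # Toy sequences, not data from this study: one C->T transition in 10 sites.
    a, b = "ACGTACGTAC", "ACGTACGTAT"
    print(round(p_distance(a, b), 4), round(tamura3p_distance(a, b), 4))
```

At the low divergences reported here (below 0.04 substitutions per site), the corrected and uncorrected distances differ only slightly, so the choice of method mainly affects the last reported digit.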

Electrophoretic phenotypes of esterase isozymes
All selected samples were classified into three groups (MA, MB, MC) based on the electrophoretic phenotypes of alpha-esterase. Of the 27 clones, Variant was present in five clones, and Variant 3, specific for M. polymorpha subsp. montivagans (group MA), in six clones (Fig. 1).

Sequence characteristics
In the first region (rps4), the obtained sequences included a short portion of the intergenic spacer between tRNASer and the rps4 gene, the complete rps4 sequence, and the rps4-tRNAThr intergenic spacer. For further comparison, sequences were trimmed to include only the complete rps4 gene (609 bp) and the rps4-trnT intergenic spacer (228 bp), and were aligned to the corresponding region of the complete chloroplast genome of M. polymorpha (NC_001319, positions 49,425–50,261). Within the rps4 gene, two polymorphic sites were identified in the M. polymorpha complex: (i) a T-to-C substitution at position 98 (counting from the start of the gene) in group MB only, and (ii) a C-to-T substitution at position 492 in group MC. Within the rps4-tRNAThr spacer, SNPs were detected at two sites: at position 55 (C to T) in a single sample (MC_22), and at position 167 (G to T) in five of the 13 samples of group MC only. The entire 836-bp region was highly differentiated between M. polymorpha and M. paleacea, with 16 substitutions and four single-nucleotide indels (Tab. 2).
The investigated fragment of the tRNAGly (UCC) intron was slightly shorter in M. polymorpha than in M. paleacea. Three diagnostic substitutions were detected in M. polymorpha, one characteristic of each group: an A-to-G substitution at position 120 present in MC only, an A-to-G substitution at position 341 present in MB only, and a C-to-T substitution at position 525 unique to group MA. Between M. polymorpha and M. paleacea, differences were detected at 23 positions within the 593 bp of investigated sequence (Tab. 3). For both investigated regions, all samples of M. paleacea had sequences identical to that of the reference sample.
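Because each group carries its own diagnostic substitution in the intron, a sample could in principle be assigned by checking three sites. The hypothetical sketch below assumes that positions are 1-based coordinates in the trimmed intron alignment and that the second base of each reported substitution (G for A-G, T for C-T) is the group-specific state; it returns None when no site, or more than one, matches. The helper `make_test_sequence` is an illustrative placeholder, not real sequence data.

```python
from typing import Dict, Optional, Tuple

# Hypothetical marker table: alignment position -> (diagnostic base, group).
# Assumes the second base of each reported substitution is the group-specific state.
TRNG_DIAGNOSTIC: Dict[int, Tuple[str, str]] = {
    120: ("G", "MC"),  # A-G substitution, reported in MC only
    341: ("G", "MB"),  # A-G substitution, reported in MB only
    525: ("T", "MA"),  # C-T substitution, reported in MA only
}


def assign_group(aligned_intron: str,
                 markers: Dict[int, Tuple[str, str]] = TRNG_DIAGNOSTIC) -> Optional[str]:
    """Return the single group whose diagnostic base is present, else None."""
    seq = aligned_intron.upper()
    hits = {
        group
        for pos, (base, group) in markers.items()
        if pos <= len(seq) and seq[pos - 1] == base
    }
    # Exactly one matching marker gives an unambiguous assignment.
    return hits.pop() if len(hits) == 1 else None


def make_test_sequence(changes: Dict[int, str], length: int = 593) -> str:
    """Hypothetical all-'A' placeholder sequence with bases changed at 1-based positions."""
    seq = ["A"] * length
    for pos, base in changes.items():
        seq[pos - 1] = base
    return "".join(seq)


if __name__ == "__main__":
    print(assign_group(make_test_sequence({120: "G"})))            # MC
    print(assign_group(make_test_sequence({525: "T"})))            # MA
    print(assign_group(make_test_sequence({120: "G", 341: "G"})))  # None (conflict)
```

Returning None for conflicting or missing signals, rather than forcing a call, keeps unexpected haplotypes visible for manual inspection.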

Sequence divergence between M. polymorpha, M. paleacea, and the reference sample
To estimate sequence divergence between the investigated samples, we calculated the number of base substitutions per site, using a total of 1,297 positions and excluding indels. Divergence between the subspecies of M. polymorpha ranged from 0.0023 to 0.0032 substitutions per site, the greatest being between M. polymorpha subsp. polymorpha (MB) and the other two subspecies. Divergence between M. polymorpha sensu lato and M. paleacea was tenfold greater, between 0.0331 and 0.0340 (Tab. 4). The maximum likelihood tree for the combination of all investigated sequences and the corresponding sequences from the reference sample (NC_001319) showed three clades with bootstrap support greater than 60%, corresponding to the infraspecific groups of M. polymorpha (Fig. 2). All samples of Marchantia paleacea were placed together with the homologous sequences from the complete cpDNA genome of M. polymorpha (NC_001319).

Discussion
Two species of Marchantia (M. polymorpha L. and M. paleacea Bertol.) have played a significant role in developmental biology and evolutionary studies [4,5,22]. The first species was formally described from Europe, although it has an almost cosmopolitan geographic range. The original description distinguishes three unnamed infraspecific taxa (α, β, γ), which were later named as the varieties alpestris communis, aquatica, and domestica communis [3]. Crossing experiments between these varieties under greenhouse conditions revealed partial reproductive isolation, which was one of the reasons Burgeff [8] recognized them as separate species. This solution was accepted in some liverwort floras [13], but today these species (M. alpestris, M. aquatica, and M. polymorpha) are mainly considered infraspecific taxa (subspecies), following the taxonomic revision of the genus by Bischler-Causse [23] and subsequent lectotypification [7]. This rank was adopted in a recent checklist of hornworts and liverworts [12]. Some additional arguments supporting species rank for these taxa have recently been summarized by Bowman et al. [24].
The known morphological markers for distinguishing between these taxa include features of the thallus (width, color, branching pattern, and presence or absence of a black median line on the thallus surface) and the type of appendage margin (entire or toothed) [9]. Owing to environmental plasticity, morphological markers alone can seldom be used to identify some specimens held in scientific collections (e.g., in vitro or cell suspension cultures). Additional markers include electrophoretic variants of some isozymes (nonspecific esterases), RAPD profiles, and RFLP of 18S rDNA [15]. Recently, several organellar and nuclear sequences have also become available in public sequence databases and have frequently been used in phylogenetic studies of the major taxonomic groups (e.g., [25][26][27]).
In this work, we have attempted to find diagnostic mutations for the infraspecific taxa of M. polymorpha sensu lato within two regions of chloroplast DNA. Our plant sample was carefully annotated taxonomically using the original descriptions, including the esterase isozyme profile [7,17], and was selected to cover a broad geographical range and the major types of environment (tundra, mountains, anthropogenic habitats). The selected regions (rps4 and tRNAGly) are among the most frequently used in phylogenetic studies of bryophytes (e.g., [28][29][30]).
Three haplotypes were recognized (MA, MB, and MC), with only minor variation within MC, and these appear to be diagnostic for the three subspecies. In both regions, single mutations allow the correct identification of infraspecific taxa within the investigated sample of plants. Much greater differences were detected between M. polymorpha sensu lato and the samples of M. paleacea ssp. diptera from Japan. This second species belongs to subgenus Chlamidium (Corda) Bischl., the largest subgenus of Marchantia, with 21 accepted species [12]. Comparisons of both investigated genes with the corresponding regions of the reference chloroplast genome obtained from NCBI GenBank (NC_001319.1) place this reference sequence with M. paleacea, not with the M. polymorpha subspecies studied here.
This result confirms our hypothesis that the laboratory cell line (A18) used in the complete cpDNA sequencing project was incorrectly annotated taxonomically (see Kijak et al. [31]). This conclusion was recently validated by Villarreal et al. [27] in an extensive phylogenetic study of the class Marchantiopsida (79 species and 11 loci from both organellar genomes). It is probable that these studies were taken into account in preparing a new description of the Marchantia chloroplast DNA reference genomes: the same sequence now has two GenBank accession numbers (NC_001319, M. polymorpha, and the new X04465.1, M. paleacea). It appears that the taxonomic rank proposed by Bischler-Causse and Boisselier-Dubayle [7] for the three taxa within M. polymorpha is inadequate to account for the degree of morphological, ecological, and molecular differentiation present, since Villarreal et al. [27] estimated the divergence time between the three subspecies of M. polymorpha at about 5 Ma (2–11 Ma). By comparison, the results of our study would appear to support the concept of three separate species [13,14]. However, further studies of genetic differentiation are necessary.

In this paper, we present the results of screening cpDNA sequences for diagnostic mutations that can be used to identify these taxa. No other sequence regions, including the short sequences proposed as "DNA barcodes" for land plants [32], have previously been tested on such a large scale. Our results indicate that one of the investigated regions (the tRNAGly intron) could be used as such a marker, since it contains a diagnostic mutation for each taxon and can be amplified and sequenced using a single pair of primers. The diagnostic value of this region requires verification on a larger sample spanning the whole geographical range of this scientifically important species.

Tab. 1 A list of studied populations with locations and GenBank accession numbers of the sequences. MA, MB, and MC represent the three haplotypes within Marchantia polymorpha sensu lato, corresponding to subspecies montivagans, polymorpha, and ruderalis, respectively.

Taxon | Pop. No. | Locality | Province | Country | Longitude | Latitude | rps4 | tRNA Gly

Tab. 4 Estimates of evolutionary divergence (number of base substitutions per site, with standard errors above the diagonal) between the three subspecies of M. polymorpha sensu lato (MA, MB, and MC), M. paleacea subsp. diptera (MP), and the reference sequence (NC_001319). Calculations are based on a combined matrix of the three investigated regions, with 1,297 positions in the final dataset.

Fig. 2 Maximum likelihood phylogenetic tree based on the combined set of all investigated cpDNA regions (a total of 1,315 positions in the final dataset) and the Tamura 3-parameter model of nucleotide substitution (best BIC score in the MEGA model test). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Bootstrap values (1,000 replicates) are shown next to the branches.