Monophyly of Archaeplastida supergroup and relationships among its lineages in the light of phylogenetic and phylogenomic studies. Are we close to a consensus?

One of the key evolutionary events on the scale of the biosphere was an endosymbiosis between a heterotrophic eukaryote and a cyanobacterium, resulting in a primary plastid. Such an organelle is characteristic of three eukaryotic lineages: glaucophytes, red algae and green plants. The three groups are usually united under the common name Archaeplastida or Plantae in modern taxonomic classifications, which indicates they are considered monophyletic. The methods generally used to verify this monophyly are phylogenetic analyses. In this article we review up-to-date results of such analyses and discuss their inconsistencies. Although phylogenies of plastid genes suggest a single primary endosymbiosis, which is assumed to mean a common origin of the Archaeplastida, different phylogenetic trees based on nuclear markers show monophyly, paraphyly, polyphyly or unresolved topologies of Archaeplastida hosts. The difficulties in reconstructing host cell relationships could result from stochastic and systematic biases in data sets, including different substitution rates and patterns, gene paralogy and horizontal/endosymbiotic gene transfer into eukaryotic lineages, which attract Archaeplastida in phylogenetic trees. Based on results to date, it is not possible either to confirm or to refute alternative evolutionary scenarios to a single primary endosymbiosis. Nevertheless, if trees supporting monophyly are considered, relationships inferred among Archaeplastida lineages can be discussed. Phylogenetic analyses based on nuclear genes clearly show the earlier divergence of glaucophytes from red algae and green plants. Plastid genes suggest a more complicated history, but at least some studies are congruent with this concept. Additional research involving more representatives of glaucophytes and many understudied lineages of Eukaryota can improve the inference of phylogenetic relationships related to the Archaeplastida. In addition, alternative approaches not directly dependent on phylogenetic methods should be developed.


Introduction
Two-membrane plastids are characteristic of three eukaryotic lineages, glaucophytes (Glaucophyta), red algae (Rhodophyta) and green plants (Viridiplantae or Chloroplastida) [1]. Glaucophytes comprise a small group of freshwater unicellular algae of only 13 species [2]. Rhodophytes constitute a large assemblage of both unicellular and multicellular algae, mainly sea inhabitants, with about 5000-6000 species [3]. The green clade is the largest, with 350 000 widespread species in both aquatic (mainly green algae) and terrestrial environments (land plants) [4]. It has highly diversified into unicellular, colonial and multicellular forms, including land plants with composite tissue organization. These three major lineages are commonly classified together in the Archaeplastida or Plantae, one of several evolutionary supergroups of the domain Eukaryota [5][6][7].
There is a general consensus that Archaeplastida plastids are of prokaryotic origin, the result of endosymbiosis between a heterotrophic eukaryotic host and a photosynthetic cyanobacterium [8][9][10].In support of this, Archaeplastida plastid genomes share similar gene contents and conserved gene arrangement with cyanobacterial genomes [11,12], including an unusual tRNA-Leu group I intron [13].Glaucophyte plastids (called cyanelles) retain more cyanobacterial characteristics than do red alga and green plant plastids.The most remarkable is the presence of peptidoglycan, a typical component of bacterial cell walls located between the inner and outer membrane [14], as well as carboxysomes with enzymes involved in carbon fixation [15].In addition, glaucophytes and rhodophytes still possess phycobilisomes, typical cyanobacterial photosystem II light-harvesting antennae [16].
There are many features, besides those mentioned above, suggesting a common origin of Archaeplastida plastids. For example, they all are bound by two membranes, corresponding to the outer and inner membranes of their cyanobacterial ancestor. This distinguishes them from complex plastids surrounded by three or four membranes present in many other photosynthetic (and also heterotrophic) eukaryotes. In contrast to the Archaeplastida plastids, complex plastids did not evolve directly from a cyanobacterium but arose by endosymbioses of green or red algae with other eukaryotes [8,9,17,18]. As a result, Archaeplastida plastids are called primary and the latter secondary or tertiary plastids. Moreover, primary plastids from all Archaeplastida representatives as well as their descendants (higher-order plastids) share the same unique atpA gene cluster in the genome [11] and a translocase supercomplex (Toc/Tic apparatus) responsible for the import of nuclear-encoded proteins. The complex is composed of a conserved set of cyanobacterium-derived homologs and subunits that are presumed to have arisen de novo in the common host [19][20][21][22]. The Archaeplastida also evolved a common mosaic feature of nuclear plastid-targeted genes of Calvin cycle enzymes [23]. At least three of the original cyanobacterial genes were replaced by host homologs in their common ancestor.
Transformation of an endosymbiont into a true organelle is considered to be a very complicated process requiring many modifications and inventions both in the host and endosymbiont [24,25].Therefore, it generally is assumed that the primary endosymbiosis happened only once in the common ancestor of all Archaeplastida members [10,26,27].Such a view requires monophyly of both primary plastids and their hosts, which is, however, still controversial because the clear similarity of plastids and their monophyly are also consistent with alternative scenarios for the evolution of the Archaeplastida supergroup [28][29][30][31][32][33] (Fig. 1).A good way to test the monophyly of Archaeplastida plastids and hosts is to carry out phylogenetic analyses of genes present in plastids, compared to those inferred from nuclear and mitochondrial genomes.

Testing the monophyly of Archaeplastida based on plastid genes
In support of a common origin of primary plastids, the first phylogenetic analyses involving plastid SSU (16S) rRNA genes grouped all representatives of Archaeplastida together [34][35][36]. In agreement with earlier assumptions based on structural similarities of plastids and cyanobacteria, the three eukaryotic lineages branched from the cyanobacterial clade in phylogenetic trees. These results were also strongly confirmed by analyses of LSU (23S) rRNA, tRNA genes (for example, see [37,38]) as well as numerous studies of single [39][40][41] and concatenated sets of plastid and/or nuclear-encoded plastid genes (see [42][43][44][45][46] and Tab. 1 for other references).
The most extensive phylogenetic studies of plastid evolution to date involved 191 genes with a total of more than 90 000 sites [47] and 75 genes with almost 40 000 sites [48].The results agree with those from phylogenetic analyses of 21 complete genomes using correlation of compositional vectors calculated on frequency of amino acid strings [49].In all these studies, representatives of the Glaucophyta, Rhodophyta and Viridiplantae clustered together with significant support.However, such a grouping does not exclude the possibility that each or at least two of these groups acquired their plastids independently from closely related cyanobacterial lineages that may now be extinct, or were passed from one Archaeplastida lineage to others via a secondary endosymbiosis (Fig. 1).This could lead to the same phylogenetic tree topology as in the case of a single plastid origin [28,32,33].Finding a cyanobacterial lineage that clearly breaks up the monophyly of Archaeplastida plastids would refute the single plastid origin concept.So far, however, all Archaeplastida plastids cluster exclusive of cyanobacteria with significant support, even in protein [50] and RNA gene [38,51] phylogenies that are most rich in representatives (up to 127) of known cyanobacteria lineages.
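The idea of comparing genomes by the frequencies of short amino acid strings, mentioned above for the compositional-vector study [49], can be illustrated with a minimal sketch. It is a simplification made under stated assumptions: the function names and the string length `k = 3` are hypothetical choices, and the sketch uses raw k-string frequencies without the background subtraction that published compositional-vector methods may apply. It is not the procedure of the cited study.

```python
from math import sqrt

def composition_vector(protein, k=3):
    """Relative frequency of every overlapping k-string in one sequence.

    Assumes len(protein) >= k. Returns a sparse dict: k-string -> frequency.
    """
    total = len(protein) - k + 1
    counts = {}
    for i in range(total):
        word = protein[i:i + k]
        counts[word] = counts.get(word, 0) + 1
    return {word: n / total for word, n in counts.items()}

def correlation(u, v):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def cv_distance(seq_a, seq_b, k=3):
    """Distance in [0, 1]; 0 for identical composition, 0.5 for no shared k-strings."""
    c = correlation(composition_vector(seq_a, k), composition_vector(seq_b, k))
    return (1.0 - c) / 2.0
```

A matrix of such pairwise distances could then be passed to a distance-based tree-building method such as neighbor joining; because no alignment is needed, the approach scales to whole genomes, at the cost of not modeling substitutions explicitly.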
Fig. 1 Three alternative scenarios for primary plastid origin assuming a non-monophyly of Archaeplastida lineages (A) separated by a heterotrophic lineage (H). The first scenario assumes that independent primary endosymbioses involved closely related cyanobacteria and ancestors of separated Archaeplastida lineages. The cyanobacteria finally became extinct and the resulting plastids became similar by convergent evolution. The second hypothesis supposes that a single primary endosymbiosis occurred in one Archaeplastida lineage and that the primary plastid was then transferred into the remaining lineages via secondary endosymbioses. The third scenario presumes that an ancient primary endosymbiosis occurred very early in eukaryote evolution, followed by losses of primary plastids in lineages that are now heterotrophic.
Tab. 1 Characteristics of phylogenetic studies based on plastid genes and nuclear genes for plastid-related proteins (with "n" prefix) including three lineages of Archaeplastida. Additional notes were added only at the same references describing a different variant of studies.

Testing the monophyly of Archaeplastida based on nuclear genes
In contrast to the general concordance of phylogenetic trees based on plastid genetic markers, different nuclear gene data sets produced monophyly, paraphyly, polyphyly or unresolved topologies of Archaeplastida host lineages (Tab. 2). In total, 26 data sets we gathered favor glaucophyte, red alga and green plant monophyly, whereas 50 do not recover the monophyly of these groups. Because their sequences were available and useful as a marker across comparable studies [52,53], initial analyses of representatives of all three Archaeplastida lineages were based on the nuclear SSU (18S) rRNA gene [54][55][56]. However, the trees obtained were inconclusive and had very poorly resolved deeper branches. The three Archaeplastida lineages did not group together and, surprisingly, glaucophytes clustered with cryptophytes, a group of algae bearing a secondary plastid of rhodophyte origin. The lack of Archaeplastida monophyly was also confirmed in subsequent studies based on the same marker, also combined with the LSU (28S) rRNA gene, but including many more taxa, up to 2551 species ([57,58]; see Tab. 2 for other examples). Interestingly, glaucophytes still clustered in these trees with cryptophytes [59,60] and with other recently discovered related lineages, such as heterotrophic katablepharids and picozoans [61][62][63]; the latter were formerly called picobiliphytes and considered photosynthetic, but available data indicate that they are heterotrophic [64,65]. Several later analyses recovered a Rhodophyta-Viridiplantae monophyly [58,62,66,67] and even a clade containing all three Archaeplastida lineages, but still mixed with cryptophytes [67], katablepharids, picozoans and the newly discovered heterotrophic flagellate Palpitomonas bilix [68]. However, the relationships were poorly supported or, at best, had moderate support. A relatively high value (Bayesian PP = 1) was obtained only by Kim et al. [69] for the clade including glaucophytes with cryptophytes and katablepharids in a tree based on genes encoding large and small subunit rRNA.
The monophyly of Archaeplastida was recovered by initial global phylogenetic studies of eukaryotes based on several nuclear-encoded proteins [58,70,71], but not with very strong support (Tab.2).The inclusion of 143 protein sequences with a total length of more than 30 000 sites [45] and the application of a more advanced site-heterogeneous mixed-model (CAT) [72], as well as other approaches, recovered monophyly of primary-plastid host lineage with significant support [46,73].However, the analyses did not consider representatives of cryptophytes or their relatives, which grouped with Archaeplastida in rRNA gene trees.In fact, when these taxa and other eukaryotic lineages were taken into account, different results were obtained depending on the data set and method used (Tab.2).
In some cases almost all deep branches, including Archaeplastida lineages, remained completely unresolved [74][75][76]. Interestingly, in analyses involving the largest number of taxa (451), major eukaryotic supergroups, including the Opisthokonta, Amoebozoa, Excavata and SAR (Stramenopiles, Alveolates and Rhizaria) obtained significant or some support, but not the Archaeplastida. As in rRNA gene phylogenies, they were mixed with cryptophytes, haptophytes and katablepharids [77]. Many other trees inconsistent with the monophyly of Archaeplastida hosts were also obtained (indicating paraphyly or polyphyly) but with more significant support (Tab. 2). In one of them, glaucophytes formed a separate clade, whereas red algae and green plants grouped significantly with cryptophytes, haptophytes and katablepharids [78] (see also [74] for another interpretation). In other studies, a clade with the three Archaeplastida lineages was indeed recovered, but they still clustered significantly with haptophytes [79], cryptophytes and katablepharids [80], Palpitomonas [68], as well as with picozoans [81]. The latter study was based on the most sequence-rich data set to date, 258 nuclear-encoded proteins with 55 881 sites in total.
Tab. 1 (continued) Footnotes: 1 Maximum likelihood method was used. 2 Neighbor joining and logDet methods were used. 3 Amino acids were recoded. 4 Codons were degenerated. 5 Stationary composition model was used. 6 CAT model was used. 7 Alignment gaps were excluded. 8 Neighbor joining and maximum parsimony methods were used. 9 Nonstationary (tree-heterogeneous) composition model was used. 10 Data complexity was reduced. 11 Compositional bias was reduced. 12 Fast evolving sites were excluded. Abbreviations: g - genes; n - nuclear; p - proteins; S - SSU rRNA gene; L - LSU rRNA gene; t - Ile-tRNA + Ala-tRNA genes; G - Glaucophyta; N - Nephroselmis olivacea; R - Rhodophyta; V - Viridiplantae. Square brackets indicate a significant grouping with a posterior probability >0.9 or a bootstrap support >90%.
Tab. 2 Characteristics of phylogenetic studies based on nuclear genetic markers including three lineages of Archaeplastida. Additional notes were added only at the same references describing a different variant of studies.
In addition, other representatives of the main eukaryotic supergroups, such as Excavata and SAR, grouped with the Archaeplastida with high support in other phylogenetic analyses [29,31,64,68,74,76,82,83]. Such relationships were also inferred from many other data sets but with weaker or no support (Tab. 2). It should be emphasized, however, that many other phylogenies based on both smaller [84] and much larger data sets [67,80,81][85][86][87] have high or very high support values for monophyly of glaucophytes, red algae and green algae/land plants.

Testing the monophyly of Archaeplastida based on mitochondrial genes
Because of the inconsistencies in nuclear multigene phylogenies of main eukaryotic lineages, including glaucophytes, red algae and green plants, mitochondrial genes have been explored as alternative phylogenetic markers to trace host cell history. Successful sequencing of glaucophyte mitochondrial genomes enabled testing the monophyly of Archaeplastida. Studies based on 42 mitochondrion-encoded and nucleus-encoded, mitochondrion-targeted proteins (11 384 sites) and 84 taxa (including two glaucophytes) supported the monophyly of the primary plastid-bearing eukaryotes with 0.99 and 1.0 posterior probabilities [88]. Inclusion of seven glaucophyte sequences confirmed this result for the concatenated amino acid alignment of 14 proteins encoded in mitochondrial genomes (3267 sites) from 49 taxa [89]. However, topology tests of the protein tree showed that placing red algae as a sister branch to haptophytes outside the Archaeplastida, as well as a topology uniting red algae and haptophytes within the Archaeplastida, were not statistically worse than the best tree. In addition, the Archaeplastida clade obtained very weak (51%) bootstrap support in maximum-likelihood analysis using the corresponding nucleotide alignment, whereas in the Bayesian tree the clade of Hacrobia (katablepharid, cryptophytes and haptophytes) branched within the Archaeplastida as sister to the red algae, with posterior probabilities of 0.99 for both the Archaeplastida + Hacrobia and the Hacrobia + red algae clades [89]. Interestingly, this topology corresponds to results obtained from rRNA genes and some nuclear protein data.
The clustering of mitochondrial sequences from red algae and green plants can be artifactual, resulting from similarly slow evolutionary rates of these sequences compared to many other eukaryotic taxa [90].

Influence of different methodologies on discrepancies in Archaeplastida phylogenies
The striking differences in inferring phylogenetic positions and relationships of Archaeplastida hosts could result from many methodological limitations.On one side there is stochastic error caused by poor phylogenetic signal in the data, which makes it difficult to resolve relationships, especially at deep levels; for example, among very early diverging major eukaryotic lineages [91].One way to overcome this problem is to increase the number of genes analyzed and use phylogenomic approaches based on huge data sets on a genomic scale [92].However, even consideration of more than 40 000 sites does not always result in deep robust phylogenies for the Archaeplastida [80,87,93].Moreover, analyses of many genetic markers are vulnerable to systematic errors, which can lead to false phylogenetic reconstructions, reflecting unreal evolutionary relationships but with high statistical support.
The best-known systematic bias is long-branch attraction (LBA) [94] and the correlated effect that has been called short-branch exclusion (SBE) [95]. These biases cause, respectively, artificial clustering of highly diverged sequences and of sequences with lower-than-average substitution rates. The potential influence of LBA on the Archaeplastida was clearly shown by Rodriguez-Ezpeleta et al. [73]. The supergroup was moderately supported (64%) when two representatives of red algae, Porphyra and Cyanidioschyzon, were considered. Interestingly, the inclusion of only Porphyra increased the support for the whole Archaeplastida clade to 99%, whereas the tree with only Cyanidioschyzon moved this taxon to kinetoplastids with 100% support. Both kinetoplastids and Cyanidioschyzon are characterized by fast evolving sequences and their clustering is most consistent with LBA. In turn, the SBE effect concerns some sequences of glaucophytes [95] and could be responsible for grouping of short branches leading to glaucophytes and katablepharids in some tested tree topologies [81]. This effect could also explain the common clustering of red algae and green plants in phylogenetic trees based on slowly evolving mitochondrial sequences [90].
In addition to that, gene-rich data sets can suffer from inadequate taxon sampling because it is not always possible to collect the full sets of sequences from all species.If, however, including many taxa is required to test their relationships, final alignments can lack many sequences from understudied species.Some of the discrepancies observed in above-described phylogenies likely result from such missing data [96].For example, the exclusion of missing data and a more distant excavate outgroup changed the polyphyly of Archaeplastida (grouped with a katablepharid and a cryptophyte) to monophyly [80].However, further removing missing data resulted in weak or lack of support for the clade.It clearly shows that topologies obtained are susceptible to different data selection, and therefore, conclusions based on phylogenomic data should be drawn with caution.
It was shown that increased taxon-sampling could improve phylogenetic accuracy even at the cost of including data that are missing for some organisms [97].Including new eukaryotic lineages, especially haptophytes, cryptophytes, katablepharids, picozoans and Palpitomonas [79][80][81] broke up the monophyly of Archaeplastida that had been obtained previously using less taxon-rich data sets [45,46,68].The analyses based on the largest number of proteins (258 with 55 881 sites), represented by 68 species, grouped Archaeplastida lineages, but also included picozoans in this cluster, which significantly affiliated with glaucophytes [81].As could be expected, exclusion of picozoans from the data recovered Archaeplastida as the monophyletic clade.The influence of taxon-sampling on Archaeplastida relationships with other eukaryotes was also shown by Jackson and Reyes-Prieto [89] using the concatenated alignment of mitochondrial proteins.Archaeplastida monophyly was obtained when all glaucophyte representatives were included.However, after removing glaucophyte taxa other than those previously published (i.e.Cyanophora paradoxa and Glaucocystis nostochinearum), two phylogenetic methods failed to recover the common origin of Archaeplastida lineages, because the glaucophytes grouped with the katablepharid Leucocryptos marina (significantly using the Bayesian approach), whereas red algae clustered with cryptophytes and haptophytes.The whole clade including Archaeplastida and the other eukaryotes obtained significant support by the Bayesian method.Interestingly, the authors recovered the monophyly of Archaeplastida only when excavate taxa were excluded.When they were present, they grouped with glaucophytes and green algae [89].It indicates that the relationships obtained strongly depend on selection of taxa and sequences, and thus should be treated with caution.
Another problem with the multi-gene data is gene sampling.Its influence on the incongruity of Archaeplastida monophyly between results of Nozaki et al. [98] and Hackett et al. [84] was studied by Inagaki et al. [99], who evaluated the significance of Rhodophyta-Viridiplantae monophyly by analyzing multi-gene data sets of varied sizes [99].The authors showed that recovery of this relationship depends on gene sampling in phylogenetic inferences with fewer than 10 000 amino acid positions.The sampled data sets consisted of different genes and supported various topologies but the tree based on the full multi-gene data set recovered the monophyly of red and green lineages.However, no representative of glaucophytes was considered in these studies.
Comparing data gathered from nuclear genes (Tab. 2) with more than two markers, we found that the monophyly of the Archaeplastida was, on average, obtained more often with greater numbers of genes (mean 103 vs. 52) and longer alignments (26 694 vs. 14 624 sites), but with smaller numbers of taxa (59 vs. 69). The first two differences were statistically significant but not the last (Mann-Whitney test, P-value: 0.04, 0.04 and 0.75, respectively). These comparisons, however, should be considered with caution because the data compared are not completely independent.
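The kind of comparison described above can be sketched in a few lines. The following is a minimal illustration of a two-sided Mann-Whitney U test using only the Python standard library, assuming a normal approximation without tie or continuity correction (a simplification that statistical packages usually refine). The function name and the example gene counts are hypothetical placeholders and are not the values from Tab. 2.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    # Pool both samples, remembering the group of each value (0 = x, 1 = y).
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values pooled[i:j] and give each its midrank.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return u1, erfc(abs(z) / sqrt(2.0))

# Invented numbers of genes per data set, for illustration only.
genes_monophyly = [143, 258, 124, 78, 30]
genes_non_monophyly = [6, 16, 42, 127, 22, 9]
u, p = mann_whitney_u(genes_monophyly, genes_non_monophyly)
```

A rank-based test is a natural choice here because the numbers of genes, sites and taxa per study are strongly skewed; it does not, however, address the non-independence of data sets that share genes or taxa.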
Phylogenomic data sets consist of concatenated alignments of many genes, which most probably evolved subject to different substitution rates and patterns.Therefore, application of improper, usually simple, evolutionary models can also lead to inaccurate phylogenies.Model violation can also result from saturation of mutations, across-site rate variation, heterogeneous substitutions across sites, rate variation across sites through time (heterotachy), site-interdependent substitutions as well as compositional heterogeneity and nonstationarity of substitutions.It was shown in the case of the Archaeplastida that model misspecification could reinforce systematic errors such as LBA of the red alga Cyanidioschyzon with non-Archaeplastida taxa, giving an incorrect topology even with a strong support [73].
Several approaches are applied to overcome or at least reduce the impacts of these errors; for example, removing rapidly evolving sites, genes and taxa [100], recoding amino acids into functional categories [101], and using covarion [102] and site-heterogeneous mixture (CAT) models [72]. Although initial studies involving Archaeplastida were based on simpler phylogenetic models, many later analyses were carried out with advanced approaches using more realistic models. However, they still resulted in varied phylogenetic positions and relationships of the Archaeplastida. On one hand, the application of some of the modern methods increased support for the monophyly of Archaeplastida and alleviated the influence of some systematic errors [46,73,84] in comparison to earlier studies. On the other hand, Hampl et al. [79] obtained the non-monophyly of Archaeplastida (specifically, the placement of haptophytes within this supergroup) even after application of the CAT model and recoding of amino acids by functional categories, as well as progressive exclusion of rapidly evolving gene sequences and long-branch taxa. Therefore, the authors concluded that these relationships do not appear to be due to long-branch attraction artifacts. Similarly, Nozaki et al. used the CAT model and only slowly evolving genes and recovered polyphyly of Archaeplastida lineages [31]. Likewise, the application of a covarion model in rRNA gene-based phylogenies still grouped glaucophytes with cryptophytes and did not improve the resolution of Archaeplastida [67]. There also are differences depending on the phylogenetic methods used. Yabuki et al. [87] obtained significant support for the monophyly of the Archaeplastida in a Bayesian tree but not using maximum-likelihood. In contrast, Jackson and Reyes-Prieto [89], using mitochondrial genes, obtained some bootstrap support (although very weak) for Archaeplastida monophyly with maximum-likelihood but very strong confidence for separation of Archaeplastida lineages by katablepharid, cryptophytes and haptophytes using Bayesian inference.
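Recoding amino acids into functional categories, one of the approaches listed above, can be shown with a short sketch. It assumes the widely used six-class Dayhoff grouping; the studies cited here may have used other category schemes, and the names `DAYHOFF_GROUPS` and `recode` are hypothetical.

```python
# Six Dayhoff classes of chemically similar amino acids (an assumed scheme).
DAYHOFF_GROUPS = ("AGPST", "DENQ", "HKR", "ILMV", "FWY", "C")

# Map each amino acid letter to the index of its class, written as a digit.
RECODE = {aa: str(index)
          for index, group in enumerate(DAYHOFF_GROUPS)
          for aa in group}

def recode(sequence):
    """Replace each amino acid by its class digit.

    Characters outside the 20 standard residues (gaps, X) are kept unchanged.
    """
    return "".join(RECODE.get(aa, aa) for aa in sequence.upper())

aligned = {"taxon_A": "MKV-LA", "taxon_B": "IRL-MS"}  # toy alignment
recoded = {name: recode(seq) for name, seq in aligned.items()}
```

After recoding, substitutions within a class (for example I to V) become invisible, which reduces saturation and compositional bias at the price of discarding part of the phylogenetic signal.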
To avoid problems with concatenated alignments, Chan et al. [103] used novel genomic data from two mesophilic red algae, Porphyridium cruentum and Calliarthron tuberculosum, considered each gene phylogeny separately and found that about 50% of the examined protein phylogenies support the monophyly of red and green algae. However, to resolve the monophyly of the whole Archaeplastida, inclusion of complete glaucophyte genomes is necessary.
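Counting how many single-gene trees recover a given clade, as in the gene-by-gene approach described above, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline of the cited study: trees are represented as rooted nested tuples of taxon names, the function names are hypothetical, and the two example trees are invented. Real gene trees are usually unrooted, so the outcome of such a test depends on the rooting chosen.

```python
def leaves(tree):
    """Set of taxon names below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def is_monophyletic(tree, group):
    """True if some node of the rooted tree has exactly `group` as its leaves."""
    group = set(group)

    def walk(node):
        if leaves(node) == group:
            return True
        if isinstance(node, str):
            return False
        # Only a child containing the whole group can hold the clade.
        return any(walk(child) for child in node if group <= leaves(child))

    return walk(tree)

def support_fraction(gene_trees, group):
    """Fraction of gene trees in which `group` forms a clade."""
    hits = sum(is_monophyletic(tree, group) for tree in gene_trees)
    return hits / len(gene_trees)

# Two invented gene trees: the first unites red and green lineages,
# the second places a cryptophyte sequence with the red alga.
tree_1 = ((("red", "green"), "glaucophyte"), ("cryptophyte", "haptophyte"))
tree_2 = ((("red", "cryptophyte"), "green"), ("glaucophyte", "haptophyte"))
fraction = support_fraction([tree_1, tree_2], {"red", "green"})
```

Treating each gene separately avoids averaging conflicting signals in a concatenated alignment, but single-gene trees are short and therefore more exposed to stochastic error.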
It seems that there is no clear interdependence between the tree topology obtained with respect to Archaeplastida monophyly, and the methodological approaches used in phylogenetic analyses.

Inconsistency of Archaeplastida phylogenies in the context of gene and plastid evolution
The disagreement between plastid gene trees supporting the monophyly of Archaeplastida plastids, and many phylogenies based on nuclear (and some mitochondrial) markers that often favor other topologies, is very intriguing. Besides methodological problems with phylogenetic analyses, differences in the inferred trees can result from disparate gene histories that are incongruent with species trees. One of them is gene paralogy [104,105]. When different paralogous gene families are retained across lineages, trees based on these genes will reproduce gene histories and duplication events but not accurate phylogenetic relationships among taxa. Although such genes should be excluded from phylogenetic data sets, it can be difficult to recognize hidden paralogs and separate them from real orthologs. The same problem concerns genes subjected to horizontal gene transfer (HGT), which seems to be a common process, not only in prokaryotic organisms, but also in eukaryotes including Archaeplastida lineages [106][107][108][109][110][111]. Such genes are most likely also present in many alignments analyzed and could cause artificial clustering. One of the genes possibly subjected to this process is EF2, which encodes translation elongation factor. Phylogenies based on this marker gave a very strong signal supporting the non-monophyly of Archaeplastida: the significant grouping of rhodophytes and viridiplants with katablepharid, cryptophytes and haptophytes but excluding glaucophytes [78]. Therefore, it was proposed that the gene was likely horizontally transferred [74]. The close relationship of Archaeplastida with katablepharid, cryptophytes and haptophytes was also reproduced by some trees of mitochondrial genes. If these results are reliable (e.g. do not reflect a compositional bias) and we insist on Archaeplastida monophyly, we should also assume HGT at least for some of these genes from Archaeplastida to the other photosynthetic lineages. Widespread horizontal transfer of mitochondrial genes has been reported in plants [112][113][114][115].
Another process that can complicate inferred phylogenies of the Archaeplastida is endosymbiotic gene transfer (EGT).During this process, genes from an endosymbiont, including those from both plastid and nuclear genomes, are moved and integrated into the host nuclear genome [107,[116][117][118].The presence of these genes in an alignment would mask phylogenetic signal from true (endogenous) orthologs (Fig. 2).Interestingly, in many trees based on nuclear genes the primary plastid-containing lineages intermix with other eukaryotes, such as cryptophytes, haptophytes and picozoans, all of which contain genes most likely from EGT during secondary (or tertiary) endosymbioses involving the red algal lineage [8,9,17,18].In agreement with that, Leigh et al. [119] found an influence of EGT in phylogenetic analyses on attraction of the red alga Porphyra to stramenopiles, which possess plastids derived from an ancient secondary endosymbiosis with a red alga (Fig. 2).The interfering effects of EGT could also explain the poor resolution of deep branches and a failure to recover the monophyly of Archaeplastida [77].However, the supergroup remained polyphyletic in phylogenetic analyses after exclusion of cryptophyte and haptophyte taxa [77].It should be noted that Archaeplastida lineages also join other photosynthetic lineages in trees based on conserved nuclear rRNA genes, which are not considered to be involved in the EGT process.Inferences about EGT, especially in aplastidic eukaryotes, should be performed with caution and use appropriate controls; otherwise invoking it could lead to overestimation of EGT events and its impact on eukaryotic phylogenies [120].
It is remarkable that Archaeplastida members show affiliation to cryptophytes, haptophytes and related taxa in rRNA, nuclear and some mitochondrial gene trees.If these groupings are not a result of some compositional bias or another systematic error, but rather reflect true host cell history, it is worthwhile considering alternative views [77,79].Three scenarios have been proposed (Fig. 1) that are compatible with the unquestionable monophyly of plastid genes, and explain both similar and different plastid features of Archaeplastida members (for wide discussion see [27][28][29][30][31][32][33]41,83,121,122]).One of them assumes that multiple independent primary endosymbioses occurred involving closely related cyanobacteria and separate Archaeplastida lineages.The resulting plastids were next subjected to convergent evolution, whereas the intervening cyanobacterial lineages became extinct.The second possibility supposes that a single primary endosymbiosis occurred in one Archaeplastida lineage and the primary plastid was next transferred via secondary endosymbioses into the remaining lineages.The third hypothesis claims that an ancient primary endosymbiosis occurred very early in eukaryote evolution, before divergence of some major lineages, and was followed by subsequent losses of the primary plastids in ancestors of groups that now contain no primary plastid.
To accept or reject these scenarios we should determine if they are consistent with available data on key processes involved in plastid evolution: (i) multiple endosymbiotic events followed by transformation of the endosymbionts into true organelles and (ii) multiple plastid losses. The problem is still hotly debated [9,17,18,123]. Plastids in the reduced or vestigial form are still present in some eukaryotes that have lost photosynthesis, and are involved in vital non-photosynthetic functions (e.g. amino acid, heme, isoprenoid, and fatty acid biosynthesis); therefore, their loss seems unfavorable [124][125][126][127]. The transformation of endosymbiont to organelle is usually considered a complicated process but successful endosymbiotic events might not be as rare as once assumed [128]. In addition to examples of higher-order endosymbiosis [8,9,17,18,129], there is also at least one case of a successful primary endosymbiosis independent of Archaeplastida, namely between a freshwater thecate amoeba, Paulinella chromatophora, and a cyanobacterium from a phylogenetic lineage different from the ancestor of Archaeplastida plastids [37,51,130]. Two photosynthetically active chromatophores retained by Paulinella fulfill all criteria to be considered true organelles [131]. Their deep integration involved a significant reduction of the endosymbiont's genome [132,133], transfer of many endosymbiont genes to the host nucleus [134,135] and evolution of a machinery for import of host-encoded proteins into the chromatophores [136][137][138][139]. Moreover, apart from Paulinella, there are many other eukaryotes in endosymbiotic relationships with cyanobacteria [140][141][142][143]. However, the first scenario, assuming independent primary plastid endosymbioses in Archaeplastida lineages, encounters a problem with an unparsimonious convergent evolution of several specific and complex plastidal features, e.g. the Toc/Tic import apparatus [20] and the composition of Calvin cycle enzymes [23].
Fig. 2 Influence of endosymbiotic gene transfer (EGT) associated with a secondary endosymbiosis between a rhodophyte and a stramenopile on the inference of phylogenetic relationships between Archaeplastida and other eukaryotic lineages. Gene 1 is a true ortholog (vertically inherited); therefore, the tree based on this gene reflects real host relationships, i.e. the monophyly of Archaeplastida. However, the tree based on gene 2, which was acquired by the stramenopile from the red alga during the secondary endosymbiosis, shows a close affiliation of Stramenopila to Rhodophyta, thereby indicating the paraphyly of Archaeplastida. The simultaneous use of these genes in a tree reconstruction could mask the phylogenetic signal from the true ortholog, leading to the attraction of rhodophytes to stramenopiles and their basal placement with respect to other Archaeplastida.
At present, no single factor influencing the inferred relationship of Archaeplastida with other eukaryotic supergroups in phylogenetic studies can be pinpointed and, therefore, used to remove doubts about the monophyly or non-monophyly of glaucophytes, red algae and green algae/plants.

Testing the relationships among Archaeplastida lineages
If we assume the monophyly of Archaeplastida despite its controversies, we should consider relationships among its three lineages. Glaucophytes are usually considered to be the earliest branch of the Archaeplastida because their plastids retain more ancestral features typical of cyanobacteria than are present in red algae or green plants; for example, peptidoglycan and carboxysomes [14,15]. Other characteristics that distinguish the Glaucophyta from the Rhodophyta and Viridiplantae include (i) the presence of a cyanobacterial-like fructose-1,6-bisphosphate aldolase, which was replaced by a duplicated cytosolic copy in the other Archaeplastida lineages [144], (ii) the lack of triple-helix chlorophyll-binding, light-harvesting antenna complexes [145,146] and (iii) the lack of a plastidial phosphate translocator [20]. All these differences support the hypothesis that glaucophytes are sister to red algae and green plants.
On the other hand, glaucophytes and rhodophytes share the presence of unstacked thylakoid membranes with phycobilisomes, which are also characteristic of cyanobacteria [16]. Moreover, these two Archaeplastida groups lack the chlorophyll b present in Viridiplantae, and both synthesize starch in the cytoplasm, in contrast to green plants, which relocated the pathway to plastids [147–150]. The close affiliation of these two groups was noticed by Cavalier-Smith, who united them under the subkingdom Biliphyta, non-phagotrophic and phycobilisome-containing algae [26,151]. However, the features shared by Glaucophyta and Rhodophyta represent characteristics of the cyanobacterial ancestor, and therefore should not be used as evidence for the monophyly of the Glaucophyta-Rhodophyta clade.
There is also evidence for the third possibility, i.e. a closer affiliation of glaucophytes and green plants. The Glaucophyta and Viridiplantae retain cyanobacterial RuBisCO genes; in contrast, red algae acquired RuBisCO from a proteobacterium via horizontal gene transfer [152]. Moreover, N-terminal sequences of two plastid proteins of photosystems I and II in the glaucophyte Cyanophora show significant similarity to those in green plants but not to those in red algae [153,154]. It was also found that the moss Physcomitrella, representing the "green" lineage, has nine homologous genes related to peptidoglycan biosynthesis that are essential for plastid division but are absent from red algae [155,156].
The first phylogenetic studies based on single genes that included all three representatives of the Archaeplastida were inconclusive and contradictory (see for example [39,40,157,158]), but subsequent analyses significantly resolved relationships within Archaeplastida (Tab. 1 and Tab. 2). In fact, all three possible topologies were proposed; however, phylogenetic analyses based on nuclear-encoded proteins (sometimes concatenated with rRNA genes) more often indicated the early branching of the Glaucophyta (in 15 data sets) than the early divergence of the Rhodophyta (in 9 data sets) (Tab. 2). A basal Viridiplantae was proposed only from an Hsp90 tree, but without significant support [159]. What is more, the glaucophyte-first hypothesis was significantly supported by a larger number of studies, e.g. [46,67,84], whereas the rhodophyte-first concept obtained significant support in only one investigation, by Burki et al. [160], and with simultaneous weak support for the whole Archaeplastida. In analyses by Rodriguez-Ezpeleta et al. [73], the glaucophytes + green clade received 90% support with 64% support for the whole Archaeplastida; however, when the highly diverged red alga Cyanidioschyzon was removed, and only Porphyra retained, these support values changed to 46% and 99%, respectively. It should be noted that the earlier separation of glaucophytes was obtained using the data set with the largest number of alignment sites (55 881) among those tested [81]. Although the earlier branching of the Rhodophyta was recovered using data sets with, on average, greater numbers of markers (128 vs. 83), alignment sites (29 885 vs. 22 748) and taxa (62 vs. 57), the differences were not statistically significant (Mann-Whitney or Student's t-tests, P > 0.12).
Analyses of gains and losses of nuclear genes encoding transcription-associated proteins (TAPs, comprising transcription factors and other transcriptional regulators) were also used to assess support for the alternative branching orders in Archaeplastida [20]. The Viridiplantae-first scenario required 48 such events, the Rhodophyta-first scenario 46 events, and the Glaucophyta-first scenario 47 events. However, when the pattern of TAP evolution typical of Viridiplantae (a greater probability of gains than losses) was taken into account, an earlier divergence of glaucophytes was favored. All these results based on nuclear markers tend to support the earlier separation of glaucophytes from the "red-green" lineage.
In contrast, phylogenies based on plastid genes and/or nuclear genes derived from a cyanobacterial endosymbiont are much less conclusive. Thirty data sets (15 significantly) supported the earlier divergence of glaucophytes (e.g. [44,161–163]), 27 (15 significantly) that of green plants (e.g. [46,51,164]) and 17 (6 significantly) that of rhodophytes (e.g. [47,165,166]) (Tab. 1). In one case involving SSU rRNA gene data, a fourth, unusual topology was obtained: the prasinophyte Nephroselmis olivacea took a significantly supported basal position to glaucophytes and a clade of rhodophytes with the rest of green algae and land plants [167]. Considering cumulative results, support for the three possible topologies does not seem to depend on the type of data set used, that is, plastid-encoded genes vs. nuclear-encoded genes or rRNA vs. protein genes. As with the nuclear gene data, the rhodophyte-first concept was recovered by data sets with, on average, larger numbers of markers (74 vs. 52 and 48), alignment sites (21 972 vs. 14 715 and 12 301) and taxa (45 vs. 37 and 39) compared to topologies favoring the earliest separation of green plants and glaucophytes, respectively. The differences were, however, not statistically significant in the Kruskal-Wallis test (P > 0.28). Accordingly, the most site-rich data set (more than 90 000 nucleotide sites) indicated the Rhodophyta-first hypothesis [47].
Other studies based on plastid genomes also produced inconsistent results. Comparisons of conserved gene arrangements showed the earlier divergence of either red algae [42] or glaucophytes [11], using 6 and 10 representatives of Archaeplastida, respectively. On the other hand, general comparisons of 50 plastid genomic sequences using BLAST results in distance phylogenies [168], as well as phylogenetic analysis of 21 complete plastid genomes using correlation analysis of compositional vectors calculated from the frequency of amino acid strings [49], recovered the Viridiplantae as the earliest-diverging lineage. In turn, studies of the presence and absence of 261-277 genes in 17-20 plastid genomes indicated the early branching of Rhodophyta [169,170] or Viridiplantae [171]. Bayesian and maximum-likelihood methods applied to mitochondrial proteins showed the earlier divergence of glaucophytes with rather weak support [88,89], whereas the maximum-likelihood phylogenetic tree based on aligned nucleotides indicated the earlier separation of red algae [89]. Alternative topologies positioning rhodophytes or green plants as the earliest branch within the Archaeplastida were not rejected by the approximately unbiased test in the protein tree.

Reasons behind inconsistencies in inferring relationships between Archaeplastida lineages
Disagreements over the phylogeny of Archaeplastida may result from different methodological limitations of phylogenetic inferences and complex gene histories, especially those related to plastids, as discussed above. The influence of fast-evolving sites was tested by Rodriguez-Ezpeleta et al. [172] and Hackett et al. [84]. Their trees based on complete alignments of nuclear-encoded proteins placed red algae at the basal position of the clade containing glaucophytes and green plants. However, the removal of fast-evolving sites recovered glaucophytes as the basal clade to the sister-group of red algae and green plants, with stronger support.
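The site-removal step described above can be illustrated with a minimal sketch. It is not the procedure used in [84] or [172], which is not detailed here; as an assumption, per-column Shannon entropy serves as a crude stand-in for evolutionary rate (published analyses typically assign sites to rate categories estimated under a substitution model on a tree). The function names, the `n_drop` parameter and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps/unknowns."""
    residues = [c for c in column if c not in "-?X"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def remove_fastest_sites(alignment, n_drop):
    """Return a copy of the alignment without its n_drop most variable columns.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Entropy is only a proxy for rate (an assumption of this sketch).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    scores = [column_entropy([s[i] for s in seqs]) for i in range(length)]
    # Rank columns from most to least variable; ties keep alignment order.
    ranked = sorted(range(length), key=lambda i: -scores[i])
    dropped = set(ranked[:n_drop])
    keep = [i for i in range(length) if i not in dropped]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical toy alignment: columns 3 and 5 (1-based) are the variable ones.
toy = {
    "glaucophyte": "MKAVLT",
    "rhodophyte": "MKGVIT",
    "chlorophyte": "MKSVLT",
    "outgroup": "MKTVLT",
}
filtered = remove_fastest_sites(toy, n_drop=2)
```

In practice the tree would be re-inferred from alignments filtered at several thresholds, to check whether the branching order remains stable as the fastest sites are progressively removed.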
An important process that may influence Archaeplastida phylogenies is heterotachy, or lineage-specific rate variations across sites within a protein through time. It results from structural and functional constraints acting on proteins [173]. Vogl et al. [174] showed that many plastid genes have significantly different phylogenies under standard substitution models and proposed that this incongruence could indeed result from heterotachy. In agreement with that, the application of heterotachy models found that many plastid genes of red and green plants are, in fact, subject to such evolution [175–177], and this process can cause errors in the inference of phylogenetic trees, leading to the LBA effect [178,179].
Deschamps and Moreira [46] studied conflicts in Archaeplastida relationships related to nuclear-encoded proteins in great detail. Following the findings of Leigh et al. [119], who noticed an attraction between rhodophytes and stramenopiles (containing a red alga-derived plastid), Deschamps and Moreira assumed that the basal position of rhodophytes in relation to other Archaeplastida lineages [45,84,160,172] could result from endosymbiotic gene transfers linked to the secondary plastid endosymbiosis (Fig. 2). In agreement with that, the exclusion of eukaryotic representatives involved in such endosymbioses shifted the position of rhodophytes in phylogenetic trees from the base of the Archaeplastida clade to sister to the "green" lineage.
Comparable EGT-related effects could be responsible for the basal position of the Viridiplantae lineage, as its members were also engaged in secondary plastid endosymbioses. Such a position was obtained in trees using proteins of plastid origin encoded in nuclear genomes. However, the exclusion of fast-evolving proteins in the Viridiplantae did not change this result. Conserved plastid-encoded proteins also favored this topology, even under methods that are designed to decrease LBA effects and are more robust against compositional and evolutionary rate biases (i.e. amino acid recoding and a site-heterogeneous mixture CAT model). In contrast, a tree based on these proteins, but constructed using a simpler model that is more sensitive to systematic error, produced a topology with the basal placement of the Glaucophyta. This would seem to suggest that this topology could be artificial and result from (i) an attraction of glaucophyte sequences to a cyanobacterial outgroup due to a compositional bias, and (ii) the affiliation of "green" and "red" lineages because of an LBA artefact [46].
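Amino acid recoding, mentioned above, collapses the 20 amino acids into a few classes of biochemically similar residues so that frequent, saturation-prone exchanges within a class no longer count as substitutions. The sketch below uses the widely used six-category Dayhoff grouping as an assumption; the text does not state which recoding scheme was applied in [46], and the function name `recode` is hypothetical.

```python
# Six-category Dayhoff grouping (assumed here; other recoding schemes exist).
DAYHOFF6 = {
    "AGPST": "0",  # small residues
    "DENQ": "1",   # acidic residues and their amides
    "HKR": "2",    # basic residues
    "ILMV": "3",   # aliphatic hydrophobic residues
    "FWY": "4",    # aromatic residues
    "C": "5",      # cysteine
}
RECODE = {aa: state for group, state in DAYHOFF6.items() for aa in group}


def recode(sequence):
    """Recode an aligned protein sequence into six states.

    Gaps are kept as '-'; any other unrecognized symbol becomes '?'.
    """
    return "".join(
        RECODE.get(aa, "-" if aa == "-" else "?") for aa in sequence.upper()
    )


# Two hypothetical sequences that differ only by within-class exchanges
# (A/G and L/I) become identical after recoding.
seq_a = recode("MKAVLT")
seq_b = recode("MKGVIT")
```

The recoded alignment carries less signal per site, which is the trade-off accepted in exchange for reduced sensitivity to compositional bias and saturation.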
Criscuolo and Gribaldo [47], studying more plastid-encoded proteins at the amino acid, degenerate codon, and recoded amino acid levels, obtained the earlier divergence of Rhodophyta. The same topology was favored using a joint set of cyanobacterial homologous plastid- and nuclear-encoded proteins for degenerate codons and recoded amino acids, but the Viridiplantae-first scenario was supported by the combined set and the nuclear-encoded proteins alone at the amino acid level. The early-branching Glaucophyta topology was in turn recovered from the nuclear-encoded proteins using degenerate codons and recoded amino acids. These results clearly indicate an influence of data types and methods on inferences of relationships within the Archaeplastida.
These discrepancies among Archaeplastida phylogenies may be caused by conflicts between protein-coding gene sequences and their amino acid translations, which persist even when sophisticated and better-fitting models are used, i.e. the CAT model and the nonstationary (tree-heterogeneous) composition model [48]. Since convergent composition biases are induced at synonymous codon positions, analyses based on amino acids should be preferred. Such analyses using better-fitting models on amino acid alignments showed the early divergence of Glaucophyta, whereas the ancient (and probably artificial) separation of the Viridiplantae was recovered using standard homogeneous models at the amino acid level as well as standard and advanced models at the nucleotide level [48].
Another reason for the inconsistency in inferring relationships within Archaeplastida could be ancient gene paralogy [163]. Comparisons of the reduced genome of Paulinella plastids (acquired independently from those of the Archaeplastida via a primary cyanobacterial endosymbiosis) with free-living strains of the picocyanobacterium Prochlorococcus showed differential gene loss and concerted evolution among paralogous gene families. If similar events occurred after the primary endosymbiosis, they could mislead phylogenetic analyses based on plastid genes (Fig. 3). In fact, phylogenetic analyses of serially duplicated genes for photosystem II (psbA, psbB, psbC and psbD) indicated the basal emergence of Viridiplantae and a strong sisterhood of the Rhodophyta and Glaucophyta. In contrast, trees inferred from single-copy orthologs and anciently duplicated single-copy genes (atpA, atpB, psaA and psaB) favored the early branching of glaucophytes, with significant support for a Rhodophyta-Viridiplantae clade. It should be noted that the tree built using the concatenated alignment of all these genes supported the Viridiplantae-first scenario, which suggests that phylogenetic signal from unrecognized paralogs dominated this data set [163].
A similar effect could also be caused by horizontal gene transfer into plastid genomes, which cannot be excluded. Although such transfers are less numerous than those into mitochondrial genomes [117], at least seven have been reported so far. The best known is the ribosomal protein rpl36 gene, which was transferred from a proteobacterium or a planctomycete bacterium to plastid genomes of cryptophytes and haptophytes [180]. Other examples are genes encoding the large and small subunits of RuBisCO form I, acquired by the primary plastid genome of the common ancestor of red algae from a proteobacterium [152] (for additional cases, see [181,182] and references therein).
The difficulties in resolving relationships among the Archaeplastida lineages through phylogenetic methods suggest that early Archaeplastida members, just after the origin of primary plastids, underwent a rapid radiation involving many molecular evolutionary processes, such as diversification (often an increase) in substitution rate, and replacements, transfers, duplications and losses of genes. It seems that the much smaller subset of plastid genes is in general more influenced by these phenomena than nuclear genes, which quite unanimously indicate the earlier divergence of Glaucophyta from the Rhodophyta and Viridiplantae lineages.

Conclusions
Although it is commonly accepted that primary plastids of glaucophytes, red algae and green plants are of cyanobacterial origin, the monophyly of these groups is not unquestionably supported by all phylogenetic studies. Phylogenies based on plastid genes clearly recover monophyly of the three Archaeplastida plastid lineages; however, trees based on nuclear markers, rRNA and protein-coding genes vary substantially and have yet to resolve clear evolutionary positions of Archaeplastida host lineages. When non-monophyly is recovered, the primary plastid-containing eukaryotes often cluster with cryptophytes, haptophytes and related taxa. The new results based on mitochondrial genes suggest the monophyly of glaucophytes, red algae and green plants but do not exclude the other possibilities. The discrepancies among the tree topologies obtained could result from many methodological limitations, for example stochastic and systematic errors involving horizontal/endosymbiotic gene transfers and hidden paralogy. However, even if these problems are taken into account, it is difficult to definitively exclude alternative scenarios for the origin of primary plastids. Nevertheless, trees based on nuclear genes that support the monophyly of Archaeplastida generally point to an earlier separation of glaucophytes, in contrast to phylogenies based on mitochondrial and plastid genes. The smaller data sets in the latter two cases appear more disturbed by complicated evolutionary phenomena. To conclusively solve the problem of the monophyly and the branching order of Archaeplastida plastids, more sequence data are required, especially those including glaucophytes and other poorly studied eukaryotic lineages. Given the problems in phylogenetic reconstruction, other methods are desirable. EGT/HGT events can be identified by diverse composition-based methods. Other approaches alternative to phylogenetic analyses can include phylogenetic profiles, genomic contexts of studied genes, and presence or absence of genes in genomes. Analyses of rare mutation events called molecular signatures (e.g. conserved indels and gene fusions or splits) can also be helpful.
Fig. 3 Influence of differential loss of duplicated genes on the inference of phylogenetic relationships among Archaeplastida lineages. An ancient gene was duplicated in a cyanobacterial lineage into two paralogous genes, which were subsequently transferred via the primary endosymbiosis to the common ancestor of the three Archaeplastida lineages (a). The genes were then subjected to differential losses. Paralog 1 was retained in rhodophytes and glaucophytes but was lost in viridiplants, which retained paralog 2, which in turn decayed in rhodophytes and glaucophytes. This process misleads phylogenetic inference about the branching order of Archaeplastida lineages because the phylogenetic tree based on these genes indicates the earlier divergence of Viridiplantae (b) but, in fact, Glaucophyta separated as the first lineage (a). The Viridiplantae branch of the tree coalesces with the Rhodophyta-Glaucophyta branch at the duplication event rather than at the lineage separation.