Unique genome evolution in an intracellular N2-fixing symbiont of a rhopalodiacean diatom

Cyanobacteria, the major photosynthetic prokaryotic lineage, are also major nitrogen fixers in nature. N2-fixing cyanobacteria are frequently found in symbioses with various types of eukaryotes and supply fixed nitrogen compounds to their eukaryotic hosts, which inherently lack N2-fixing abilities. Diatom species belonging to the family Rhopalodiaceae also possess cyanobacterial symbionts called spheroid bodies. Unlike other cyanobacterial N2-fixing symbionts, the spheroid bodies reside in the cytoplasm of the diatoms and are inseparable from their hosts. Recently, the first spheroid body genome from a rhopalodiacean diatom was completely sequenced. Overall features of the genome sequence showed significant reductive genome evolution resulting in a diminution of metabolic capacity. Notably, despite its cyanobacterial origin, the spheroid body was shown to be truly incapable of photosynthesis, implying that the symbiont energetically depends on the host diatom. A comparative genome analysis between the spheroid body and another N2-fixing symbiotic cyanobacterial group corresponding to the UCYN-A phylotypes (both derived from cyanobacteria closely related to the genus Cyanothece) revealed that the two symbionts are on similar, but explicitly distinct, tracks of reductive evolution. Intimate symbiotic relationships linked by nitrogen fixation, as seen in rhopalodiacean diatoms, may help us better understand the evolution and mechanisms of bacterium-eukaryote endosymbioses.


Spheroid bodies in rhopalodiacean diatoms
Nitrogen is one of the most important and fundamental elements for all living cells. However, only prokaryotic species are able to fix and utilize the dinitrogen that is abundant in the atmosphere [1]. As no eukaryotic cell possesses an N2-fixing capacity, multiple eukaryotic lineages separately developed symbiotic relationships with N2-fixing prokaryotes to secure a nitrogen source (e.g., the rhizobia-legume symbiosis [1]).
Cyanobacteria, one of the major contributors to aquatic primary production, include many species that are able both to fix molecular nitrogen and to perform photosynthesis. These N2-fixing cyanobacteria frequently form symbiotic relationships with diverse eukaryotes (as seen in the water fern Azolla, hornworts, cycads, and Gunnera [1,2]). In particular, cyanobacterial symbioses have often been documented in phylogenetically diverse diatoms [3,4], suggesting that separate diatom lineages established symbiotic partnerships with cyanobacteria.
Rhopalodiacean diatoms, a taxonomically small group of pennate diatoms, are one of those symbiont-possessing lineages [5,6]. The family Rhopalodiaceae comprises only three genera, Rhopalodia, Epithemia and Protokeelia [7]. With the exception of marine Protokeelia species, rhopalodiacean diatoms are widely found in freshwater habitats. The cyanobacterial symbionts, so-called "spheroid bodies", have been found in species of Rhopalodia and Epithemia, and previous observations of the R. gibba spheroid body (4-6 µm in width, 5-7 µm in length) showed that the symbiont resides in the host cytoplasm and has two envelope membranes with putatively distinct origins: the outer one was derived from the eukaryotic host cell, while the inner one is the plasma membrane of the symbiont [8-10]. It has been shown that the number of spheroid bodies per diatom cell can vary depending on the availability of nitrogen compounds in culture media [11]. Phylogenetic analyses clearly showed that the spheroid bodies in rhopalodiacean diatoms were derived from a cyanobacterium closely related to an N2-fixing genus, Cyanothece [5,9]. The host-symbiont association in rhopalodiacean diatoms is seemingly more intimate than those in other diatoms: (i) the spheroid bodies reside inside the host plasma membrane, while some other cyanobacterial symbionts are extracellularly attached to the diatom hosts or found in the periplasmic space between the silicified cell wall and the plasma membrane [8,12]; (ii) the spheroid bodies are believed to be inseparable from the diatoms, since these "cyanobacteria" have never been successfully cultivated independently of the hosts [9]; (iii) the most distinctive characteristic separating the spheroid bodies from other cyanobacterial symbionts is the lack of chlorophyll autofluorescence [13], implying that the spheroid bodies have little or no photosynthetic activity.
Investigations of rhopalodiacean diatoms bearing unique cyanobacterial symbionts provide new insights into eukaryote-prokaryote symbioses linked by nitrogen fixation. However, only a handful of molecular studies have been done on the spheroid bodies and their host diatoms to date, and the biological, evolutionary, and environmental backgrounds that facilitated this unique symbiosis remain uncertain. Recently, our research group determined the first whole genome sequence of a spheroid body in the rhopalodiacean diatom Epithemia turgida [14]. The detailed metabolic functions deduced from the spheroid body genome indicated that the cyanobacterial symbiont has reduced its metabolic capacity, including photosynthesis, suggesting that the symbiont has abandoned a photoautotrophic lifestyle and energetically depends on its host.

The complete spheroid body genome sequence
To our knowledge, there is a single pioneering study on the genome of a spheroid body. Kneip et al. [15] carried out shotgun sequencing of the spheroid body of Rhopalodia gibba, and provided the first clue to the evolutionary status of the cyanobacterial symbionts in rhopalodiacean diatoms. They generated over 140 Kbp of non-contiguous DNA sequence along with a contiguous 51 Kbp fragment. As anticipated from the N2-fixing ability of R. gibba, an almost complete nif gene cluster, which encodes a set of the proteins required for nitrogen fixation, was found in the 51 Kbp genome fragment. In addition, this first genome sequencing effort revealed signatures of genome reduction such as pseudogenizations, losses, truncations, and fusions of genes. However, the partial nature of these genome data impeded comparisons to the genomes of closely related cyanobacteria (free-living or symbiotic), and consequently it remained unclear how the intracellular lifestyle altered the spheroid body genome.
Against this background, we determined a complete genome sequence of the spheroid body in the rhopalodiacean diatom E. turgida [14]. A 16S rDNA phylogenetic tree of the symbionts indicated that the spheroid bodies of the genera Rhopalodia and Epithemia have a single origin, implying that the spheroid bodies diverged along with the speciation of rhopalodiacean diatoms [5]. The genome of the E. turgida spheroid body (EtSB) consists of a single circular chromosome with a size of 2.79 Mbp, which is slightly larger than the size predicted by Kneip et al. [15]. As universally observed in comparisons between obligate bacterial symbionts and their free-living close relatives [16-19], the EtSB genome was found to be greatly reduced compared to the genome of Cyanothece sp. PCC 8801 (4.68 Mbp in size [20]), a free-living close relative of the spheroid bodies. The difference in genome size between EtSB and Cyanothece sp. PCC 8801 is mirrored in the number of protein-coding genes: the EtSB genome possesses 1720 protein-coding genes, which is only 39% of the number of protein-coding genes in the genome of Cyanothece sp. PCC 8801. In addition, the G+C content of the EtSB genome (33.4%) is lower than that of Cyanothece sp. PCC 8801 (39.8%).
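The genome-level comparison above can be reproduced as a small calculation. The following is a minimal sketch that uses only the figures quoted in the text; the function names are illustrative and are not taken from the original analysis, and the gene count derived for Cyanothece sp. PCC 8801 is only what the rounded 39% figure implies, not a published value.

```python
# Figures quoted in the text.
ETSB_GENOME_MBP = 2.79      # EtSB chromosome size
PCC8801_GENOME_MBP = 4.68   # Cyanothece sp. PCC 8801 genome size
ETSB_GENES = 1720           # EtSB protein-coding genes
GENE_FRACTION = 0.39        # EtSB gene count relative to PCC 8801 (rounded in the text)


def size_ratio(symbiont_mbp: float, relative_mbp: float) -> float:
    """Return the symbiont genome size as a fraction of the free-living relative."""
    return symbiont_mbp / relative_mbp


def implied_gene_count(symbiont_genes: int, fraction: float) -> float:
    """Return the relative's gene count implied by a (rounded) fraction."""
    return symbiont_genes / fraction


def gc_content(sequence: str) -> float:
    """Return the G+C fraction of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / acgt


if __name__ == "__main__":
    print(f"size ratio: {size_ratio(ETSB_GENOME_MBP, PCC8801_GENOME_MBP):.3f}")
    print(f"implied PCC 8801 genes: {implied_gene_count(ETSB_GENES, GENE_FRACTION):.0f}")
```

With these inputs the size ratio is about 0.60, whereas the gene count ratio is 0.39, so under this reading the protein-coding gene count has dropped proportionally more than the genome size. The `gc_content` helper shows how a G+C percentage such as 33.4% would be computed from a sequence; no sequence data are included here.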

Gene retentions and losses in the spheroid body genome
The EtSB genome provided the first comprehensive picture of the metabolic activities in the cyanobacterial endosymbiont. Confirming the result from the partial sequence of the R. gibba spheroid body genome [15], the EtSB genome also contains the nif gene cluster, complete with the exception of fdxN and nifU (discussed below). Importantly, incorporation of gaseous nitrogen into chlorophyll a of the host diatom plastids was clearly confirmed by a 15N-isotope tracing analysis [14], indicating that the host diatoms indeed utilize nitrogen fixed by the spheroid bodies. Fig. 1 shows gene status in the EtSB genome compared against a consensus set of protein-coding genes from three free-living cyanobacterial relatives (Cyanothece spp. PCC 8801, PCC 8802 and ATCC 51142). A phylogenomic analysis suggested that the three cyanobacteria have a close evolutionary affinity to the spheroid bodies [14], and consequently the ancestral spheroid body likely possessed genes similar to the consensus gene set for the three species. The hypothesized ancestral gene repertoire is represented by KEGG Orthology IDs (KO IDs [21]) in Fig. 1. In comparison with this gene set (1174 KO IDs in total), the spheroid body was found to possess 69% of the "ancestral" genes (Fig. 1a). Genes in major functional categories related to basic cellular functions (i.e., categories for "translation", "transcription", "nucleotide metabolism", and "replication and repair"; Fig. 1b) mostly remained intact. In addition to these "housekeeping" genes, the EtSB genome retains genes for all amino acid biosynthetic pathways, which are often discarded in obligate bacterial symbionts in various symbiotic systems [17,19,22,23], implying that the spheroid bodies do not require an external amino acid supply.
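The comparison against a hypothesized ancestral gene set can be sketched with plain set operations. This is a minimal illustration only: it assumes that the "consensus" set means KO IDs shared by all three free-living relatives, which the text does not spell out, and the KO labels below are hypothetical placeholders rather than real KEGG identifiers.

```python
def consensus_ko_ids(*genomes: set) -> set:
    """Return KO IDs present in every genome (assumed meaning of 'consensus')."""
    if not genomes:
        raise ValueError("at least one genome is required")
    return set.intersection(*(set(g) for g in genomes))


def retention_fraction(ancestral: set, symbiont: set) -> float:
    """Return the fraction of the ancestral KO set still found in the symbiont."""
    if not ancestral:
        raise ValueError("ancestral gene set is empty")
    return len(ancestral & set(symbiont)) / len(ancestral)


# Toy data with hypothetical KO labels (not real KEGG Orthology IDs).
relative_1 = {"KO_A", "KO_B", "KO_C", "KO_D"}
relative_2 = {"KO_A", "KO_B", "KO_C", "KO_E"}
relative_3 = {"KO_A", "KO_B", "KO_C"}
symbiont = {"KO_A", "KO_C", "KO_X"}

ancestral = consensus_ko_ids(relative_1, relative_2, relative_3)
retained = retention_fraction(ancestral, symbiont)
```

Applied to the figures in the text, a retention of 69% of 1174 KO IDs corresponds to roughly 810 retained KO IDs; that count is derived here from the rounded percentage and is not stated in the source.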
The greatest gene loss was found in the category for "energy metabolism" (Fig. 1b). The reduction of gene numbers in this category is related to the lack of photosynthesis, which had been suspected from previous works [9,15]. Indeed, the EtSB genome was found to possess none of the functional genes for photosystem I/II, phycobilisome (the cyanobacterial light-harvesting complex), or chlorophyll biosynthesis. This situation clearly indicates that EtSB is unable to carry out photosynthesis, and consequently this intracellular symbiont energetically depends on its diatom host. EtSB also lacks a functional RuBisCO, the fundamental enzyme of the Calvin cycle, further supporting this idea. To our knowledge, no cyanobacteria other than the spheroid bodies, whether free-living or symbiotic, have been found to have completely abandoned a photoautotrophic lifestyle.
Another important feature of the EtSB genome is that a number of pseudogenes still remain detectable based on sequence similarity. In total, the EtSB genome was found to retain 225 pseudogenes. Of the genes missing in comparison with the ancestral gene set (Fig. 1a), nearly one third were detected as pseudogenes (9% of the total ancestral gene set). In terms of functional categories, pseudogenes were found to be most abundant in "metabolism of cofactors and vitamins" (Fig. 1b). Within this category, the biosynthetic pathways for chlorophyll a and vitamin B12 are intriguing in terms of pseudogenization. In both pathways, nearly all of the genes were identified as pseudogenes (Fig. 2). These observations imply that not enough evolutionary time has passed since the two pathways were inactivated to eliminate the pseudogenes from the genome. This assumption is consistent with an estimate that the rhopalodiacean diatoms can be traced back to approximately 12 Mya [5,24,25], whereas 40-60 Myr are likely required for pseudogenes to disintegrate completely [26].
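The pseudogene proportions quoted above can be checked against the retention figure given earlier in this section (69% of 1174 ancestral KO IDs retained). The sketch below is only a consistency check on the rounded percentages in the text; the function names are illustrative assumptions.

```python
# Rounded figures quoted in the text.
RETAINED_FRACTION = 0.69    # ancestral KO IDs retained in EtSB
PSEUDOGENE_FRACTION = 0.09  # ancestral KO IDs detected as pseudogenes


def missing_fraction(retained: float) -> float:
    """Return the fraction of ancestral genes without an intact copy."""
    return 1.0 - retained


def pseudogene_share_of_missing(pseudogenes: float, retained: float) -> float:
    """Return the share of missing ancestral genes still detectable as pseudogenes."""
    missing = missing_fraction(retained)
    if missing <= 0:
        raise ValueError("no ancestral genes are missing")
    return pseudogenes / missing


if __name__ == "__main__":
    share = pseudogene_share_of_missing(PSEUDOGENE_FRACTION, RETAINED_FRACTION)
    print(f"pseudogene share of missing genes: {share:.2f}")
```

With the rounded inputs, about 31% of the ancestral set is missing and about 29% of those missing genes are pseudogenes, which agrees with the "nearly one third" stated in the text.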
Another interesting pseudogene in the EtSB genome is the one encoding NifU, a scaffold protein for the assembly of [4Fe-4S] clusters, which are fundamental components of the nitrogenase. Kneip et al. [15] identified the nifU gene in the genome fragment of the R. gibba spheroid body, although the putative protein appeared to be severely truncated at the N-terminus and lacked four out of the five catalytically important cysteine residues (Fig. 3). In the EtSB genome, we detected a sequence homologous to the R. gibba nifU gene, but its coding region was interrupted by a stop codon. Thus, we concluded that the EtSB nifU gene is dysfunctional (Fig. 3). EtSB should possess a different protein for [4Fe-4S] cluster assembly instead of NifU, as nitrogenase activity in this symbiont was confirmed by a nitrogen isotope tracing analysis [14]. We hypothesize that the NifU function in Fe-S cluster synthesis is fulfilled by a small protein with considerable sequence similarity to the NifU C-terminal region (Fig. 3; NifU-like protein).

UCYN-A: another symbiotic N2-fixer related to the spheroid bodies
A unicellular cyanobacterial phylotype called UCYN-A was identified by environmental DNA analyses of ocean water samples [27,28], and was recently recognized as a major nitrogen fixer in the oceans, second to the filamentous cyanobacterium Trichodesmium [29]. The cellular identity of the UCYN-A phylotype has not been fully revealed, as no culture strain has been established. Nevertheless, there are two complete genome sequences corresponding to two closely related UCYN-A phylotypes, UCYN-A1 and UCYN-A2 [23,30]. Interestingly, the UCYN-A genomes (~1.5 Mbp) were found to be much smaller than the EtSB genome (~2.8 Mbp), suggesting a severe diminution of metabolic capacity. According to the genomic information, the UCYN-A cyanobacterium lacks the entire tricarboxylic acid (TCA) cycle and biosynthetic pathways for several amino acids and purine nucleotides, which are partially or completely retained in EtSB.
Based on the genome size and metabolic capacity deduced from the genome data, the UCYN-A cyanobacteria most likely experienced a more severe genome reduction than EtSB. The most prominent difference between the UCYN-A cyanobacteria and EtSB is photosynthetic ability: the former are most likely photosynthetic, as their genomes still retain the complete gene sets for photosystem I and for chlorophyll a biosynthesis. In sharp contrast, EtSB has entirely discarded photosynthetic ability, as described in the previous section. The magnitude of genome reduction and the gene repertoires differ between the two cyanobacterial genomes, as the spheroid bodies of rhopalodiacean diatoms and the UCYN-A cyanobacteria have independent progenitors, which are phylogenetically closely related but explicitly distinct from each other [14].
The incomplete nature of the metabolic capacity deduced from the UCYN-A genomes suggests that the corresponding cyanobacteria depend on external nutrient supplies. Consistent with this prediction, there is some evidence for a symbiotic relationship between the UCYN-A cyanobacteria and a unicellular prymnesiophyte alga [10,31]. Currently, it remains unclear how intimate the host-symbiont relationship is: Thompson et al. [31] suggested that the cyanobacteria reside in the epicellular space of the host alga, while electron microscopic images identified the UCYN-A cyanobacteria as endosymbionts [10]. To reveal the host dependency of the UCYN-A cyanobacteria, transcriptomic and genome data from the host will be indispensable. As the spheroid bodies and UCYN-A cyanobacteria have independently established deep symbiotic associations with photosynthetic algae, comparisons between the two symbiotic systems may provide key insights into bacterium-eukaryote partnerships built on a nitrogen supply in the hydrosphere.

Concluding remarks
The complete spheroid body genome greatly advanced our understanding of the functions and metabolism of the cyanobacterial symbiont. Nevertheless, we currently know little about how the host system controls the division of the symbiont and the distribution of the divided symbionts to the daughter cells, or how it enables the trafficking of metabolites to and from the symbiont. To address these issues, future investigations should focus on the host diatom. We anticipate that the host (diatom) system controlling the spheroid bodies will be useful for understanding the processes that gave birth to two major bacterium-derived organelles, mitochondria and plastids, as well as the early evolution of eukaryotes.