Tracking of intercalary DNA sequences integrated into tandem repeat arrays in rye Secale vavilovii

The structure of repetitive sequences of the JNK block present in the pericentromeric region of chromosome 2RL was studied in Secale vavilovii. Amplification of sequences present between the JNK sequences led to the identification of seven atypical DNA fragments. Two of these fragments showed high similarity to the glutamate 5-kinase gene and the putative alcohol dehydrogenase gene of trypanosomatids of the genus Leishmania; their presence may be explained by horizontal gene transfer (HGT). Other fragments were similar to the mitochondrial gene for ribosomal protein S4 in plants and to the glycoprotein (G) gene of infectious hematopoietic necrosis virus (IHNV). Presumably, they are pseudogenes inserted into the JNK heterochromatin region. Fragments similar to a rye repetitive sequence and to wheat chromosome 3B were also found within this region. There is no known mechanism that would explain how foreign sequences were inserted into the block of tandem repetitive sequences of the JNK family.


Introduction
Understanding the diversity, complexity, and evolution of eukaryotic genomes is impossible without an accurate knowledge of their repetitive sequences [1][2][3]. This especially applies to plants, in which the proportion of repetitive sequences in the genome varies significantly and may even reach 97% of the nuclear DNA [4][5][6]. This translates to significant differences in the size of the genomes of these organisms [7,8]. It is believed that a large number of these sequences ensures the integrity of the genome in higher eukaryotes [9], inter alia, by affecting the expression of genes, the organization of chromatin, and processes such as recombination, DNA replication, or cell division [10]. Studying the distribution, origin, and function of repetitive sequences helps to understand their potential and role in the organization and evolution of eukaryotic genomes [11]. The results of recent studies have provided a lot of new information about the origin and functioning of repetitive sequences in the genome [12][13][14][15][16]. Repetitive DNA is highly polymorphic, which is caused by amplification and rapid evolution. This leads to changes in genomic structure and contributes to enlargement of the genome, thus affecting its complexity [17,18].
Repetitive sequences can occur in tandem or dispersed arrangement. A family of repetitive sequences can be common throughout a genus or specific to a species or a chromosome. Repeats may be present only in certain regions of the chromosome, as in the case of telomeres, or scattered throughout the genome [19]. Repetitive sequences may occur in a dozen or a few thousand copies even within a single genus; e.g., the JNK family of repetitive sequences is repeated 4000 times on chromosome 2R in Secale vavilovii, while it is dispersed in other rye species and repeated about 20 times [20,21]. The JNK sequence was named after the rye (Secale cereale L.) landrace 'JNK', originally collected by Dr. H. Kishikawa in Japan [20].
The JNK sequence forms a heterochromatin block identified, using Giemsa reagent, in 2R chromosomes of a local Japanese cultivar of Secale cereale L. [20] and of inbred lines of Secale vavilovii Grossh. [21]. These are highly methylated, GC-rich sequences composed of tandem repeats of 1200 bp in size [20]. Blocks of the JNK family are flanked by R173 repeats characteristic of rye. Downstream, R173 is adjacent to JNK in an antiparallel orientation, while upstream it is in a parallel orientation [22]. The JNK sequence shows high similarity to a fragment of 5S rDNA and to the Angela retrotransposon. It is probably a set of defective genes deposited in one region on chromosome 2RL [23].
A previous study showed that blocks of repetitive sequences do not have a uniform structure, and foreign sequences are often inserted between them [22]. Examples are the Ty3-gypsy retrotransposons CRR (centromeric retrotransposons of rice) in rice and CRM (centromeric retrotransposons of maize) in maize, which preferentially transpose into the centromeric satellite DNA [24].
The exact mechanism that would explain how foreign sequences are inserted between tandemly arranged repetitive sequences is unknown. Extending knowledge of heterochromatin is therefore invaluable, particularly in light of current research showing that these sequences are rich in regulatory RNA coding sequences, which play an important role in the epigenetic regulation of gene expression [25,26]. This indicates that the study of unknown sequences occurring in repetitive sequence regions is also necessary to understand the mechanisms of genome control and phenotypic expression [27].
The current study aimed at understanding the structure of a block of repetitive JNK sequences. Revealing the structure of repetitive sequence blocks in detail may help in understanding the mechanism by which they accumulate in one place in the genome, an accumulation that is associated with decreased vigor and fertility of inbred S. vavilovii lines carrying the JNK block. To understand the organization and functioning of the genome, its repetitive sequences must also be well understood.

Plant material
We performed studies with two inbred lines of Secale vavilovii Grossh. (109 and 116) obtained by self-pollination of plants whose anthers and caryopses were mosaic colored and whose 2R chromosomes carried an additional heterochromatin band [23,28,29]. Plants were cultivated in a plant house of the West Pomeranian University of Technology in Szczecin, in containers (volume: 10 L) filled with loamy sand with the addition of Azofoska fertilizer at a dose of 2 g/L. Caryopses were sown in the containers in January in a greenhouse in which the temperature was 20°C during the day and 16°C at night. Young plants were exposed to vernalization (1-4°C) in a greenhouse. In March, plants were put outside the greenhouse, and in the first ten days of May they were transferred back to the greenhouse. To ensure self-pollination of plants, cellophane insulators were applied. A drip irrigation system was used during the growing season. Fungicides and insecticides were applied when symptoms appeared. Plants were fertilized twice, at the stages of tillering and booting (NPK fertilizer 20-20-20). Caryopses from the plants were collected and coleoptiles were obtained under laboratory conditions.

DNA isolation
Rye genomic DNA was extracted from etiolated coleoptiles (1 g per sample) as described by Kalinka and Achrem [22]. The purity and quantity of the extracted DNA were evaluated using the NanoDrop 2000c spectrophotometer (Thermo Scientific, Lithuania).

Amplification of sequences located among JNK tandem repeats
The sequences of primers used in the PCR reactions are given in Tab. 1. Various combinations of these primers were tested (Tab. 2). Each primer was designed for inverted PCR. Thus, by using forward and reverse primers designed for the JNK sequence, we were able to amplify any sequence interposed between two JNK repeats. Primers were designed [22] with the FastPCR software v. 6.2.89 [30,31], using the inverted PCR option, for the JNK sequence (GenBank: AB008922.1).

PCR reactions
Polymerase chain reactions were performed in a 50-μL reaction mixture containing 100 ng DNA, 1× Advantage 2 PCR Buffer [with Mg(OAc)2 35 mM], 0.2 mM dNTP (Fermentas, Lithuania), forward and reverse primers (0.2 μM each), and 1× Advantage 2 Polymerase Mix (Clontech Laboratories, USA). The PCR temperature profile was as follows: seven cycles at 94°C for 20 s and 72°C for 3 min, then 32 cycles at 94°C for 25 s and 67°C for 3 min, ending at 67°C for 7 min. PCR was performed using a Peltier Thermal Cycler (Bio-Rad, USA). Positive and negative controls were included in each experiment. The expected length of the PCR product was calculated assuming that two JNK sequences are adjacent to each other and arranged as direct repeats (based on the 1192-bp-long JNK sequence, GenBank accession No. AB008922.1).
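The expected product length described above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' procedure: it assumes outward-facing (inverted) primers on the 1192-bp JNK monomer, and the function names and primer coordinates in the example are hypothetical placeholders, not the positions of the primers listed in Tab. 1.

```python
JNK_MONOMER_LENGTH = 1192  # bp, JNK sequence AB008922.1 as stated in the text


def expected_product_length(fwd_start, rev_end, insert_length=0,
                            monomer_length=JNK_MONOMER_LENGTH):
    """Amplicon length across the junction of two directly repeated monomers.

    fwd_start: 0-based position of the 5' end of the forward primer in the
               first copy (the primer extends toward the end of the monomer).
    rev_end:   number of bases from the start of the second copy up to and
               including the 5' end of the reverse primer (the primer
               extends toward the start of the monomer).
    insert_length: length of any sequence lying between the two copies.
    """
    if not 0 <= fwd_start < monomer_length:
        raise ValueError("fwd_start must lie within the monomer")
    if not 0 < rev_end <= monomer_length:
        raise ValueError("rev_end must lie within the monomer")
    if insert_length < 0:
        raise ValueError("insert_length must not be negative")
    tail_of_first_copy = monomer_length - fwd_start
    head_of_second_copy = rev_end
    return tail_of_first_copy + insert_length + head_of_second_copy


def inferred_insert_length(observed_length, fwd_start, rev_end,
                           monomer_length=JNK_MONOMER_LENGTH):
    """Difference between an observed band and the contiguous-repeat product."""
    return observed_length - expected_product_length(
        fwd_start, rev_end, monomer_length=monomer_length)


# Hypothetical primer coordinates, chosen only to show the arithmetic.
contiguous = expected_product_length(fwd_start=900, rev_end=300)
with_insert = expected_product_length(fwd_start=900, rev_end=300,
                                      insert_length=551)
print(contiguous, with_insert)
```

Under these assumed coordinates, a band longer than the contiguous-repeat product indicates additional sequence between two JNK copies, and `inferred_insert_length` gives a rough estimate of its size.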

Gel electrophoresis
Amplification products were separated in 1.5% agarose gel in 1× TBE buffer at 80 V for approximately 3 h. Midori Green DNA Stain (Nippon Genetics, Germany) was added to the gel to the final concentration suggested by the manufacturer. The gel images were captured with the Gel Doc XR system (Bio-Rad). The size of the products was determined by comparison with MASSRULER DNA Ladder Mix (Thermo Scientific). Bands were scored and analyzed with Quantity One software (Bio-Rad).

Sequencing
Distinct bands of lengths different from those expected (contiguous JNK sequences; Tab. 2) were excised from the gel and purified using the PrepEase Gel Extraction Kit (Affymetrix, USA) according to the manufacturer's instructions. Sequencing was performed at Genomed S.A. (Poland). The sequences were analyzed using the BLASTN 2.2.26+ software [29].

Results
In addition to the products of expected size, i.e., the adjacent JNK sequences, the PCR reaction amplified sequences of different sizes (Fig. 10). Seven atypical fragments located between the sequences of the JNK family were identified.
One of the fragments, 551 bp long (accession No. KY853403), showed very high identity (93%) to the DNA sequence of chromosome 26 of Leishmania braziliensis (Fig. 1, Tab. 3) and Leishmania panamensis. Within this sequence, the region between 157 and 551 nt was 96% identical to the mRNA sequence of the putative glutamate 5-kinase gene of Leishmania braziliensis (Fig. 2, Tab. 3). The entire mRNA sequence of this gene in Leishmania is 795 nt long, and the identified fragment overlapped with its initial 394 nt (71% overlap).
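The overlap figure can be checked with simple arithmetic. The sketch below is an illustration only: it assumes that the reported 71% refers to the share of the 551-bp fragment covered by the alignment (the text does not state the denominator), and it also computes the corresponding share of the 795-nt mRNA for comparison. The function name is hypothetical.

```python
def percent_covered(aligned_length, total_length):
    """Return the percentage of a sequence covered by an aligned region."""
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    if not 0 <= aligned_length <= total_length:
        raise ValueError("aligned_length must lie between 0 and total_length")
    return 100.0 * aligned_length / total_length


aligned = 394          # nt of overlap reported in the text
fragment_length = 551  # bp, fragment KY853403
mrna_length = 795      # nt, glutamate 5-kinase mRNA length given in the text

share_of_fragment = percent_covered(aligned, fragment_length)
share_of_mrna = percent_covered(aligned, mrna_length)

print(f"{share_of_fragment:.1f}% of the fragment is covered")
print(f"{share_of_mrna:.1f}% of the mRNA is covered")
```

Under this assumption, the alignment covers roughly 71.5% of the fragment and about half of the mRNA, so the reported 71% is consistent with coverage of the fragment, not of the gene.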
A sequence of 827 bp (accession No. KY853404) located between the JNK sequences showed the greatest similarity to the putative alcohol dehydrogenase mRNA of Leishmania mexicana (Fig. 3, Tab. 3). The gene encoding this protein is located on chromosome 29, while in other Leishmania species (Tab. 3) it is located on chromosome 30. The entire mRNA sequence of this gene in Leishmania has a length of 1200 nt, whereas the overlap of the detected 827-bp fragment, between 12 and 862 nt of the dehydrogenase sequence, reaches 100% in most species (Tab. 3).
Among the JNK sequences, a sequence was located with very high similarity to the mRNA of the mitochondrial gene encoding ribosomal protein S4 in many plant species, in particular of the family Brassicaceae. The highest identity was found for the Arabidopsis thaliana gene (96%). The mRNA of this gene has a size of 1089 nt (Tab. 3), and the fragment identified in Secale had a length of 376 bp (accession No. KY853405) (Fig. 4); thus, it was not the full sequence. The gene encoding this protein has a rather conserved structure, as the analyzed fragment also showed identity to the homologous sequences in other plant species, including Brassica napus (94% identity), Medicago truncatula (92% identity), Fragaria iinumae (91% identity), and Beta vulgaris (88% identity).
The analysis also showed the presence of a very unusual fragment (423 bp; accession No. KY853406) with some similarity to the glycoprotein (G) gene of infectious hematopoietic necrosis virus (IHNV) of salmonids. The identity ranged from 84% to 87% in relation to the sequences present in GenBank under different accession numbers. The highest identity (87%) concerned the glycoprotein gene of the Cro/05 virus strain (Fig. 5). The entire coding sequence of this gene is 1527 nt long; the identity applied to 366 of 423 nt (87%, E value = 1e−125), with 6 gaps (1%).
Among the identified sequences, there was also a large fragment of a previously identified repetitive rye sequence (Fig. 6). The identified fragment was slightly shorter (770 bp; accession No. KY853407) than the sequence in GenBank (984 bp), but the identity was high (92%; Tab. 3). A similar sequence is located in the wheat genome on chromosome 3B (Tab. 3). Interestingly, cDNA sequence data also exist for this wheat region (Tab. 3). Part of this sequence was also identified in rye as a transposase pseudogene of the Revolver-1 element (Fig. 7, Tab. 3). The analyzed fragment is similar to the terminal portion of this pseudogene.

Tab. 3 Sequences showing significant sequence homology to the identified sequence.

Two of the identified fragments (772 bp, accession No. KY853408, and 808 bp, accession No. KY853409) showed similarity to different DNA regions of wheat chromosome 3B (Fig. 8, Fig. 9). In both cases, many fragments with different overlap were found, and the maximum identity was 81-84% at 100% overlap of the sequence. The sequence similarity was significant (E values <1e−50). Based on recent studies, it can be concluded that the 808-bp-long fragment is part of a repetitive motif (1759 bp long) also found in Secale. The identity of the 808-bp sequence to the newly described motif (KY327931.1) in Secale cereale amounted to 87% (E value = 0).

Discussion
A thorough analysis of the JNK sequence block in the S. vavilovii genome showed that it is flanked by R173-2 or R173-3 sequences characteristic of rye. In addition, it was found that the JNK sequences did not form a homogeneous arrangement [22]. The research has shown that between the tandemly arranged sequences there are other sequences, often showing very high similarity to gene sequences characteristic of other organisms. Amplification of the sequences lying between the JNK sequences showed that two atypical fragments present between the sequences of the JNK family are similar to genes (alcohol dehydrogenase and glutamate 5-kinase) occurring in trypanosomes of the genus Leishmania. This may seem very strange, but it should be kept in mind that the family Trypanosomatidae comprises protozoan parasites that infect not only animals but also various plants [32,33]. Various insects serve as vectors for plant Trypanosomatidae [34][35][36][37]. In 2014, Porcel et al. [38] described the genomic sequences of two plant trypanosomes. A comparison of the genomes of these two parasites with those of other trypanosomes showed that they share a common, simplified genome organization. The genomes of Trypanosomatidae are organized into large polycistronic gene clusters [39][40][41]. Characteristically, almost all protein-coding genes lack introns. Introns were found only in the poly(A) polymerase and ATP-dependent RNA helicase (DEAD/H) genes in T. brucei, T. cruzi, and Leishmania spp. However, in Phytomonas even these genes do not have introns [38]. This is consistent with the results obtained in this study, since the identified sequences were similar to both DNA and mRNA of alcohol dehydrogenase and glutamate 5-kinase. It can be hypothesized that there are, or were, trypanosomes of the genus Phytomonas that parasitize rye, and that horizontal gene transfer (HGT) of some sequences from the genome of these protozoa into the plant genome has occurred. Phytomonas parasitize plants of the family Poaceae [42], and although, according to current data, no trypanosomes infecting rye have been detected, they have been discovered in maize [42,44], where they reproduce in kernels. Because trypanosome genes (with some exceptions) do not contain introns and the sequences that we found in the JNK sequence block are partial, it is difficult to say whether a copy of DNA or of cDNA was inserted.
The identification of a fragment similar to the G glycoprotein gene of infectious hematopoietic necrosis virus (IHNV) of salmonids seems as unusual as finding sequences similar to Leishmania genes. However, a similar hypothesis can also be postulated in this case. IHNV is a member of the genus Novirhabdovirus and belongs to the family Rhabdoviridae [45,46]. The family Rhabdoviridae currently includes six genera: Vesiculovirus, Lyssavirus, and Ephemerovirus, which were isolated from various animal species [47]; Novirhabdovirus, infecting fish; and Cytorhabdovirus and Nucleorhabdovirus, infecting arthropods and plants. The genomes of all rhabdoviruses consist of single-stranded negative-sense (−) RNA, which encodes five structural proteins: nucleoprotein (N), phosphoprotein (P), matrix protein (M), glycoprotein (G), and polymerase (L) [48]. The division of each genus into species is supported by comparisons of nucleotide and amino acid sequences of one or more genes [49,50]; however, the sequence of the entire genome is available only for a few species. This might also explain why the sequence identified in rye is similar to a gene found in a virus occurring in fish. The similarity of rhabdoviruses infecting plants and fish should be established. Bourhy et al. [51] conducted a phylogenetic analysis of 38 rhabdoviruses of all genera based on the sequence of the polymerase gene (L). This study found that the viruses classified in the genera Novirhabdovirus, Cytorhabdovirus, and Nucleorhabdovirus were closely related and formed one clade in the dendrogram, clearly differing from the other three genera, which formed the second clade. This suggests that the fragment identified in the rye genome, showing a strong similarity to the glycoprotein (G) gene of IHNV, would probably also be very similar to the corresponding gene of plant viruses, but such data are not available in GenBank.
A sequence has also been identified within the JNK sequence block showing very high similarity to the sequence of the mitochondrial ribosomal protein S4 of Arabidopsis thaliana, which is involved in translation. In this case, it is obvious that protein-coding sequences may be similar among plants. Some mitochondrial genes exist in multiple copies in plant mitochondrial genomes due to rearrangement of the genome or gene duplication. Examples are the Petunia gene atp9 [52] and the cox2 gene in Petunia and sugar beet [53,54]. The latter has been transferred from the mitochondrial genome into the nuclear genome in plants [55]. It is believed that after the transfer into the nuclear genome, these genes are inactivated, which then often leads to the loss of their sequence. There is a transition state in which both copies, located in the nucleus and in the mitochondrion, coexist and are fully functional. After some time, the redundant copy is inactivated or eliminated from the genome. Remnants of ribosomal protein pseudogenes (or their total absence) in mtDNA correlate with functional copies recently transferred to the nucleus [56]. An example of a gene undergoing the transition state is the ribosomal protein L5 gene in wheat [57]. The copy transferred to the nuclear genome can take over the function, and then the mtDNA copy degenerates. Alternatively, if the nuclear copy loses its function during the transition state, the copy located in mtDNA retains the function. A good example of both possibilities is the gene encoding the S19 protein. In most cereals, this gene is present only in nuclear DNA, but in rice it is located in the mitochondrial genome [58]. Here, we report for the first time the presence of a ribosomal protein S4 pseudogene in nuclear DNA. The sequence shows similarity to both DNA and mRNA because rps4 does not contain introns [56]. As indicated above, it could be the remnant of a functional copy transferred from the mtDNA. This is quite surprising, as it is believed that the rps4 protein is encoded only by a gene located exclusively in the mitochondrial genome [56]. There are no literature data on the presence of this gene's sequence in the nuclear genome. Successful transfers between mitochondrial and nuclear genomes are rare; examples are rps14 and rps19 [59]. The most common cause of ineffective transfer is the transfer of an incomplete sequence. This may well explain the presence of defective sequences in the nuclear DNA [60].
Sequences similar to mitochondrial sequences located in the nuclear genome are called NUMTs (nuclear mitochondrial pseudogenes) and have been localized in the genomes of eukaryotes [61][62][63]. Some of these sequences are highly conserved, while others contain deletions or duplications. They vary in size, are randomly distributed in different chromosomes, and may be arranged in tandem. In many plant genomes, large NUMT insertions have been identified [64], as well as short sequences, which are sometimes organized in clusters [65,66]. Although NUMT genes occasionally become functional in the nucleus [67], most insertions of organelle DNA in the nucleus are mutated and partially deleted [66]. Studies in many plant species have shown that NUMTs are located preferentially in the centromeres in species with a small genome (Arabidopsis and rice) [68] or on B chromosomes in rye [69], while in species with a larger genome they are more fragmented and located along the entire chromosomes, as in the additional heterochromatin in the long arm of chromosome 2R in S. vavilovii. Perhaps the sequence in S. vavilovii showing high similarity to the mRNA sequence of the mitochondrial ribosomal protein S4, similar to the genes of plant species such as Arabidopsis thaliana, Brassica nigra, B. napus, Medicago truncatula, Fragaria iinumae, or Beta vulgaris, is a NUMT gene that has mutated and was inserted into the heterochromatin region. Although NUMT inserts have been considered to have little function [70,71], a study in humans has shown that, through their mutagenicity, they may contribute to damage to the integrity of the functional genome [72].
The wide variety of sequence types inserted among the tandem repetitive JNK sequences is astonishing. In such cases, the possibility of contamination should always be considered. There are some reports showing the sources of contamination. For example, there is a high risk of contamination if the DNA is isolated from roots. It is very difficult to remove the rhizosphere from roots, and the rhizosphere contains many bacteria, protozoa, and nematodes [73]. Being aware of the possibility of error, we isolated DNA from coleoptiles grown under sterile conditions, and good laboratory practice was followed throughout the procedure. However, this might still not be enough to avoid errors caused by contamination. Also, when sequences of nuclear DNA are studied by PCR, there is a risk of amplifying sequences from cpDNA or mtDNA. The primers were designed for repetitive elements; thus, the possibility of unspecific annealing was high. We reduced it significantly by designing long primers with a high melting temperature and, as a control, nested PCR was performed. Moreover, we analyzed only those PCR products that were flanked by JNK sequence fragments. In this way, we were confident that we had found sequences inserted between two JNK copies in the genome of S. vavilovii. In such analyses, it is particularly important to avoid errors [74] and to avoid making false assumptions about horizontal gene flow [75].
It is easier to explain the presence of rye or wheat sequences within the repeated JNK blocks. Research has shown that between tandemly arranged sequences there are other sequences, often showing very high similarity to gene sequences characteristic of other organisms. This is confirmed by numerous studies, and it is believed that mobile genetic elements are most frequently observed within repetitive sequences. Examples are the heterochromatin knobs in the corn genome, composed of 180-bp tandem repeats that are separated by retrotransposon sequences [76], and the location of the Athila retrotransposon at 180-bp repeats in pericentromeric regions of all Arabidopsis chromosomes [77]. The presence of inserts between repetitive sequences has also been found in rye chromosomes. Fragments located between the pSc200 and pSc250 monomers were tested [78]; the obtained fragments ranged in length from 35 to 250 bp and showed similarity (80-90%) to various repetitive elements described earlier, such as barley BAGY-1 [79] or cereba retrotransposons [80], or sequences associated with telomeres in wheat [78,81].
A fragment similar to the terminal portion of the transposase pseudogene of the Revolver-1 element has been identified among the sequences occurring within the JNK blocks. The Revolver sequence is present in high copy number in some species. Twenty thousand copies of this sequence were found in Dasypyrum villosum and Secale sp. as well as in the diploid species Triticum monococcum and in tetraploid wheat species [27]. However, in hexaploid wheat, the number of copies of the Revolver element is very low [82]. Another study has shown that the Revolver family of mobile elements in rye is highly dispersed and that clones of this element are chromosome-specific. It has been found that these elements possess transcriptional activity in rye. There is vast variation in the copy number, length, and level of activity of these elements among different cereal genomes, clearly indicating that they are in constant evolution [27]. Many mobile genetic elements are present in high copy number in eukaryotic genomes, but the vast majority of them are inactive, and only a small part retains the ability to transpose [17,83]. During evolution, the majority of inserted elements become truncated and non-functional.
Many plants show a strong tendency to enlarge the size of the genome. Most often this is accomplished by amplification, rearrangement, differentiation, and insertion of repetitive sequences. A good example is the large subtelomeric heterochromatin blocks in Secale, which consist of rye-specific repetitive sequences [84]. During the evolution of tandemly repeated sequences, insertion of different types of sequences may occur. The evolution of heterochromatin blocks is coupled with an enrichment in foreign sequences that were inserted by transposition or RNA-dependent DNA transfer [85,86]. Viral sequences are thought to integrate rarely into plant genomes; integration can be achieved either by the viral integrase or by illegitimate recombination [87]. The newly inserted sequence starts to evolve together with the tandem repeat and may become part of a new unit, which will then be amplified [88]. Foreign sequences can spread throughout the genome by unequal recombination or gene conversion [85,89]. Insertion of a coding sequence into tandem repetitive sequences usually does not favor its expression, as these regions are usually epigenetically silenced. This may protect the genome from potential instability and genetic redundancy.
In this work, we found different types of sequences inserted among tandem repetitive motifs. It is not clear whether this is the original site of their insertion or whether they appeared there as a result of, e.g., irregular recombination. Because unequal crossing-over is assumed to be the primary evolutionary force acting on satellite sequences [9], it is highly possible that the JNK sequence block is a secondary location of the atypical insertions. This raises a new question, especially in the case of pseudogenes: whether they were initially functional and whether other copies exist in the rye genome. According to literature data, most transfers are nonfunctional and evolve neutrally [25][26][27]. They can be characterized as dead-on-arrival horizontal gene transfer (doaHGT). Further studies are needed to answer these questions.

Conclusions
The study helped to better understand the structure of repetitive sequences of the JNK block. The work demonstrated a remarkable complexity of this heterochromatin region, even though it did not reveal the mechanism by which foreign sequences were inserted within the blocks of repetitive sequences of the JNK family. It can be hypothesized that they are fragments of different genes or pseudogenes, woven into heterochromatin sequences that silence them, as it is known that active pseudogenes can regulate the activity of genes through ncRNA. In addition, it can be assumed that the R173 sequence, which flanks the JNK block sequences, acts as a barrier that does not allow the heterochromatin state to spread to adjacent regions [22].

Fig. 1 Comparative alignment of the identified sequence (query) to the sequence of the 26th chromosome of Leishmania braziliensis (GenBank accession No. FR799001.1).
Amplification products were separated in 1.5% agarose gel in 1× TBE buffer at 80 V for approximately 3 h.Midori Green DNA Stain (Nippon Genetics,