Viral metagenomics revealed diverse CRESS-DNA virus genomes in faeces of forest musk deer

Background Musk deer produce musk, which has high medicinal value and is closely related to human health. Viruses in forest musk deer threaten the health of both the deer and human beings. Methods Using viral metagenomics, we investigated the virome in 85 faeces samples collected from forest musk deer. Results Eight novel CRESS-DNA viruses were characterized, with whole genomes 2148 nt–3852 nt in length. Phylogenetic analysis indicated that these viral genomes belonged to four different groups of CRESS-DNA viruses: unclassified CRESS-DNA virus, Smacoviridae, pCPa-like virus and pPAPh2-like virus. UJSL001 (MN621482), UJSL003 (MN621469) and UJSL017 (MN621476) fell into the branch of unclassified CRESS-DNA virus (CRESSV1–2); UJSL002 (MN621468), UJSL004 (MN621481) and UJSL007 (MN621470) belonged to the cluster of Smacoviridae; UJSL005 (MN604398) showed a close relationship with the pCPa-like (pCRESS4–8) clusters; and UJSL006 (MN621480) clustered into the branch of pPAPh2-like (pCRESS9) virus. Conclusion The virome in faeces samples of forest musk deer from Chengdu, Sichuan province, China was revealed, further characterizing the diversity of viruses in the intestinal tract of forest musk deer.


Introduction
The forest musk deer is a nationally protected animal in China, distributed mainly in Sichuan province, Guangxi province and other regions [1,2]. Deaths of forest musk deer occur mainly among young animals, and disease is the most important cause of fawn death [3]. The diagnosis and prevention of some known diseases have been studied [4][5][6][7], but research on diseases of unknown etiology is lacking.
In this study, the viral community in the intestinal tract of forest musk deer was analyzed by viral metagenomics. This study reports for the first time CRESS-DNA viruses circulating among forest musk deer.

Samples
In 2016, 85 forest musk deer faeces samples were collected in Chengdu, Sichuan province, China. Samples were collected using disposable materials, transported to the laboratory on dry ice and stored at −80°C. Samples were placed in 1.5 ml tubes containing phosphate buffered saline (PBS). The supernatants of the faecal samples were collected after vigorous vortexing for 5 min and centrifugation for 10 min (15,000 g) [36,37].

Viral metagenomic analysis
A 500 μl aliquot of supernatant was filtered through a 0.45 μm filter (Millipore) to remove eukaryotic and bacterial cell-sized particles. The filtrate, enriched in viral particles, was then treated with nucleases to digest nucleic acid not protected within particles, at 37°C for 90 min [38]. The remaining total nucleic acid, protected from digestion within viral capsids, was then extracted using the QIAamp Viral RNA Mini Kit (Qiagen) according to the manufacturer's protocol [37,39,40]. Eight separate pools of nucleic acids were generated randomly from the 85 faecal specimens: six pools contained ten specimens each, one contained 12 specimens and one contained 13 specimens. These eight viral nucleic acid pools, containing both DNA and RNA viral sequences, were then subjected to RT reactions with SuperScript III reverse transcriptase (Invitrogen) and 100 pmol of a random hexamer primer, followed by a single round of DNA synthesis using Klenow fragment polymerase [37,41]. Eight libraries were constructed using the Nextera XT DNA Sample Preparation Kit (Illumina) and sequenced on the Illumina MiSeq platform with 250-base paired-end reads and dual barcoding for each library.
The data were processed using an in-house analysis pipeline running on a 32-node Linux cluster. Clonal reads were removed, and low-quality sequence tails were trimmed using a Phred quality score of ten as the threshold. Adapters were trimmed using the default parameters of VecScreen, which is NCBI BLASTn with specialized parameters designed for adapter removal [42]. Duplicate reads and reads shorter than 50 bases were removed, followed by de novo assembly [43]. The contigs and singlets were matched against an in-house viral proteome database using BLASTx with an E-value cutoff below 10^-5. BLASTx was used to identify viral sequences by comparison with the annotated viral proteins available in GenBank's viral RefSeq database [44].
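The read-cleaning steps described above (tail trimming at Phred 10, removal of reads shorter than 50 bases, removal of duplicate reads) can be illustrated with a minimal Python sketch. The pipeline used in this study is internal and its implementation is not described, so the function names, the simple trailing-base trimming rule and the exact-match definition of a duplicate are all assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple

MIN_PHRED = 10   # quality threshold quoted in the text
MIN_LENGTH = 50  # reads shorter than this are discarded


def trim_low_quality_tail(seq: str, quals: List[int],
                          min_phred: int = MIN_PHRED) -> Tuple[str, List[int]]:
    """Remove trailing bases whose Phred score is below min_phred.

    Assumption: a plain 3'-end scan; the real pipeline may use another rule.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_phred:
        end -= 1
    return seq[:end], quals[:end]


def filter_reads(reads: Iterable[Tuple[str, List[int]]]) -> List[str]:
    """Trim tails, drop short reads, and collapse exact duplicate sequences."""
    seen = set()
    kept = []
    for seq, quals in reads:
        trimmed, _ = trim_low_quality_tail(seq, quals)
        if len(trimmed) < MIN_LENGTH:
            continue  # too short after trimming
        if trimmed in seen:
            continue  # exact duplicate of a read already kept
        seen.add(trimmed)
        kept.append(trimmed)
    return kept
```

Each read is passed as a (sequence, list of Phred scores) pair; adapter removal and assembly are deliberately left out of this sketch.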

Genome acquisition and PCR screening
Putative open reading frames (ORFs) in the circular genomes were predicted using Geneious version 2019.0.3 [45], and the stem-loops in the circular genomes were located using Mfold [24] (Table 1 and Fig. 1b). When the whole genome sequence of a virus could not be obtained from analysis of the sequence reads, inverse PCR was performed. The whole genomes of UJSL004 and UJSL005 were acquired by screening PCR and inverse PCR. Primers are listed in Additional file 1. The screening PCR conditions were: 95°C for 5 min; 31 cycles of 95°C for 30 s, 50°C (first round) or 57°C (second round) for 30 s and 72°C for 40 s; and a final extension at 72°C for 5 min, giving an expected amplicon of 300 bp-500 bp. The inverse PCR conditions for UJSL004 were: 95°C for 5 min; 35 cycles of 95°C for 30 s, 50°C (first round) or 55°C (second round) for 30 s and 72°C for 1.5 min; and a final extension at 72°C for 5 min, giving an expected amplicon of 1000 bp. The inverse PCR conditions for UJSL005 were: 95°C for 5 min; 35 cycles of 95°C for 30 s, 50°C (first round) or 51°C (second round) for 30 s and 72°C for 1.5 min; and a final extension at 72°C for 5 min, giving an expected amplicon of 1000 bp.
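Because the genomes are circular, an ORF may run across the position at which the assembled sequence was linearized. The following Python sketch shows one simple way to account for this by scanning a doubled copy of the sequence. It is not the algorithm used by Geneious; the function names, the ATG-only start rule, the standard stop codons and the minimum-length parameter are assumptions chosen for illustration.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs_one_strand(seq: str, min_codons: int) -> List[Tuple[int, int]]:
    """Return (start, length_nt) of ATG..stop ORFs on one strand of a circle."""
    n = len(seq)
    doubled = seq + seq  # lets an ORF run across the linearization point
    best = {}            # stop position (mod n) -> longest (start, length_nt)
    for frame in range(3):
        start = None
        for i in range(frame, 2 * n - 2, 3):
            codon = doubled[i:i + 3]
            if start is None:
                if codon == "ATG" and i < n:  # only start within first copy
                    start = i
            elif codon in STOP_CODONS:
                length = i + 3 - start        # includes the stop codon
                if length <= n and length // 3 - 1 >= min_codons:
                    key = i % n
                    if key not in best or length > best[key][1]:
                        best[key] = (start, length)
                start = None
    return sorted(best.values())


def find_orfs_circular(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Scan both strands; minus-strand starts refer to the reverse complement."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start, length in find_orfs_one_strand(s, min_codons):
            results.append((strand, start, length))
    return results
```

ORFs that share a stop codon are collapsed to the longest candidate, so an internal ATG is not reported as a separate ORF. In ambisense genomes, Rep and Cap would be expected on opposite strands in the output.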

Phylogenetic analysis
The Rep protein sequences of the novel viruses were aligned with reference sequences from GenBank using the ClustalW program in MEGA 7.0. Phylogenetic analyses were performed using the full-length Rep proteins of the novel viruses and other genetically close relatives [22,46]. The alignment was saved as a Nexus file, which was used to construct the phylogenetic tree by Bayesian inference in MrBayes 3.2.7, using mixed models and Markov chain Monte Carlo (MCMC) methods. Because tree samples from independent runs can diverge, we used the average standard deviation of split frequencies (ASDSF) in MrBayes to evaluate quantitatively the similarity among these samples. MrBayes allows users to set the cut-off frequency for the splits included in this diagnostic (default value 0.10) [47][48][49]. We used the "sump" and "sumt" commands to obtain more detailed diagnostic information after the run had completed.
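The idea behind the ASDSF diagnostic can be sketched in a few lines of Python. This is an illustrative reimplementation, not MrBayes code: the input format (one dictionary of split frequencies per run), the function name and the use of the sample standard deviation are assumptions, and only the 0.10 cut-off is taken from the text.

```python
from statistics import stdev
from typing import Dict, Hashable, List


def asdsf(runs: List[Dict[Hashable, float]], min_freq: float = 0.10) -> float:
    """Average standard deviation of split frequencies across independent runs.

    A split is included if its frequency reaches min_freq in at least one run.
    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    if len(runs) < 2:
        raise ValueError("at least two independent runs are needed")
    splits = set().union(*(run.keys() for run in runs))
    deviations = []
    for split in splits:
        # A split absent from a run has frequency 0 in that run.
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) < min_freq:
            continue  # too rare in every run to be informative
        deviations.append(stdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0
```

A value close to zero indicates that the independent runs sample the same splits at similar frequencies, which is the sense in which the diagnostic measures similarity among tree samples.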

Results
The eight libraries prepared from the 85 faeces samples generated a total of 6,153,736 unique sequence reads from Illumina MiSeq sequencing runs with 250-base paired-end reads. The Ensemble program was used for de novo assembly of the reads [43], and BLASTx was used to compare the assembled sequences with GenBank's non-redundant protein database. [...] (Table 2).
Three complete CRESS-DNA genomes showed the highest identity to smacoviruses. The genomes were 2665 nt (UJSL002, from library 3), 2866 nt (UJSL004, from library 5, obtained through inverse PCR) and 2526 nt (UJSL007, from library 9) in length. Figure 1a shows the genomic organization of UJSL002, UJSL004 [...] (Table 2).
One complete CRESS-DNA genome showed the highest sequence identity to Circoviridae. The genome of UJSL005 (from library 6) was 3852 nt in length and was acquired through inverse PCR, based on a large contig from library 6, and Sanger sequencing. Figure 1a shows the genomic organization of UJSL005, in which the predicted Rep and Cap are encoded in opposite directions. A BLASTp search in GenBank based on the Rep protein sequence showed that UJSL005 shared the highest sequence identity, 32.63%, with an unclassified member of Circoviridae (NC_026635.1) (Table 2).
A phylogenetic tree was constructed [50][51][52] from an alignment of the Rep amino acid sequences detected here with their best BLASTp matches in GenBank and with those of representative CRESS-DNA genomes from GenBank, including six groups of unclassified CRESS-DNA viruses (CRESSV1-6), two GasCSV-like viruses, bacterial plasmids (pCRESS1-9) and a small group of eukaryotic plasmids (P. pulchra plasmids). The dataset used for the phylogenetic analyses comprised 672 Rep amino acid sequences (Fig. 2) (Additional file 2). UJSL001, UJSL003 and UJSL017 fell into the branch of unclassified CRESS-DNA viruses (CRESSV1-2). UJSL001 and UJSL003 belonged to the CRESSV2 cluster, with UJSL001 closely related to CRESS_AUM21936 and UJSL003 closely related to CRESS_AXH77830 (Fig. 3a) (see Additional file 3), whereas UJSL017 belonged to the CRESSV1 cluster and was closely related to CRESSV1_KJ206566 and CRESSV1_KU043411 (Fig. 3e) (see Additional file 7). UJSL002, UJSL004 and UJSL007 belonged to the cluster of Smacoviridae (Fig. 3b) (see Additional file 4). UJSL005 fell into a branch closely related to the pCPa-like (pCRESS4-8) clusters (Fig. 3c) (see Additional file 5), and UJSL006 fell into the pPAPh2-like (pCRESS9) clusters, showing a close relationship with pCRESS9_KXT29032 (Fig. 3d) (see Additional file 6).
Figure caption (partially recovered): the tree includes six groups of unclassified CRESS-DNA viruses (CRESSV1-6), two GasCSV-like viruses, bacterial plasmids (pCRESS1-9) and a small group of eukaryotic plasmids (P. pulchra plasmids). All clades are shown with curves, and their names are shown beside the corresponding clades. Viruses identified in this study are labelled with red dots, and their names and sequence accession numbers are shown with green arrows and text boxes.

Nucleotide sequence accession numbers
The viral genomes described in detail here were deposited in GenBank under the following accession numbers: MN604398, MN621468-MN621470, MN621480-MN621482 and MN621476.

Discussion
Our report describes viral nucleic acids enriched from forest musk deer faeces and shows that CRESS-DNA virus sequences were present in all libraries and accounted for more reads than any other viruses. This suggests that these viruses may replicate in forest musk deer host cells, although there is currently no direct evidence for this. Based on phylogenetic analysis, four different groups of CRESS-DNA genomes were detected in forest musk deer faeces, belonging to unclassified CRESS-DNA virus, Smacoviridae, pCPa-like virus (pCRESS4-8) and pPAPh2-like virus (pCRESS9). This is the first report of CRESS-DNA viruses in the faeces of forest musk deer, and it contributes to further understanding of the genetic and evolutionary diversity of these viruses.
CRESS-DNA viruses, which have small, circular, replication-associated protein (Rep)-encoding single-stranded (CRESS) DNA genomes, are largely identified on the basis of conserved rolling circle replication proteins [11]. They comprise a large group of highly specific viruses that can infect many types of host [53]. These viruses include Circoviridae [39], which can infect vertebrates, and Geminiviridae [14] and Nanoviridae [54], which can infect plants.
The genomes of Circoviridae range in size from 1.7 to 2.1 kb and contain two major ORFs, which encode the Rep and Cap proteins. According to the International Committee on Taxonomy of Viruses (ICTV), ssDNA viruses have genomes of 1.7-6 kb. The genomes of the eight CRESS-DNA viruses identified in this study range in size from approximately 2.1 kb to 3.9 kb (2148 nt to 3852 nt). Previous studies of diverse circoviruses and cycloviruses found a highly conserved stem-loop structure [31,52,55], because they examined multiple viruses of the same genus. The eight viruses in our study fall into different genera according to the Rep protein phylogenetic analysis, so their stem-loop sequences differ from one another.
In recent years, a large number of CRESS-DNA genomes have been determined from humans and other mammals, birds, insects, plants, fungi and environmental samples, bringing to light a high level of genetic diversity among these viruses [25,26,31,33,52,56]. Although we used metagenomics to identify these viruses in forest musk deer faeces, we cannot rule out that they represent dietary or environmental contaminants [57]. These viruses exploit host polymerases for DNA synthesis and code for proteins that modulate the host's cell cycle favourably for virus multiplication [58]. There are reports that these viruses are associated with disease, but they have not been proven to cause disease directly [59,60]. The effects of these viruses on the health of forest musk deer, and their correlation with disease, need further study.
In conclusion, this study is the first to discover a variety of new CRESS-DNA viruses in the intestinal tract of forest musk deer and to analyze their genomic characteristics, which is of great significance for the study of forest musk deer viruses and of the genetic and evolutionary diversity of CRESS-DNA viruses. The host adaptability and pathogenicity of the new CRESS-DNA viruses also need further study.

Conclusions
The virome in faeces samples of forest musk deer from Chengdu included viruses showing sequence similarity to CRESS-DNA viruses, of which eight divergent CRESS-DNA virus genomes were characterized in detail. The characterization covered genome organization, stem-loop structure and Rep protein phylogenetic analysis. Although CRESS-DNA viruses are prevalent in forest musk deer, their pathogenicity is not yet known. This study increases knowledge of the diversity of viruses in forest musk deer faeces.