Entire genome sequence analysis of genotype IX Newcastle disease viruses reveals their early-genotype phylogenetic position and recent-genotype genome size

Background A six-nucleotide (nt) insertion in the 5'-noncoding region (NCR) of the nucleoprotein (NP) gene of Newcastle disease virus (NDV) is considered a genetic marker for recent genotypes of NDV, which emerged after 1960. However, F48-like NDVs from China, which carry a 6-nt insert in the NP gene, have previously been classified into either genotype III or genotype IX. Results To clarify their phylogenetic position and to explore the origin of NDVs with the 6-nt insert and its significance in NDV evolution, we determined the entire genome sequences of five F48-like viruses isolated in China between 1946 and 2002 by RT-PCR amplification of overlapping fragments of the full-length genome and rapid amplification of cDNA ends. All five NDV isolates shared the genome size of 15,192 nt with the recent genotype V-VIII viruses, whereas they had the highest homology with early genotype III and IV isolates. Conclusions The unique combination of genome size and phylogenetic position of F48-like viruses warrants placing them in a separate geno-group, genotype IX. The results of this study also suggest that genotype IX viruses most likely originated from a genotype III virus by insertion of a 6-nt motif in the 5'-NCR of the NP gene, which had occurred as early as the 1940s, and that they might be the common origin of genotype V-VIII viruses.

Phylogenetically, NDVs have been classified into two major divisions, class I and class II [8,12]. Class I NDVs, with a genome size of 15,198 nt, are occasionally isolated from wild aquatic birds and domestic poultry, and all but one of them are avirulent [8,13-16]. Class II viruses include most virulent and some avirulent NDVs: genotypes I-IV form the early lineage, which emerged before 1960, with a genome size of 15,186 nt, whereas genotypes V-VIII form the recent lineage, which emerged after 1960, with a genome size of 15,192 nt [4,7,8,17,18]. Genotype I of class II contains mainly avirulent isolates from wild waterfowl and poultry species worldwide; genotype II consists of North American isolates whose virulence ranges from lentogenic through mesogenic to velogenic; genotypes III and IV represent early isolates from the Far East and Europe, respectively, during the first pandemic from the mid-1920s to the late 1950s; NDV strains isolated during the second pandemic in the 1960s and 1970s belong to the new genotypes V and VI; subtype VIb viruses were responsible for the third pandemic, of pigeon origin, during the 1980s; and the novel genotypes VIII and VII (many subgenotypes), which caused the fourth and latest pandemic, have emerged since the late 1980s in the Far East, Europe, and South Africa [8,19-21].
NDV strain F48 (referred to as "F48E8" or "F48E9" in previous publications, where E8 or E9 denotes the 8th or 9th egg passage of the original virus) was isolated from a diseased chicken in Northern China in 1946 and has been used as the standard challenge strain for vaccine evaluation in China [21-23]. The phylogenetic grouping of F48-like viruses is controversial in the literature: some researchers assign them to genotype IX of class II [14-16,21,23,24], while others assign them to genotype III because their F gene shows the highest homology with that of genotype III viruses [8,25,26]. In either case, it is evident that genotype IX is a sister clade of the genotype III isolates, which emerged in the 1930s. On the other hand, F48-like viruses have the 6-nt insert in the 5'-NCR of the NP gene, which is considered a genetic marker of NDV strains that emerged after 1960 [7,8]. However, the full-length genome of F48-like NDVs has not been determined. To clarify the phylogenetic position of F48-like viruses and to explore the origin of NDVs with the 6-nt insert and its significance in NDV evolution, five F48-like viruses isolated in China between 1946 and 2002 were characterized and sequenced.

Analysis of genome size
To determine the exact genome size of the F48-like NDV isolates, the full-length genome sequences were compiled from the sequences of nine overlapping cDNA fragments together with the sequences of the GC-rich region of the NP gene and of both ends of the genome. These sequences were submitted to GenBank under accession numbers FJ436302-FJ436306. Sequencing showed that these F48-like NDVs carried a 6-nt insert in the 5'-NCR of the NP gene, the same as that of the genotype V-VIII NDVs, which emerged after the 1960s (see Figure 1). No other insertion or deletion was found when they were compared with all known NDV isolates. Therefore, the genome size of all five genotype IX isolates was 15,192 nt, as predicted previously. Moreover, the five viruses, isolated during 1948-2002, shared 99% nucleotide sequence identity across their genomes and carried the same 6-nt insert motif, CCCCCC.
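The size arithmetic in this section can be made explicit with a short sketch. The genome sizes and the insert motif are those reported in the text; the constant and function names are illustrative assumptions and were not part of the original analysis.

```python
# Genome sizes (nt) and insert motif as reported in the text.
EARLY_CLASS_II_SIZE = 15_186   # genotypes I-IV
NP_NCR_INSERT = "CCCCCC"       # 6-nt motif in the 5'-NCR of the NP gene


def size_with_insert(base_size: int, insert: str) -> int:
    """Genome size after adding one insert and no other insertion or deletion."""
    return base_size + len(insert)


def size_class(genome_size: int) -> str:
    """Map a genome size to the size class described in the text."""
    classes = {
        15_186: "class II, early-genotype size (I-IV)",
        15_192: "class II, recent-genotype size (V-VIII, IX)",
        15_198: "class I size",
    }
    return classes.get(genome_size, "unknown")


f48_like_size = size_with_insert(EARLY_CLASS_II_SIZE, NP_NCR_INSERT)
print(f48_like_size, "->", size_class(f48_like_size))
```

The lookup table only restates the three sizes named in the text; any other length is reported as unknown rather than guessed.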

Phylogenetic analysis
Phylogenetic analysis of the five F48-like NDV strains together with NDVs representing the established genotypes was first performed using the variable region sequences (nt 47-420) of the F gene (Figure 2). The tree consisted of two major divisions, class I and class II; the latter was further divided into two lineages, early and recent. The early lineage included five genotypes (I to IV and IX), while the recent lineage consisted of four genotypes (V to VIII). The F48-like strains (genotype IX) were close to, but diverged from, the early genotype III and IV strains, forming a separate subclade. Table 1 shows the range of F gene sequence similarity of NDV strains within one genotype and between different genotypes. The F gene sequence similarity between genotypes IX and III was the highest, ranging from 91.2% to 94.3%. For the other genes, the sequence similarity between genotype IX and III NDVs was also higher than that between genotype IX and any other genotype (data not shown).
Indeed, no matter which gene was used, the phylogenetic trees indicated very similar relationships among the genetic groups. Figure 3 shows the phylogenetic tree based on the entire genome sequences of the five F48-like NDVs from this study and those of other NDVs representing genotypes I through VIII that are available from GenBank. The phylogenetic position of the F48-like NDVs in this tree is consistent with that in Figure 2. Genotype IX strains again clustered in the early lineage, closely related to but diverged from the genotype III and IV strains.
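Similarity values such as those in Table 1 are pairwise identities computed on aligned sequences. The following is a minimal sketch of such a calculation, assuming the sequences are already aligned and that columns containing a gap are skipped; the MegAlign program used in this study may apply a different formula, and the toy sequences are hypothetical rather than NDV data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: one mismatch in ten positions.
print(percent_identity("ATGGGCTCCA", "ATGGGTTCCA"))
```

How gaps are counted changes the result, so identities from different tools should be compared only when the convention is known.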

GC content of the genomic sequences
The GC content of the genomic sequence is also an important molecular characteristic of Newcastle disease virus. Table 2 lists the GC content calculated for different regions of 25 NDV strains, including the entire genome, the six complete viral genes, and the 5'-NCR of the NP gene, in which the extra 6 nt were detected. The GC content of the full-length genome was similar among all 25 NDV strains; however, the GC content of the 5'-NCR of the NP gene differed markedly. The 5'-NCR of the NP gene of genotype IV-IX NDV strains had a GC content of more than 60%, while that of genotype I and II strains was about 53% and 55%, respectively. Interestingly, the GC content of genotype III strains in the same region was about 58%, higher than that of genotype I and II NDVs but lower than that of genotype IV-IX NDVs.
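The calculation behind Table 2 can be sketched as follows. This is an illustration only: the study used the EditSeq program, whose exact handling of ambiguous bases is not described here, and the use of the sample standard deviation for the per-genotype summary is an assumption.

```python
from statistics import mean, stdev


def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases (A, C, G, T or U)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)


def summarize(values):
    """Mean and sample standard deviation (average ± SD) for one genotype."""
    values = list(values)
    sd = stdev(values) if len(values) > 1 else 0.0
    return mean(values), sd


# The 6-nt insert motif reported in the text is entirely G/C.
print(gc_content("CCCCCC"))
```

Because the insert motif CCCCCC consists only of C residues, its presence alone raises the GC content of a short region such as the 5'-NCR of the NP gene.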

Molecular characterization of F protein
All the genotype IX strains in this study, as well as other genotype IX isolates whose F gene sequences are available in GenBank, displayed the F protein cleavage site motif 112RRQRR↓F117, the same as that of genotype III-IV strains (Figure 4). This finding is consistent with the biological characteristics of F48, which is used as the standard challenge strain in China (intracerebral pathogenicity index, ICPI, 1.99). Moreover, the F protein of genotype IX NDVs also had six potential N-glycosylation sites, which are highly conserved among NDV isolates. The transmembrane (TM) and cytoplasmic regions of genotype IX NDVs contained several conserved substitutions and a non-conserved N for D substitution at residue 545.
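The cleavage site check can be sketched in a few lines. The rule applied here (at least three arginine or lysine residues at positions 113-116 and phenylalanine at position 117) is the widely used molecular criterion for virulent NDV; it is stated as an assumption of this sketch and is not taken from the present study. The placeholder sequence is hypothetical, and 112GRQGRL117 is included as a typical lentogenic motif for contrast.

```python
BASIC = set("RK")


def cleavage_site(f_protein: str) -> str:
    """Return residues 112-117 (1-based) of an F protein sequence."""
    if len(f_protein) < 117:
        raise ValueError("sequence too short to contain residue 117")
    return f_protein[111:117]


def looks_virulent(site: str) -> bool:
    """True if residues 113-116 hold at least three R/K and residue 117 is F."""
    if len(site) != 6:
        raise ValueError("expected the six residues 112-117")
    basic_113_116 = sum(aa in BASIC for aa in site[1:5])
    return basic_113_116 >= 3 and site[5] == "F"


# Hypothetical placeholder: 111 unknown residues, then the motif from the text.
toy_f = "X" * 111 + "RRQRRF" + "X" * 10
site = cleavage_site(toy_f)
print(site, looks_virulent(site))
print("GRQGRL", looks_virulent("GRQGRL"))
```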

Molecular characterization of HN protein
NDV strains of different genotypes differ in the size of the HN protein, which is the major determinant of virulence. The HN protein of all genotype IX strains is 571 amino acids long, the same size as the HN of genotype III-VIII viruses. The HN proteins of genotype IX strains contained all six potential N-linked glycosylation sites, at positions 119, 341, 433, 481, 508, and 538 [27,28]. In addition, the HN of genotype IX NDVs contained residues E401, R41 and Y526, which are associated with receptor binding, and residues R174, R416 and R498, which are involved in neuraminidase (NA) activity [29-31].
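Potential N-linked glycosylation sites are conventionally predicted by scanning for the sequon N-X-S/T, where X is any residue except proline. A minimal sketch of such a scan is given below; the function names and the toy peptide are hypothetical, and the positions named in the text (119, 341, 433, 481, 508 and 538) would be supplied as the expected list for a real HN sequence.

```python
import re

# Sequon N-X-S/T with X != P; the lookahead lets overlapping sequons be found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(protein: str) -> list:
    """1-based positions of asparagine residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def missing_sites(protein: str, expected) -> list:
    """Expected positions that do not carry a sequon in this sequence."""
    found = set(sequon_positions(protein))
    return sorted(p for p in expected if p not in found)


# Hypothetical toy peptide: sequons at 2 (NAS) and 11 (NGT); NPT at 7 is excluded.
toy = "MNASAANPTQNGT"
print(sequon_positions(toy))
print(missing_sites(toy, [2, 7, 11]))
```

A sequon marks only a potential site; whether it is actually glycosylated cannot be read from the sequence alone.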

Alignment of untranslated regions of the NDV genome
Figure 5 shows the alignment of the leader (A) and trailer (B) sequences of genotype IX NDVs with those of strains of other genotypes. Genotype IX NDVs contained the same gene-start (GS) and gene-end (GE) signals, which are highly conserved among all NDV strains. In addition, the single-nucleotide NP-P intergenic region was G in most genotype I-II NDV strains, whereas it was A in genotype IX NDV strains as well as in genotype III-VIII strains. Several unique nucleotide substitutions were found in the trailer region of genotype IX viruses, for example C15095, C15107, C15125, and C15151 (Figure 5B).

Discussion
The emergence of the genotype V-VIII NDVs remains an enigma in the history of NDV evolution. Where did those viruses come from? Did they evolve directly from genotype I, II or III? The genetic feature that most clearly differentiates those viruses from the early genotypes is the six-nucleotide (nt) insertion in the 5'-noncoding region (NCR) of the nucleoprotein (NP) gene.
In this study, the five viruses isolated from 1948 to 2002 had a genome size of 15,192 nt owing to the same 6-nt insert, CCCCCC, in the NP gene. However, those NDV strains shared high identity with genotype III and clustered in a sister clade of genotype III in the phylogenetic trees (see Figures 2 and 3). It is well known that genotype III is a typical "early" NDV geno-group, while the 6-nt insert in the NP gene is characteristic of "recent" genotypes [8]. In other words, two seemingly contradictory genetic features were both identified in F48-like NDV strains, suggesting a transitional role for those viruses in NDV evolution.
It is reasonable to infer that genotype IX viruses most likely originated from an early genotype III virus by the insertion of a 6-nt motif in the 5'-NCR of the NP gene, and that the recent genotypes V-VIII may have come from genotype IX viruses or evolved directly from genotype III or IV viruses in the same way. Several lines of evidence support this hypothesis.
Firstly, it is noteworthy that the classic genotype III strain Australia-Victoria/32 (AV/32) shared the highest sequence similarity with F48 strains. The nucleotide sequence identity of the F, HN and L genes of AV/32 with F48 strains was 94.3%, 93.7% and 94.4%, respectively. In previous studies, the early lineage viruses AV/32 (genotype III) and Herts/33 (genotype IV) were proposed as possible progenitors of recent virulent strains on the basis of sequence and phylogenetic analyses of the HN, M-F, M, P and L genes, respectively, and of their early dates of isolation [32-36]. The results of this study indicate that F48-like viruses (genotype IX) invariably shared the closest homology with AV/32 and had a genome size of 15,192 nt.
Secondly, F48, which was isolated from an ND outbreak in Northern China in 1946 [22], is the earliest NDV isolate known to have the 6-nt insert in the 5'-NCR of the NP gene. This suggests that NDV strains with a genome size of 15,192 nt emerged as early as the 1940s, rather than in the 1960s when genotype V strains appeared.
From the time they were first isolated, the recent genotype V-VIII strains have displayed wide genetic distances and geographical distributions, which is indicative of a long period of evolution prior to the emergence of the recent viruses. On the other hand, the insertion of a 6-nt motif is a rare event in NDV evolution, in view of the extremely low probability of nucleotide addition or deletion in the genomic RNA of paramyxoviruses [37,38]. Thus, it is most likely that genotype IX NDV was the common origin of genotypes V-VIII.
Thirdly, it is noteworthy that all the recent viruses are virulent and that their HN protein is exclusively 571 amino acids long, suggesting that their common progenitor must have possessed those genetic characteristics. As described above, genotype IX viruses displayed the F protein cleavage site motif 112RRQRR↓F117 and an HN protein of 571 amino acids.
Finally, the GC content of the full-length genome and of partial genomic regions of Newcastle disease virus was compared in this study (Table 2). Most regions of the genome had a similar GC content in all class II NDV strains; however, the 5'-NCR of the NP gene, where the 6-nt insert was found, differed markedly in GC content. All the viruses with a 15,192-nt genome had a GC content of more than 60% in this region, while that of most strains with a 15,186-nt genome was no more than 50%. Interestingly, F48-like viruses showed a high GC content in the 5'-NCR of the NP gene, the same as that of the recent genotypes, suggesting a relationship between genotype IX and genotypes IV-VIII.
Moreover, the significance of the 6-nt insert in the NP gene for NDV has never been explored. One phenomenon was observed in this study: genotype IX viruses were isolated over a period ranging from the 1940s to the 2000s. In comparison, the early genotype III viruses, such as AV/32 from Australia, Miyadera/51 and Sato/30 from Japan, and Mukteswar from India, were prevalent in Australia and Asia during the first pandemic of ND before 1960 but were no longer detected thereafter, with the exception of Mukteswar, which is used as a vaccine virus in some Asian countries.
The nucleotide sequence similarity of NDV strains was calculated with the MegAlign program in the Lasergene package (DNASTAR Inc., Madison, WI 53715, USA). Thirty full-length genome sequences together with more than one hundred complete F gene sequences were included.

Conclusion
The results of this study indicate that F48-like viruses are transitional class II NDVs with an early-genotype phylogenetic position and a recent-genotype genome size of 15,192 nt, which places them in a separate geno-group, genotype IX. Genotype IX viruses most likely originated from an early genotype III virus by the insertion of a 6-nt motif in the 5'-NCR of the NP gene, and the recent genotypes V-VIII may have come from genotype IX viruses or evolved directly from genotype III or IV viruses in the same way. This insertion is an important event in NDV evolution that had occurred as early as the 1940s.

Materials and methods
Viruses
The NDV strains used for entire genome sequence analysis in this study were as follows: F48E8 (the 8th egg-passaged stock of F48) was isolated from an outbreak in chickens in Northern China in 1946 [22]; FJ/1/85/Ch, ZJ/1/86/Ch, and JS/1/97/Ch were isolated from chickens in Eastern China in 1985, 1986 and 1997, respectively; and strain JS/1/02/Du was isolated from a healthy duck in our laboratory in 2002 [21]. All five strains have previously been characterized as virulent NDVs and assigned to genotype IX (for detailed data, see Table 3). They were grown in 10-day-old embryonated specific-pathogen-free (SPF) chicken eggs, and the allantoic fluids were harvested and stored at -70°C until use.

Preparation of viral RNA and RT-PCR
Viral genomic RNA was extracted directly from the allantoic fluid of each isolate using a Trizol RNA extraction kit (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. The cDNA was reverse transcribed from viral RNA with a 6-nt random primer or with the specific primer 5'-ACC AAA CAG AGA ATC-3', which is complementary to the 3' end of the NDV genomic RNA. A set of nine primer pairs specific for genotype III and IX isolates (see Table 4) was then used in PCR to generate successive, overlapping DNA fragments of each full-length genome from 1 μl of cDNA transcript. Reverse transcription and PCR were performed without modification as described elsewhere [21].
The GC content of the full-length genome as well as of the six viral genes was calculated with the EditSeq program in the Lasergene package (DNASTAR Inc., Madison, WI 53715, USA). Data are shown as the average of each genotype ± standard deviation.
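Because the specific reverse transcription primer is complementary to the 3' end of the genomic RNA, the corresponding template sequence can be derived as the reverse complement of the primer. The sketch below illustrates this derivation; the helper name is hypothetical, and only the primer sequence is taken from the text.

```python
# Complement table for DNA; case is preserved.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


# Primer from the text, written 5' to 3' with the spaces removed.
primer = "ACC AAA CAG AGA ATC".replace(" ", "")

# Matching stretch of genomic RNA, written 5' to 3' (its last base is the 3' end).
template_rna = reverse_complement(primer).replace("T", "U")
print(template_rna)  # GAUUCUCUGUUUGGU
```

Writing both strands 5' to 3' makes the antiparallel pairing easy to verify: the first base of the primer pairs with the last base of the template.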

Amplification of the GC-rich 5'-NCR of the NP gene by RT-PCR
To obtain the exact sequence of the 5'-NCR of the NP gene, a 750-bp GC-rich PCR product was amplified with specific primers Pgc Forward ( Madison, WI) using a specific primer 5TLF, 5'-GTC CAT TCT GTG CAG AGA GTT TAG TGA G-3', which was located from nt 14,502 to nt 14,527 in the viral genome (given in the 3'-to-5' direction of the genomic RNA). The reaction mixture was incubated at 42°C for 60 min, and the cDNA was then treated with an equal volume of 0.6 N NaOH for 20 min at 60°C to hydrolyze the mRNA and denature the first-strand cDNA. After purification with a PCR purification kit (Axygen, Union City, CA), the cDNA was ligated to adaptor CL+ by T4 RNA ligase as described for 3' RACE. The resulting adaptor-ligated cDNA was amplified using primer 5TLF and the anti-adaptor primer CL-. A heminested PCR was then conducted using primer CL- and the specific primer 5TSF, 5'-CAA TAC TGG GTC TCA GAG TCA AAA ATC-3', which was located from nt 14,724 to nt 14,751 in the viral genome (given in the 3'-to-5' direction of the genomic RNA), with 1 μl of a 1:100 dilution of the primary PCR product as template.

Cloning and sequencing of the amplified products
The RT-PCR products of the overlapping fragments covering the entire genome, of the GC-rich 5'-NCR of the NP gene, and of the RACE reactions were extracted from agarose gels, ligated into the TA cloning system (Promega, Madison, WI), and then transformed into E. coli strain DH5α. At least four clones of each segment were sequenced in both directions using ABI-3700-based (Applied Biosystems Inc.) fluorescent cycle sequencing technology by Sangon Biotechnology (Shanghai, China), and the correct sequences were then determined.

Sequence analysis
Prediction of amino acid sequences, alignment of sequences and phylogenetic analysis were conducted using the MegAlign program (Windows 32, MegAlign 4.00) in the Lasergene package (DNASTAR Inc., Madison, WI 53715, USA). The sequences of the overlapping DNA fragments were aligned and compiled into complete genomes. Phylogenetic analysis was performed using the Lasergene software package and MEGA version 4 (Tamura, Dudley, Nei, and Kumar 2007). Data and accession numbers of the complete genome sequences of the NDV strains used in this study are presented in Additional file 1, Table S1. Additional F gene sequences used in Figure 2 were taken from EMBL/GenBank; their origins have been described previously [8].
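The compilation of overlapping fragments into a complete genome can be illustrated with a simplified sketch. It assumes that the fragments are given in genome order, on the same strand, and that each overlap matches exactly; the assembly in this study was done with the Lasergene package, which also reconciles differences between clones. The function names and toy fragments are hypothetical.

```python
def merge_pair(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two fragments at the longest exact suffix/prefix overlap."""
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("fragments do not overlap by at least min_overlap bases")


def assemble(fragments, min_overlap: int = 3) -> str:
    """Merge ordered, overlapping fragments into one contiguous sequence."""
    fragments = list(fragments)
    if not fragments:
        raise ValueError("no fragments given")
    genome = fragments[0]
    for fragment in fragments[1:]:
        genome = merge_pair(genome, fragment, min_overlap)
    return genome


# Hypothetical toy fragments with overlaps of 6 and 5 bases.
toy_fragments = ["ACCAAACAGAG", "ACAGAGAATCCG", "ATCCGTAAG"]
contig = assemble(toy_fragments)
print(contig, len(contig))
```

Requiring a minimum overlap guards against joining fragments on a chance match of one or two bases; with real sequence data a much larger threshold would be appropriate.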

Additional material
Additional file 1: Table S1: Background information of NDV strains with complete genome sequences used in this study. The genotyping, accession number and references of those NDV strains are shown.