Rapid and sensitive virulence prediction and identification of Newcastle disease virus genotypes using third-generation sequencing

Newcastle disease (ND) outbreaks are global challenges to the poultry industry. Effective management requires rapid identification and virulence prediction of the circulating Newcastle disease viruses (NDV), the causative agent of ND. However, these diagnostics are hindered by the genetic diversity and rapid evolution of NDVs. A highly sensitive amplicon sequencing (AmpSeq) workflow for virulence and genotype prediction of NDV samples using a third-generation, real-time DNA sequencing platform is described using both egg-propagated virus and clinical samples. 1D MinION sequencing of barcoded NDV amplicons was performed on 33 egg-grown isolates (23 unique lineages, including 15 different NDV genotypes) and on 15 clinical swab samples from field outbreaks. Assembly-based data analysis was performed in a customized, Galaxy-based AmpSeq workflow. For all egg-grown samples, NDV was detected and virulence and genotype were predicted. For clinical samples, NDV was detected in ten of eleven NDV samples. Six of the clinical samples contained two mixed genotypes; the MinION method detected both genotypes in four of them. Additionally, testing a dilution series of one NDV sample resulted in detection of NDV at a 50% egg infectious dose (EID50) as low as 10^1 EID50/ml. This was accomplished in as little as 7 minutes of sequencing time, with a 98.37% sequence identity compared to the expected consensus. The high sensitivity, fast sequencing capabilities, accuracy of the consensus sequences, and the low cost of multiplexing allowed for identification of NDV of different genotypes circulating worldwide. This general method will likely be applicable to other infectious agents.

Effective control of ND is dependent on rapid, sensitive, and specific diagnostic testing, which for ND is typically oriented towards detection, genotyping, or prediction of virulence. Virulence of NDV is best assayed through infection-based studies (4), but due to the time constraints associated with such methods, reverse transcriptase-quantitative PCR (RT-qPCR) and sequencing of the fusion (F) gene cleavage site are used to predict NDV virulence (5, 6). Genotyping of NDV is commonly achieved through Sanger sequencing of the coding sequence of the fusion gene (7), which also allows for prediction of virulence. Preliminary genotyping can be accomplished through partial fusion gene sequencing (i.e., variable region) (8). PCR-based tests aimed at rapid detection often lack applicability for virulence determination due to NDV's genetic diversity, and the current methods that rely on Sanger sequencing lack multiplexing capability and have limited sequencing depth, which complicates detection of mixed infections. While fusion-based assays can be used for detection (9), the variability of this region, which makes it useful for genotyping, hinders the universal applicability of any single primer set. Thus, detection-focused assays are often designed towards more conserved regions, such as the matrix or polymerase genes (9-11). These assays, however, fail to provide virulence or genotype predictions. In summary, there is a need for a method that will sensitively and rapidly detect numerous genotypes of NDV and provide genotype and virulence prediction.

Rapid advances in nucleic acid sequencing have led to different sequencing platforms (12, 13) being widely applied for identification of novel viruses (14), whole genome sequencing (15), transcriptomics, and metagenomics (16, 17). However, high capital investments and relatively long turnaround times limit the widespread use of these NGS platforms, especially in developing countries (18). Recent improvements in third-generation sequencing, including those introduced [...] sequences (e.g., Zika virus (26) and poxviruses (21)) by sequencing PCR amplicons (AmpSeq). The MinION, therefore, represents an opportunity to take infectious disease diagnostics a step further and to perform rapid identification and genetic characterization of infectious agents at a lower cost.

As with any deep sequencing platform, the sequence analysis approach is integral for accurate interpretation. Primarily, two approaches for taxonomic profiling of microbial sequencing data have been employed: read-based and de novo assembly-based classifications. Read-based metagenomic classification software has been used for identification of microbial species from high-throughput sequencing data (19, 27-29). Although the sequencing accuracy of the MinION is improving, the raw single-read error rate of nearly 10% (30) may limit the accuracy of this approach for Nanopore data (27), especially when attempting subspecies-level differentiation. De novo approaches that use quality-based filtering and clustering of reads (31), or use consensus-based error correction of Nanopore sequencing reads, have been reported (32); however, these are not optimized for amplicon sequencing data.
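To illustrate how virulence can be predicted from the fusion protein cleavage site, the following minimal Python sketch applies the widely used World Organisation for Animal Health (OIE) molecular criterion: at least three basic residues (arginine or lysine) at positions 113-116 together with phenylalanine at position 117. The function name, the input convention (a six-residue peptide spanning positions 112-117), and the two example motifs are assumptions made for illustration; this is not the classification code used in the workflow described here.

```python
# Hypothetical helper: classify an NDV fusion cleavage site by the OIE
# molecular criterion. Input is assumed to be the six amino acids at
# positions 112-117 of the fusion protein, in one-letter code.
BASIC_RESIDUES = {"R", "K"}


def predict_virulence(cleavage_site: str) -> str:
    site = cleavage_site.strip().upper()
    if len(site) != 6:
        raise ValueError("expected six residues covering positions 112-117")
    residues_113_116 = site[1:5]  # positions 113, 114, 115, 116
    residue_117 = site[5]         # first residue of F1
    n_basic = sum(aa in BASIC_RESIDUES for aa in residues_113_116)
    if n_basic >= 3 and residue_117 == "F":
        return "virulent"
    return "low virulence"


# Example motifs (assumed for illustration): a typical virulent site and
# the site commonly reported for LaSota-like low-virulence strains.
virulent_call = predict_virulence("RRQKRF")
lasota_call = predict_virulence("GRQGRL")
```

A sequence-based call of this kind is a prediction only; as noted above, infection-based assays remain the reference for virulence.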

In this study, a specific, sensitive, rapid protocol, using the MinION sequencer, was developed to detect representative isolates from all current (excluding the Madagascar-limited genotype XI) genotypes of NDV. This protocol was also tested on a limited number of clinical swab samples collected from chickens during disease outbreaks. Additionally, a Galaxy-based, de novo AmpSeq workflow is presented that efficiently reduces systematic sequencing errors in Nanopore sequencing data and uses amplicon-based sequences to obtain accurate final consensus sequences.

[...] Table S1 and Table S2, respectively.

Total RNA from each sample was extracted from infectious allantoic fluids or directly from [...]

Approximately 20 ng (in 5 µl) of RNA was reverse transcribed, and cDNA was amplified with target-specific primers using the SuperScript™ III One-Step RT-PCR System (Thermo Fisher [...]

[...] quantitative polymerase chain reaction (RT-qPCR) assay, both methods were run on a dilution series from a single isolate. NDV (LaSota strain) from the SEPRL repository was cultured in 9- to 11-day-old SPF eggs, and the harvested allantoic fluids were diluted to titers ranging from 10^6 to 10^1 EID50/mL in brain-heart infusion broth. RNA was extracted from the dilutions, and DNA libraries were prepared following the same protocols as described above. Amplicons from each of the [...] extractions, library construction, and sequencing were performed twice.

The same extracted RNA was also used as the input into the RT-qPCR using the AgPath-ID one-step RT-PCR Kit (Ambion, USA) on the ABI 7500 Fast Real-Time PCR system following the previously described protocols (9). [...] Table S3.

The complete steps from RNA isolation to MinION sequencing were performed twice for [...]

To determine the accuracy of consensus sequences at different sequencing time points for accurate identification of the NDV genotypes, the raw data (FAST5 files) obtained from the 10-fold serial dilution experiment (see above) were analyzed in subgroups based on time of acquisition and processed through the AmpSeq workflow as described below.
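The time-based subgrouping of reads can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the FAST5 handling used in the study: each read is represented by a hypothetical (read ID, start time in seconds from the start of the run) pair, the subgroups are assumed to be cumulative (all reads acquired up to a given time point), and the cutoff values are arbitrary examples.

```python
from typing import Dict, Iterable, List, Tuple


def reads_by_time_cutoff(
    reads: Iterable[Tuple[str, float]],
    cutoffs_min: Iterable[float],
) -> Dict[float, List[str]]:
    """For each cutoff (in minutes), list the IDs of reads acquired by then.

    `reads` holds hypothetical (read_id, start_time_seconds) pairs.
    Subgroups are cumulative, which is an assumption of this sketch.
    """
    reads = list(reads)
    subgroups: Dict[float, List[str]] = {}
    for cutoff in sorted(cutoffs_min):
        limit_s = cutoff * 60  # convert minutes to seconds
        subgroups[cutoff] = [rid for rid, start_s in reads if start_s <= limit_s]
    return subgroups


# Made-up example data: four reads with acquisition times in seconds.
example_reads = [
    ("read_a", 30.0),
    ("read_b", 200.0),
    ("read_c", 500.0),
    ("read_d", 1500.0),
]
subgroups = reads_by_time_cutoff(example_reads, cutoffs_min=[5, 10, 30])
```

Each resulting subgroup would then be passed separately through the consensus-building workflow, so that consensus accuracy can be compared across sequencing times.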

MinION data analysis workflow

To analyze the Nanopore sequencing data, a custom, assembly-based AmpSeq workflow within the Galaxy platform interface (35) was developed, as diagrammed in Figure 1.

Quality metrics

The Nanopore QC tool was used to obtain quality metrics plots of all sequencing runs. For [...]

In addition, analysis of five consecutive batches of reads (each batch = 20,000 reads) obtained at different time intervals from run 4 indicated that the overall mean read quality for each 20,000-read batch remained above 10 (Table S4). Similarly, the mean Q≥10 over time remained consistent in the clinical sample runs (runs 5-7), which had long (12 hrs) sequencing runs (Figure S3, green lines).

To confirm the ability of the MinION-acquired partial matrix and fusion gene sequences to be used for accurate analysis of evolutionary relatedness, phylogenetic analysis using consensus sequences (734 bp) obtained from two independent MinION runs (runs 3 and 4) was performed. Additionally, the 24 sequences from MiSeq were included in the phylogenetic tree (Figure 2) to further illustrate the agreement between these two sequencing methods. In the phylogenetic tree, the isolates (n = 33; green font) grouped together with the viruses that showed highest nucleotide sequence identity to them, including those for which MiSeq sequences were available (red font). The six isolates that were sequenced twice (blue font) clustered together.
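Nucleotide sequence identity between a MinION consensus and a reference sequence, as referred to above, can be computed from a pairwise alignment. The following is a minimal Python sketch, assuming the two sequences are already aligned to equal length with "-" marking gaps. The function name and the decision to count a gap opposite a base as a mismatch are assumptions of this sketch; the identity calculation actually used in the study is not specified in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable alignment columns")
    # Double-gap columns were removed above, so a == b implies a shared base.
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Made-up toy alignment: one mismatch in eight columns.
toy_identity = percent_identity("ACGTACGT", "ACGTACGA")
```

For real data, the alignment itself would come from a dedicated aligner; only the identity arithmetic is shown here.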