Table 2 Brief description of ML-based bioinformatics platforms used in studying plant-virus interactions

From: Application of machine learning in understanding plant virus pathogenesis: trends and perspectives on emergence, diagnosis, host-virus interplay and management

| Name | Application | Input and output | Salient features | References |
|---|---|---|---|---|
| V-PIPE | Assess the genetic diversity of viral populations and identify true viral variants from high-throughput sequencing data | Input: raw sequencing data (FASTQ format). Output: viral diversity in terms of single-nucleotide variants and local and global viral haplotypes | Introduces ngshmmalign, a hidden Markov model-based read aligner | [48] |
| NBSPred | Identify potential NBS-LRR and NBS-LRR-like proteins | Input: genome, transcript and protein sequences. Output: identified NBS-LRR and NBS-LRR-like proteins | The gene prediction tool Augustus2.7 converts genomic sequences to protein sequences; TransDecoder converts transcript sequences to protein sequences; sequence compositional properties are calculated from (i) the frequencies of amino acids, dipeptides, tripeptides and multiplets, (ii) charge and (iii) hydrophobicity | [59] |
| LOCALIZER | Predict the subcellular localization of plant proteins and of effector proteins encoded by plant-infecting fungi and oomycetes | Input: sequences of plant proteins and eukaryotic effector proteins. Output: (i) probability that a protein localizes to the nucleus, chloroplast or mitochondria; (ii) identified transit peptides (for chloroplast and mitochondria) and nuclear localization signals (NLS) | Trained as a support vector machine model; maximum input: 2000 sequences | [69] |
| MU-LOC | Predict the mitochondrial localization of plant proteins | Input: protein sequence (FASTA format). Output: subcellular localization | Predictor trained using a support vector machine and a deep neural network | [70] |
| pVsupPred | Predict the RNA silencing suppressor (VSR) activity of viral proteins | Input: sequences of viral proteins. Output: (i) prediction score; (ii) classification as VSR-positive or VSR-negative | Random forest-based tool; prediction is based on the presence of (i) a GW/WG motif and (ii) a dsRNA-binding domain in the viral protein | [73] |
| AlphaFold | Predict the structure of a protein | Input: amino acid sequence of a protein. Output: 3D structure of the protein | Neural network-based model; median accuracy: (i) 6.6 Å for AlphaFold, (ii) 1.5 Å for AlphaFold2 | [78] |
| VirFinder | Identify viral sequences in metagenomic data | Input: assembled metagenomic data. Output: true viral contigs | k-mer-based prediction tool built on a trained logistic regression model | [81] |
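
Several of the tools above rely on sequence composition as model input: amino acid and dipeptide frequencies in NBSPred, and k-mer frequencies in the k-mer-based tool for metagenomic contigs. Both come down to counting overlapping substrings of length k. The Python sketch below illustrates that counting step only. It is not taken from any of the listed tools, and the function name `kmer_frequencies`, the alphabet constants and the toy sequence are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product

# Assumed alphabets for illustration: the 20 standard amino acids and 4 DNA bases.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUCLEOTIDES = "ACGT"


def kmer_frequencies(sequence, k, alphabet=AMINO_ACIDS):
    """Return the relative frequency of every possible k-mer over `alphabet`.

    Overlapping windows are counted. Windows containing a symbol outside
    the alphabet (for example X or N) are skipped.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    valid = [w for w in windows if all(symbol in alphabet for symbol in w)]
    counts = Counter(valid)
    total = len(valid)
    # Emit a fixed-length feature vector: one entry per possible k-mer.
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product(alphabet, repeat=k)
    }


# Toy protein fragment (hypothetical, not a real viral protein).
toy_protein = "MGWGA"
amino_acid_freqs = kmer_frequencies(toy_protein, k=1)   # 20 features
dipeptide_freqs = kmer_frequencies(toy_protein, k=2)    # 400 features

# The same function works for nucleotide k-mers on a contig.
toy_contig = "ACGTNACG"
dinucleotide_freqs = kmer_frequencies(toy_contig, k=2, alphabet=NUCLEOTIDES)
```

A fixed-length vector of this kind can be passed to any standard classifier, such as a support vector machine, random forest or logistic regression. The published tools may normalize, filter or combine their features differently.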