Application of machine learning in understanding plant virus pathogenesis: trends and perspectives on emergence, diagnosis, host-virus interplay and management

Ghosh, Dibyendu; Chakraborty, Srija; Kodamana, Hariprasad; Chakraborty, Supriya

doi:10.1186/s12985-022-01767-5

Virology Journal

Table 2 Brief description of ML-based bioinformatics platforms used in studying plant-virus interactions


Name	Application	Input and output	Salient features	References
V-pipe	Assess the genetic diversity of viral populations and identify true viral variants from high-throughput sequencing data	Input: raw sequencing data (FASTQ format); Output: viral diversity in terms of single-nucleotide variants and local and global viral haplotypes	Includes ngshmmalign, a hidden Markov model-based read aligner	[48]
NBSPred	Identify potential NBS-LRR and NBS-LRR-like proteins	Input: genome, transcript and protein sequences; Output: identified NBS-LRR and NBS-LRR-like proteins	The gene prediction tool Augustus 2.7 converts genomic sequences to protein sequences; TransDecoder converts transcript sequences to protein sequences; sequence compositional properties are calculated from (i) frequencies of amino acids, dipeptides, tripeptides and multiplets, (ii) charge and (iii) hydrophobicity	[59]
LOCALIZER	Predict the subcellular localization of plant proteins and of effector proteins encoded by plant-infecting fungi and oomycetes	Input: sequences of plant proteins and eukaryotic effector proteins; Output: (i) probability of localization of a protein to the nucleus, chloroplast or mitochondria, (ii) identification of transit peptides (for chloroplast and mitochondria) and nuclear localization signals (NLS)	Trained with a support vector machine model; maximum range: 2000 sequences	[69]
MU-LOC	Predict the mitochondrial localization of plant proteins	Input: protein sequence (FASTA format); Output: subcellular localization	Predictor trained using support vector machine and deep neural network models	[70]
pVsupPred	Predict RNA silencing suppressor activity of viral proteins (VSR)	Input: sequences of viral proteins; Output: (i) prediction score, (ii) whether the protein is a positive or negative VSR	Random forest-based tool; prediction is based on the presence of (i) GW/WG motifs and (ii) a dsRNA-binding domain in the viral protein	[73]
AlphaFold	Predict the structure of a protein	Input: amino acid sequence of a protein; Output: 3D structure of the protein	Neural network-based model; median accuracy: (i) 6.6 Å for AlphaFold, (ii) 1.5 Å for AlphaFold2	[78]
VirFinder	Identify viral sequences in metagenomic data	Input: assembled metagenomic data; Output: true viral contigs	k-mer-based prediction tool built on a trained logistic regression model	[81]
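
Several platforms in the table build their features from short-subsequence frequencies: amino acid, dipeptide and tripeptide frequencies in the case of NBSPred, and nucleotide k-mers in the case of VirFinder. The following is a minimal sketch of how such a frequency vector can be computed. It is not code from either tool; the function name `kmer_frequencies`, the choice of alphabets and the toy sequences are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
NUCLEOTIDES = "ACGT"                  # unambiguous DNA bases (assumed alphabet)


def kmer_frequencies(sequence, k, alphabet):
    """Return the relative frequency of every possible k-mer over `alphabet`.

    Overlapping windows are counted; windows containing symbols outside
    the alphabet (for example N or X) are skipped.
    """
    sequence = sequence.upper()
    allowed = set(alphabet)
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    counts = Counter(w for w in windows if set(w) <= allowed)
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Hypothetical toy sequences, not taken from any dataset.
dipeptides = kmer_frequencies("MGWGSWGA", 2, AMINO_ACIDS)   # 400 features
dinucleotides = kmer_frequencies("ACGTAC", 2, NUCLEOTIDES)  # 16 features
```

A fixed-length vector like this can be passed to a classifier such as logistic regression, a random forest or a support vector machine, which is the general pattern behind the feature-based predictors listed above.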


ISSN: 1743-422X
