
Table 3 Performance of different classification models

From: A computational approach to identify point mutations associated with occult hepatitis B: significant mutations affect coding regions but not regulative elements of HBV

| Input encoding | Model | Feature selection | Average AUC (st. dev.) | Average accuracy, % (st. dev.) | Average TNR (st. dev.) | Average TPR (st. dev.) |
|---|---|---|---|---|---|---|
| bases | RF | none | 0.847 (0.100) | 83.948 (5.024) | 0.220 (0.230) | 0.992 (0.021) |
| triplet | RF | none | 0.847 (0.097) | 83.952 (4.820) | 0.224 (0.222) | 0.990 (0.022) |
| bases | RF | Fisher | 0.699 (0.160) * | 81.462 (5.812) | 0.215 (0.226) | 0.962 (0.055) |
| triplet | RF | Fisher | 0.759 (0.127) | 81.781 (5.306) | 0.234 (0.219) | 0.961 (0.047) * |
| bases | LR | Fisher/AIC | 0.670 (0.134) * | 81.310 (6.036) | 0.283 (0.220) | 0.943 (0.054) * |
| triplet | LR | Fisher/AIC | 0.680 (0.147) * | 81.343 (6.878) | 0.324 (0.226) | 0.934 (0.058) * |
| bases | DT | none | 0.570 (0.137) * | 79.862 (5.904) | 0.143 (0.217) | 0.960 (0.059) |
| triplet | DT | none | 0.549 (0.106) * | 80.136 (4.850) * | 0.130 (0.203) | 0.967 (0.059) |
| bases | RI | none | 0.574 (0.094) * | 80.662 (5.054) | 0.190 (0.219) | 0.958 (0.072) |
| triplet | RI | none | 0.579 (0.110) * | 79.943 (5.442) | 0.215 (0.249) | 0.943 (0.074) |

  1. * Significantly worse than RF on the whole set of bases (no feature selection) at p < 0.05.
  2. AUC: area under the receiver operating characteristic curve; RF: random forest; LR: logistic regression; DT: decision tree; RI: rule induction; TNR: true negative rate; TPR: true positive rate; AIC: Akaike information criterion; st. dev.: standard deviation.
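For readers less familiar with the metrics in the table, the following is a minimal Python sketch of how AUC, accuracy, TNR and TPR are conventionally computed for a binary classifier, and how per-run values could be summarized as a mean with a standard deviation. It is not the authors' code. Several details are assumptions: label 1 is taken as the positive class, predictions come from thresholding a score, accuracy is expressed as a percentage to match the table, and the standard deviation is the sample standard deviation. The function names (`confusion_counts`, `rates`, `auc`, `summarize`) are hypothetical.

```python
from statistics import mean, stdev


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN; label 1 is assumed to be the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def rates(y_true, y_pred):
    """Return (TPR, TNR, accuracy in %) from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn)  # sensitivity: fraction of positives recognized
    tnr = tn / (tn + fp)  # specificity: fraction of negatives recognized
    accuracy = 100.0 * (tp + tn) / len(y_true)
    return tpr, tnr, accuracy


def auc(y_true, scores):
    """ROC AUC as the probability that a random positive scores higher than a
    random negative (ties count as 1/2), i.e. the Mann-Whitney formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Mean and sample standard deviation over repeated runs (assumed)."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    y_true = [1, 1, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
    print(rates(y_true, y_pred))
    print(auc(y_true, scores))
```

On the toy data, three of four positives and one of two negatives are classified correctly, so TPR is 0.75, TNR is 0.5 and accuracy is about 66.7%, while the AUC is 0.875 because it depends only on the ranking of scores, not on the threshold. The same distinction helps when reading the table: a model can have high TPR and accuracy while its TNR stays low.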