An SVR-based prediction server for MHC-binding peptides
 
Performance comparison between SVRMHC models and "additive method" models

    We built “additive method” models for the 42 MHC molecules as described in (Doytchinova, et al., 2002; Doytchinova and Flower, 2003), using the same datasets used to construct the corresponding SVRMHC models. The 36 class I SVRMHC models produced an average cross-validated q2 of 0.414, compared to 0.294 for the 36 corresponding “additive” models, a significant improvement (P<0.0001, Wilcoxon rank sum test). Next, we determined and removed outliers as described in (Doytchinova and Flower, 2002; Liu, et al., 2006). After outlier removal, the average cross-validated q2 improved for both model types: 0.471 for the 36 class I SVRMHC models, compared to 0.487 for the 36 class I “additive” models, a difference that is not significant (P=0.32, Wilcoxon rank sum test). The average number of outliers determined and removed was 2.0 for the SVRMHC models, compared to 7.1 for the “additive method” models. Thus, without outlier removal, class I SVRMHC models outperform class I “additive” models; after outlier removal, the two offer comparable performance, although fewer outliers were removed for the SVRMHC models.
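    For reference, a cross-validated q2 can be computed from held-out predictions as in the minimal sketch below. It assumes the conventional definition, q2 = 1 - PRESS/SS, where PRESS is the sum of squared differences between observed values and their cross-validated predictions and SS is the sum of squared deviations of the observed values from their mean. The function name and the sample values are illustrative and are not taken from the SVRMHC code.

```python
def cross_validated_q2(observed, predicted):
    """Return q2 = 1 - PRESS / SS for observed values and their held-out predictions."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    # PRESS: squared error of the cross-validated predictions.
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # SS: squared deviation of the observed values from their mean.
    ss = sum((y - mean_obs) ** 2 for y in observed)
    if ss == 0:
        raise ValueError("observed values are constant; q2 is undefined")
    return 1.0 - press / ss


# Hypothetical binding affinities and their cross-validated predictions.
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 6.5, 8.5, 8.5]
print(round(cross_validated_q2(observed, predicted), 3))  # 0.8
```

    Under this definition q2 becomes negative whenever the held-out predictions are worse than simply predicting the mean, which is how values such as -0.249 (A*0204, additive model) can arise.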

    The 6 class II SVRMHC models produced an average cross-validated r of 0.60, in comparison to 0.39 produced by the 6 class II “additive” models. The cross-validated r values of the class II SVRMHC models were significantly higher than those for the class II “additive” models (P=0.028, Wilcoxon rank sum test).
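    The kind of comparison reported above can be sketched as a two-sided Wilcoxon rank sum test on two lists of per-model scores. The version below uses the normal approximation with midranks for ties and no continuity correction. This page does not state which variant was used (exact or approximate, corrected or not), so P values from this sketch need not reproduce the reported ones. The function name and the sample scores are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, no continuity correction).

    Returns (W, z, p), where W is the sum of the ranks of x in the pooled sample.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    # Variance of W under the null hypothesis, with the standard tie correction.
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, z, p


# Hypothetical cross-validated scores for two sets of models.
scores_a = [0.1, 0.2, 0.3, 0.4]
scores_b = [0.5, 0.6, 0.7, 0.8]
w, z, p = rank_sum_test(scores_a, scores_b)
print(w, round(z, 3), round(p, 4))
```

    With only a handful of models per group, as for the 6 class II models, an exact test is usually preferable to the normal approximation used here.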

Table 1 Performance comparison of class I additive method models and SVRMHC models, with and without removal of outliers. All models are nonamer models unless marked otherwise.

| MHC allele | q2, without removal of outliers (Additive) | q2, without removal of outliers (SVR) | q2, outliers removed (Additive) | q2, outliers removed (SVR) | Number of outliers removed (Additive) | Number of outliers removed (SVR) | Original dataset size | Cross validation type | SVRMHC configuration |
|---|---|---|---|---|---|---|---|---|---|
| A*0101 | 0.329 | 0.353 | 0.554 | 0.405 | 6 | 2 | 73 | LOO | Polynomial SPARSE |
| A*0201 | 0.308 | 0.485 | 0.432 | 0.542 | 12 | 6 | 725 | 7-fold CV | RBF 11-factor |
| A*0202 | 0.147 | 0.273 | 0.375 | 0.273 | 6 | 0 | 97 | LOO | Polynomial SPARSE |
| A*0203 | 0.279 | 0.352 | 0.410 | 0.352 | 6 | 0 | 86 | LOO | Polynomial 11-factor |
| A*0204 | -0.249 | 0.031 | 0.336 | -0.029 | 2 | 1 | 38 | LOO | RBF 11-factor |
| A*0206 | 0.338 | 0.380 | 0.338 | 0.380 | 0 | 0 | 80 | LOO | RBF SPARSE |
| A*0207 (decamer model) | 0.616 | 0.682 | 0.616 | 0.682 | 0 | 0 | 31 | LOO | Polynomial 11-factor |
| A*0301 | 0.415 | 0.534 | 0.415 | 0.534 | 0 | 0 | 89 | LOO | RBF 11-factor |
| A*0302 | 0.151 | 0.208 | 0.170 | 0.208 | 22 | 0 | 38 | LOO | Polynomial SPARSE |
| A1 | 0.349 | 0.382 | 0.612 | 0.437 | 6 | 2 | 81 | LOO | Polynomial SPARSE |
| A11 | 0.230 | 0.336 | 0.510 | 0.371 | 22 | 2 | 166 | LOO | RBF 11-factor |
| A*1101 | 0.140 | 0.206 | 0.259 | 0.216 | 3 | 1 | 105 | LOO | RBF 11-factor |
| A2 | 0.228 | 0.342 | 0.348 | 0.479 | 16 | 16 | 751 | 7-fold CV | RBF 11-factor |
| A24 | 0.211 | 0.378 | 0.535 | 0.378 | 3 | 0 | 61 | LOO | RBF 11-factor |
| A3 | 0.175 | 0.357 | 0.513 | 0.373 | 19 | 2 | 171 | LOO | RBF 11-factor |
| A31 | 0.392 | 0.395 | 0.483 | 0.544 | 2 | 2 | 68 | LOO | RBF 11-factor |
| A*3101 | 0.431 | 0.743 | 0.431 | 0.743 | 0 | 0 | 40 | LOO | Polynomial 11-factor |
| A33 | 0.176 | 0.245 | 0.292 | 0.350 | 4 | 2 | 55 | LOO | Polynomial SPARSE |
| A*3301 | 0.176 | 0.245 | 0.292 | 0.350 | 4 | 2 | 55 | LOO | Polynomial SPARSE |
| A68 | 0.372 | 0.421 | 0.573 | 0.499 | 8 | 4 | 143 | LOO | RBF SPARSE |
| A*6801 | 0.258 | 0.408 | 0.595 | 0.453 | 4 | 1 | 76 | LOO | Polynomial 11-factor |
| A*6802 | 0.273 | 0.344 | 0.370 | 0.461 | 3 | 2 | 67 | LOO | RBF SPARSE |
| B*0702 | 0.345 | 0.422 | 0.615 | 0.422 | 4 | 0 | 115 | LOO | RBF 11-factor |
| B35 | 0.273 | 0.382 | 0.643 | 0.481 | 6 | 2 | 90 | LOO | RBF 11-factor |
| B*3501 | 0.147 | 0.260 | 0.449 | 0.585 | 6 | 5 | 76 | LOO | Polynomial SPARSE |
| B51 | 0.332 | 0.507 | 0.513 | 0.557 | 5 | 1 | 80 | LOO | RBF 11-factor |
| B53 | 0.439 | 0.508 | 0.559 | 0.508 | 2 | 0 | 68 | LOO | RBF SPARSE |
| B*5301 | 0.439 | 0.508 | 0.559 | 0.508 | 2 | 0 | 68 | LOO | Polynomial SPARSE |
| B54 | 0.178 | 0.468 | 0.600 | 0.635 | 20 | 2 | 81 | LOO | Polynomial 11-factor |
| B*5401 | 0.178 | 0.468 | 0.600 | 0.635 | 20 | 2 | 81 | LOO | Polynomial 11-factor |
| B7 | 0.442 | 0.543 | 0.667 | 0.545 | 6 | 1 | 161 | LOO | RBF SPARSE |
| H-2Db | 0.371 | 0.552 | 0.561 | 0.677 | 11 | 5 | 104 | LOO | Polynomial 11-factor |
| H-2Kb (octamer model) | 0.145 | 0.280 | 0.529 | 0.469 | 22 | 8 | 73 | LOO | RBF 11-factor |
| H-2Kk (octamer model) | 0.492 | 0.763 | 0.497 | 0.763 | 1 | 0 | 154 | LOO | RBF 11-factor |
| Mamu-B*17 | 0.640 | 0.653 | 0.640 | 0.653 | 0 | 0 | 111 | LOO | RBF SPARSE |
| Patr-A*0602 | 0.408 | 0.476 | 0.625 | 0.514 | 2 | 1 | 39 | LOO | RBF SPARSE |

Table 2 Performance comparison of class II additive method models and SVRMHC models. All models are nonamer models.

| MHC allele | r (Additive) | r (SVRMHC) | Dataset size | Cross validation type | SVRMHC configuration |
|---|---|---|---|---|---|
| DRB1*0401 | 0.5274 | 0.6124 | 368 | 5-fold CV | Polynomial SPARSE |
| DRB1*0101 | -0.09225 | 0.6342 | 260 | 5-fold CV | RBF 11-factor |
| DRB1*1501 | 0.5288 | 0.7076 | 174 | 5-fold CV | RBF 11-factor |
| DQA1*0501 | 0.5134 | 0.5812 | 99 | 5-fold CV | Polynomial SPARSE |
| DRB1*0405 | 0.46 | 0.4802 | 77 | 5-fold CV | Linear SPARSE |
| DRB5*0101 | 0.404 | 0.5894 | 75 | 5-fold CV | Polynomial SPARSE |
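
The class II averages quoted in the text can be checked directly against the r values in Table 2. The short sketch below transcribes those six pairs and recomputes both means; the variable names are illustrative.

```python
# Cross-validated r values transcribed from Table 2: allele -> (additive, SVRMHC).
table2_r = {
    "DRB1*0401": (0.5274, 0.6124),
    "DRB1*0101": (-0.09225, 0.6342),
    "DRB1*1501": (0.5288, 0.7076),
    "DQA1*0501": (0.5134, 0.5812),
    "DRB1*0405": (0.46, 0.4802),
    "DRB5*0101": (0.404, 0.5894),
}

additive_mean = sum(a for a, _ in table2_r.values()) / len(table2_r)
svrmhc_mean = sum(s for _, s in table2_r.values()) / len(table2_r)
# Number of alleles for which the SVRMHC model has the higher r.
svrmhc_wins = sum(1 for a, s in table2_r.values() if s > a)

print(f"{additive_mean:.2f} {svrmhc_mean:.2f} {svrmhc_wins}")  # 0.39 0.60 6
```

The recomputed means agree with the 0.39 and 0.60 reported above, and the SVRMHC model has the higher r for all six alleles.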

References

Doytchinova, I.A., Blythe, M.J. and Flower, D.R. (2002) Additive method for the prediction of protein-peptide binding affinity. Application to the MHC class I molecule HLA-A*0201, J Proteome Res, 1, 263-272.

Doytchinova, I.A. and Flower, D.R. (2002) Physicochemical explanation of peptide binding to HLA-A*0201 major histocompatibility complex: a three-dimensional quantitative structure-activity relationship study, Proteins, 48, 505-518.

Doytchinova, I.A. and Flower, D.R. (2003) Towards the in silico identification of class II restricted T-cell epitopes: a partial least squares iterative self-consistent algorithm for affinity prediction, Bioinformatics, 19, 2263-2270.

Liu, W., Meng, X., Xu, Q., Flower, D.R. and Li, T. (2006) Quantitative prediction of mouse class I MHC peptide binding affinity using support vector machine regression (SVR) models, BMC Bioinformatics, 7, 182.

 

Send questions and comments to SVRMHC@biolead.org.
Copyright 2006-13 Biolead.org. All rights reserved.