SVRMHCdb

We built “additive method” models for the 42 MHC molecules as described in (Doytchinova, et al., 2002; Doytchinova and Flower, 2003), using the same datasets that were used to construct the corresponding SVRMHC models. The 36 class I SVRMHC models produced an average cross-validated q2 of 0.414, compared to 0.294 for the 36 corresponding “additive” models. The cross-validated q2 values of the SVRMHC models are thus significantly higher than those of the corresponding “additive” models (P<0.0001, Wilcoxon rank sum test).

Next, we determined and removed outliers as described in (Doytchinova and Flower, 2002; Liu, et al., 2006). After outlier removal, all models improved. The average cross-validated q2 of the 36 class I SVRMHC models was 0.471, compared to 0.487 for the 36 class I “additive” models, and the q2 values of the two model types no longer differ significantly (P=0.32, Wilcoxon rank sum test). The average number of outliers determined and removed was 2.0 for the SVRMHC models, compared to 7.1 for the “additive method” models. In summary, without outlier removal, class I SVRMHC models outperform class I “additive” models. After outlier removal, the two model types offer comparable performance, although fewer outliers had to be removed for the SVRMHC models.
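For readers unfamiliar with the statistic, the following is a minimal sketch of how a cross-validated q2 can be computed from held-out predictions. It assumes the conventional definition q2 = 1 - PRESS/SS, where PRESS is the sum of squared differences between observed affinities and the predictions made while the corresponding peptide was left out, and SS is the sum of squared deviations of the observed affinities from their mean. The function name and the toy values are illustrative and are not taken from the SVRMHC code or data.

```python
def cross_validated_q2(observed, cv_predicted):
    """Return q2 = 1 - PRESS/SS (assumed conventional definition).

    observed      -- measured binding affinities
    cv_predicted  -- prediction for each peptide obtained while that
                     peptide was held out of training (LOO or k-fold)
    """
    mean_obs = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, cv_predicted))
    total = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - press / total


# Toy affinities (hypothetical values, for illustration only).
observed = [5.0, 6.0, 7.0, 8.0]

perfect = cross_validated_q2(observed, observed)               # exact predictions
mean_only = cross_validated_q2(observed, [6.5] * 4)            # always predict the mean
worse = cross_validated_q2(observed, [8.0, 7.0, 6.0, 5.0])     # worse than the mean
```

Under this definition a model that does no better than predicting the mean scores 0, and a model that does worse scores below 0, which is how a negative q2 such as the one reported for the A*0204 “additive” model can arise.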

The 6 class II SVRMHC models produced an average cross-validated r of 0.60, in comparison to 0.39 produced by the 6 class II “additive” models. The cross-validated r values of the class II SVRMHC models were significantly higher than those for the class II “additive” models (P=0.028, Wilcoxon rank sum test).
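The class II comparison is reported as a cross-validated r rather than q2. As a hedged illustration, the sketch below assumes that r is the Pearson correlation coefficient between observed affinities and held-out predictions; the function name and the toy values are hypothetical.

```python
import math


def pearson_r(observed, cv_predicted):
    """Pearson correlation between observed values and held-out predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(cv_predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, cv_predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in cv_predicted)
    return cov / math.sqrt(var_o * var_p)


# Toy values (hypothetical): predictions that rise with the observations
# give r close to +1, predictions that fall give r close to -1.
r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Unlike q2, r measures only how well predictions track the ordering and spread of the observations, so a model can have a high r while still being offset or mis-scaled.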

Table 1. Performance comparison of class I additive method models and SVRMHC models, with and without removal of outliers. All models are nonamer models unless marked otherwise.

MHC allele	q2 without outlier removal (Additive)	q2 without outlier removal (SVR)	q2 with outliers removed (Additive)	q2 with outliers removed (SVR)	Outliers removed (Additive)	Outliers removed (SVR)	Original dataset size	Cross-validation type	SVRMHC configuration
A*0101	0.329	0.353	0.554	0.405	6	2	73	LOO	Polynomial SPARSE
A*0201	0.308	0.485	0.432	0.542	12	6	725	7-fold CV	RBF 11-factor
A*0202	0.147	0.273	0.375	0.273	6	0	97	LOO	Polynomial SPARSE
A*0203	0.279	0.352	0.410	0.352	6	0	86	LOO	Polynomial 11-factor
A*0204	-0.249	0.031	0.336	-0.029	2	1	38	LOO	RBF 11-factor
A*0206	0.338	0.380	0.338	0.380	0	0	80	LOO	RBF SPARSE
A*0207 (decamer model)	0.616	0.682	0.616	0.682	0	0	31	LOO	Polynomial 11-factor
A*0301	0.415	0.534	0.415	0.534	0	0	89	LOO	RBF 11-factor
A*0302	0.151	0.208	0.170	0.208	22	0	38	LOO	Polynomial SPARSE
A1	0.349	0.382	0.612	0.437	6	2	81	LOO	Polynomial SPARSE
A11	0.230	0.336	0.510	0.371	22	2	166	LOO	RBF 11-factor
A*1101	0.140	0.206	0.259	0.216	3	1	105	LOO	RBF 11-factor
A2	0.228	0.342	0.348	0.479	16	16	751	7-fold CV	RBF 11-factor
A24	0.211	0.378	0.535	0.378	3	0	61	LOO	RBF 11-factor
A3	0.175	0.357	0.513	0.373	19	2	171	LOO	RBF 11-factor
A31	0.392	0.395	0.483	0.544	2	2	68	LOO	RBF 11-factor
A*3101	0.431	0.743	0.431	0.743	0	0	40	LOO	Polynomial 11-factor
A33	0.176	0.245	0.292	0.350	4	2	55	LOO	Polynomial SPARSE
A*3301	0.176	0.245	0.292	0.350	4	2	55	LOO	Polynomial SPARSE
A68	0.372	0.421	0.573	0.499	8	4	143	LOO	RBF SPARSE
A*6801	0.258	0.408	0.595	0.453	4	1	76	LOO	Polynomial 11-factor
A*6802	0.273	0.344	0.370	0.461	3	2	67	LOO	RBF SPARSE
B*0702	0.345	0.422	0.615	0.422	4	0	115	LOO	RBF 11-factor
B35	0.273	0.382	0.643	0.481	6	2	90	LOO	RBF 11-factor
B*3501	0.147	0.260	0.449	0.585	6	5	76	LOO	Polynomial SPARSE
B51	0.332	0.507	0.513	0.557	5	1	80	LOO	RBF 11-factor
B53	0.439	0.508	0.559	0.508	2	0	68	LOO	RBF SPARSE
B*5301	0.439	0.508	0.559	0.508	2	0	68	LOO	Polynomial SPARSE
B54	0.178	0.468	0.600	0.635	20	2	81	LOO	Polynomial 11-factor
B*5401	0.178	0.468	0.600	0.635	20	2	81	LOO	Polynomial 11-factor
B7	0.442	0.543	0.667	0.545	6	1	161	LOO	RBF SPARSE
H-2Db	0.371	0.552	0.561	0.677	11	5	104	LOO	Polynomial 11-factor
H-2Kb (octamer model)	0.145	0.280	0.529	0.469	22	8	73	LOO	RBF 11-factor
H-2Kk (octamer model)	0.492	0.763	0.497	0.763	1	0	154	LOO	RBF 11-factor
Mamu-B*17	0.640	0.653	0.640	0.653	0	0	111	LOO	RBF SPARSE
Patr-A*0602	0.408	0.476	0.625	0.514	2	1	39	LOO	RBF SPARSE
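The averages quoted in the text for the models without outlier removal can be recomputed from the first two q2 columns of Table 1. The sketch below does that and also runs an unpaired rank sum test using a tie-corrected normal approximation. The test implementation is an assumption for illustration: the exact variant and software used by the authors are not stated here, so the P value it yields need not match the one quoted in the text.

```python
import math
from collections import Counter

# q2 without outlier removal, copied row by row from Table 1.
ADDITIVE_Q2 = [
    0.329, 0.308, 0.147, 0.279, -0.249, 0.338, 0.616, 0.415, 0.151,
    0.349, 0.230, 0.140, 0.228, 0.211, 0.175, 0.392, 0.431, 0.176,
    0.176, 0.372, 0.258, 0.273, 0.345, 0.273, 0.147, 0.332, 0.439,
    0.439, 0.178, 0.178, 0.442, 0.371, 0.145, 0.492, 0.640, 0.408,
]
SVR_Q2 = [
    0.353, 0.485, 0.273, 0.352, 0.031, 0.380, 0.682, 0.534, 0.208,
    0.382, 0.336, 0.206, 0.342, 0.378, 0.357, 0.395, 0.743, 0.245,
    0.245, 0.421, 0.408, 0.344, 0.422, 0.382, 0.260, 0.507, 0.508,
    0.508, 0.468, 0.468, 0.543, 0.552, 0.280, 0.763, 0.653, 0.476,
]


def mean(values):
    return sum(values) / len(values)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank sum test (normal approximation, tie-corrected).

    Returns (U statistic of x, approximate P value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


mean_additive = round(mean(ADDITIVE_Q2), 3)
mean_svr = round(mean(SVR_Q2), 3)
u_stat, p_value = rank_sum_test(ADDITIVE_Q2, SVR_Q2)
print(mean_additive, mean_svr, p_value)
```

The same two functions can be applied to the "outliers removed" columns, or to the outlier counts, by swapping in the corresponding columns of the table.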

Table 2. Performance comparison of class II additive method models and SVRMHC models. All models are nonamer models.

Doytchinova, I.A., Blythe, M.J. and Flower, D.R. (2002) Additive method for the prediction of protein-peptide binding affinity. Application to the MHC class I molecule HLA-A*0201, J Proteome Res, 1, 263-272.

Doytchinova, I.A. and Flower, D.R. (2002) Physicochemical explanation of peptide binding to HLA-A*0201 major histocompatibility complex: a three-dimensional quantitative structure-activity relationship study, Proteins, 48, 505-518.

Doytchinova, I.A. and Flower, D.R. (2003) Towards the in silico identification of class II restricted T-cell epitopes: a partial least squares iterative self-consistent algorithm for affinity prediction, Bioinformatics, 19, 2263-2270.

Liu, W., Meng, X., Xu, Q., Flower, D.R. and Li, T. (2006) Quantitative prediction of mouse class I MHC peptide binding affinity using support vector machine regression (SVR) models, BMC Bioinformatics, 7, 182.