Significant features. The survey of features associated with significant improvement in siRNA efficacy was performed on a dataset of 3669 siRNA experiments (Set A). A total of 276 binary features were examined, including all known features reported to affect siRNA efficacy. For each feature, three tests were performed: a Wald test of monotone trend, and two permutation tests of odds ratios (one for achieving >70% efficacy, the other for achieving >90% efficacy). A feature was regarded as significant if the P-value of the Wald test was less than 0.01 and, at the same time, at least one of the two permutation-test P-values was less than 0.01. To increase confidence in the significant features obtained, 200 bootstrapped datasets were generated from Set A and the survey was repeated on each of them. A feature was finally declared significant only if it was found significant in at least 95% of the bootstrapped datasets. The non-redundant list of significant features found in this survey is shown in the table below:
| Feature Index | Feature Name |
|---|---|
| F1 | 1st nucleotide = G |
| F2 | 6th nucleotide = T |
| F3 | 9th nucleotide = C |
| F4 | 17th nucleotide = A |
| F5 | 18th nucleotide ≠ C |
| F6 | 19th nucleotide = A |
| F7 | At least three (A/U)s in the seven nucleotides at the 3' end |
| F8 | No occurrences of four or more identical nucleotides in a row |
| F9 | No occurrences of G/C stretches of length 7 or longer |
| F10 | G/C content is between 35 and 60% |
| F11 | Binding energy of N16-N19 > -9 kcal/mol |
| F12 | Local folding potential (mean) ≥ -22.31 kcal/mol |
| F13 | Target site is not on the 5'UTR |
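The selection criterion used in the survey can be sketched as follows. This is a minimal illustration, not the authors' code, and it makes several assumptions:

- The Wald-test P-value is computed elsewhere and passed in.
- The permutation test is one-sided (observed odds ratio or larger) with 1000 shuffles.
- A 0.5 correction is added to each cell of the 2x2 table.

All function names are hypothetical.

```python
import random
from typing import Sequence


def odds_ratio(feature: Sequence[int], success: Sequence[int]) -> float:
    """Odds ratio of success for feature carriers vs. non-carriers.

    Adds 0.5 to every cell of the 2x2 table (an assumption) so that
    empty cells cannot cause a division by zero.
    """
    a = b = c = d = 0
    for f, s in zip(feature, success):
        if f and s:
            a += 1          # carrier, successful
        elif f:
            b += 1          # carrier, not successful
        elif s:
            c += 1          # non-carrier, successful
        else:
            d += 1          # non-carrier, not successful
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def permutation_p_value(feature: Sequence[int], success: Sequence[int],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided permutation P-value: fraction of shuffled outcome labels
    giving an odds ratio at least as large as the observed one."""
    rng = random.Random(seed)
    observed = odds_ratio(feature, success)
    shuffled = list(success)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if odds_ratio(feature, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


def is_significant(wald_p: float, perm_p70: float, perm_p90: float,
                   alpha: float = 0.01) -> bool:
    """Wald P < alpha AND at least one permutation P (>70% or >90%) < alpha."""
    return wald_p < alpha and min(perm_p70, perm_p90) < alpha


def stable_across_bootstraps(flags: Sequence[bool], threshold: float = 0.95) -> bool:
    """Keep a feature only if it was significant in >= 95% of bootstrapped datasets."""
    return sum(flags) / len(flags) >= threshold
```

In the survey, `is_significant` would be evaluated on each of the 200 bootstrapped datasets. `stable_across_bootstraps` is then applied to the resulting list of flags.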
Standard rulesets. The standard rule sets were obtained by investigating combinations of the 13 significant features (F1-F13), followed by DRM analysis (merging and organizing the rules). In the tables below, a √ marks a feature that is part of a rule.
RS2:
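Applying a rule set requires the value of each feature for a candidate siRNA. Features F1-F10 in the table above depend only on the 19-mer itself, so they can be computed directly. The sketch below makes these assumptions:

- The 19-mer is written 5'→3' in the same orientation as the feature table.
- U is treated as T.
- The G/C-content bounds of F10 are inclusive.

F11, F12 and F13 need thermodynamic parameters, MFold output or transcript annotation, so they are left out. The function name is hypothetical.

```python
import re
from typing import Dict


def sequence_features(seq: str) -> Dict[str, bool]:
    """Evaluate the sequence-only features F1-F10 for a 19-mer (assumed 5'->3')."""
    s = seq.upper().replace("U", "T")
    if len(s) != 19 or set(s) - set("ACGT"):
        raise ValueError("expected a 19-mer over A/C/G/T(U)")
    gc = (s.count("G") + s.count("C")) / len(s)
    return {
        "F1": s[0] == "G",                          # 1st nucleotide = G
        "F2": s[5] == "T",                          # 6th nucleotide = T
        "F3": s[8] == "C",                          # 9th nucleotide = C
        "F4": s[16] == "A",                         # 17th nucleotide = A
        "F5": s[17] != "C",                         # 18th nucleotide != C
        "F6": s[18] == "A",                         # 19th nucleotide = A
        "F7": sum(b in "AT" for b in s[-7:]) >= 3,  # >= 3 A/U in the last 7 nt
        "F8": re.search(r"(.)\1{3}", s) is None,    # no run of 4+ identical nt
        "F9": re.search(r"[GC]{7,}", s) is None,    # no G/C stretch of length >= 7
        "F10": 0.35 <= gc <= 0.60,                  # G/C content 35-60% (inclusive)
    }
```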
| Feature | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | F13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rule 1 |  | √ | √ | √ | √ | √ |  |  |  |  |  | √ |  |
| Rule 2 |  | √ | √ | √ |  | √ |  |  |  |  | √ | √ |  |
| Rule 3 | √ | √ | √ | √ | √ | √ |  |  |  |  |  |  |  |
| Rule 4 | √ | √ | √ | √ | √ |  |  |  |  |  | √ | √ |  |
| Rule 5 | √ | √ | √ | √ |  | √ |  | √ |  |  |  | √ |  |
| Rule 6 | √ | √ | √ | √ |  |  |  | √ |  |  | √ | √ |  |
| Rule 7 |  | √ | √ | √ | √ | √ |  | √ |  |  | √ |  |  |
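RS2 can be encoded and applied as below, under two assumptions:

- Each √ marks a feature a candidate must carry, so a rule is the conjunction of its checked features.
- A rule set selects an siRNA when any one of its rules is satisfied.

The rules are transcribed from the table above. The candidate's features are given as a set of satisfied feature indices. `RS2`, `matching_rules` and `selected` are illustrative names.

```python
from typing import AbstractSet, FrozenSet, List

# RS2 transcribed from the table: each rule is the set of required feature indices.
RS2: List[FrozenSet[int]] = [
    frozenset({2, 3, 4, 5, 6, 12}),      # Rule 1
    frozenset({2, 3, 4, 6, 11, 12}),     # Rule 2
    frozenset({1, 2, 3, 4, 5, 6}),       # Rule 3
    frozenset({1, 2, 3, 4, 5, 11, 12}),  # Rule 4
    frozenset({1, 2, 3, 4, 6, 8, 12}),   # Rule 5
    frozenset({1, 2, 3, 4, 8, 11, 12}),  # Rule 6
    frozenset({2, 3, 4, 5, 6, 8, 11}),   # Rule 7
]


def matching_rules(satisfied: AbstractSet[int], rule_set: List[FrozenSet[int]]) -> List[int]:
    """1-based numbers of the rules whose required features are all satisfied."""
    return [i + 1 for i, rule in enumerate(rule_set) if rule <= satisfied]


def selected(satisfied: AbstractSet[int], rule_set: List[FrozenSet[int]]) -> bool:
    """Disjunctive rule set: the candidate passes if any single rule matches."""
    return any(rule <= satisfied for rule in rule_set)
```

For example, a candidate carrying F2-F6, F8, F11 and F12 matches Rules 1, 2 and 7. A candidate carrying only F1, F3 and F4 matches none.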
High sensitivity rulesets. Rule-based siRNA design tools are often prone to low sensitivity, i.e., they tend to produce rather few candidate siRNAs for a given gene, and the DRM rule sets are no exception. One strategy for countering this problem is to organize the rules into disjunctive sets and to use the rule sets, rather than individual rules, to select siRNAs: an siRNA is selected if it satisfies any rule in the set. The sensitivity of a disjunctive rule set is therefore at most the sum of the sensitivities of its rules, and close to that sum when the rules select largely different siRNAs. This strategy is, in fact, already incorporated into the DRM methodology.
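The relationship between rule and rule-set sensitivity can be checked numerically. This sketch assumes that sensitivity is measured as the fraction of candidate sites on a gene that pass. Each candidate is represented by its set of satisfied features, and the toy rules and candidates are hypothetical.

```python
from typing import AbstractSet, FrozenSet, List


def rule_sensitivity(candidates: List[AbstractSet[int]], rule: FrozenSet[int]) -> float:
    """Fraction of candidates that carry every feature required by one rule."""
    return sum(rule <= c for c in candidates) / len(candidates)


def rule_set_sensitivity(candidates: List[AbstractSet[int]],
                         rules: List[FrozenSet[int]]) -> float:
    """Fraction of candidates accepted by at least one rule (disjunction)."""
    return sum(any(r <= c for r in rules) for c in candidates) / len(candidates)


# Hypothetical toy data: two rules that overlap on one candidate ({1, 2, 3}).
rule_a, rule_b = frozenset({1, 2}), frozenset({3})
cands = [{1, 2}, {3}, {1, 2, 3}, {4}]
```

Here each rule alone accepts half of the candidates. The rule set accepts three quarters, which is below the sum of 1.0 because the candidate {1, 2, 3} is counted by both rules.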
Another strategy for improving the sensitivity of the rule sets is to restrict the occurrence of features with low carrying rates (the expected fraction of sequences that carry a feature). Among the significant features identified in the survey, five direct sequence features (F1, F2, F3, F4 and F6) each have a distinctly low expected carrying rate of 25%. This is because each of the four nucleotides A, U, G and C can appear at any position of a 19-mer with roughly equal probability. Thus feature F1 (1st nucleotide = G) is expected to be carried by only about 25% of all 19-mers, and the same reasoning applies to the other four features. We repeated the feature combination analysis and rule merging procedure with the additional restriction that no rule may contain more than 3 of these 5 low-carrying-rate features. As expected, the resulting rule sets show substantially improved sensitivity. These rule sets are denoted RS_HS1 through RS_HS4 (HS stands for "high sensitivity").
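The effect of the restriction can be estimated with a short calculation. It assumes the equal-nucleotide-frequency argument above and that positions are independent. Each fixed-nucleotide feature then contributes a factor of 1/4, and F5 (18th nucleotide ≠ C) contributes 3/4. Features other than F1-F6 are ignored here because their rates depend on sequence composition. The names are hypothetical.

```python
import math
from typing import Dict, Iterable

# Expected carrying rates under equal nucleotide frequencies:
# one fixed nucleotide at a position -> 1/4; "not C" at a position (F5) -> 3/4.
POSITIONAL_RATE: Dict[int, float] = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.75, 6: 0.25}


def expected_positional_rate(rule: Iterable[int]) -> float:
    """Product of the rates of the single-position features in a rule.

    Assumes independent positions; features outside F1-F6 are treated
    as rate 1.0 (i.e., not modelled here).
    """
    rate = 1.0
    for f in rule:
        rate *= POSITIONAL_RATE.get(f, 1.0)
    return rate
```

Under these assumptions, a rule that requires all five low-rate features can be carried by at most about 0.1% of 19-mers (0.25^5). A rule limited to three of them allows about 1.6% (0.25^3), a sixteen-fold difference.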
RS_HS1:
| Feature | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | F13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rule 1 | √ |  | √ | √ |  |  |  | √ |  |  | √ | √ |  |
| Rule 2 |  | √ | √ | √ |  |  |  | √ |  |  | √ | √ |  |
| Rule 3 | √ |  | √ | √ | √ |  |  |  |  | √ | √ | √ | √ |
RS_HS4:
| Feature | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | F13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rule 1 |  |  | √ | √ |  | √ |  |  |  |  | √ |  |  |
| Rule 2 |  |  | √ | √ | √ | √ |  |  |  | √ |  |  |  |
| Rule 3 | √ |  | √ | √ |  |  |  |  |  |  | √ | √ |  |
| Rule 4 |  |  | √ | √ | √ | √ |  |  |  |  |  | √ |  |
| Rule 5 |  | √ | √ | √ |  |  |  |  |  |  | √ | √ |  |
| Rule 6 | √ |  | √ | √ |  |  |  | √ |  |  | √ |  |  |
| Rule 7 |  | √ | √ | √ |  |  |  | √ |  |  | √ |  |  |
| Rule 8 | √ | √ | √ |  |  |  |  | √ |  | √ |  |  | √ |
| Rule 9 |  |  | √ | √ | √ | √ | √ | √ |  |  |  |  |  |
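The restriction of at most 3 low-carrying-rate features per rule can be verified directly against the RS_HS1 and RS_HS4 tables above. The sketch treats each rule as the set of its √ features, and the helper names are hypothetical.

```python
from typing import FrozenSet, List

LOW_RATE_FEATURES = frozenset({1, 2, 3, 4, 6})  # F1, F2, F3, F4, F6 (25% each)

RS_HS1: List[FrozenSet[int]] = [
    frozenset({1, 3, 4, 8, 11, 12}),
    frozenset({2, 3, 4, 8, 11, 12}),
    frozenset({1, 3, 4, 5, 10, 11, 12, 13}),
]

RS_HS4: List[FrozenSet[int]] = [
    frozenset({3, 4, 6, 11}),
    frozenset({3, 4, 5, 6, 10}),
    frozenset({1, 3, 4, 11, 12}),
    frozenset({3, 4, 5, 6, 12}),
    frozenset({2, 3, 4, 11, 12}),
    frozenset({1, 3, 4, 8, 11}),
    frozenset({2, 3, 4, 8, 11}),
    frozenset({1, 2, 3, 8, 10, 13}),
    frozenset({3, 4, 5, 6, 7, 8}),
]


def low_rate_count(rule: FrozenSet[int]) -> int:
    """Number of low-carrying-rate features required by a rule."""
    return len(rule & LOW_RATE_FEATURES)


def meets_hs_restriction(rule_set: List[FrozenSet[int]], max_low: int = 3) -> bool:
    """True if no rule uses more than max_low of the low-rate features."""
    return all(low_rate_count(r) <= max_low for r in rule_set)
```

Every rule in both transcribed tables uses exactly three of the five low-rate features. By contrast, RS2 Rule 3 (F1-F6) uses all five.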
Fast rulesets. Different features take different amounts of time to calculate. For all features except F12 (local folding potential (mean) ≥ -22.31 kcal/mol), the feature values can be computed almost instantly. Computing F12 takes much longer because it involves a secondary structure calculation (using MFold). For an mRNA of average length, the F12 calculation typically takes about 1-2 hours. To obtain rule sets that can be evaluated more rapidly, we repeated the feature combination and rule merging analysis with F12 excluded from the list of features used. The result is a bundle of "fast rule sets", denoted RS_Fast1 through RS_Fast4.
RS_Fast2:
| Feature | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | F13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rule 1 | √ | √ | √ | √ | √ | √ |  |  |  |  |  |  |  |
| Rule 2 | √ |  | √ | √ |  | √ |  | √ |  |  | √ |  |  |
| Rule 3 |  | √ | √ | √ | √ | √ |  | √ |  |  |  |  |  |
RS_Fast4:
| Feature | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | F13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rule 1 | √ |  | √ | √ |  | √ |  |  |  |  | √ |  |  |
| Rule 2 |  | √ | √ | √ | √ | √ |  |  |  |  |  |  |  |
| Rule 3 | √ | √ | √ | √ |  |  |  | √ |  | √ |  |  |  |
| Rule 4 | √ |  | √ | √ |  | √ | √ | √ |  |  |  |  |  |
| Rule 5 | √ | √ | √ | √ |  | √ |  | √ |  |  |  |  |  |
| Rule 6 | √ | √ | √ | √ |  |  |  | √ |  |  | √ |  |  |
| Rule 7 |  | √ | √ | √ |  | √ |  | √ |  |  | √ |  |  |
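A quick way to confirm that a rule set avoids the slow MFold-based feature is to take the union of the features its rules require. The sketch below transcribes RS_Fast2 and RS_Fast4 from the tables above, and the helper name is hypothetical.

```python
from typing import FrozenSet, List

SLOW_FEATURE = 12  # F12 requires a secondary structure calculation (MFold)

RS_FAST2: List[FrozenSet[int]] = [
    frozenset({1, 2, 3, 4, 5, 6}),
    frozenset({1, 3, 4, 6, 8, 11}),
    frozenset({2, 3, 4, 5, 6, 8}),
]

RS_FAST4: List[FrozenSet[int]] = [
    frozenset({1, 3, 4, 6, 11}),
    frozenset({2, 3, 4, 5, 6}),
    frozenset({1, 2, 3, 4, 8, 10}),
    frozenset({1, 3, 4, 6, 7, 8}),
    frozenset({1, 2, 3, 4, 6, 8}),
    frozenset({1, 2, 3, 4, 8, 11}),
    frozenset({2, 3, 4, 6, 8, 11}),
]


def features_needed(rule_set: List[FrozenSet[int]]) -> FrozenSet[int]:
    """All feature indices that must be computed to evaluate the rule set."""
    return frozenset().union(*rule_set)
```

For both fast rule sets the union excludes F12, so every feature they use can be computed without a folding calculation.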
Filter for innate immune responses. It has been reported that siRNA duplexes can activate innate immune responses by interacting with certain Toll-like receptors on the cell surface or in the endosomes (Hornung, et al., 2005; Judge, et al., 2005). Invoking these responses requires the presence of specific motifs, 5'-UGUGU-3' or 5'-GUCCUUCAA-3', in the guide strand of the siRNA duplex. This effect is observed in only a small proportion of gene silencing experiments, and transcribed shRNAs are believed not to be susceptible to these responses. A filter is put in place to exclude siRNAs containing either motif from the selection.
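The immune-motif filter amounts to a substring search on the guide strand. In this sketch the guide strand is assumed to be supplied 5'→3'. DNA input is converted to RNA by replacing T with U, and the function names are hypothetical.

```python
from typing import List

IMMUNE_MOTIFS = ("UGUGU", "GUCCUUCAA")  # 5'->3' motifs in the guide strand


def immune_motifs_found(guide: str) -> List[str]:
    """Return the immune-stimulatory motifs present in the guide strand."""
    g = guide.upper().replace("T", "U")
    return [m for m in IMMUNE_MOTIFS if m in g]


def passes_immune_filter(guide: str) -> bool:
    """A candidate passes only if neither motif occurs in its guide strand."""
    return not immune_motifs_found(guide)
```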
Filter for cytotoxic effects. In a recent study, the presence of the motif 5'-UGGC-3' in the siRNA duplex was found to cause strong cell toxicity (Fedorov, et al., 2006). The siDRM design tool applies a filter to exclude siRNAs containing this motif from the selection.
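The text places the toxic motif anywhere "in the siRNA duplex", so this sketch checks both strands. It assumes a perfectly complementary 19-bp duplex, whose passenger strand is the reverse complement of the guide; overhangs are ignored. The helper names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
TOXIC_MOTIF = "UGGC"  # 5'->3'


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA strand (A/U/G/C only)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def has_toxic_motif(strand: str, motif: str = TOXIC_MOTIF) -> bool:
    """True if the motif occurs on the given strand or on its complementary strand."""
    s = strand.upper().replace("T", "U")
    return motif in s or motif in reverse_complement(s)
```

Because the reverse complement of UGGC is GCCA, a strand containing 5'-GCCA-3' is also flagged: its partner strand carries 5'-UGGC-3'.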
Filter for off-target activities.
Substantial off-target inhibition can take place when an siRNA and a non-targeted gene have exact sequence matches at all but the last two positions, which are tolerant to mismatches (Birmingham, et al., 2006; Dahlgren, et al., 2008). Off-target inhibition can also be induced when matches in the seed region (positions 2-7 or 2-8) are followed by several mismatched nucleotides (Birmingham, et al., 2006; Jackson, et al., 2006). For each candidate siRNA, siDRM checks and reports whether:
(i) its full sequence has full homology to a transcript anywhere along its length (5'UTR, CDS or 3'UTR);
(ii) its 17-mer subsequence (all but the last two positions) has full homology to the 3'UTR of another transcript;
(iii) its seed region (positions 2-8) has full homology to the 3'UTR of another transcript; or
(iv) its seed region (positions 2-8) has full homology to the 3'UTR of another transcript, and this homology region is followed by four consecutive mismatches.
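The four checks can be sketched as string searches. This is an illustration under several assumptions not stated in the text:

- The candidate is given as the guide (antisense) strand, 5'→3'.
- "Homology" means that the transcript contains the reverse complement of the relevant guide segment.
- The 17-mer is guide positions 1-17.
- "Followed by four mismatches" refers to guide positions 9-12. These pair with the four transcript nucleotides immediately 5' of the seed site.

All names are hypothetical.

```python
from typing import Dict, List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def revcomp(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_sites(guide: str, utr3: str) -> List[int]:
    """Start indices in utr3 of sites complementary to guide positions 2-8."""
    site = revcomp(guide[1:8])
    return [p for p in range(len(utr3) - 6) if utr3[p:p + 7] == site]


def seed_then_mismatches(guide: str, utr3: str, p: int, n: int = 4) -> bool:
    """True if guide positions 9..(8+n) all mismatch the target next to the seed site at p.

    Guide index i (0-based) pairs with utr3[p + 7 - i], so guide positions
    9-12 pair with utr3[p-1] .. utr3[p-4], just 5' of the seed site.
    """
    if p - n < 0:
        return False
    return all(utr3[p + 7 - i] != COMPLEMENT[guide[i]] for i in range(8, 8 + n))


def off_target_report(guide: str, other_transcripts: List[str],
                      other_utr3s: List[str]) -> Dict[str, bool]:
    g = to_rna(guide)
    full = [to_rna(t) for t in other_transcripts]
    utrs = [to_rna(u) for u in other_utr3s]
    return {
        "i_full_match": any(revcomp(g) in t for t in full),
        "ii_17mer_in_utr3": any(revcomp(g[:17]) in u for u in utrs),
        "iii_seed_in_utr3": any(seed_sites(g, u) for u in utrs),
        "iv_seed_then_4_mismatches": any(
            seed_then_mismatches(g, u, p) for u in utrs for p in seed_sites(g, u)),
    }
```

A production tool would search an indexed transcript database rather than scan strings. The linear scans here only make the pairing logic explicit.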
Updated Analysis
Additional analysis of the updated DRM rule sets conducted in March 2007 can be found here.
References
Birmingham, A., Anderson, E.M., Reynolds, A., Ilsley-Tyree, D., Leake, D., Fedorov, Y., Baskerville, S., Maksimova, E., Robinson, K., Karpilow, J., Marshall, W.S. and Khvorova, A. (2006) 3' UTR seed matches, but not overall identity, are associated with RNAi off-targets, Nat Methods, 3, 199-204.
Dahlgren, C., Zhang, H.Y., Du, Q., Grahn, M., Norstedt, G., Wahlestedt, C. and Liang, Z. (2008) Analysis of siRNA specificity on targets with double-nucleotide mismatches, Nucleic Acids Res.
Fedorov, Y., Anderson, E.M., Birmingham, A., Reynolds, A., Karpilow, J., Robinson, K., Leake, D., Marshall, W.S. and Khvorova, A. (2006) Off-target effects by siRNA can induce toxic phenotype, RNA, 12, 1188-1196.
Holen, T., Moe, S.E., Sorbo, J.G., Meza, T.J., Ottersen, O.P. and Klungland, A. (2005) Tolerated wobble mutations in siRNAs decrease specificity, but can enhance activity in vivo, Nucleic Acids Res, 33, 4704-4710.
Hornung, V., Guenthner-Biller, M., Bourquin, C., Ablasser, A., Schlee, M., Uematsu, S., Noronha, A., Manoharan, M., Akira, S., de Fougerolles, A., Endres, S. and Hartmann, G. (2005) Sequence-specific potent induction of IFN-alpha by short interfering RNA in plasmacytoid dendritic cells through TLR7, Nat Med, 11, 263-270.
Jackson, A.L., Bartz, S.R., Schelter, J., Kobayashi, S.V., Burchard, J., Mao, M., Li, B., Cavet, G. and Linsley, P.S. (2003) Expression profiling reveals off-target gene regulation by RNAi, Nat Biotechnol, 21, 635-637.
Jackson, A.L., Burchard, J., Schelter, J., Chau, B.N., Cleary, M., Lim, L. and Linsley, P.S. (2006) Widespread siRNA "off-target" transcript silencing mediated by seed region sequence complementarity, RNA, 12, 1179-1187.
Judge, A.D., Sood, V., Shaw, J.R., Fang, D., McClintock, K. and MacLachlan, I. (2005) Sequence-dependent stimulation of the mammalian innate immune response by synthetic siRNA, Nat Biotechnol, 23, 457-462.
Lin, X., Ruan, X., Anderson, M.G., McDowell, J.A., Kroeger, P.E., Fesik, S.W. and Shen, Y. (2005) siRNA-mediated off-target gene silencing triggered by a 7 nt complementation, Nucleic Acids Res, 33, 4527-4535.