Research on Defect-Oriented Reliability Management Standards for Software Systems
Gang Kou, School of Management and Economics, University of Electronic Science and Technology of China
Software Risk Assessment and Management Based on Data Mining and Multi-Criteria Decision Making (MCDM)

Risk assessment (Kaplan and Garrick, 1981; Haimes, 1991):
1. What can go wrong?
2. What is the likelihood that it could go wrong?
3. What are the consequences?
4. What is the time domain?

Risk management (Haimes, 1991):
1. What can be done and what options are available?
2. What are the associated trade-offs in terms of all costs, benefits, and risks?
3. What are the impacts of current management decisions on future options?

Risk communication connects the two: data mining supports risk assessment, and MCDM supports risk management (Yacov Haimes, 2009).

No Free Lunch (NFL) theorem: "if algorithm A outperforms algorithm B on some cost functions, then loosely speaking there must exist exactly as many other functions where B outperforms A" (Wolpert and Macready, 1995). In other words, no single classifier can achieve the best performance on all measures.

Approach 1 overview
Aim: design a performance metric that combines various
measures to evaluate the quality of classifiers for software defect prediction.
Data: 11 datasets from the NASA MDP repository.
Tool: WEKA.
Techniques: statistical testing.

Approach 1 classifiers
Trees: classification and regression tree (CART), Naive Bayes tree, and C4.5.
Functions: linear logistic regression, radial basis function (RBF) network, sequential minimal optimization (SMO), support vector machine (SVM), and neural networks.
Bayesian classifiers: Bayesian network and Naive Bayes.
Lazy classifiers: K-nearest-neighbor.
Rules: decision table and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) rule induction.

Approach 1 Step 1
For a specific dataset i (i = 1, 2, ..., 11) and a specific performance measure j (j = 1, 2, ..., 13), run a t-test for every pair of classifiers C_k (k = 1, 2, ..., 13), with statistical significance set at 0.05:

Measure j   C_1   C_2   ...   C_13   SUM   Ranking
C_1          -     0            1
C_2          0     -            0
...
C_13         0     0            -

A cell is set to 1 if the row classifier performs significantly better than the column classifier; for example, the cell (C_1, C_2) is 1 if C_1 performs better at measure j
than C_2. The classifiers are then ranked by their row sums, and the top three classifiers are assigned scores of 3, 2, and 1, respectively.

Approach 1 Step 2
For a specific dataset i:

Dataset i    Ranking(C_1)   Ranking(C_2)   ...   Ranking(C_13)
Measure 1
Measure 2
...
Measure 13
Sum_rank

The larger the "Sum_rank", the better the classifier. The value of "Sum_rank" is normalized.

Approach 1 Step 3
Across all datasets:

Dataset       Sum_rank(C_1)   Sum_rank(C_2)   ...   Sum_rank(C_13)
Dataset 1
Dataset 2
...
Dataset 11
Score (sum)

The larger the score, the better the classifier.

Approach 1 results

Classifier                      CM1       JM1       KC1       KC3       KC4       MC1       MW1       PC1       PC2       PC3       PC4       Score
bayes.BayesNet                  0.032258  0.066667  0         0.109091  0.107692  0.104167  0.129032  0.111111  0.108108  0.140625  0.016393  0.925145
bayes.NaiveBayes                0         0.05      0         0         0.092308  0.0625    0.145161  0         0.047297  0.0625    0         0.459766
functions.LibSVM                0.290323  0.35      0.355932  0.490909  0.092308  0.125     0.241935  0.222222  0.148649  0.296875  0.344262  2.958415
functions.Logistic              0         0.05      0         0         0         0.020833  0.112903  0.047619  0.054054  0.046875  0.147541  0.479826
functions.MultilayerPerceptron  0.048387  0         0.050847  0.036364  0         0         0.064516  0.142857  0.101351  0.03125   0.04918   0.524753
functions.RBFNetwork            0.032258  0         0         0         0         0.0625    0.032258  0.095238  0.054054  0.03125   0         0.307558
functions.SMO                   0.064516  0.1       0.050847  0.072727  0         0.0625    0.096774  0.095238  0.087838  0.09375   0.098361  0.822552
lazy.IBk                        0.209677  0.25      0.305085  0.2       0.138462  0.09375   0.048387  0.126984  0.027027  0.09375   0.081967  1.575089
rules.DecisionTable             0.096774  0.016667  0.033898  0.036364  0.030769  0.072917  0.048387  0.015873  0.074324  0.0625    0         0.488473
rules.JRip                      0.209677  0         0         0         0         0         0         0         0.060811  0         0         0.270488
trees.J48                       0.016129  0.05      0.135593  0.018182  0.276923  0.09375   0.080645  0.111111  0.074324  0.09375   0.213115  1.163522
trees.NBTree                    0         0.066667  0         0.036364  0.092308  0.177083  0         0.031746  0.087838  0.03125   0.04918   0.572436
trees.SimpleCart                0         0         0.067797  0         0.169231  0.125     0         0         0.074324  0.015625  0         0.451977