
A Appendix

A.1 Hypothesis Testing

Let us assume that we would like to estimate the mean μ of a normal density from a sample [314]

X = \{x^t\}, \quad \text{where } x^t \sim N(\mu, \sigma^2) \qquad (A.1)

and

m = \frac{\sum_t x^t}{N} \qquad (A.2)

is an average of normals and so is itself normal, m ∼ N(μ, σ²/N). Let us define a statistic with the unit normal distribution:

\frac{\sqrt{N}(m - \mu)}{\sigma} \sim Z \qquad (A.3)

We know that 95% of Z lies in (−1.96, 1.96), i.e., P(−1.96 < Z < 1.96) = 0.95, which is depicted in Fig. A.1. Thus

P\left( m - 1.96\frac{\sigma}{\sqrt{N}} < \mu < m + 1.96\frac{\sigma}{\sqrt{N}} \right) = 0.95 \qquad (A.4)

This can be generalized to any required confidence level:

P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha \quad \text{for } 0 < \alpha < 1. \qquad (A.5)

Then, instead of α = 0.05, we can use any required value and find the corresponding quantile in the table of the standardized normal distribution. Similarly, we know that

P(Z < z_\alpha) = 1 - \alpha \quad \text{for } 0 < \alpha < 1, \qquad (A.6)


Fig. A.1 Two-sided confidence interval with 95% confidence

i.e.,

P\left( m - z_\alpha \frac{\sigma}{\sqrt{N}} < \mu \right) = 1 - \alpha \qquad (A.7)

For α = 0.05, z_α = 1.64, which is depicted in Fig. A.2.

Fig. A.2 One-sided upper confidence interval with 95% confidence
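The intervals (A.4)-(A.7) can be computed numerically, for example as in the following minimal sketch (not part of the original text); the function name z_intervals and the simulated sample are purely illustrative, and NumPy/SciPy are assumed to be available.

```python
# Illustrative sketch: two-sided and one-sided z-based confidence
# intervals for the mean, cf. Eqs. (A.4)-(A.7), assuming sigma is known.
import numpy as np
from scipy import stats

def z_intervals(x, sigma, alpha=0.05):
    """Return the two-sided interval (A.5) and the one-sided lower bound (A.7)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    m = x.mean()                              # sample mean m
    z_half = stats.norm.ppf(1 - alpha / 2)    # z_{alpha/2}, e.g. 1.96 for alpha = 0.05
    z_one = stats.norm.ppf(1 - alpha)         # z_alpha,    e.g. 1.64 for alpha = 0.05
    two_sided = (m - z_half * sigma / np.sqrt(N),
                 m + z_half * sigma / np.sqrt(N))
    lower_bound = m - z_one * sigma / np.sqrt(N)   # P(mu > lower_bound) = 1 - alpha
    return two_sided, lower_bound

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=100)   # x^t ~ N(5, 2^2), illustrative data
print(z_intervals(sample, sigma=2.0, alpha=0.05))
```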

When σ² is not known, we can estimate the variance as

S^2 = \frac{\sum_t (x^t - m)^2}{N - 1} \qquad (A.8)

and we can use the Student's t-distribution t_{N-1} with N − 1 degrees of freedom:

\frac{\sqrt{N}(m - \mu)}{S} \sim t_{N-1} \qquad (A.9)


and similarly, for a given significance level α, we get

P\left( m - t_{\alpha/2, N-1}\frac{S}{\sqrt{N}} < \mu < m + t_{\alpha/2, N-1}\frac{S}{\sqrt{N}} \right) = 1 - \alpha \qquad (A.10)

and for the one-sided test

P\left( \mu < m + t_{\alpha, N-1}\frac{S}{\sqrt{N}} \right) = 1 - \alpha \qquad (A.11)
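A companion sketch (again, not from the book) for the unknown-variance case uses the estimator (A.8) and the t_{N-1} quantiles of (A.10)-(A.11); the name t_intervals is illustrative.

```python
# Illustrative sketch: confidence intervals for the mean when sigma^2 is
# unknown, using S^2 from Eq. (A.8) and the t-distribution with N-1
# degrees of freedom, cf. Eqs. (A.9)-(A.11).
import numpy as np
from scipy import stats

def t_intervals(x, alpha=0.05):
    x = np.asarray(x, dtype=float)
    N = x.size
    m = x.mean()
    S = x.std(ddof=1)                               # sqrt of sum (x^t - m)^2 / (N - 1)
    t_half = stats.t.ppf(1 - alpha / 2, df=N - 1)   # t_{alpha/2, N-1}
    t_one = stats.t.ppf(1 - alpha, df=N - 1)        # t_{alpha, N-1}
    two_sided = (m - t_half * S / np.sqrt(N),
                 m + t_half * S / np.sqrt(N))       # Eq. (A.10)
    upper_bound = m + t_one * S / np.sqrt(N)        # Eq. (A.11)
    return two_sided, upper_bound
```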

The t-distribution has heavier tails than the unit normal distribution; therefore, the interval given by the t-distribution is wider. In hypothesis testing we define a statistic that is consistent with the considered distribution under the null hypothesis. If the statistic calculated from the sample has a low probability of being drawn from that distribution, then we reject the hypothesis; otherwise we accept it.

Let X = {x^t}, where x^t ∼ N(μ, σ²). Then
H0 : μ = μ0 (null hypothesis),
H1 : μ ≠ μ0 (alternative hypothesis).

We can use the two-sided test, i.e., we accept H0 at the significance level α if

\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-z_{\alpha/2}, z_{\alpha/2}).

If we would like to test the following hypotheses:
H0 : μ ≤ μ0 (null hypothesis),
H1 : μ > μ0 (alternative hypothesis),

then we have to use the one-sided test, i.e., we accept H0 at the significance level α if

\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-\infty, z_\alpha).

If the variance is unknown, then we simply use the t-distribution and replace σ by its estimator S from (A.8).
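The two decision rules above can be sketched as follows (illustrative code, not the book's; σ is assumed known here, and with σ unknown one would substitute S and the t_{N-1} quantiles). Both functions return True when H0 is accepted.

```python
# Illustrative sketch of the two-sided and one-sided z-tests described above.
import numpy as np
from scipy import stats

def two_sided_z_test(x, mu0, sigma, alpha=0.05):
    """Accept H0: mu = mu0 if sqrt(N)(m - mu0)/sigma lies in (-z_{alpha/2}, z_{alpha/2})."""
    x = np.asarray(x, dtype=float)
    z = np.sqrt(x.size) * (x.mean() - mu0) / sigma
    return abs(z) < stats.norm.ppf(1 - alpha / 2)   # True -> accept H0

def one_sided_z_test(x, mu0, sigma, alpha=0.05):
    """Accept H0: mu <= mu0 against H1: mu > mu0 if the statistic lies in (-inf, z_alpha)."""
    x = np.asarray(x, dtype=float)
    z = np.sqrt(x.size) * (x.mean() - mu0) / sigma
    return z < stats.norm.ppf(1 - alpha)
```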

The test can reject a true hypothesis (type I error) or accept a false one (type II error). The probability of a type I error equals the significance level α and is called the size of the test. The probability of a type II error is related to the power of the test, i.e., the power increases as the type II error decreases (type II error = 1 − power).


A.2 Dataset Description

Table A.1 presents the main characteristics of the datasets used in the experiments presented in this book.

Table A.1 Details of the datasets used in the experiments. Numbers in parentheses indicate the number of objects in the minority class in the case of binary problems.

No. Name Objects Features Classes Description

1 Abalone 4177 8 28 The dataset concerns predicting the age of abalone from physical measurements [103].

2 Adult 48842 14 2 The dataset contains anonymized personal census data used to predict whether the income of a person will exceed 50K/y. Also known as the Census Income dataset [103].

3 Arcene 90 10000 2 The dataset contains mass-spectrometric data with cancerous and normal patterns [103].

4 Audiology 226 69 24 The dataset contains diagnostic cases related to clinical audiology [19].

5 Balance Scale 625 4 3 The dataset includes model psychological experimental results [103].

6 Breast Cancer 286 (85) 9 2 The dataset contains diagnostic cases related to the breast cancer domain. It was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia [103].


7 Breast-Wisconsin 699 (241) 9 2 The dataset includes features computed from a digitized image of a fine needle aspirate of a breast mass, which describe characteristics of the cell nuclei present in the image [103].

8 Colic (Horse colic) 368 (191) 22 2 The dataset contains observations about horse colic disease (the surgical lesion was used as the class label) [103].

9 Cone Torus 800 (5) 2 3 The synthetic three-class dataset consists of two-dimensional points generated from the following distributions: a cone, half a torus, and a normal distribution. The prior probabilities are 0.25, 0.25, and 0.50, respectively [212].

10 Credit-rating 690 14 6 The dataset, known as Statlog (Australian Credit Approval), contains credit card applications [103].

11 CYP2C19 isoform 837 (181) 242 2 The dataset is related to the ability of drug-like chemicals to inhibit the drug-metabolizing CYP2C19 isoform. The dataset was supplied as a part of a data mining challenge by Simulations Plus Inc. [1], and has been considered in several articles, e.g., [199].


12 Dermatology 366 33 6 The dataset contains data about the differential diagnosis of erythemato-squamous diseases [103].

13 Diabetes 768 (268) 8 2 The dataset includes data from diabetes patient records obtained from two sources: an automatic electronic recording device and paper records [103].

14 Ecoli 336 8 8 The dataset is devoted to the problem of classifying proteins into their various cellular localization sites based on their amino-acid sequences [149].

15 (MAGIC) Gamma Telescope 19020 11 2 The dataset was generated by a Monte Carlo program, Corsika [142], to simulate the registration of high-energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique.

16 Glass 214 9 6 The dataset consists of examples related to 6 types of glass, which are defined in terms of their oxide content (i.e., Na, Fe, K, etc.) [103].

17 Heart-c 303 13 5 The dataset concerns cases related to heart disease diagnosis from the Cleveland Clinic Foundation [103].

18 Heart-h 294 13 5 The dataset concerns cases related to heart disease diagnosis from the Hungarian Institute of Cardiology, Budapest [103].


19 Heart-statlog 270 (120) 13 2 The dataset is a heart disease database similar to heart-c and heart-h, but presented in a slightly different form [103].

20 Hepatitis 155 (32) 19 2 The dataset concerns the problem of hepatitis diagnosis. It also includes information about data acquisition cost [103].

21 Internet Advertisements 3279 1558 2 The dataset includes possible advertisements on Internet pages [103].

22 Ionosphere 351 (124) 34 2 The dataset consists of radar data collected by a system in Goose Bay, Labrador [103].

23 Iris 150 4 3 It is one of the most popular benchmark datasets, devoted to the recognition of the type of iris plant [103].

24 LED 1000 7 10 The dataset consists of digit representations on a 7-segment LED display. The problem is complicated by adding noise, which means that each segment could be inverted with a 10% probability [38].


25 Letter Recognition 20000 16 26 The dataset contains features of handwritten letters from the 26 capital letters of the English alphabet [103].

26 Liver 345 6 2 The dataset contains features of liver disorders (5 attributes are related to blood tests and one to alcohol consumption) [103].

27 Lymphography 148 18 4 The dataset is related to lymphography tests, i.e., the radiography of the lymphatic channels and lymph nodes after injection of a radiopaque material. The dataset was collected by the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia [103].

28 Mammographic Mass 961 6 2 The dataset contains a BI-RADS assessment, the patient's age and three BI-RADS attributes for 516 benign and 445 malignant masses that have been identified on full-field digital mammograms collected at the Institute of Radiology of the University Erlangen-Nuremberg between 2003 and 2006 [95].

29 Musk (version 2) 6598 168 2 The dataset describes a set of molecules which are judged by human experts to be musks or non-musks [103].

30 Ozone Level Detection 2536 73 2 The dataset consists of two ground ozone level data sets collected from 1998 to 2004 in the Houston, Galveston and Brazoria area. One dataset is the eight-hour peak set, the other is the one-hour peak set [103].


31 Parkinsons 197 23 2 The dataset consists of biomedical voice measurements from patients with Parkinson's disease [234].

32 Pima Indian Diabetes 768 8 2 The dataset includes medical records of females at least 21 years old of Pima Indian heritage. The aim is to recognize whether a patient shows signs of diabetes according to World Health Organization criteria. Additionally, information about data acquisition cost is included [103].

33 Primary Tumor 339 17 21 The dataset contains diagnostic cases related to oncology. It was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia [103].

34 Promoter Gene Sequences 106 58 2 The dataset concerns E. coli promoter gene sequences (DNA) [103].

35 SEA depends on experiment 3 2 The synthetic dataset can include concept drift. Each object belongs to one of two classes and is described by 3 numeric attributes with values between 0 and 10, but only two of them are relevant [348].

36 Sonar 208 (97) 60 2 The dataset contains patterns obtained by bouncing sonar signals off a metal cylinder or rocks at various angles and under various conditions [103].


37 SPECTF Heart 267 44 2 The dataset includes data on cardiac Single Photon Emission Computed Tomography (SPECT) images [103].

38 Splice-junction Gene Sequences 3190 61 3 The dataset contains primate splice-junction gene sequences (DNA) with an associated imperfect domain theory, used for the recognition of exon/intron and intron/exon boundaries [103].

39 Thyroid 9172 21 3 The dataset contains features of thyroid disorders collected at the Garavan Institute, Sydney, Australia [103].

40 Waveform 5000 40 3 The dataset concerns the problem of recognizing 3 classes of waves introduced by Breiman et al. [38].

41 Voting records 435 (168) 16 2 The dataset includes the votes of each of the U.S. House of Representatives Congressmen on the 16 key votes in 1984 [103].

42 Wine 178 13 3 The dataset includes results of a chemical analysis to determine the origin of wines [103].

43 Vehicle Silhouettes 946 18 4 The database includes data about classifying a given silhouette as one of four types of vehicle, collected by the Turing Institute, Glasgow, Scotland. The features were extracted from the silhouettes by HIPS (Hierarchical Image Processing System), which extracts a combination of scale-independent features, applying both classical moment-based measures, such as scaled variance, skewness and kurtosis about the major/minor axes, and heuristic measures, such as hollows, circularity, rectangularity and compactness [103].

References

1. Simulations Plus Inc., http://www.simulations-plus.com (accessed: July 1, 2013)
2. Agrawal, R., Srikant, R.: Privacy-preserving data mining. SIGMOD Rec. 29(2), 439–450 (2000)
3. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991)
4. Aha, D.W.: Lazy Learning. Kluwer Academic Publishers (1997)
5. Ahmadzadeh, M.R., Petrou, M.: An expert system with uncertain rules based on Dempster-Shafer theory. In: IEEE 2001 International Geoscience and Remote Sensing Symposium, IGARSS 2001, vol. 2, pp. 861–863 (2001)
6. Alexandre, L.A., Campilho, A.C., Kamel, M.S.: Combining independent and unbiased classifiers using weighted average. In: ICPR, pp. 2495–2498 (2000)
7. Allwein, E.L., Schapire, R.E., Singer, Y.: Reducing multiclass to binary: a unifying approach for margin classifiers. J. Mach. Learn. Res. 1, 113–141 (2001)
8. Alpaydin, E.: Combined 5 x 2 cv F test for comparing supervised classification learning algorithms. Neural Computation 11(8), 1885–1892 (1999)
9. Alpaydin, E.: Introduction to Machine Learning, 2nd edn. The MIT Press (2010)
10. An, A., Cercone, N.: An empirical study on rule quality measures. In: Proceedings of the 7th International Workshop on New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, RSFDGrC 1999, pp. 482–491. Springer, London (1999)
11. Angiulli, F., Folino, G.: Distributed nearest neighbor-based condensation of very large data sets. IEEE Trans. Knowl. Data Eng. 19(12), 1593–1606 (2007)
12. Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1988)
13. Ashlock, D.: Evolutionary Computation for Modeling and Optimization, 1st edn. Springer, New York (2006)
14. Back, T., Fogel, D., Michalewicz, Z.: Handbook of Evolutionary Computation. Oxford Univ. Press (1997)
15. Baena-García, M., del Campo-Avila, J., Fidalgo, R., Bifet, A., Gavalda, R., Morales-Bueno, R.: Early drift detection method. In: Fourth International Workshop on Knowledge Discovery from Data Streams (2006)


16. Bakker, B., Heskes, T.: Clustering ensembles of neural network models. Neural Netw. 16(2), 261–269 (2003)
17. Banfield, R.E., Hall, L.O., Bowyer, K.W., Kegelmeyer, W.P.: Ensemble diversity measures and their application to thinning. Information Fusion 6(1), 49–62 (2005)
18. Baram, Y.: Partial classification: the benefit of deferred decision. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 769–776 (1998)
19. Bareiss, R., Porter, B.W., Wier, C.C.: Protos: An exemplar-based learning apprentice. International Journal of Man-Machine Studies 29(5), 549–561 (1988)
20. Bartlett, P.L., Wegkamp, M.H.: Classification with a reject option using a hinge loss. J. Mach. Learn. Res. 9, 1823–1840 (2008)
21. Baruque, B., Corchado, E.: Fusion Methods for Unsupervised Learning Ensembles. Springer-Verlag New York, Inc. (2011)
22. Baruque, B., Porras, S., Corchado, E.: Hybrid classification ensemble using topology-preserving clustering. New Generation Computing 29, 329–344 (2011)
23. Bay, S.D.: Nearest neighbor classification from multiple feature subsets. Intelligent Data Analysis 3(3), 191–209 (1999)
24. Ben-Haim, Y., Tom-Tov, E.: A streaming parallel decision tree algorithm. J. Mach. Learn. Res. 11, 849–872 (2010)
25. Bergadano, F., Matwin, S., Michalski, R.S., Zhang, J.: Measuring quality of concept descriptions. In: EWSL, pp. 1–14 (1988)
26. Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: Proceedings of the 23rd International Conference on Machine Learning, ICML 2006, pp. 97–104. ACM, New York (2006)
27. Bi, Y.: The impact of diversity on the accuracy of evidential classifier ensembles. International Journal of Approximate Reasoning 53(4), 584–607 (2012)
28. Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: Proceedings of the Seventh SIAM International Conference on Data Mining, April 26-28. SIAM, Minneapolis (2007)
29. Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavalda, R.: New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2009, pp. 139–148. ACM, New York (2009)
30. Bifet, A., Holmes, G., Pfahringer, B., Read, J., Kranen, P., Kremer, H., Jansen, T., Seidl, T.: MOA: A real-time analytics open source framework. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011, Part III. LNCS, vol. 6913, pp. 617–620. Springer, Heidelberg (2011)
31. Biggio, B., Fumera, G., Roli, F.: Bayesian analysis of linear combiners. In: Haindl, M., Kittler, J., Roli, F. (eds.) MCS 2007. LNCS, vol. 4472, pp. 292–301. Springer, Heidelberg (2007)
32. Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus (2006)
33. Błaszczynski, J., Deckert, M., Stefanowski, J., Wilk, S.: Integrating selective pre-processing of imbalanced data with Ivotes ensemble. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 148–157. Springer, Heidelberg (2010)


34. Blum, A., Langley, P.: Selection of relevant features and examples in machine learning. Artificial Intelligence 97(1-2), 245–271 (1997)
35. Bolton, R.J., David, J.H.: Statistical fraud detection: A review. Statistical Science 17 (2002)
36. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT 1992, pp. 144–152. ACM, New York (1992)
37. Brandt, S.: Data Analysis: Statistical and Computational Methods for Scientists and Engineers, 3rd edn. Springer, Heidelberg (1999)
38. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth and Brooks, Monterey (1984)
39. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
40. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
41. Brodley, C.E., Utgoff, P.E.: Multivariate decision trees. Mach. Learn. 19(1), 45–77 (1995)
42. Brown, G., Kuncheva, L.I.: “Good” and “Bad” diversity in majority vote ensembles. In: El Gayar, N., Kittler, J., Roli, F. (eds.) MCS 2010. LNCS, vol. 5997, pp. 124–133. Springer, Heidelberg (2010)
43. Brown, G., Wyatt, J.L., Harris, R., Yao, X.: Diversity creation methods: a survey and categorisation. Information Fusion 6(1), 5–20 (2005)
44. Brown, G., Wyatt, J.L., Tino, P.: Managing diversity in regression ensembles. J. Mach. Learn. Res. 6, 1621–1650 (2005)
45. Bruha, I., Kockova, S.: Quality of decision rules: Empirical and statistical approaches. Informatica (Slovenia) 17(3) (1993)
46. Bryll, R.K., Gutierrez-Osuna, R., Quek, F.K.H.: Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets. Pattern Recognition 36(6), 1291–1302 (2003)
47. Buchanan, B.G., Shortliffe, E.H.: Rule Based Expert Systems: The Mycin Experiments of the Stanford Heuristic Programming Project (The Addison-Wesley series in artificial intelligence). Addison-Wesley Longman Publishing Co., Inc., Boston (1984)
48. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2(2), 121–167 (1998)
49. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, 321–357 (2002)
50. Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W.: SMOTEBoost: Improving prediction of the minority class in boosting. In: Lavrac, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 107–119. Springer, Heidelberg (2003)
51. Cheeseman, P., Self, M., Kelly, J., Stutz, J., Taylor, W., Freeman, D.: AutoClass: a Bayesian classification system. In: Machine Learning: Proceedings of the Fifth International Workshop. Morgan Kaufmann (1988)
52. Chen, B., Feng, A., Chen, S., Li, B.: One-cluster clustering based data description. Jisuanji Xuebao/Chinese Journal of Computers 30(8), 1325–1332 (2007)
53. Chen, S., He, H., Garcia, E.A.: RAMOBoost: Ranked minority oversampling in boosting. IEEE Transactions on Neural Networks 21(10), 1624–1642 (2010)


54. Chen, X., Wasikowski, M.: FAST: A ROC-based feature selection metric for small samples and imbalanced data classification problems. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 124–132 (2008)
55. Chen, Y., Zhou, X.S., Huang, T.S.: One-class SVM for learning in image retrieval. In: IEEE International Conference on Image Processing, vol. 1, pp. 34–37 (2001)
56. Chow, C.K.: Statistical independence and threshold functions. IEEE Transactions on Electronic Computers 14(1), 66–68 (1965)
57. Chu, F., Zaniolo, C.: Fast and light boosting for adaptive mining of data streams. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 282–292. Springer, Heidelberg (2004)
58. Clark, P., Boswell, R.: Rule induction with CN2: Some recent improvements. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS, vol. 482, pp. 151–163. Springer, Heidelberg (1991)
59. Clark, P., Niblett, T.: The CN2 induction algorithm. Mach. Learn. 3(4), 261–283 (1989)
60. Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., Zhu, M.Y.: Tools for privacy preserving data mining. SIGKDD Explorations 4(2), 28–34 (2002)
61. Cohen, G., Sax, H., Geissbuhler, A.: Novelty detection using one-class Parzen density estimator. An application to surveillance of nosocomial infections. In: Studies in Health Technology and Informatics, vol. 136, pp. 21–26 (2008)
62. Cordella, L.P., Foggia, P., Sansone, C., Tortorella, F., Vento, M.: A cascaded multiple expert system for verification. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 330–339. Springer, Heidelberg (2000)
63. Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20(3), 273–297 (1995)
64. Cost, S., Salzberg, S.: A weighted nearest neighbor algorithm for learning with symbolic features. Mach. Learn. 10, 57–78 (1993)
65. Cover, T.M., Hart, P.: Nearest neighbor pattern classification 13, 21–27 (1967)
66. Cover, T.M.: The best two independent measurements are not the two best. IEEE Trans. Syst. Man, Cybern. SMC-4, 116–117 (1974)
67. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2002)
68. Cunningham, P., Carney, J.: Diversity versus quality in classification ensembles based on feature selection. In: Lopez de Mantaras, R., Plaza, E. (eds.) ECML 2000. LNCS (LNAI), vol. 1810, pp. 109–116. Springer, Heidelberg (2000)
69. Cyganek, B.: Image segmentation with a hybrid ensemble of one-class support vector machines. In: Grana Romay, M., Corchado, E., Garcia Sebastian, M.T. (eds.) HAIS 2010, Part I. LNCS, vol. 6076, pp. 254–261. Springer, Heidelberg (2010)
70. Czogala, E., Leski, J.: Application of entropy and energy measures of fuzziness to processing of ECG signal. Fuzzy Sets and Systems 97(1), 9–18 (1998)
71. Dai, Q.: A competitive ensemble pruning approach based on cross-validation technique. Knowledge-Based Systems (2012)
72. Dasarathy, B.V.: Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press tutorial. IEEE Computer Society Press (1991)


73. Dasarathy, B.V., Sheela, B.V.: A composite classifier system design: Concepts and methodology. Proceedings of the IEEE 67(5), 708–713 (1979)
74. Dash, M., Liu, H.: Feature selection for classification. Intelligent Data Analysis 1, 131–156 (1997)
75. de Souto, M., Soares, R., Santana, A., Canuto, A.: Empirical comparison of dynamic classifier selection methods based on diversity and accuracy for building ensembles. In: IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), IJCNN 2008, pp. 1480–1487 (June 2008)
76. Dean, P., Famili, A.: Comparative performance of rule quality measures in an induction system. Applied Intelligence 7(2), 113–124 (1997)
77. Dekel, O., Shamir, O.: Good learners for evil teachers. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, pp. 233–240. ACM, New York (2009)
78. Dekel, O., Shamir, O., Xiao, L.: Learning to classify with missing and corrupted features. Mach. Learn. 81(2), 149–178 (2010)
79. Demsar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
80. Devijver, P.A., Kittler, J.: Pattern recognition: a statistical approach. Prentice/Hall International (1982)
81. Didaci, L., Giacinto, G., Roli, F., Marcialis, G.L.: A study on the performances of dynamic classifier selection based on local accuracy estimation. Pattern Recognition 38(11), 2188–2191 (2005)
82. Dietterich, T.G.: Approximate statistical test for comparing supervised classification learning algorithms. Neural Computation 10(7), 1895–1923 (1998)
83. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000)
84. Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes. J. Artif. Int. Res. 2, 263–286 (1995)
85. Domingos, P., Hulten, G.: A general framework for mining massive data streams. Journal of Computational and Graphical Statistics 12, 945–949 (2003)
86. Domingos, P.: The role of Occam's razor in knowledge discovery. Data Min. Knowl. Discov. 3(4), 409–425 (1999)
87. Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)
88. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2000, pp. 71–80. ACM, New York (2000)
89. Drucker, P.F.: The age of discontinuity; guidelines to our changing society, 1st edn. Harper and Row, New York (1969)
90. Du, W., Zhan, Z.: Building decision tree classifier on private data. In: Proceedings of the IEEE International Conference on Privacy, Security and Data Mining. CRPIT, vol. 14, pp. 1–8. Australian Computer Society, Inc., Australia (2002)
91. Duan, K., Sathiya Keerthi, S., Chu, W., Shevade, S.K., Poo, A.N.: Multi-category classification by soft-max combination of binary classifiers. In: Windeatt, T., Roli, F. (eds.) MCS 2003. LNCS, vol. 2709, pp. 125–134. Springer, Heidelberg (2003)


92. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001)
93. Duin, R.P.W.: The combining classifier: to train or not to train? In: Proceedings of 16th International Conference on Pattern Recognition, vol. 2, pp. 765–770 (2002)
94. Ehrgott, M.: Multicriteria Optimization, 2nd edn. Springer (2005)
95. Elter, M., Schulz-Wendtland, R., Wittenberg, T.: The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Med. Phys. 34(11), 4164–4172 (2007)
96. Fancourt, C.L., Principe, J.C.: Modeling time dependencies in the mixture of experts. In: 1998 IEEE International Joint Conference on Neural Networks Proceedings, IEEE World Congress on Computational Intelligence, vol. 3, pp. 2324–2327 (1998)
97. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
98. FCIP. On the Quasi-Minimal Solution of the General Covering Problem (1969)
99. Fern, A., Givan, R.: Online ensemble learning: An empirical study. Mach. Learn. 53(1-2), 71–109 (2003)
100. Flach, P., Lavrac, N.: Rule induction. In: Berthold, M., Hand, D.J. (eds.) Intelligent Data Analysis, pp. 229–267. Springer-Verlag New York, Inc., New York (2003)
101. Fleiss, J.L., Cuzick, J.: The reliability of dichotomous judgments: unequal numbers of judgments per subject. Applied Psychological Measurement 4(3), 537–542 (1979)
102. Fong, P.K., Weber-Jahnke, J.H.: Privacy preserving decision tree learning using unrealized data sets. IEEE Trans. on Knowl. and Data Eng. 24(2), 353–364 (2012)
103. Frank, A., Asuncion, A.: UCI machine learning repository (2010), http://archive.ics.uci.edu/ml
104. Freitas, A.A., Lavington, S.H.: Mining Very Large Databases with Parallel Processing, 1st edn. Kluwer Academic Publishers, Norwell (1997)
105. Freund, Y.: Boosting a weak learning algorithm by majority. Inf. Comput. 121(2), 256–285 (1995)
106. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. In: Vitanyi, P.M.B. (ed.) EuroCOLT 1995. LNCS, vol. 904, pp. 23–37. Springer, Heidelberg (1995)
107. Friedman, J.H.: Another approach to polychotomous classification. Technical report, Department of Statistics, Stanford University (1996)
108. Friedman, J.H., Roosen, C.B.: An introduction to multivariate adaptive regression splines. Stat. Methods Med. Res. 4(3), 197–217 (1995)
109. Fumera, G., Roli, F.: A theoretical and experimental analysis of linear combiners for multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(6), 942–956 (2005)
110. Fumera, G., Pillai, I., Roli, F.: A two-stage classifier with reject option for text categorisation. In: Fred, A., Caelli, T.M., Duin, R.P.W., Campilho, A.C., de Ridder, D. (eds.) SSPR&SPR 2004. LNCS, vol. 3138, pp. 771–779. Springer, Heidelberg (2004)
111. Gaber, M.M., Yu, P.S.: Classification of changes in evolving data streams using online clustering result deviation. In: Proc. of International Workshop on Knowledge Discovery in Data Streams (2006)


112. Gabrys, B., Ruta, D.: Genetic algorithms in classifier fusion. Appl. Soft Comput. 6(4), 337–347 (2006)
113. Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F.: An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes. Pattern Recognition 44(8), 1761–1776 (2011)
114. Gantz, J., Reinsel, D.: As the Economy Contracts, the Digital Universe Expands. IDC Multimedia Whitepaper (2009)
115. Garcia, V., Mollineda, R.A., Sanchez, J.S.: On the k-NN performance in a challenging scenario of imbalance and overlapping. Pattern Analysis and Applications 11(3-4), 269–280 (2008)
116. Gardner, A.B., Krieger, A.M., Vachtsevanos, G., Litt, B.: One-class novelty detection for seizure analysis from intracranial EEG. Journal of Machine Learning Research 7, 1025–1044 (2006)
117. Gavin, G., Velcin, J., Aubertin, P.: Privacy preserving aggregation of secret classifiers. Trans. Data Privacy 4(3), 167–187 (2011)
118. Geman, S., Bienenstock, E., Doursat, R.: Neural networks and the bias/variance dilemma. Neural Comput. 4(1), 1–58 (1992)
119. Giacinto, G., Roli, F., Fumera, G.: Design of effective multiple classifier systems by clustering of classifiers. In: Proceedings of the 15th International Conference on Pattern Recognition, vol. 2, pp. 160–163 (2000)
120. Giacinto, G., Perdisci, R., Del Rio, M., Roli, F.: Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inf. Fusion 9, 69–82 (2008)
121. Giacinto, G., Roli, F.: Design of effective neural network ensembles for image classification purposes. Image Vision Comput. 19(9-10), 699–707 (2001)
122. Giacinto, G., Roli, F.: Dynamic classifier selection based on multiple classifier behaviour. Pattern Recognition 34(9), 1879–1881 (2001)
123. Giakoumakis, E., Papaconstantinou, G., Skordalakis, E.: Rule-based systems and pattern recognition. Pattern Recogn. Lett. 5(4), 267–272 (1987)
124. Globerson, A., Roweis, S.: Nightmare at test time: robust learning by feature deletion. In: Proceedings of the 23rd International Conference on Machine Learning, ICML 2006, pp. 353–360. ACM, New York (2006)
125. Goebel, K., Yan, W.: Choosing classifiers for decision fusion. In: Proceedings of the Seventh International Conference on Information Fusion, pp. 563–568 (2004)
126. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning, 1st edn. Addison-Wesley Longman Publishing Co., Inc., Boston (1989)
127. Goldstein, M.: kn-nearest neighbor classification. IEEE Trans. Inf. Theor. 18(5), 627–630 (1972)
128. Gong, Y., Hu, G., Pan, Z., Zhou, Y.: Radio transmitter verification based on elastic sparsity preserving projections and SVDD. International Journal of Digital Content Technology and its Applications 5(10), 247–254 (2011)
129. Greiner, R., Grove, A.J., Roth, D.: Learning cost-sensitive active classifiers. Artif. Intell. 139(2), 137–174 (2002)
130. Grzymala-Busse, J.W., Grzymala-Busse, W.J.: Inducing better rule sets by adding missing attribute values. In: Chan, C.-C., Grzymala-Busse, J.W., Ziarko, W.P. (eds.) RSCTC 2008. LNCS (LNAI), vol. 5306, pp. 160–169. Springer, Heidelberg (2008)


131. Gur-Ali, O., Wallace, W.A.: Induction of rules subject to a quality constraint: Probabilistic inductive learning. IEEE Trans. on Knowl. and Data Eng. 5(6), 979–984 (1993)
132. Hacigumus, H., Iyer, B., Li, C., Mehrotra, S.: Executing SQL over encrypted data in the database-service-provider model. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD 2002, pp. 216–227. ACM, New York (2002)
133. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
134. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco (2011)
135. Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(10), 993–1001 (1990)
136. Hart, P.E.: The condensed nearest neighbor rule. IEEE Transactions on Information Theory 14, 515–516 (1968)
137. Hart, P.E., Duda, R.O.: Prospector: A Computer-based Consultation System for Mineral Exploration. Technical note, Artificial Intelligence Center, Stanford Research Institute (1977)
138. Hashem, S.: Optimal linear combinations of neural networks. Neural Netw. 10(4), 599–614 (1997)
139. Hastie, T., Tibshirani, R., Friedman, J.: The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer (2008)
140. Haussler, D.: Probably approximately correct learning. In: Proceedings of the Eighth National Conference on Artificial Intelligence, AAAI 1990, vol. 2, pp. 1101–1108. AAAI Press (1990)
141. He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In: Proceedings of the International Joint Conference on Neural Networks, pp. 1322–1328 (2008)
142. Heck, D., Forschungszentrum (Karlsruhe): Corsika: A Monte Carlo Code to Simulate Extensive Air Showers. FZKA–6019. FZKA (1998)
143. Ho, T.K.: Random decision forests. In: Proceedings of the Third International Conference on Document Analysis and Recognition, ICDAR 1995, vol. 1, p. 278. IEEE Computer Society, Washington, DC (1995)
144. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844 (1998)
145. Ho, T.K.: Complexity of classification problems and comparative advantages of combined classifiers. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 97–106. Springer, Heidelberg (2000)
146. Ho, T.K., Hull, J.J., Srihari, S.N.: Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 16(1), 66–75 (1994)
147. Holte, R.C.: Very simple classification rules perform well on most commonly used datasets. Machine Learning 11, 63–91 (1993)
148. Hong, J.-H., Min, J.-K., Cho, U.-K., Cho, S.-B.: Fingerprint classification using one-vs-all support vector machines dynamically ordered with naive Bayes classifiers. Pattern Recogn. 41, 662–671 (2008)
149. Horton, P., Nakai, K.: A Probabilistic Classification System for Predicting the Cellular Localization Sites of Proteins. In: Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, pp. 109–115. AAAI Press (1996)


150. Hosseini, M.J., Ahmadi, Z., Beigy, H.: Pool and accuracy based stream classification: A new ensemble algorithm on data stream classification using recurring concepts detection. In: Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops, ICDMW 2011, pp. 588–595. IEEE Computer Society, Washington, DC (2011)
151. Hsu, C.-W., Lin, C.-J.: A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13(2), 415–425 (2002)
152. Hu, Y.H.: Handbook of Neural Network Signal Processing, 1st edn. CRC Press, Inc., Boca Raton (2000)
153. Huang, Y.S., Suen, C.Y.: A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(1), 90–94 (1995)
154. Hughes, G.: On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory 14(1), 55–63 (1968)
155. Hullermeier, E., Vanderlooy, S.: Combining predictions in pairwise classification: An optimal adaptive voting strategy and its relation to weighted voting. Pattern Recogn. 43(1), 128–142 (2010)
156. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2001, pp. 97–106. ACM, New York (2001)
157. Huzar, Z., Sas, J., Kurzynski, M.: Rule-based Pattern Recognition with Learning. Wroclaw University of Technology Press, Wroclaw (1994)
158. Inoue, H., Narihisa, H.: Optimizing a multiple classifier system. In: Ishizuka, M., Sattar, A. (eds.) PRICAI 2002. LNCS (LNAI), vol. 2417, pp. 285–294. Springer, Heidelberg (2002)
159. Islam, Z., Brankovic, L.: Privacy preserving data mining: A noise addition framework using a novel clustering technique. Knowl.-Based Syst. 24(8), 1214–1223 (2011)
160. Jackowski, K.: Fixed-size ensemble classifier system evolutionarily adapted to a recurring context with an unlimited pool of classifiers. Pattern Analysis and Applications, 1–16 (2013)
161. Jackowski, K., Krawczyk, B., Wozniak, M.: Cost-Sensitive Splitting and Selection Method for Medical Decision Support System. In: Yin, H., Costa, J.A.F., Barreto, G. (eds.) IDEAL 2012. LNCS, vol. 7435, pp. 850–857. Springer, Heidelberg (2012)
162. Jackowski, K., Krawczyk, B., Wozniak, M.: AdaSS+ the hybrid training method of a classifier based on a feature space partitioning. International Journal of Neural Systems (under review, 2013)
163. Jackowski, K., Wozniak, M.: Algorithm of designing compound recognition system on the basis of combining classifiers with simultaneous splitting feature space into competence areas. Pattern Analysis and Applications 12(4), 415–425 (2009)
164. Jacobs, R.A.: Methods for combining experts' probability assessments. Neural Computation 7(5), 867–888 (1995)
165. Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3, 79–87 (1991)
166. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31, 264–323 (1999)


167. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1), 4–37 (2000)
168. Jajodia, S., Sandhu, R.: Toward a multilevel secure relational data model. SIGMOD Rec. 20(2), 50–59 (1991)
169. Japkowicz, N., Shah, M.: Evaluating Learning Algorithms. Cambridge University Press (2011)
170. Jiang, H., Liu, G., Xiao, X., Mei, C., Ding, Y., Yu, S.: Monitoring of solid-state fermentation of wheat straw in a pilot scale using FT-NIR spectroscopy and support vector data description. Microchemical Journal 102 (2012)
171. Jin, R., Agrawal, G.: Efficient decision tree construction on streaming data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, pp. 571–576. ACM, New York (2003)
172. Jin, R., Breitbart, Y., Muoh, C.: Data discretization unification. Knowl. Inf. Syst. 19(1), 1–29 (2009)
173. Jolliffe, I.T.: Principal Component Analysis, 2nd edn. Springer, New York (2002)
174. Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6(2), 181–214 (1994)
175. Juszczak, P.: Learning to recognise. A study on one-class classification and active learning. PhD thesis, Delft University of Technology (2006)
176. Juszczak, P., Tax, D.M.J., Pekalska, E., Duin, R.P.W.: Minimum spanning tree based one-class classifier. Neurocomputing 72(7-9), 1859–1869 (2009)
177. Kacprzak, T., Walkowiak, K., Wozniak, M.: Optimization of overlay distributed computing systems for multiple classifier system - heuristic approach. Logic Journal of the IGPL 20(4), 677–688 (2012)
178. Kage, H., Seki, M., Sumi, K., Tanaka, K., Kyuma, K.: Pattern recognition for video surveillance and physical security. In: SICE, 2007 Annual Conference, pp. 1823–1828 (September 2007)
179. Kamgar-Parsi, B., Kanal, L.N.: An improved branch and bound algorithm for computing k-nearest neighbors. Pattern Recogn. Lett. 3(1), 7–12 (1985)
180. Kantarcıoglu, M., Clifton, C.: Privately Computing a Distributed k-nn Classifier. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, pp. 279–290. Springer, Heidelberg (2004)
181. Katakis, I., Tsoumakas, G., Vlahavas, I.: Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl. Inf. Syst. 22(3), 371–391 (2010)
182. Kearns, M.J., Vazirani, U.V.: An introduction to computational learning theory. MIT Press, Cambridge (1994)
183. Kelly, M.G., Hand, D.J., Adams, N.M.: The impact of changing populations on classifier performance. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 1999, pp. 367–371. ACM, New York (1999)
184. Kifer, D., Ben-David, S., Gehrke, J.: Detecting change in data streams. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, vol. 30, pp. 180–191. VLDB Endowment (2004)
185. Kittler, J., Alkoot, F.M.: Sum versus vote fusion in multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(1), 110–115 (2003)


186. Klinkenberg, R., Renz, I.: Adaptive information filtering: Learning in the presence of concept drifts, pp. 33–40 (1998)
187. Klinkenberg, R., Joachims, T.: Detecting concept drift with support vector machines. In: Proceedings of the Seventeenth International Conference on Machine Learning, ICML 2000, pp. 487–494. Morgan Kaufmann Publishers Inc., San Francisco (2000)
188. Klir, G.J., Clair, U.H.S., Yuan, B.: Fuzzy set theory: foundations and applications. Prentice-Hall (1997)
189. Klopotek, M.A., Wierzchon, S.T., Trojanowski, K. (eds.): Concept of the Knowledge Quality Management for Rule-Based Decision System. Advances in Soft Computing, vol. 22. Springer, Heidelberg (2003)
190. Ko, A.H.R., Sabourin, R., Britto Jr., A.S.: From dynamic classifier selection to dynamic ensemble selection. Pattern Recogn. 41(5), 1735–1748 (2008)
191. Kohavi, R., Wolpert, D.: Bias plus variance decomposition for zero-one loss functions. In: ICML 1996 (1996)
192. Kohavi, R., Provost, F.: Glossary of terms. Machine Learning 30(2-3), 271–274 (1998)
193. Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: An ensemble method for drifting concepts. J. Mach. Learn. Res. 8, 2755–2790 (2007)
194. Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Third IEEE International Conference on Data Mining, ICDM 2003, pp. 123–130 (November 2003)
195. Korb, K.B.: Introduction: Machine learning as philosophy of science. Minds and Machines 14(4), 433–440 (2004)
196. Koychev, I.: Gradual forgetting for adaptation to concept drift. In: ECAI 2000 Workshop on Current Issues in Spatio-Temporal Reasoning, Berlin, Germany, pp. 101–106 (2000)
197. Krawczyk, B.: Diversity in ensembles for one-class classification. In: Pechenizkiy, M., Wojciechowski, M. (eds.) New Trends in Databases & Inform. AISC, vol. 185, pp. 119–129. Springer, Heidelberg (2012)
198. Krawczyk, B., Schaefer, G., Wozniak, M.: Breast thermogram analysis using a cost-sensitive multiple classifier system. In: Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2012), pp. 507–510 (2012)
199. Krawczyk, B.: Pattern recognition approach to classifying CYP 2C19 isoform. Central European Journal of Medicine 7(1), 38–44 (2012)
200. Krawczyk, B., Wozniak, M.: Incremental learning and forgetting in one-class classifiers for data streams. In: Burduk, R., Jackowski, K., Kurzynski, M., Wozniak, M., Zolnierek, A. (eds.) CORES 2013. AISC, vol. 226, pp. 323–332. Springer, Heidelberg (2013)
201. Krawczyk, B., Wozniak, M.: Designing cost-sensitive ensemble - genetic approach. In: Choras, R.S. (ed.) Image Processing and Communications Challenges 3. AISC, vol. 102, pp. 227–234. Springer, Heidelberg (2011)
202. Krawczyk, B., Wozniak, M.: Privacy preserving models of k-NN algorithm. In: Burduk, R., Kurzynski, M., Wozniak, M., Zołnierek, A. (eds.) Computer Recognition Systems 4. AISC, vol. 95, pp. 207–217. Springer, Heidelberg (2011)
203. Krawczyk, B., Wozniak, M.: Combining Diverse One-Class Classifiers. In: Corchado, E., Snasel, V., Abraham, A., Wozniak, M., Grana, M., Cho, S.-B. (eds.) HAIS 2012, Part II. LNCS, vol. 7209, pp. 590–601. Springer, Heidelberg (2012)


204. Krawczyk, B., Wozniak, M.: Distributed privacy-preserving minimal distance classification. In: Pan, J.-S., Polycarpou, M.M., Wozniak, M., de Carvalho, A.C.P.L.F., Quintian, H., Corchado, E. (eds.) HAIS 2013. LNCS, vol. 8073, pp. 462–471. Springer, Heidelberg (2013)
205. Krawczyk, B., Wozniak, M.: Diversity measures for one-class classifier ensembles. Neurocomputing (in press, 2013), doi: http://dx.doi.org/10.1016/j.neurocom.2013.01.053
206. Krawczyk, B., Wozniak, M., Cyganek, B.: Clustering-based ensembles for one-class classification. Information Sciences (in press, 2013)
207. Krogh, A., Vedelsby, J.: Neural network ensembles, cross validation, and active learning. In: Advances in Neural Information Processing Systems, vol. 7, pp. 231–238 (1995)
208. Kufrin, R.: Decision trees on parallel processors. In: Parallel Processing for Artificial Intelligence 3, pp. 279–306. Elsevier Science (1995)
209. Kuncheva, L.I.: Clustering-and-selection model for classifier combination. In: Proceedings of the Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, vol. 1, pp. 185–188 (2000)
210. Kuncheva, L.I., Whitaker, C.J., Shipp, C.A., Duin, R.P.W.: Limits on the majority vote accuracy in classifier fusion. Pattern Analysis and Applications 6, 22–31 (2003)
211. Kuncheva, L., Bezdek, J.C., Duin, R.P.W.: Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognition 34(2), 299–314 (2001)
212. Kuncheva, L.I.: Clustering-and-selection model for classifier combination. In: KES, pp. 185–188 (2000)
213. Kuncheva, L.I.: Classifier Ensembles for Changing Environments. In: Roli, F., Kittler, J., Windeatt, T. (eds.) MCS 2004. LNCS, vol. 3077, pp. 1–15. Springer, Heidelberg (2004)
214. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience (2004)
215. Kuncheva, L.I.: Classifier ensembles for detecting concept change in streaming data: Overview and perspectives. In: 2nd Workshop SUEMA 2008 (ECAI 2008), pp. 5–10 (2008)
216. Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 51(2), 181–207 (2003)
217. Kuratowski, K., Mostowski, A.: Set theory: with an introduction to descriptive set theory, 2nd completely rev. edn. North-Holland Pub. Co. (1976)
218. Kurlej, B., Wozniak, M.: Learning Curve in Concept Drift While Using Active Learning Paradigm. In: Bouchachia, A. (ed.) ICAIS 2011. LNCS, vol. 6943, pp. 98–106. Springer, Heidelberg (2011)
219. Kurlej, B., Wozniak, M.: Impact of window size in active learning of evolving data streams. In: Proceedings of the 45th International Conference on Modelling and Simulation of Systems, MOSIS 2011, pp. 56–62 (2011)
220. Kurlej, B., Wozniak, M.: Active learning approach to concept drift problem. Logic Journal of the IGPL 20(3), 550–559 (2012)
221. Kurzynski, M.W.: Combining rule-based and sample-based classifiers - probabilistic approach. In: De Gregorio, M., Di Maio, V., Frucci, M., Musio, C. (eds.) BVAI 2005. LNCS, vol. 3704, pp. 298–307. Springer, Heidelberg (2005)


222. Kurzynski, M.W., Puchala, E., Sas, J.: Hybrid pattern recognition algorithms with the statistical model applied to the computer-aided medical diagnosis. In: Crespo, J.L., Maojo, V., Martin, F. (eds.) ISMDA 2001. LNCS, vol. 2199, pp. 133–139. Springer, Heidelberg (2001)
223. Kurzynski, M., Wozniak, M.: Combining classifiers under probabilistic models: experimental comparative analysis of methods. Expert Systems 29(4), 374–393 (2012)
224. Kurzynski, M.W.: The optimal strategy of a tree classifier. Pattern Recognition 16(1), 81–87 (1983)
225. Lam, L.: Classifier combinations: Implementations and theoretical issues. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 77–86. Springer, Heidelberg (2000)
226. Landwehr, N., Hall, M., Frank, E.: Logistic model trees. Mach. Learn. 59(1-2), 161–205 (2005)
227. Lazarescu, M.M., Venkatesh, S., Bui, H.H.: Using multiple windows to track concept drift. Intell. Data Anal. 8(1), 29–59 (2004)
228. Lazarevic, A., Obradovic, Z.: Effective pruning of neural network classifier ensembles. In: Proceedings of International Joint Conference on Neural Networks, IJCNN 2001, vol. 2, pp. 796–801. IEEE (2001)
229. Lee, H.-M., Chen, C.-M., Chen, J.-M., Jou, Y.-L.: An efficient fuzzy classifier with feature selection based on fuzzy entropy. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 31(3), 426–432 (2001)
230. Lindell, Y., Pinkas, B.: Privacy preserving data mining. Journal of Cryptology 15(3), 177–206 (2002)
231. Lindell, Y., Pinkas, B.: Secure multiparty computation for privacy-preserving data mining. IACR Cryptology ePrint Archive, 2008:197 (2008)
232. Lindell, Y., Pinkas, B.: A proof of security of Yao's protocol for two-party computation. J. Cryptol. 22(2), 161–188 (2009)
233. Lirov, Y., Yue, O.-C.: Automated network troubleshooting knowledge acquisition. Applied Intelligence 1, 121–132 (1991)
234. Little, M.A., McSharry, P.E., Hunter, E.J., Spielman, J.L., Ramig, L.O.: Suitability of dysphonia measurements for telemonitoring of Parkinson's disease. IEEE Trans. Biomed. Engineering 56(4), 1015–1022 (2009)
235. Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Inf. Comput. 108(2), 212–261 (1994)
236. Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, Norwell (1998)
237. Liu, X., Wu, J., Zhou, Z.: Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 39(2), 539–550 (2009)
238. Loftsgaarden, D., Quessenberry, C.: A nonparametric estimate of multivariate density function. Ann. Math. Statist. (36), 1049–1051 (1965)
239. Lopez, V., Fernandez, A., Moreno-Torres, J.G., Herrera, F.: Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics. Expert Systems with Applications 39(7), 6585–6608 (2012)
240. Lua, E.K., Crowcroft, J., Pias, M., Sharma, R., Lim, S.: A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys Tutorials 7(2), 72–93 (2005)


241. Lumijarvi, J., Laurikkala, J., Juhola, M.: A comparison of different heterogeneous proximity functions and Euclidean distance. In: Stud. Health Technol. Inform., vol. 107 (pt 2), pp. 1362–1366 (2004)
242. Manevitz, L., Yousef, M.: One-class document classification via neural networks. Neurocomputing 70(7-9), 1466–1481 (2007)
243. Marcialis, G.L., Roli, F.: Fusion of face recognition algorithms for video-based surveillance systems. In: Foresti, G.L., Regazzoni, C., Varshney, P. (eds.), pp. 235–250 (2003)
244. Margineantu, D.D., Dietterich, T.G.: Pruning adaptive boosting. In: Proceedings of the Fourteenth International Conference on Machine Learning, ICML 1997, pp. 211–218. Morgan Kaufmann Publishers Inc., San Francisco (1997)
245. Markou, M., Singh, S.: Novelty detection: a review - part 1: statistical approaches. Signal Process. 83(12), 2481–2497 (2003)
246. Martin, B.: Instance-based learning: Nearest neighbor with generalization. Master's thesis, University of Waikato, Hamilton, New Zealand (1995)
247. Martinez-Munoz, G., Hernandez-Lobato, D., Suarez, A.: An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2), 245–259 (2009)
248. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. In: Neurocomputing: Foundations of Research, pp. 15–27. MIT Press, Cambridge (1988)
249. Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: A fast scalable classifier for data mining. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 18–32. Springer, Heidelberg (1996)
250. Michalski, R.S.: Understanding the nature of learning: issues and research directions. In: Michalski, R., Carbonnel, J., Mitchell, T. (eds.) Machine Learning: An Artificial Intelligence Approach, Kaufmann, Los Altos, CA, vol. 2, pp. 3–25 (1986)
251. Michalski, R.S., Mozetic, I., Hong, J., Lavrack, H.: The multi purpose incremental learning system AQ15 and its testing application to three medical domains. In: Proceedings of the 5th National Conference on Artificial Intelligence. Morgan Kaufmann Publisher (1986)
252. Miettinen, K.: Nonlinear Multiobjective Optimization. International Series in Operations Research and Management Science, vol. 12. Kluwer Academic Publishers, Dordrecht (1999)
253. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
254. Montgomery, D.C.: Design and Analysis of Experiments. John Wiley & Sons (2006)
255. Moor, J.H.: The future of computer ethics: You ain't seen nothin' yet! Ethics and Inf. Technol. 3(2), 89–91 (2001)
256. Mui, J., Fu, K.S.: Automated classification of nucleated blood cells using a binary tree classifier. IEEE Trans. Pattern Anal. PAMI-2, 429–443 (1980)
257. Nanni, L.: Letters: Experimental comparison of one-class classifiers for online signature verification. Neurocomput. 69(7-9), 869–873 (2006)
258. Napierala, K., Stefanowski, J.: Identification of different types of minority class examples in imbalanced data. In: Corchado, E., Snasel, V., Abraham, A., Wozniak, M., Grana, M., Cho, S.-B. (eds.) HAIS 2012, Part II. LNCS, vol. 7209, pp. 139–150. Springer, Heidelberg (2012)


259. Narasimhamurthy, A., Kuncheva, L.I.: A framework for generating data tosimulate changing environments. In: Proceedings of the 25th conference onProceedings of the 25th IASTED International Multi-Conference: artificialintelligence and applications (AIAP 2007), Innsbruck, Austria, pp. 384–389.ACTA Press, Anaheim (2007)

260. Navarro-Arribas, G., Torra, V.: Information fusion in data privacy: A survey. Inf. Fusion 13(4), 235–244 (2012)

261. Negnevitsky, M.: Artificial Intelligence: A Guide to Intelligent Systems, 2nd edn. Addison-Wesley Longman Publishing Co., Inc., Boston (2005)

262. von Neumann, J.: The computer and the brain. Yale University Press, New Haven (1958)

263. Newell, A.: Intellectual issues in the history of artificial intelligence. In: Machlup, F., Mansfield, U. (eds.) The Study of Information: Interdisciplinary Messages, pp. 187–294. John Wiley & Sons, Inc., New York (1983)

264. Ng, K.-C., Abramson, B.: Uncertainty management in expert systems. IEEE Expert: Intelligent Systems and Their Applications 5(2), 29–48 (1990)

265. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)

266. Norton, S.W.: Generating better decision trees. In: Proceedings of the 11th International Joint Conference on Artificial Intelligence, IJCAI 1989, vol. 1, pp. 800–805. Morgan Kaufmann Publishers Inc., San Francisco (1989)

267. Noto, K., Brodley, C., Slonim, D.: Frac: A feature-modeling approach for semi-supervised and unsupervised anomaly detection. Data Mining and Knowledge Discovery 25(1), 109–133 (2012)

268. Nunez, M.: The use of background knowledge in decision tree induction. Mach. Learn. 6(3), 231–250 (1991)

269. Nunez, M.: Economic induction: A case study. In: EWSL, pp. 139–145 (1988)

270. Gomes, J.B., Ruiz, E.M., Sousa, P.A.C.: Learning recurring concepts from data streams with a context-aware ensemble. In: Chu, W.C., Wong, W.E., Palakal, M.J., Hung, C.-C. (eds.) Proceedings of the 2011 ACM Symposium on Applied Computing (SAC), TaiChung, Taiwan, March 21-24, pp. 994–999. ACM (2011)

271. Opitz, D.W., Shavlik, J.W.: Generating accurate and diverse members of a neural-network ensemble. In: NIPS, pp. 535–541 (1995)

272. Ouyang, Z., Gao, Y., Zhao, Z., Wang, T.: Study on the classification of data streams with concept drift. In: FSKD, pp. 1673–1677. IEEE (2011)

273. Oza, N.C.: Online ensemble learning. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, AAAI/IAAI, p. 1109. AAAI Press / The MIT Press, Austin, Texas (2000)

274. Oza, N.C., Tumer, K.: Classifier ensembles: Select real-world applications. Inf. Fusion 9(1), 4–20 (2008)

275. Paliouras, G., Bree, D.S.: The effect of numeric features on the scalability of inductive learning programs. In: Lavrac, N., Wrobel, S. (eds.) ECML 1995. LNCS, vol. 912, pp. 218–231. Springer, Heidelberg (1995)

276. Partalas, I., Tsoumakas, G., Vlahavas, I.: Pruning an ensemble of classifiers via reinforcement learning. Neurocomputing 72(7-9), 1900–1909 (2009)

277. Partridge, D., Krzanowski, W.: Software diversity: practical statistics for its measurement and exploitation. Information and Software Technology 39(10), 707–717 (1997)


278. Passerini, A., Pontil, M., Frasconi, P.: New results on error correcting output codes of kernel machines. IEEE Transactions on Neural Networks 15(1), 45–54 (2004)

279. Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, pp. 165–178. ACM, New York (2009)

280. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco (1988)

281. Pedrycz, W., Gomide, F.: Fuzzy Systems Engineering: Toward Human-Centric Computing. Wiley (2007)

282. Penar, W., Wozniak, M.: Cost-sensitive methods of constructing hierarchical classifiers. Expert Systems 27(3), 146–155 (2010)

283. Peng, Y.H., Huang, Q., Jiang, P., Jiang, J.: Cost-sensitive ensemble of support vector machines for effective detection of microcalcification in breast cancer diagnosis. In: Wang, L., Jin, Y. (eds.) FSKD 2005. LNCS (LNAI), vol. 3614, pp. 483–493. Springer, Heidelberg (2005)

284. Platt, J.C., Cristianini, N., Shawe-Taylor, J.: Large margin dags for multiclass classification. In: NIPS, pp. 547–553 (1999)

285. Polya, G.: How to solve it. Princeton University Press, Princeton (1945)

286. Polikar, R.: Ensemble based systems in decision making. IEEE Circuits and Systems Magazine 6(3), 21–45 (2006)

287. Polikar, R.: Ensemble learning. Scholarpedia 3(12), 2776 (2008)

288. Polikar, R., Upda, L., Upda, S.S., Honavar, V.: Learn++: an incremental learning algorithm for supervised neural networks. Trans. Sys. Man Cyber Part C 31(4), 497–508 (2001)

289. Poston, W.L., Marchette, D.J.: Recursive dimensionality reduction using Fisher's linear discriminant. Pattern Recognition 31(7), 881–888 (1998)

290. Provost, F., Fawcett, T., Kohavi, R.: The case against accuracy estimation for comparing induction algorithms. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 445–453. Morgan Kaufmann (1997)

291. Przewozniczek, M., Walkowiak, K., Wozniak, M.: Optimizing distributed computing systems for k-nearest neighbours classifiers - evolutionary approach. Logic Journal of the IGPL 19(2), 357–372 (2011)

292. Pujol, O., Radeva, P., Vitria, J.: Discriminant ecoc: A heuristic method for application dependent design of error correcting output codes. IEEE Trans. Pattern Anal. Mach. Intell. 28(6), 1007–1012 (2006)

293. Qiang, F., Shang-Xu, H., Sheng-Ying, Z.: Clustering-based selective neural network ensemble. Journal of Zhejiang University - Science A 6(5), 387–392 (2005)

294. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)

295. Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann series in machine learning. Morgan Kaufmann Publishers (1993)

296. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. In: Waibel, A., Lee, K.-F. (eds.) Readings in Speech Recognition, pp. 267–296. Morgan Kaufmann Publishers Inc., San Francisco (1990)

297. Rahman, A.F.R., Fairhurst, M.C.: Serial combination of multiple experts: A unified evaluation. Pattern Analysis and Applications 2, 292–311 (1999)


298. Ramamurthy, S., Bhatnagar, R.: Tracking recurrent concept drift in streaming data using ensemble classifiers. In: Proceedings of the Sixth International Conference on Machine Learning and Applications, ICMLA 2007, pp. 404–409. IEEE Computer Society, Washington, DC (2007)

299. Rao, N.S.V.: A Generic Sensor Fusion Problem: Classification and Function Estimation. In: Roli, F., Kittler, J., Windeatt, T. (eds.) MCS 2004. LNCS, vol. 3077, pp. 16–30. Springer, Heidelberg (2004)

300. Rastrigin, L.A., Erenstein, R.H.: Method of Collective Recognition. Energoizdat, Moscow (1981)

301. Raudys, S.: Trainable fusion rules. I. Large sample size case. Neural Netw. 19(10), 1506–1516 (2006)

302. Raudys, S.: Trainable fusion rules. II. Small sample-size effects. Neural Netw. 19(10), 1517–1527 (2006)

303. Raudys, S., Roli, F.: The behavior knowledge space fusion method: analysis of generalization error and strategies for performance improvement. In: Windeatt, T., Roli, F. (eds.) MCS 2003. LNCS, vol. 2709, pp. 55–64. Springer, Heidelberg (2003)

304. Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)

305. Rivest, R.L.: Learning decision lists. Mach. Learn. 2(3), 229–246 (1987)

306. Rodríguez, J.J., Kuncheva, L.I.: Combining online classification approaches for changing environments. In: da Vitoria Lobo, N., Kasparis, T., Roli, F., Kwok, J.T., Georgiopoulos, M., Anagnostopoulos, G.C., Loog, M. (eds.) SSPR & SPR 2008. LNCS, vol. 5342, pp. 520–529. Springer, Heidelberg (2008)

307. Rogova, G.: Combining the results of several neural network classifiers. Neural Netw. 7(5), 777–781 (1994)

308. Rokach, L.: Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography. Computational Statistics and Data Analysis 53(12), 4046–4072 (2009)

309. Rokach, L.: Pattern Classification Using Ensemble Methods. World Scientific Publishing Co., Inc., River Edge (2010)

310. Rokach, L., Maimon, O.: Feature set decomposition for decision trees. Intell. Data Anal. 9(2), 131–158 (2005)

311. Roli, F., Giacinto, G.: Design of Multiple Classifier Systems. World Scientific Publishing (2002)

312. Rosenblatt, F.: The Perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65, 386–408 (1958)

313. Rosenblatt, F.: Principles of Neurodynamics. Spartan Books (1959)

314. Ross, S.M.: Introduction to probability and statistics for engineers and scientists, 2nd edn. Academic Press (2000)

315. Rubinstein, R.Y., Kroese, D.P.: Simulation and the Monte Carlo Method. Wiley Series in Probability and Statistics, 2nd edn.

316. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. In: Neurocomputing: Foundations of Research, pp. 696–699. MIT Press, Cambridge (1988)

317. Ruta, D., Gabrys, B.: A theoretical analysis of the limits of majority voting errors for multiple classifier systems. Pattern Anal. Appl. 5(4), 333–350 (2002)

318. Ruta, D., Gabrys, B.: Classifier selection for majority voting. Information Fusion 6(1), 63–81 (2005)


319. Rutkowski, L.: Flexible Neuro-Fuzzy Systems: Structures, Learning and Performance Evaluation. The Springer International Series in Engineering and Computer Science. Springer (2004)

320. Ryan, W.G.: Privacy and freedom. Business Horizons 10(4), 106–106 (1967)

321. Sachs, L.: Applied statistics. A handbook of techniques. Springer, New York (1984)

322. Safavian, S.R., Landgrebe, D.: A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man and Cybernetics 21(3), 660–674 (1991)

323. Salganicoff, M.: Density-adaptive learning and forgetting. In: Proceedings of the Tenth Annual Conference on Machine Learning. Morgan Kaufmann, San Francisco (1993)

324. Salzberg, S.: A nearest hyperrectangle learning method. Mach. Learn. 6(3), 251–276 (1991)

325. Sanchez, G.M., Burridge, A.L.: Decision making in head injury management in the Edwin Smith papyrus. Neurosurgical Focus 23(1), E5 (2007)

326. Schapire, R.E.: The Boosting Approach to Machine Learning: An Overview. In: MSRI Workshop on Nonlinear Estimation and Classification, Berkeley, CA, USA (2001)

327. Schapire, R.E.: The strength of weak learnability. Mach. Learn. 5(2), 197–227 (1990)

328. Schlimmer, J.C., Granger Jr., R.H.: Incremental learning from noisy data. Mach. Learn. 1(3), 317–354 (1986)

329. Scholkopf, B., Smola, A.J.: Learning with kernels: support vector machines, regularization, optimization, and beyond. In: Adaptive Computation and Machine Learning. MIT Press (2002)

330. Seni, G., Elder, J.: Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions. Morgan and Claypool Publishers (2010)

331. Shapley, L.S., Grofman, B.N.: Optimizing group judgmental accuracy in the presence of interdependencies. Public Choice 43(3), 329–333 (1984)

332. Sharkey, A.J.C., Sharkey, N.E.: Combining diverse neural nets. Knowl. Eng. Rev. 12(3), 231–247 (1997)

333. Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures, 4th edn. Chapman & Hall/CRC (2007)

334. Shipp, C.A., Kuncheva, L.: Relationships between combination methods and measures of diversity in combining classifiers. Information Fusion 3(2), 135–148 (2002)

335. Shlien, S.: Multiple binary decision tree classifiers. Pattern Recogn. 23(7), 757–763 (1990)

336. Skalak, D.B.: The sources of increased accuracy for two proposed boosting algorithms. In: Proc. American Association for Artificial Intelligence, AAAI 1996, Integrating Multiple Learned Models Workshop, pp. 120–125 (1996)

337. Skurichina, M., Duin, R.P.W.: Bagging, boosting and the random subspace method for linear classifiers. Pattern Anal. Appl. 5(2), 121–135 (2002)

338. Smetek, M., Trawinski, B.: Selection of heterogeneous fuzzy model ensembles using self-adaptive genetic algorithms. New Generation Computing 29, 309–327 (2011)

339. Sneath, P.H.A., Sokal, R.R.: Numerical Taxonomy: The Principles and Practice of Numerical Classification. A series of books in biology. W.H. Freeman Limited (1973)


340. Snyman, J.: Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. Applied Optimization. Springer (2005)

341. Sobolewski, P., Wozniak, M.: Comparable study of statistical tests for virtual concept drift detection. In: Burduk, R., Jackowski, K., Kurzynski, M., Wozniak, M., Zolnierek, A. (eds.) CORES 2013. AISC, vol. 226, pp. 333–341. Springer, Heidelberg (2013)

342. Sobolewski, P., Wozniak, M.: Concept drift detection and model selection with simulated recurrence and ensembles of statistical detectors. Journal of Universal Computer Science 19(4), 462–483 (2013)

343. Sonnenburg, S., Ratsch, G., Schafer, C., Scholkopf, B.: Large scale multiple kernel learning. J. Mach. Learn. Res. 7, 1531–1565 (2006)

344. Srinivas, M., Patnaik, L.M.: Genetic algorithms: A survey. Computer 27, 17–26 (1994)

345. Srivastava, A., Han, E.-H., Kumar, V., Singh, V.: Parallel formulations of decision-tree classification algorithms. Data Min. Knowl. Discov. 3(3), 237–261 (1999)

346. Stanley, K.O.: Learning concept drift with a committee of decision trees. In: Artificial Intelligence: A Modern Approach (2003)

347. Stork, D.G., Yom-Tov, E., Duda, R.O.: Computer manual in MATLAB to accompany pattern classification. John Wiley & Sons (2004)

348. Street, W.N., Kim, Y.S.: A streaming ensemble algorithm (sea) for large-scale classification. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2001, pp. 377–382. ACM, New York (2001)

349. Su, J., Zhang, H.: A fast decision tree learning algorithm. In: Proceedings of the 21st National Conference on Artificial Intelligence, AAAI 2006, vol. 1, pp. 500–505. AAAI Press (2006)

350. Sun, Y., Kamel, M.S., Wong, A.K.C., Wang, Y.: Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition 40(12), 3358–3378 (2007)

351. Sun, Y., Wong, A.K.C., Kamel, M.S.: Classification of imbalanced data: A review. International Journal of Pattern Recognition and Artificial Intelligence 23(4), 687–719 (2009)

352. Surowiecki, J.: The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. Knopf Doubleday Publishing Group (2004)

353. Swets, J.A.: Measuring the accuracy of diagnostic systems. Science 240, 1285–1289 (1988)

354. Szeliski, R.: Computer Vision: Algorithms and Applications, 1st edn. Springer-Verlag New York, Inc., New York (2010)

355. Tan, M.: Csl: a cost-sensitive learning system for sensing and grasping objects. In: Proceedings of 1990 IEEE International Conference on Robotics and Automation, vol. 2, pp. 858–863 (May 1990)

356. Tan, M.: Cost-sensitive learning of classification knowledge and its applications in robotics. Machine Learning 13, 7–33 (1993)

357. Tan, M., Schlimmer, J.C.: Cost-sensitive concept learning of sensor use in approach and recognition. In: Proceedings of the Sixth International Workshop on Machine Learning, pp. 392–395. Morgan Kaufmann Publishers Inc., San Francisco (1989)


358. Tang, E.K., Suganthan, P.N., Yao, X.: An analysis of diversity measures. Mach. Learn. 65(1), 247–271 (2006)

359. Tao, D., Tang, X., Li, X., Wu, X.: Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 28(7), 1088–1099 (2006)

360. Tavani, H.T.: Informational privacy, data mining, and the internet. Ethics and Inf. Technol. 1(2), 137–145 (1998)

361. Tax, D.M.J., Duin, R.P.W.: Support vector data description. Machine Learning 54(1), 45–66 (2004)

362. Tax, D.M.J., Duin, R.P.W.: Combining one-class classifiers. In: Kittler, J., Roli, F. (eds.) MCS 2001. LNCS, vol. 2096, pp. 299–308. Springer, Heidelberg (2001)

363. Tax, D.M.J., Juszczak, P., Pekalska, E.Z., Duin, R.P.W.: Outlier detection using ball descriptions with adjustable metric. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) SSPR 2006 and SPR 2006. LNCS, vol. 4109, pp. 587–595. Springer, Heidelberg (2006)

364. Tax, D.M.J.: Ddtools, the data description toolbox for matlab, version 1.9.1 (May 2012)

365. Tax, D.M.J., Duin, R.P.W.: Characterizing one-class datasets. In: Proceedings of the Sixteenth Annual Symposium of the Pattern Recognition Association of South Africa, pp. 21–26 (2005)

366. Tax, D.M.J., Duin, R.P.W.: Using two-class classifiers for multiclass classification. In: Proceedings of 16th International Conference on Pattern Recognition, vol. 2, pp. 124–127 (2002)

367. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2008)

368. Teng, Z., Du, W.: A Hybrid Multi-group Privacy-Preserving Approach for Building Decision Trees. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 296–307. Springer, Heidelberg (2007)

369. Termenon, M., Grana, M.: A two stage sequential ensemble applied to the classification of Alzheimer's disease based on MRI features. Neural Processing Letters 35(1), 1–12 (2012)

370. Thagard, P.R.: Philosophy and machine learning. Canadian Journal of Philosophy 20(2), 261–276 (1990)

371. Theodoridis, S., Koutroumbas, K.: Pattern Recognition & Matlab Intro, 4th edn. Academic Press (2010)

372. Ting, K., Wells, J., Tan, S., Teng, S., Webb, G.: Feature-subspace aggregating: ensembles for stable and unstable learners. Machine Learning 82, 375–397 (2011)

373. Ting, K.M., Witten, I.H.: Stacked generalization: when does it work? In: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, IJCAI 1997, vol. 2, pp. 866–871. Morgan Kaufmann Publishers Inc., San Francisco (1997)

374. Torczon, V.: On the convergence of pattern search algorithms. SIAM J. on Optimization 7(1), 1–25 (1997)

375. Tremblay, G., Sabourin, R., Maupin, P.: Optimizing nearest neighbour in random subspaces using a multi-objective genetic algorithm. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 1, p. 208. IEEE Computer Society, Washington, DC (2004)


376. Tresp, V., Taniguchi, M.: Combining estimators using non-constant weighting functions. In: Advances in Neural Information Processing Systems 7, pp. 419–426. MIT Press (1995)

377. Tsoumakas, G., Partalas, I., Vlahavas, I.: An ensemble pruning primer. In: Okun, O., Valentini, G. (eds.) Applications of Supervised and Unsupervised Ensemble Methods. SCI, vol. 245, pp. 1–13. Springer, Heidelberg (2009)

378. Tsymbal, A., Pechenizkiy, M., Cunningham, P., Puuronen, S.: Dynamic integration of classifiers for handling concept drift. Inf. Fusion 9(1), 56–68 (2008)

379. Tumer, K., Ghosh, J.: Analysis of decision boundaries in linearly combined neural classifiers. Pattern Recognition 29(2), 341–348 (1996)

380. Turney, P.D.: Exploiting context when learning to classify. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 402–407. Springer, Heidelberg (1993)

381. Turney, P.D.: Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm. J. Artif. Int. Res. 2, 369–409 (1995)

382. Ueda, N., Nakano, R.: Generalization error of ensemble estimators. In: Proceedings of IEEE International Conference on Neural Networks, Washington, USA, June, pp. 90–95 (1996)

383. Vaidya, J.: Privacy Preserving Data Mining over Vertically Partitioned Data. PhD thesis, Purdue University (August 2004)

384. Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)

385. van der Heijden, F., Duin, R., de Ridder, D., Tax, D.M.J.: Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB. John Wiley & Sons (2005)

386. van Erp, M., Vuurpijl, L., Schomaker, L.: An overview and comparison of voting methods for pattern recognition. In: Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition, pp. 195–200 (2002)

387. Vapnik, V.N.: Estimation of Dependences Based on Empirical Data. Springer Series in Statistics. Springer-Verlag New York, Inc., Secaucus (1982)

388. Vapnik, V.N.: Statistical learning theory. Wiley (1998)

389. Vapnik, V.N.: The nature of statistical learning theory. Springer-Verlag New York, Inc., New York (1995)

390. Vardeman, S.B., Morris, M.D.: Majority voting by independent classifiers can increase error rates. The American Statistician 67(2), 94–96 (2013)

391. Verdenius, F.: A method for inductive cost optimization. In: Proceedings of the European Working Session on Machine Learning, pp. 179–191. Springer, London (1991)

392. Walkowiak, K., Sztajer, S., Wozniak, M.: Decentralized distributed computing system for privacy-preserving combined classifiers – modeling and optimization. In: Murgante, B., Gervasi, O., Iglesias, A., Taniar, D., Apduhan, B.O. (eds.) ICCSA 2011, Part I. LNCS, vol. 6782, pp. 512–525. Springer, Heidelberg (2011)

393. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, pp. 226–235. ACM, New York (2003)

394. Wang, S., Yao, X.: Diversity analysis on imbalanced data sets by using ensemble models. In: Proceedings of 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009, pp. 324–331 (2009)


395. Wang, X., Paliwal, K.K.: Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition. Pattern Recognition 36(10), 2429–2439 (2003)

396. Watanabe, S.: Pattern Recognition: Human and Mechanical. Wiley, New York (1985)

397. Wettschereck, D.: A hybrid nearest-neighbor and nearest-hyperrectangle algorithm. In: Bergadano, F., De Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 323–335. Springer, Heidelberg (1994)

398. Wettschereck, D., Dietterich, T.G.: An experimental comparison of the nearest-neighbor and nearest-hyperrectangle algorithms. Mach. Learn. 19(1), 5–27 (1995)

399. Widmer, G.: Tracking context changes through meta-learning. Mach. Learn. 27(3), 259–286 (1997)

400. Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23(1), 69–101 (1996)

401. Wilk, T., Wozniak, M.: Combination of one-class classifiers for multiclass problems by fuzzy logic. Neural Network World International Journal on Non-Standard Computing and Artificial Intelligence 20, 853–869 (2010)

402. Wilk, T., Wozniak, M.: Complexity and Multithreaded Implementation Analysis of One Class-Classifiers Fuzzy Combiner. In: Corchado, E., Kurzynski, M., Wozniak, M. (eds.) HAIS 2011, Part II. LNCS, vol. 6679, pp. 237–244. Springer, Heidelberg (2011)

403. Wilk, T., Wozniak, M.: Soft computing methods applied to combination of one-class classifiers. Neurocomputing 75(1), 185–193 (2012)

404. Woloszynski, T., Kurzynski, M.: A probabilistic model of classifier competence for dynamic ensemble selection. Pattern Recognition 44(10-11), 2656–2668 (2011)

405. Woloszynski, T., Kurzynski, M., Podsiadlo, P., Stachowiak, G.W.: A measure of competence based on random classification for dynamic ensemble selection. Inf. Fusion 13(3), 207–213 (2012)

406. Wolpert, D.H.: Stacked generalization. Neural Networks 5, 241–259 (1992)

407. Wolpert, D.H.: The supervised learning no-free-lunch theorems. In: Proc. 6th Online World Conference on Soft Computing in Industrial Applications, pp. 25–42 (2001)

408. Woods Jr., K., Kegelmeyer, W.P., Bowyer, K.: Combination of multiple classifiers using local accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(4), 405–410 (1997)

409. Wozniak, M.: Proposition of common classifier construction for pattern recognition with context task. Knowledge-Based Systems 19(8), 617–624 (2006)

410. Wozniak, M., Zmyslony, M.: Designing fusers on the basis of discriminants – evolutionary and neural methods of training. In: Grana Romay, M., Corchado, E., Garcia Sebastian, M.T. (eds.) HAIS 2010, Part I. LNCS, vol. 6076, pp. 590–597. Springer, Heidelberg (2010)

411. Wozniak, M.: Experiments with trained and untrained fusers. In: Corchado, E., Corchado, J., Abraham, A. (eds.) Innovations in Hybrid Intelligent Systems. Advances in Soft Computing, vol. 44, pp. 144–150. Springer, Heidelberg (2007)

412. Wozniak, M.: Experiments on linear combiners. In: Pietka, E., Kawa, J. (eds.) Information Technologies in Biomedicine. Advances in Soft Computing, vol. 47, pp. 445–452. Springer, Heidelberg (2008)


413. Wozniak, M.: Evolutionary approach to produce classifier ensemble based on weighted voting. In: NaBIC, pp. 648–653. IEEE (2009)

414. Wozniak, M.: Modification of nested hyperrectangle exemplar as a proposition of information fusion method. In: Corchado, E., Yin, H. (eds.) IDEAL 2009. LNCS, vol. 5788, pp. 687–694. Springer, Heidelberg (2009)

415. Wozniak, M.: A hybrid decision tree training method using data streams. Knowl. Inf. Syst. 29(2), 335–347 (2011)

416. Wozniak, M., Jackowski, K.: Some Remarks on Chosen Methods of Classifier Fusion Based on Weighted Voting. In: Corchado, E., Wu, X., Oja, E., Herrero, A., Baruque, B. (eds.) HAIS 2009. LNCS, vol. 5572, pp. 541–548. Springer, Heidelberg (2009)

417. Wozniak, M., Kasprzak, A., Cal, P.: Application of combined classifiers to data stream classification. In: FQAS 2013. LNCS, vol. 8132, pp. 579–588. Springer, Heidelberg (in press, 2013)

418. Wozniak, M., Krawczyk, B.: Combined classifier based on feature space partitioning. Int. J. Appl. Math. Comput. Sci. 22(4), 855–866 (2012)

419. Wozniak, M., Zmyslony, M.: Combining classifiers using trained fuser - analytical and experimental results. Neural Network World 13(7), 925–934 (2010)

420. Wu, T.-F., Lin, C.-J., Weng, R.C.: Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res. 5, 975–1005 (2004)

421. Xiong, L., Chitti, S., Liu, L.: Mining multiple private databases using a knn classifier. In: Proceedings of the 2007 ACM Symposium on Applied Computing, SAC 2007, pp. 435–440. ACM, New York (2007)

422. Xu, L., Krzyzak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Transactions on Systems, Man and Cybernetics 22(3), 418–435 (1992)

423. Yager, R.R., Kacprzyk, J., Fedrizzi, M. (eds.): Advances in the Dempster-Shafer theory of evidence. John Wiley & Sons, Inc., New York (1994)

424. Yan, L., Wolniewicz, R.H., Dodier, R.: Predicting customer behavior in telecommunications. IEEE Intelligent Systems 19(2), 50–58 (2004)

425. Yang, C.-T., Tsai, S.-T., Li, K.-C.: Decision tree construction for data mining on grid computing environments. In: Proceedings of the 19th International Conference on Advanced Information Networking and Applications, AINA 2005, vol. 2, pp. 421–424. IEEE Computer Society, Washington, DC (2005)

426. Yildiz, O.T., Alpaydin, E.: Ordering and finding the best of k>2 supervised learning algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 28(3), 392–402 (2006)

427. Yildiz, O.T., Dikmen, O.: Parallel univariate decision trees. Pattern Recogn. Lett. 28(7), 825–832 (2007)

428. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965)

429. Zeng, Q., Zhang, L., Xu, Y., Cheng, L., Yan, X., Zu, J., Dai, G.: Designing expert system for in situ toughened si3n4 based on adaptive neural fuzzy inference system and genetic algorithms. Materials and Design 30(2), 256–259 (2009)

430. Zenobi, G., Cunningham, P.: Using diversity in preparing ensembles of classifiers based on different feature subsets to minimize generalization error. In: Flach, P.A., De Raedt, L. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, pp. 576–587. Springer, Heidelberg (2001)

431. Zhang, Y., Jin, X.: An automatic construction and organization strategy for ensemble learning on data streams. SIGMOD Rec. 35(3), 28–33 (2006)


432. Zhou, Z.-H., Wu, J., Tang, W.: Ensembling neural networks: Many could be better than all. Artificial Intelligence 137(1-2), 239–263 (2002)

433. Zhou, Z.-H.: Ensemble Methods: Foundations and Algorithms, 1st edn. Chapman & Hall/CRC (2012)

434. Zhu, X., Wu, X., Yang, Y.: Effective classification of noisy data streams with attribute-oriented dynamic classifier selection. Knowl. Inf. Syst. 9(3), 339–363 (2006)

435. Zliobaite, I.: Change with delayed labeling: When is it detectable? In: Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, ICDMW 2010, pp. 843–850. IEEE Computer Society Press, Los Alamitos (2010)

436. Zuo, H., Wu, O., Hu, W., Xu, B.: Recognition of blue movies by fusion of audio and video. In: Proceedings of 2008 IEEE International Conference on Multimedia and Expo., ICME 2008, pp. 37–40 (2008)

Index

Adaptive Splitting and Selection, 107
AdaSS, 107, 145
attribute, 8
augmented error function, 26

bagging, 104
Bayes decision theory, 6
Behavior Knowledge Space, 117
bias, 26
bias-variance dilemma, 26
boosting, 105
  AdaBoost, 105
bootstrapping, 48
Borda count, 97, 121

calculemus, 1
classification, 5
  model, 7, 10
classification algorithm, 9
classifier
  bayesian
    nonparametric, 27
    parametric, 27
    rule-based, 28
  cost-sensitive, 75
  decision tree, 34
  evaluation
    McNemar's Test, 53
    metrics, 50
    Paired Test, 54
  minimal distance, 31
  nearest hyperrectangle, 68
  rule-based, 33
classifier evaluation
  AUC, 51
  ROC, 50
classifier selection, 106
  dynamic, 107
  static, 107
Clustering and Selection, 107
clustering and selection, 143
COLT, 3
combination rule
  Behavior Knowledge Space, 117
  Borda count, 121
  Decision Template, 127
  majority voting, 113
  mixture of experts, 125
  MOMV, 119
  oracle, 112
  rank-based, 121
  Stacked Generalization, 120
  stacking, 120
  weighted aggregating, 123
  weighted voting, 115
concept drift, 169
Condorcet Jury Theorem, 96, 113
confusion matrix, 49
consistency
  data, 64
  knowledge, 62
cost-sensitive classifier, 75
cross validation, 47
  5 times 2 fold, 48
  k-fold, 48
CS, 143


curse of dimensionality, 6

data stream
  classification, 168, 171, 174
data stream classification
  concept drift, 169
decision area, 10
Decision Template, 127
decision tree
  top-down induction, 35
discretization, 9
diversity measure, 100
  non-pairwise, 102
  pairwise, 101
divide and conquer, 125

ECOC, 108
energy measure, 165
ensemble pruning, 110
  clustering-based, 111
  optimization-based, 111
  ranking-based, 111
Error Correction Output Codec, 108

feature reduction, 8
feature selection, 8
feature space splitting, 141
  Adaptive Splitting and Selection, 145
  classifier, 143
  clustering and selection, 143
feature vector, 7
fuzzy logic, 14

gating network, 127
gradient descent, 41
  sequential, 43

Hughes effect, 6

imbalance data, 166
intelligence
  artificial, 2
  human, 2

kernel trick, 45

learning, 2, 20
  bias, 26
  bias-variance dilemma, 26
  errors, 23
  expert rule
    direct, 39
    indirect, 34
    sequential covering, 40
  mode, 21
  neural network, 44
  overfitting, 24
  supervised, 20
  variance, 26
learning information, 11
  expert rule, 13
  learning set, 12
learning material
  expert rule, 18
loss function, 16

majority voting, 113
  error, 113
Minimum Description Length, 27
mixture of experts, 122, 125, 129
  gating network, 127

nearest hyperrectangle classifier, 68
Nested Generalize Exemplar, 70
neural network, 40
  activation function, 41
  multi-layer perceptron, 43
    backpropagation, 44
  perceptron, 40
    learning, 42
NGE, 70

OAA, 108
OAO, 108
OCClustE, 158
One-Against-All, 108
One-Against-One, 108
one-class classifier, 156
  diversity, 165
  ensemble, 158
  OCSVM, 157
  SVDD, 157
One-Versus-All, 108
One-Versus-One, 108
oracle, 112
overfitting, 26, 71

PAC theory, 3, 97
pattern recognition
  stages, 5


privacy preserving
  minimal distance classifier, 86
  taxonomy, 83

Random Forest, 105
Random Subspace, 105
risk
  conditional, 16
  overall, 16

soft computing, 15
Stacked Generalization, 120
stacking, 120, 127
support vector machine, 44

uncertainty, 14
  probabilistic approach, 15

underfitting, 26

variance, 26

weighted aggregating, 123
weighted voting, 115