presenter: wen-feng hsiao ( 蕭文峰 ) 2009/8/31 1. outline introduction related work proposed...


Fuzzy Ordinal Support Vector Machine
Presenter: Wen-Feng Hsiao (蕭文峰)
2009/8/31

Notes: Today is the last day of my visit at Sinica. Over these two months I have attended this group meeting and received a lot of input from you. I want to thank all of you. I also feel ashamed that I have not been able to contribute to this group, so I think I had better talk about what I have been doing during these two months.

Outline
- Introduction
- Related Work
- Proposed Method
- Experiments and Results
- Conclusions

Introduction
Ordinal classification tasks:
- Grading (students' performance)
- Rating (credit rating, customers' ratings of products)
- Ranking (query results)

Properties of ordinal classification:
- Like multi-class classification tasks, the class values in ordinal classification are discrete, but they are ordered.
- Like regression tasks, the class values in ordinal classification are ordered, but they are neither equally spaced nor continuous.

Notes: In statistics, the GLM follows the proportional odds assumption. Nominal targets: for example, guessing the type of an Iris flower, the type of weather, or the brand of a car. Interval targets: for example, the closing price of a certain stock, or the amount of electricity to be generated by a power company.

Introduction (cont'd)
SVM has been shown to be a very powerful and efficient method for multi-class classification tasks.

SVM has been further extended to the regression domain as SVR, short for Support Vector Regression.

Therefore, several researchers have tried to find out whether SVM can also be applied to ordinal classification tasks.

Introduction (cont'd)
Most existing methods for ordinal classification do not make use of the ordinal characteristics of the data.

oSVM, proposed recently by Cardoso and da Costa (2007), does make use of the ordinal characteristics of the data.

It has been shown to outperform traditional methods in predicting ordinal classes.

Introduction (cont'd)
However, our empirical experiments showed that oSVM suffers from two problems: (1) it cannot handle datasets with noisy data; (2) it often misclassifies instances near class boundaries.

We propose applying fuzzy sets to the ordinal support vector machine in the hope of resolving both problems simultaneously.

Introduction (cont'd)
The challenge is how to devise a reasonable membership function (MF) that assigns membership degrees to instances.

We have proposed two MFs. The experiments show that the proposed MFs are promising, though they still need further verification.

Outline
- Introduction
- Related Work
- Proposed Method
- Experiments and Results
- Conclusions

Related Work: preprocessing and postprocessing
Kramer et al. (2001) proposed using pre-processing mechanisms to translate ordinal values into metric values, and then using a regression model for the task.
Problem: the true metric distances between the ordinal scales are unknown in most tasks.
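The pre-process and post-process idea can be sketched as follows. This is a minimal illustration, not the actual procedure of Kramer et al.: the equally spaced mapping (the k-th class becomes the number k) and the round-and-clip step are assumptions, and both function names are hypothetical.

```python
import math

def ordinal_to_metric(labels, ordered_classes):
    """Pre-process: replace each ordinal label by its 1-based rank (assumed spacing of 1)."""
    rank = {c: k for k, c in enumerate(ordered_classes, start=1)}
    return [float(rank[label]) for label in labels]

def metric_to_ordinal(value, ordered_classes):
    """Post-process: round a regression output to the nearest rank, then clip to the valid range."""
    k = math.floor(value + 0.5)
    k = max(1, min(len(ordered_classes), k))
    return ordered_classes[k - 1]

grades = ["low", "mid", "high"]          # hypothetical ordered classes
targets = ordinal_to_metric(["low", "high", "mid"], grades)
label = metric_to_ordinal(2.4, grades)   # a regression output mapped back to a class
```

The fixed spacing of 1 between adjacent ranks is exactly the weak point named in the problem statement: nothing in the data guarantees that the ranks are equally far apart.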


Related Work: converting to binary classification tasks
Frank and Hall (2001) converted an ordinal classification problem into nested binary classification problems that encode the ordering of the original ranks.

The prediction of an instance can be obtained by combining the results of these binary classifiers.

Notes: One of (c-1).

Frank and Hall (2001) (cont'd)
Can be applied only to methods that can output class probability estimates.

Can go wrong when the calculated probability is negative.
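A minimal sketch of how the c-1 binary outputs can be combined, assuming the k-th binary classifier estimates P(y > k) as in Frank and Hall's scheme. The function name and the example probabilities are hypothetical. The second call shows how a negative value arises when the binary estimates are not monotone in k.

```python
def class_probabilities(p_greater):
    """Combine estimates p_greater[k-1] = P(y > k), k = 1..c-1, into P(y = 1..c)."""
    c = len(p_greater) + 1
    probs = [1.0 - p_greater[0]]                      # P(y = 1) = 1 - P(y > 1)
    for k in range(1, c - 1):
        probs.append(p_greater[k - 1] - p_greater[k]) # P(y = k+1) = P(y > k) - P(y > k+1)
    probs.append(p_greater[-1])                       # P(y = c) = P(y > c-1)
    return probs

# Monotone estimates give a valid distribution over 4 classes.
consistent = class_probabilities([0.9, 0.6, 0.2])
predicted = max(range(len(consistent)), key=consistent.__getitem__) + 1

# Non-monotone estimates (0.5 < 0.7) give a negative "probability" for class 2.
inconsistent = class_probabilities([0.5, 0.7, 0.2])
```

Because the binary classifiers are trained independently, nothing forces their estimates to decrease with k, which is why this failure can occur in practice.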

Notes: Classification tasks have only one y. Thus, Frank and Hall (2001) split an ordinal classification task into several binary classifiers according to the class value.

Related Work: Ordinal Support Vector Machine
SVM is a supervised learning method with versions for both classification and regression tasks: SVC and SVR.

Support Vector Machine for Classification (SVC). Unless otherwise specified, SVM stands for SVC.

Support Vector Machine for Regression (SVR)

Related Work: SVM ABC
The maximum-margin hyperplane is the perpendicular bisector of the shortest line connecting the convex hulls.
The instances closest to the maximum-margin hyperplane are called support vectors.

Related Work: SVM (linear)
Primal and dual formulations

Related Work: SVM (soft margin)
Primal and dual formulations
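For reference, the soft-margin primal and dual problems can be written in the usual textbook form. The notation (weights w, bias b, slack variables ξ_i, penalty C, multipliers α_i, labels y_i in {-1, +1}) is the common convention and an assumption here, not necessarily the notation of the original slides.

```latex
% Primal (soft margin)
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

% Dual (soft margin)
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,x_i^{\top}x_j
\quad\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C
```

The linear (hard-margin) case is obtained by dropping the slack variables ξ_i, which removes the upper bound C on α_i in the dual.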

Related Work: SVM's kernel trick

Sometimes a problem that does not seem linearly separable can be separated linearly after mapping the instances into a higher-dimensional space. But SVM involves inner products between pairs of instances, and mapping them into the higher dimension before computing the inner product can be computationally intensive.

Related Work: SVM (nonlinear)
Fortunately, kernel functions can be used to relieve this burden.
Common kernel functions:
- Polynomial
- Radial Basis Function
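The two kernels can be sketched in a few lines. The parameter names (degree, coef0, gamma) and their default values are assumptions chosen for illustration. The last lines check the kernel trick on one pair of 2-D points: the degree-2 polynomial kernel computed in the input space matches the inner product taken after an explicit mapping into three dimensions.

```python
import math

def dot(x, z):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, coef0=0.0):
    """(x . z + coef0) ** degree, computed without leaving the input space."""
    return (dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def explicit_quadratic_map(x):
    """Explicit feature map matching the degree-2 kernel with coef0 = 0 in 2-D."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, 4.0]
via_kernel = polynomial_kernel(x, z)
via_mapping = dot(explicit_quadratic_map(x), explicit_quadratic_map(z))
```

For the RBF kernel no finite explicit mapping exists, so evaluating the kernel directly is the only practical route.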

Related Work: oSVM (Cardoso and da Costa, 2007)

Related Work: oSVM (cont'd)

Related Work: oSVM (cont'd)

Related Work: Fuzzy SVM (Lin and Wang, 2002)
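In the fuzzy SVM each instance i carries a membership m_i in (0, 1] that scales the penalty on its slack variable, so low-membership instances are allowed to violate the margin more cheaply. A sketch in standard notation follows; the symbols w, b, ξ_i, C, and α_i are the usual SVM convention and an assumption here, not necessarily the notation of the original slide.

```latex
% Primal (fuzzy SVM): the slack of instance i is weighted by its membership m_i
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n} m_i\,\xi_i
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

% Dual (fuzzy SVM): same objective as the soft-margin dual, per-instance upper bound m_i C
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,x_i^{\top}x_j
\quad\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le m_i\,C
```

Setting every m_i to 1 recovers the ordinary soft-margin SVM.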

Define m_i as the membership of instance i; the larger an instance's m_i is, the more important that instance is.
Primal and dual formulations

Outline
- Introduction
- Related Work
- Proposed Method
- Experiments and Results
- Conclusions

oSVM's error for grade dataset
Is oSVM good enough? No! It is easily influenced by noise.

oSVM's error for grade dataset (cont'd)

Ranges per layer, in class order (table columns: class 1 to class 5):
1st layer: (-5.22, -0.29)  (0.36, 4.87)  (5.04, 9.3)
2nd layer: (-9.87, -4.94)  (-4.29, 0.22)  (0.39, 4.65)  (4.96, 8.14)
3rd layer: (-8.8, -4.28)  (-4.12, 0.14)  (0.45, 3.64)  (4.7, 4.98)
4th layer: (-8.09, -3.83)  (-3.53, -0.34)  (0.73, 1)

Regression values per layer:
True  Predicted  Layer 1  Layer 2  Layer 3  Layer 4
1     1          -1.23    -5.88    -10.39   -14.36
1     2          0.3      -4.35    -8.85    -12.83
2     3          5.84     1.19     -3.32    -7.29
2     2          0.23     -4.42    -8.93    -12.9
2     2          4.14     -0.51    -5.02    -9
3     3          4.9      0.25     -4.25    -8.23
3     3          7.93     3.28     -1.23    -5.2
3     3          8.62     3.97     -0.54    -4.51
4     3          9        4.35     -0.16    -4.13
4     4          9.26     4.61     0.11     -3.87
5     5          14.09    9.44     4.94     0.96

oSVM's error for ERA dataset
It goes wrong when the class boundaries are vague.
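The grade-dataset regression values listed above are consistent with a simple decision rule: the predicted class is one plus the number of layers whose value is positive. The sketch below checks that rule against the eleven rows of that table. The rule is inferred from the table rather than stated on the slide, and the function name is hypothetical.

```python
def predict_class(layer_outputs):
    """Return 1 + the number of layers whose regression value is positive."""
    return 1 + sum(1 for v in layer_outputs if v > 0)

# (true class, predicted class reported on the slide, regression values for layers 1-4)
grade_rows = [
    (1, 1, [-1.23, -5.88, -10.39, -14.36]),
    (1, 2, [0.3, -4.35, -8.85, -12.83]),
    (2, 3, [5.84, 1.19, -3.32, -7.29]),
    (2, 2, [0.23, -4.42, -8.93, -12.9]),
    (2, 2, [4.14, -0.51, -5.02, -9.0]),
    (3, 3, [4.9, 0.25, -4.25, -8.23]),
    (3, 3, [7.93, 3.28, -1.23, -5.2]),
    (3, 3, [8.62, 3.97, -0.54, -4.51]),
    (4, 3, [9.0, 4.35, -0.16, -4.13]),
    (4, 4, [9.26, 4.61, 0.11, -3.87]),
    (5, 5, [14.09, 9.44, 4.94, 0.96]),
]

recomputed = [predict_class(values) for _, _, values in grade_rows]
misclassified = [(true, pred) for true, pred, _ in grade_rows if true != pred]
```

Under this rule a value only slightly above zero, such as 0.3 in the second row, is enough to push an instance into the next class, which illustrates the sensitivity to noise.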


oSVM's error for ERA dataset (cont'd)

oSVM's error for ERA dataset (cont'd)

Ranges per layer, in class order (table columns: class 1 to class 9):
1st: (0.97, 1.06)  (0.97, 1.16)  (0.97, 1.35)
2nd: (0.91, 1)  (0.91, 1.1)  (0.91, 1.29)  (0.91, 1.1)
3rd: (-1, -0.81)  (-1, -0.62)  (-1, -0.81)  (-1, -0.86)
4th: (-1, -0.62)  (-1, -0.81)  (-1, -0.86)  (-1, -0.62)
5th: (-1, -0.81)  (-1, -0.86)  (-1, -0.62)  (-0.98, -0.62)
6th: (-1.03, -0.89)  (-1.03, -0.66)  (-1.01, -0.66)  (-1, -0.66)
7th: (-1.1, -0.73)  (-1.08, -0.73)  (-1.07, -0.73)  (-1.06, -0.73)
8th: (-1.17, -0.81)  (-1.16, -0.81)  (-1.15, -0.81)

Proposed Membership function 1
[Figure: instance A relative to class 1 through class 4, with "min d" distance labels and d11]

For any instance x_ij, which belongs to class i, its membership degree can be defined as:
[equation]

Proposed Membership function 2
[Figure: hyperplane, instance A, and class 1 through class 4, with distance labels d1, d2, d3, d4 and d1i, d2i, d3i, d4i]

For any instance x_ij, which belongs to class i, its membership degree can be defined as:
[equation]

Outline
- Introduction
- Related Work
- Proposed Method
- Experiments and Results
- Conclusions

Experiments
- The oSVM code is from Cardoso and da Costa (2007), in MATLAB.
- We build on their code to develop our FoSVM.
- ordinalClassifier is from Weka (C4.5 is the base classifier).
- SVR is from libsvm; of its two variants, ε-SVR and ν-SVR, the better-performing one is used.
- 10 datasets are used for comparing these five classifiers.
- Two measures are employed to compare performance: mean zero-one error and mean absolute error.
- The experiments use 10-fold cross-validation to obtain the overall performance.
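The 10-fold protocol can be sketched as below. This is an illustration under assumptions, not the code used in the experiments: the shuffling seed, the use of the sample standard deviation, and the `evaluate_fold` callback (which would train on one index list and return the error on the other) are all hypothetical.

```python
import random
import statistics

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, evaluate_fold, k=10, seed=0):
    """Call evaluate_fold(train_idx, test_idx) once per fold; return (mean, std) of the errors."""
    folds = k_fold_indices(n, k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        errors.append(evaluate_fold(train_idx, test_idx))
    return statistics.mean(errors), statistics.stdev(errors)

# Dummy evaluator that just reports the test-fold share, to show the call shape.
mean_err, std_err = cross_validate(100, lambda train, test: len(test) / 100)
```

Each instance is used for testing exactly once, so the reported mean summarizes performance over the whole dataset while the spread reflects fold-to-fold variation.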

10 Datasets

Dataset    Instances  Features  Classes
Auto MPG   392        7         10
Diabetes   43         2         10
ERA        268        4         9
ESL        261        4         8
LEV        243        4         5
SWD        280        10        4
Grade      100        3         5
wpbc       194        32        10
machine    209        6         10
stock      950        9         10

Classifier comparison: mean 0-1 error (mean ± standard deviation)

Dataset    oSVM               FoSVM (MF1)        FoSVM (MF2)        ordinalClassifier  libsvm
Auto MPG   47.14% ± 3.46%     44.79% ± 3.97%     51.3% ± 4.35%      53.12% ± 2.83%     *83.44% ± 16.61%
Diabetes   84.23% ± 1.72%     *84.62% ± 10.59%   75.77% ± 12.04%    78.85% ± 14.09%    71.92% ± 9.75%
ERA        84.65% ± 1.63%     85.55% ± 4.59%     84.23% ± 1.33%     87.31% ± 5.6%      85.42% ± 5.84%
ESL        38.98% ± 9.54%     44.84% ± 12.35%    42.94% ± 8.93%     *55.9% ± 8.98%     53.28% ± 8.35%
LEV        66.57% ± 10.9%     67.82% ± 7.03%     72.88% ± 12.09%    73.43% ± 9.37%     *74.5% ± 12.19%
SWD        79.73% ± 8.89%     *82.48% ± 5.68%    73.99% ± 10.5%     64.27% ± 3.3%      72.45% ± 10.41%
Grade      12.57% ± 9.35%     8.09% ± 4.68%      9.74% ± 8.66%      41.7% ± 20.43%     *43.65% ± 9.31%
wpbc       84.67% ± 8.92%     86.75% ± 8.71%     83.91% ± 6.64%     87.37% ± 5.12%     86.54% ± 3.33%
machine    34.2% ± 13.88%     32.78% ± 7.38%     32.87% ± 12.92%    28.82% ± 14.7%     *43.34% ± 14.52%
stock      25.82% ± 4.2%      22.5% ± 3.78%      29.07% ± 5.2%      21.61% ± 5.02%     *53.9% ± 4.88%

Notes: Mean zero-one error gives an error of 1 to every incorrect prediction; that is, it is the fraction of incorrect predictions.

Classifier comparison: MAE (mean ± standard deviation)

Dataset    oSVM               FoSVM (MF1)        FoSVM (MF2)        ordinalClassifier  libsvm
Auto MPG   0.5654 ± 0.4326    0.5305 ± 0.0502    0.6427 ± 0.0704    0.6726 ± 0.0451    *1.4313 ± 0.0294
Diabetes   1.8193 ± 0.0875    1.8577 ± 0.4156    1.5 ± 0.4254       *1.8923 ± 0.71     1.0808 ± 0.1444
ERA        1.9394 ± 0.1477    *2.0575 ± 0.3377   1.9577 ± 0.1474    1.7755 ± 0.1049    1.7451 ± 0.2711
ESL        0.4191 ± 0.1234    0.4736 ± 0.1364    0.4509 ± 0.1086    0.6146 ± 0.3535    *0.657 ± 0.0424
LEV        0.8065 ± 0.1495    0.826 ± 0.139      0.9447 ± 0.2057    0.9023 ± 0.1102    *1.1290 ± 0.1964
SWD        0.8807 ± 0.1308    0.9113 ± 0.0768    0.9239 ± 0.146     0.6984 ± 0.0859    *1.0403 ± 0.1694
Grade      0.1257 ± 0.0935    0.0809 ± 0.0468    0.0974 ± 0.0866    0.426 ± 0.2037     *0.4365 ± 0.0931
wpbc       2.3984 ± 0.5331    2.6403 ± 0.5797    2.6582 ± 0.2525    2.7002 ± 0.7106    2.3092 ± 0.1139
machine    0.4817 ± 0.2577    0.4435 ± 0.1725    0.4816 ± 0.2444    0.3928 ± 0.2092    *0.6636 ± 0.3419
stock      0.2582 ± 0.042     0.225 ± 0.0378     0.2927 ± 0.0597    0.2256 ± 0.0587    *0.6216 ± 0.0765

Notes: Mean absolute error is the average deviation of the prediction from the true target, in which we treat the ordinal scales as consecutive integers.
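The two measures can be written down directly from their definitions. This is a minimal sketch with hypothetical function names and made-up labels; classes are assumed to be coded as consecutive integers.

```python
def mean_zero_one_error(y_true, y_pred):
    """Fraction of predictions that differ from the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute rank difference between predicted and true class."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels: two mistakes, one off by one rank and one off by two.
y_true = [1, 2, 3, 4, 5]
y_pred = [1, 3, 3, 2, 5]
mze = mean_zero_one_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
```

The zero-one error counts both mistakes equally, while the absolute error charges the two-rank miss twice as much. When every mistake is off by exactly one rank, the two measures coincide.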

Outline
- Introduction
- Related Work
- Proposed Method
- Experiments and Results
- Conclusions

Membership function 3
[Figure: hyperplane, instance A, and class 1 through class 4, with distance labels d1, d2, d3, d4 and d1i, d2i, d3i, d4i, and a fuzzy width]

For any instance x_ij, which belongs to class i, its membership degree can be defined as:
[equation]

References

Cardoso, J. S., and da Costa, J. F. P. (2007), "Learning to classify ordinal data: The data replication method," Journal of Machine Learning Research, Vol. 8, pp. 1393-1429.

Frank, E., and Hall, M. (2001), "A simple approach to ordinal classification," Proceedings of the European Conference on Machine Learning, pp. 145-165.

Kramer, S., Widmer, G., Pfahringer, B., and DeGroeve, M. (2001), "Prediction of ordinal classes using regression trees," Fundamenta Informaticae, Vol. 47, pp. 1-13.

Lin, C.-F., and Wang, S.-D. (2002), "Fuzzy Support Vector Machines," IEEE Transactions on Neural Networks, Vol. 13, No. 2, pp. 464-471.
