Feature Selection and Ranking Method based on Intuitionistic Fuzzy Matrix and Rough Sets

In this paper we propose a novel rough-fuzzy hybridization technique for the feature selection and feature ranking problem. The idea is to model the local preference relation between pairs of features by intuitionistic fuzzy values and to search for a feature ranking that is consistent with those constraints. We apply techniques used in group decision making, where constraints are presented in the form of an intuitionistic fuzzy preference relation. The proposed method is illustrated by some simple examples and verified on a benchmark dataset.


I. INTRODUCTION
Decision making plays an extremely important role in real-life problems. With the strong development of datasets containing a great number of features, recognizing the preferred features is a challenging task. In recent years, rough-fuzzy hybridization has become a hot trend with great success in machine learning, data mining, decision making, etc. [1].
Usually, a group decision-making process must collect all decision makers' opinions, establish a suitable method for measuring them, obtain the final scores of all alternatives, and then rank them. Decision makers have to rank the alternatives and find the most preferred one in order to make a decision. Preference relations are therefore effective techniques for gathering the views of a group of decision makers. Recently, many methods based on preference relations have been developed [2], [3], [4], [5], [6], [7]. To solve decision-making problems with uncertain information or imprecise judgments, preference relations based on Zadeh's fuzzy sets have been proposed. In Zadeh's fuzzy theory, the degrees of membership and non-membership sum to one, which ignores the decision maker's hesitation in the decision-making process. Atanassov's intuitionistic fuzzy sets fully express affirmation, negation and hesitation. In particular, the ability of intuitionistic fuzzy preference relations to capture affirmative, negative and hesitant judgments makes this research problem increasingly attractive.
Real-world problems may involve numerous irrelevant features, and in such cases feature selection helps decision makers uncover the important information hidden in the full dataset. Feature selection (FS) methods belong to one of three main groups: embedded, filter or wrapper methods. FS can be defined as selecting a subset of the available features in a dataset that is associated with the response variable, by excluding irrelevant and unnecessary features. An alternative to FS for dimensionality reduction is feature extraction (FE), in which the original features are combined and projected into a new feature space of smaller dimensionality. In this paper, we interpret feature selection and feature ranking as decision-making problems and apply recent techniques for solving them.
In the group decision-making process, decision makers (DMs) often differ in how they express degrees of certainty, uncertainty or hesitation. Therefore, it is necessary to define an intuitionistic fuzzy preference relation capable of representing all these judgments. The consistency of intuitionistic fuzzy preference relations (IFPRs) and the priority weights derived from these relations play a vital role in reaching the best group decision. In some works, the consequences of additively consistent and multiplicatively consistent IFPRs on priority weights are examined and used to calculate the priority weights. Numerical analyses have shown that the rankings induced by the individual priority weights do not differ significantly, despite the differences between the individual priority weight vectors. The intuitionistic fuzzy preference relation is an indispensable tool enabling decision makers to judge the superiority or inferiority of one object over another in the presence of fuzziness. Ranking methods for alternatives with intuitionistic fuzzy information solve group decision problems straightforwardly and efficiently.
The paper is organized as follows: in Section II we recall some basic notions of intuitionistic fuzzy set theory and rough set theory. Section III describes ranking methods that are consistent or semi-consistent with one or more intuitionistic fuzzy preference relations. In Section IV, we present a rough-fuzzy hybridization method for feature ranking and illustrate the proposed method on some simple examples. The results of experiments on the accuracy of the feature ranking methods in the context of the classification task are reported in Section V. Conclusions and plans for future research are presented in Section VI.

II. BASIC NOTIONS
In this section we present some fundamental knowledge about intuitionistic fuzzy sets, intuitionistic fuzzy relations and feature reduction in rough set theory.

A. Intuitionistic fuzzy sets and Intuitionistic fuzzy matrices
Fuzzy set (FS) theory, introduced by Zadeh in 1965 [8], considers only the degrees of membership and non-membership, without capturing the hesitation of an undecided decision maker. Atanassov's intuitionistic fuzzy set (IFS) theory [9] fully expresses the affirmation, negation and hesitation of decision makers. Therefore, in real-life situations, IFS theory handles such problems more successfully than FS theory. In this part, some basic notions related to IFS are recalled.
The class of IFSs in a universe $X$ is denoted by $IFS(X)$.
Let $F$ be the set of pairs $(a_1, a_2)$, where $a_1, a_2 \in [0, 1]$ and $a_1 + a_2 \le 1$, i.e.
$$F = \{(a_1, a_2) \in [0,1]^2 : a_1 + a_2 \le 1\}.$$
A partial order $\le_F$ over $F$ is defined by
$$(a_1, a_2) \le_F (b_1, b_2) \iff a_1 \le b_1 \ \text{and} \ a_2 \ge b_2.$$
The elements of $F$ are called intuitionistic fuzzy values (IFVs), of which $(0, 1)$ is the least element and $(1, 0)$ is the greatest element. The operations in $(F, \le_F)$ are defined by
$$(a_1, a_2) \vee (b_1, b_2) = (\max(a_1, b_1), \min(a_2, b_2)), \qquad (a_1, a_2) \wedge (b_1, b_2) = (\min(a_1, b_1), \max(a_2, b_2)).$$
Each IFS of a universe $X$ is in fact a map from $X$ to $F$. If $A$ and $B$ are two IFSs given by $(\mu_A, \nu_A)$ and $(\mu_B, \nu_B)$, respectively, then the union, intersection and complement are defined as follows:
$$A \cup B = (\max(\mu_A, \mu_B), \min(\nu_A, \nu_B)), \qquad A \cap B = (\min(\mu_A, \mu_B), \max(\nu_A, \nu_B)), \qquad A^c = (\nu_A, \mu_A).$$
Relations (between two sets $X$ and $Y$) in traditional set theory are defined as subsets of the Cartesian product $X \times Y$.
It is quite natural to define intuitionistic fuzzy relations as IFSs in $X \times Y$. If $X = \{x_1, \dots, x_m\}$ and $Y = \{y_1, \dots, y_n\}$, then any intuitionistic fuzzy relation in $X \times Y$ can be represented by an $m \times n$ matrix $R = (\rho_{ij})_{m \times n}$, where $\rho_{ij} = (\mu_{ij}, \nu_{ij}) \in F$ is the IFV describing the membership and non-membership of $(x_i, y_j)$ to this relation. The concepts of intuitionistic fuzzy relation and intuitionistic fuzzy matrix (IFM) have been studied by many authors [10], [11], [12]. The IFM is a generalization of the fuzzy matrix (FM) and has been useful in decision making, clustering analysis, relational equations, etc.
Since an IFM is an extension of an FM obtained by replacing values from $[0, 1]$ with IFVs, i.e. elements of $F = \{(a_1, a_2) \in [0,1]^2 : a_1 + a_2 \le 1\}$, and the fuzzy operations $\vee$ and $\wedge$ extend to the elements of $F$, most operations on fuzzy matrices can also be extended to IFMs. In particular, for $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{m \times n}$ with entries in $F$:
• Disjunction and conjunction are defined elementwise: $A \vee B = (a_{ij} \vee b_{ij})_{m \times n}$ and $A \wedge B = (a_{ij} \wedge b_{ij})_{m \times n}$.
• Comparison: $A \le B \iff a_{ij} \le_F b_{ij}$ for all $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, n$.
• The transpose of $A = (a_{ij})_{m \times n}$ is $A^T = (x_{ij})_{n \times m}$, where $x_{ij} = a_{ji}$.
• The composition of two relations, or the product of two matrices $A \in F^{m \times n}$ and $C \in F^{n \times l}$, is the matrix $D = (d_{ij})_{m \times l}$ with $d_{ij} = \bigvee_{k=1}^{n} (a_{ik} \wedge c_{kj})$. This operation is denoted by $D = A \circ C$.
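To make these operations concrete, here is a minimal Python sketch of IFVs as (mu, nu) pairs and of the max-min composition of IFMs; the function names and the list-of-lists matrix layout are ours, not part of the paper.

```python
# IFVs are pairs (mu, nu) with mu + nu <= 1; IFMs are lists of lists of IFVs.

def ifv_join(a, b):
    """Disjunction: max of memberships, min of non-memberships."""
    return (max(a[0], b[0]), min(a[1], b[1]))

def ifv_meet(a, b):
    """Conjunction: min of memberships, max of non-memberships."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def ifv_leq(a, b):
    """Partial order on F: a <=_F b iff mu_a <= mu_b and nu_a >= nu_b."""
    return a[0] <= b[0] and a[1] >= b[1]

def ifm_compose(A, C):
    """Max-min product D = A o C of an m x n IFM and an n x l IFM."""
    m, n, l = len(A), len(C), len(C[0])
    D = [[(0.0, 1.0)] * l for _ in range(m)]        # (0, 1) is the least IFV
    for i in range(m):
        for j in range(l):
            acc = (0.0, 1.0)
            for k in range(n):
                acc = ifv_join(acc, ifv_meet(A[i][k], C[k][j]))
            D[i][j] = acc
    return D
```

For example, `ifm_compose([[(0.6, 0.3), (0.2, 0.7)]], [[(0.5, 0.4)], [(0.9, 0.0)]])` composes a 1 × 2 relation with a 2 × 1 relation and returns a 1 × 1 IFM.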

B. Feature selection and feature ranking problem
In machine learning, a classification task is defined as the problem of learning the partition of a set of objects into subsets called decision classes (or briefly classes). The partition should be expressed in terms of object features. Let $U$ be a set of objects. Any function $a : U \to V_a$, where $V_a$ is the domain (the set of possible values) of $a$, is called a feature or attribute for $U$. If $a$ is a measurement, such as a person's weight, height or blood pressure, or the weather temperature, i.e. $V_a$ is a real interval, then $a$ is called a numeric or quantitative feature. Otherwise, if the values in $V_a$ are not comparable, or if they cannot be ordered linearly, then $a$ is called a categorical, symbolic or qualitative feature.
A decision table is a tuple $T = (U, A \cup \{d\})$, where $U = \{u_1, u_2, \dots, u_n\}$ is a finite set of objects, $A = \{a_1, a_2, \dots, a_m\}$ is a finite set of features called conditional attributes, and $d$ is the decision attribute, i.e. the attribute defining the partition of objects into decision classes. The goal of the feature selection (FS) problem for a decision table $T = (U, A \cup \{d\})$ is to determine the minimal subsets of $A$ satisfying a particular classification performance requirement. The feature ranking problem is to order the features in a ranking list so that the more important features appear at the beginning of the list while the less important features appear at the end.

C. Rough Sets and feature selection problem
In rough set theory, the FS problem is formulated as a problem of searching for reducts. Intuitively, reducts of a decision table $T = (U, A \cup \{d\})$ are the minimal subsets (with respect to inclusion) of the set of all attributes $A$ that preserve the classification performance of the original decision table.
In the pioneering papers in rough set theory [13], [14], [15], classification performance was defined in terms of discernibility between objects. Formally, for any subset $B \subseteq A$ and two objects $u, v \in U$, we say that $u, v$ are discernible by $B$ if there exists $a \in B$ such that $a(u) \neq a(v)$. The set $B \subseteq A$ is called a reduct of the decision table $T$ if:
• for any $u, v \in U$, if $u, v$ are discernible by $A$ and discernible by $\{d\}$, then $u, v$ are discernible by $B$;
• no proper subset of $B$ satisfies the previous condition.
This original concept of reducts has been generalized by many researchers [16], [17], [18]. However, the solution space for the problem of searching for the reduct with minimal cardinality has size $2^m - 1$, where $m = |A|$.
The first approach to the minimal reduct problem was proposed in [14]. For a given decision table $T = (U, A \cup \{d\})$, the discernibility matrix is the matrix $M(T) = (S_{ij})_{n \times n}$, where
$$S_{ij} = \{a \in A : a(u_i) \neq a(u_j)\} \ \text{if} \ d(u_i) \neq d(u_j), \quad S_{ij} = \emptyset \ \text{otherwise}.$$
An example of a discernibility matrix is presented in Table II.
One can notice that a subset of attributes $B \subseteq A$ discerns a pair of objects $u_i, u_j \in U$ if and only if $B \cap S_{ij} \neq \emptyset$.
Let us notice that for any $u_i, u_j \in U$ we have $S_{ij} = S_{ji}$. Therefore, in the case of a decision table with two decision classes, the discernibility matrix can be simplified into a $p \times q$ matrix, where $p$ and $q$ are the cardinalities of the two decision classes.
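As an illustration, the simplified matrix can be computed directly from the definition; the sketch below assumes a list-of-rows data layout and a binary decision list with values −1 and 1, both our conventions.

```python
# A sketch of the simplified p x q discernibility matrix for a two-class
# decision table: rows index one decision class, columns the other.

def discernibility_matrix(rows, decisions):
    """S[i][j] = set of attribute indices discerning the i-th object of the
    negative class from the j-th object of the positive class."""
    neg = [r for r, d in zip(rows, decisions) if d == -1]
    pos = [r for r, d in zip(rows, decisions) if d == 1]
    return [[{k for k in range(len(u)) if u[k] != v[k]} for v in pos]
            for u in neg]
```

A subset `B` (as a Python set of indices) discerns the pair `(i, j)` exactly when `B & S[i][j]` is non-empty, mirroring the condition above.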
Since the minimal reduct calculation problem is NP-hard, many heuristic algorithms use random permutations of features [16], [19] as a nondeterministic policy, searching for a reduct according to the attribute order defined by a given permutation.
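A minimal sketch of this permutation-based scheme follows, assuming the simplified discernibility matrix from the previous sketch; this is one common variant of the cited heuristics, not necessarily the exact algorithm of [16], [19].

```python
# Scan attributes in a random order and drop each one whose removal keeps
# every non-empty cell of the discernibility matrix covered. Because coverage
# is monotone, the surviving set is a reduct with respect to the matrix.
import random

def permutation_reduct(S, n_attrs):
    """S: simplified discernibility matrix (sets of attribute indices)."""
    cells = [c for row in S for c in row if c]        # non-empty entries only
    B = set(range(n_attrs))
    for a in random.sample(range(n_attrs), n_attrs):  # random permutation
        candidate = B - {a}
        if all(candidate & c for c in cells):         # still discerns every pair
            B = candidate
    return B
```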
III. RANKING METHODS BASED ON INTUITIONISTIC FUZZY PREFERENCE RELATIONS
In this section we present a method of using intuitionistic fuzzy sets to approximate the concept of fuzzy preference [20], [21], [22].

Definition 3 (Intuitionistic fuzzy preference relation). An intuitionistic fuzzy preference relation on a set $X = \{x_1, \dots, x_n\}$ is a matrix $B = ((\mu_{ij}, \nu_{ij}))_{n \times n}$ whose entries are intuitionistic fuzzy values, composed of the certainty degree $\mu_{ij}$ to which $x_i$ is preferred to $x_j$ and the certainty degree $\nu_{ij}$ to which $x_i$ is non-preferred to $x_j$; $\pi_{ij} = 1 - \mu_{ij} - \nu_{ij}$ is interpreted as the hesitation degree to which $x_i$ is preferred to $x_j$. Moreover, $\mu_{ij}$ and $\nu_{ij}$ satisfy the following conditions:
$$\mu_{ij} = \nu_{ji}, \qquad \mu_{ii} = \nu_{ii} = 0.5, \qquad \mu_{ij} + \nu_{ij} \le 1$$
for all $i, j = 1, 2, \dots, n$.
Usually, the intuitionistic fuzzy preference relation expresses the opinions of the decision makers about each pair of choices (alternatives), but we would like to convert this relation into a linear order (a ranking list). We can do it by assigning a weight $w_i$ to the $i$-th choice so that a higher weight means a more preferred choice. Without loss of generality, we can assume that the weight vector is a probability vector, i.e. a vector $w = (w_1, w_2, \dots, w_n)^T$ such that $w_i \in [0, 1]$ for $i = 1, \dots, n$ and $\sum_{i=1}^{n} w_i = 1$. We can define the concept of a consistent preference relation as follows:

Definition 4 (Additive consistent preference relation). An intuitionistic fuzzy preference relation $B = ((\mu_{ij}, \nu_{ij}))_{n \times n}$ on $X = \{x_1, \dots, x_n\}$ is an additive consistent preference relation if there exists a probability vector $w = (w_1, w_2, \dots, w_n)^T$ satisfying the condition:
$$\mu_{ij} \le 0.5(w_i - w_j) + 0.5 \le 1 - \nu_{ij} \qquad (1)$$
for all $1 \le i < j \le n$.
It is obvious that not every intuitionistic fuzzy preference relation is an additive consistent relation. In that case the condition in Eq. (1) can be relaxed by introducing nonnegative deviation variables $l_{ij}$ and $r_{ij}$ for $1 \le i < j \le n$ such that
$$\mu_{ij} - l_{ij} \le 0.5(w_i - w_j) + 0.5 \le 1 - \nu_{ij} + r_{ij}$$
for all $1 \le i < j \le n$. As the deviation variables $l_{ij}$ and $r_{ij}$ become smaller, $B$ becomes closer to an additive consistent intuitionistic fuzzy preference relation. Therefore, in order to find the smallest deviation variables, one can formulate the following linear optimization model [23], [21]:

Model (A1): minimize $\delta = \sum_{1 \le i < j \le n} (l_{ij} + r_{ij})$ subject to
$$\mu_{ij} - l_{ij} \le 0.5(w_i - w_j) + 0.5 \le 1 - \nu_{ij} + r_{ij}, \quad l_{ij}, r_{ij} \ge 0 \ \text{for} \ 1 \le i < j \le n,$$
$$w_i \ge 0 \ \text{for} \ i = 1, \dots, n, \qquad \sum_{j=1}^{n} w_j = 1.$$

Let $\delta^o$ be the optimal value and let $l^o_{ij}$ and $r^o_{ij}$ for $1 \le i < j \le n$ be the optimal deviation values of the optimization model (A1). One can see that if $\delta^o = 0$ then $B$ is an additive consistent intuitionistic fuzzy preference relation. Otherwise, we can improve the additive consistency of $B$ by defining a new intuitionistic fuzzy preference relation $\hat{B} = ((\hat{\mu}_{ij}, \hat{\nu}_{ij}))_{n \times n}$, where
$$\hat{\mu}_{ij} = \mu_{ij} - l^o_{ij}, \qquad \hat{\nu}_{ij} = \nu_{ij} - r^o_{ij} \qquad \text{for} \ 1 \le i < j \le n,$$
with the remaining entries determined by the IFPR conditions. Based on the matrix $\hat{B}$ we can calculate the priority weight vector $w = (w_1, \dots, w_n)^T$ by establishing weight intervals $[w^-_k, w^+_k]$ for each $k = 1, \dots, n$. In order to do that, we solve the following optimization models:

Model (A2): for each $k = 1, 2, \dots, n$:
$$w^-_k = \min w_k, \qquad w^+_k = \max w_k \qquad \text{subject to (*)},$$
$$w_i \ge 0 \ \text{for} \ i = 1, \dots, n, \qquad \sum_{j=1}^{n} w_j = 1,$$
where (*) is the condition $\hat{\mu}_{ij} \le 0.5(w_i - w_j) + 0.5 \le 1 - \hat{\nu}_{ij}$, which must hold for all $1 \le i < j \le n$.
It has been shown [23] that if $\hat{B}$ is additive consistent then Model (A2) returns a unique solution for the considered optimization problem.
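For illustration, model (A1) can be expressed as a standard linear program. The sketch below uses scipy and encodes the relaxed constraints as reconstructed above; treat it as our reading of [23], [21], not a reference implementation. Model (A2) is obtained analogously by re-running `linprog` with objective $\pm w_k$ for each $k$ under the (*) constraints.

```python
# Model (A1) as a linear program: variables are [w (n), l (p), r (p)],
# where p = n(n-1)/2 pairs i < j; objective: minimize sum(l) + sum(r).
import numpy as np
from scipy.optimize import linprog

def model_a1(mu, nu):
    n = mu.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    p = len(pairs)
    c = np.concatenate([np.zeros(n), np.ones(2 * p)])
    A_ub, b_ub = [], []
    for t, (i, j) in enumerate(pairs):
        row = np.zeros(n + 2 * p)        # mu_ij - l_ij <= 0.5(w_i - w_j) + 0.5
        row[i], row[j], row[n + t] = -0.5, 0.5, -1.0
        A_ub.append(row); b_ub.append(0.5 - mu[i, j])
        row = np.zeros(n + 2 * p)        # 0.5(w_i - w_j) + 0.5 <= 1 - nu_ij + r_ij
        row[i], row[j], row[n + p + t] = 0.5, -0.5, -1.0
        A_ub.append(row); b_ub.append(0.5 - nu[i, j])
    A_eq = np.concatenate([np.ones(n), np.zeros(2 * p)]).reshape(1, -1)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=[1.0], bounds=(0, None))
    # optimal delta, weights, and the deviations l, r
    return res.fun, res.x[:n], res.x[n:n + p], res.x[n + p:]
```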
Definition 5 (Multiplicative consistent preference relation). An intuitionistic fuzzy preference relation $B = ((\mu_{ij}, \nu_{ij}))_{n \times n}$ on $X = \{x_1, \dots, x_n\}$ is a multiplicative consistent preference relation if there exists a probability vector $w = (w_1, w_2, \dots, w_n)^T$ satisfying the condition:
$$\mu_{ij} \le \frac{w_i}{w_i + w_j} \le 1 - \nu_{ij}$$
for all $1 \le i < j \le n$.

Checking for multiplicative consistency is quite similar to additive consistency. In this case, we can establish the optimization model (M1), which minimizes $\delta = \sum_{1 \le i < j \le n} (l_{ij} + r_{ij})$ subject to $\mu_{ij} - l_{ij} \le w_i/(w_i + w_j) \le 1 - \nu_{ij} + r_{ij}$ and the probability-vector constraints. In contrast to model (A1), this model is nonlinear.
Let $\delta^*$ be the optimal value and let $l^*_{ij}$ and $r^*_{ij}$ for $1 \le i < j \le n$ be the optimal deviation values of the optimization model (M1). One can see that if $\delta^* = 0$ then $B$ is a multiplicative consistent intuitionistic fuzzy preference relation. Otherwise, we can improve the multiplicative consistency of $B$ by defining a new intuitionistic fuzzy preference relation $B^* = ((\mu^*_{ij}, \nu^*_{ij}))_{n \times n}$, where
$$\mu^*_{ij} = \mu_{ij} - l^*_{ij}, \qquad \nu^*_{ij} = \nu_{ij} - r^*_{ij} \qquad \text{for} \ 1 \le i < j \le n.$$
Based on the matrix $B^*$ we can calculate the priority weight vector $w = (w_1, \dots, w_n)^T$ by establishing weight intervals $[w^-_k, w^+_k]$ for each $k = 1, \dots, n$. In order to do that, we solve the following optimization models:

Model (M2): for each $k = 1, 2, \dots, n$:
$$w^-_k = \min w_k, \qquad w^+_k = \max w_k \qquad \text{subject to (*)},$$
$$w_i \ge 0 \ \text{for} \ i = 1, \dots, n, \qquad \sum_{j=1}^{n} w_j = 1,$$
where (*) is the condition $\mu^*_{ij} \le w_i/(w_i + w_j) \le 1 - \nu^*_{ij}$, which must hold for all $1 \le i < j \le n$.
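A corresponding sketch for the nonlinear model (M1) using scipy's SLSQP solver; the constraint form follows our reconstruction above, and the small lower bound `eps` on the weights (to avoid division by zero) is our assumption.

```python
# Model (M1): minimize sum of deviations subject to the multiplicative
# consistency constraints mu_ij - l_ij <= w_i/(w_i+w_j) <= 1 - nu_ij + r_ij.
import numpy as np
from scipy.optimize import minimize

def model_m1(mu, nu, eps=1e-6):
    n = mu.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    p = len(pairs)                           # x = [w (n), l (p), r (p)]
    x0 = np.concatenate([np.full(n, 1.0 / n), np.zeros(2 * p)])
    cons = [{'type': 'eq', 'fun': lambda x: x[:n].sum() - 1.0}]
    for t, (i, j) in enumerate(pairs):
        cons.append({'type': 'ineq',         # w_i/(w_i+w_j) - mu_ij + l_ij >= 0
                     'fun': lambda x, i=i, j=j, t=t:
                         x[i] / (x[i] + x[j]) - mu[i, j] + x[n + t]})
        cons.append({'type': 'ineq',         # 1 - nu_ij + r_ij - w_i/(w_i+w_j) >= 0
                     'fun': lambda x, i=i, j=j, t=t:
                         1.0 - nu[i, j] + x[n + p + t] - x[i] / (x[i] + x[j])})
    bounds = [(eps, 1.0)] * n + [(0.0, None)] * (2 * p)
    res = minimize(lambda x: x[n:].sum(), x0, method='SLSQP',
                   bounds=bounds, constraints=cons)
    return res.fun, res.x[:n]                # delta*, priority weights
```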

IV. HYBRID METHOD FOR FEATURE RANKING PROBLEM
In this section we present a rough-fuzzy hybridization technique for finding an optimal ranking list of features, called RAFAR (Rough-fuzzy Algorithm For Attribute Ranking). We introduce the concept of a fuzzy discernibility matrix, which generalizes the discernibility matrix of rough set theory, and combine it with the methods for deriving rankings from intuitionistic fuzzy preference relations.

A. Construction of IFPR from decision table
The general framework of our proposal is presented in Fig. 1. Consider a decision table $T = (U, A \cup \{d\})$, where $U = \{u_1, u_2, \dots, u_n\}$ and $A = \{a_1, a_2, \dots, a_m\}$. To simplify the description, let us assume that $d$ is a binary decision attribute, e.g. $V_d = \{-1, 1\}$; the proposed methods also work in the multi-class case.
Let $RED(T) = \{R \subseteq A : R \ \text{is a reduct of} \ T\}$ denote the set of all reducts of the decision table $T$. In [24], the authors classified the attributes into three categories:
• Core attributes: the attributes that occur in all reducts, i.e. the elements of $\bigcap_{R \in RED(T)} R$;
• Reductive attributes: the attributes that occur in at least one reduct, i.e. the elements of $\bigcup_{R \in RED(T)} R$;
• Redundant attributes: the attributes that are not reductive.
Our aim is to generate a feature ranking that at least respects this classification.
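Given $RED(T)$, this classification is plain set algebra; a minimal sketch (the attribute labels and the example reducts, taken from the decision table $T_1$ discussed below, are illustrative):

```python
# Classify attributes into core, reductive and redundant, given all reducts.
def classify_attributes(all_attrs, reducts):
    core = set.intersection(*reducts)        # occur in every reduct
    reductive = set.union(*reducts)          # occur in at least one reduct
    redundant = set(all_attrs) - reductive   # occur in no reduct
    return core, reductive, redundant

# For T1 below, with reducts {a1,a2,a4} and {a1,a3,a4}:
# classify_attributes({1,2,3,4,5}, [{1,2,4}, {1,3,4}])
# -> ({1, 4}, {1, 2, 3, 4}, {5})
```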
Let us consider the case of a decision table with symbolic values. We define the two decision classes by
$$U^- = \{u \in U : d(u) = -1\}, \qquad U^+ = \{u \in U : d(u) = 1\}.$$
For each feature $a_k \in A$, we define a function $P_{a_k} : U^- \times U^+ \to \{0, 1\}$ by $P_{a_k}(u_i, u_j) = 1$ if $a_k(u_i) \neq a_k(u_j)$ and $0$ otherwise. We can see that if $a_k$ is a symbolic feature then $P_{a_k}$ is a relation between $U^-$ and $U^+$ in the traditional sense. This relation is called the discernibility relation [14]. Moreover, if $M(T) = (S_{ij})_{n \times n}$ is the discernibility matrix of $T$, then $P_{a_k}(u_i, u_j) = 1$ if and only if $a_k \in S_{ij}$.
In the case of numeric features, instead of the discernibility relation we define a fuzzy discernibility relation. If $a_k \in A$ is a real-valued feature, we define a fuzzy membership function on $U^- \times U^+$ for the relation $P^{\alpha,\beta}_{a_k}$ as follows:
$$P^{\alpha,\beta}_{a_k}(u_i, u_j) = \begin{cases} 0 & \text{if} \ d_k(u_i, u_j) \le \alpha, \\ \dfrac{d_k(u_i, u_j) - \alpha}{\beta - \alpha} & \text{if} \ \alpha < d_k(u_i, u_j) < \beta, \\ 1 & \text{if} \ d_k(u_i, u_j) \ge \beta, \end{cases}$$
where $0 < \alpha < \beta$ are real parameters and $d_k(u_i, u_j) = |a_k(u_i) - a_k(u_j)|$. Now, we propose the following method for constructing an IFPR over the set of features $A$. For each $a_k \in A$, we define a function $Score_{a_k} : U^- \times U^+ \to [0, 1]$ by
$$Score_{a_k}(u_i, u_j) = \frac{P_{a_k}(u_i, u_j)}{\sum_{a_l \in A} P_{a_l}(u_i, u_j)}.$$
Intuitively, the value $Score_{a_k}(u_i, u_j)$ determines the probability that $a_k$ should be selected in order to discern $u_i$ from $u_j$. For any pair of features $a_k, a_l \in A$, we define the following sets:
$$X_{kl} = \{(u_i, u_j) : Score_{a_k}(u_i, u_j) > Score_{a_l}(u_i, u_j)\},$$
$$Y_{kl} = \{(u_i, u_j) : Score_{a_k}(u_i, u_j) = Score_{a_l}(u_i, u_j)\},$$
$$Z_{kl} = \{(u_i, u_j) : Score_{a_k}(u_i, u_j) < Score_{a_l}(u_i, u_j)\}.$$
Using those sets, we calculate the values $x_{kl}$, $y_{kl}$ and $z_{kl}$ as the sums of the corresponding scores over these sets, and we define the discernibility IFPR $P_{dis} = ((\mu_{kl}, \nu_{kl}))_{m \times m}$ by normalizing these sums:
$$\mu_{kl} = \frac{x_{kl}}{x_{kl} + y_{kl} + z_{kl}}, \qquad \nu_{kl} = \frac{z_{kl}}{x_{kl} + y_{kl} + z_{kl}}.$$
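The construction can be summarized in a short sketch. The per-pair normalization of the Score function and the normalization of the sums $x_{kl}$, $y_{kl}$, $z_{kl}$ follow our reconstruction above and should be treated as one plausible reading; the array layout (one row per feature, one column per object pair) is ours.

```python
# A sketch of the discernibility IFPR construction.
import numpy as np

def fuzzy_discern(diff, alpha, beta):
    """Membership of P^{alpha,beta}: 0 below alpha, 1 above beta, linear between."""
    return np.clip((diff - alpha) / (beta - alpha), 0.0, 1.0)

def discernibility_ifpr(P):
    """P: array of shape (m, n_pairs); P[k, t] = degree to which feature k
    discerns the t-th (negative, positive) object pair."""
    m = P.shape[0]
    score = P / np.maximum(P.sum(axis=0, keepdims=True), 1e-12)  # per-pair normalization
    mu = np.full((m, m), 0.5)
    nu = np.full((m, m), 0.5)
    for k in range(m):
        for l in range(m):
            if k == l:
                continue
            x = score[k][score[k] > score[l]].sum()              # pairs favouring a_k
            z = score[l][score[l] > score[k]].sum()              # pairs favouring a_l
            y = score[k][np.isclose(score[k], score[l])].sum()   # tied pairs
            s = x + y + z
            if s > 0:
                mu[k, l], nu[k, l] = x / s, z / s
    return mu, nu
```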

B. The illustrated examples
Consider the exemplary decision table shown in Table I. This table was created by taking the first 10 objects from the famous "weather" data set and adding one more feature (smog) as the fifth feature. The simplified version $M_1$ of the discernibility matrix for $T_1$ is presented in Table II. One can see that this new decision table has exactly two reducts: $R_1 = \{a_1, a_2, a_4\}$ and $R_2 = \{a_1, a_3, a_4\}$. According to [24], the features $a_1$ and $a_4$ are core attributes and $a_5$ is a redundant attribute.
The sums of scores over the sets defined above are:
$$x_{12} = 0.5 + 0.33 + 0.5 + 1 + 0.5 + 0.5 = 3.33,$$
$$z_{12} = 0.25 + 0.5 + 0.33 + 0.5 + 0.5 = 2.083,$$
$$y_{12} = 3.54.$$
These values yield the discernibility IFPR $P_{dis}$ (the full matrix is omitted here). Now we can use the models (A1) and (A2) to find the feature ranking that is additively consistent with $P_{dis}$. As a result we receive:
$$(w_1, w_2, w_3, w_4, w_5) = (0.468, 0.214, 0.214, 0.104, 0).$$
This means $a_1$ is the best and $a_5$ is the worst feature. Let us now consider the decision table $T_2$, which is almost the same as $T_1$; the only difference is that $a_2$ (temperature) and $a_3$ (humidity) are numeric features.
The discernibility relations for $a_1$, $a_4$, $a_5$ remain unchanged. For $a_2$, we use the fuzzy discernibility relation $P^{\alpha,\beta}_{a_2}$ with $\alpha = 2$ and $\beta = 12$, and for $a_3$, we use the fuzzy discernibility relation $P^{\alpha,\beta}_{a_3}$ with $\alpha = 5$ and $\beta = 15$. The fuzzy discernibility relations as well as the corresponding Score functions for the features are presented in Table VI and Table VII. The Score functions can be used to construct the IFPR in the same way as before. As the result we receive an IFPR matrix in which the feature $a_3$ becomes almost as important as the feature $a_1$, while the redundant feature $a_5$ is still located at the end of the ranking.

C. Simplified ranking method
The feature ranking method presented above has quite high computational complexity: its time complexity is $O((n \cdot m)^2)$, where $n$ is the number of objects and $m$ is the number of attributes. In this section we propose a heuristic solution called sRAFAR (simplified Rough-fuzzy Algorithm For Attribute Ranking), which is applicable to data sets with a larger number of objects. The idea is to generate a simplified IFPR instead of the full construction presented in Section IV-A. In particular, for any continuous feature $a_k \in A$, we discretize its domain into $k$ equal-length intervals and use the binary discernibility relation for the discretized feature.
Then the simplified IFPR $P^s = ((\mu^s_{kl}, \nu^s_{kl}))_{m \times m}$ is defined so that $\mu^s_{kl}$ is the probability that a pair of objects is discernible by $a_k$ but not by $a_l$, and $\nu^s_{kl}$ is the probability that a pair of objects is discernible by $a_l$ but not by $a_k$. One can show that the ranking derived from $P^s$ at least respects the classification of attributes into core, reductive and redundant ones; the proof of this fact is similar to the properties of the MD-heuristic in [24].
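A sketch of the simplified construction, assuming a boolean discernibility table `D` with one row per (discretized) feature and one column per object pair from opposite decision classes; the layout is our convention.

```python
# Simplified IFPR for sRAFAR: entries are probabilities over object pairs.
import numpy as np

def simplified_ifpr(D):
    """D: boolean array of shape (m, n_pairs)."""
    m = D.shape[0]
    mu = np.full((m, m), 0.5)
    nu = np.full((m, m), 0.5)
    for k in range(m):
        for l in range(m):
            if k != l:
                mu[k, l] = np.mean(D[k] & ~D[l])   # discerned by a_k, not a_l
                nu[k, l] = np.mean(D[l] & ~D[k])   # discerned by a_l, not a_k
    return mu, nu
```

Since the two events are disjoint for each pair, $\mu^s_{kl} + \nu^s_{kl} \le 1$ always holds, so the entries are valid intuitionistic fuzzy values.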

V. EXPERIMENT RESULTS
In this section, we present the application of our feature ranking methods to the WDBC data set [25]. The WDBC dataset contains features extracted from a digitized image of a fine-needle aspirate of a breast mass, describing the characteristics of the cell nuclei in the image. The dataset consists of 569 instances with 30 attributes and two decision classes. The features are encoded as V1, V2, ..., V30. We compare the quality of the feature ranking lists generated by:
• RAFAR: Rough-fuzzy Algorithm For Attribute Ranking;
• sRAFAR: the simplified version of RAFAR;
• Random Forest Feature Importance;
• no ranking, i.e. using the original feature list V1, V2, ..., V30.
In order to analyze the quality of a ranking list of features (attributes), we select a classifier (classification algorithm) and apply it to the sub-dataset restricted to the first $m$ features, for $m = 1, 2, \dots, 30$. For each fixed value of $m$ we evaluate the accuracy of the classifier using 5-fold cross-validation.
The first classifier in our experiment is kNN. Figure 2 presents the accuracy of kNN on the whole data set for different values of the parameter $k$. We can see that the optimal value of $k$ for the kNN classifier equals 9; therefore we select kNN with $k = 9$ in the first experiment.
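The evaluation protocol can be sketched as follows; the function itself and the assumption that `ranking` is a list of column indices ordered from most to least important are ours.

```python
# Accuracy of 9-NN on the first m features of a ranking, via 5-fold CV.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def ranking_curve(X, y, ranking, k=9):
    accs = []
    for m in range(1, len(ranking) + 1):
        clf = KNeighborsClassifier(n_neighbors=k)
        scores = cross_val_score(clf, X[:, ranking[:m]], y, cv=5)
        accs.append(scores.mean())
    return np.array(accs)
```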
Fig. 3 presents the accuracy of kNN with $k = 9$ using the first $m$ features of each ranking list, for $m = 1, 2, \dots, 30$. One can see that the ranking lists generated by both of our algorithms outperform the other ranking lists.
Fig. 4 presents the accuracy comparison for the decision tree classifier using the first features of the ranking lists. We can see that in this case the ranking list generated by the RAFAR algorithm appears to be the best, especially when up to 17 features are used. Figure 5 presents the accuracy comparison for the SVM classifier. In this case the ranking list generated by the sRAFAR algorithm is the best one.

VI. CONCLUSIONS AND FUTURE RESEARCH
In this paper we proposed a new method for feature ranking. We constructed an intuitionistic fuzzy preference relation (IFPR) over the set of features and searched for the optimal feature ranking that is consistent with the IFPR. All experiments show that the proposed rough-fuzzy algorithms for attribute ranking outperform the state-of-the-art method (Random Forest feature importance, the main feature ranking method in the Python scikit-learn library, https://scikit-learn.org/stable/). We can conclude that the proposed methods are promising and should be thoroughly investigated.
One future research direction is multi-criteria feature ranking instead of the single preference relation defined on the basis of the discernibility power of the features, as proposed in RAFAR. The general framework is shown in Fig. 6. This idea is motivated by real-life decision-making processes, where a decision is usually made by a group of experts $E_k$ ($k = 1, 2, \dots, m$) with different weights $\lambda = (\lambda_1, \dots, \lambda_m)$, where $\sum_{k=1}^{m} \lambda_k = 1$ and $\lambda_k \ge 0$ for $k = 1, \dots, m$. In such cases, the individual preference relations of the experts are aggregated to derive a collective preference relation. Let $B^{(k)} = ((\mu^{(k)}_{ij}, \nu^{(k)}_{ij}))_{n \times n}$ be the intuitionistic fuzzy preference relation of the expert $E_k$; the aggregated preference relation $B$ is defined by $B = \sum_{k=1}^{m} \lambda_k \cdot B^{(k)}$, in other words $B = ((\mu_{ij}, \nu_{ij}))_{n \times n}$, where
$$\mu_{ij} = \sum_{k=1}^{m} \lambda_k \mu^{(k)}_{ij}, \qquad \nu_{ij} = \sum_{k=1}^{m} \lambda_k \nu^{(k)}_{ij}.$$

Theorem 2. If $B^{(k)}$, $k = 1, \dots, m$, are intuitionistic fuzzy preference relations of the experts $E_k$ and the weight vector $\lambda = (\lambda_1, \dots, \lambda_m)$ is a probability vector, i.e. $\sum_{k=1}^{m} \lambda_k = 1$ and $\lambda_k \ge 0$ for $k = 1, \dots, m$, then $B$ is also an intuitionistic fuzzy preference relation.
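The aggregation itself is a convex combination of the experts' matrices; a minimal numpy sketch (by Theorem 2 the result is again an IFPR):

```python
# Weighted aggregation B = sum_k lambda_k * B^(k); mus and nus are lists of
# the experts' membership and non-membership matrices, lam the weight vector.
import numpy as np

def aggregate_ifpr(mus, nus, lam):
    lam = np.asarray(lam, dtype=float)
    assert np.isclose(lam.sum(), 1.0) and (lam >= 0).all()
    mu = sum(l * M for l, M in zip(lam, mus))   # convex combination of mu's
    nu = sum(l * N for l, N in zip(lam, nus))   # convex combination of nu's
    return mu, nu
```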
In such situations, we can apply both ranking methods (i.e. either the models (A1) and (A2) for the additive consistency requirement, or the models (M1) and (M2) for the multiplicative consistency requirement) to the collective intuitionistic fuzzy preference relation $B$.
Following this idea, in the case of the feature ranking problem, we can create more IFPRs reflecting different aspects of the data and include them in the calculation process. For example, another preference relation could be calculated on the basis of the class homogeneity of the features.
We also plan to verify the accuracy of RAFAR and sRAFAR on bigger and more challenging data sets.

ACKNOWLEDGMENT
The authors would like to thank the Vietnam Institute for Advanced Study in Mathematics (VIASM) for its support during their visit to the Institute.

Fig. 6: The general framework for multi-criteria feature ranking: a decision table $T$ yields IFPRs $P_1, P_2, \dots, P_n$, which are aggregated into $P = \sum_i \lambda_i P_i$ and used to produce the feature ranking.