Hybridization of Fuzzy Sets and Rough Sets: Achievements and Opportunities

Fuzzy rough sets are the fruit of an intense and long-lasting collaboration effort between fuzzy set theory and rough set theory. Seminal research on the hybridization originated in the late 1980s, and has inspired generations of researchers from around the globe to address both theoretical and practical challenges. In this paper, we gauge the state of the art in this domain and identify opportunities for further development. In particular, we highlight the potential of fuzzy quantifiers in creating new robust fuzzy rough models, we advocate closer integration with granular computing as a stepping stone for designing rule induction algorithms, and we contemplate the role of fuzzy rough sets vis-à-vis explainable artificial intelligence.


I. INTRODUCTION
FUZZY ROUGH SETS emerge as a combination of fuzzy sets (Zadeh [62], 1965) and rough sets (Pawlak [43], 1982): while the former model vague information by recognizing that membership to certain concepts, or logical truth of certain propositions, is a matter of degree, the latter handle potentially inconsistent information by providing a lower and upper approximation of a concept, using the equivalence classes of an indiscernibility relation as building blocks. Both frameworks can be integrated from at least three different perspectives:
1) Concepts may be fuzzy rather than exact, allowing objects to belong to them to varying degrees. For example, in a data set containing information about hotels, one may be interested in characterizing the concept of hotels considered "expensive", an inherently vague predicate. Each hotel's membership to the fuzzy concept "expensive hotel" is then expressed by a value between 0 (not belonging to the concept at all, i.e., not expensive) and 1 (fully meeting the concept's membership conditions, i.e., definitely expensive).
2) The indiscernibility relation, expressing whether or not objects can be distinguished from each other, may be gradual rather than strict; this reflects the intuitive idea that some objects are more similar to each other than others, and should therefore be related to a higher degree (again a value between 0 and 1).
3) The condition for belonging to the lower and upper approximation may be expressed using fuzzy quantifiers. For example, classically an object is a member of the lower approximation of a concept if all objects indiscernible from it also belong to the concept. Here, instead of the traditional universal quantifier ($\forall$), one may use a fuzzy quantifier like "most". The purpose of such a relaxation is to introduce a measure of tolerance towards inconsistency into the approximations, making them more robust.
This work was supported by the Odysseus programme (grant number G0H9118N) of the Research Foundation - Flanders (FWO).
Given that many of the aforementioned applications are instance-based or greedy approaches, whereas most successful applications of the rough set paradigm are rule-based systems (see e.g. [23], [24]), it may be argued that the full potential of the hybrid theory has not yet been tapped (some notable exceptions are [22], [65], [66]), especially considering the vast body of existing research on fuzzy rule-based systems [2], [18], [26]. A key to this logical next step lies in the field of granular computing [4], [59], an information processing paradigm centered on the segmentation of complex information into smaller pieces called information granules. Both rough sets and fuzzy sets relate to granular computing: it is well known that the lower and upper approximation can be represented as unions of simple sets or granules induced from data [60], while Zadeh [64] identified fuzziness as a key part of granulation in human cognition; with the help of fuzzy granules, a fuzzy rule-based system may be set up. An important advantage of fuzzy rules, and of fuzzy logic in general, is that they allow for explanations using linguistic expressions. This property can be utilized to develop interpretable machine learning algorithms, a research direction that is currently attracting considerable attention [36].
The remainder of this paper is structured as follows: in Section II, we recall important preliminaries from both rough set and fuzzy set theory, while in Section III, we outline the main steps and results of the hybridization process. In Section IV, we pay attention to robust fuzzy rough set models, which play an important role for practical applications of the theory. Finally, in Section V, we offer an informal discussion of ongoing challenges and new opportunities for the hybrid theory.

A. Rough sets
We first recall Pawlak's definition [43] of rough sets, which is also called the Indiscernibility-based Rough Set Approach (IRSA).
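Definition 2.1: Let U be a set of objects and E an equivalence relation expressing indiscernibility, i.e., E is reflexive, symmetric and transitive. For $u \in U$, let $[u]_E$ denote the equivalence class of u. The lower and upper approximation of $A \subseteq U$ are defined as
$$\underline{apr}_E(A) = \{u \in U \mid [u]_E \subseteq A\} \tag{1}$$
$$\overline{apr}_E(A) = \{u \in U \mid [u]_E \cap A \neq \emptyset\} \tag{2}$$
The couple $(\underline{apr}_E(A), \overline{apr}_E(A))$ is called the rough set of A.
The equations (1) and (2) can be expressed equivalently using logical operators: for $u \in U$,
$$u \in \underline{apr}_E(A) \Leftrightarrow (\forall v \in U)((u, v) \in E \Rightarrow v \in A) \tag{3}$$
$$u \in \overline{apr}_E(A) \Leftrightarrow (\exists v \in U)((u, v) \in E \wedge v \in A) \tag{4}$$
It is also easily verified that $\underline{apr}_E(A) \subseteq A \subseteq \overline{apr}_E(A)$, which justifies the terms "lower and upper approximation". Moreover, the rough set approximations satisfy various other properties, for example set monotonicity:
$$A \subseteq B \Rightarrow \underline{apr}_E(A) \subseteq \underline{apr}_E(B) \text{ and } \overline{apr}_E(A) \subseteq \overline{apr}_E(B) \tag{5}$$
which expresses that if a concept becomes larger, its approximations should not shrink. On the other hand, relation monotonicity, for equivalence relations $E_1$ and $E_2$ in U:
$$E_1 \subseteq E_2 \Rightarrow \underline{apr}_{E_2}(A) \subseteq \underline{apr}_{E_1}(A) \text{ and } \overline{apr}_{E_1}(A) \subseteq \overline{apr}_{E_2}(A) \tag{6}$$
states that when equivalence classes become larger (more objects are indiscernible from each other), the lower approximation can only shrink, while the upper approximation can only grow.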
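Once the equivalence classes of E are known, the approximations (1) and (2) are straightforward to compute. The following minimal Python sketch assumes E is given by a hypothetical labelling that assigns every object to a block; the helper names are illustrative and not taken from any library. It also checks relation monotonicity (6) on a coarser, equally hypothetical partition.

```python
from typing import Dict, Hashable, Set

def equivalence_classes(labels: Dict[Hashable, Hashable]) -> Dict[Hashable, Set[Hashable]]:
    """Map every object u to its class [u]_E, where u E v iff labels[u] == labels[v]."""
    blocks: Dict[Hashable, Set[Hashable]] = {}
    for obj, key in labels.items():
        blocks.setdefault(key, set()).add(obj)
    return {obj: blocks[key] for obj, key in labels.items()}

def lower_approx(classes: Dict[Hashable, Set[Hashable]], concept: Set[Hashable]) -> Set[Hashable]:
    # Eq. (1): u belongs to the lower approximation iff [u]_E is included in A
    return {u for u, cls in classes.items() if cls <= concept}

def upper_approx(classes: Dict[Hashable, Set[Hashable]], concept: Set[Hashable]) -> Set[Hashable]:
    # Eq. (2): u belongs to the upper approximation iff [u]_E intersects A
    return {u for u, cls in classes.items() if cls & concept}

# Hypothetical toy universe: six objects in three blocks of indiscernible objects
labels = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c"}
A = {1, 2, 3}
classes = equivalence_classes(labels)
low, upp = lower_approx(classes, A), upper_approx(classes, A)
print(sorted(low), sorted(upp))  # [1, 2] [1, 2, 3, 4]

# Coarser relation (blocks "a" and "b" merged): relation monotonicity, Eq. (6)
coarse = equivalence_classes({1: "x", 2: "x", 3: "x", 4: "x", 5: "c", 6: "c"})
assert lower_approx(coarse, A) <= low and upp <= upper_approx(coarse, A)
```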
In case $\underline{apr}_E(A) = \overline{apr}_E(A)$, we call A an exact set. An equivalent way of expressing that A is exact is
$$A = \bigcup_{u \in A} [u]_E \tag{7}$$
In other words, A can be seen as a union of basic building blocks or granules, which correspond to equivalence classes of E. We call (7) the granular representation of A, and A is also called a granularly representable set. The following proposition highlights the special role of the lower and upper approximation as specific exact sets.
Proposition 2.2: For $A \subseteq U$, the greatest granularly representable set that is included in A is equal to $\underline{apr}_E(A)$, while the smallest granularly representable set that includes A is equal to $\overline{apr}_E(A)$.
On the other hand, the granular representation is also closely connected with the notion of consistency.
Proposition 2.3: A set $A \subseteq U$ is granularly representable if and only if it satisfies the consistency property, i.e., iff
$$(\forall u, v \in U)((u, v) \in E \wedge u \in A \Rightarrow v \in A) \tag{8}$$
Consistency expresses that if two objects are indiscernible and one of them belongs to a given concept A, the second object should necessarily also be part of the concept. This property is desirable in classification problems, where the goal is to establish meaningful patterns that allow us to decide the membership of unseen objects to given decision classes. In this context, objects are also called instances and are characterized by their values for a number of attributes from a set $\mathcal{A}$. The domain of every attribute $a \in \mathcal{A}$ consists of a finite number of nominal values, and every instance $u \in U$ takes one of those values, denoted $a(u)$. Then, the equivalence relation E is constructed as
$$(u, v) \in E \Leftrightarrow (\forall a \in \mathcal{A})(a(u) = a(v)) \tag{9}$$
The granular representation of rough sets is particularly useful from the perspective of rule induction. For classification tasks, rule induction amounts to generating a set of rules that relate descriptions of objects by subsets of attributes to particular decision classes. The basic granules from which rough sets are composed can be interpreted as human-readable "if ..., then ..." rules, and can be used to construct a rule-based inference system as a prediction model. Well-known examples of rule induction algorithms are LEM2 [23] and MODLEM [24].
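To make the link between Eq. (9), consistency (8) and rule induction concrete, the sketch below builds the granules of E from a small, hypothetical hotel table with nominal attributes, checks consistency, and prints one "if ..., then ..." rule per granule of the lower approximation. It only illustrates the granular view under these assumptions; it is not LEM2 or MODLEM, which additionally search for minimal attribute-value conditions.

```python
from typing import Dict, Set, Tuple

# Hypothetical information table: hotel -> nominal attribute values
TABLE: Dict[str, Dict[str, str]] = {
    "h1": {"stars": "5", "location": "centre"},
    "h2": {"stars": "5", "location": "centre"},
    "h3": {"stars": "3", "location": "centre"},
    "h4": {"stars": "3", "location": "centre"},
    "h5": {"stars": "3", "location": "suburb"},
}
ATTRS = ("stars", "location")
EXPENSIVE = {"h1", "h2", "h3"}  # hypothetical crisp concept

def granules(table, attrs) -> Dict[Tuple[str, ...], Set[str]]:
    """Equivalence classes of E from Eq. (9), keyed by their attribute-value description."""
    blocks: Dict[Tuple[str, ...], Set[str]] = {}
    for obj, row in table.items():
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return blocks

def lower_granules(blocks, concept):
    # Granules included in the concept; by Eq. (1) their union is the lower approximation
    return {key: g for key, g in blocks.items() if g <= concept}

def is_consistent(blocks, concept) -> bool:
    # Eq. (8) holds iff no granule lies partly inside and partly outside the concept
    return all(g <= concept or not (g & concept) for g in blocks.values())

blocks = granules(TABLE, ATTRS)
print(is_consistent(blocks, EXPENSIVE))  # False: h3 and h4 are indiscernible
for key in lower_granules(blocks, EXPENSIVE):
    condition = " and ".join(f"{a} = {v}" for a, v in zip(ATTRS, key))
    print(f"if {condition} then expensive")  # if stars = 5 and location = centre then expensive
```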
Pawlak's theory has been generalized in various different ways. For example, dropping the symmetry requirement from E leads to the Preorder-based Rough Set Approach (PRSA, [40]), which contains as a special case the Dominance-based Rough Set Approach (DRSA, [21]). In the latter, the domain of attributes now contains ordinal values, and the indiscernibility relation is replaced by a dominance relation. For clarity and brevity of the exposition, in the remainder of this paper we will focus on the indiscernibility-based approach, although many of the presented results remain valid for more general settings.

B. Fuzzy sets
Given a universal set U , Zadeh defined a fuzzy set A in U simply as a mapping from U to the unit interval [0, 1], where A(u) is called the membership degree of object u to A. It expresses to what extent u satisfies the vague property expressed by the fuzzy set A. For example, if U is a set of hotels from a given area, we may evaluate their expensiveness, based on their quoted nightly rate for a double room, as a fuzzy set A in U . Clearly, the assignment of membership degrees is both subjective and context-dependent, as it would depend for example on the budget of the person making the booking, and the area where the search is performed. However, an intuitive constraint in this case would be that the higher the quoted rate, the larger the membership degree should be. In this example, as in most practical applications of fuzzy set theory, there is an underlying numerical scale (a subset of the real numbers) on which the evaluation is made, and the ordering on that scale constrains the assignment of membership degrees.
In a similar vein, a binary fuzzy relation R in U is defined as a fuzzy set in $U^2$, i.e., for any two objects u and v in U, $R(u, v)$ expresses the degree to which they are related. Fuzzy relations may be used, for example, to generalize the equivalence relation E from Section II-A, to establish to what extent two objects are similar (as opposed to a black-or-white assessment of whether they are indiscernible or not). Such a fuzzy relation R should be at least reflexive and symmetric, i.e.,
$$R(u, u) = 1 \quad \text{and} \quad R(u, v) = R(v, u)$$
should hold for any u and v in U. To accommodate the transitivity property, we first need an extension of the classical conjunction operator $\wedge$.
Definition 2.4: A triangular norm, or shortly t-norm, is a mapping $T: [0, 1]^2 \to [0, 1]$ that is commutative, associative, increasing in both arguments, and that satisfies the boundary condition
$$T(x, 1) = x \quad \text{for all } x \in [0, 1]$$
Well-known representatives of the class of t-norms include the minimum $T_M(x, y) = \min(x, y)$, the product $T_P(x, y) = x \cdot y$, and the Łukasiewicz t-norm defined by $T_{Ł}(x, y) = \max(0, x + y - 1)$ for x, y in [0, 1]. The choice of a particular t-norm depends on the properties one is interested in; for a comprehensive overview, we refer to [31].
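As a quick sanity check of Definition 2.4, the sketch below implements the three t-norms mentioned above and verifies the axioms on a finite grid of rational values; exact `fractions.Fraction` arithmetic avoids the round-off that `x + y - 1` would otherwise introduce for the Łukasiewicz t-norm. The grid is an arbitrary choice, and a grid check is evidence rather than a proof.

```python
from fractions import Fraction
from itertools import product

def t_min(x, y):
    return min(x, y)

def t_prod(x, y):
    return x * y

def t_luk(x, y):
    # Łukasiewicz t-norm: max(0, x + y - 1)
    return max(Fraction(0), x + y - 1)

# Exact rational grid {0, 0.1, ..., 1}; an arbitrary choice for testing purposes
GRID = [Fraction(i, 10) for i in range(11)]

def is_t_norm_on_grid(t) -> bool:
    """Check the axioms of Definition 2.4 on GRID (evidence, not a proof)."""
    pairs = list(product(GRID, repeat=2))
    triples = [(x, y, z) for (x, y), z in product(pairs, GRID)]
    commutative = all(t(x, y) == t(y, x) for x, y in pairs)
    associative = all(t(t(x, y), z) == t(x, t(y, z)) for x, y, z in triples)
    # Increasing in the second argument (the first follows by commutativity)
    increasing = all(t(x, y) <= t(x, z) for x, y, z in triples if y <= z)
    boundary = all(t(x, Fraction(1)) == x for x in GRID)
    return commutative and associative and increasing and boundary

for name, t in [("minimum", t_min), ("product", t_prod), ("Lukasiewicz", t_luk)]:
    print(name, is_t_norm_on_grid(t))  # True for all three

# The arithmetic mean is commutative and increasing, but violates T(x, 1) = x and associativity
print("mean", is_t_norm_on_grid(lambda x, y: (x + y) / 2))  # False

# Pointwise ordering of the three examples: Łukasiewicz <= product <= minimum
print(all(t_luk(x, y) <= t_prod(x, y) <= t_min(x, y) for x in GRID for y in GRID))  # True
```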
Using a t-norm, we may now impose a kind of transitivity on fuzzy relations, and therefore extend the notion of an equivalence relation.
Definition 2.5: Let T be a t-norm. A fuzzy relation R in U that is reflexive, symmetric and satisfies
$$T(R(u, v), R(v, w)) \le R(u, w)$$
for any u, v, w in U is called a fuzzy T-equivalence relation.
Apart from logical conjunction, we will also require an extension of the Boolean implication operator $\Rightarrow$.
Definition 2.6: An implicator is a mapping $I: [0, 1]^2 \to [0, 1]$ that is decreasing in its first argument and increasing in its second one, and that satisfies the boundary conditions $I(0, 0) = I(0, 1) = I(1, 1) = 1$ and $I(1, 0) = 0$.
There exist numerous ways to construct implicators. Again, a detailed overview is out of the scope of this paper, and may be found in e.g. [3]. A popular approach is to associate implicators with t-norms by means of residuation, leading to the following definition of residuated implicators, or shortly R-implicators.
Definition 2.7: The R-implicator $I_T$ associated with a t-norm T is defined by, for x, y in [0, 1]:
$$I_T(x, y) = \sup\{\lambda \in [0, 1] \mid T(x, \lambda) \le y\}$$
As an example, the R-implicator associated with the Łukasiewicz t-norm can be obtained as $I_{T_{Ł}}(x, y) = \min(1, 1 - x + y)$. Finally, we recall that subsethood for fuzzy sets is defined as follows [62]: for fuzzy sets A and B in U,
$$A \subseteq B \Leftrightarrow (\forall u \in U)(A(u) \le B(u))$$

III. HYBRIDIZATION: GENERAL FUZZY ROUGH SET MODEL
The equations (3) and (4) for determining membership to the classical lower and upper approximations can be "fuzzified" by making use of fuzzy logical connectives. This leads to the following definition [48], [6], [11].
Definition 3.1: Let A be a fuzzy set in U, R a fuzzy relation in U, I an implicator and T a t-norm. The lower and upper approximation of A are defined as, for $u \in U$,
$$\underline{apr}_R(A)(u) = \inf_{v \in U} I(R(v, u), A(v)) \tag{15}$$
$$\overline{apr}_R(A)(u) = \sup_{v \in U} T(R(v, u), A(v)) \tag{16}$$
The couple $(\underline{apr}_R(A), \overline{apr}_R(A))$ is called the fuzzy rough set of A. If $\underline{apr}_R(A) = A = \overline{apr}_R(A)$, A is called an exact fuzzy set.
Note how these definitions extend their classical counterparts:
1) Object u belongs to the lower approximation of A to the extent that for all objects v, if v is related to u by R, then v belongs to A.
2) Object u belongs to the upper approximation of A to the extent that there exists an object v such that v is related to u by R and v belongs to A.
In other words, the inf and sup operators naturally represent the $\forall$ and $\exists$ quantifiers from Eq. (3) and (4), respectively, while the implicator I and t-norm T fulfil the role of the logical implication $\Rightarrow$ and conjunction $\wedge$. When A is a classical, non-fuzzy set and R is a crisp equivalence relation, we again obtain Pawlak's model. Depending on the specific choice of fuzzy connectives I and T, and of the fuzzy relation R, some properties of this original model may or may not be preserved (see [12] for more details).
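A minimal sketch of Definition 3.1 on a finite universe, assuming the Łukasiewicz t-norm and its R-implicator, and a hypothetical similarity relation $R(u, v) = \max(0, 1 - |x_u - x_v| / s)$ built from one numeric attribute (hypothetical nightly rates, with an arbitrary scale s = 100). It also checks the closed form of $I_{T_{Ł}}$ against Definition 2.7 by brute force over a rational grid.

```python
from fractions import Fraction

def t_luk(x, y):
    return max(Fraction(0), x + y - 1)

def i_luk(x, y):
    # Closed form of the R-implicator of the Łukasiewicz t-norm
    return min(Fraction(1), 1 - x + y)

def residuum_on_grid(t, x, y, steps=100):
    # Definition 2.7, with the sup taken over the rational grid {0, 1/steps, ..., 1}
    return max(Fraction(k, steps) for k in range(steps + 1) if t(x, Fraction(k, steps)) <= y)

# Hypothetical data: nightly rates and membership degrees to "expensive hotel"
RATES = {"h1": 120, "h2": 140, "h3": 200, "h4": 260}
A = {"h1": Fraction(1, 10), "h2": Fraction(9, 10), "h3": Fraction(6, 10), "h4": Fraction(1)}
SCALE = 100  # arbitrary normalisation constant for rate differences

def R(u, v):
    # Hypothetical similarity relation: reflexive and symmetric by construction
    return max(Fraction(0), 1 - Fraction(abs(RATES[u] - RATES[v]), SCALE))

def lower(A, u):
    # Eq. (15); on a finite universe the infimum is a minimum
    return min(i_luk(R(v, u), A[v]) for v in A)

def upper(A, u):
    # Eq. (16); on a finite universe the supremum is a maximum
    return max(t_luk(R(v, u), A[v]) for v in A)

tenths = [Fraction(i, 10) for i in range(11)]
assert all(i_luk(x, y) == residuum_on_grid(t_luk, x, y) for x in tenths for y in tenths)

for u in A:
    print(u, lower(A, u), A[u], upper(A, u))
```

For instance, h2 is highly similar to the barely expensive h1, so its lower approximation membership drops from 0.9 to $I(0.8, 0.1) = 0.3$.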
An important question is whether exact fuzzy sets possess a granular representation analogous to Eq. (7), as it would allow the above approximations to be used for generating fuzzy rules in a similar way as is done with crisp granules. Degang et al. [14] were the first to address this issue by formalizing the notions of a fuzzy granule and granular representability of fuzzy sets.
Definition 3.2: Let R be a fuzzy T-equivalence relation in U for a given t-norm T, and let $\lambda \in [0, 1]$. For $u \in U$, the fuzzy granule corresponding to R, λ and T is the fuzzy set $R_\lambda(u)$ in U defined by
$$R_\lambda(u)(v) = T(R(v, u), \lambda), \quad v \in U$$
We call a fuzzy set A in U granularly representable if
$$A = \bigcup_{u \in U} R_{A(u)}(u) \tag{18}$$
In other words, A is granularly representable if it is the union of the fuzzy granules $R_\lambda(u)$, where $\lambda = A(u)$ for each object u.
The following proposition reveals that for particular choices of I and T, exact fuzzy sets indeed correspond to granularly representable ones, and vice versa.
Proposition 3.3: Let A be a fuzzy set in U, R a fuzzy T-equivalence relation in U, T a left-continuous t-norm and I its R-implicator. Then A is exact if and only if it is granularly representable.
Along the same lines, we can also generalize Propositions 2.2 and 2.3.
Proposition 3.4: For a fuzzy set A in U and a fuzzy T-equivalence relation R in U, the greatest granularly representable fuzzy set that is included in A is equal to $\underline{apr}_R(A)$, while the smallest granularly representable fuzzy set that includes A is equal to $\overline{apr}_R(A)$.
Proposition 3.5: Let R be a fuzzy T-equivalence relation. A fuzzy set A in U is granularly representable if and only if it satisfies the fuzzy consistency property, i.e., iff
$$(\forall u, v \in U)(T(R(u, v), A(u)) \le A(v))$$
Note how fuzzy consistency provides us with a softened version of Eq. (8): the more similar u and v are, and the higher u's membership to A, the more v should also belong to A.
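Propositions 3.3 and 3.5 can be illustrated numerically. The sketch below assumes the Łukasiewicz connectives and the hypothetical relation $R(u, v) = \max(0, 1 - |x_u - x_v| / 4)$ on a single numeric attribute, which is a fuzzy $T_{Ł}$-equivalence relation (the code checks transitivity explicitly). It shows that an arbitrarily chosen fuzzy set A violates fuzzy consistency, while its lower approximation is granularly representable in the sense of Eq. (18).

```python
from fractions import Fraction
from itertools import product

def t_luk(x, y):
    return max(Fraction(0), x + y - 1)

def i_luk(x, y):
    return min(Fraction(1), 1 - x + y)

# Hypothetical numeric attribute; R(u, v) = max(0, 1 - |x_u - x_v| / 4)
VALUES = {"a": 0, "b": 1, "c": 3, "d": 4}
R = {u: {v: max(Fraction(0), 1 - Fraction(abs(VALUES[u] - VALUES[v]), 4)) for v in VALUES}
     for u in VALUES}

def is_t_transitive(R) -> bool:
    # Definition 2.5: T(R(u, v), R(v, w)) <= R(u, w)
    return all(t_luk(R[u][v], R[v][w]) <= R[u][w] for u, v, w in product(R, repeat=3))

def lower(A):
    # Eq. (15) with the Łukasiewicz R-implicator
    return {u: min(i_luk(R[v][u], A[v]) for v in A) for u in A}

def granule_union(A):
    # Right-hand side of Eq. (18): pointwise sup of the granules R_{A(u)}(u)
    return {v: max(t_luk(R[v][u], A[u]) for u in A) for v in A}

def granularly_representable(A) -> bool:
    return granule_union(A) == A

def fuzzy_consistent(A) -> bool:
    # Proposition 3.5: T(R(u, v), A(u)) <= A(v) for all u, v
    return all(t_luk(R[u][v], A[u]) <= A[v] for u, v in product(A, repeat=2))

A = {"a": Fraction(0), "b": Fraction(1), "c": Fraction(1, 2), "d": Fraction(0)}
L = lower(A)
print(is_t_transitive(R))                                 # True
print(granularly_representable(A), fuzzy_consistent(A))   # False False
print(granularly_representable(L), fuzzy_consistent(L))   # True True
```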

IV. ROBUST FUZZY ROUGH SETS
The model of fuzzy rough sets described in the previous section offers considerable strength and flexibility, and lends itself well to handling datasets with real-valued attributes, where fuzzy T-equivalence relations can be constructed by taking into account the distance between individual instances' attribute values. However, it may still be too rigid for practical data analysis problems, due to the occurrence of outliers. By the latter, we mean instances that do not follow the general data distribution, and which may negatively impact the quality of the fuzzy-rough approximations. In extreme cases, the lower approximation of a concept may be empty, while its upper approximation may contain all instances fully.
The root of the problem lies in the use of the inf and sup operators which, as we explained, correspond to the $\forall$ and $\exists$ quantifiers, respectively. Because of this, an instance u will be fully excluded from $\underline{apr}_R(A)$ as soon as there exists another instance v such that $R(v, u) = 1$ and $A(v) = 0$, since then $I(R(v, u), A(v)) = I(1, 0) = 0$; on the other hand, u will fully belong to $\overline{apr}_R(A)$ as soon as an object v can be found such that $R(v, u) = 1$ and $A(v) = 1$, since $T(1, 1) = 1$. As these values are fixed by the boundary conditions, this occurs regardless of the choice of the implicator I and the t-norm T. While this effect may be mitigated by a thoughtful choice of the fuzzy relation R, it cannot be ruled out altogether, as (partial) inconsistencies are commonplace in real applications.
In classical rough set theory, researchers also faced this problem, leading to probabilistic approaches like Ziarko's Variable Precision Rough Set (VPRS) model [67]. The latter relaxes Eq. (3) and (4) into
$$\underline{apr}^p_E(A) = \left\{u \in U \;\middle|\; \frac{|[u]_E \cap A|}{|[u]_E|} \ge p\right\} \tag{19}$$
$$\overline{apr}^q_E(A) = \left\{u \in U \;\middle|\; \frac{|[u]_E \cap A|}{|[u]_E|} > q\right\} \tag{20}$$
where $1 \ge p > q \ge 0$ are parameters of the model. In other words, an object belongs to the VPRS lower approximation if at least a fraction p of its equivalence class belongs to A, while it belongs to the upper approximation if more than a fraction q of $[u]_E$ is inside A. The model assumes that U is finite (which is not a problem considering that its application is in data analysis), and that $p > q$, to ensure that $\underline{apr}^p_E(A) \subseteq \overline{apr}^q_E(A)$. When $p = 1$ and $q = 0$, we recover Pawlak's original equations (3) and (4). In general, probabilistic rough set approaches have been exploited successfully for classification purposes, most notably within Yao's framework of three-way decisions [61].
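The following sketch of Eq. (19) and (20) uses a hypothetical partition in which a single object of the first block lies outside A: Pawlak's lower approximation (p = 1, q = 0) is then empty, while VPRS with the arbitrarily chosen parameters p = 0.8 and q = 0.25 keeps that whole block in the lower approximation.

```python
from fractions import Fraction

def vprs(blocks, concept, p, q):
    """VPRS approximations, Eq. (19) and (20), for a partition given as a list of blocks."""
    if not 1 >= p > q >= 0:
        raise ValueError("parameters must satisfy 1 >= p > q >= 0")
    lower, upper = set(), set()
    for block in blocks:
        degree = Fraction(len(block & concept), len(block))  # fraction of [u]_E inside A
        if degree >= p:
            lower |= block
        if degree > q:
            upper |= block
    return lower, upper

# Hypothetical partition: object 5 is the only member of its block outside A
BLOCKS = [{1, 2, 3, 4, 5}, {6, 7, 8, 9}]
A = {1, 2, 3, 4, 6}

print(vprs(BLOCKS, A, p=1, q=0))                            # Pawlak: empty lower, upper = all nine objects
print(vprs(BLOCKS, A, p=Fraction(4, 5), q=Fraction(1, 4)))  # lower = upper = first block
```

Note that the second block, with exactly a fraction 1/4 inside A, is excluded from the upper approximation because Eq. (20) uses a strict inequality.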
Ziarko's VPRS model served as a source of inspiration for different robust fuzzy rough set proposals. One of them, the Vaguely Quantified Rough Set (VQRS) model [5], softens the membership criterion for an object u to belong to the lower approximation of A into "most elements of $[u]_E$ are inside A". Similarly, u belongs to the VQRS upper approximation of A to the extent that "at least some elements in $[u]_E$ belong to A". To formalize this idea, the inherently fuzzy quantifiers "most" and "at least some" are modeled as specific fuzzy sets in the unit interval [63]:
Definition 4.1: A fuzzy set Q in [0, 1] is called a regular increasing monotone (RIM) quantifier if Q is non-decreasing, $Q(0) = 0$ and $Q(1) = 1$.
The class of RIM quantifiers includes as special cases the existential and the universal quantifier:
$$Q_\exists(x) = \begin{cases} 0 & \text{if } x = 0 \\ 1 & \text{otherwise} \end{cases} \qquad Q_\forall(x) = \begin{cases} 1 & \text{if } x = 1 \\ 0 & \text{otherwise} \end{cases}$$
Examples of RIM quantifiers that also take on values from the interior of the unit interval can be obtained using the following parametrized formula [5], for $0 \le \alpha < \beta \le 1$ and x in [0, 1]:
$$Q_{(\alpha,\beta)}(x) = \begin{cases} 0 & \text{if } x \le \alpha \\ \frac{2(x-\alpha)^2}{(\beta-\alpha)^2} & \text{if } \alpha \le x \le \frac{\alpha+\beta}{2} \\ 1 - \frac{2(x-\beta)^2}{(\beta-\alpha)^2} & \text{if } \frac{\alpha+\beta}{2} \le x \le \beta \\ 1 & \text{if } \beta \le x \end{cases}$$
For example, $Q_{(0.1,0.6)}$ and $Q_{(0.2,1)}$ could be used to represent the fuzzy quantifiers "at least some" and "most" from natural language. They are depicted in Figure 1.
In general, assuming RIM quantifiers $Q_1$ and $Q_2$ such that $Q_1 \subseteq Q_2$, we may define the VQRS approximations of A: for $u \in U$,
$$\underline{apr}^{Q_1}_E(A)(u) = Q_1\left(\frac{|[u]_E \cap A|}{|[u]_E|}\right) \tag{21}$$
$$\overline{apr}^{Q_2}_E(A)(u) = Q_2\left(\frac{|[u]_E \cap A|}{|[u]_E|}\right) \tag{22}$$
The above definition has the peculiarity that although both the set to be approximated and the equivalence relation are non-fuzzy, the resulting approximations may well be fuzzy. The reasoning behind this is that, for example, the membership degree in Eq. (21) evaluates the degree of fulfilment of the condition "$Q_1$ elements of $[u]_E$ are in A". Note that if $Q_1 = Q_\forall$ and $Q_2 = Q_\exists$, we again arrive at Pawlak's lower and upper approximation, while the VPRS equations (19) and (20) are also special cases of (21) and (22), using the RIM quantifiers
$$Q_{\ge p}(x) = \begin{cases} 1 & \text{if } x \ge p \\ 0 & \text{otherwise} \end{cases} \qquad Q_{>q}(x) = \begin{cases} 1 & \text{if } x > q \\ 0 & \text{otherwise} \end{cases}$$
The VQRS equations may be further generalized to a fuzzy set A and a fuzzy relation R; for details, we refer to [5].
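A sketch of the parametrized quantifier $Q_{(\alpha,\beta)}$ and the VQRS approximations (21) and (22), with $Q_1 = Q_{(0.2,1)}$ ("most") and $Q_2 = Q_{(0.1,0.6)}$ ("at least some") as suggested above. The crisp partition and concept are hypothetical, and rational arithmetic keeps the outputs exact.

```python
from fractions import Fraction

def rim_quantifier(alpha, beta):
    """Parametrized RIM quantifier Q_(alpha, beta)."""
    if not 0 <= alpha < beta <= 1:
        raise ValueError("need 0 <= alpha < beta <= 1")
    mid, width = (alpha + beta) / 2, beta - alpha

    def Q(x):
        if x <= alpha:
            return Fraction(0)
        if x <= mid:
            return 2 * (x - alpha) ** 2 / width ** 2
        if x <= beta:
            return 1 - 2 * (x - beta) ** 2 / width ** 2
        return Fraction(1)

    return Q

MOST = rim_quantifier(Fraction(1, 5), Fraction(1))       # Q_(0.2, 1)
SOME = rim_quantifier(Fraction(1, 10), Fraction(3, 5))   # Q_(0.1, 0.6)

def vqrs(blocks, concept, q1=MOST, q2=SOME):
    """VQRS approximations, Eq. (21) and (22): fuzzy memberships from a crisp partition."""
    lower, upper = {}, {}
    for block in blocks:
        ratio = Fraction(len(block & concept), len(block))
        for u in block:
            lower[u], upper[u] = q1(ratio), q2(ratio)
    return lower, upper

# Hypothetical partition: 4 of 5 objects of the first block, 1 of 4 of the second, are in A
BLOCKS = [{1, 2, 3, 4, 5}, {6, 7, 8, 9}]
A = {1, 2, 3, 4, 6}
low, upp = vqrs(BLOCKS, A)
print(low[1], upp[1])   # 7/8 1
print(low[6], upp[6])   # 1/128 9/50
```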
Despite its intuitive appeal, the VQRS model has an important shortcoming which it shares with the VPRS model: it does not satisfy relation monotonicity, Eq. (6). This is particularly problematic in applications where the (fuzzy) indiscernibility relation is iteratively refined by adding more information (additional attributes). For example, [7] shows how this affects the operation of the greedy QuickReduct feature selection algorithm. A solution to this problem can be found by revisiting the equations (15) and (16) and replacing the inf and sup operators by less extreme ones. First note that for finite universes, inf and sup correspond to min and max, respectively. So, the lower approximation (15) is determined solely by the smallest of all values $I(R(v, u), A(v))$, and the single largest value $T(R(v, u), A(v))$ sets the upper approximation (16).
A more balanced evaluation is offered by ordered weighted average (OWA) operators [57]: given an input vector of $n \ge 1$ real values $\langle a_1, \ldots, a_n \rangle$ and a weight vector $W = \langle w_1, \ldots, w_n \rangle$ such that each $w_i \in [0, 1]$ and $\sum_{i=1}^{n} w_i = 1$, we first order the input values $a_i$ from large to small, obtaining $c_1, \ldots, c_n$, and then compute
$$OWA_W\langle a_1, \ldots, a_n \rangle = \sum_{i=1}^{n} w_i c_i$$
This leads to the definitions of the OWA-based lower and upper approximation [9]: for $u \in U$,
$$\underline{apr}^{W_1}_R(A)(u) = OWA_{W_1}\langle I(R(v, u), A(v)) \rangle_{v \in U} \tag{24}$$
$$\overline{apr}^{W_2}_R(A)(u) = OWA_{W_2}\langle T(R(v, u), A(v)) \rangle_{v \in U} \tag{25}$$
Using $W_1 = \langle 0, \ldots, 0, 1 \rangle$ and $W_2 = \langle 1, 0, \ldots, 0 \rangle$, we obtain the original Eq. (15) and (16). Because of the monotonicity properties of T, I and the OWA operator, relation monotonicity is guaranteed, and moreover it always holds that
$$\underline{apr}_R(A) \subseteq \underline{apr}^{W_1}_R(A) \quad \text{and} \quad \overline{apr}^{W_2}_R(A) \subseteq \overline{apr}_R(A) \tag{26}$$
In other words, OWA-based fuzzy rough sets indeed relax the original definitions, enlarging the lower approximation and restricting the upper one. In [56], different weighting schemes were discussed and evaluated experimentally.
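A minimal sketch of the OWA operator and the OWA-based approximations (24) and (25) with Łukasiewicz connectives. The linearly increasing weights $w_i = 2i/(n(n+1))$ used for $W_1$ (and their reverse for $W_2$) are only one illustrative choice, not necessarily one of the schemes studied in [56]; the relation and data are hypothetical. The extreme weight vectors reproduce Eq. (15) and (16), which makes inclusion (26) easy to observe.

```python
from fractions import Fraction

def t_luk(x, y):
    return max(Fraction(0), x + y - 1)

def i_luk(x, y):
    return min(Fraction(1), 1 - x + y)

def owa(weights, values):
    """OWA operator: order values from large to small, then take the weighted sum."""
    if len(weights) != len(values) or sum(weights) != 1:
        raise ValueError("need n weights summing to 1")
    return sum(w * c for w, c in zip(weights, sorted(values, reverse=True)))

def soft_min_weights(n):
    # Illustrative scheme: w_i = 2i / (n(n+1)), so the smallest values weigh most
    return [Fraction(2 * i, n * (n + 1)) for i in range(1, n + 1)]

# Hypothetical data: one numeric attribute and a fuzzy concept A
VALUES = {"a": 0, "b": 1, "c": 3, "d": 4}
A = {"a": Fraction(0), "b": Fraction(1), "c": Fraction(1, 2), "d": Fraction(0)}

def R(u, v):
    return max(Fraction(0), 1 - Fraction(abs(VALUES[u] - VALUES[v]), 4))

n = len(A)
W1 = soft_min_weights(n)      # used in the lower approximation, Eq. (24)
W2 = list(reversed(W1))       # used in the upper approximation, Eq. (25)

def owa_lower(u, W=W1):
    return owa(W, [i_luk(R(v, u), A[v]) for v in A])

def owa_upper(u, W=W2):
    return owa(W, [t_luk(R(v, u), A[v]) for v in A])

strict_min = [0] * (n - 1) + [1]   # W1 = <0, ..., 0, 1> recovers Eq. (15)
strict_max = [1] + [0] * (n - 1)   # W2 = <1, 0, ..., 0> recovers Eq. (16)
for u in A:
    # Inclusion (26): strict lower <= OWA lower, and OWA upper <= strict upper
    print(u, owa_lower(u, strict_min), owa_lower(u), owa_upper(u), owa_upper(u, strict_max))
```

For object b, the strict lower approximation is 1/4, dictated by the single dissimilar-enough object a, while the OWA version softens this to 5/8.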
Recently, [41] showed that for certain choices of the t-norm T (including the product and Łukasiewicz t-norm, but not the minimum), the OWA-based lower and upper approximations of any fuzzy set A are exact sets, i.e.,
$$\underline{apr}_R(X) = X = \overline{apr}_R(X) \quad \text{for } X \in \{\underline{apr}^{W_1}_R(A), \overline{apr}^{W_2}_R(A)\}$$
This means in particular that these approximations possess a granular representation, and can be used as a basis for fuzzy rule induction algorithms, as discussed in the next section.

V. DISCUSSION: CHALLENGES AND OPPORTUNITIES FOR FUZZY ROUGH SETS
Even though the VQRS model has been largely abandoned in favour of the OWA-based approach because it violates relation monotonicity, its interpretation of membership to the fuzzy-rough approximations in terms of fuzzy quantifiers expressing "most" and "at least some" is arguably more intuitive and transparent than the interpretation offered by the less compact representation of OWA weight vectors.
Yet this does not mean that OWA fuzzy rough sets are isolated from vague quantification. In fact, as explained in [49], from any OWA weight vector $W = \langle w_1, \ldots, w_n \rangle$, a corresponding RIM quantifier Q can be derived by setting
$$Q\left(\frac{i}{n}\right) = \sum_{j=1}^{i} w_j, \quad i = 0, \ldots, n$$
(with values in between obtained, for instance, by linear interpolation), and, vice versa, for every RIM quantifier Q and $n \ge 1$, the associated OWA weights $w_i$ ($i = 1, \ldots, n$) are determined by
$$w_i = Q\left(\frac{i}{n}\right) - Q\left(\frac{i-1}{n}\right)$$
As such, the weight vectors $W_1$ and $W_2$ featured in Eq. (24) and (25) are interchangeable with their VQRS counterparts $Q_1$ and $Q_2$. The resulting membership degrees, however, carry a different meaning; for example, $\underline{apr}^{W_1}_R(A)(u)$ should be understood as the truth value of the statement "for most objects in U, it holds that if they are indiscernible from u, then they also belong to A"; in other words, the quantification also takes into account objects completely unrelated to u (i.e., fully discernible from u), while these objects are excluded when computing membership to the VQRS lower approximation.
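The two conversions can be sketched as follows. The first function implements $w_i = Q(i/n) - Q((i-1)/n)$; the second assumes linear interpolation between the points $(i/n, w_1 + \cdots + w_i)$, which is one way to obtain a RIM quantifier from W. All function names are hypothetical. As expected, $Q_\forall$ and $Q_\exists$ yield the extreme weight vectors that recover Eq. (15) and (16).

```python
import math
from fractions import Fraction

def weights_from_quantifier(Q, n):
    """OWA weights w_i = Q(i/n) - Q((i-1)/n), i = 1, ..., n."""
    return [Q(Fraction(i, n)) - Q(Fraction(i - 1, n)) for i in range(1, n + 1)]

def quantifier_from_weights(weights):
    """RIM quantifier with Q(i/n) = w_1 + ... + w_i, linearly interpolated in between (an assumption)."""
    n = len(weights)
    cumulative = [Fraction(0)]
    for w in weights:
        cumulative.append(cumulative[-1] + w)

    def Q(x):
        x = Fraction(x)
        if x >= 1:
            return cumulative[-1]
        pos = x * n
        i = math.floor(pos)   # x lies in [i/n, (i+1)/n)
        return cumulative[i] + (pos - i) * weights[i]

    return Q

def q_forall(x):
    return Fraction(1) if x == 1 else Fraction(0)

def q_exists(x):
    return Fraction(1) if x > 0 else Fraction(0)

print(*weights_from_quantifier(q_forall, 4))   # 0 0 0 1  (minimum, Eq. (15))
print(*weights_from_quantifier(q_exists, 4))   # 1 0 0 0  (maximum, Eq. (16))

# Round trip: weights -> quantifier -> weights
W = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
Q = quantifier_from_weights(W)
print(weights_from_quantifier(Q, 4) == W, Q(Fraction(1, 2)))   # True 3/10
```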
A more serious limitation of the OWA fuzzy rough set model lies in the fact that it treats all objects symmetrically during the aggregation process, i.e., an individual object's impact is determined merely by its fulfillment of a specific logical formula. In practice, this means that we hypothesize that "a limited number" of objects are outliers, and that their effect will be cancelled out by the chosen weighting scheme. Suppose, however, that we have specific knowledge that certain objects are outliers, e.g., based on an outlier score that was calculated for them separately. Then, a more natural way to evaluate whether an object u belongs to the lower approximation of concept A is to check whether all objects indiscernible from u, except perhaps those considered outliers, belong to A.
In order to accommodate the above and other related use case scenarios, new definitions of the fuzzy-rough lower and upper approximation were recently proposed in [49]. They are based on the Choquet integral, a generalization of the classical Lebesgue integral to non-additive measures which has become popular as an aggregation function in decision making [20]. It was shown that the resulting Choquet-based fuzzy rough sets (CFRS) contain the OWA-based model as a special case, while inheriting some of its desirable properties, including set and relation monotonicity. At the same time, they also maintain the intuitive interpretation in terms of vague quantification. Considering that most of the existing fuzzy rough set models still rely on "traditional" approaches to fuzzy quantifiers such as those proposed by Zadeh [63] and Yager [58], which were shown to suffer from some serious conceptual flaws [19], this opens up exciting research opportunities involving more recent developments (see e.g. [15] for an overview).
An important unresolved question about the new CFRS model is whether it still conforms to the granular structure that the OWA-based model from Eq. (24) and (25), and the traditional fuzzy rough set model from Eq. (15) and (16) exhibit, in other words: whether its approximations are exact fuzzy sets in the sense of Eq. (18). While technical in nature, if this question can be answered positively, it opens the doors to fuzzy rule induction methods based on these fuzzy-rough approximations. Indeed, the fuzzy granules corresponding to the approximations can be used inside the antecedent part of fuzzy rules, and an unseen test object's membership in them may be interpreted as the firing strength of the corresponding rule.
As a concrete example, let us consider rule-based classification. In this case, U is partitioned into a number of decision classes (concepts). Let C be one such decision class, and denote its lower approximation, computed according to one of the models discussed in this paper, by $\underline{apr}(C)$. Then, if $\underline{apr}(C)$ is granularly representable, by Eq. (18) a corresponding decision rule will be generated for every training object u, such that for a given test object v, the firing strength of this rule is obtained as
$$T(R(v, u), \underline{apr}(C)(u)) \tag{31}$$
in other words, as a conjunction between $R(v, u)$, the observed similarity between u and v, and $\underline{apr}(C)(u)$, the membership of the training object u to the lower approximation. Decision rules based on the lower approximation are usually called "certain" rules, while those based on the upper approximation are referred to as "possible" rules, reflecting their relative strength. More generally, fuzzy decision rules can be derived from any granularly representable fuzzy set associated with decision classes, for instance from the so-called granular fuzzy-rough approximation introduced in [42], which is defined as the granularly representable fuzzy set closest (w.r.t. a certain loss function) to a given concept, and which is obtained by solving a linear programming problem.
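To illustrate Eq. (31), the sketch below generates one "certain" rule per training object of a hypothetical one-attribute data set, using the Łukasiewicz connectives, the hypothetical similarity $R(u, v) = \max(0, 1 - |x_u - x_v| / 4)$ and the plain lower approximation (15) of each crisp class. Each class is scored by its most strongly firing rule. This naive one-rule-per-object scheme is shown only to make the firing strength concrete; it is not a proposed rule induction algorithm.

```python
from fractions import Fraction

def t_luk(x, y):
    return max(Fraction(0), x + y - 1)

def i_luk(x, y):
    return min(Fraction(1), 1 - x + y)

def R(x, y):
    # Hypothetical similarity on one numeric attribute (a fuzzy T_L-equivalence relation)
    return max(Fraction(0), 1 - Fraction(abs(x - y), 4))

# Hypothetical training data: (attribute value, decision class); 3 is a noisy "expensive" object
TRAIN = [(1, "cheap"), (2, "cheap"), (3, "expensive"), (6, "expensive"), (7, "expensive")]

def certain_rules(train):
    """One rule per training object u: (value of u, class C of u, membership of u to lower(C))."""
    rules = []
    for u_val, u_cls in train:
        # Eq. (15) for the crisp class C: A(v) = 1 if v is in C, else 0
        low = min(i_luk(R(v_val, u_val), Fraction(int(v_cls == u_cls))) for v_val, v_cls in train)
        rules.append((u_val, u_cls, low))
    return rules

def class_scores(rules, x):
    # Firing strength of each rule, Eq. (31); each class keeps its strongest rule
    scores = {}
    for u_val, cls, low in rules:
        scores[cls] = max(scores.get(cls, Fraction(0)), t_luk(R(x, u_val), low))
    return scores

rules = certain_rules(TRAIN)
for x in (Fraction(3, 2), 4):
    scores = class_scores(rules, x)
    print(x, max(scores, key=scores.get))   # 3/2 cheap, then 4 expensive
```

The noisy object at 3 lowers the lower approximation memberships of its neighbours (to 1/2 and 1/4), which in turn weakens the rules they generate.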
In practice, however, generating one rule per training object is not a viable approach, and an important challenge is therefore to design rule induction algorithms that reduce the number of rules while maximizing the number of objects that each rule covers. Such a strategy was already pursued in [28], where it was combined with fuzzy rough set guided feature selection, and various other attempts (see e.g. [22], [38], [65], [66]) have also been made to integrate fuzzy rough sets and rule induction; yet, a convincing proposal of a "fuzzy LEM" classification algorithm is still missing and could represent a breakthrough in this domain, not least from the perspective of interpretable machine learning.
Indeed, the generation of compact fuzzy rules benefits the human understanding of classification algorithms based on them, as rule-based models are among the most interpretable models and closely resemble human cognition [1]. In [2], various criteria were distinguished for interpretability at different levels of a fuzzy rule-based system, including linguistic variables and fuzzy granules. Integrating these criteria with fuzzy-rough rule induction therefore offers a natural route towards a coherent and compact model of interpretable granular computing. Apart from the granules themselves, an important role should again be reserved for fuzzy quantifiers, as they are useful to summarize knowledge in a concise, linguistic way.