Explainability in RIONA Algorithm Combining Rule Induction and Instance-Based Learning

The article concerns the well-known RIONA algorithm. We focus on the explainability property of this algorithm. The theoretical results, formulated and proved in the paper, show the relationships of the RIONA classifiers to both instance- and rule-based classifiers. In particular, we show the equivalence (relative to the classification) of the RIONA algorithm with the rule-based algorithm generating all consistent and maximally general rules from the neighbourhood of the test case.


I. INTRODUCTION
IN THE paper, we focus on learning algorithms for supervised learning [1], [2], [3]. Specifically, we focus on the well-known RIONA algorithm [4], [5], [6]. This algorithm combines two widely used empirical approaches: rule induction and instance-based learning [7], [8], [9], [10]. Both approaches use reasoning schemes comprehensible to a human. This is essential since Explainable AI [11], [12], [13] is becoming more and more useful in real-life applications. For a given test object, the classifying system should provide not only a decision but also its user-understandable explanation.
In the paper, we present theoretical results on the considered algorithm that allow these requirements to be met.
A few concepts form the framework of the RIONA algorithm. First, instead of inducing an excessive number of decision rules in advance to use them during testing, it induces decision rules relevant only to the test example. This is the strategy of so-called lazy learning [14]. Second, only rules from the neighbourhood of the given test example are considered. Third, it automatically groups numerical and symbolic values of attributes by using more general conditions than the commonly used ones. Fourth, RIONA computes the optimal size of the neighbourhoods of objects.
The properties of the RIONA algorithm are worth studying, as it was reported in the literature as one of the most accurate classification methods in many experimental comparisons done by various researchers (the most commonly used RIONA implementation is the classifier named RseslibKnn on the WEKA platform [15]), to name a few: Facebook content recognition [16, Chapter 1] (RIONA was the best of 21 tested algorithms), environmental sound recognition [17] (best of 9 algorithms), metabolic pathway prediction of plant enzymes [18] (2nd of 47 algorithms), acoustic-based environment monitoring [19] (2nd of 8 algorithms), context awareness of a service robot [20] (2nd of 8 algorithms), and student performance prediction [21] (5th of 47 algorithms).
The novelty of the paper lies in theoretical results creating the basis for the explainability of classifications returned by classifiers. The results concern the relationships of the classifiers generated by the RIONA algorithm to classifiers obtained by applying instance-based as well as rule-based approaches. In particular, it turns out that RIONA classifiers are equivalent (relative to the classification property) to classifiers produced by the rule-based algorithm based on all consistent and maximally general rules generated from the neighbourhood of the test case. Such rules are easily interpretable by humans.
The paper relates to the PhD thesis [6].

II. BASIC NOTIONS
|X| denotes the cardinality (size) of the set X. If A and B are algorithms, then the equality A(v) = B(w) means that the values returned by A on input v and by B on input w are the same.
By X we denote a set of objects, called the domain of learning, and by Atr = A ∪ {d} a finite set of attributes atr, where atr : X −→ V_atr. V_atr is called the value set of atr. Attributes from A are called conditional attributes and the attribute d ∉ A is called the decision. V_d is called the decision set. We assume, for simplicity of notation, that V_d = {1, . . ., n_d}. Any object x ∈ X is represented by its signature Inf_A(x), i.e. the set of pairs (a, a(x)) for each a ∈ A. We use the symbols A_sym and A_num to denote the sets of symbolic and numerical attributes, respectively. If a ∈ A_sym, then V_a is a finite set. If a ∈ A_num, then, without loss of generality, we assume that V_a is equal to an interval (l_a, u_a), where l_a, u_a ∈ R (possibly not all of the values from the interval are used).
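To fix intuitions, these notions can be sketched in Python. This is a minimal illustrative representation (the dict-based objects, the key "d", and the name `signature` are assumptions for this sketch, not notation from the paper): an object is a dictionary from attribute names to values, and the signature Inf_A(x) is the set of its attribute-value pairs over the conditional attributes.

```python
def signature(obj, cond_attrs):
    """Inf_A(x): the set of pairs (a, a(x)) for each conditional attribute a."""
    return {(a, obj[a]) for a in cond_attrs}

# An object with one symbolic attribute, one numerical attribute,
# and the decision stored under the key "d".
x = {"colour": "red", "size": 2.5, "d": 1}
print(signature(x, ["colour", "size"]))
```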
A decision system is a triplet (X, A, d), where X is the set of objects, A is a set of conditional attributes, and d is the decision. A pseudometric decision system is a 4-tuple (X, A, d, {ϱ_a}_{a∈A}), where (X, A, d) is a decision system and, for any attribute a ∈ A, ϱ_a is a pseudometric on the respective value set V_a, i.e. (V_a, ϱ_a) is a pseudometric space [22].
In the sequel, D denotes a given (pseudometric) decision system of the above form.
In the paper we study a combination of two methods (learning algorithms) inducing a classifier from a given sub-decision system (trnSet, A, d), where trnSet ⊆ X and the attributes from A ∪ {d} are restricted to trnSet. The classifier computes for any x ∈ X a decision d̂(x) in such a way that d̂ is close to d [2]. It should be noted that in practical experiments, the normalised Euclidean metric was used for numerical attributes and the SVDM [23] pseudometric for symbolic attributes. If a ∈ A_num, the normalisation uses a_max and a_min, which denote the maximal and the minimal values of the attribute a among the training examples in trnSet.

A. Rule-based methods
The induction of rule sets is one of the fundamental Machine Learning (ML) techniques (see e.g. [7]). Its significance stems from the fact that a human may easily comprehend knowledge represented in the form of rules. Decision rules specify the appropriate course of action in a given circumstance. They frequently have the form 'if φ then ψ', where φ denotes the premise of the rule and ψ denotes its consequence; ψ is a formula defined by the decision attribute d.
Decision rules are generated from a training set using rule induction algorithms. The premises of the rules are represented by conjunctions of elementary conditions, and the consequences describe the particular decisions. Each elementary condition describes a collection of the attribute's values. Roughly speaking, it has the form a ∈ V, where V ⊆ V_a and a is an attribute. We first determine how such sets V of values can be expressed in a formal language. Next, we define the semantics (meaning) of the expressions of this language in the powerset of the attribute value set V_a. For simplicity of notation, we do not distinguish between symbols denoting values (or intervals) and the values (or intervals) themselves.
Definition II.1. Let D be a (pseudometric) decision system. For a symbolic attribute a ∈ A_sym, the description of an elementary set has one of the following forms: {v} for v ∈ V_a; the whole set V_a; or a ball B(c, r) with centre c ∈ V_a and radius r ≥ 0, the latter only when D is a pseudometric decision system. The description of an elementary set for the decision attribute d is of the form {v}, for v ∈ V_d.

For a numerical attribute a ∈ A_num, the description of an elementary set is a proper interval contained in (l_a, u_a). The semantics ||des||_D ⊆ V_a of a description des of an elementary set for an attribute a ∈ A ∪ {d} is the corresponding subset of V_a. Now, elementary conditions expressed in this language and their semantics can be defined.

Definition II.2. Let D be a (pseudometric) decision system.
Any expression a ∈ V, where a ∈ A and V is a description of an elementary set for the attribute a, is called an elementary condition. The semantics [[a ∈ V]]_D is the set of objects x with a(x) ∈ ||V||_D; it may also be restricted to subsets of X, e.g. to trnSet.
An example (case) x satisfies the elementary condition a ∈ V if the value of a on x is in the set defined in D by V, i.e. a(x) ∈ ||V||_D. The set ||V||_D ⊆ V_a for a given elementary condition a ∈ V is equal to: {v} for some v ∈ V_d for the decision attribute d; a proper interval for a numerical attribute; and {v} for some v ∈ V_a, the whole set V_a, or a ball for a symbolic attribute a (see Definition II.1 and the corresponding semantics). Instead of a ∈ {v} we write a = v, and instead of the trivial elementary condition a ∈ V_a (always true, i.e. the set of objects satisfying it is equal to X) we write a = *. Now, we introduce concepts related to the syntax and semantics of decision rules.
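Satisfaction of an elementary condition can be sketched in a few lines of Python. In this illustrative sketch (names are assumptions, not from the paper), `value_set` plays the role of the semantics ||V||_D, and `None` models the trivial condition a = *:

```python
def satisfies(obj, attr, value_set):
    """An object x satisfies a ∈ V iff a(x) ∈ ||V||_D.
    value_set = None models the trivial condition a = * (always true)."""
    return True if value_set is None else obj[attr] in value_set

x = {"colour": "red", "size": 2.5}
print(satisfies(x, "colour", {"red", "blue"}))  # condition colour ∈ {red, blue}
print(satisfies(x, "size", None))               # trivial condition size = *
```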

Definition II.3. Let D be a (pseudometric) decision system. A decision rule is an expression of the form

if t_1 ∧ t_2 ∧ . . . ∧ t_m then d = v,

where m = |A|, t_i is an elementary condition for the attribute a_i for i = 1, 2, . . ., m, and v ∈ V_d.

The semantics of the premise of a rule r is the set of objects satisfying all the elementary conditions t_1, . . ., t_m, i.e. [[t_1]]_D ∩ . . . ∩ [[t_m]]_D. If an example x belongs to this set, we say that
• the premise of the rule r is satisfied by the example x (or x satisfies this premise),
• the example x matches the rule r,
• r covers x.
One can treat a single rule as a classifier assigning the examples covered by that rule to the decision class from the rule's consequence. Ideally, we could search for rules 'if φ then ψ' whose premises are satisfied only by examples with the decision given by ψ on the whole domain X; however, only trnSet is available. Rules satisfying this condition for trnSet are induced from trnSet, and a hypothesis is made that it extends to X. Moreover, rules covering as many examples as possible are generated.
In descriptions of decision rules, trivial conditions are usually omitted. The typical conditions are equations a = v in the case of symbolic attributes and inclusions into intervals in the case of numerical attributes. In this paper, for symbolic attributes we use the more general conditions a ∈ V (see Definition II.2), making it possible to extend singleton sets to ball sets. If the data set under consideration has some numerical attributes, then the relevant intervals can be constructed by applying discretisation. By applying discretisation to a given decision system, a new one is obtained, with new attributes being the characteristic functions of the induced intervals containing objects from trnSet labelled by the same decision (see e.g. [24]).
By t_i(r), where r is a given decision rule, we denote the i-th condition t_i from Definition II.3. We write t_a(r) instead of t_i(r) if the condition t_i from Definition II.3 concerns the attribute a.
We define three kinds of decision rules by distinguishing the elementary conditions (used in Definition II.3) occurring in them. In consequence, we obtain three sets of decision rules.
Definition II.4. Let D be a decision system.
The set SimRules of simple rules is the set of all rules from Definition II.3 with elementary conditions of the form a = v for v ∈ V_a and a = * only.
The set CombRules of combined rules is the set of all rules from Definition II.3 with elementary conditions for symbolic attributes of the form as in SimRules, and for numerical attributes of the form a ∈ I (where I is a proper interval description) only.
Definition II.5. Let D be a pseudometric decision system such that for any symbolic attribute a ∈ A_sym there is a distinguished specific value c_a ∈ V_a.
The set GenRules_{(ϱ_a, c_a)}_{a∈A_sym} (or simply GenRules whenever the pairs (ϱ_a, c_a) are clear from the context or irrelevant due to generality) of general rules is the set of all rules from Definition II.3 where the set descriptions in the elementary conditions in the premises of the rules are (i) as in the definition of CombRules for numerical attributes, and (ii) of the specific ball form B(c, r), where c = c_a, r = ϱ_a(c_a, v), v ∈ V_a, for symbolic attributes a ∈ A_sym.
Definition II.6. Let D be a (pseudometric) decision system. A decision rule r with the consequence d = v is consistent with a training set trnSet if every example x ∈ trnSet covered by r satisfies d(x) = v.

For further considerations, the concept of the maximality of a rule will be useful.
Definition II.7. Let D be a (pseudometric) decision system, let a ∈ A, and let a ∈ V_1, a ∈ V_2 be elementary conditions for a. The condition a ∈ V_2 is more general than (or is implied by) the condition a ∈ V_1, in symbols (a ∈ V_1) ⇒ (a ∈ V_2), if ||V_1||_D ⊆ ||V_2||_D. For any two rules r_1, r_2 (over D) with the same consequence d = v, we say that a rule r_2 is more general than r_1 (or r_1 is implied by r_2), in symbols r_1 ⇒ r_2, if t_i(r_1) ⇒ t_i(r_2) for i = 1, . . ., m, where m = |A|.
A rule r consistent with a training set trnSet is maximally general (relative to trnSet and a given set of rules Rules) if there is no rule in Rules that is more general than r, different from r, and consistent with trnSet.
Definition II.8. Let D be a (pseudometric) decision system and let Rules be a given set of admissible rules. The set of maximally general rules (relative to trnSet), MaxRules(Rules, trnSet), is equal to the set of all maximally general rules r ∈ Rules consistent with trnSet.
Computing from trnSet the set of all consistent and maximally general rules matching at least one case from trnSet is important for some learning algorithms.
In the case of MaxRules(SimRules, trnSet), a consistent rule is maximally general relative to trnSet if it becomes inconsistent (relative to trnSet) after the substitution of the trivial condition for any non-trivial one. Hence, consistent rules from MaxRules(SimRules, trnSet) can be characterised as the rules of minimal length (measured by the number of non-trivial conditions in their premises). Hence, in the considered case, the problem is to generate the complete set of consistent and minimal decision rules (see e.g. [25]). One can observe that searching for the set of minimal rules can be motivated by the minimum description length principle (MDL) (see e.g. [26]). However, the computational time complexity of algorithms generating MaxRules(SimRules, trnSet) is not feasible when the number of training objects or attributes is large. In fact, the size of the MaxRules set can be exponential relative to |trnSet| (see e.g. [27]). Hence, efficient heuristics are often used to overcome this drawback, especially when not necessarily complete sets of minimal rules are required (see e.g. [28]). There are also other approaches inducing a set of rules fully covering the cases from trnSet (see e.g. [29]). Here, we focus on the complete MaxRules set.
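The characterisation above (a consistent simple rule is maximally general iff weakening any non-trivial condition to a = * breaks consistency) can be sketched in Python. This is an illustrative sketch under the dict-based representation used earlier (names such as `covers` and `maximally_general` are assumptions); `None` models the trivial condition:

```python
def covers(prem, obj):
    # prem maps each attribute to a value (a = v) or None (a = *)
    return all(v is None or obj[a] == v for a, v in prem.items())

def consistent(prem, decision, trn_set):
    # every covered training example must carry the rule's decision
    return all(t["d"] == decision for t in trn_set if covers(prem, t))

def maximally_general(prem, decision, trn_set):
    """A consistent simple rule is maximally general iff replacing any
    non-trivial condition with a = * makes it inconsistent."""
    if not consistent(prem, decision, trn_set):
        return False
    for a, v in prem.items():
        if v is not None and consistent(dict(prem, **{a: None}), decision, trn_set):
            return False
    return True
```

For example, over trnSet = [{a:1, b:1, d:0}, {a:1, b:2, d:1}], the rule "if b = 1 then d = 0" is maximally general, while "if a = 1 ∧ b = 1 then d = 0" is not (dropping a = 1 keeps it consistent).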
In the case of MaxRules(CombRules, trnSet), we additionally deal with numerical attributes. Maximally general intervals of reals are induced from trnSet. Searching for maximally general rules for numerical attributes is closely related to the problem of discretisation. A partition of the reals in discretisation is consistent if each interval covers only objects with the same decision (see e.g. [24], [27]).
The discretisation problem is a complex task. For example, searching for a consistent partition with the minimal number of cuts is NP-hard (see e.g. [24]). In Subsection III-A we show how to overcome this drawback using lazy learning and focusing on a local part of X instead of on the whole universe. This is illustrated by the lazy rule induction Algorithm 3 or Algorithm 4.
In the case of MaxRules(GenRules, trnSet), an additional search is performed for the relevant grouping of values of symbolic attributes into partitions of their value sets. One can define a partition over an attribute a as any function P_a : V_a → {1, . . ., m_a}. It should be noted that the problem of searching for a consistent family of partitions minimising Σ_{a∈A} |P_a(V_a)| is NP-hard (see e.g. [30]). We show in the paper how to overcome this drawback by limiting the number of possible groupings of the values of any attribute (from 2^n to n^2, where n is the number of values of an attribute) and by using lazy rule induction (see Section III-A).
The sets of rules induced from trnSet are used to classify objects. First, for any test object tst, all rules from the set matching this object are selected. Next, the set of matched rules is checked. If all rules matched by tst have the same decision, then this decision is assigned to tst; otherwise, the conflict between matching rules voting for different decisions must be resolved (see e.g. [31]). Typically, the decision with the highest value of a selected measure used for conflict resolution is chosen. We use a commonly used measure for conflict resolution.
Definition II.9. Let us assume that a (pseudometric) decision system D, a training set trnSet, a test example (case) tst, and a set MaxRules of maximally general rules are given. By supportSet(r) ⊆ trnSet, where r ∈ MaxRules, we denote the set of all objects from trnSet matching r, and by MatchR(tst, v) ⊆ MaxRules (where v ∈ {1, . . ., n_d}, i.e. v is a value of d on some object from trnSet) the set of rules from MaxRules with the decision v matching the test object tst. Now, we define

Strength(tst, v) = | ⋃_{r ∈ MatchR(tst, v)} supportSet(r) |.    (7)

From the definition it follows that Strength(tst, v) counts the objects from trnSet covered by some maximally general rule from MaxRules (i) with the decision v and (ii) covering the test example tst.
On the basis of MaxRules and the defined conflict resolution strategy using Strength, we define the classifier assigning to a given test object tst the most frequent decision among the training examples from trnSet covered by the rules from MaxRules matched by tst, i.e.:

decision(tst) = arg max_{v ∈ V_d} Strength(tst, v).    (8)

As observed above, the drawback of the presented approach comes from the high computational complexity of MaxRules generation.
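Eqns. 7 and 8 can be sketched together in Python. In this illustrative sketch (the representation and names are assumptions), a rule is a pair (premise, decision), support sets are collected as sets of training-object indices so that the union in Eqn. 7 counts each object once, and the classifier returns the decision maximising Strength:

```python
from collections import defaultdict

def covers(prem, obj):
    # None models the trivial condition a = *
    return all(v is None or obj[a] == v for a, v in prem.items())

def classify(tst, max_rules, trn_set):
    """Strength(tst, v) = |union of supportSet(r) over matched rules with
    decision v| (Eqn. 7); return arg max over v (Eqn. 8)."""
    strength = defaultdict(set)
    for prem, v in max_rules:
        if covers(prem, tst):                      # r ∈ MatchR(tst, v)
            strength[v] |= {i for i, t in enumerate(trn_set)
                            if covers(prem, t)}    # supportSet(r)
    return max(strength, key=lambda v: len(strength[v]))
```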

B. Lazy rule learning for symbolic attributes
The lazy learning (or memory-based learning) algorithms do not require the construction of sets of decision rules before the classification of new objects.
kNN is a well-known example of such algorithms (see Algorithm 1). For these algorithms, first, for any test object tst, its neighbourhood N(tst, trnSet, k, ϱ) ⊆ trnSet (N, for short) is defined, consisting of the k training examples most similar to tst (relative to a given distance function ϱ), where k is a parameter. If more than one example has the same distance from tst as one already added to the N under construction, then all of them are added to N(tst, trnSet, k, ϱ). Then the set N(tst, trnSet, k, ϱ) may contain more than k examples.
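The neighbourhood construction with ties can be sketched as follows (an illustrative Python sketch; the name `neighbourhood` and the list-based representation are assumptions). All examples tied with the k-th smallest distance are kept, so the result may exceed k elements:

```python
def neighbourhood(tst, trn_set, k, dist):
    """N(tst, trnSet, k, ρ): the k nearest training examples, extended
    with every example whose distance ties the k-th one."""
    ranked = sorted(trn_set, key=lambda t: dist(tst, t))
    if len(ranked) <= k:
        return ranked
    cutoff = dist(tst, ranked[k - 1])           # distance of the k-th neighbour
    return [t for t in ranked if dist(tst, t) <= cutoff]
```

For instance, for one-dimensional points [1, 2, 2, 5], test point 0 and k = 2, the neighbourhood is [1, 2, 2]: both examples at distance 2 are kept.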
An interesting example of a lazy rule-based algorithm for SimRules is presented in [32]. For a new test object tst it generates only the decision rules relevant for it, and then tst is classified as before on the basis of such rules. The value of Eqn. 7 is computed for any test object tst without computing the whole set MaxRules.
For two given objects tst, trn, we first define the simple local decision rule, in symbols s-rule(tst, trn). The relationship of the set of such rules with SimRules is presented in the following proposition.
Definition II.10. Let D be a decision system, trn ∈ trnSet, and let tst be a test object. A simple local decision rule (for short, s-rule) s-rule(tst, trn) is the decision rule of the form if ⋀_{a∈A} t_a then d = d(trn), where the condition t_a for each symbolic attribute a is a = a(trn) if a(tst) = a(trn), and the trivial condition a = * otherwise. From the above considerations it follows that LAZY takes into account only those decision rules that can be involved in the classification of a given test object.
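The construction of an s-rule can be sketched directly from Definition II.10 (an illustrative Python sketch; `None` again models the trivial condition a = *, and the decision is stored under the key "d"):

```python
def s_rule(tst, trn, cond_attrs):
    """s-rule(tst, trn): the condition a = a(trn) wherever tst and trn agree,
    and the trivial condition a = * (None) elsewhere; consequence d = d(trn)."""
    prem = {a: (trn[a] if trn[a] == tst[a] else None) for a in cond_attrs}
    return prem, trn["d"]
```

For example, for tst with (a = 1, b = 2) and trn with (a = 1, b = 3, d = 0), the s-rule is "if a = 1 then d = 0" (the condition on b is trivial).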

A. Extension and generalisation of lazy rule learning
We introduce an extension and generalisation of the LAZY algorithm (see Algorithm 3) that was discussed in Subsection II-B. This novel algorithm permits the use of numerical attributes as well as more general conditions for symbolic attributes.
In Subsections III-A1 and III-A2 we present a generalisation of the rules introduced before.
For a given test object tst, a training object trn ∈ trnSet, and a pseudometric decision system D with pseudometrics ϱ_a for a ∈ A_sym, in addition to the simple local decision rule (in short, s-rule; see Subsection II-B) denoted by s-rule(tst, trn), we consider two new types of local rules: the combined local decision rule (in short, c-rule) and the generalised local decision rule (in short, g-rule), denoted by c-rule(tst, trn) and g-rule(tst, trn, {ϱ_a}_{a∈A_sym}) (or simply g-rule(tst, trn)), respectively. In this way we obtain sets composed of simple rules, combined rules, and general rules, denoted by SimRules (see Subsection II-A), CombRules, and GenRules, respectively. We have already demonstrated (see Subsection II-B) an important relation between any s-rule and the set of maximally general consistent rules MaxRules(SimRules, trnSet). Here, we show analogous important relations between any c-rule or g-rule and the sets of maximally general rules MaxRules(CombRules, trnSet) and MaxRules(GenRules, trnSet), corresponding to the sets of rules CombRules and GenRules, respectively.

1) Extension of lazy rule learning for numerical attributes:
In this section we assume that D is a given decision system and trnSet ⊆ X. Now, we extend the definition of the local decision rule to the case of both symbolic and numerical attributes.
Definition III.1. Let tst be a test object and trn ∈ trnSet. By t_a for a ∈ A_sym we denote a condition as in Definition II.10, and min_a = min(a(tst), a(trn)), max_a = max(a(tst), a(trn)) for a ∈ A_num. We define the combined local decision rule (for short, c-rule) if ⋀_{a∈A} T_a then d = d(trn), denoted by c-rule(tst, trn), where the conditions T_a for a ∈ A are as follows: T_a = t_a for a ∈ A_sym, and T_a is a ∈ [min_a, max_a] for a ∈ A_num. Let us note that the conditions for numerical attributes contain intervals with endpoints determined by the attribute values of the objects tst and trn.
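A c-rule can be sketched by extending the s-rule construction with interval conditions for numerical attributes (an illustrative Python sketch; intervals are stored as (min_a, max_a) pairs, `None` is the trivial condition):

```python
def c_rule(tst, trn, sym_attrs, num_attrs):
    """c-rule(tst, trn): symbolic conditions as in the s-rule; for a
    numerical attribute a, the condition a ∈ [min_a, max_a] with
    endpoints taken from a(tst) and a(trn)."""
    prem = {}
    for a in sym_attrs:
        prem[a] = trn[a] if trn[a] == tst[a] else None
    for a in num_attrs:
        prem[a] = (min(tst[a], trn[a]), max(tst[a], trn[a]))
    return prem, trn["d"]
```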
The relationship of the set MaxRules(CombRules, trnSet) to c-rule(tst, trn) is analogous to the relation between MaxRules(SimRules, trnSet) and s-rule(tst, trn), as the following lemma states.

Lemma III.1. If a rule r ∈ MaxRules(CombRules, trnSet) covers both the test object tst and a training object trn ∈ trnSet, then c-rule(tst, trn) ⇒ r.

Proof. From r ∈ MaxRules(CombRules, trnSet) we have, in particular, that r is consistent with trnSet. Hence, the consequence of r is d = d(trn), i.e. the decision of r is the same as that of c-rule(tst, trn).
Since r covers tst and trn, both t_a(r)(trn) and t_a(r)(tst) are satisfied. It is enough to show that for any a ∈ A the implication t_a(c-rule(tst, trn)) ⇒ t_a(r) holds, i.e. V_a(c-rule(tst, trn)) ⊆ V_a(r), where for any rule r, V_a(r) denotes the semantics of the set description in the condition t_a(r). Let us first assume that a ∈ A_sym. If t_a(r) is of the form a ∈ V_a, then the implication obviously holds (the trivial condition is implied by any condition, because for any elementary condition a ∈ V for the attribute a, ||V||_D ⊆ V_a). If t_a(r) is of the form a = v, then, since r covers both tst and trn, a(tst) = a(trn) = v, so t_a(c-rule(tst, trn)) is the same condition a = v. If a ∈ A_num, then t_a(r) is of the form a ∈ I, where I is the description of the interval corresponding to the numerical attribute a in the rule r. Because t_a(r)(trn) and t_a(r)(tst) are both satisfied, a(trn) ∈ ||I||_D and a(tst) ∈ ||I||_D. Thus {a(trn), a(tst)} ⊆ ||I||_D. Hence, all points between a(trn) and a(tst) are also in ||I||_D. In consequence, [min_a, max_a] ⊆ ||I||_D, where min_a = min(a(tst), a(trn)) and max_a = max(a(tst), a(trn)), which ends the proof of the inclusion V_a(c-rule(tst, trn)) ⊆ V_a(r) (see Definition III.1).
Theorem III.2. The rule c-rule(tst, trn) for a test object tst and a training object trn is consistent with the training set trnSet if and only if there exists a rule r ∈ MaxRules(CombRules, trnSet) covering the objects tst and trn.
Proof. We start with a proof of the following fact: if c-rule(tst, trn) is consistent with trnSet, then it can be extended to a rule from MaxRules(CombRules, trnSet). Such a rule can be constructed inductively. From the assumption, we have that r_0 = c-rule(tst, trn) ∈ CombRules is consistent with trnSet. In the induction step defining each next rule r_i, for i = 1, 2, . . ., m, where m = |A|, we assume that r_{i−1} is consistent with trnSet and the conditions t_j(r_{i−1}) for all j = 1, 2, . . ., i − 1 are maximally general, i.e. if we replace any condition t_j with a more general t (i.e. t_j ⇒ t) preserving consistency, then t_j = t. In the i-th induction step, t_i(r_i) is defined as the maximal generalisation of t_i(r_{i−1}) = . . . = t_i(r_0) = t_i(c-rule(tst, trn)) preserving consistency with trnSet. All other conditions and the decision of the rule are not changed, i.e. t_j(r_i) = t_j(r_{i−1}) for j ̸= i and d(r_i) = d(r_{i−1}). Hence, in the i-th induction step we simply maximally generalise the condition for the attribute a_i.
If a_i ∈ A_sym and t_i(r_{i−1}) is the trivial condition, then we put r_i = r_{i−1}. If t_i(r_{i−1}) is non-trivial, it is substituted by a ∈ V_a if the consistency of the rule is preserved; otherwise, we put r_i = r_{i−1}.
If a_i ∈ A_num, then t_i(r_{i−1}) is of the form a_i ∈ [min, max]. Let us denote by rule_i(r, t) the result of replacing the i-th condition of r by a condition t. Now, we define a set of values of the attribute a by a(Inc) = {a(trn) : trn ∈ Inc}, where Inc = {trn ∈ trnSet : d(trn) ̸= d(r_0) ∧ rule_i(r_{i−1}, a_i = *) covers trn}, i.e. Inc contains the objects which may violate the consistency of the rule under the maximal possible extension of the condition t_i(r_{i−1}). From the inductive assumption, r_{i−1} is consistent with Inc because Inc ⊆ trnSet. Hence, a(Inc) ∩ [min, max] = ∅. Now we define newmax = min{v ∈ a(Inc) : v > max}. This minimum exists because Inc, and also a(Inc), are finite sets. If the set {v ∈ a(Inc) : v > max} is empty, we take newmax = u_{a_i} (i.e. the maximal possible extension of the right end of the interval). Analogously, we define newmin = max{v ∈ a(Inc) : v < min}. If {v ∈ a(Inc) : v < min} is empty, we put newmin = l_{a_i} (i.e. the maximal possible extension of the left end of the interval). Finally, we define t_i(r_i) as a ∈ (newmin, newmax). From the definition, r_i is consistent with trnSet and is also maximal, because either end of the interval (newmin, newmax), even if extended by a single point to a closed end, would cause inconsistency (in the case newmin = l_{a_i} this end of the interval cannot be extended; analogously in the case newmax = u_{a_i}).
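The interval-widening step of this proof can be sketched as a small function (an illustrative Python sketch; `conflicting_vals` plays the role of a(Inc), and `l_a`, `u_a` are the bounds of the value set):

```python
def generalise_interval(lo, hi, conflicting_vals, l_a, u_a):
    """Maximally widen [lo, hi] against the attribute values of objects
    that would violate consistency (the set a(Inc)): the new open interval
    (newmin, newmax) reaches the nearest conflicting value on each side,
    or the bound of the value set when no such value exists."""
    above = [v for v in conflicting_vals if v > hi]
    below = [v for v in conflicting_vals if v < lo]
    newmax = min(above) if above else u_a
    newmin = max(below) if below else l_a
    return newmin, newmax
```

For example, widening [2, 4] against conflicting values {0, 6, 9} within bounds (-10, 10) yields the open interval (0, 6).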
It is easy to prove that all other conditions t_j(r_i) for j < i remain maximal. To prove this, let us assume that for some j < i the condition t_j(r_i) could be extended to t preserving consistency, i.e. rule_j(r_i, t) is consistent. We also have rule_j(r_{i−1}, t) ⇒ rule_j(r_i, t). Therefore rule_j(r_{i−1}, t) is consistent with trnSet. From the inductive assumption, t is identical to t_j(r_{i−1}). Because t_j(r_{i−1}) is the same as t_j(r_i) for j < i, t is the same as t_j(r_i). This means that t_j(r_i) is maximally general.
Our inductive reasoning leads to the conclusion that the last rule r_m is consistent with trnSet and maximally general.

2) Generalisation of lazy rule learning for symbolic attributes:
In this section, D denotes a given pseudometric decision system and trnSet ⊆ X is a given training set.
In Definitions II.10 and III.1, the trivial condition a ∈ V_a for a symbolic attribute a was introduced. This condition represents the specific grouping of all possible values of an attribute and is satisfied by any object. However, a proper subset of V_a may be more relevant for the classification. A grouping of values can be obtained by applying a given pseudometric ϱ_a for a. We now formulate the following generalisation of Definition III.1, related to a grouping of values for symbolic attributes.

Definition III.2. Let tst be a test object, trn ∈ trnSet, and min_a = min(a(tst), a(trn)), max_a = max(a(tst), a(trn)) for a ∈ A_num. We also use the following notation: (i) r_a = ϱ_a(a(tst), a(trn)) for the radius, (ii) B(c, R) for the closed pseudometric ball of radius R centred at the point c, defined by the pseudometric ϱ_a. Now, we define the generalised local decision rule (for short, g-rule) if ⋀_{a∈A} t_a then d = d(trn), denoted by g-rule(tst, trn, {ϱ_a}_{a∈A_sym}) or simply g-rule(tst, trn) (if the parameters {ϱ_a}_{a∈A_sym} are clear from the context or irrelevant due to the generality of considerations), where t_a is a ∈ B(a(tst), r_a) for a ∈ A_sym and a ∈ [min_a, max_a] for a ∈ A_num.

Now, we prove that a relationship between the set MaxRules(GenRules, trnSet) and g-rule(tst, trn), analogous to the relation between MaxRules(SimRules, trnSet) and s-rule(tst, trn), holds.

Lemma III.3. Let c_a = a(tst) for every a ∈ A_sym. If a rule r ∈ MaxRules(GenRules, trnSet) covers both tst and trn ∈ trnSet, then g-rule(tst, trn, {ϱ_a}_{a∈A_sym}) ⇒ r.
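A g-rule can be sketched by replacing the symbolic equality conditions of a c-rule with ball conditions (an illustrative Python sketch; a ball condition is stored as a (centre, radius) pair, an interval as a (min, max) pair, and `metrics` maps each symbolic attribute to its pseudometric ϱ_a):

```python
def g_rule(tst, trn, sym_attrs, num_attrs, metrics):
    """g-rule(tst, trn, {ρ_a}): for a symbolic attribute a, the condition
    a ∈ B(a(tst), r_a) with r_a = ρ_a(a(tst), a(trn)); for a numerical
    attribute, the same interval condition as in the c-rule."""
    prem = {}
    for a in sym_attrs:
        prem[a] = (tst[a], metrics[a](tst[a], trn[a]))   # (centre, radius)
    for a in num_attrs:
        prem[a] = (min(tst[a], trn[a]), max(tst[a], trn[a]))
    return prem, trn["d"]
```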
Proof. The proof is an extension of the proof of Lemma III.1. For numerical attributes, the proof is the same as before. For symbolic attributes, it is enough to replace the corresponding part of the proof of Lemma III.1 by the following one. Let a ∈ A_sym. Then t_a(r) is of the form a ∈ B(a(tst), R_a), where R_a = ϱ_a(a(tst), v) for some v ∈ V_a. Hence, R_a ≥ ϱ_a(a(tst), a(trn)), because t_a(r)(trn) is satisfied. So we obtain B(a(tst), r_a) ⊆ B(a(tst), R_a), where r_a = ϱ_a(a(tst), a(trn)). Hence, we have t_a(g-rule(tst, trn)) ⇒ t_a(r).
Theorem III.4. Under the assumptions of Lemma III.3, the rule g-rule(tst, trn, {ϱ_a}_{a∈A_sym}) is consistent with trnSet if and only if there exists a rule r ∈ MaxRules(GenRules, trnSet) such that r covers tst and trn.
Proof.The proof can be obtained by a modification of the proof of Theorem III.2.
It is enough to modify the inductive step of the proof of Theorem III.2 for a_i ∈ A_sym as follows. If a_i ∈ A_sym, then t_i(r_{i−1}) is of the form a ∈ B(a(tst), r_a), where r_a = ϱ_a(a(tst), v) for some v ∈ V_a. Now, let us consider the possible extensions of a ∈ B(a(tst), r_a) to a ∈ B(a(tst), R_a), where R_a = ϱ_a(a(tst), w) for some w ∈ V_a with R_a ≥ r_a, preserving the consistency (with trnSet) of the rule. From the finite set of such possible extensions (V_a is finite), we select the one with the maximal value of R_a. The selected extension is maximally general.
If a_i ∈ A_num, then we extend the condition as in Theorem III.2.
One can conclude that the last rule r_m is consistent with trnSet and maximally general by reasoning analogous to that in Theorem III.2.
Also, in an analogous way as in Theorem III.2, with the use of Lemma III.3, we obtain that the following implication holds: if g-rule(tst, trn) is inconsistent with trnSet, then there is no rule in MaxRules(GenRules, trnSet) covering tst and trn.
Let us note that the set MaxRules(GenRules, trnSet) is defined for the given values c_a for a ∈ A_sym (in the testing procedure we assume c_a = a(tst)). The idea behind the construction of MaxRules(SimRules, trnSet) was to compute all maximally general rules in advance for later use in the classification process. In order to construct MaxRules(GenRules, trnSet), this would have to be done for all possible combinations of all possible values of all symbolic attributes. It would increase the number of generated rules by a factor of no more than b^k, where b is the maximal cardinality |V_a| for a ∈ A_sym and k is the number of symbolic attributes.
From Theorem III.4 it follows that it is sufficient to generate g-rules for all training examples and then check their consistency with trnSet (instead of computing the support sets of the rules from MaxRules(GenRules, trnSet) covering a new test case). The lazy Rule Induction Algorithm (RIA) realises this idea.
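The lazy scheme justified by Theorem III.4 can be sketched end-to-end (an illustrative Python sketch for the simple-rule case, so that the code is self-contained; the actual RIA uses g-rules). For every training example, the local rule is built; if it is consistent with trnSet, its example joins the support of the rule's decision; the decision with the largest support wins:

```python
def covers(prem, obj):
    # None models the trivial condition a = *
    return all(v is None or obj[a] == v for a, v in prem.items())

def s_rule(tst, trn, attrs):
    # local rule: keep conditions where tst and trn agree
    return {a: (trn[a] if trn[a] == tst[a] else None) for a in attrs}, trn["d"]

def ria(tst, trn_set, attrs):
    """Lazy RIA sketch: generate the local rule for every training example,
    keep those consistent with trnSet, and return the decision with the
    largest set of supporting training objects."""
    support = {}
    for i, trn in enumerate(trn_set):
        prem, v = s_rule(tst, trn, attrs)
        if all(t["d"] == v for t in trn_set if covers(prem, t)):
            support.setdefault(v, set()).add(i)
    return max(support, key=lambda v: len(support[v]))
```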

B. Combining instance-based learning and rule methods -RIONA
In this section, we additionally assume that Agr is an aggregation function defined as the sum of the individual metrics.
Let us recall that RIONA is based on a combination of instance-based learning and rule-based methods. The primary observation used in the development of RIONA concerns a property of the widely used kNN method. kNN usually performs quite well for small values of k. Hence, one may expect that only training examples close to a given test case are important in the process of inducing (inferring) the final decision. The intuition supporting this claim is that the training examples which are far from a given test object are less relevant for classification than the closer ones. In contrast, in the case of rule-based methods, in general, all training examples are used in the process of rule generation. Hence, instead of considering all training examples when constructing the support set in the rule-based approach, as in the RIA algorithm, one can bound it to a certain neighbourhood of a test example. In the case of the RIONA algorithm, the classification of a given test case is based on training objects from a neighbourhood of this example.
Our approach to inducing the decision for a given test case is based on a combination of instance-based learning and lazy rule learning (see Section III-A). The core idea concerns the strategy for conflict resolution based on the Strength measure (see Eqn. 7), slightly modified by bounding it to the neighbourhood of the test case:

LocStrength(tst, v) = | ⋃_{r ∈ MatchR(tst, v)} locSuppSet(r) |,

where most of the notation is as in Eqn. 7; additionally, ϱ = Agr({ϱ_a}_{a∈A}) is the aggregated pseudometric, k is the number indicating the size of the neighbourhood, and locSuppSet(r) = supportSet(r) ∩ N(tst, trnSet, k, ϱ).
One can observe that the change from supportSet(r) to locSuppSet(r) causes only those examples covered by the rules matched by the test object that lie in the specified neighbourhood of the test example to be considered. The predicted decision based on LocStrength is analogous to the previous one (see Eqn. 8):

decision(tst) = arg max_{v ∈ V_d} LocStrength(tst, v).

Let us note that the size k of the neighbourhood is optimised in the learning phase (see [5]), while in the classification process we assume that the number k for the neighbourhood N(tst, k) is set to this optimal value.
The above measures can be calculated for a given MaxRules by bounding the support sets of the rules from MaxRules covering a test example to the specified neighbourhood of this example. Hence, the algorithm based on maximally general rules with LocStrength can be used here.
One can observe that the same result also holds for the aggregation function defined as a weighted sum of metrics. This is because adding non-negative weights for each attribute preserves the above inequality.
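The preservation argument can be spelled out in one line. If satisfying the g-rule yields the componentwise bounds $\varrho_a(tst, trn') \le \varrho_a(tst, trn)$ for every $a \in A$, then for any non-negative weights $w_a$:

\[
\varrho(tst, trn') = \sum_{a \in A} w_a\,\varrho_a(tst, trn') \;\le\; \sum_{a \in A} w_a\,\varrho_a(tst, trn) = \varrho(tst, trn),
\]

since the inequality holds term by term and each weight is non-negative. The plain sum of metrics is the special case $w_a = 1$ for all $a$.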
From the above considerations it follows that examples farther from tst than the training example trn cannot cause inconsistency of g-rule⟨tst, trn, {ϱ_a}_{a∈A_sym}⟩. Hence, one can use N(tst, trnSet, k, ϱ) instead of trnSet in line 7 of Algorithm 4.
The classification algorithm RIONA is presented in Algorithm 5. Later on we prove that Algorithm 5 computes LocStrength (see Theorem IV.2). Algorithm 5 returns the most common class among the decisions of the training examples covered by the rules satisfied by tst and belonging to the specified neighbourhood. One should note that all pseudometrics in the argument of Algorithm 5 are given (they are used to compute the final pseudometric); however, in g-rule only the pseudometrics for symbolic attributes are used (see Definition III.2 and the note after it). For every decision value, RIONA computes the support set restricted to the neighbourhood N(tst, k) rather than the whole support set of the maximally general rules covering tst (as in the RIA algorithm). This is done as follows. For any trn ∈ N(tst, k), RIONA constructs the rule g-rule⟨tst, trn, {ϱ_a}_{a∈A_sym}⟩ based on trn and tst. Next, RIONA tests whether this g-rule is consistent with the examples from the neighbourhood N(tst, k). If it is, the support set of the decision d(trn) is extended by trn. Finally, RIONA returns the decision value whose support set has the highest cardinality.
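The classification loop just described can be sketched as follows. This is a simplified reading, not the paper's Algorithm 5: following the proposition on g-rules and distances, an example is taken to satisfy g-rule⟨tst, trn, …⟩ when, on every attribute, it is at most as far from tst as trn is; all function names and the per-attribute distance dictionary are assumptions of this sketch.

```python
from collections import Counter

def g_rule_satisfied(tst, cand, trn, dists):
    # Simplified g-rule<tst, trn, ...>: cand satisfies the rule when on
    # every attribute it lies at most as far from tst as trn does.
    return all(dists[a](tst[a], cand[a]) <= dists[a](tst[a], trn[a])
               for a in dists)

def riona_classify(tst, trn_set, k, dists):
    # aggregated pseudometric: sum of the per-attribute pseudometrics
    rho = lambda x, y: sum(d(x[a], y[a]) for a, d in dists.items())
    # neighbourhood N(tst, k): the k nearest training examples
    nbh = sorted(trn_set, key=lambda ex: rho(tst, ex[0]))[:k]
    support = Counter()
    for trn, dec in nbh:
        # consistency of g-rule(tst, trn) is checked only against the
        # neighbourhood, not against the whole training set
        consistent = all(dec2 == dec
                         for cand, dec2 in nbh
                         if g_rule_satisfied(tst, cand, trn, dists))
        if consistent:
            support[dec] += 1  # extend the support set of d(trn) by trn
    return support.most_common(1)[0][0] if support else None
```

Note how only the consistency test distinguishes this loop from plain kNN voting over the same neighbourhood, which is exactly the observation made at the start of Section IV.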

IV. RELATIONSHIPS OF RIONA TO OTHER APPROACHES
A specific combination of the kNN approach and lazy rule induction allowed us to develop the RIONA algorithm. One can observe that only line 9 of RIONA, where the consistency of the rule determined by the training and testing example is examined, differs from Algorithm 1 (kNN).
The relationships between RIONA, RIA and kNN for k = 1 are as follows.
Proposition IV.1. Let 1NN be the nearest neighbour algorithm for k = 1 with the distance defined by the pseudometric ϱ = Agr({ϱ_a}_{a∈A}). Then for any test object tst we have RIONA(tst, trnSet, 1, {ϱ_a}_{a∈A}) = 1NN(tst, trnSet, ϱ), and for k ≥ |trnSet| we have RIONA(tst, trnSet, k, {ϱ_a}_{a∈A}) = RIA(tst, trnSet, {ϱ_a}_{a∈A}).

Proof. If k ≥ |trnSet| then neighbourSet = trnSet, where neighbourSet is defined in the RIONA algorithm (see Algorithm 5). Hence, RIONA works exactly as the RIA algorithm (see Algorithm 4).
RIONA behaves like the RIA algorithm for the maximal neighbourhood (and, by Corollary III.5, like the algorithm based on the maximally general rules with the Strength strategy for conflict resolution). By taking a neighbourhood based on the one nearest training example, the nearest neighbour algorithm is obtained. RIONA is thus positioned between the rule-based classifier based on the maximally general rules and the nearest neighbour classifier. If a small neighbourhood is chosen, it acts more like a kNN classifier; if a large neighbourhood is chosen, it works more like a rule-based classifier based on inducing maximally general rules. Selecting a neighbourhood that is not maximal may be interpreted as taking more specific rules instead of maximally general rules consistent with the training examples.
Below, we present more properties of RIONA.
Theorem IV.3. The following equality holds: RIONA(tst, trnSet, k, {ϱ_a}_{a∈A}) = decision_{MaxLocalRules}(tst), i.e. RIONA is equivalent (relative to classification) to the algorithm generating all consistent and maximally general rules from the neighbourhood N(tst, trnSet, k, ϱ) of the test case. Moreover, applying the algorithm based on the measure Strength to the new training set trnSet′ = N(tst, trnSet, k, ϱ) instead of trnSet yields the fourth algorithm: (iv) one can consider the RIONA algorithm as an algorithm computing all maximally general, consistent rules locally and using (locally) Strength for conflict resolution.
Table I and Table II present a comparison of these algorithms (the third algorithm is omitted because it is very similar to the fourth).

A. RIONA and rules
Some important properties of instance-based classifiers and rule-based classifiers are inherited by the RIONA algorithm. Even when rule-based classifiers produce less accurate classifications, they have several features that users prefer over instance-based classifiers. The ability of a human who is not a computer science professional to interpret rules is one of these crucial features. He or she can check whether the information found in such rules is non-trivial, accurate, and reveals brand-new features of the considered case. A rule includes an explanation for making the specific decision that is simple enough for a human to comprehend.
Here, we assume that the RIONA algorithm's parameter k is fixed (potentially learnt [5]). Let us now concentrate on algorithm (4) from Sect. III. Because the local complete set of consistent and maximally general decision rules must be computed for each test case tst, the direct computation of MaxLocalRules may initially appear to be highly expensive and impractical. However, if we assume that the size of N is k, then the local training sample is much smaller than the entire training sample, its size being reduced from n = |trnSet| to k. As a result, the total cost of computing MaxRules (globally or locally) is decreased from O(2^n) to O(m · 2^k), where m is the number of test cases. We do not present this strategy from a theoretical standpoint only: this kind of method may be useful when a classifier's decision needs to be explained. In this way, the RIONA algorithm shares characteristics with rule algorithms as well as with fast lazy learning algorithms, i.e. its decisions can be converted into rules.
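A brute-force local computation of this kind stays feasible precisely because the local sample is small. The sketch below is an illustrative simplification, not the paper's MaxLocalRules procedure: it handles only symbolic equality conditions of the form a = a(tst), and `local_max_rules` and its arguments are names introduced here. It enumerates attribute subsets from the most general upward and keeps the minimal consistent ones.

```python
from itertools import combinations

def local_max_rules(tst, local_sample, attrs):
    # Enumerate maximally general consistent rules of the form
    # "a1 = tst[a1] and ... => d" over a small local sample.
    # Exponential in |attrs|, but run on at most k local examples.
    rules = []
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            # skip premises that only specialise an already-found rule
            if any(set(r) <= set(subset) for r, _ in rules):
                continue
            covered = [d for x, d in local_sample
                       if all(x[a] == tst[a] for a in subset)]
            # consistent: the premise covers examples of a single decision
            if covered and len(set(covered)) == 1:
                rules.append((subset, covered[0]))
    return rules
```

The returned premises can be printed directly as human-readable conditions on tst's attribute values, which is the explanation use case discussed above.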
Additionally, algorithm (4) might be extended to construct all rules globally once at the beginning, analogously to algorithm (2) from Corollary IV.4, except that the rules would be based on the local neighbourhood only. Such rules would mimic the RIONA algorithm's behaviour. This strategy has several benefits. First, a set of rules could be immediately provided to explain the predicted decision on a particular test object. Second, the usefulness of the acquired knowledge could be tested against all potential rules generated at the beginning.
The approach for constructing these rules is analogous to algorithm (4) from Corollary IV.4. One could simply construct MaxRules locally for each training case, using each training example as a test example. This can be seen as a computation of specific local reducts, i.e. reducts constructed during the generation of maximally general rules for a given object (see e.g. [33], [34], [35], [32]). Usually, in the construction of local reducts one must preserve discernibility for objects with different decisions. Here, only objects with different decisions at a distance of no more than that determined by k would be required to be discernible.

V. CONCLUSION
The presented findings indicate some important relationships of classifiers generated by the RIONA learning algorithm with instance- and rule-based classifiers. For example, we proved the equivalence (relative to classification) of the RIONA algorithm with the algorithm generating all consistent and maximally general rules from a training set consisting of the training cases close to a given test case. As a result, the classification by the RIONA classifier can be performed by a relatively small set of rules that are simple for a person to comprehend. This can be applied in circumstances where it is crucial to provide an explanation for the decision reached by the classifier. Finally, it is worth mentioning that the RIONA algorithm, based on a hybridisation of instance- and rule-based techniques, has the following properties: (i) it is efficient as well as effective from the point of view of classification, (ii) it can be used as a high-quality tool in the process of explaining the predicted decisions.

Intervals of the form [b, e], (b, e], [b, e), (b, e), where b, e ∈ R are such that the corresponding interval between b and e is included in V_a.

Lemma III.1. Any rule r ∈ MaxRules(CombRules, trnSet) covering the given test object tst and training object trn is implied by the rule c-rule(tst, trn).
If t_a(r) is of the form a = v (i.e. a ∈ {v}), then v = a(trn) = a(tst). The last equalities hold because we already concluded that t_a(r)(trn) and t_a(r)(tst) are both satisfied. Hence, we have trn ∈ [[a = v]]_D and tst ∈ [[a = v]]_D, i.e. a(trn) ∈ {v} and a(tst) ∈ {v}. It means that in the considered case the equality t_a(r) = t_a(c-rule(tst, trn)) holds (see Definition III.1 and Definition II.10).
Theorem II.1 [32]. If trn ∈ trnSet and tst is a test object, then the rule s-rule(tst, trn) is consistent with trnSet if and only if MaxRules(SimRules, trnSet) contains a rule covering both objects tst and trn.

Hence, for any test object tst, decision v ∈ V_d and the set MaxRules(SimRules, trnSet), the value Strength(tst, v) from Eqn. 7 is equal to the number of trn ∈ trnSet having decision d(trn) = v and for which the rule s-rule(tst, trn) is consistent with trnSet. The simple lazy rule induction algorithm for symbolic attributes (LAZY; input: test example tst, training set trnSet), presented in Algorithm 3, realises this idea. In Algorithm 3, isCons(r, verifySet) verifies whether r is consistent with verifySet. For a given object tst and any trn ∈ trnSet, the rule s-rule(tst, trn) is constructed by Algorithm 3. Next, Algorithm 3 tests the consistency of the rule s-rule(tst, trn) with the set trnSet \ {trn}, i.e. whether all the training examples matching the left-hand side of s-rule(tst, trn) have the same decision as trn. If the result of the test is positive, then trn is added to the support set of the relevant decision. Finally, Algorithm 3 predicts the decision with the support set of the highest cardinality. From Theorem II.1 we obtain: LAZY(tst, trnSet) = decision_{MaxRules}(tst), where trnSet is a training set, tst is a test object, and decision_{MaxRules}(tst) is the classifier from Eqn. 8 with MaxRules = MaxRules(SimRules, trnSet).
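A minimal sketch of the LAZY procedure described above, assuming dictionary-valued symbolic examples; the function names and data layout are illustrative, not the paper's code. The s-rule keeps the attributes on which tst and trn agree, and isCons mirrors Algorithm 2.

```python
from collections import Counter

def s_rule(tst, trn_x, trn_d):
    # s-rule(tst, trn): premise keeps the attributes on which tst and trn
    # agree; the decision is the decision of trn
    premise = {a: tst[a] for a in tst if tst[a] == trn_x[a]}
    return premise, trn_d

def is_cons(premise, decision, verify_set):
    # isCons (Algorithm 2): consistent when no example of the verify set
    # matches the premise with a different decision
    return all(d == decision for x, d in verify_set
               if all(x[a] == v for a, v in premise.items()))

def lazy(tst, trn_set):
    # Algorithm 3 (LAZY): the support of decision v counts examples trn
    # with d(trn) = v whose s-rule(tst, trn) is consistent with trnSet \ {trn}
    support = Counter()
    for i, (x, d) in enumerate(trn_set):
        premise, dec = s_rule(tst, x, d)
        rest = trn_set[:i] + trn_set[i + 1:]
        if is_cons(premise, dec, rest):
            support[dec] += 1
    return support.most_common(1)[0][0] if support else None
```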
Algorithm 2: isCons(r, verifySet). Input: a rule r: if α then d = v, and a set of examples verifySet. Output: true if rule r is consistent with verifySet, false otherwise. For all trn ∈ verifySet: if d(trn) ≠ v and trn satisfies α, then return false; otherwise return true.

PROCEEDINGS OF THE FEDCSIS. WARSAW, POLAND, 2023

...examples trn ∈ N(tst, k) should be considered; (ii) it is not necessary to consider all the examples from the training set to check the consistency of g-rule⟨tst, trn, {ϱ_a}_{a∈A_sym}⟩ (see line 7 of Algorithm 4). This follows from the next proposition.

Proposition. Suppose that the ϱ_a (for a ∈ A_num) in a given pseudometric decision system are defined as the normalised Euclidean metric, ϱ = Agr({ϱ_a}_{a∈A}), and Agr is defined either by the sum of metrics or a weighted sum of metrics. If trn′ ∈ trnSet satisfies g-rule⟨tst, trn, {ϱ_a}_{a∈A_sym}⟩, then ϱ(tst, trn′) ≤ ϱ(tst, trn).

Proof. If trn′ satisfies g-rule⟨tst, trn, {ϱ_a}_{a∈A_sym}⟩, then we have (see Definition III.2 of g-rule):

Table II. A comparison scheme of three algorithms from Corollary IV.4: algorithm (1) RIONA, algorithm (2) based on the measure LocStrength, and algorithm (4) based on the measure Strength counted locally.