Improving Re-rankCCP with Rules Quality Measures

Recommender Systems are software tools and techniques that aim to suggest new items that may be of interest to a user. Context-Aware Recommender Systems exploit contextual information to provide more adequate recommendations. In this paper we describe a modification of an existing contextual post-filtering algorithm which uses a rule-like user representation called Contextual Conditional Preferences. We extend the algorithm by taking rules quality measures into account while recommending items to a user. We show experimentally that this modification increases the quality of recommendations, measured with precision, recall and nDCG, and has a negligible impact on the execution time of the original algorithm.


I. INTRODUCTION
Recommender Systems (RS) were created as a response to the information overload problem, which we suffer from nowadays. These software tools and techniques aim at suggesting new items that may possibly be of interest to a user [1]. An item could be a movie (Netflix), a song (Pandora), a job (LinkedIn) or a friend (Facebook). In everyday life we interact with RS when we search for information using Google or when we buy something through the Internet.
Context-aware RS (CARS) are a particular category of RS which exploit contextual information to provide more adequate recommendations [2]. For example, a movie recommendation for a Saturday evening with your friends should be different from one suggested for a Sunday afternoon with your family. It has been shown that adding contextual information to the recommendation process can substantially increase prediction accuracy and user satisfaction [3]. Adomavicius and Tuzhilin [2] distinguish three main types of context-aware recommender systems, i.e. contextual pre-filtering, contextual post-filtering and contextual modeling. The paradigms differ in the way they incorporate context in the recommendation process. More details are given in Section II.
Karpus et al. [4] proposed a context-aware re-ranking algorithm (re-rankCCP) which utilizes a user model called Contextual Conditional Preferences (CCPs). CCPs are a special kind of rules which are learned from past user ratings and used to reorder items in a primary recommendation list. This method seems promising for explaining recommendations to users, because rules are easy for a human to understand. However, this solution has a significant disadvantage. When using a CCP, the algorithm only checks its relevance to the current user context and does not take into consideration the quality of the induced preference. Thus, better CCPs can be omitted during reordering, which would lead to a reduction in recommendation accuracy and user satisfaction.
In this paper we propose a method for determining CCP quality using rules quality measures, i.e. coverage, support and confidence, and apply it in a modification of the re-rankCCP. We show experimentally that this modification increases the quality of recommendations, measured with precision, recall and nDCG, and has a negligible impact on the execution time of the re-rankCCP. The main contributions of this paper can be summarized as follows:
- We propose a way to measure the quality of a CCP using rules quality measures.
- We improve the re-rankCCP algorithm to take into account the quality of CCPs while generating recommendations.
- We compare the effectiveness of the modified re-rankCCP on 2 baseline algorithms, 3 rules quality measures and 4 aggregate functions and show that there is a best configuration for the dataset used.
The rest of the paper is organized as follows. Related work and the basic re-rankCCP are described in Sections II and III, respectively. Section IV provides technical details of the proposed modification and is followed by a description of the dataset used. Section VI introduces the evaluation method, while the obtained results are presented in Section VII. Conclusions close the paper.

II. RELATED WORK
Adomavicius and Tuzhilin [2] distinguish three main types of CARS, i.e. contextual pre-filtering, contextual post-filtering and contextual modeling. The paradigms differ in the way they incorporate context in the recommendation process.
In contextual pre-filtering, we first select ratings by taking only the relevant context into account. Thus, we filter an initial set of ratings and return the contextualized data. After this preparation any known two-dimensional recommendation algorithm can be used to predict user preferences. Baltrunas et al. [5] introduced micro profiles which split a user profile into partitions depending on the values of context parameters. They showed that the usage of such micro profiles gave a significant improvement in prediction accuracy in the movie domain when time was considered as a context variable. A pre-filtering approach which utilizes ontological user profiles was proposed by Karpus et al. [6]. Each user profile consists of many ontologies representing user preferences. The authors of [7], [8] tried to find a correlation between ratings and the context in which they were given. They proposed a new context representation based on the Pearson Correlation Coefficient as well as a new pre-filtering technique based on this representation.
Contextual post-filtering applies context after the traditional recommendation process. It means that from a predicted set of recommendations we select just those that match the current user context. Bahramian et al. [9] proposed a new context-aware tourism recommender system based on an ontology approach where a spreading activation technique is used to contextualize user preferences and learn the user profile dynamically. Negre et al. [10] introduced a context-aware recommender system based on contextual post-filtering for OLAP queries, where queries recommended by a classic log-based recommender system were contextualized.
Contextual modeling differs radically from the previously described paradigms. In this kind of recommender we incorporate context into the prediction model. The recommendations are obtained directly from the model, taking into account the current user-context situation. Iqbal et al. [11] introduced the Kernel Context Recommender System, which is a flexible, fast, and accurate kernel mapping framework that recognizes the importance of context and incorporates the contextual information using the kernel trick while making predictions. Zheng et al. [12] proposed a method that combines context-aware and multicriteria recommender systems. They evaluated their solution on educational data and an extended TripAdvisor dataset. The authors tested different approaches for incorporating context in the recommendation process.
In recent years, the application of artificial neural networks in CARS has been getting more and more attention [13], [14], [15]. Hildebrandt et al. [15] proposed NECTR, a novel recommender system based on a tensor factorization model and an autoencoder-like neural network. A Deep Learning based model which learns customer similarity from sequence-to-sequence similarity as well as item-to-item similarity by considering all features of the item, contexts, and rating components was introduced by Kala et al. [14]. The method uses the Dynamic Temporal Warping distance measure for dynamic temporal matching and a 2D-GRU (Two Dimensional-Gated Recurrent Unit) architecture.

III. BACKGROUND - RE-RANKCCP ALGORITHM
Contextual Conditional Preferences (CCPs) were introduced to provide a compact and context-aware representation of user interests for RS [16], [4]. A CCP is an expression of the form:
\[(\gamma_1 = c_1) \wedge \dots \wedge (\gamma_n = c_n) \mid (\alpha_1 = a_1) \succ (\alpha_1 = a'_1) \wedge \dots \wedge (\alpha_m = a_m) \succ (\alpha_m = a'_m)\]
with $\gamma_i$ being contextual variables, $\alpha_i$ item attributes, and $c_1, \dots, c_n, a_1, a'_1, \dots, a_m, a'_m$ being exact values of these parameters. The symbol $\succ$ denotes a preference relation, e.g. $x \succ y$ means that someone prefers $x$ over $y$.
The above CCP is read as: given the context $(\gamma_1 = c_1) \wedge \dots \wedge (\gamma_n = c_n)$, I prefer $a_1$ over $a'_1$ for $\alpha_1$ and ... and $a_m$ over $a'_m$ for $\alpha_m$.
For example, a CCP may state that for a given context, i.e. in the afternoon and in the company of children, a user prefers movies that belong to the genre "animated" or "family" to those of the genre "thriller". CCPs can be learned from explicit user ratings [17]. In order to elicit preference relations, the dataset containing ratings, contextual parameters and item features is split into two parts, i.e. positive and negative, based on the value of the ratings. Then, both subsets are divided into smaller sets, each containing all of the contextual information and one of the item features. Data prepared in this way are the input for the Prism [18] algorithm. The final CCPs are obtained by merging rules with the same context.
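The data preparation that precedes rule induction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout, the field names (`time`, `social`, `genre`, `director`) and the toy data are hypothetical, and the threshold of 3 follows the evaluation setup, where a rating greater than 3 is considered positive. Rule induction with Prism and the merging of rules are not shown.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def split_by_rating(records: List[Record], threshold: int = 3) -> Tuple[List[Record], List[Record]]:
    """Split rated records into a positive and a negative part."""
    positive = [r for r in records if r["rating"] > threshold]
    negative = [r for r in records if r["rating"] <= threshold]
    return positive, negative


def project_per_feature(
    records: List[Record],
    context_vars: Sequence[str],
    item_features: Sequence[str],
) -> Dict[str, List[Record]]:
    """For each item feature, keep all contextual variables plus that one feature."""
    subsets: Dict[str, List[Record]] = {}
    for feature in item_features:
        keep = list(context_vars) + [feature]
        subsets[feature] = [{name: r[name] for name in keep} for r in records]
    return subsets


# Hypothetical toy data; field names and values are illustrative only.
ratings = [
    {"time": "afternoon", "social": "family", "genre": "animated", "director": "A", "rating": 5},
    {"time": "afternoon", "social": "family", "genre": "thriller", "director": "B", "rating": 2},
    {"time": "evening", "social": "friends", "genre": "thriller", "director": "B", "rating": 4},
]
positive, negative = split_by_rating(ratings)
positive_subsets = project_per_feature(positive, ["time", "social"], ["genre", "director"])
negative_subsets = project_per_feature(negative, ["time", "social"], ["genre", "director"])
```

Each entry of `positive_subsets` and `negative_subsets` would then be passed to a rule learner separately, one item feature at a time.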
An algorithm for generating a list of top k recommendations with CCPs, the re-rankCCP, was introduced by Karpus et al. in [4]. We describe it below.
For a certain user and his current context, first we generate a primary list of top m recommendations with some existing non-context-aware algorithm, e.g., UserKNN. The value of m has to be significantly greater than k, where k is the number of the recommendations in the final list. Then we have to find the best CCPs that will be further used in the reshuffling process.
The best CCPs are those which are most similar to the considered context. To compute the contextual similarity between a CCP $p$ and the current user context $ctx(u)$, we use a similarity measure (Equation 1) together with an overlap function. The overlap function returns 1 when we are sure that the pair $(\gamma_i, c_i)$ is contained both in the contextual part of $p$ and in the current user context $ctx(u)$. When it is uncertain, i.e. when the value $c_i$ for the dimension $\gamma_i$ is equal to -1 (the unknown value), it returns 0.5. Otherwise 0 is returned. Note that the current user context $ctx(u)$ is also a set of pairs $(\gamma'_i, c'_i)$, i.e. the name of a contextual variable and its value.
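A possible reading of the overlap function and of a similarity built on it is sketched below. The three return values of `overlap` follow the description above; everything else is an assumption made for illustration: the unknown-value marker is taken to be -1, a variable missing from the user context is treated as unknown, and the similarity is taken to be the average overlap over the context pairs of the CCP, which may differ from the exact normalization of Equation 1.

```python
from typing import Any, Dict

UNKNOWN = -1  # assumed marker for an unknown context value


def overlap(variable: str, value: Any, user_context: Dict[str, Any]) -> float:
    """Return 1 for a certain match, 0.5 when a value is unknown, 0 otherwise."""
    # Assumption: a variable absent from the user context counts as unknown.
    user_value = user_context.get(variable, UNKNOWN)
    if value == UNKNOWN or user_value == UNKNOWN:
        return 0.5
    return 1.0 if user_value == value else 0.0


def context_similarity(ccp_context: Dict[str, Any], user_context: Dict[str, Any]) -> float:
    """Average overlap over the context pairs of the CCP (assumed normalization)."""
    if not ccp_context:
        return 0.0
    total = sum(overlap(var, val, user_context) for var, val in ccp_context.items())
    return total / len(ccp_context)


# Hypothetical contexts: one certain match and one unknown value.
ccp_ctx = {"time": "afternoon", "social": "family"}
user_ctx = {"time": "afternoon", "social": UNKNOWN, "weather": "sunny"}
similarity = context_similarity(ccp_ctx, user_ctx)  # (1 + 0.5) / 2
```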
For each item in the primary recommendations list and each best CCP we have to compute how well an item $i$ satisfies a CCP $p$. For this purpose, we use the satisfiability measure, in which sim denotes the Jaccard similarity, $\alpha$ is the name of an item feature, $a(p)$ is the set of item attributes considered in the CCP $p$, and $v_\alpha(i)$ is the set of values of an attribute $\alpha$ for an item $i$. Similarly, $v^m_\alpha(p)$ and $v^l_\alpha(p)$ denote the sets of values of an attribute $\alpha$ for a CCP $p$ on both sides of the preference relation: $m$ stands for more preferred and $l$ for less preferred.
PROCEEDINGS OF THE FEDCSIS. SOFIA, BULGARIA, 2022
The satisfiability measure represents the difference between the item's similarities to both sides of the CCP preference relation, i.e. the similarity to the more preferred part minus the similarity to the less preferred part. In this way we reward items that best fit user preferences and penalize items that have features the user does not like. The size of the set of item attributes serves as a normalization factor. Thus, regardless of the number of item features, the value of satisfiability is always between 0 and 1.
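The verbal description of the satisfiability measure can be turned into a short sketch. It assumes the simplest form consistent with that description: for each attribute of the CCP, the Jaccard similarity to the more preferred values minus the Jaccard similarity to the less preferred values, averaged over the attributes. The data layout and the attribute names are hypothetical. Under this assumed form the score is negative for an item that matches only the less preferred side, so any rescaling used in the original measure is not reproduced here.

```python
from typing import Dict, Set

AttributeValues = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def satisfiability(item: AttributeValues, more: AttributeValues, less: AttributeValues) -> float:
    """Average, over CCP attributes, of sim(more preferred) minus sim(less preferred)."""
    attributes = more.keys() | less.keys()
    if not attributes:
        return 0.0
    total = 0.0
    for attr in attributes:
        values = item.get(attr, set())
        total += jaccard(values, more.get(attr, set()))
        total -= jaccard(values, less.get(attr, set()))
    return total / len(attributes)


# Hypothetical CCP sides: prefers "animated" or "family" over "thriller".
more_preferred = {"genre": {"animated", "family"}}
less_preferred = {"genre": {"thriller"}}

liked = satisfiability({"genre": {"animated", "comedy"}}, more_preferred, less_preferred)
disliked = satisfiability({"genre": {"thriller"}}, more_preferred, less_preferred)
```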
The next step is to order the primary recommendations list according to the average satisfiability of the best CCPs. The last step is to cut off unneeded items from the resulting recommendations list to obtain the top 5, top 10 or other top k ranking.
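The reordering and cut-off steps can be sketched as below. The function name `rerank`, the callable `satisfiability` passed in as a parameter, and the toy scores are hypothetical; how ties are broken is not stated in the text, so this sketch simply keeps the order of the primary list for items with equal scores.

```python
from typing import Callable, Hashable, List, Sequence


def rerank(
    primary: Sequence[Hashable],
    best_ccps: Sequence[Hashable],
    satisfiability: Callable[[Hashable, Hashable], float],
    k: int = 10,
) -> List[Hashable]:
    """Order items by average satisfiability over the best CCPs and keep the top k."""

    def average_satisfiability(item: Hashable) -> float:
        if not best_ccps:
            return 0.0
        return sum(satisfiability(item, p) for p in best_ccps) / len(best_ccps)

    # sorted() is stable, so ties keep their order from the primary list.
    ranked = sorted(primary, key=average_satisfiability, reverse=True)
    return ranked[:k]


# Hypothetical satisfiability scores of three items against two CCPs.
toy_scores = {"a": [0.2, 0.4], "b": [0.9, 0.7], "c": [0.2, 0.4]}
top_two = rerank(["a", "b", "c"], [0, 1], lambda item, p: toy_scores[item][p], k=2)
```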

IV. ALGORITHM MODIFICATION
One of the key parts of the re-rankCCP algorithm is the selection of the best CCPs for the current user context based on the similarity measure from Equation 1. However, this measure does not take into account the quality of CCPs. Consequently, recommendations could be made based on less important user preferences. Therefore, we replaced the similarity with a weighted similarity $sim_w$, in which the similarity measure sim from Equation 1 is weighted by $q(p)$, the quality of a CCP $p$. Now, we need to define the quality of a CCP.
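The effect of weighting can be illustrated with a small sketch. It assumes that the weighted similarity is the product of the quality and the similarity, which is the most direct reading of "weighted" but is not confirmed by the text. The helper names, the candidate tuples and the parameter `n` (how many CCPs are kept) are hypothetical.

```python
from typing import List, Tuple


def weighted_similarity(similarity: float, quality: float) -> float:
    """Assumed form of the weighted similarity: quality times similarity."""
    return quality * similarity


def select_best_ccps(candidates: List[Tuple[str, float, float]], n: int = 1) -> List[str]:
    """Return the names of the n CCPs with the highest weighted similarity.

    Each candidate is a (name, similarity, quality) tuple.
    """
    ranked = sorted(candidates, key=lambda c: weighted_similarity(c[1], c[2]), reverse=True)
    return [name for name, _, _ in ranked[:n]]


# Hypothetical candidates: p1 matches the context best but has low quality.
candidates = [("p1", 1.0, 0.2), ("p2", 0.75, 0.8), ("p3", 0.5, 0.9)]
best = select_best_ccps(candidates, n=2)
```

With the plain similarity, `p1` would be ranked first; with the assumed weighting, the higher-quality `p2` and `p3` are selected instead.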
CCPs are generated from rules induced with the Prism algorithm. Thus, we decided to use rules quality measures, like coverage or support, to define the quality of a CCP. However, one CCP is created from many different rules. Therefore, we have to decide how to reasonably aggregate many rule quality values into one value characterizing the CCP's quality.
In this paper we tested the four aggregate functions which we found the most reasonable, i.e. minimum, maximum, sum and average. The last one seems the most obvious because it simply takes the quality values of all rules used and returns one normalized value for a CCP. We also obtain a standardized quality for the first two functions, which simply take the worst and the best rule quality value, respectively. The sum function additionally reflects the number of rules that are used to create a CCP. The more good rules were used, the higher the quality of the CCP.
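The four aggregate functions can be compared on a toy example. The function `ccp_quality`, the dictionary `AGGREGATES` and the quality values are hypothetical and serve only to show how the choice of aggregation changes which CCP looks better.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

# The four aggregate functions discussed in the text.
AGGREGATES: Dict[str, Callable[[Sequence[float]], float]] = {
    "minimum": min,
    "maximum": max,
    "sum": sum,
    "average": mean,
}


def ccp_quality(rule_qualities: Sequence[float], aggregate: str = "minimum") -> float:
    """Aggregate the quality values of the rules merged into one CCP."""
    if not rule_qualities:
        raise ValueError("a CCP must be built from at least one rule")
    return AGGREGATES[aggregate](rule_qualities)


# Hypothetical rule qualities for two CCPs.
many_weak = [0.25, 0.25, 0.25, 0.25, 0.25]
two_strong = [0.5, 0.5]

sum_prefers_weak = ccp_quality(many_weak, "sum") > ccp_quality(two_strong, "sum")
min_prefers_strong = ccp_quality(two_strong, "minimum") > ccp_quality(many_weak, "minimum")
```

In this toy case the sum ranks the CCP built from five weak rules above the one built from two strong rules, while the minimum and the average rank them the other way round.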
We decided to apply three commonly used rules quality measures, namely coverage, support and confidence [19]. For this purpose, we had to slightly modify the algorithm for CCP extraction so that it computes those measures. However, this algorithm is independent of the re-rankCCP. We also extended the CCP representation to contain information about its quality. Figure 1 shows the modified CCP from the above example in the JSON format. The rest of the re-rankCCP remains the same.
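The three measures are cited from [19] without being restated, so the sketch below uses their conventional definitions for a rule "antecedent implies consequent" over N examples: coverage is the fraction of examples matching the antecedent, support is the fraction matching both antecedent and consequent, and confidence is the fraction of covered examples that also match the consequent. Whether the paper uses exactly these variants is an assumption; the function name and the toy examples are hypothetical.

```python
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]
Condition = Callable[[Example], bool]


def rule_quality(examples: List[Example], antecedent: Condition, consequent: Condition) -> Dict[str, float]:
    """Compute coverage, support and confidence of a rule on a set of examples."""
    n = len(examples)
    covered = [e for e in examples if antecedent(e)]
    correct = [e for e in covered if consequent(e)]
    return {
        "coverage": len(covered) / n if n else 0.0,
        "support": len(correct) / n if n else 0.0,
        "confidence": len(correct) / len(covered) if covered else 0.0,
    }


# Hypothetical examples and rule: time = afternoon -> genre = animated.
examples = [
    {"time": "afternoon", "genre": "animated"},
    {"time": "afternoon", "genre": "thriller"},
    {"time": "evening", "genre": "animated"},
    {"time": "evening", "genre": "thriller"},
]
quality = rule_quality(
    examples,
    antecedent=lambda e: e["time"] == "afternoon",
    consequent=lambda e: e["genre"] == "animated",
)
```

Note that coverage depends only on the antecedent, whereas support and confidence also depend on the consequent.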

V. DATASET
We performed experiments on the same dataset as the authors of the original re-ranking algorithm, i.e. the LDOS-CoMoDa dataset. The LDOS-CoMoDa dataset [20] was collected by a web application that enables contextual rating of a movie just after watching it. The dataset consists of 2296 ratings given by 121 users to 1232 items. It contains 30 variables, among which 12 are contextual parameters. The other variables are basic information about the user (user id, age, sex, city and country), a rating on a 5-star scale (higher values denote higher preference) and content information about multiple item dimensions (item id, director, country, language, year, 3 main genres, 3 main actors and budget). Unknown values are denoted by "-1".
We chose users who rated at least 5 items. Then, we randomly selected 20% of items rated by each of these users to be included in the test set. The remaining data constitute the training set.
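The split procedure can be sketched as follows. Several details are assumptions, because the text does not fix them: ratings are represented as hypothetical `(user, item, rating)` tuples, users with fewer than 5 ratings are dropped entirely, the number of test items per user is rounded to the nearest integer with a minimum of 1, and the seed is arbitrary.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[str, str, int]  # (user, item, rating), a hypothetical layout


def split_ratings(
    ratings: List[Rating],
    min_ratings: int = 5,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Rating], List[Rating]]:
    """Per-user random split: test_fraction of each eligible user's ratings go to the test set."""
    rng = random.Random(seed)
    by_user: Dict[str, List[Rating]] = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train: List[Rating] = []
    test: List[Rating] = []
    for rows in by_user.values():
        if len(rows) < min_ratings:
            continue  # assumption: users with too few ratings are left out
        shuffled = rows[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(test_fraction * len(shuffled)))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical data: u1 rated 10 items, u2 rated only 3.
data = [("u1", f"i{n}", 4) for n in range(10)] + [("u2", f"j{n}", 5) for n in range(3)]
train, test = split_ratings(data)
```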

VI. EVALUATION METHOD
We re-implemented the re-rankCCP algorithm, which was originally implemented in Java, in Python. We also performed a new split into training and test sets. Thus, because of the randomness of the split, we have different data in those sets than in the previous papers [4], [16].
The re-rankCCP is a post-filtering technique, which means that it needs another algorithm to work. We decided to test two well-known methods, i.e. Bayesian Personalized Ranking (BPR) [21] and User k Nearest Neighbors (UserKNN) [22]. We had several reasons for this choice. First, the inventors of re-rankCCP obtained the most promising results with the BPR algorithm. Second, UserKNN has been one of the most popular methods (or even the most popular one) in the field of RS. Last but not least is the way these algorithms treat missing data. BPR tries to minimize its negative impact on the prediction accuracy, while UserKNN completely ignores missing data. In this way, we obtained a representative sample of base algorithms.
For the re-rankCCP and the base algorithms we had to set up some parameters. UserKNN used 50 neighbors to compute recommendations. The base algorithms generate lists of the top 100 items, while re-rankCCP produces the top 10 list. A rating greater than 3 is considered positive. We chose three commonly used measures of recommendation quality, i.e. precision, recall and nDCG [23].
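For reference, the three measures can be computed for a single user as in the sketch below. It assumes binary relevance (an item is relevant if its test rating is positive) and the common logarithmic discount for nDCG; the exact variants used in the experiments are defined in [23] and may differ. Function names and the toy lists are hypothetical.

```python
import math
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of relevant items that appear in the top k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG with a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical recommendation list and set of relevant test items.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "z"}
precision = precision_at_k(recommended, relevant, k=4)
recall = recall_at_k(recommended, relevant, k=4)
ndcg = ndcg_at_k(recommended, relevant, k=4)
```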
In addition to the impact of the modification on the quality of recommendations, we wanted to check its impact on the algorithm execution time. Since we only slightly modified the CCP representation, our method does not affect the time of generating recommendations. However, it could have an impact on the time needed to induce CCPs, since this is where the CCP quality is computed. In order to check this, we performed an experiment for which we prepared 4 datasets from the LDOS-CoMoDa. The datasets consist of 3000, 5000, 7000 and 10000 rows, respectively. For each dataset we performed CCP generation for the re-rankCCP algorithm and its modifications. We collected the results with the %time magic command, which is available in the IPython environment.

VII. RESULTS
Table I shows the values of precision, recall and nDCG obtained for different configurations of algorithms, rules quality measures and aggregate functions during our experiments. The best result for each algorithm and rules quality measure is marked in bold (locally best result), while the best result for each base algorithm is underlined (globally best result considering the division into two groups: BPR and UserKNN). It should be noted that re-rankCCP always improves the precision, recall and nDCG of its base algorithm. Nonetheless, re-rankCCP performs worse than its modifications with rules quality measures.
For most configurations of algorithms and rules quality measures, the best results were obtained with the minimum and average functions. The former was the best for support and confidence, irrespective of the base algorithm, while the latter works well with coverage and support on re-rankCCP with the BPR algorithm. An exception appears for re-rankCCP with the UserKNN algorithm and the coverage measure. The best results for this configuration were obtained with the maximum function. The good results for the minimum and average functions should not be surprising. When using the minimum function, we ensure that all other rules used to induce a CCP have quality values at least as high as the resulting value. We obtain a similar effect for the average. In contrast, with the maximum we could choose a preference generated from rules of which one is strong and all others are weak. The same undesirable effect can occur for the sum function: the modified re-rankCCP will prefer a CCP built from many weak rules over a CCP built from two strong rules. The smallest improvement in the modified re-rankCCP was obtained with coverage for both base algorithms. This can be explained by the fact that coverage is not a proper quality measure for rules, since it considers only the antecedent of a rule.
To check the statistical significance of the obtained results we performed the Wilcoxon signed rank test with α = 0.05. For the re-rankCCP with BPR as a baseline algorithm two results were statistically insignificant, i.e. for the coverage-minimum and confidence-maximum pairs, with p-values equal to 0.4375 and 0.5282, respectively. For the re-rankCCP with UserKNN as a baseline and the support measure almost all results are insignificant, i.e. for the maximum, average and sum functions, with p-values equal to 0.5745, 0.0625 and 0.1563, respectively. All other reported results are statistically significant.
Considering the results presented in Table I and their statistical significance, we can conclude that the best improvement to the re-rankCCP is obtained using the support measure for rules quality and the minimum function for aggregation. It should be noted that UserKNN performs rather poorly on the LDOS-CoMoDa dataset. This could be because of the data sparsity.
Table II shows the times of generating CCPs for the re-rankCCP algorithm and its modification with the coverage measure. We obtained very similar results for the support and confidence measures, which is why we omitted them here. The differences in execution times are negligible. Thus, we can conclude that our modification does not noticeably increase the execution time of re-rankCCP, and improves the quality of recommendations.

VIII. CONCLUSIONS
In this paper we proposed a way of measuring CCP quality using rules quality measures and aggregate functions. To the best of our knowledge, this is the first attempt to compute the quality of a CCP. We also improved the re-rankCCP algorithm by incorporating the quality of CCPs into the recommendation process and showed experimentally that this modification outperforms the re-rankCCP as well as both baseline algorithms, i.e. BPR and UserKNN. We compared its effectiveness on two baseline algorithms, three rules quality measures, i.e. coverage, support and confidence, and four aggregate functions, i.e. minimum, maximum, sum and average. Our experiments showed that the support measure aggregated with the minimum function is the best configuration for computing the CCP quality on the LDOS-CoMoDa dataset. However, more experiments on other datasets and with more baseline algorithms are needed to check whether these results can be generalized.