Utilizing Frequent Pattern Mining for Solving Cold-Start Problem in Recommender Systems

Although several approaches have been proposed throughout the last decade to build recommender systems (RS), most of them suffer from the cold-start problem. This problem occurs when a new item hits the system or a new user signs up. It is generally recognized that the ability to handle cold users and items is one of the key success factors of any new recommender algorithm. This paper introduces a frequent pattern mining framework for recommender systems (FPRS) - a novel approach to address this challenging task. FPRS is a hybrid RS that incorporates collaborative and content-based recommendation algorithms and employs a frequent pattern (FP) growth algorithm. The article proposes several strategies to combine the generated frequent itemsets with content-based methods to mitigate the cold-start problem for both new users and new items. The performed empirical evaluation confirmed its usefulness. Furthermore, the developed solution can be easily combined with any other approach to build a recommender system and can be further extended to make up a complete and standalone RS.


I. INTRODUCTION
OVER the past few decades, alongside the explosion in the amount of data on the internet, the popularity of online streaming services, e-commerce, and social media has highlighted an important challenge: providing users with recommendations that match their preferences and interests. Consequently, the search for more efficient techniques to generate recommendations has received growing attention. Over the years, researchers have suggested various approaches for building recommender systems that leverage the rating history and possibly some other information, such as users' demographics and items' characteristics. The majority of these approaches can be classified into three main categories: (i) collaborative filtering (CF), (ii) content-based (CB) filtering, and (iii) hybrid filtering.
The basic idea behind collaborative filtering is that users with similar tastes or preferences tend to behave similarly in the future. This technique relies on historical transactions to compute similarities among users from which the recommendations are eventually generated. An analogous approach can be applied to create recommendations based on item similarities. On the other hand, content-based filtering tries to utilize items' characteristics, users' demographics, and contextual information to recommend additional items similar to those preferred by the target user in the past. Finally, hybrid techniques cover the weaknesses and exploit the strengths of CF and CB models by combining them to provide more relevant results.
The aforementioned techniques are highly appreciated by practitioners and businesses. However, they also encounter significant difficulties related to data characteristics. One of the issues is the sparsity of data. The discussed methods rely on modeling the user-item interactions, and hence, their quality may suffer when each user has rated only a small number of items. Another challenge is the so-called cold-start problem. This phenomenon is particularly inconvenient and occurs whenever recommendations are generated for a new item or user that does not have any interaction or rating in the history. In fact, many state-of-the-art recommendation algorithms may generate unreliable recommendations in such cases since they cannot learn the preference embedding of these new users/items [1], [2], [3].
In this study, we present a particular take on the challenge of devising more effective and efficient recommendation techniques. We pay special attention to properly handling new users and items. We propose several methods to overcome the cold-start problem and the sparsity of the datasets by utilizing the FP-growth algorithm to generate frequent patterns based on items' characteristics and users' demographics. The main contributions of this paper are as follows: 1) We introduce the frequent pattern mining framework for recommender systems (FPRS) - a novel hybrid recommender system that utilizes the FP-growth algorithm to produce frequent itemsets based on the ratings in the user-item matrix.
The remainder of this paper is organized as follows. Section II reviews important research efforts that address the cold-start problem in the domain of recommender systems. In Section III, we provide background information on collaborative filtering and frequent pattern mining. In Section IV, we present a novel frequent pattern mining model (FPRS) that utilizes the ratings in the user-item rating matrix to discover the frequent itemsets associated with selected user/item features. Section V evaluates the proposed model and compares it with a baseline recommender system. Finally, in Section VI, we draw conclusions and suggest possible future work.

II. RELATED WORKS
Recommender systems (RS) predict the utility of an item to a user and suggest the best items with respect to the user's preferences, where the items may represent movies, books, restaurants, or any other things [4], [5]. This capability makes RSs especially useful, and indeed, they have been applied successfully in many areas, such as e-commerce, online marketing, and social networks [6], [7]. The scientific literature provides several taxonomies for RS [8]. However, the most common approaches refer to content-based or collaboration-based techniques and their various hybridizations [9]. Collaborative Filtering (CF) is one of the most widely used and successful techniques, with excellent results in a wide range of applications in many fields [8]; hence, it is particularly interesting in our research and is reviewed in detail in Section III-A. Despite the noticeable decline in their popularity in favor of collaborative systems, content-based techniques are still widely used because they can handle the so-called cold-start problem [10]. Because of the significantly different characteristics of those approaches, it is advisable to construct hybridizations of both [11], as further discussed in our study.
A typical RS consists of three main elements: a user model (established by analyzing the users' interests and preferences), an item model (based on the items' characteristics), and the recommendation algorithm, which is the key constituent. There are many reported approaches to implementing the recommendation algorithm by adopting machine learning (ML) models such as matrix factorization, deep neural networks, or factorization machines (FM) [12], [13], [14]. Building RS on top of state-of-the-art ML models has improved the quality of recommendation results, increasing user satisfaction and profits in e-commerce [15], [6], [16]. At the same time, however, we may observe the known problems with ML related to data sparsity, the latency of predictions returned by complex models, and, foremost, the unfairness of recommendations for new users or items, which is often referred to as the cold-start problem [17], [18].
Solving scalability issues is one of the most common tasks when deploying large-scale recommender systems. Especially as the number of users and items grows significantly over time, it is essential for RSs to handle requests without appreciable latency. This problem is particularly challenging for memory-based methods like k-nearest neighbors. However, in the case of web-scale recommendation tasks like social media, the Internet of Things (IoT), or various e-commerce applications, it is also a hot topic for model-based techniques, especially considering more complex and deep models [1], [19]. Another aspect that is particularly noticeable for collaborative filtering is related to the sparsity of user-item interactions [20]. Here, the quality of CF-based methods may be impacted by an insufficient number of items rated by each user [21]. Some recommender systems suffer from over-specialization (sometimes referred to as a serendipity problem). It is observed when the RS produces recommendations with minimal novelty, i.e., all of the same kind [22]. Recently, there has also been an increasing interest in privacy awareness when handling user data and in the explainability of recommendations [23], [24].
Regardless of recent achievements in RS, the cold-start problem is still one of the most prevailing topics deserving further attention and is particularly interesting in the context of our study [3], [21]. The difficulty arises from the deficient information about new entities. Therefore, it has a particularly strong negative impact on collaborative methods, heavily affecting the fairness of recommendations for new users and often passing over new items [18]. Most of the attempts to deal with this problem consider enhancing the collaborative-based methods with content-based approaches that leverage the intrinsic characteristics of the analyzed entities. For example, in [2], the authors propose hybrid recommender models that use content-based filtering and latent Dirichlet allocation (LDA)-based models. In [9], in turn, we may find a hybrid RS that combines singular-value decomposition-based collaborative filtering with content-based and fuzzy expert systems.
There are many more techniques for dealing with the cold-start problem by combining collaborative filtering with content-based methods, including simultaneous co-clustering [25], self-organizing maps, or Siamese neural networks [3]. There are also some attempts to combine RSs with various dimensionality reduction techniques [26]. Considering the discussed problem of missing or insufficient information, it seems interesting to refer to the dimensionality reduction methods based on the granularization of the attribute space [27], and particularly to resilient techniques [28], [29], i.e., those resistant to data deficiencies. The hybridization of soft computing techniques with collaborative and content-based methods is a wide-ranging field of research and an interesting area for the further development of recommendation systems [30], particularly for context-aware RSs [31], [4].
Some approaches to dealing with cold-start refer to popularity measures, e.g., relying on the recent trend in users' preferences or always returning the most popular items [14], [10]. However, these may be very misleading and result in so-called popularity bias since users often differ in their preferences, which may also vary between types of products and their characteristics [18]. Hence, an additional effort to deal with biases in data is required [32]. Another interesting approach to dealing with insufficient or missing historical transactions uses additional sources of information to enhance the data representation. In particular, in [21], the authors train RSs with the Linked Open Data model based on DBpedia to find enough information about new entities. When dealing with the cold-start problem, some researchers rely on directly asking the users about their preferences. Such information may be collected, e.g., via a survey or by asking users to select the most relevant picture related to the desired item [33]. Combining community-based knowledge with association rule mining to alleviate the cold-start problem also brings very promising results [31]. Referring to association rule mining (cf. [34]) and frequent pattern mining (cf. [35]) techniques to address the cold-start problem is also interesting from the perspective of speeding up recommender systems. For this reason, frequent pattern mining is particularly interesting in our research, and we review this field in detail in Section III-B.

III. PRELIMINARIES
In this section, we briefly summarize the academic knowledge of collaborative filtering and frequent pattern mining techniques. Then, we review some of the research literature related to addressing the cold-start problem.

A. Collaborative Filtering
The basic idea behind collaborative filtering (CF) is that users who had similar preferences in the past tend to behave similarly in the future. CF-based methods rely only on users' rating history to generate recommendations, meaning that the more ratings the users provide, the more accurate the recommendations become [4]. The historical ratings or preferences can be acquired explicitly or implicitly. Accordingly, CF-based methods are often distinguished by whether they operate over explicit ratings, where the user explicitly rates particular items, or implicit ratings, where the ratings are inferred from observable user activity, such as products bought, songs heard, pages visited, or any other types of information access patterns [4]. In the literature, collaborative filtering methods are classified into two main categories: (i) memory-based techniques, and (ii) model-based techniques.
The memory-based technique directly uses the rating history, which is stored in memory, to predict the rating of items that the user has not seen before. Memory-based techniques can be grouped into two classes: (i) user-based collaborative filtering, and (ii) item-based collaborative filtering. User-based collaborative filtering, also known as k-NN collaborative filtering, works by finding other users (neighbors) whose historical rating behavior is similar to that of the target user and then using their top-rated products to predict what the target user will like [36]. To formulate the problem mathematically, let us assume there is a list of users $U = \{u_1, u_2, \ldots, u_m\}$ and a list of items $I = \{i_1, i_2, \ldots, i_n\}$. The user-item rating matrix then consists of ratings $v_{i,j}$, each corresponding to the rating of user $i$ for item $j$. If $I_i$ is the set of items that user $i$ has rated in the past, then we can define the average rating for user $i$ as follows [36]:
$$ \bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j} \tag{1} $$
In user-based collaborative filtering, we estimate the rating of item $j$ that has not yet been rated by the target user $a$ as follows [36], [37]:
$$ p_{a,j} = \bar{v}_a + \frac{\sum_{i=1}^{k} s(a,i)\,(v_{i,j} - \bar{v}_i)}{\sum_{i=1}^{k} |s(a,i)|} \tag{2} $$
where $k$ is the number of most similar users (nearest neighbors) to $a$. The weights $s(a,i)$ reflect the degree of similarity between each neighbor $i$ and the target user $a$.
Item-based collaborative filtering is an analogous procedure. The similarity scores are again used to generate predictions with a weighted average, similar to the procedure used in user-based collaborative filtering. Mathematically, we can predict the rating of item $j$ that has not yet been rated by the target user $a$ as follows [36], [37]:
$$ p_{a,j} = \frac{\sum_{i=1}^{k} s(j,i)\, v_{a,i}}{\sum_{i=1}^{k} |s(j,i)|} \tag{3} $$
where $k$ is the number of most similar items (nearest neighbors) to $j$ that the target user $a$ has rated in the past. The most popular metrics used to calculate the similarity between different users, or items, are cosine similarity and Pearson correlation.
Finally, the recommendations are generated by selecting the candidate items with the highest predictions.
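The user-based procedure described above can be illustrated with a minimal sketch. It assumes cosine similarity computed over co-rated items and a mean-centered weighted average over the k nearest neighbors, which is one common formulation and not necessarily the exact variant used in the experiments. The function names and the toy ratings are hypothetical.

```python
from math import sqrt

def mean_rating(ratings, user):
    """Average of all ratings given by `user`."""
    values = ratings[user].values()
    return sum(values) / len(values)

def cosine_similarity(ratings, a, b):
    """Cosine similarity of two users, restricted to items both have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][j] * ratings[b][j] for j in common)
    norm_a = sqrt(sum(ratings[a][j] ** 2 for j in common))
    norm_b = sqrt(sum(ratings[b][j] ** 2 for j in common))
    return dot / (norm_a * norm_b)

def predict_user_based(ratings, target, item, k=2):
    """Mean-centered weighted average over the k most similar raters of `item`."""
    candidates = [(cosine_similarity(ratings, target, u), u)
                  for u in ratings if u != target and item in ratings[u]]
    neighbors = sorted(candidates, reverse=True)[:k]
    denominator = sum(abs(s) for s, _ in neighbors)
    if denominator == 0:
        # No usable neighbor: fall back to the target user's own mean.
        return mean_rating(ratings, target)
    numerator = sum(s * (ratings[u][item] - mean_rating(ratings, u))
                    for s, u in neighbors)
    return mean_rating(ratings, target) + numerator / denominator

# Toy data (assumed): user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m2": 2, "m3": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
print(predict_user_based(ratings, "alice", "m3", k=2))
```

The fallback branch also hints at the cold-start issue: a user with no ratings has neither a mean nor neighbors, so this kind of predictor cannot be applied to them at all.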
On the other hand, the model-based technique works by learning a predictive model from the rating history. Typically, it is based on matrix factorization, which uses the rating history to learn the latent preferences of users and items. Matrix factorization is an unsupervised learning method that is used for dimensionality reduction. One of the most popular techniques applied for dimensionality reduction is Singular Value Decomposition (SVD). Mathematically, let us assume $M$ is the user-item rating matrix. The SVD of $M$ is the factorization of $M$ into three constituent matrices such that [37]:
$$ M = U \Sigma V^{T} \tag{4} $$
where $U$ is an orthogonal matrix whose columns are the left singular vectors of $M$, $V$ is an orthogonal matrix whose columns are the right singular vectors of $M$, and $\Sigma$ is a diagonal matrix whose values $\sigma_i$ are the singular values of $M$ [37].
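As a small worked illustration of the decomposition, consider a hypothetical $2 \times 2$ rating matrix in which the second user rates every item at half the level of the first. The matrix is made up for this example and does not come from the experiments. Because its rows are proportional, it has rank one and therefore a single non-zero singular value:

```latex
M =
\begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix}
=
\underbrace{\begin{pmatrix} \tfrac{2}{\sqrt{5}} & -\tfrac{1}{\sqrt{5}} \\[2pt] \tfrac{1}{\sqrt{5}} & \tfrac{2}{\sqrt{5}} \end{pmatrix}}_{U}
\underbrace{\begin{pmatrix} \sqrt{10} & 0 \\ 0 & 0 \end{pmatrix}}_{\Sigma}
\underbrace{\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\[2pt] -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix}}_{V^{T}}
```

Multiplying out confirms the factorization: $U\Sigma$ has first column $(2\sqrt{2}, \sqrt{2})^{T}$ and a zero second column, and multiplying by the first row of $V^{T}$ recovers the rows $(2, 2)$ and $(1, 1)$. The single latent factor captures the shared taste pattern, which is the sense in which SVD reduces dimensionality.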

B. Frequent Pattern Mining
The basic idea of frequent pattern mining, also known as association rule mining, is to search for all relationships between elements in a given massive dataset. It helps us to discover the associations among items using every distinct transaction in large databases. The key difference between association rule mining and collaborative filtering is that in association rule mining we aim to find global or shared preferences across all users rather than an individual's preferences, as in collaborative filtering-based techniques [38], [39], [40].
At a basic level, association rule mining analyzes the dataset searching for frequent patterns (itemsets) using machine learning models. To define the problem mathematically, let $I = \{i_1, i_2, \ldots, i_m\}$ be an itemset and let $D$ be a set of transactions where each transaction $T$ is a nonempty itemset such that $T \subseteq I$. An association rule is an implication of the form $A \Rightarrow B$, where $A \subset I$, $B \subset I$, $A \neq \emptyset$, $B \neq \emptyset$, and $A \cap B = \emptyset$. In the rule $A \Rightarrow B$, $A$ is called the antecedent and $B$ is called the consequent. Various metrics are used to identify the most important itemsets and calculate their strength, such as support, confidence, and lift. The support metric [40] gives an idea of how frequent an itemset is across all transactions. In other words, the support represents the proportion of transactions that contain the itemset. Equation 5 shows how we calculate the support for an association rule:
$$ \mathrm{support}(A \Rightarrow B) = P(A \cup B) \tag{5} $$
where $P(A \cup B)$ is the probability that a transaction contains every item of both $A$ and $B$.
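On the other hand, the confidence [40] indicates how often the rule is true. It defines the percentage of transactions containing the antecedent $A$ that also contain the consequent $B$. It can be taken as the conditional probability, as shown in Equation 6:
$$ \mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)} \tag{6} $$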
Finally, the lift is a correlation measure used to discover and exclude the weak rules that have high confidence. Equation 7 shows that the lift measure is calculated by dividing the confidence by the unconditional probability of the consequent [40], [38]:
$$ \mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)} = \frac{P(A \cup B)}{P(A)\,P(B)} \tag{7} $$
In this paper, we employ the FP-growth algorithm to generate frequent itemsets. The advantage of FP-growth over other algorithms is that it relies on the FP-tree (frequent pattern tree) data structure to store all data concisely and compactly, which avoids the candidate generation step. Moreover, once the FP-tree is constructed, we can directly use a recursive divide-and-conquer approach to mine the frequent itemsets efficiently, without scanning the database over and over again as other algorithms do [42].
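To make the mining step and the three rule metrics more concrete, the following is a compact, illustrative FP-growth sketch in pure Python. It is not the implementation used in the experiments; the names (`Node`, `build_tree`, `fp_growth`, `frequent_itemsets`, `rule_metrics`) and the toy transactions are assumptions made for this example. Each recursive call builds an FP-tree from a conditional pattern base, so no candidate itemsets are generated.

```python
from collections import defaultdict
from math import ceil

class Node:
    """One FP-tree node: an item, a count, and a link to its parent."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_count):
    """Build an FP-tree; return (header table, frequent item counts)."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return header, frequent

def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every itemset with count >= min_count."""
    header, frequent = build_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_count, itemset)

def frequent_itemsets(transactions, min_support):
    """Return {frozenset: relative support} for all frequent itemsets."""
    n = len(transactions)
    min_count = ceil(min_support * n - 1e-9)  # guard against float rounding
    weighted = [(t, 1) for t in transactions]
    return {s: c / n for s, c in fp_growth(weighted, min_count)}

def rule_metrics(supports, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent => consequent."""
    a, b = frozenset(antecedent), frozenset(consequent)
    support = supports[a | b]
    confidence = support / supports[a]
    lift = confidence / supports[b]
    return support, confidence, lift

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
supports = frequent_itemsets(transactions, min_support=0.6)
print(rule_metrics(supports, {"diapers"}, {"beer"}))
```

In this toy example, the rule from diapers to beer has a lift above 1, which indicates a positive association between the antecedent and the consequent. `rule_metrics` only works for rules whose combined itemset is frequent, since infrequent itemsets are never stored.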

IV. FREQUENT PATTERN MINING FRAMEWORK FOR RECOMMENDER SYSTEMS (FPRS)
The main problem we address in this paper is alleviating the impact of the new-user and new-item cold-start in recommender systems based on collaborative filtering techniques. Collaborative filtering methods can be grouped into two general categories: (i) memory-based techniques and (ii) model-based techniques. In memory-based techniques, we calculate the similarities between users/items based on the rating history and then generate recommendations based on the most similar users/items. In model-based techniques, we rely, e.g., on matrix factorization methods to learn the latent factors of users and items by decomposing the user-item interaction (rating) matrix into the product of two lower-dimensional matrices. Collaborative filtering methods rely strictly on user ratings or user interactions. For that reason, these methods suffer from the cold-start problem whenever a new user joins the system or a new item is added. In practice, both situations often lead to the inability to provide accurate or meaningful recommendations.
To tackle the cold-start problem, we implement the Frequent Pattern mining framework for Recommender Systems (FPRS). This framework extends the popularity-based approach by employing frequent pattern mining techniques to learn user preferences depending on users' and items' characteristics. Fig. 1 shows the high-level design used to develop the FPRS framework. The process of generating the recommendations consists of four stages: (i) Data Input, (ii) Data Preparation, (iii) Frequent Pattern Mining, and (iv) Recommendation Generation. In the first stage, we enrich the user-item rating matrix with users' demographics and items' characteristics. The data preparation stage consists of three steps. In the first one, we keep only the favorable reviews by filtering out every review/rating below a determined threshold. In the second step, we analyze the attributes and check the validity of using them for generating recommendations. In the last step, we split the dataset for each selected attribute. It is important to note that we follow multiple strategies to perform the attribute selection; more details about these strategies are provided later in this section. Then, in the third stage, we generate frequent itemsets using the FP-growth algorithm. Finally, in the last stage, we produce the recommendations for user cold-start and item cold-start.
The FPRS framework consists of two main modules: (i) the user cold-start module, and (ii) the item cold-start module. Each of these modules has dedicated strategies that are used to select the features and produce the recommendations. In the item cold-start module, we focus on generating recommendations for new items that were recently added to the system and most likely have no, or very few, ratings. We follow multiple strategies to generate such recommendations. More details about the strategies followed in the item cold-start module are provided in Strategy 3, Strategy 4, and Strategy 5.
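The four-stage flow can be sketched for the new-user case as follows. This is only one plausible reading under stated assumptions, not the strategies of the framework itself: a single user attribute (here `gender`) drives the split, ratings of 4 or more count as favorable, and the mining stage is replaced by brute-force counting of itemsets up to size two, which is not FP-growth. All function names and data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def prepare_transactions(ratings, user_attrs, attribute, min_rating=4):
    """Stages 1-2: keep favorable ratings and split them per attribute value.

    Returns {attribute value: {user: set of liked items}}.
    """
    groups = defaultdict(lambda: defaultdict(set))
    for user, item, rating in ratings:
        if rating >= min_rating:
            groups[user_attrs[user][attribute]][user].add(item)
    return groups

def mine_itemsets(transactions, min_support, max_size=2):
    """Stage 3 stand-in: brute-force itemset counting (not FP-growth)."""
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            counts.update(combinations(sorted(items), size))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()
            if c / n >= min_support}

def recommend_for_new_user(ratings, user_attrs, new_user_attrs, attribute,
                           min_support=0.5):
    """Stage 4: rank items from itemsets frequent among similar users."""
    groups = prepare_transactions(ratings, user_attrs, attribute)
    transactions = list(groups[new_user_attrs[attribute]].values())
    if not transactions:
        return []
    scores = defaultdict(float)
    for itemset, support in mine_itemsets(transactions, min_support).items():
        for item in itemset:
            scores[item] = max(scores[item], support)
    # Highest support first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Toy data (assumed): (user, item, rating) triples plus user demographics.
ratings = [("u1", "m1", 5), ("u1", "m2", 4), ("u2", "m1", 4), ("u2", "m3", 2),
           ("u3", "m2", 5), ("u3", "m3", 5), ("u4", "m3", 4)]
user_attrs = {"u1": {"gender": "F"}, "u2": {"gender": "F"},
              "u3": {"gender": "M"}, "u4": {"gender": "M"}}
print(recommend_for_new_user(ratings, user_attrs, {"gender": "F"}, "gender"))
```

Because the new user contributes only demographic attributes, no rating history is needed to produce the list, which is the property that makes this kind of approach applicable in the cold-start setting.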
Finally, it is worth mentioning that the threshold values used in the above strategies are selected by searching for the set of values that achieves the best performance on a given dataset. More details on how we choose these values are provided in Section V.

V. EXPERIMENTAL EVALUATION
In this section, we conduct comprehensive experiments to evaluate the performance of the FPRS recommender system.

A. Dataset and Evaluation Measures
In our experiments, we used two datasets (MovieLens 100K and MovieLens 1M), which were collected by the GroupLens research project at the University of Minnesota. MovieLens 100K contains 100,000 ratings given by 943 users on 1682 movies on a scale from 1 to 5, while MovieLens 1M contains 1,000,000 ratings of approximately 3,900 movies made by 6,040 users on the same scale. In both datasets, we combine three files (users.data, items.data, ratings.data) in order to join users' demographics, items' characteristics, and ratings in one dataset. The joined dataset contains the userId, itemId, rating, gender, age, occupation, and genres attributes (cf. Table I). Moreover, we performed further analysis of the features used in our experiments (gender, genre) to understand the interrelation between these features and the obtained results. Figure 2 shows the most popular movie genres among males and females for both datasets (MovieLens 100K and MovieLens 1M).
After clearing the data of invalid records, we split it into training and testing datasets according to the FPRS module as follows:
• For the user cold-start module testing, the 20 users with the highest number of ratings on the 50 most popular movies were selected. The ratings given by all those 20 users (9052 records in MovieLens 100K and 25533 records in MovieLens 1M) are considered the testing set, keeping the rest of the records in the training set. This way, all the records related to the selected users were removed from the training set, which simulates the cold-start problem associated with new users.
• To properly evaluate the item cold-start module, the testing data is chosen similarly. We first find the 50 most active users. Then, we select the 20 movies most rated by those 50 users. The ratings of all those 20 movies by all the users in our data (7320 records in MovieLens 100K and 41105 records in MovieLens 1M) are considered the testing set, keeping the rest of the records in the training set. Note that all the ratings for the selected movies are removed from the training data set, which corresponds to the item cold-start.
In our study, we consider a binary decision task: whether a given item (i.e., movie) is appropriate for the user. To model this situation for the MovieLens data, we assume that films rated 4 or 5 by a user are preferred by them (belong to the positive class). In contrast, those rated lower are poorly matched to the user. Therefore, the FPRS recommender system feedback for each new user or item is binary information: recommend or not recommend. Following that, in order to assess the quality of the prediction, the F1 measure is used [43]:
$$ F1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{8} $$
where precision is the fraction of the recommendations made that are correct (see Equation 9), while recall is the fraction of all positive cases that were actually recommended (see Equation 10):
$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{9} $$
$$ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{10} $$
Here, $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives, respectively. Moreover, we use the accuracy metric to measure the proportion of all correctly identified cases. This measure is mostly used when all the classes are equally important.
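The evaluation measures can be computed as in the short sketch below. It assumes the binarization described in this section (ratings of 4 or 5 form the positive class) and uses the standard definition of accuracy as the share of correct decisions. The helper names `to_labels` and `binary_metrics` and the sample values are hypothetical.

```python
def to_labels(ratings, threshold=4):
    """Binarize ratings: True if the item is preferred (rating >= threshold)."""
    return [r >= threshold for r in ratings]

def binary_metrics(actual, predicted):
    """Precision, recall, F1, and accuracy for paired boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # relevant, recommended
    fp = sum(1 for a, p in pairs if not a and p)      # irrelevant, recommended
    fn = sum(1 for a, p in pairs if a and not p)      # relevant, missed
    tn = sum(1 for a, p in pairs if not a and not p)  # irrelevant, skipped
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Assumed example: true ratings of six test records and the system's decisions.
actual = to_labels([5, 4, 2, 1, 3, 5])
predicted = [True, False, True, False, False, True]
print(binary_metrics(actual, predicted))
```

Returning 0.0 when a denominator is empty is a convention chosen for this sketch; other tools may report such cases as undefined.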

B. Baseline Recommender System
222 PROCEEDINGS OF THE FEDCSIS. SOFIA, BULGARIA, 2022
To showcase the strengths of frequent pattern mining in RS, we build a baseline model in a similar way to the strategies described previously, but without the FP-growth algorithm. In order to evaluate both modules of FPRS, two versions of the baseline RS are developed as follows:
• New user baseline model: for each movie in the training set, the most common gender and age group of its users are found. When a new user arrives, they are recommended all movies that were assigned to their age group and gender.
• New item baseline model: it works similarly. We find the most popular (watched) genre for each user. Then each new movie is recommended to all users whose favorite genre is the same as the new movie's.
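The new-user baseline can be sketched as follows. Two points are assumptions of this example rather than details given in the text: the most common gender and age group are determined jointly as a pair, and every rating counts regardless of its value. The function names, the attribute keys, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def fit_new_user_baseline(ratings, user_attrs):
    """Map each movie to the most common (gender, age group) among its raters."""
    profiles = defaultdict(Counter)
    for user, movie, _rating in ratings:
        attrs = user_attrs[user]
        profiles[movie][(attrs["gender"], attrs["age_group"])] += 1
    return {movie: counter.most_common(1)[0][0]
            for movie, counter in profiles.items()}

def recommend_baseline(movie_profile, gender, age_group):
    """Recommend every movie whose dominant audience matches the new user."""
    return sorted(movie for movie, profile in movie_profile.items()
                  if profile == (gender, age_group))

# Toy data (assumed): (user, movie, rating) triples plus user demographics.
ratings = [("u1", "m1", 5), ("u2", "m1", 4), ("u3", "m1", 3), ("u3", "m2", 4)]
user_attrs = {
    "u1": {"gender": "F", "age_group": "18-24"},
    "u2": {"gender": "F", "age_group": "18-24"},
    "u3": {"gender": "M", "age_group": "25-34"},
}
movie_profile = fit_new_user_baseline(ratings, user_attrs)
print(recommend_baseline(movie_profile, "F", "18-24"))
```

Since every movie is assigned to exactly one demographic group, this baseline tends to return long, unselective lists.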

C. Performance Comparison and Analysis
In order to provide a fair comparison, we use precision, recall, F1, and accuracy measures to compare the performance of FPRS against the baseline RS. After splitting the dataset into the training and testing sets and training both baseline and FPRS recommendation systems, we run two experiments to evaluate the user cold-start module and item cold-start module.
a) User cold-start module: In the first experiment, we evaluated the user cold-start module in FPRS. We calculated precision, recall, F1, and accuracy for the results generated by the baseline RS and by FPRS following the two strategies described in Section IV. The results of this comparison are shown in Tables II and III. All developed strategies outperformed the baseline RS in precision, F1, and accuracy on both datasets. However, the baseline solution reported higher recall, which is natural since the developed methods are more selective: they provide more apt results at the cost of omitting some potentially relevant movies. For the user cold-start problem, both strategies were evaluated at a min_support value of 0.08 and performed similarly; the second one was only slightly better.
b) Item cold-start module: In the second experiment, we evaluated the item cold-start module in FPRS. We calculated precision, recall, F1, and accuracy for the results generated by the baseline RS and by FPRS following all the strategies described in Section IV. The comparative summary of this evaluation is shown in Tables II and III. The results show that the performance of FPRS, using all strategies, is superior to the baseline solution. However, the results differ slightly between datasets. For MovieLens 100K, all strategies reported similar recall. Regarding precision and F1, the most successful strategy for dealing with new items in this dataset was strategy no. 4, which is based on both items' and users' characteristics. However, for applications that require high accuracy, it would be better to apply strategy no. 5, which was also superior in terms of recall, F1, and accuracy on the second dataset (MovieLens 1M). All strategies were evaluated at a participation threshold value of 30% and a min_support value of 0.2.

D. Thresholds Sensitivity Analysis
In the FPRS model, we use threshold values, such as min_support and the participation percentage, in order to extract frequent itemsets and produce relevant recommendations for new users and new items. In this section, we conduct experiments to show how changing those values may impact the performance of FPRS. Moreover, the output of these experiments helps to find the optimal values of these thresholds, and hence to conduct fair and reliable experiments.
In the first experiment, we aim to find the optimal value of the min_support threshold by evaluating FPRS (item cold-start module) using different min_support values. Fig. 3a shows how the F1-score of FPRS is impacted by applying different values for the MovieLens 100K data. The best min_support values for all strategies used in FPRS (item cold-start module) lie between 0.1 and 0.2. Similar observations for the MovieLens 1M data can be found in Fig. 3b.
In the second experiment, we search for the optimal value of the participation threshold in FPRS (item cold-start module). Figs. 3c and 3d show how the F1-score of FPRS is impacted by applying different values for both investigated datasets. The best participation threshold values for all strategies used in FPRS (item cold-start module) lie between 15% and 30%. Finally, it is worth noting that when we run this experiment, we use the optimal value of min_support found in the previous experiment.
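The two-step tuning procedure described above can be expressed as a simple sequential search. In this sketch, `evaluate` is a hypothetical callback that returns the F1-score for a given pair of thresholds, and `initial_participation` is an assumed starting value for the first sweep, since the text does not state which participation threshold was fixed while min_support was varied.

```python
def coordinate_search(evaluate, support_grid, participation_grid,
                      initial_participation):
    """Tune min_support first, then the participation threshold.

    `evaluate(min_support, participation)` is assumed to return the F1-score
    of a recommender configured with these two thresholds.
    """
    # Step 1: sweep min_support with the participation threshold held fixed.
    best_support = max(support_grid,
                       key=lambda s: evaluate(s, initial_participation))
    # Step 2: sweep the participation threshold at the best min_support.
    best_participation = max(participation_grid,
                             key=lambda p: evaluate(best_support, p))
    return best_support, best_participation

# Stand-in scoring function for demonstration only; a real run would train
# and score the recommender on the training and testing sets.
def toy_f1(min_support, participation):
    return 1.0 - (min_support - 0.2) ** 2 - (participation - 0.3) ** 2

print(coordinate_search(toy_f1, [0.05, 0.1, 0.2, 0.3], [0.15, 0.3, 0.45], 0.3))
```

A sequential search of this kind evaluates far fewer configurations than a full grid over both thresholds, at the risk of missing interactions between them.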

VI. CONCLUSIONS AND FUTURE WORKS
This article introduces FPRS, a novel recommender system, which methodically utilizes the ratings to discover frequent itemsets associated with selected users/items features and then incorporates these frequent itemsets in generating recommendations for new users and items. Our study evaluates multiple strategies for creating frequent itemsets to produce meaningful and relevant recommendations.
To evaluate FPRS, we conducted comprehensive experiments on the MovieLens 100K and 1M datasets using the FP-growth algorithm to generate the frequent itemsets. The experimental results show that FPRS outperformed the baseline recommender system in terms of precision, F1, and accuracy for both modules, while the baseline achieved higher recall in the user cold-start setting.
In future work, we plan to incorporate additional contextual information and evaluate more advanced algorithms, such as AprioriTID and Apriori Hybrid, in the process of producing frequent patterns. Another important step is to evaluate our method against more state-of-the-art recommender systems on various datasets. Furthermore, we plan to account for changes in users' behavior and preferences by periodically updating the frequent itemsets based on recent changes in the rating history. It would also be of value to extend the users' and items' data representation by applying more advanced feature extraction to model the similarities among them more effectively [44], [45], [46].