Boosting a Genetic Algorithm with Graph Neural Networks for Multi-Hop Influence Maximization in Social Networks

In this paper we solve a variant of the multi-hop influence maximization problem in social networks by means of a hybrid algorithm that combines a biased random key genetic algorithm with a graph neural network. Hereby, the predictions of the graph neural network are used within the biased random key genetic algorithm for a more accurate translation of individuals into valid solutions to the tackled problem. The obtained results show that the hybrid algorithm is able to outperform both the biased random key genetic algorithm and the graph neural network when they are used as standalone techniques. In other words, we show that an integration of both techniques leads to a better algorithm.


I. INTRODUCTION
Social networks form part of our daily lives. Whether a person is an artist or an engineer, a student or an academic, young or old, or a spartan troll, that person is most likely embedded in one or more social networks. People use social networks to communicate through audiovisual media or text messages, to help others, or even to attack them. In other words, people influence other people, either in a positive or in a negative way. Note that the world's leading technology companies invest huge amounts of money in advertising in social networks. For any social network it is essential to be aware of the transmission of information and its impact.
The identification of a group of users who can influence as many people as possible is a problem called influence maximization (IM). This problem was defined by Kempe et al. in 2003 [1]. By solving the IM problem, a set of adequate people can be identified, for example, for spreading the news about a certain product on a social network. In this way, the dissemination of the product can be supported and eventually maximized. The IM problem has been studied, for example, in the context of dealing with emerging negative opinions [2], social advertising [3], and influence maximization on Twitter for marketing campaigns [4]. Also, different variations of the IM problem have been studied. One example concerns the case of social networks that have a diversity of communities, which implies that there are different types of users who can influence in different ways [5]. Another example is the one of trying to maximize influence in time-evolving social networks [6].
A social network can be viewed as a (directed) graph in which the users are the nodes and user interactions are modeled by arcs. Furthermore, the propagation of influence is often simulated by models such as the independent cascade (IC) model and the linear threshold (LT) model, which may be deterministic or probabilistic. In any case, both models consider one-hop coverage, that is, whether a person is covered depends exclusively on his or her direct neighbors. Although it is common to consider one-hop coverage models in IM problems, the interaction in social networks may also be of multi-hop nature [7].
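To make the notion of a diffusion model concrete, the following is a minimal sketch of one run of the independent cascade (IC) model on a directed graph. It assumes, for simplicity, a uniform activation probability p on all arcs and an adjacency-list representation; it is an illustration of the model, not code from the paper.

```python
import random

def independent_cascade(graph, seeds, p=0.1, rng=None):
    """Simulate one run of the independent cascade (IC) model.

    graph: dict mapping each node to a list of its out-neighbors.
    seeds: the initially activated set of nodes.
    p: activation probability per arc (assumed uniform here).
    Returns the set of all nodes activated by the end of the cascade.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                # each newly activated node gets exactly one chance to
                # activate each inactive out-neighbor, with probability p
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active
```

With p = 1 the IC model degenerates to plain reachability, which already hints at the multi-hop coverage view used in the remainder of the paper.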
In this paper we tackle a variant of the IM problem which can be seen as a variant of the classical minimum dominating set problem (MDSP) in a directed graph G = (V, A). In the MDSP, the task is to identify a set of nodes U ⊆ V of minimum cardinality such that for each node v ∈ V the following holds: either (1) v ∈ U, or (2) there exists at least one node v′ ∈ U such that (v′, v) ∈ A, where (v′, v) is the directed arc from v′ to v. In other words, the classical MDSP considers one-hop coverage. In contrast, in the problem tackled in this paper, known as the multi-hop influence maximization problem (k-dDSP), d-hop coverage is considered. More specifically, in the k-dDSP the task is to find a set U ⊆ V of cardinality k such that the set C_U ⊆ V of nodes that are covered (or influenced) by U is of maximal cardinality. Hereby, a node v forms part of C_U, that is, v is said to be covered (or influenced) by U, if there exists a node v′ ∈ U such that the shortest directed path from v′ to v consists of at most d arcs. As an example, consider Figure 1, where k = 2 and the two nodes with a purple color form part of set U. In case d = 1, nodes such as v6 and v7 are within 1-hop distance of a node in U, and the objective function value of U is 5. In case d = 2, the objective function value of U would be 8. Finally, in case d = 3, C_U = V and the objective function value would be 11.
For a large-scale graph, such as a large social network, exact solutions to the k-dDSP are costly to compute. Therefore, researchers have focused on heuristic and on machine learning techniques for solving this problem (see Section II). In this paper, we present a novel hybrid algorithm to solve the k-dDSP. This algorithm is obtained through an integration of (1) a graph neural network (GNN) called graph reversed attention network (GRAT) that incorporates the influence of the neighbourhood into the feature embedding of each node [8], and (2) a biased random key genetic algorithm (BRKGA) [9]. More specifically, the information provided by the GNN is used in the BRKGA as greedy information. Our algorithm is evaluated on real-world networks with up to 500.000 nodes and 1 million arcs. Experimental results show that our hybrid method is at least as good as the two individual techniques in most cases. Therefore, our method is another example of the successful integration of a GNN framework with a metaheuristic to boost the metaheuristic's performance. The article is organized as follows. Section II introduces prior and related work. In Section III, we provide a more technical description of the k-dDSP. In Section IV, we present our hybrid approach. In Section V, we compare and analyze our hybrid algorithm on real-world data sets. Finally, Section VI concludes the work with some discussions on the utilized type of hybridization.

II. RELATED WORKS
As mentioned before, most of the works on influence maximization (IM) in social networks make use of the independent cascade (IC) and the linear threshold (LT) diffusion models for calculating the influence of solutions. Chen et al. [10], [11] present a fine-tuned heuristic for generating scalable solutions to the IM problem with an improved runtime in comparison to previous approaches. Jung et al. [12] provide an even faster heuristic for the application to large graphs. Goyal et al. [13] added Monte Carlo simulation in order to obtain an improved heuristic.
Multi-hop influence is a recently studied way of measuring the influence of a group of people (a set of nodes) in IM problems. Nguyen et al. [14] proposed a heuristic which takes into account the probability of each node in the network to contribute to a high influence spread. However, the proposed algorithm is only evaluated on small graphs of fewer than 30.000 nodes. With respect to large graphs, Nguyen et al. [15] presented an alternative heuristic that consists of three phases: pre-optimization to reduce the size of the graph, a construction phase to build a k-dominating set, and post-optimization by removing redundant nodes from the set. This algorithm is faster than the one from [16]. However, it sacrifices solution quality for the reduction of computation time.
Recently, machine learning techniques have entered the scene to solve combinatorial problems [17]. An early example is S2V-DQN, a general reinforcement learning (RL) framework for combinatorial optimization problems in graphs proposed by Khalil et al. [18]. It uses graph neural networks (GNNs) for embedding graphs with partial solutions and a deep Q-network (DQN) for node selection. This framework is increasingly used by the community working on learning techniques for combinatorial optimization.
In the same line, FASTCOVER is a very recent unsupervised learning framework for solving the k-dDSP by Ni et al. [8]. It uses a multi-layer GNN known as graph reversed attention network (GRAT) to generate, for each node of a given graph, a probability of belonging to the optimal solution. The output of FASTCOVER consists of the k nodes with the highest probabilities. The authors of [8] show that FASTCOVER outperforms existing heuristics.

III. PROBLEM DEFINITION
Many optimization problems in social networks can be formalized by modeling the social network as a directed graph G = (V, A), where V is the set of nodes and A is the set of directed arcs present in the graph. This is also the case for the multi-hop influence maximization problem tackled in this paper, denoted by k-dDSP.
The most important concept in this context is the influence I_d(u) ⊆ V of a node u ∈ V, which depends on two things:
1) Parameter d ≥ 1, which is part of the problem input.
2) A distance measure dist(u, v) between nodes. In the context of this paper, dist(u, v) is defined as the length, in terms of the number of arcs, of the shortest directed path from u to v in G.
With this we can provide the definition of I_d(u) as follows:

I_d(u) := {v ∈ V | dist(u, v) ≤ d}

In other words, I_d(u) is the set of all nodes of G that can be reached from u by means of a directed path with at most d arcs. We say that u influences (or covers) all nodes from I_d(u). This definition can naturally be extended to sets U ⊆ V of nodes in the following way:

I_d(U) := ∪_{u ∈ U} I_d(u)

That is, I_d(U) is the set of all nodes of G that are influenced by at least one node from U. Valid solutions to the k-dDSP are all sets U ⊆ V such that |U| ≤ k, that is, any valid solution may consist of at most k nodes. The goal of the k-dDSP is to find a valid solution U* that maximizes |I_d(U*)|. Finally, note that the k-dDSP was proven to be NP-hard in [8], [19].
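The definitions above translate directly into a bounded breadth-first search. The following sketch (illustrative, not the authors' implementation) computes I_d(U) for an adjacency-list graph; the k-dDSP objective value of U is simply the size of the returned set.

```python
def influence(graph, U, d):
    """Compute I_d(U): all nodes reachable from the seed set U via a
    directed path of at most d arcs (multi-hop coverage).

    graph: dict mapping each node to a list of its out-neighbors.
    Returns the set of covered nodes; |influence(graph, U, d)| is the
    k-dDSP objective value of U.
    """
    covered = set(U)            # every seed covers itself (distance 0)
    frontier = set(U)
    for _ in range(d):          # expand the coverage one hop at a time
        nxt = set()
        for u in frontier:
            for v in graph.get(u, []):
                if v not in covered:
                    covered.add(v)
                    nxt.add(v)
        if not nxt:             # no new node was reached: stop early
            break
        frontier = nxt
    return covered
```

A single bounded BFS per seed set runs in O(|V| + |A|) time, which is why evaluating candidate solutions remains feasible even on large networks.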

IV. METHODOLOGY
In this section, we present a novel hybrid algorithm that emerges from the integration between a BRKGA and a GNN framework for solving the k-dDSP in social networks. To begin, we briefly introduce both methods individually. Then we present the developed hybridization strategy.

Algorithm 1 The pseudo-code of BRKGA
Require: a directed graph G = (V, A); values for params. p_size, p_e, p_m, prob_elite, seed
Ensure: the best solution found
1: P ← GenerateInitialPopulation(p_size, seed)
2: Evaluate(P)
3: while termination condition not met do
4:    P_e ← EliteSolutions(P, p_e)
5:    P_m ← GenerateMutants(p_m, p_size)
6:    P_c ← Crossover(P, p_e, prob_elite)
7:    Evaluate(P_m ∪ P_c)
8:    P ← P_e ∪ P_m ∪ P_c
9: end while

A. Biased Random Key Genetic Algorithm
We implemented a Biased Random Key Genetic Algorithm (BRKGA), which is a well-known GA variant for combinatorial optimization. In general, a BRKGA is problem-independent because it works with populations of individuals that are vectors of real numbers (random keys). The problem-dependent part of each BRKGA deals with the way in which individuals are translated into solutions to the tackled problem. The problem-independent pseudo-code of BRKGA is provided in Algorithm 1.
In the following, we first describe the independent or generic part of the algorithm. It starts by invoking function GenerateInitialPopulation(p_size, seed), which generates a population P formed by p_size individuals. In case seed = 0, all p_size individuals are randomly generated. Hereby, each individual σ ∈ P is a vector of length |V|, where V is the set of nodes of the input graph. For this purpose, the value at position i of σ, denoted by σ(i), is chosen uniformly at random from [0, 1], for all i = 1, ..., |V|. In case seed = 1, only p_size − 1 individuals are randomly generated. The last individual is obtained by defining σ(i) := 0.5 for all i = 1, ..., |V|. Next, the individuals of the initial population are evaluated. This means that each individual σ ∈ P is transformed into a valid solution U_σ to the k-dDSP, and the value f(σ) of σ is defined as f(σ) := |I_d(U_σ)|. The transformation of individuals into valid solutions is discussed below.
Then, at each iteration of the algorithm, the operations to be performed are as follows. First, the best max{⌈p_e · p_size⌉, 1} individuals are copied from P to P_e in function EliteSolutions(P, p_e). Second, a set of max{⌈p_m · p_size⌉, 1} so-called mutants is generated and stored in P_m. These mutants are random individuals generated in the same way as the random individuals of the initial population. Finally, a set of p_size − |P_e| − |P_m| individuals is generated by crossover in function Crossover(P, p_e, prob_elite) and stored in P_c.
Each such individual is generated as follows: (1) an elite parent σ1 is chosen uniformly at random from P_e, (2) a second parent σ2 is chosen uniformly at random from P \ P_e, and (3) an offspring individual σ_off is generated on the basis of σ1 and σ2 and stored in P_c. In the context of the crossover operator, value σ_off(i) is set to σ1(i) with probability prob_elite, and to σ2(i) otherwise. After generating all new individuals in P_m and P_c, these new individuals are evaluated in function Evaluate(); see line 7. Note that the individuals in P_e are already evaluated. Finally, the population of the next generation is determined as the union of P_e with P_m and P_c.
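The generic generation step described above can be sketched as follows. This is a minimal illustration of the elite/mutant/crossover mechanics, not the paper's C++ implementation; the parameter values used as defaults are placeholders, not the tuned values from Table I.

```python
import math
import random

def brkga_generation(pop, fitness, p_e=0.15, p_m=0.15, prob_elite=0.7,
                     rng=random.Random(1)):
    """One BRKGA generation (illustrative sketch).

    pop: list of individuals, each a list of random keys in [0, 1].
    fitness: callable scoring an individual (higher is better here).
    Returns the population of the next generation.
    """
    p_size, n = len(pop), len(pop[0])
    n_e = max(math.ceil(p_e * p_size), 1)   # number of elite individuals
    n_m = max(math.ceil(p_m * p_size), 1)   # number of mutants
    ranked = sorted(pop, key=fitness, reverse=True)
    elite, non_elite = ranked[:n_e], ranked[n_e:]
    # mutants: fresh random-key vectors, as in the initial population
    mutants = [[rng.random() for _ in range(n)] for _ in range(n_m)]
    offspring = []
    for _ in range(p_size - n_e - n_m):
        s1 = rng.choice(elite)              # elite parent
        s2 = rng.choice(non_elite)          # non-elite parent
        # biased uniform crossover: take the elite key with prob. prob_elite
        offspring.append([s1[i] if rng.random() < prob_elite else s2[i]
                          for i in range(n)])
    return elite + mutants + offspring
```

Because the elite set is copied unchanged, the best fitness value in the population can never decrease from one generation to the next.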
The evaluation of an individual (see lines 2 and 7 of Algorithm 1) is the problem-dependent part of our BRKGA algorithm. The function that evaluates an individual is often called the decoder. In our case, we make use of a simple greedy heuristic which is based on the intuition that nodes with a higher degree (number of neighbors) are more likely to have a high influence than nodes with a lower degree. Hereby, the set of neighbors of a node v_i is denoted by N(v_i), and the greedy function γ() is defined as

γ(v_i) := |N(v_i)| · σ(i)

In other words, the greedy value of a node v_i is obtained by multiplying the degree of v_i with the numerical value found at position i of the individual to be translated into a solution. Subsequently, solution U_σ is obtained by adding the k nodes with the highest greedy values. Note that, in Section IV-C, greedy function γ() will be modified in order to obtain a hybrid algorithm.
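The decoder described above can be sketched in a few lines. This is an illustrative version under the assumption that "degree" means the number of out-neighbors in the directed graph and that the random-key vector is indexed consistently with the node list; it is not the authors' code.

```python
def decode(graph, sigma, k):
    """Greedy decoder: translate an individual (random-key vector sigma)
    into a solution of exactly k nodes.

    graph: dict mapping each node to a list of its out-neighbors.
    sigma: list of random keys in [0, 1], one per node, in the order
           in which the nodes appear in `graph`.
    """
    nodes = list(graph)
    # greedy value of node i: its (out-)degree times its random key
    greedy = {v: len(graph[v]) * sigma[i] for i, v in enumerate(nodes)}
    # the solution consists of the k nodes with the highest greedy values
    return set(sorted(nodes, key=lambda v: greedy[v], reverse=True)[:k])
```

Note how the random keys perturb the pure degree ordering: the genetic algorithm searches over these perturbations rather than over node subsets directly.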

B. Graph Neural Network Framework
The general objective of a graph neural network (GNN) [20]-[22] is to automatically find patterns in data. In contrast to more classical deep learning techniques, GNNs directly work on graphs. Therefore, they can be used to make predictions about nodes, arcs, or subgraphs without the need for unnecessary transformations of the graph. The crucial idea of GNNs is to iteratively update so-called node representations by combining the representations of a node's neighbors with its own representation. Given a graph G = (V, A), H^l ∈ R^{|V|×C} are node attribute matrices, one for each layer l ∈ {0, 1, ..., L} of the GNN. Note that C is hereby the number of chosen features. Each row of such a matrix is the representation of the respective node. The final goal of a GNN is to learn competent node representations in these matrices.
In order to adapt/train the representations to be useful for a specific task, there are two actions that are successively performed at each GNN layer: (1) Aggregate, which aggregates all the information from the neighbors of each node, and (2) Combine, which updates the node representations by combining the aggregated information from the neighbors with the current node representation. Based on this, the general framework of a GNN can be specified as follows:

a_v^l := Aggregate({H_u^{l-1} | u ∈ N(v)})
H_v^l := Combine(H_v^{l-1}, a_v^l)

where N(v) is the set of neighbors of node v, and H^l is the node representation matrix of layer l. Once the training process finishes, the final representations can be used for making predictions.
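As a concrete illustration of the Aggregate/Combine scheme, the following sketch implements one message-passing layer with mean aggregation and a linear Combine step followed by a ReLU. The weight matrices and the choice of mean aggregation are illustrative assumptions; in a real GNN (including GRAT) the weights are learned and the aggregation is more sophisticated.

```python
import numpy as np

def gnn_layer(H, adj, W_self, W_neigh):
    """One message-passing layer: mean-aggregate the neighbor
    representations, then combine them with each node's own one.

    H: |V| x C node representation matrix of the previous layer.
    adj: dict mapping a node index to the list of its neighbor indices.
    W_self, W_neigh: C x C' weight matrices (illustrative, not learned).
    """
    out = np.zeros_like(H @ W_self)
    for v in range(H.shape[0]):
        neigh = adj.get(v, [])
        # Aggregate: mean of the neighbor representations (zeros if none)
        a_v = H[neigh].mean(axis=0) if neigh else np.zeros(H.shape[1])
        # Combine: linear transform of self and aggregated message + ReLU
        out[v] = np.maximum(H[v] @ W_self + a_v @ W_neigh, 0.0)
    return out
```

Stacking L such layers lets information propagate over paths of length L, which is exactly why GNNs are a natural fit for multi-hop coverage problems.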
A GNN can be trained, for example, in order to make predictions about the probability of each node to belong to the optimal solution of the k-dDSP. In fact, as mentioned already in the section on related work, such a GNN approach was recently presented in [8]. This GNN, called FASTCOVER (FC), is an unsupervised GNN framework. FC can be characterized as follows: (1) the features of all nodes are embedded as vector spaces, and the direction of each arc is reversed, (2) a multi-layer GNN known as graph reversed attention network (GRAT) assigns each node a value within [0, 1], and (3) the representations of the GNN are optimized in the training stage through a differentiable loss function over all node scores.
The GRAT layer is the heart of FC. In particular, in contrast to a standard graph attention network (GAT) [23], the so-called attention mechanism is integrated at the origin nodes instead of the destination nodes. The central idea is that nodes with more influence are likely to receive a stronger reward, that is, a higher score.

C. The Hybrid BRKGA Algorithm
Our hybrid algorithm, henceforth called BRKGA+FC, starts with two offline steps. Given a network in which the k-dDSP must be solved, first, all node probabilities are extracted from the trained FC model; note that the prior training process is described in the next section. This probability is denoted by p_i ∈ (0, 1] for each v_i ∈ V. Then, the original greedy function γ() from Eq. 4 is replaced by the following one that incorporates the node probabilities extracted from FC:

γ(v_i) := |N(v_i)| · σ(i) · p_i

The hypothesis is that good/correct predictions will bias the algorithm towards the area of the search space in which an optimal solution, or at least solutions of very high quality, can be found. Moreover, we expect the probabilities obtained from FC to undo the bias introduced by the degree of a node, which might sometimes be misleading. The integration process is also shown in Figure 2.

V. EXPERIMENTAL EVALUATION
This section is divided into three parts. First, we will describe the preparation of the data for training and evaluation, and the parameter tuning procedure. Then, the experimental setting and the numerical results of three algorithms will be presented (FC, BRKGA, and BRKGA+FC). In this context, note that FC can, of course, be used as a standalone technique by simply adding the k nodes with the highest probabilities to the solution. Finally, we will analyse the algorithms graphically by means of so-called search trajectory networks.

366 PROCEEDINGS OF THE FEDCSIS. SOFIA, BULGARIA, 2022

Fig. 2. Hybridization Process. The integration of BRKGA with FC starts with two offline steps concerning FC as follows. The training phase begins by using 15 random graphs (Erdős-Rényi). This provides us with a trained version of FC (called GNN Framework in the graphic). Then, the social network in which the k-dDSP is to be solved is presented to FC, which returns probabilities for all nodes of the network to belong to the optimal solution. Finally, the final phase consists of integrating these probabilities into the BRKGA (called Genetic Algorithm in the graphic).

A. Data Preparation and Tuning Process
We decided to execute experiments for three different values of k, that is, k ∈ {32, 64, 128}. For this reason we trained three different FC models, one for each value of k. As illustrated in Figure 3, each model was trained with the fixed parameter value d = 1. In other words, the same FC model is used for applications of FC and BRKGA+FC for all d ∈ {1, 2, 3}. This was done to reduce the computational burden. Nevertheless, in the analysis of the final results we will see that this had some influence on the quality of the node probabilities extracted from the FC models, that is, these probabilities seem to lose accuracy with a growing value of d.
The three FC models (one for each value of k) were trained as follows. First, we used 15 Erdős-Rényi graphs [24] with 4000 nodes each, similar to what is presented by the authors of [8]. After the training phase, the probabilities for all 19 social networks used later in the final experimental evaluation were extracted (for each value of d ∈ {1, 2, 3}) and stored in text files.
In order to ensure a fair experimental evaluation, both BRKGA and BRKGA+FC were tuned for each value of k using 10 test graphs. In particular, we used Erdős-Rényi graphs with n = 25.000 nodes and an arc probability of p = 10/n. The tuning was done using the well-known tool irace [25]. The considered parameter domains, together with the finally chosen values, are provided in Table I. Note that the number of nodes (25.000) of the test graphs corresponds approximately to the average number of nodes of the networks used for the final experimental evaluation (presented in Section V-B). The size of the tuning graphs is reasonable because the population size parameter of BRKGA is highly dependent on the size of the graphs. In the case of FC, we did not modify the parameters and configuration described in [8].
Note that the training phase of FC and the parameter tuning procedure for BRKGA and BRKGA+FC were performed with random graphs to maintain generality.

B. Experimental Evaluation
In this subsection, we apply all three approaches (FC, BRKGA, and BRKGA+FC) to 19 real-world social networks from the SNAP library [26]. Each of these networks is a directed, unweighted graph. The sizes of these graphs are provided in Table II (columns |V| and |A|).
We use three different values of k ∈ {32, 64, 128}. Also for d, the multi-hop influence parameter, we used three different values: d ∈ {1, 2, 3}. Note that k = 64 was used for the experimental evaluation of FC on social networks in [8]. In order to provide a broader experimentation and analysis, we also considered two additional values of k: a smaller one (32) and a larger one (128). The reason for not considering values of d greater than 3 is that we were not able to observe substantial differences with respect to the case d = 3. As FC is a deterministic approach (at least concerning the use of the model after training), it was applied exactly once to each of the 19 networks, for each combination of k and d. In contrast, both BRKGA and BRKGA+FC were applied 30 times to each network and each combination of k and d. As computation time limit for BRKGA and BRKGA+FC we used 900 CPU seconds per run. All experiments were performed on machines with an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz. The FC framework uses Python 3, and our implementations of BRKGA and BRKGA+FC were coded in C++.
The results in Table II show that the hybrid algorithm BRKGA+FC generally benefits from the use of the probability information extracted from FC for the translation of individuals into solutions. This advantage of BRKGA+FC over BRKGA is greatest for the smallest value of d (that is, d = 1). In this case, BRKGA+FC outperforms BRKGA in 73% of all cases. The worst performance of BRKGA+FC is obtained for k = 32 and d = 3 (superiority in 47% of the cases). This may be due to two possible reasons: (1) FC might find it difficult to detect patterns for rather small values of k; (2) all our FC models were trained for d = 1, which suggests that our results could be improved by specifically training FC for each value of d.
Summarizing, we can say that making use of information from the GNN framework FC within our BRKGA clearly improves the algorithm.

C. Analysis
There are cases in which our hybrid algorithm does not perform as expected, that is, cases in which its results are similar to, or even worse than, the ones of BRKGA. In an attempt to analyse such cases we used the Search Trajectory Networks (STNs) tool from [27], which allows us to visualize the trajectories of algorithms in the search space. Moreover, it lets us compare the behavior of more than one metaheuristic. For this analysis we chose three networks corresponding to three different cases, as outlined in Figure 4. The obtained graphics allow us to make the following observations. 1) Figure 4 (a). This is a case in which the hybrid algorithm BRKGA+FC does not perform well in comparison to BRKGA. We can see in the graphic that both algorithms clearly focus on different areas of the search space.
In particular, BRKGA is attracted by a certain area of the search space. Nevertheless, the best solution found (red dot), even though it belongs to this part of the search space, is not close to the area of attraction (see the two larger grey triangles). One hypothesis is that the probabilities provided by the graph neural network framework (FC) for this instance are rather misleading. 2) Figure 4 (b). In this case, the performance of both algorithms is comparable. Again, the two algorithm versions focus on different areas of the search space. This time there is a minimal overlap between two of the algorithm trajectories (see the light gray dot in the middle of the graphic). Interestingly, even though both algorithms find a best solution of the same quality, these two solutions are clearly different from each other (see the two red dots).
However, as mentioned before, in the majority of cases BRKGA+FC outperforms BRKGA. Such a case is visualized in the graphic of Figure 4 (c). It can be observed that the trajectory of BRKGA+FC is more bounded and, therefore, not as dispersed in the search space as the one of BRKGA. Moreover, the best solution is found in the area of the search space that attracts BRKGA+FC. This means that, in this case, the information provided by FC is very useful.

VI. CONCLUSION AND FUTURE WORK
In this work we have devised a hybrid algorithm combining a biased random key genetic algorithm with a graph neural network called FASTCOVER. This was done in the context of an NP-hard combinatorial optimization problem dealing with the maximization of influence spreading in social networks. In particular, our hybrid algorithm makes use of the recommendations provided by FASTCOVER (in the form of probabilities) for translating individuals to valid solutions to the tackled problem. The results have shown that, in a majority of the cases, our hybrid algorithm outperforms both its individual algorithmic components: the biased random key genetic algorithm and FASTCOVER. The experimental evaluation of our approaches was done in the context of 19 real-world social networks.
One opportunity to advance this type of hybridization is to address other problems using a similar integration methodology, especially taking the recent progress of graph representation learning into account.

Fig. 4. Search trajectory networks of BRKGA and BRKGA+FC (pink) for three instances (gplus, twitter-follows, and themarker). The value of z indicates the degree of search space partitioning used to generate the graphics (see [27]). Yellow squares indicate the start of trajectories, while gray triangles indicate their ends. Also, light gray circles indicate that both algorithms passed through this location of the search space, while red circles indicate the best solutions found. (a) A case in which BRKGA is able to outperform BRKGA+FC (gplus). (b) A case in which BRKGA and BRKGA+FC achieve similar results (twitter-follows). (c) A case in which BRKGA+FC outperforms BRKGA (themarker). For each graphic we used a force-directed layout based on physical analogies, not relying on any assumptions about the structure of the networks.