An Integer Programming Approach Reinforced by a Message-passing Procedure for Detecting Dense Attributed Subgraphs

One of the recent challenging but vital tasks in graph theory and network analysis, especially when dealing with graphs equipped with a set of nodal attributes, is to discover subgraphs consisting of highly interacting nodes with respect to the number of edges and the attributes’ similarities. This paper proposes an approach based on integer programming modeling and the graph neural network message-passing manner for efficiently extracting these subgraphs. The experiments illustrate the proposed method’s privilege over some alternative algorithms known so far, utilizing several well-known instances.


I. INTRODUCTION
W HEN studying graphs/networks, we usually face collections consisting of nodes with common topological and nodal attribute characteristics. More specifically, sets of highly interactive vertices are likely to yield and share common relationships and properties. In this sense, in all scientific fields where graphs are somehow implicated, one of the challenging tasks would be to efficiently partition the given graph into a number of dense subgraphs consisting of massively connected vertices that share similar properties. These subgraphs are well known as communities, and naturally, the process of identifying them is referred to as the community detection problem. Detecting communities has become one of the fundamental subjects in the field of network analysis and graph theory and has numerous applications in a wide range of areas, including the analysis of Social/Biological/Cosmological networks [1], [2], [3], [4] and WEB [5]. It also plays a crucial role in the domain of Network Design problems [6], Signal Processing [7], Image Segmentation [8], Pattern Recognition [9], and Data Mining [10].
Crucial in this domain is a more formal representation of a graph: A pair G = (V, E) with the set of vertices V and edges E. Subsequently, from the perspective of graph topological structure, a community can be contemplated as a subset C ¦ V with a high density of edges between nodes inside C and a low density of edges connecting C to the other subsets. Accordingly, one can define the community detection problem as partitioning V into a set of disjoint communities C = {C 1 , C 2 , . . . , C k }.
In fact, mining high-quality communities usually coincides with finding a measure that estimates the goodness of communities. In the literature, a large number of such quality measures have been proposed for evaluating the superiority of partitioning from the viewpoint of topological structures. Among them, one of the most widely used and well-known is Modularity, introduced by Newman [11]. Intuitively, for a community C, Modularity is the number of edges inside C subtracted by the expected number of such edges, whereat the expected number of edges can be derived by corresponding to G, a randomized graph (called the Null model) with exactly the same vertices and the same degree of G, in which edges are placed randomly. More specifically if d i and m are, respectively, the degree of node i and the number of edges in G, the probability of existing an edge between nodes i and j in such a graph is didj 2m . This is because, first, each node is assigned the number of stub links exactly equal to its edges (Fig. 1a, 1b). Afterward, each of the two stub edges will be joined at random (Fig. 1c). Consequently, the Modularity value for a community C can be defined as the number of edges within C in G minus the number of edges within C in a Null model of G. Thus, Modularity for partitioning C can then be expressed as where A = (a i,j ) be the adjacency matrix of G, where a i,j is one when there is an edge between node i and node j, and zero otherwise; n is the number of vertices in G. In addition, ξ(i, j) is one if i and j are in the same community and zero otherwise. As a result, high-quality communities can be determined as the ones with a high value of Modularity. However, despite Modularity's advancements in finding high-quality communities in a wide range of graphs, it is known to suffer from limitations (see [12], [13], for example). In particular, as pointed out in [14], since Modularity only considers the existing edges of the network, it qualifies the goodness of the discovered communities by only measuring how good the partitioning fits the existing edges. This is indeed a drawback because the disconnected nodes (absent links) that lie in the same community are not taken into account.
On the other hand, Max-Min Modularity [14], as one of the successful extensions of Modularity, is able to significantly improves the accuracy of the measure by compensating for the Modularity quantity when disconnected nodes are in the same community. More precisely, it is assumed in [14] that (in addition to the graph G) a zero-one relation matrix U = (u i,j ) is given that defines whether every pair of disconnected nodes of the network is related or not: u i,j is one when disconnected nodes i and j are related, and zero otherwise. Max-Min Modularity, in fact, tries to take into account the importance of the indirect connections between disconnected nodes by penalizing the Modularity measure when unrelated nodes are in the same community: Consider a complemented graph G 2 = (V, E 2 ), where E 2 contains an edge between every pair of disconnected nodes of G that is unrelated; i.e., there is an edge between i and j in G 2 if there is not such an edge in G and also u i,j is zero. Max-Min Modularity, then, aims at maximizing the Modularity in G as well as minimizing Modularity in G 2 simultaneously. Mathematically speaking, if i is the degree of node i in G 2 , and m 2 is the number of the edges in G 2 , then Max-Min Modularity Q M M of a given partition C of V is defined as follows: We refer to the problem of finding a partition of the network that maximizes Max-Min Modularity as the Max-Min Modularity Maximization problem. As a crucial remark, we can note that not only is the Max-Min Modularity Maximization problem categorized in the class of NP-hard problems [15], making it hard to find an efficient optimization algorithm, but it also suffers from a major drawback: Max-Min Modularity strongly depends on the accuracy of the given relation matrix, meaning that the quantity of the measure might be heavily affected by the node relationships defined by the user/oracle in the first place. De-spite Ferdowsi and Khanteymoori [16] succeeded in proposing a systematic approach for unveiling the relation between nonadjacent vertices, from a more critical point of view, one could argue that both Modularity and Max-Min Modularity, like many other community discovery methods, only utilize topological information of nodes, naively overlooking a rich set of nodal attributes (e.g., user profiles of an online social network or textual contents of a citation network), which is abundant in all real-life networks [17].
In this regard, recently, an emerging domain of deep learning for graphs has ensured the design of more accurate and scalable algorithms. However, despite these approaches achieving outstanding results in graph-related tasks like link prediction and node classification [18], fairly little concentration has been committed to their application on unsupervised learning that encompasses the community detection problem. This is primarily because graph embedding methods can ideally align with (semi-)supervised learning approaches; however, they cannot be naturally generalized to the unsupervised learning manners since it is not very simple to find a proper loss function to govern the back-propagation updating procedure. Despite that, several deep learning-based unsupervised graph algorithms have been proposed [19], [20], [21], but they all suffer from a not accurate choice of a loss function that can authoritatively extend the model to unsupervised vision. The diverse results obtained by these approaches over the same input instances can prove this claim.
Main contribution: In this work, we introduce a refined model for the Max-Min Modularity that empowers us to find high-quality communities with simultaneously taking the graph's topological structure and the nodal attributes into account. In the sequel, we involve the Modularity metric to mine the communities consisting of densely connected vertices. In addition, we provide a technical introduction to Graph Neural Network (GNN) formalism, a dominant and fast-growing paradigm for deep learning with graph data, whose ability is to use nodes' features and local structure to generate embeddings. Utilizing the general concept of GNN feed-forward message passing, we devise an efficient mechanism for extracting the information induced by propagating nodes' properties throughout the network, leading to a perfect systematic characterization of the relation matrix. Everything combined, we introduce the advanced Max-Min Modularity scheme and express it with a standard integer programming formulation. Ultimately, by solving the model using a robust approach consisting of a row/column generation technique for solving the model's linear relaxation version and a local search manner for obtaining integer solutions, we identify the final communities.
The paper is organized as follows: the rest of this section focuses on providing a brief literature review. In Section II, we first restate the Modularity Maximization problem in terms of an integer programming model. We then devise a method for providing a refined relation matrix. Afterward, gathering all things together, we extend the primary Max-Min Modularity model and present an integer programming formulation for 570 PROCEEDINGS OF THE FEDCSIS. SOFIA, BULGARIA, 2022 it. Next, in Section III, we introduce the employed solution approach that leads to discovering high-quality communities. Section IV eventually focuses on various experiments, confirming the proposed method's high performance.

A. Related Works
Several approaches have been proposed in the literature to detect communities in networks; see, for example, the survey conducted by Souravlas et al. [22]. However, despite a large number of these techniques, relatively little work solves the problem using mathematical programming techniques. Especially in the case of Modularity maximization, few results have been established [23], [24], [25], [26]. On the other side, Chen et al. [14] illustrated that maximizing the Modularity alone does not usually lead to superior communities. Based on their findings, one of the significant drawbacks of Modularity is its sheer dependence on the (existing) links, meaning that, Modularity only focuses on discovering communities in which the number of interactive edges (links) is as many as possible. At the same time, it does not pay any attention to the missing links. This is while just as existing links can play an essential role in analyzing networks, so do the absent links. Therefore, new extensions of Modularity emerged subsequently, one of the most successful of which is the already mentioned: Max-Min Modularity [14]. Nevertheless, as discussed, Max-Min Modularity itself suffers from a critical issue: the nonexistence of a systematic way for proposing the so-called relation matrix, which is required to express the relationship between nonadjacent nodes. In this respect, Ferdowsi and Khanteymoori [16] succeeded in offering an analytical procedure to address this deficiency, generalize the conventional Max-Min Modularity, and provide an efficient local search-based algorithm to discover high-quality communities by considering both existing and missing links.
On the other side, in the past few years, most research has surged toward deep learning methods due to their power to achieve unprecedented results. These techniques aim at embedding nodes into a low-dimensional, dense vector space [27], [28]. However, unfortunately, a vast number of these methods lack the strength of encountering attributed graphs in which nodes are equipped with a set of features, and this is while we are now most surrounded by attributed networks everywhere [19]. It is worth mentioning that a few deep-learning methods have been recently proposed that consider attributed network embedding [29], [30], though most of them employ a matrix factorization manner, which endures some critical boundaries. More precisely, the representation capability of a matrix factorization-based approach is found to be more inadequate than a neural network-based method [31]. Besides, one could also argue that the majority of these proposals only rely on supervised graph algorithms, and therefore, usually fail to perform the community detection task, which can be categorized as an unsupervised assignment in graph problems [32], [33]. And the rests, which are designed for unsupervised learnings, still cannot find a promising loss function that can lead to fine communities for various given graph instances [34], [19].

II. MODEL SKETCHING
In this section, we first recite the so-called Modularity Maximization problem in terms of an integer programming formulation, enabling us to discover communities with respect to the topological aspects of a given graph. Afterward, we explain the Graph Neural Network message passing method that facilitates us to devise a procedure for determining an accurate relation matrix for the Max-Min Modularity problem. This achievement then hopefully leads to a proper integer formulation for our advanced Max-Min Modularity problem that can be used to efficiently capture communities with respect to simultaneously taking both topological and attributes aspects into consideration.

A. Topology extraction
Let the binary variable x ij indicate if nodes i and j belong to the same community or not; the value of x ij is zero if nodes i and j belong to the same community, and one otherwise.
for each (i, j) * I all . As described in [25], the Modularity Maximization problem can be formulated in terms of the following integer linear program: (6) Constraints (3)-(5) guarantee that if i and j are in the same community and j and k are in the same community, then so are i and k. We refer to the relaxation of (IP-M), obtained by replacing the constraints x ij * {0, 1} by x ij * [0, 1], as (LP-M). As discussed in [23] and [35], solving IP-M can soundly provide us with communities consisting of highly interactive edges. It is, however, incontrovertible that maximizing Modularity cannot help us tackle the attributed graphs. Consequently, to address this deficiency, our goal is to design an advanced model for the Max-Min Modularity problem so that nodal attributes are also effectively involved in the community mining process. In order to do that, we first provide a way to exploit the information diffused via nodes' features.

B. Attribute extraction
In this section, which establishes the core part of this research, we utilize the Graph Neural Network (GNN) messagepassing framework to provide a systematic and accurate way of defining a relation matrix that perfectly depicts the similarity between nodes by analyzing the information spread by the attributes. More precisely, we aim to feed the nodal attributes to a GNN message passing to create single representation vectors, each of which captures a node's structure as well as the feature information.
To motivate our discussion, we initially raise the notion of node embedding, which seeks to encode nodes as lowdimensional vectors that summarize their structural graph position and the information of their local graph neighborhood. In other words, node embedding aims to project vertices into a latent space, where geometric relations in this latent space correspond to relationships (e.g., edge existence or similarity) in the original graph or network. In this fashion, an encoder can be referred to as a function that maps each node u * V to vector embedding z u * R d (i.e., z u corresponds to the embedding for node u * V ) (See Fig. 2).
We now turn our attention toward one of the substantial encoder models: Graph Neural Network (GNN), a broad framework for defining deep neural networks on graph data. The elucidative characteristic of a GNN is that it adopts a form of neural message passing in which vector messages are exchanged between nodes and updated using neural networks [36]. In each message-passing iteration of a GNN, a hidden embedding h (k) u corresponding to each node u * V is updated according to information aggregated from u's graph neighborhood N (u). Informally speaking, to each node u * V one can correspond a so-called computational graph that accumulates the information propagated from u's neighbors, and in turn, the messages coming from these neighbors are based on information aggregated from their respective neighborhoods, and so on. Fig. 3 exemplifies a two layers computational graph. From the mathematical perspective, GNN message-passing procedure can be expressed as follows. Suppose that each node u is associated with a d-dimensional attribute vector that we represent with X u * R d . Then, the k2th embedding layer h (k) u , corresponding to node u * V , can be obtained by the following recursive formula: where h (0) u = X u and W d×d is a trainable matrix, which weights the nodes' attributes. Moreover, let σ denotes an elementwise non-linearity (e.g., a tanh or Relu). Furthermore, b (k) * R d is the bias term, which can be often omitted for the sake of simplicity, but including it could be important to obtain high-quality performance. As can be inferred from (7), first, the messages incoming from the neighbors are summed; Fig. 3: Two layers computational graph corresponding to node A. The model aggregates messages from A's local graph neighbors, and in turn, the messages coming from these neighbors are based on information aggregated from their respective neighborhoods. then, the neighborhood information with the node's previous embedding is combined using a linear combination; finally, an elementwise non-linearity is applied. Now, in order to transform the node-level equation (7) into something we can implement, we can come up with the following graph-level definition of the model: where H (k) * R |V |×d is the matrix of node representations at layer k in the GNN, with each node corresponding to a row in the matrix. However, despite the straightforward intuition behind the update procedure (8), considerably notable is its two limitations. The first one is the non-consideration of the attributes of each node itself. This is crucial since the effectiveness of GNN becomes severely limited, as the information coming from the node's neighbors cannot be differentiated from the information from the node itself. This restriction originates form the fact that self-loops are not taken into account in the adjacency matrix A. The problem, however, can be easily resolved by substituting A with A + I, where I * R n×n is the identity matrix that lets the self-loops also be involved. Another issue that may raise concern regarding (8) is the non-normality of features, which can indeed lead to numerical instabilities. However, fortunately, to also prevent this shortcoming, it seems convincing to apply the symmetric normalization as it has turned out to drive powerful dynamics [37]. For doing this, it is sufficient to replace A + I with the normalization term D It is worth pointing out that since our goal is to employ the feed-forward message passing exclusively and not to implement the back-propagation procedure, training the weight matrix W is not specifically part of our requirements. Instead, the only thing that matters to us is finding a final vector representation for each node by which we can sufficiently reflect the similarities among nodes with respect to the information that originates/propagates from attributes. For this reason, we can 572 PROCEEDINGS OF THE FEDCSIS. SOFIA, BULGARIA, 2022 safely omit W from (9) and obtain the following embedding matrix H.
One important insight that could be gained by (10) is that GNN feed-forward message passing is capable of effectively encoding neighborhood information in such a way that after performing the updating procedure for a number of layers, similar nodes in the graph will tend to have analogous final embedding representation [38]. This is primarily due to the fact that each node inherits the information (attributes) from its neighborhood.
Accordingly, any techniques for computing the distance between each pair of final vectors representation could naturally lead to obtaining the similarities between the corresponding vertices with respect to their attributes. In this work, we employ the well-known z-Normalized Euclidean Distance to measure the similarities between nodes. In this fashion, the distance between two vectors is defined as the Euclidean distance between the normal form of the two vectors, where the normalized form associated with each sequence is obtained by transforming the vector so it has a mean distribution µ = 0 and standard deviation σ = 1. More explicitly, given two nodes i and j, with the corresponding vector embeddings H i . It is not difficult to check that for any given i, j * V , we have x 7 ij * [0, 1] [39] and that x 7 preserves the triangle inequality. Subsequently, x 7 forms a metric space, which we denote by Embedding Distance (ED), on graph G. Clearly, the larger the obtained ED between two nodes, the less similarity between them.

C. Advanced Max-Min Modularity Model
As already mentioned, the larger x 7 ij is, the less likely it is that i and j are similar, that is, the less correlated they are, and therefore, the more likely they are to be in distinct communities. This intuition and also the fact that the Modularity Maximization problem can be nicely expressed for weighted graphs [40] persuade us to propose an advanced Max-Min Modularity model by recharacterizing the relation matrix and so the complemented weighted graph G 2 = (V, E 2 ) using ED. Accordingly, we define the relation matrix A 2 = (a 2 i,j ) and G 2 ((a 2 i,j ) represents the weight of the edge between nodes i and j in G 2 ) as follows: Given a relation matrix A 2 = (a 2 i,j ), and noticing that in the induced metric ED, Constraints (3)-(5) guarantee the triangle inequality, for any i, j, k * V , the advanced Max-Min Modularity Maximization problem can be formulated as the following IP: Considerably important is the fact that maximizing IP-MM coincides with simultaneously maximizing the modularity over the original given graph G and minimizing the modularity over the complemented graph G 2 , determined by the proposed message-passing approach. Whereas the first component makes sure to return communities, each consisting of densely connected nodes, the latter attempts to extract the communities containing vertices with the most similar attributes possible.

III. SOLUTION APPROACH
Efficiently solving (IP-MM) consists of two main parts: 1) optimally solving (LP-MM) and 2) accurately rounding the obtained fractional solutions to the integer ones.
" By having the optimal solutionx 7 to (LPs-MM(I)), determine the optimal solution x 7 to (LP-MM) by Equation (16). To accomplish the second task, we can again apply the rounding algorithm proposed in [16], which can be shortly expressed as follows. Point out that the fractional optimal solution to (LP-MM) provides us with a metric space, where the distance between every pair of nodes is at most one. Let us call this metric LP distance. It is apparent that in an integral solution, the distance between each pair of nodes is either zero (implying that these two nodes belong to the same community) or one (inferring that these two nodes belong to separate communities). The rounding task is to push (some of) the nodes so as to determine a final valid configuration of the points in which the distance between every pair of nodes is either zero or one. Obviously, such a valid configuration correlates with a feasible integral solutionx to (IP-MM): x ij = 0, for points i and j which are co-located; andx ij = 1, for those with the distance one from each other. The main idea of the local search-based rounding procedure is to explore such promising configurations (using the LP information) and find a configuration leading to a solution (partitioning) whose max-min modularity value (2) is (locally) optimal.
Assume thatx is the optimal fractional solution to (LP-MM), obtained by the mentioned row/column generation technique. As explained above, to compute an integral solution to (IP-MM), we need to determine a set of final locations at distance one from each other (we refer to these final locations as community centers) and move the nodes to these centers so as to obtain a valid configuration. In fact, to compute an integral solution, we only need to determine a set of distinct centers, since we can then simply move each point to the closest center (with respect to the LP distance) and obtain a corresponding integral solution: for each pair i and j,x ij = 0 if i and j are co-located; and 1 otherwise. Therefore, the only task of the utilized local search algorithm is to determine a good set of final centers.
More precisely, the local search algorithm works as follows: Randomly pick a subset of V as initial centers. As discussed above, move other vertices to these centers to form a solution (partitioning) and then compute the max-min modularity value Facebook348 [38] of the resulting partitioning; see (2). The local search technique tries to improve the max-min modularity value by adding and/or deleting a center to/from the set of centers at a time.
The local search movement that yields the most significant improvement in the max-min modularity value is selected at each iteration. The algorithm terminates when no improving local search move exists.

IV. COMPUTATIONAL RESULTS
This section presents a comprehensive performance evaluation for the proposed method using five well-known realworld networks listed in Table I. Ground truth (i.e., the optimal community structures) is available and known for each of these networks, and therefore, one can facilely measure the quality of a community detection algorithm by estimating the similarities between the communities obtained by the algorithm and the ground truth. For doing this, we use the well-known performance metrics Adjusted Rand Index and Normalized Mutual Information, explained in the following subsection.

A. Performance metrics
Suppose that for a given graph G, C(A) = {C 1 , . . . , C k } and C 2 = {C 2 1 , . . . , C 2 k 2 } be respectively a set of communities obtained by an algorithm A and the ground truth.
Although NMI [44] is a well-known clustering comparison metric, it can perfectly evaluate the similarity between the optimal communities and those discovered by an algorithm. The NMI value corresponding to the algorithm A can be written as In the case where the detected communities are identical to the ground truth, the NMI takes its maximum value one, while in the case where the two sets totally disagree, the NMI score is zero. Generally, the more the NMI, the better community structures have been found. Adjusted rand index (ARI) [45], associated with algorithm A, measures the similarity between C(A) and C 2 as follows where a, b, c and d are respectively the number of vertex pairs that are in the same community in both C(A) and C 2 , in the same community in C(A) but not in C 2 , in the same community in C 2 but not in C(A), and in different communities in both C(A) and C 2 . Like the NMI measure, the value of ARI varies between 0 and 1, and the higher its value is, the more similarity is between the communities obtained by algorithm A and the grand truth of G.

B. Experiments
In what follows, we provide a process for comparing the performance of our proposed algorithm against five powerful rival algorithms: Node2Vec [28], Neural-Brane [19], DeepWalk [27], Line [46], and Text-Associated DeepWalk (TADW) [30]. These are all state-of-the-arts for integrating both network topology and nodal attributes for graph representation learning.
All algorithms are implemented with C++, and CPLEX optimizer 12.9 is used for solving linear programming. Fig. 4 provides a comprehensive comparison by evaluating communities in terms of the average ARI and NMI ranks over all datasets. It is apparent that the proposed method significantly outperforms other algorithms in the sense that the communities it discovered are much more similar to the ground truths than those obtained by the other methods.
Although the results obtained in Fig. 4 perfectly illustrate the proposed method's superiority and reliability against some other state-of-the-art algorithms, it could still be worth investigating the role of the applied GNN feed-forwards message passing procedure in improving the communities. For doing this, we compute the value of NMI associated with each of the networks' obtained final communities in terms of different hidden layer numbers k. In this regard, k = 0 refers to the case when the embedding layer is not applied, and only Modularity is employed to identify communities with respect to the graph's topological structure. Fig. 5 shows the results.  Table I, with respect to different values of k.
It is apparent from the results that, first of all, maximizing the Modularity alone (i.e., only relying on topological aspects and ignoring the nodal attributes) cannot lead to promising results. Furthermore, considerably notable is the approving effect of increasing the number of hidden layers. The more the value of k, the more accurate the embedding vector representation becomes due to the more information diffused through the neighboring vertices. However, interestingly enough, from a certain point on, increasing k does not influence the quality of communities, and this is primarily due to the fact that after a certain number of updates, each of the embedding vectors starts to converge.

V. CONCLUSION
In this work, we built a community discovery method on the basis of the mathematical programming formalism and the graph neural network feed-forward message passing manner. We managed to propose a systematic way to generate an authentic relation matrix for the Max-Min Modularity problem centered on an efficient node embedding technique, which enabled us to model the standard integer formulation for the Max-Min Modularity Maximization problem. A successful row/column generation technique and a local search-based rounding algorithm facilitated us in solving the model accurately and capturing the communities of a given attributed network. Furthermore, the proposed computational experiments showed that our results highly resemble the optimal solutions and that our algorithm outperforms the previous well-known algorithms.