Exploratory equivalence in graphs: Definition and algorithms

Motivated by improving the efficiency of pattern matching on graphs, we define a new kind of equivalence on graph vertices. Since it can be used in various graph algorithms that explore graphs, we call it exploratory equivalence. The equivalence is based on graph automorphisms. Because many similar equivalences exist (some also based on automorphisms), we argue that this one is novel. For each graph, there are many possible exploratory equivalences, but for improving the efficiency of the exploration, some are better than others. To this end, we define a goal function that models the reduction of the search space in such algorithms. We describe two greedy algorithms for the underlying optimization problem. One is based directly on the definition using a straightforward greedy criterion, whereas the second one uses several practical speedups and a different greedy criterion. Finally, we demonstrate the huge impact of exploratory equivalence on a real application, i.e., graph grammar parsing.


I. INTRODUCTION
GRAPHS are a ubiquitous format for structural-data representation and are gaining popularity in various scientific disciplines. They are used to represent diverse types of entities and the relations between them in areas ranging from chemistry [1], [2], economics [3], and politics [4] to popular culture [5]. Such a representation enables a more general and global view of the data. Additionally, researchers may benefit from the powerful theoretical tools developed in graph theory to extract new insights.
One of the most general problems on graphs is the search for patterns, i.e., finding occurrences of small graphs in larger graphs. In theory, this is known as the subgraph isomorphism problem, which has been thoroughly studied as one of the fundamental problems in theoretical computer science. The decision version of this problem is NP-complete, and the counting version is #P-complete. Furthermore, no exponential-time algorithm with a bound better than that of the naive enumeration of patterns is known [6]. This makes the problem intrinsically hard. Despite these pessimistic results, various algorithms exist for finding patterns, the vast majority of them based on the branch-and-bound method (e.g., [7], [8]). In many practical instances, these algorithms perform much better than the worst-case scenario suggests and are able to solve relatively large instances (e.g., patterns of 1000 vertices in graphs of 10,000 vertices, and even larger).
Despite the practical usability of the current algorithms, there is a large set of problem instances that are often very hard for all the search algorithms. These are graphs with a lot of symmetries, i.e., graphs with many automorphisms. Detecting these symmetries before the start of the search can speed up the algorithm by very large constant factors, since the search does not have to be repeated for the symmetrical vertices. The goal of this paper is to formally define an equivalence on graph vertices, called exploratory equivalence, that captures such symmetries in graphs and can be easily utilized in algorithms for finding patterns (e.g., subgraph isomorphism) in graphs. Since there can be many exploratory equivalences in a graph (and some capture more symmetries than others), we also define the corresponding optimization problem.

Our work is based on the ideas already developed by Fürst et al. [9] for the purpose of improving the Rekers-Schürr parser [10] for context-sensitive graph grammars. However, while Fürst et al. recognized the concept of exploratory equivalence (under the name 'interchangeability'), they did not treat it in a general graph-theoretic and group-theoretic manner. Moreover, they did not consider the possibility of having multiple exploratory equivalences for a single graph, nor did they define the notion of an optimal exploratory equivalence. In this paper, we address all of these issues.
Informally, if a group of $k$ vertices in an unlabeled graph belong to the same exploratory equivalence class, then they are interchangeable in the following sense: if each of them were labeled with a unique label, their labels could be arbitrarily interchanged with each other without affecting the graph. The graph would remain isomorphic to the original after any of the $k!$ possible interchanges. It is important to note that a single graph may have multiple exploratory equivalences, i.e., multiple ways of partitioning the graph vertex set into a set of exploratory equivalence classes. Among all possible exploratory equivalences for a given graph, the algorithms proposed in this paper seek the one that captures the largest number of symmetries. As we show later, this is the equivalence with the largest product of the factorials of the cardinalities of its equivalence classes.
Graph grammars [11] are production-based graph rewrite systems and are regarded as a generalization of the well-known string-based formal grammars. The Rekers-Schürr parser is an algorithm that, for a given graph and a context-sensitive graph grammar, determines whether the graph belongs to the language generated by the grammar and returns a derivation of the graph in the grammar if this is the case. However, the algorithm may exhibit heavily exponential behavior when presented with a grammar containing many symmetries. In particular, given a simple grammar for chemical formulas of linear alkanes, the algorithm failed to parse the structural formula of propane within several hours. By exploiting the symmetries in the grammar, the parser's running time becomes polynomial for several meaningful classes of grammars [9]. For instance, the parsing of propane now takes less than a second. In general, however, the worst-case performance remains exponential, since the graph grammar parsing problem is NP-hard even for highly restricted graph grammar formalisms [12].

Symmetry reduction techniques are not unique to graph-related decision and optimization problems. Liberti [13], for instance, proposed a novel approach to symmetry reduction in branch-and-bound-based MIP (mixed integer programming) solvers. His approach was applied to the discretizable molecular distance problem in the field of organic chemistry [14].
The paper is structured as follows. In the next section, we briefly present the definitions and notions used in the rest of the paper. The third section includes the definition of exploratory equivalence, the optimization problem of finding the best exploratory equivalence in a given graph, and an example demonstrating the introduced concepts. We also present the argument that exploratory equivalence does not belong to the class of well-known regular equivalences. The fourth section presents two heuristic algorithms for solving the optimization problem. In Section V, we briefly describe the relevant portion of the Rekers-Schürr parser, its improvement with regard to exploratory equivalence, and some experimental results. Finally, Section VI concludes the paper and gives some ideas for future work.

II. PRELIMINARIES
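Given a (finite) set $S$, a family $\{P_1, P_2, \ldots, P_s\}$ of nonempty subsets of $S$ is a partition of $S$ if every element of $S$ is in exactly one of the subsets, i.e., $P_i \subseteq S$ and $P_i \neq \emptyset$ for $1 \le i \le s$, $\bigcup_{1 \le i \le s} P_i = S$, and $P_i \cap P_j = \emptyset$ for all $1 \le i, j \le s$ with $i \neq j$. When the partition $\{P_1, P_2, \ldots, P_s\}$ is given explicitly, we usually use $\{i \in P_1 \mid i \in P_2 \mid \ldots \mid i \in P_s\}$ as a short form, e.g., $\{\{1, 2\}, \{3\}, \{4\}\}$ is shortened to $\{1, 2 \mid 3 \mid 4\}$. In what follows, the order of the sets in a partition is often important. In such cases, we use the form [...]

A group $\Gamma = (A, \bullet)$ with the underlying set $A$ and the binary operation $\bullet$ on the elements of $A$ is an algebraic structure satisfying the following conditions: closure, i.e., $x \bullet y \in A$ for all $x, y \in A$; associativity, i.e., $(x \bullet y) \bullet z = x \bullet (y \bullet z)$ for all $x, y, z \in A$; identity, i.e., there is an element $e \in A$ with $e \bullet x = x \bullet e = x$ for all $x \in A$; and invertibility, i.e., for each $x \in A$ there is an element $x^{-1} \in A$ with $x \bullet x^{-1} = x^{-1} \bullet x = e$.

A permutation $\sigma$ is a bijective function of a finite set $S$ onto itself, i.e., $\sigma : S \to S$. Let $\Pi[S]$ denote the set of all permutations of the elements of $S$. Notice that the set $\Pi[S]$ together with the operation of function composition forms a group, which is called the symmetric group. Since all the groups discussed in this paper are subgroups of a symmetric group, we denote a group by its underlying set only. Additionally, we also define $\Pi[n] = \Pi[\{1, 2, \ldots, n\}]$.

Let $\Gamma$ be a subgroup of $\Pi[S]$ and let $i \in S$. The set of all permutations in $\Gamma$ for which $i$ is a fixed point is a subgroup and is called the stabilizer subgroup, i.e.,

$$\mathrm{Stab}_\Gamma(i) = \{\sigma \in \Gamma \mid \sigma(i) = i\}.$$

Notice that all stabilizer subgroups include the identity permutation. Now let us generalize the definition of a stabilizer from an element to a set. Given $P \subseteq S$, the stabilizer of $P$ is the set of permutations for which every position in $P$ is a fixed point:

$$\mathrm{Stab}_\Gamma(P) = \{\sigma \in \Gamma \mid \forall i \in P : \sigma(i) = i\}.$$

Equivalently, $\mathrm{Stab}_\Gamma(P)$ can also be defined as the intersection of the stabilizers $\mathrm{Stab}_\Gamma(i)$, where $i \in P$, i.e.,

$$\mathrm{Stab}_\Gamma(P) = \bigcap_{i \in P} \mathrm{Stab}_\Gamma(i).$$

From the latter definition it is clear that $\mathrm{Stab}_\Gamma(P)$ also satisfies all four group conditions. We thus have the following theorem.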
Theorem 1: Given a set $S$, a set $P \subseteq S$, and a subgroup $\Gamma$ of the group $\Pi[S]$, $\mathrm{Stab}_\Gamma(P)$ is a subgroup of $\Gamma$.
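The stabilizer definitions translate almost literally into code. The following is a minimal sketch, assuming that permutations are stored as Python tuples in one-line notation over {1, ..., n}; the function name `stabilizer` and the example group (the full symmetric group on four elements) are illustrative choices and do not come from the paper.

```python
from itertools import permutations


def stabilizer(perms, positions):
    """Return the permutations in `perms` that fix every element of `positions`.

    A permutation p is a tuple in one-line notation: p[i - 1] is the image of i.
    """
    return {p for p in perms if all(p[i - 1] == i for i in positions)}


# Illustrative group (an assumption): all permutations of {1, 2, 3, 4}.
sym4 = set(permutations(range(1, 5)))

stab_1 = stabilizer(sym4, {1})      # permutations fixing 1
stab_2 = stabilizer(sym4, {2})      # permutations fixing 2
stab_12 = stabilizer(sym4, {1, 2})  # permutations fixing both 1 and 2

# The set stabilizer equals the intersection of the element stabilizers.
print(stab_12 == stab_1 & stab_2)
```

The final comparison mirrors the equivalent definition of the set stabilizer as an intersection of element stabilizers.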
The set of all images of $i \in S$ under the permutations of the group $\Gamma$ is called the group orbit of $i$, i.e., the set $\{\sigma(i) \mid \sigma \in \Gamma\}$.

Let $G = (V, E)$ denote a simple undirected graph, where $V = \{1, 2, \ldots, n\}$ is a set of vertices and $E \subseteq V \times V$ is a set of edges. When two graphs are considered, the second is usually denoted by $H = (U, F)$. To denote an edge $(i, j) \in E$, we usually use the shorter form $ij \in E$. The neighborhood of a vertex $i \in V$, i.e., the set of vertices adjacent to $i$, is denoted by $N(i)$. More formally, $N(i) = \{j \in V \mid ij \in E\}$.

A coloration $C$ of a graph $G$ is an assignment of colors to the vertices $V$ of $G$, i.e., a surjective function $C$ from $V$ onto $\{1, 2, \ldots, c\}$ for some $c$, where colors are denoted by the integers from 1 to $c$. If $S \subseteq V$, then the spectrum of $S$, denoted $C(S)$, is the set of all colors assigned to the vertices of $S$. If $S = \{i\}$ is a singleton, then $C(i) = C(S)$ denotes the color assigned to the vertex $i \in V$. Any coloration defines a partition of the vertices $V$, and vice versa: a coloration $C$ induces the partition $\{C^{-1}(1), C^{-1}(2), \ldots, C^{-1}(c)\}$. A coloration $C_1$ is finer than or equal to a coloration $C_2$ if $C_1(i) = C_1(j)$ implies $C_2(i) = C_2(j)$ for all $i, j \in V$. This implies that each set of the $C_1$-induced partition is a subset of (or equal to) some set of the $C_2$-induced partition.
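As a small illustration of orbits and of the "finer than or equal to" relation on colorations, consider the sketch below. It assumes permutations are tuples in one-line notation and colorations are dictionaries from vertices to colors; the helper names `orbit` and `is_finer_or_equal` and the three-vertex example are hypothetical.

```python
def orbit(perms, i):
    """Set of images of i under all permutations in `perms` (one-line notation)."""
    return {p[i - 1] for p in perms}


def is_finer_or_equal(c1, c2):
    """True if vertices sharing a color in c1 always share a color in c2."""
    return all(c2[u] == c2[v] for u in c1 for v in c1 if c1[u] == c1[v])


# Illustrative group on {1, 2, 3}: the identity and the swap of 1 and 2.
group = {(1, 2, 3), (2, 1, 3)}

# Two illustrative colorations: `fine` separates all vertices,
# `coarse` merges vertices 1 and 2.
fine = {1: 1, 2: 2, 3: 3}
coarse = {1: 1, 2: 1, 3: 2}

print(orbit(group, 1), orbit(group, 3))
print(is_finer_or_equal(fine, coarse), is_finer_or_equal(coarse, fine))
```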
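An automorphism of a graph $G = (V, E)$ is a permutation $\sigma \in \Pi[V]$ that preserves adjacency, i.e., $ij \in E$ if and only if $\sigma(i)\sigma(j) \in E$. The set of all automorphisms of $G$ forms a group under composition; it is denoted by $\mathrm{Aut}(G)$ and is called the automorphism group of the graph $G$. Constructing $\mathrm{Aut}(G)$ is at least as difficult as solving the graph isomorphism problem, since graphs $G$ and $H$ are isomorphic if and only if the disconnected graph formed by the disjoint union of $G$ and $H$ has an automorphism that swaps the two components. Several practical algorithms are known for finding $\mathrm{Aut}(G)$; the best known is probably NAUTY [15].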
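For very small graphs, the automorphism group can be enumerated by brute force, which is convenient for experimenting with the notions used in the rest of the paper. The sketch below is not NAUTY and does not scale; it simply tests all $n!$ permutations. The function name `automorphisms` and both example graphs (a three-vertex path and a star with center 1) are assumptions made for illustration.

```python
from itertools import permutations


def automorphisms(n, edges):
    """Brute-force Aut(G) for a simple undirected graph on vertices 1..n.

    Returns permutations as tuples in one-line notation (p[i - 1] is the image of i).
    """
    edge_set = {frozenset(e) for e in edges}
    result = set()
    for p in permutations(range(1, n + 1)):
        # A bijection is an automorphism iff it maps the edge set onto itself.
        image = {frozenset((p[i - 1], p[j - 1])) for i, j in edges}
        if image == edge_set:
            result.add(p)
    return result


# Path 1 - 2 - 3: only the identity and the reflection survive.
path_aut = automorphisms(3, [(1, 2), (2, 3)])

# Star with center 1 and leaves 2..5: every permutation of the leaves.
star_aut = automorphisms(5, [(1, 2), (1, 3), (1, 4), (1, 5)])

print(sorted(path_aut), len(star_aut))
```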

III. PROBLEM DESCRIPTION
As already mentioned in the introduction, our goal is to find equivalent (also called indistinguishable) vertices of a graph. There are many types of equivalences already discussed in the literature. We give several examples later in this section. Our definition of equivalence is associated with the algorithmic exploration of a graph; for example, when the task is to find a pattern graph that is a subgraph in another target graph. In particular, branch-and-bound search algorithms could exploit such equivalences by reducing the number of (partial) matches established between a set of equivalent vertices in the pattern graph and a corresponding set of vertices in the target graph. In the remainder of this section, we formally describe our type of equivalence and the problem of finding the corresponding equivalence classes. Additionally, we also discuss several other similar equivalences and argue that our type is novel.
First, let us define a few additional notions. Let $S$ be a set, and let $P \subseteq S$ be a set of positions. We say that a permutation $\sigma_1 \in \Pi[P]$ is covered by a permutation $\sigma_2 \in \Pi[S]$ if the two permutations have the same images on the positions $P$, i.e., $\sigma_1(i) = \sigma_2(i)$ for all $i \in P$. Observe that $P$ is equal to the domain of $\sigma_1$.

Now let $A \subseteq \Pi[S]$. We say that a set $A$ of permutations covers a set $P$ of positions if every permutation of $P$ is covered by a permutation in $A$. More formally,

$$\mathrm{cover}(A, P) \equiv \forall \sigma \in \Pi[P]\ \exists a \in A : \sigma \text{ is covered by } a.$$
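The covering predicate can be checked directly by comparing the images that the permutations in $A$ produce on the positions $P$ with all $|P|!$ permutations of $P$. The following is a minimal sketch under the assumption that permutations are tuples in one-line notation; the name `covers` and the example set (the automorphisms of a three-vertex path) are illustrative.

```python
from itertools import permutations


def covers(perms, positions):
    """True if every permutation of `positions` is covered by some p in `perms`."""
    pos = sorted(positions)
    # Images of the positions under each permutation in perms.
    seen = {tuple(p[i - 1] for i in pos) for p in perms}
    return all(image in seen for image in permutations(pos))


# Illustrative set: identity and reflection of the path 1 - 2 - 3.
A = {(1, 2, 3), (3, 2, 1)}

print(covers(A, {1, 3}))  # both orders of {1, 3} occur
print(covers(A, {1, 2}))  # no permutation in A swaps 1 and 2
```

The check costs $|P|!$ set lookups, so it is only practical for small candidate sets.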
Given a graph $G = (V, E)$, we say that a partition $\{P_1, P_2, \ldots, P_s\}$ of $V$ is exploratory equivalent if for all $1 \le i \le s$ the following two conditions hold:

$$\mathrm{cover}(A_{i-1}, P_i) \quad \text{and} \quad A_i = \mathrm{Stab}(A_{i-1}, P_i), \tag{1}$$

where $A_0 = \mathrm{Aut}(G)$ and $\mathrm{Stab}(A, P)$ stands for $\mathrm{Stab}_A(P)$. The sets $P_1, P_2, \ldots, P_s$ are the equivalence classes. Notice that the order of the classes is irrelevant for the partition $\{P_1, P_2, \ldots, P_s\}$ itself, but it is important when checking the conditions (1), since not all orders of $P_1, P_2, \ldots, P_s$ satisfy them. In this sense, exploratory equivalence is an algorithmic concept. In particular, an algorithm processing a vertex $u \in P_i$ may ignore all other vertices in $P_i$, since the automorphisms $A_{i-1}$ cover all permutations of $P_i$. However, it is important to observe that the equivalence classes are not independent. For example, when a vertex $u \in P_i$ is processed, this may influence the rest of the algorithm. Therefore, when determining the next class $P_{i+1}$, one must exclude the automorphisms corresponding to the already processed classes $P_1, P_2, \ldots, P_i$, which is the same as restricting to the automorphisms where the positions $P_1 \cup P_2 \cup \cdots \cup P_i$ are fixed points. That is the reason why in each step the automorphism group is restricted from $A_{i-1}$ to $A_i = \mathrm{Stab}(A_{i-1}, P_i)$.

Corollary 1: Given a graph and its partition $\{P_1, P_2, \ldots,$ [...]

Now we are ready to define the problem. The input of the problem is a graph $G = (V, E)$ and its automorphism group $\mathrm{Aut}(G)$, and the goal of the problem is to find an exploratory equivalent partition $\{P_1, P_2, \ldots, P_s\}$ of $V$ that maximizes the product

$$\prod_{i=1}^{s} |P_i|!.$$

The reason for using the product of factorials in the objective function is that each class $P_i$ covers $|P_i|!$ automorphic graphs, and the total number of automorphic graphs covered is thus the product above. In the following sections, we denote the problem by MAXEXPLOREQ.
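The definition and the objective function can be exercised on a tiny example. The sketch below checks the cover and stabilizer conditions for a given order of classes and evaluates the product of factorials. It assumes permutations are tuples in one-line notation; the function names and the example (an unlabeled star with center 1 and leaves 2 to 5, whose automorphisms are exactly the permutations fixing vertex 1) are illustrative and are not the graph of Fig. 1.

```python
from itertools import permutations
from math import factorial, prod


def stabilizer(perms, positions):
    return {p for p in perms if all(p[i - 1] == i for i in positions)}


def covers(perms, positions):
    pos = sorted(positions)
    seen = {tuple(p[i - 1] for i in pos) for p in perms}
    return all(image in seen for image in permutations(pos))


def is_exploratory_equivalent(aut, ordered_classes):
    """Check cover(A_{i-1}, P_i) with A_i = Stab(A_{i-1}, P_i), starting from A_0 = aut.

    `ordered_classes` is assumed to be a partition of the vertex set, listed
    in the order in which the conditions are to be checked.
    """
    current = set(aut)
    for cls in ordered_classes:
        if not covers(current, cls):
            return False
        current = stabilizer(current, cls)
    return True


def objective(classes):
    """Product of the factorials of the class sizes."""
    return prod(factorial(len(c)) for c in classes)


# Illustrative automorphism group: star with center 1 and leaves 2..5.
star_aut = {p for p in permutations(range(1, 6)) if p[0] == 1}

best = [{2, 3, 4, 5}, {1}]
weaker = [{2, 3}, {4, 5}, {1}]
invalid = [{1, 2}, {3, 4, 5}]

for partition in (best, weaker, invalid):
    print(is_exploratory_equivalent(star_aut, partition), objective(partition))
```

In this example both `best` and `weaker` are exploratory equivalent, but they differ in the value of the objective, which is why the optimization problem is meaningful.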
In [16], a large class of the so-called regular equivalences (called colorations therein) is surveyed. A coloration of a graph is regular when the equality of the spectra of two vertices implies the equality of the spectra of the corresponding neighborhoods. More formally, a coloration $C$ of a graph $G$ is regular if and only if for all $i, j \in V$

$$C(i) = C(j) \implies C(N(i)) = C(N(j)).$$

Many different types of colorations are regular, e.g., strong and weak structural coloration, orbit coloration, perfect coloration, and exact coloration. See [16] for details. For example, coloring each orbit of $\mathrm{Aut}(G)$ gives the orbit coloration. However, as it turns out, exploratory equivalence is not regular. To demonstrate this, consider again the graph from Fig. 1 and its exploratory equivalent partition $\{1, 2 \mid 3, 4 \mid 5, 6\}$, where the color of each class is different. It is easy to see that it is not regular, since [...]
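The regularity condition is easy to test mechanically. The following sketch assumes a graph given as an adjacency dictionary and a coloration given as a dictionary from vertices to colors; the name `is_regular` and the three-vertex path used as input are illustrative (the graph of Fig. 1 is not reproduced here).

```python
def is_regular(adj, color):
    """True if C(i) = C(j) implies C(N(i)) = C(N(j)) for all vertices i, j."""
    def spectrum(vertices):
        return {color[v] for v in vertices}

    return all(
        spectrum(adj[i]) == spectrum(adj[j])
        for i in adj
        for j in adj
        if color[i] == color[j]
    )


# Illustrative graph: the path 1 - 2 - 3.
path = {1: {2}, 2: {1, 3}, 3: {2}}

# End vertices share a color: their neighborhoods have the same spectrum.
print(is_regular(path, {1: 1, 2: 2, 3: 1}))

# Vertices 1 and 2 share a color, but N(1) = {2} and N(2) = {1, 3}
# have different spectra.
print(is_regular(path, {1: 1, 2: 1, 3: 2}))
```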

IV. ALGORITHM DESCRIPTION
In this section, we describe two greedy algorithms for the MAXEXPLOREQ problem. The first algorithm is based on restricting the set of automorphisms to the stabilizer of the equivalent vertices found in one iteration. The second algorithm is more time-efficient owing to a faster detection of equivalent sets.

A. Greedy algorithm based on stabilizer restrictions
The first algorithm for the optimization problem MAXEXPLOREQ is based directly on the definition and serves as a reference algorithm that can be further improved. The idea of the algorithm is to start with the initial automorphism group, find one equivalence class of the partition, reduce the set of automorphisms to the stabilizer of that class, and recursively find new equivalence classes until the entire set of vertices is contained in the equivalence.
The input to the algorithm is the set of automorphisms (permutations) $A$ and a set $V' \subseteq V$ of vertices not yet included in any equivalence class; initially, $V'$ is the entire set $V$.
If the set of automorphisms contains only the identity, then each vertex in $V'$ represents a different equivalence class (i.e., no new indistinguishable vertices exist in the graph). If there is more than one automorphism in $A$, then at least two vertices are indistinguishable. At this point, the goal of the algorithm is to find a subset $S \subseteq V'$ that is covered by $A$. Usually, however, there are many possibilities for $S$, and different choices can lead to very different final solutions. The greedy criterion for this choice is the size of $S$, i.e., among many possibilities, the largest set $S$ is chosen. When there are several sets with the same size, the algorithm chooses the one that has the largest stabilizer in $A$. The described algorithm is shown in more detail as Algorithm 1.

Fig. 3. The search space of Algorithm 1 for the graph in Fig. 1
Algorithm 1 Greedy algorithm for MAXEXPLOREQ based on stabilizer restrictions.

    ...
    for all P : P ⊆ V′ ∧ cover(A, P) do
        ...
        bestP ← P
        ...
        bestA ← A′
    ...

To make this algorithm clearer, we show its trace on the simple example graph of Fig. 1. The initial set of all automorphisms $A$ is shown in equation (2). From this set, the algorithm finds the equivalence class $\{1, 2\}$ and reduces $A$ to the set $\mathrm{Stab}(A, \{1, 2\})$. In this automorphism group, it finds the equivalence class $\{3, 4\}$ and reduces the automorphisms to the stabilizer of $\{3, 4\}$. The final equivalence class from this group is $\{5, 6\}$, and the corresponding stabilizer contains only the identity. This yields the final result, namely the partition $\{1, 2 \mid 3, 4 \mid 5, 6\}$. If, at the moment when $A'$ contained only the identity, the current partition did not include all six vertices of the graph, each of the missing vertices would be added as a singleton set to the equivalence.

The entire search space for this example is shown in Fig. 3. Each vertex in this graph represents an automorphism group. The bottom vertex is the set of all automorphisms, and the top vertex is the set containing only the identity. Each edge represents a stabilization with the set that is written as the label of the edge. The bold vertices and edges are the ones that our algorithm follows.

Now we discuss the correctness of the described algorithm.

Theorem 2: Algorithm 1 returns a partition of exploratory equivalent vertices.
Proof: Since the algorithm closely follows the definition, the proof is straightforward. Each class is covered by the current automorphism group, since the loop only iterates over the subsets that are covered. The second condition of the definition is guaranteed by the recursion, since the set of automorphisms used in the recursion is the stabilizer of the equivalence class found in the previous step.
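The greedy procedure described above can be sketched in a few lines. This is a reconstruction from the prose description (largest covered set first, ties broken by the largest stabilizer), not a transcription of Algorithm 1; the function name `greedy1`, the iterative formulation, and the early exit when no covered set with at least two vertices exists are assumptions. Permutations are tuples in one-line notation, and the two inputs (a star with center 1 and a four-vertex path) are illustrative.

```python
from itertools import combinations, permutations


def stabilizer(perms, positions):
    return {p for p in perms if all(p[i - 1] == i for i in positions)}


def covers(perms, positions):
    pos = sorted(positions)
    seen = {tuple(p[i - 1] for i in pos) for p in perms}
    return all(image in seen for image in permutations(pos))


def greedy1(aut, vertices):
    """Greedy sketch: repeatedly pick the largest covered subset of the
    remaining vertices, breaking ties by the size of its stabilizer."""
    current, remaining, classes = set(aut), set(vertices), []
    while len(current) > 1:
        best = None  # (class, stabilizer) of the best candidate so far
        # Try the largest candidate size first; stop at the first size that works.
        for size in range(len(remaining), 1, -1):
            for cand in combinations(sorted(remaining), size):
                if covers(current, cand):
                    stab = stabilizer(current, cand)
                    if best is None or len(stab) > len(best[1]):
                        best = (set(cand), stab)
            if best is not None:
                break
        if best is None:  # assumption: stop if no covered set of size >= 2 exists
            break
        classes.append(best[0])
        remaining -= best[0]
        current = best[1]
    # Vertices not placed in any class become singleton classes.
    classes.extend({v} for v in sorted(remaining))
    return classes


# Star with center 1 and leaves 2..5: automorphisms are the permutations fixing 1.
star_aut = {p for p in permutations(range(1, 6)) if p[0] == 1}
# Path 1 - 2 - 3 - 4: identity and reflection.
path_aut = {(1, 2, 3, 4), (4, 3, 2, 1)}

print(greedy1(star_aut, range(1, 6)))
print(greedy1(path_aut, range(1, 5)))
```

The exhaustive enumeration of subsets makes the sketch exponential in the number of remaining vertices, which matches the practical limitation noted for this algorithm.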
Another question we need to address is the optimality of this algorithm. Unfortunately, the greedy criterion does not guarantee the optimality of the solution. We demonstrate this by two examples shown in Fig. 4. These two examples were found by the exhaustive enumeration of all nonisomorphic connected graphs (starting with the smallest graph), and the graphs of Fig. 4 are the smallest examples where Algorithm 1 does not find an optimal solution. The optimal solution for the left graph in Fig. 4 [...]

Because of the exhaustive search over all subsets of $V'$, the described algorithm is not very practical for larger graphs. In the next subsection, we describe a more efficient algorithm that utilizes an incremental procedure to find the possible equivalence classes.

B. Greedy algorithm based on positional restriction of automorphisms
For a more convenient presentation of our second greedy algorithm, let us define a few auxiliary terms. The positional restriction of an automorphism (permutation) $a \in \Pi[S]$ to a set $R \subseteq S$ (denoted $\rho(a, R)$) is a partial function $a' : S \to S$ with $a'(i) = a(i)$ for all $i \in R$ and $a'(i)$ being undefined for all $i \in S \setminus R$. For example, $\rho((3, 2, 1, 4), \{2, 4\}) = (\uparrow, 2, \uparrow, 4)$. We use the one-line notation for representing automorphisms ($(1, 2, 3, 4) \equiv 1234$) and the symbol $\uparrow$ for indicating the undefined values. Therefore, $a = (\uparrow, 2, \uparrow, 4)$ represents the fact that both $a(1)$ and $a(3)$ are undefined, whereas $a(2) = 2$ and $a(4) = 4$.

For a given set $S$ and a given set of (positionally unrestricted or restricted) automorphisms $A \subseteq \Pi[S]$, a permofix is a pair $(P, F)$ such that the following conditions hold: (1) $P \subseteq S$, (2) $F \subseteq S$, (3) $P \cap F = \emptyset$, and (4) for each permutation $\sigma \in \Pi[P]$ there exists an automorphism $a \in A$ such that $a(i) = \sigma(i)$ for all $i \in P$ and $a(i) = i$ for all $i \in F$. In other words, a pair $(P, F)$ is a permofix if there exists a set of automorphisms $A' \subseteq A$ that covers the set $P$ (i.e., all permutations of $P$) and simultaneously fixes all elements of $F$. Given a permofix $(P, F)$, the sets $P$ and $F$ are called the perm-set and the fix-set, respectively. A $k$-permofix is a permofix $(P, F)$ with $|P| = k$.
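Both auxiliary notions can be written down directly. In the sketch below, the undefined value $\uparrow$ is represented by Python's `None`; this encoding, the names `restrict` and `is_permofix`, and the example group (the automorphisms of a star with center 1 and leaves 2, 3, 4) are assumptions made for illustration.

```python
from itertools import permutations

UNDEF = None  # stands for the undefined value (the up-arrow in the text)


def restrict(a, R):
    """Positional restriction rho(a, R): keep a(i) for i in R, undefine the rest."""
    return tuple(a[i - 1] if i in R else UNDEF for i in range(1, len(a) + 1))


def is_permofix(perms, P, F):
    """True if (P, F) is a permofix in `perms`: P and F are disjoint, and every
    permutation of P is realized by some automorphism that fixes all of F."""
    P, F = set(P), set(F)
    if P & F:
        return False
    pos = sorted(P)
    witnessed = {
        tuple(a[i - 1] for i in pos)
        for a in perms
        if all(a[i - 1] == i for i in F)
    }
    return all(image in witnessed for image in permutations(pos))


# The example from the text.
print(restrict((3, 2, 1, 4), {2, 4}))

# Illustrative group: star with center 1 and leaves 2, 3, 4.
star_aut = {p for p in permutations(range(1, 5)) if p[0] == 1}
print(is_permofix(star_aut, {2, 3}, {1, 4}))
print(is_permofix(star_aut, {1, 2}, set()))
```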
Given the set of automorphisms $A \subseteq \Pi[n]$ of an $n$-vertex graph, the algorithm works as a greedy iterative process. In each iteration, it produces the set of all permofixes in the current set of automorphisms (in the first iteration, this is the unrestricted set $A$) and greedily selects a permofix with the largest potential. After making its selection, the algorithm positionally restricts all automorphisms to the fix-set of the selected permofix. The positionally restricted set of automorphisms serves as the input to the next iteration. The process stops once all automorphisms have become completely undefined functions. The output of the algorithm is a set composed of all perm-sets of the permofixes selected in individual iterations and of the singletons containing the individual vertices that are not present in any of the selected perm-sets. Later, we shall show that the algorithm's output is an exploratory equivalent partition of the vertex set.
The rationale for selecting a permofix with the highest potential is based on the following heuristic. Recall that the algorithm's goal is to find a partition $\{P_1, \ldots, P_s\}$ of $\{1, \ldots, n\}$ with a maximum value of $|P_1|! \cdots |P_s|!$. A permofix $(P, F)$ is guaranteed to contribute at least a factor of $|P|!$ to the target product $|P_1|! \cdots |P_s|!$ (since the perm-set of the selected permofix is part of the algorithm's output), but it can potentially contribute up to $|P|!\,|F|!$. The optimal scenario takes place when the entire fix-set $F$ serves as the perm-set of some permofix selected later in the process. Therefore, a permofix $(P, F)$ having the largest value of $|P|!\,|F|!$ may potentially contribute the largest factor to the target product.
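The selection rule amounts to maximizing $|P|!\,|F|!$ over the available permofixes. A minimal sketch follows; the function name `potential` is illustrative, and the two candidate permofixes are the ones the text later compares for the right graph of Fig. 4.

```python
from math import factorial


def potential(permofix):
    """Upper bound |P|! * |F|! on the factor a permofix (P, F) may contribute."""
    perm_set, fix_set = permofix
    return factorial(len(perm_set)) * factorial(len(fix_set))


candidates = [
    ({2, 4, 6}, set()),          # larger perm-set, empty fix-set
    ({2, 3}, {1, 4, 5, 6, 7}),   # smaller perm-set, large fix-set
]

for pf in candidates:
    print(pf, potential(pf))

selected = max(candidates, key=potential)
print(selected)
```

The example shows why the criterion differs from that of the first algorithm: a small perm-set with a large fix-set can outrank a larger perm-set whose fix-set is empty.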
The pseudocode of the greedy algorithm based on positional restrictions of the automorphism set is shown as Algorithm 2.
To show that the output produced by the algorithm conforms to our problem definition, we shall first prove the following lemma.

Lemma 1: Each element of the set returned by the procedure GREEDY2 is a perm-set of the input set $A$ of automorphisms.
Proof: The singletons are perm-sets by definition, so let us focus on the elements of the set $P$ inside the procedure GREEDY2. In each iteration, the algorithm first applies the procedure FIND2PERMOFIXES to the current set of automorphisms $A$. This procedure returns the set of all pairs $(\{p, q\}, \{r_1, \ldots, r_t\})$ such that there exists an automorphism $a$ with $a(p) = q$, $a(q) = p$, and $a(r_1) = r_1, \ldots, a(r_t) = r_t$. By the definition of the automorphism group, the set $A$ always contains the identity automorphism $a_{\mathrm{id}}$ with the property $a_{\mathrm{id}}(p) = p$, $a_{\mathrm{id}}(q) = q$, and $a_{\mathrm{id}}(r_1) = r_1, \ldots, a_{\mathrm{id}}(r_t) = r_t$. The automorphisms $a$ and $a_{\mathrm{id}}$ jointly form a proof that the pair $(\{p, q\}, \{r_1, \ldots, r_t\})$ is indeed a permofix.
The procedure EXTEND iteratively produces $k$-permofixes based on sets of $(k-1)$-permofixes in the set of automorphisms $A$. For $k = 3$, the procedure creates a pair $PF = (\{p, q, r\}, F_1 \cap F_2 \cap F_3)$ from the permofixes $PF_1 = (\{p, q\}, \{r\} \cup F_1)$, $PF_2 = (\{p, r\}, \{q\} \cup F_2)$, and $PF_3 = (\{q, r\}, \{p\} \cup F_3)$. Neglecting the sets $F_1$, $F_2$, and $F_3$ for the time being, the permofix $PF_1$ represents the permutation $(p\ q)(r)$ in the cycle notation. Likewise, $PF_2$ and $PF_3$ represent the permutations $(p\ r)(q)$ and $(q\ r)(p)$, respectively. Since $(A, \bullet)$ is a group, it is closed under composition, and the transpositions $(p\ q)$, $(p\ r)$, and $(q\ r)$ generate all permutations of $\{p, q, r\}$ (for instance, $(p\ q) \bullet (p\ r)$ is a 3-cycle); in other words, $A$ has to contain an automorphism for each of the $3!$ permutations of the set $\{p, q, r\}$. Therefore, $\{p, q, r\}$ is a perm-set in $A$. The fix-set corresponding to this perm-set is (a superset of) the intersection of the fix-sets of $PF_1$, $PF_2$, and $PF_3$. Consequently, $PF$ is a permofix in $A$. This reasoning can be straightforwardly extended to the general case of $k > 3$.
Therefore, every pair created by the procedure EXTEND is a permofix in the current set of automorphisms.
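The two steps discussed in the proof can be illustrated on a small group. The sketch below follows the prose description of FIND2PERMOFIXES and of the $k = 3$ case of EXTEND; it is not the pseudocode of Algorithm 2, and the function names, the use of frozensets, and the example group (the automorphisms of a star with center 1 and leaves 2, 3, 4) are assumptions.

```python
from itertools import combinations, permutations


def find_2_permofixes(perms):
    """All pairs ({p, q}, fixed points of a) for automorphisms a swapping p and q."""
    result = set()
    for a in perms:
        n = len(a)
        fixed = frozenset(i for i in range(1, n + 1) if a[i - 1] == i)
        for p, q in combinations(range(1, n + 1), 2):
            if a[p - 1] == q and a[q - 1] == p:
                result.add((frozenset((p, q)), fixed))
    return result


def extend_to_3(two_permofixes):
    """Merge three 2-permofixes on the pairs of a 3-set into one 3-permofix."""
    result = set()
    for (P1, F1), (P2, F2), (P3, F3) in combinations(two_permofixes, 3):
        union = P1 | P2 | P3
        if len(union) != 3 or len({P1, P2, P3}) != 3:
            continue
        # Each swap must fix the third element of the candidate perm-set.
        if (union - P1) <= F1 and (union - P2) <= F2 and (union - P3) <= F3:
            # The intersection drops p, q, r automatically, since each of them
            # is missing from at least one of the three fix-sets.
            result.add((union, F1 & F2 & F3))
    return result


# Illustrative group: star with center 1 and leaves 2, 3, 4.
star_aut = {p for p in permutations(range(1, 5)) if p[0] == 1}

pairs = find_2_permofixes(star_aut)
triples = extend_to_3(pairs)
print(sorted((sorted(P), sorted(F)) for P, F in pairs))
print([(sorted(P), sorted(F)) for P, F in triples])
```

For this group the three leaf swaps combine into a single 3-permofix whose fix-set is the center of the star.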
The procedure CLEANUP does not produce anything new; it merely reduces the number of permofixes. For a permofix $(P, F)$, all permofixes $(P', F')$ with $(P', F') \sqsubseteq (P, F)$ are heuristically pronounced redundant. If $P' = P$ and $F' \subseteq F$, the permofix $(P', F')$ is clearly superfluous. If $P' \subset P$, then the permofix $(P, F)$ has been created from $(P', F')$ within the EXTEND procedure.
The positional restriction can only reduce the set of permofixes. It is easy to see that if a pair $(P, F)$ is a permofix in a positionally restricted set of automorphisms, then it is a permofix in the original set, too.
In summary, the set $R$ consists of permofixes of the initial set of automorphisms $A$, and every element of the set returned from the procedure GREEDY2 is a perm-set of $A$.
In the following theorem, we show that the algorithm produces a solution to our problem, i.e., an exploratory equivalent partition of the vertex set.
Theorem 3: The procedure GREEDY2 returns an exploratory equivalent partition of the vertex set $V$.
Proof: Let $\{P_1, \ldots, P_s, \{i_1\}, \ldots, \{i_r\}\}$ be the result of the algorithm GREEDY2, where $P_1, \ldots, P_s$ are the perm-sets produced in individual iterations, and $\{i_1\}, \ldots, \{i_r\}$ are the singletons created from the vertices that do not belong to the set $P_1 \cup \ldots \cup P_s$. By construction, the elements of the output set are mutually disjoint sets that collectively cover the entire vertex set. The output set is thus a partition of the vertex set.
By definition, each of the produced perm-sets $P_1, \ldots, P_s$ is covered by the initial set of automorphisms $A_0 \equiv A$, i.e., we have $\mathrm{cover}(A_0, P_i)$ for all $i \in \{1, \ldots, s\}$. Let us now show that $\mathrm{cover}(\mathrm{Stab}(A_0, P_s), P_{s-1})$ also holds. The perm-set $P_s$ has to be a subset of the fix-set $F_{s-1}$; otherwise, the algorithm would, at some earlier stage, have set $a_1(j) := \uparrow, \ldots, a_{|A|}(j) := \uparrow$ for at least one $j \in P_s$ and hence could not produce $P_s$. By the definition of a permofix, there exists a set of automorphisms that fixes $F_{s-1}$ and simultaneously covers $P_{s-1}$. Since $P_s \subseteq F_{s-1}$, the same set of automorphisms also fixes $P_s$. Consequently, the set of automorphisms where $P_s$ is fixed (i.e., $\mathrm{Stab}(A_0, P_s)$) covers $P_{s-1}$. In the same manner, we can prove $\mathrm{cover}(\mathrm{Stab}(\mathrm{Stab}(A_0, P_s), P_{s-1}), P_{s-2})$, etc. Therefore, the perm-sets $P_s, P_{s-1}, P_{s-2}, \ldots, P_1$, together with the singleton sets formed by the missing elements, constitute an exploratory equivalent partition of the vertex set $V$.
In practice, the algorithm GREEDY2 is more efficient than GREEDY1. For each subset $P$ of the current set of vertices, the first greedy algorithm checks whether $P$ is covered by the current set of automorphisms (in other words, whether $P$ is a perm-set in the current set of automorphisms). By contrast, the algorithm GREEDY2 generates candidate perm-sets (and the associated fix-sets) in an incremental fashion: a perm-set with $k$ elements is generated by merging $k$ perm-sets with $k - 1$ elements. If no $k$-element perm-sets are generated, the algorithm will not attempt to generate any $(k + 1)$-element perm-sets.
Let us illustrate the algorithm GREEDY2 with two examples. Consider the graph of Fig. 5. Given the set of its automorphisms as input (enumerated in Eq. 3), the algorithm produces the following 2-permofixes (after executing the procedure CLEANUP): [...] The procedure EXTEND produces two 3-permofixes: $(\{1, 3, 5\}, \emptyset)$ and $(\{2, 4, 6\}, \emptyset)$. The procedure CLEANUP subsequently removes all permofixes $(P, F)$ with $|P| = |F| = 2$. In the next step, the algorithm selects a permofix with the highest value of $|P|!\,|F|!$. This is either $(\{1, 3, 5\}, \emptyset)$ or $(\{2, 4, 6\}, \emptyset)$. In either case, the fix-set is empty, so the procedure RESTRICT sets all elements of all automorphisms to $\uparrow$. As a result, the algorithm immediately stops, and its result consists of the selected perm-set ($\{1, 3, 5\}$ or $\{2, 4, 6\}$, depending on its selection) and of singletons for the remaining vertices. Among all exploratory equivalent partitions, these two both have the highest product of the factorials of the cardinalities of their constituent sets and hence represent two optimal solutions to the MAXEXPLOREQ problem.
Interestingly, the graphs of Fig. 4 are not counterexamples for the second greedy algorithm, and the graph of Fig. 6 is not a counterexample for the first algorithm. In contrast to the algorithm GREEDY1, the algorithm GREEDY2 considers the combined sizes of individual perm-sets and fix-sets when making greedy selections. In the right graph of Fig. 4, for example, the algorithm GREEDY2 has to choose between the permofix $(\{2, 3\}, \{1, 4, 5, 6, 7\})$ (or an equivalent permofix with potential $2!\,5!$) and the permofix $(\{2, 4, 6\}, \emptyset)$ (or an equivalent permofix with potential $3!$). The first permofix is obviously preferable, leading to an optimal partition. Conversely, since the algorithm GREEDY1 considers perm-sets without the associated fix-sets, it prefers the perm-set $\{1, 2, 3, 4\}$ over all 2-element perm-sets (regardless of the sizes of their associated fix-sets) when dealing with the graph of Fig. 6.

V. EXPLORATORY EQUIVALENCE AND THE IMPROVED REKERS-SCHÜRR PARSER
As we mentioned in the introduction, the concept of exploratory equivalence was developed by Fürst et al. [9] for the purpose of improving the Rekers-Schürr graph grammar parser [10], although the authors did not provide a rigorous graph-theoretic and group-theoretic definition of exploratory equivalence and did not consider the possibility of multiple exploratory equivalent partitions for a single graph. In this section, we show how a proper consideration of exploratory equivalence may lead to immense performance gains when parsing graphs against graph grammars.

The Rekers-Schürr graph grammar parser (both the original and the improved version) accepts a graph and a context-sensitive graph grammar on its input. A context-sensitive graph grammar (called just 'grammar' in the sequel) is a quadruple $(\mathcal{N}, \mathcal{T}, \mathcal{P}, \mathcal{A})$, where $\mathcal{N}$ is a set of nonterminal labels, $\mathcal{T}$ is a set of terminal labels, $\mathcal{P}$ is a set of productions, and $\mathcal{A}$ is a set of axioms. Each production $p$ is a rule of the form [...] Lhs might not be proper graphs, since they may contain dangling edges. A sample production, as well as the graphs and the graph element sets associated with it, is shown in Fig. 7. In contrast to the graph depictions shown so far, the inscriptions inside the vertices represent vertex labels rather than vertex indices. The indices are displayed next to individual vertices. The yellow-colored vertices belong to the graph Common[p] and hence to both the LHS and RHS simultaneously; this is also reflected in the fact that such vertices have the same index on both sides of the production.

A derivation of a graph $G$ in a graph grammar is a sequence of production applications beginning with an axiom graph and ending with the graph $G$. The language of a graph grammar $GG$ is the set of all terminally labeled graphs that have a derivation in $GG$. (A graph is terminally labeled if all of its elements are labeled by labels from the set $\mathcal{T}$.) A parser is an algorithm that, for a given graph $G$ and a given graph grammar $GG$, determines whether $G$ belongs to the language of $GG$ and produces a derivation of $G$ in $GG$ if this is the case.

Figure 8 shows a grammar for generating the structural formulas of linear alkanes. All graph labels belong to the set $\mathcal{T}$, including the 'non-label', a fictitious label for unlabeled edges. Figure 9 displays the derivation of the propane graph in that grammar. The derivation starts with the axiom (the methane graph) and passes through the ethane graph.

The Rekers-Schürr parser works as a two-stage process. In the first stage, the input graph $G$ is analyzed in order to obtain a partially ordered redundant set $S$ of candidate production applications that might take part in a potential derivation of $G$. In the second stage, the parser tries to find, using backtracking if necessary, a sequence of production applications within the set $S$ that constitutes a correct derivation of the graph $G$. The improvement by Fürst et al. pertains only to the first stage of the parsing process.
At the beginning of the first stage, the parser creates a working copy of the input graph G; in the description of the first stage that follows, G denotes this working copy. After that, it iteratively searches the graph G for all r-occurrences of individual productions. For each discovered r-occurrence of a production p, the graph G is augmented by attaching fresh copies of the elements Xlhs[p] to the r-occurrence, giving rise to a production instance, i.e., a homomorphic image of the entire production p that defines a candidate application of p in a potential derivation of the input graph. The augmentation of the graph G might result in new r-occurrences among the added elements. The discover-and-augment cycle finishes once all r-occurrences of all productions have been discovered.
To guarantee the discovery of all r-occurrences, each RHS has to be matched against the graph G in all possible ways. In other words, all RHS-to-G r-homomorphisms have to be established, including different r-homomorphisms between a production and each of its r-occurrences. However, exploratory equivalence can make some (or all) of the r-homomorphisms between a production and its r-occurrence redundant. Let us assume that a production p contains k distinct vertices $v_1, \ldots, v_k$ with the following properties:
• $v_1, \ldots, v_k$ constitute an equivalence class in at least one exploratory equivalent partition of the graph Rhs[p];
• $v_1, \ldots, v_k$ constitute an equivalence class in at least one exploratory equivalent partition of the graph Union[p].
Then it can be shown [9] that the set of r-homomorphisms $h : \mathrm{Rhs}[p] \to G$ established between the production p and the graph G can be safely restricted to those r-homomorphisms h for which $\mathrm{index}(h(v_1)) < \ldots < \mathrm{index}(h(v_k))$, where index(v) is a unique index assigned to a vertex v ∈ G. This rule reduces the number of established r-homomorphisms between the production p and each of its r-occurrences by a factor of k!. Since each discovered r-homomorphism is followed by an augmentation of the graph G, immense performance gains can thus be attained. This rule can be straightforwardly extended to multiple non-singleton classes of an exploratory equivalent partition.
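The effect of the index-ordering rule can be illustrated with a small brute-force sketch. It is not the parser's matching procedure: plain injective, label-free embeddings stand in for r-homomorphisms, host vertices serve as their own indices, and the names `embeddings` and `respects_index_order` are hypothetical helpers introduced only for this illustration.

```python
from itertools import permutations

def embeddings(pattern_vertices, pattern_edges, host_vertices, host_edges):
    """Yield every injective, edge-preserving map h from pattern to host (brute force)."""
    host_edge_set = {frozenset(e) for e in host_edges}
    for image in permutations(host_vertices, len(pattern_vertices)):
        h = dict(zip(pattern_vertices, image))
        if all(frozenset((h[u], h[v])) in host_edge_set for u, v in pattern_edges):
            yield h

def respects_index_order(h, eq_class):
    """True if index(h(v_1)) < ... < index(h(v_k)); host vertices are their own indices."""
    images = [h[v] for v in eq_class]
    return all(a < b for a, b in zip(images, images[1:]))

# Assumption: an unlabeled star with centre 1 and leaves 2..5 (the shape of methane).
star_vertices = [1, 2, 3, 4, 5]
star_edges = [(1, 2), (1, 3), (1, 4), (1, 5)]

all_maps = list(embeddings(star_vertices, star_edges, star_vertices, star_edges))
restricted = [h for h in all_maps if respects_index_order(h, [2, 3, 4, 5])]

print(len(all_maps), len(restricted))
```

In this toy setting the star matches itself in 4! = 24 ways, and exactly one of these maps satisfies the ordering on the four leaves, which is the factor-of-k! reduction with k = 4.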
Consider the grammar of Fig. 8. The optimal exploratory equivalent partition for the axiom graph is {1 | 2, 3, 4, 5}. This implies that we can employ the rule h(2) < h(3) < h(4) < h(5) whenever searching for occurrences of the axiom graph in the graph G. For the RHS of the production $p_1$, the optimal partition is {1 | 3 | 4, 5, 6}. Since the graph Union[$p_1$] also has an exploratory equivalent partition in which the vertices 4, 5, and 6 are part of the same equivalence class, we can enforce the rule h(4) < h(5) < h(6) for every r-homomorphism established between the RHS of the production $p_1$ and the graph G. Because of the interleaved discover-and-augment cycle, the enforcement of these rules may significantly reduce the parsing time.
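The two partitions above can be turned into ordering rules mechanically. The following is a minimal sketch under stated assumptions: the helper names `order_constraints` and `reduction_factor` are hypothetical, and taking the product of the factorials of the class sizes as the combined reduction is our reading of the extension to several non-singleton classes rather than a formula quoted from the text.

```python
from math import factorial, prod

def order_constraints(partition):
    """Return pairs (u, v), each meaning index(h(u)) < index(h(v))."""
    pairs = []
    for eq_class in partition:
        members = sorted(eq_class)
        # A chain u1 < u2 < ... < uk is encoded by its consecutive pairs.
        pairs.extend(zip(members, members[1:]))
    return pairs

def reduction_factor(partition):
    """Assumed combined reduction: product of k! over all classes of size k."""
    return prod(factorial(len(eq_class)) for eq_class in partition)

axiom_partition = [[1], [2, 3, 4, 5]]      # {1 | 2, 3, 4, 5}
rhs_p1_partition = [[1], [3], [4, 5, 6]]   # {1 | 3 | 4, 5, 6}

print(order_constraints(axiom_partition), reduction_factor(axiom_partition))
print(order_constraints(rhs_p1_partition), reduction_factor(rhs_p1_partition))
```

Singleton classes contribute no constraints and a factor of 1, so only the classes {2, 3, 4, 5} and {4, 5, 6} matter here. For an RHS, a class may only be used if it also appears in some exploratory equivalent partition of Union[p], a check this sketch does not perform.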
For the task of parsing the graphs of methane, ethane, and propane against the grammar of Fig. 8, Table I compares the duration of parsing without considering exploratory equivalence (EE) and the duration of parsing when exploratory equivalence is taken into account in the form of imposing constraints on r-homomorphisms between the RHSs and the graph G. The experiments were conducted on a 3.40-GHz Intel Core i7 machine.
The difference between the two versions of the parser is striking. Without using the rules based on exploratory equivalence, the parser quickly succumbs to a combinatorial explosion as the size of the input graph increases; it took more than 11 hours to parse the graph of propane with 3 vertices C and 8 vertices H. By contrast, when exploratory equivalence is taken into account, the parser takes less than one second (0.989 seconds) even when parsing the graph C$_{30}$H$_{62}$ (30 vertices C, 62 vertices H). Asymptotically, for a graph with n vertices C, the original parser creates $\Omega(6^n)$ production instances (possibly many more than that), while the version that makes use of exploratory equivalence generates exactly $12n - 7$ production instances. For many grammars containing symmetries in the sense of exploratory equivalence, the use of exploratory equivalence can reduce the asymptotic parsing time from exponential to polynomial (see [9] for additional examples).
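To give a feeling for the gap between the two growth rates, the short sketch below tabulates both expressions for a few values of n. The function names are hypothetical, and $6^n$ is used only as a representative of the $\Omega(6^n)$ lower bound with the hidden constant taken as 1, so the second column is not a measured count.

```python
def instances_with_ee(n):
    """Production instances with exploratory equivalence: exactly 12n - 7."""
    return 12 * n - 7

def lower_bound_without_ee(n):
    """Representative of the Omega(6^n) lower bound (constant factor assumed to be 1)."""
    return 6 ** n

for n in (1, 2, 3, 30):
    print(n, instances_with_ee(n), lower_bound_without_ee(n))
```

For propane (n = 3) the linear formula gives 29 production instances, whereas the representative lower bound is already 216; for n = 30 the linear formula gives 353.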

VI. CONCLUSION
We introduced a novel type of graph equivalence, called exploratory equivalence because of its applicability to various graph search algorithms. Exploratory equivalence was defined as an automorphism-based equivalence relation on graph vertices. In contrast to our usual perceptions about equivalence, exploratory equivalence may induce several distinct vertex set partitions for a given graph.
In addition to defining exploratory equivalence itself, we have also introduced the concept of an optimal exploratory equivalent partition for a given graph. We presented two greedy algorithms for finding such a partition. Both algorithms produce optimal results for a vast majority of input graphs. For instance, considering all non-isomorphic connected graphs on 8 vertices, the second greedy algorithm produces an optimal partition for 11116 graphs out of 11117, the sole exception being the graph of Fig. 6. Among all non-isomorphic connected 9-vertex graphs, the algorithm produces suboptimal results for only 2 graphs out of 261080.
In subgraph search algorithms, exploratory equivalence can be employed to prevent or at least reduce multiple discoveries of individual occurrences of graph patterns in a given host graph. In the Rekers-Schürr graph grammar parser, this strategy may bring about immense performance gains, since each discovery of a graph in a host graph results in an augmentation of the same host graph.
A possible direction for future work is a generalization of exploratory equivalence. As defined in this paper, exploratory equivalence can be regarded as a global relation between vertices. Informally, a pair of vertices may potentially belong to the same exploratory equivalence class only if the entire graph 'looks the same' from the viewpoint of both vertices. For this reason, exploratory equivalence is a fairly infrequent phenomenon for large random graphs, except for sets of leaf vertices attached to the same internal vertex. A natural generalization of 'global' exploratory equivalence is therefore a 'local' version of this concept, where only a limited neighborhood is inspected when determining the equivalence of a set of vertices. However, the practical implications of such a definition have yet to be discovered.
As shown in Section V, exploratory equivalence can be used to impose constraints on graph homomorphisms when searching for occurrences of a given pattern graph inside a given host graph. The purpose of such constraints is to eliminate multiple discoveries of the same occurrence. However, in some cases, the constraints induced by exploratory equivalence do not suffice to cover all automorphisms of the pattern graph. Consider, for example, the graph of Fig. 5. This graph has 12 automorphisms, but the optimal exploratory equivalent partition ({1, 3, 5 | 2 | 4 | 6}) only covers half of them. Consequently, the rule h(1) < h(3) < h(5) still allows for two different isomorphisms between a pair of 6-cycles. Besides the constraints induced by the optimal exploratory equivalence, we would need another constraint to cover the rotational symmetry of the graph. The relationship between exploratory equivalence (and other types of equivalence) and graph search constraints is thus another promising direction for future work.
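The counts quoted for the 6-cycle can be checked by brute force. The sketch below assumes that the graph of Fig. 5 is the 6-cycle with vertices numbered consecutively along the cycle (so that 1, 3, 5 alternate with 2, 4, 6); the figure itself is not reproduced here, and the helper `is_automorphism` is a hypothetical name.

```python
from itertools import permutations

# Assumption: 6-cycle 1-2-3-4-5-6-1, vertices numbered consecutively.
vertices = [1, 2, 3, 4, 5, 6]
edge_list = [(i, i % 6 + 1) for i in vertices]
edge_set = {frozenset(e) for e in edge_list}

def is_automorphism(h):
    """A vertex bijection is an automorphism if it maps every edge onto an edge."""
    return all(frozenset((h[u], h[v])) in edge_set for u, v in edge_list)

automorphisms = []
for image in permutations(vertices):
    h = dict(zip(vertices, image))
    if is_automorphism(h):
        automorphisms.append(h)

# Keep only the maps that satisfy the rule h(1) < h(3) < h(5).
surviving = [h for h in automorphisms if h[1] < h[3] < h[5]]

print(len(automorphisms), len(surviving))
```

Under this numbering the two surviving maps are the identity and the rotation by one position, which is the residual rotational symmetry that the ordering rule on {1, 3, 5} cannot remove.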

Fig. 1. An example graph with several exploratory equivalences.
Let us demonstrate the introduced concepts with an example. Consider the 6-vertex graph of Fig. 1. Its automorphism group consists of the following eight permutations (written in the one-line notation):
123456, 123465, 124356, 124365, 215634, 215643, 216534, 216543. (2)
There are twelve exploratory equivalent partitions of the graph. They are given in the form of a Hasse diagram (using the refinement relation between two partitions) in Fig. 2. The

Fig. 4. Two graphs on which Algorithm 1 returns a suboptimal solution. The left graph is the smallest counterexample in terms of the number of vertices, and the right one is the smallest counterexample in terms of the number of edges.

Fig. 6. The smallest graph on which Algorithm 2 returns a suboptimal solution.

Fig. 7. A sample production and the associated graphs and graph element sets.

Fig. 8. A grammar for generating the structural formulas of linear alkanes.

Fig. 9. Derivation of the propane graph in the grammar of Fig. 8.
• $v_1, \ldots, v_k$ constitute an equivalence class in at least one exploratory equivalent partition of the graph Rhs[p];
• $v_1, \ldots, v_k$ constitute an equivalence class in at least one exploratory equivalent partition of the graph Union[p].

TABLE I. THE TIME REQUIRED TO PARSE THE INDIVIDUAL GRAPHS OF FIG. 9 AGAINST THE GRAMMAR OF FIG. 8.