A new algorithm for the determinisation of visibly pushdown automata

Visibly pushdown automata are pushdown automata whose pushdown operations are determined by the input symbol, where the input alphabet is partitioned into three parts for push, pop and local pushdown operations. It is well known that nondeterministic visibly pushdown automata can be determinised. In this paper a new algorithm for the determinisation of nondeterministic visibly pushdown automata is presented. The algorithm improves the existing methods and can result in significantly smaller deterministic pushdown automata. This is achieved by computing and constructing only the necessary and accessible states and pushdown symbols during the determinisation.


I. INTRODUCTION
Pushdown automata, which accept context-free formal languages, are one of the fundamental models of computation of the theory of formal languages and automata [8]. Every nondeterministic finite automaton, which accepts a regular language, can be determinised, and the theory of the determinisation of finite automata is simple and well researched: states of an equivalent deterministic finite automaton represent so-called deterministic subsets of states of a given nondeterministic finite automaton [8]. General determinisability does not hold for all types of nondeterministic pushdown automata. The class of deterministic context-free languages is a proper subclass of context-free languages, i.e. for some nondeterministic pushdown automata their equivalent deterministic versions do not exist. Generally, it is not known how to decide for a given nondeterministic pushdown automaton whether there exists a deterministic equivalent or not. There is a lack of results in the theory of the determinisation of nondeterministic pushdown automata, although such results would be usable, e.g. when constructing practical deterministic algorithms from nondeterministic pushdown automata.
Visibly pushdown automata [3] are an important and well-motivated subclass of pushdown automata, where pushdown operations are determined by the input symbol: the input alphabet is partitioned into three parts $A_c$, $A_r$ and $A_l$ for push, pop and local pushdown operations, respectively. This relates to function calls, for example. A function call is represented by a push operation, local operations executed in the context of the called function are represented by local transitions, and, finally, the return from the function is represented by a pop operation. The push, pop and local operations are sometimes referred to as call, return and internal operations. Visibly pushdown automata are widely researched and are known to be used in many practical applications, such as XML processing [1], [2], [5], [6], [7], [11].
This research has been supported by the Czech Science Foundation as project No. 13-03253S.
It is well known that nondeterministic visibly pushdown automata can be determinised [3], [12]. These determinisations use principles that are also used in the well-known determinisation of finite automata: states of the equivalent deterministic automata are represented by so-called deterministic subsets [8]. Alur and Madhusudan [3] presented the proof of the determinisability of a given nondeterministic visibly pushdown automaton with $n$ states by creating a cartesian product consisting of all possible states and then creating deterministic subsets, which resulted in $2^{n^2+n}$ states of the deterministic version of the pushdown automaton. This was improved in [12], where the upper bound for the number of states was lowered from $2^{n^2+n}$ to $2^{n^2}$ and the upper bound for the number of pushdown store symbols was lowered from $|A_c| 2^{n^2+n}$ to $|A_c| 2^{n^2}$.
In this paper a new algorithm for the determinisation of nondeterministic visibly pushdown automata is presented. The algorithm improves the existing methods and can result in significantly smaller deterministic pushdown automata in many practical examples. Only necessary and accessible states and pushdown symbols of the deterministic pushdown automaton are computed and constructed during the determinisation, which is done by analysing which states are used in transitions on the same level of the nesting of pushdown operations and which pushdown store symbols can appear at the top of the pushdown store for each state.
The paper is organized as follows. Section II defines basic notions. Section III contains information on related works. Section IV presents the new incremental algorithm for visibly pushdown automata determinisation. An example of the use of the presented algorithm is given in Section V. Finally, the conclusion of the paper is in Section VI.

II. BASIC NOTIONS
Basic notions are defined as in standard texts, such as [8].

A. Alphabet, string
An alphabet A is a finite nonempty set of symbols.
A string $s$ is a sequence of $n$ symbols $a_1 a_2 a_3 \ldots a_n$ from a given alphabet, where $n$ is the length of the string. A sequence of zero symbols is called the empty string. The empty string is denoted by symbol $\varepsilon$ and its length is 0.

B. Language
$A^*$ denotes the set of all strings over an alphabet $A$ including the empty string. Set $A^+$ is defined as $A^+ = A^* \setminus \{\varepsilon\}$. A language $L$ over an alphabet $A$ is a set $L \subseteq A^*$. Similarly, for a string $x \in A^*$, symbol $x^m$, $m \geq 0$, denotes the $m$-fold concatenation of $x$ with $x^0 = \varepsilon$. Set $x^*$ is defined as $x^* = \{x^m : m \geq 0\}$.

C. Pushdown automata
A nondeterministic pushdown automaton (nondeterministic PDA) is a seven-tuple $M = (Q, A, G, \delta, q_0, Z_0, F)$, where $Q$ is a finite set of states, $A$ is an input alphabet, $G$ is a pushdown store alphabet, $\delta$ is a mapping from $Q \times (A \cup \{\varepsilon\}) \times G^*$ into a set of finite subsets of $Q \times G^*$, $q_0 \in Q$ is an initial state, $Z_0 \in G$ is the initial pushdown store symbol, and $F \subseteq Q$ is the set of final (accepting) states.
Triplet $(q, w, x) \in Q \times A^* \times G^*$ denotes the configuration of a pushdown automaton. The top of the pushdown store $x$ is written on its left-hand side. The initial configuration of a pushdown automaton is a triple $(q_0, w, Z_0)$ for the input string $w \in A^*$.
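The relation $\vdash_M \subseteq (Q \times A^* \times G^*) \times (Q \times A^* \times G^*)$ is a transition of a pushdown automaton $M$. It holds that $(q, aw, \alpha z) \vdash_M (p, w, \beta z)$ if $(p, \beta) \in \delta(q, a, \alpha)$, where $z, \alpha, \beta \in G^*$. The $k$-th power, transitive closure, and transitive and reflexive closure of the relation $\vdash_M$ are denoted $\vdash_M^k$, $\vdash_M^+$ and $\vdash_M^*$, respectively.
A pushdown automaton $M$ is a deterministic pushdown automaton (deterministic PDA), if it holds: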
The intuition behind the partition is: $A_c$ is the finite set of call (push) symbols, $A_r$ is the finite set of return (pop) symbols, and $A_l$ is the finite set of local symbols.
A visibly pushdown automaton (VPA) is a pushdown automaton over such a partitioned input alphabet, where $G$ is a finite pushdown store alphabet, a special symbol $\bot \in G$ represents the bottom of the pushdown store, which can be popped from the pushdown store an unlimited number of times, $\delta = \delta_c \cup \delta_r \cup \delta_i$ is the transition mapping, $\bot \in G$ is the initial pushdown store symbol, and $F \subseteq Q$ is a set of final (accepting) states.
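The way the input symbol alone selects the pushdown operation can be illustrated by a minimal Python sketch that simulates all nondeterministic runs of a VPA. The dictionary encoding, the tuple layouts `(q, c, q2, g)` for push, `(q, r, g, q2)` for pop and `(q, l, q2)` for local transitions, acceptance by final state, and the example automaton are assumptions made for this illustration only; they are not taken from the paper. For convenience the top of the pushdown store is kept at the right end of a tuple here, whereas the text writes it on the left.

```python
BOTTOM = "⊥"  # bottom-of-pushdown-store symbol; popping it leaves the store empty


def accepts(vpa, word):
    """Simulate all nondeterministic runs of `vpa` on `word` (hypothetical encoding)."""
    # A configuration is (state, store); the store is a tuple with its top at the end.
    configs = {(q, ()) for q in vpa["initial"]}
    for a in word:
        step = set()
        for q, store in configs:
            if a in vpa["calls"]:        # call symbol: always push
                step |= {(q2, store + (g,))
                         for (p, c, q2, g) in vpa["push"] if (p, c) == (q, a)}
            elif a in vpa["returns"]:    # return symbol: always pop
                top = store[-1] if store else BOTTOM
                step |= {(q2, store[:-1])
                         for (p, r, g, q2) in vpa["pop"] if (p, r, g) == (q, a, top)}
            elif a in vpa["locals"]:     # local symbol: store untouched
                step |= {(q2, store)
                         for (p, l, q2) in vpa["local"] if (p, l) == (q, a)}
        configs = step
    return any(q in vpa["final"] for q, _ in configs)


# Illustrative automaton (not the one from the paper's figures).
example = {
    "calls": {"c"}, "returns": {"r"}, "locals": {"a"},
    "push": {(0, "c", 0, "X")},
    "pop": {(1, "r", "X", 1)},
    "local": {(0, "a", 0), (0, "a", 1)},   # nondeterministic choice on "a"
    "initial": {0}, "final": {1},
}
```

In this sketch the depth of the pushdown store after reading a prefix depends only on the prefix, not on the nondeterministic choices, which is the property the determinisation relies on.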

III. RELATED WORKS
Visibly pushdown automata were introduced in [3]. Moreover, it was shown there that any nondeterministic visibly pushdown automaton can be transformed into an equivalent deterministic one. The determinisation principle is similar to the determinisation principle of finite automata [8].
In [3], states of the resulting deterministic visibly pushdown automaton consist of two components $(S, R)$. Component $R \in \mathcal{P}(Q)$ is an element of the powerset of the states of the original automaton. Component $S \in \mathcal{P}(Q \times Q)$ is a set of pairs of states of the original nondeterministic pushdown automaton that tracks the beginning states on paths from push transitions to all states listed in the $R$ component. We note that, given that the set of states in the second parts of pairs in the $S$ component is equal to the $R$ component, the $R$ component can be omitted, but to keep the automata hierarchy simple we maintain this $R$ component in the following definition as a connection to finite automata [12], where the states of the determinised automata correspond to the $R$ component.
Let
Pop: For every $r \in A_r$,
• if the pushdown store is empty:
where
The equivalent deterministic automaton has at most $2^{n^2+n}$ states and at most $|A_c| 2^{n^2+n}$ pushdown store symbols. The size of the transition relation can be at most
PROCEEDINGS OF THE FEDCSIS. ŁÓDŹ, 2015
In 2009 an improved upper bound on the number of states was found by Nguyen Van Tang [12]. In that paper two optimizations for Alur-Madhusudan's determinisation of visibly pushdown automata were introduced. First, the set of summaries in the $S$ component of a state pair was minimized for some special cases concerning initial states. Second, the $R$ component of the state pair was removed. By removing the $R$ component of the determinised visibly pushdown automaton, the upper bound for the number of states was lowered from $2^{n^2+n}$ to $2^{n^2}$, and there were at most $|A_c| 2^{n^2}$ pushdown store symbols. The optimization is based on the observation that the information stored in the $R$ component of a state pair is already contained in the $S$ component of the state pair [12]. However, that determinisation algorithm is still not practical. As pointed out by Nguyen Van Tang, in the implementation of his visibly pushdown automata determinisation library, named VPAlib, the determinisation was performed in an exhaustive way. Therefore, the determinisation easily gets stuck even with visibly pushdown automata of small size [12].
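To give a feeling for how quickly the two upper bounds quoted above grow, the following short Python sketch evaluates them for small inputs. The function names are hypothetical; only the formulas $2^{n^2+n}$, $2^{n^2}$ and the factor $|A_c|$ come from the text.

```python
def alur_madhusudan_bounds(n, call_symbols):
    """Upper bounds quoted for [3]: (states, pushdown store symbols)."""
    states = 2 ** (n * n + n)
    return states, call_symbols * states


def tang_bounds(n, call_symbols):
    """Upper bounds quoted for [12], with the R component removed."""
    states = 2 ** (n * n)
    return states, call_symbols * states


# Worked instance: n = 3 states and |A_c| = 2 call symbols.
am_states, am_symbols = alur_madhusudan_bounds(3, 2)
tang_states, tang_symbols = tang_bounds(3, 2)
```

For $n = 3$ and two call symbols the bounds are 4096 states and 8192 pushdown store symbols for [3], against 512 states and 1024 pushdown store symbols for [12]. The ratio between the two state bounds is always $2^n$.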

IV. DETERMINISATION ALGORITHM
This section presents our new algorithm for the determinisation of nondeterministic visibly pushdown automata. The algorithm improves the original determinisation algorithm [3], [12]. As stated in the Introduction, our algorithm computes and constructs only necessary and accessible states and pushdown symbols of the deterministic pushdown automaton. The basic idea of this improvement is analysing and tracking pushdown symbols that can appear on the top of the pushdown store for a particular state. With this information the explored pop transitions for the state can be reduced to only those that correspond to the possible pushdown store top symbols. Also, it is shown below that this information on the possible pushdown store top symbols for a state has certain interesting properties, which can be exploited for an effective way of calculating and storing this information for all states in the automaton.
We will use $T_q$ to denote pushdown store top symbols. The pushdown store top symbols of state $q$, $T_q \subseteq G'$, are the set of all pushdown store symbols that could appear at the top of the pushdown store for a state $q \in Q'$.
We will also use symbol $\lambda$ for a local connection: state $q'$ is locally connected to $q''$ if there is a sequence of transitions from state $q''$ to state $q'$, the pushdown store depth in both states is the same, and the pushdown store depth for all other states along the sequence of transitions is greater than the pushdown store depth in $q''$. This relation between two states is not symmetric but is transitive. Figure 1 shows an example of various local connections. Note that, for example, the path $1 \to 7 \to 8$ is not a local connection. The notation $a|\alpha \to \beta$ denotes a transition that reads symbol $a \in A$ and replaces $\alpha \in {G'}^*$ with $\beta \in {G'}^*$ on the top of the pushdown store. This notation will be used in figures throughout the paper.
It can be easily seen that the pushdown store top symbols are shared between locally connected states.
With a local connection from $q'$ to $q$ and a local connection from $q$ to $q''$, state $q'$ is locally connected to $q''$ by transitive closure. If $T_q$ is the set of pushdown store top symbols of state $q$, then $T_{q'} \subseteq T_q$ and $T_q \subseteq T_{q''}$, and therefore also $T_{q'} \subseteq T_{q''}$. The local closure of state $q$ is $\lambda^*(q)$. See Figure 2 for an example of a local closure. See Figure 1 and note that the path $1 \to 7 \to 8$ mentioned before is in fact a local closure.
Closing all states under $\lambda^*(q)$ connects all $T_q$. See Figure 3.
We define these notions formally:
Definition 1: A local connection $\lambda(q)$ of state $q$. Given a deterministic visibly pushdown automaton
Definition 2: A local connection closure $\lambda^*(q)$ of state $q$. The local connection closure $\lambda^*$ is defined by these equalities:
Definition 3: A set of pushdown store top symbols $T_q$ of state $q$. Given a deterministic visibly pushdown automaton
Due to convenient properties of $\lambda^*(q)$, $T_q$ can be stored in a space-optimal way. Given $q, q' \in Q'$, for every $q' \in \lambda^*(q)$ it holds that $T_{q'} \subseteq T_q$, i.e. parts of $T_q$ can be shared between locally connected and locally closed states. See Figure 4 for an example of this process.
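The sharing of top symbols along the local closure can be sketched in Python as follows. This is an illustration under stated assumptions, not the definitions of the paper: `lam` is a hypothetical dictionary mapping a state $q$ to the states directly contained in $\lambda(q)$, the closure is assumed to be reflexive, and `base` is a hypothetical dictionary holding only the symbols each state received directly (for example from push transitions). $T_q$ is then assembled on demand instead of being stored separately per state.

```python
def local_closure(lam, q):
    """Return λ*(q): q itself plus every state reachable by following `lam`."""
    seen, todo = {q}, [q]
    while todo:
        for p in lam.get(todo.pop(), ()):
            if p not in seen:
                seen.add(p)
                todo.append(p)
    return seen


def top_symbols(lam, base, q):
    """Assemble T_q as the union of the base sets of all states in λ*(q)."""
    return set().union(*(base.get(p, set()) for p in local_closure(lam, q)))


# Illustrative data: 1 ∈ λ(2) and 2 ∈ λ(3), so λ*(3) = {1, 2, 3}.
lam = {3: {2}, 2: {1}}
base = {1: {"⊥"}, 2: {"X"}}
```

With this data `top_symbols(lam, base, 1)` is `{"⊥"}` and `top_symbols(lam, base, 3)` is `{"⊥", "X"}`, so $T_1 \subseteq T_2 \subseteq T_3$ holds while each symbol is stored only once.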
Further, we show by induction on the length of an input sentence that pushdown store top symbols are given by $\lambda^*(q)$ and by the states that are the source and target of the appropriate push and pop transitions.
Notice that $\gamma$ does not change between states $q_1$ and $r_1$. Pushdown store top symbol $(q_2, s)$ was pushed in state $q_2$ and popped in state $q_3$, so $\gamma$ does not change between states $q_2$ and $r_3$ either. Pushdown store top symbols are given by $\lambda^*(q)$ and by the states that are the source and target of the appropriate push and pop transitions for the first $i + 1$ symbols of the input word. The induction holds for $i + 1$.
More informally: The deterministic automaton is constructed from the initial state. The deterministic subset of the initial state is created from all initial states of the original automaton. The initial pushdown store symbol $\bot \in T_{q'_0}$ forms the set of pushdown store top symbols of the initial state.
Fig. 1. Various $\lambda$ relations of state 8 to state 1.
In every iteration, all local and push transitions are explored for the known states. The base set of possible pushdown store top symbols $T_q$ for a given state $q \in Q'$ is given by push transitions. Then, we track pushdown store top symbols for each state.
For any two states $q, q'$ such that a local transition exists from state $q'$ to state $q$, everything from $T_{q'}$ is in $T_q$, i.e. $T_{q'} \subseteq T_q$.
For any two states $q, q'$ such that a pop transition popping symbol $(q', c)$ exists from a state $q''$ to state $q$, everything from $T_{q'}$ is in $T_q$. Given $q, q', q'' \in Q'$, $l \in A_l$, $c \in A_c$, $r \in A_r$, then for $T$ the following properties hold:
$\forall (q, \varepsilon) \in \delta(q'', r, (q', c)) \Rightarrow T_{q'} \subseteq T_q$, (10)
The algorithm of the determinisation is formally described by Algorithm 1. Given $q, q'' \in Q'$, its main part can be written in an abstract way as follows:
1) create the initial state,
2) while $\nexists (q', \varepsilon) \in \delta(q, r, \gamma)$, where $r \in A_r$ and $\gamma \in T_q$, do create pop transition $(q, r, \gamma)$,
while $\nexists (q', \varepsilon) \in \delta(q, l, \varepsilon)$, where $l \in A_l$, do create local transition $(q, l, \varepsilon)$,
while $\nexists (q', \gamma) \in \delta(q, c, \varepsilon)$, where $c \in A_c$, do create push transition $(q, c, \varepsilon)$,
3) set final states.
Only a pair of states $(q', q)$, where $q' \in \lambda^*(q)$, can appear as an element in the $S$ component. As described in Section III, new pairs of the $S$ component are only created from push and local transitions. The push transitions only yield elements of the $S$ component based on identity. On the other hand, the local transitions yield exactly the pairs that conform to the local closure, because they connect appropriate targets of push operations and sources of pop operations.
Let $L$ be the set of all distinct pairs of locally closed states of the nondeterministic automaton. Then, $2^{|L|}$ is the maximum number of states of the deterministic automaton. The size $|L|$ is at most $n^2$ when all states in the nondeterministic automaton are locally closed.
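The abstract steps listed in this section can be sketched in Python as a fixed-point iteration. The sketch is written in the spirit of the description and is not the authors' Algorithm 1. It assumes that states are $S$ components only (sets of state pairs, as in [12]), that pushdown store symbols are pairs (state, call symbol), that the transition function may stay partial where no nondeterministic run survives, and that the VPA is given in a hypothetical dictionary encoding with push tuples `(q, c, q2, g)`, pop tuples `(q, r, g, q2)` and local tuples `(q, l, q2)`. The example automaton is illustrative and is not the automaton $a_1$ of Section V.

```python
BOTTOM = "⊥"  # bottom-of-pushdown-store symbol


def pop_target(vpa, s, r, top):
    """Target S component of a pop on `r` from state `s` with `top` on the store."""
    if top == BOTTOM:
        return frozenset((q, q2) for (q, q1) in s
                         for (p, a, g, q2) in vpa["pop"] if (p, a, g) == (q1, r, BOTTOM))
    s_call, c = top  # state and call symbol recorded by the matching push
    update = {(q, q3)
              for (q, a, q1, g) in vpa["push"] if a == c
              for (x, q2) in s if x == q1
              for (p, b, g2, q3) in vpa["pop"] if (p, b, g2) == (q2, r, g)}
    return frozenset((p, q3) for (p, q) in s_call for (x, q3) in update if x == q)


def determinise(vpa):
    """Construct only accessible states, exploring pops only for tracked tops T_q."""
    q0 = frozenset((q, q) for q in vpa["initial"])
    tops = {q0: {BOTTOM}}   # T_q for every state discovered so far
    delta = {}              # (state, input symbol, popped symbol or None) -> target

    def link(key, target, inherited):
        if not target:      # empty subset: leave the transition undefined
            return False
        delta[key] = target
        is_new = target not in tops
        known = tops.setdefault(target, set())
        size = len(known)
        known |= inherited  # share top symbols with the target state
        return is_new or len(known) > size

    changed = True
    while changed:          # repeat until no state and no top symbol is added
        changed = False
        for s in list(tops):
            for l in vpa["locals"]:
                t = frozenset((q, q2) for (q, q1) in s
                              for (p, a, q2) in vpa["local"] if (p, a) == (q1, l))
                changed |= link((s, l, None), t, set(tops[s]))
            for c in vpa["calls"]:
                t = frozenset((q2, q2) for (_, q1) in s
                              for (p, a, q2, g) in vpa["push"] if (p, a) == (q1, c))
                changed |= link((s, c, None), t, {(s, c)})
            for r in vpa["returns"]:
                for top in list(tops[s]):
                    below = {BOTTOM} if top == BOTTOM else set(tops[top[0]])
                    changed |= link((s, r, top), pop_target(vpa, s, r, top), below)
    accepting = {s for s in tops if any(q2 in vpa["final"] for (_, q2) in s)}
    return q0, tops, delta, accepting


def run(det, vpa, word):
    """Run the deterministic automaton returned by `determinise` on `word`."""
    q0, _, delta, accepting = det
    state, store = q0, []
    for a in word:
        top = (store[-1] if store else BOTTOM) if a in vpa["returns"] else None
        target = delta.get((state, a, top))
        if target is None:
            return False
        if a in vpa["calls"]:
            store.append((state, a))
        elif a in vpa["returns"] and store:
            store.pop()
        state = target
    return state in accepting


example = {
    "calls": {"c"}, "returns": {"r"}, "locals": {"a"},
    "push": {(0, "c", 0, "X")}, "pop": {(1, "r", "X", 1)},
    "local": {(0, "a", 0), (0, "a", 1)},
    "initial": {0}, "final": {1},
}
det = determinise(example)
```

The sets in `tops` play the role of $T_q$: a local transition copies them to its target, a push contributes the pushed symbol, and a pop of $(q', c)$ copies $T_{q'}$, which mirrors property (10). Pop transitions are generated only for symbols in `tops[s]`, so no state or pushdown store symbol outside the accessible part is ever created.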

V. EXAMPLE
In this section the determinisation of a simple nondeterministic visibly pushdown automaton is demonstrated. For the sake of clarity, the component $R$ of states and pushdown store symbols is omitted, because the information it holds is already contained in the component $S$ (it is the second value of each pair), as described in [12].
Example 5.1: The nondeterministic visibly pushdown automaton $a_1$ is shown in Figure 6. An equivalent deterministic visibly pushdown automaton $d_1$ can be constructed as follows. The initial state is constructed as the set of all identity pairs of initial states of automaton $a_1$, so $q'_0 = \{(0, 0)\}$. The push and the local transitions can be easily deduced from the determinisation rules from Section IV. All transitions are shown in Figure 5.
In each push transition, the pushdown store top symbol is tracked for the target state of the push transition. When a local transition occurs, all pushdown store top symbols are shared from the source state of the local transition to the target state of the local transition and to all other locally connected states. The tracking of all locally connected states could be achieved by creating virtual transitions serving as a transitive closure.
Then, the pop transitions are created based on known input symbols (from the nondeterministic automaton) and tracked pushdown store top symbols.The transitions are created according to the determinisation rules from Section IV.
For an illustration of the determinisation see Figure 5. The pushdown store top symbols are tracked as follows. Push, local and pop transitions are marked green, gray and red, respectively. Arrows describe movements of tracked tops of the pushdown store. Dashed arrows represent the source of top pushdown store symbols that are shared with the target state when a local transition occurs.
The resultant deterministic PDA d 1 is shown in Figure 7.
The resultant deterministic PDA can also be reproduced by running Algorithm 1.
Given the nondeterministic pushdown automaton from the example above, the VPAlib algorithm [9] constructs a deterministic pushdown automaton with 45 states and 1206 transitions.We note that 45 (states) is not a power of 2, which is caused by the fact that the implementation of VPAlib library does not consider states in which components R or S are empty sets (in this way, it performs another optimization of the determinisation algorithm [12]).For comparison, our algorithm constructs an equivalent deterministic pushdown automaton with only 3 states and 8 transitions.This is a significant improvement over the previously existing determinisation algorithms.

VI. CONCLUSION
A new incremental algorithm for the determinisation of nondeterministic visibly pushdown automata has been described. The algorithm creates only necessary states and pushdown symbols by analysing and tracking which states are achievable by computing transitions on the same levels of pushdown operations nesting. Possible tops of the pushdown store are stored for each state when a pop transition is in progress and then they are shared through local transitions with states on the same levels of the nesting. The behavior of the algorithm