Denotational Model and Implementation of Scalable Virtual Machine in CPDev

A denotational semantic model and its implementation in C/C++ are presented for a virtual machine executing programs written in the CPDev development environment according to the IEC 61131 standard. Programs written in the IEC ST language are compiled to a control-oriented intermediate language designed specifically for the machine. The architecture of the machine and its operation are represented by a formal semantic model which assigns abstract algebraic objects to denote machine behaviour. Execution of intermediate language instructions is described in detail by denotational semantic equations, which the C/C++ implementations follow strictly to assure reliability of the machine.


I. INTRODUCTION
THE concept of virtual machines as platforms for software execution has had a significant impact on computer science for almost half a century [1], [2]. A virtual machine (VM) is understood as a kind of processor with a certain instruction set and data types, which is implemented by software on particular hardware platforms. A VM processes intermediate code generated by a compiler from a source program. The concept of VMs has been gaining importance due to the widespread use of Java [3] and .NET [4], [5]. Solutions based on VMs have some important advantages, namely a) the source program and intermediate code are independent of target platforms, b) one compiler is sufficient, c) programs are executed in safe environments. The disadvantages include slower execution of the intermediate code and the need to develop a runtime environment suitable for the target platform.
This paper deals with the development of a runtime environment for control programs written according to the standard IEC 61131 [6]. The IEC standard defines the programming languages Structured Text (ST), Instruction List (IL), Ladder Diagram (LD), Function Block Diagram (FBD) and Sequential Function Chart (SFC). Here, employing the VM concept appears to be particularly justified in order to cope with the large variety of target platforms. The CPDev engineering environment [7] uses this concept to program controllers according to the IEC standard. It consists of a compiler translating ST to intermediate code and a VM-based runtime system written in C. Initially, small and medium-scale controllers were considered [8]. Recently, however, applications with extensive calculations have created the need to extend the CPDev compiler and its VM.
Therefore, some additional assumptions were imposed, namely:
• to develop a semantic model of the machine and its intermediate language, followed by a C implementation,
• to achieve scalability of the machine depending on the particular hardware and application requirements.
The model formalizes the VM description as an interpreter of the intermediate code, including instruction and operand decoding, and the low-level operations performed while executing the instructions. Denotational semantics [9], [10], appropriate for the formal description of programming languages, is applied [11], [12]. The λ-notation is adequate for the denotations and is therefore used [9], [13].

II. VIRTUAL MACHINE ARCHITECTURE
The architecture of the VM includes [14]: code and data memories, stacks and registers. The instruction processing module fetches successive instructions from Code memory and executes them acquiring values of operands either from Data or Code memory. Results are stored in Data memory.
Registers: The program counter is kept in the CodeReg register. The data base register DataReg is set by calls to and returns from subprograms, including function blocks and functions. When entering a subprogram, the current values of CodeReg and DataReg are pushed onto the Code stack and the Data stack, respectively. The machine also includes the Flags register with status flags signaling errors or unusual situations.
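The register and stack handling described above can be sketched in C as follows. This is only an illustration under stated assumptions: the 16-bit address variant is chosen, and the names VmRegs, vm_enter, vm_leave, STACK_DEPTH and FLAG_FAULT are hypothetical and are not taken from the CPDev sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_DEPTH 16      /* hypothetical depth, not given in the paper */
#define FLAG_FAULT  0x01u   /* hypothetical bit in the Flags register */

typedef uint16_t ADDRESS;   /* assumes the 16-bit address variant */

typedef struct {
    ADDRESS codeReg;                 /* program counter (CodeReg) */
    ADDRESS dataReg;                 /* data base register (DataReg) */
    uint8_t flags;                   /* status flags (Flags) */
    ADDRESS codeStack[STACK_DEPTH];  /* saved CodeReg values */
    ADDRESS dataStack[STACK_DEPTH];  /* saved DataReg values */
    int     codeTop, dataTop;        /* number of entries on each stack */
} VmRegs;

/* Enter a subprogram: save both registers, then load the new values. */
static void vm_enter(VmRegs *vm, ADDRESS entry, ADDRESS base)
{
    assert(vm != NULL);
    if (vm->codeTop >= STACK_DEPTH || vm->dataTop >= STACK_DEPTH) {
        vm->flags |= FLAG_FAULT;     /* stack overflow is signaled, not executed */
        return;
    }
    vm->codeStack[vm->codeTop++] = vm->codeReg;
    vm->dataStack[vm->dataTop++] = vm->dataReg;
    vm->codeReg = entry;
    vm->dataReg = base;
}

/* Leave a subprogram: restore both registers from the stacks. */
static void vm_leave(VmRegs *vm)
{
    assert(vm != NULL);
    if (vm->codeTop == 0 || vm->dataTop == 0) {
        vm->flags |= FLAG_FAULT;     /* return without a matching call */
        return;
    }
    vm->codeReg = vm->codeStack[--vm->codeTop];
    vm->dataReg = vm->dataStack[--vm->dataTop];
}
```

Calling vm_enter followed by vm_leave restores the original CodeReg and DataReg, which is the property a subprogram return relies on.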
VMASM intermediate language: The virtual machine operates as an interpreter of assembly code called VMASM (VM Assembler). The syntax is:

III. SCALABILITY
The data types and instructions of the VMASM language are defined in XML-formatted library configuration files (LCF).
Types and instructions: A portion of the type definitions is shown in Listing 2. By applying deny-type one can restrict some data types. Aliases to existing types and special types not specified in the IEC standard can be defined, too.
Listing 2. Type definition
    <deny-type name="LREAL" />
    <type name="USINT" implement="alias">
        <alias name="BYTE"/>
    </type>
    ...

Functions: The definition of one function in the group of ADD functions is presented in Listing 3. The virtual machine code vmcode consists of two bytes: the first one, 01, identifies the group, whereas the second one, *2, indicates a flexible number of inputs (*) and identifies the data type processed (2). The two components of vmcode are called group and type identifier, and are denoted by ig and it. By choosing an appropriate it, type-specific functions such as ADD:SINT, ADD:INT, etc. are defined.

IV. SEMANTIC MODEL
Semantic models provide formal descriptions of programming languages [11], [15]. In the case of VMASM, the model consists of domains describing the virtual machine's states, memory functions, value interpreters relating memory to VMASM types, limited range operators, and a universal semantic function.
Semantic domains: The domain BasicTypes consists of four sets reflecting the memory sizes of the VMASM types. The domain Address specifies a 16- or 32-bit implementation. The general domain Memory is a function mapping Address to Byte1. Stack models a sequence (*) of Address domains (Kleene closure). The two Addresses represent source and target, respectively, with Byte1 denoting the number of bytes being moved.

Value interpreters: The following sample functions provide numerical interpretations of memory chunks.
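To illustrate what a value interpreter does, the following C sketch maps memory chunks to numerical values. It is an assumption-laden example rather than the CPDev code: the names bool_of and int_of are hypothetical, any non-zero byte is taken as TRUE, and the two-byte INT is assumed to be stored little-endian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret one byte as a Boolean; any non-zero byte counts as TRUE (assumption). */
static bool bool_of(uint8_t b)
{
    return b != 0;
}

/* Interpret two bytes as a signed 16-bit INT, little-endian (assumption). */
static int16_t int_of(const uint8_t *p)
{
    assert(p != NULL);
    uint16_t raw = (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
    /* Map the raw pattern to the signed range without relying on
       implementation-defined narrowing conversions. */
    return raw >= 0x8000u ? (int16_t)((int32_t)raw - 0x10000)
                          : (int16_t)raw;
}
```

With these definitions the byte pair FE FF is read as the INT value -2, while the same bytes read one at a time through bool_of are both TRUE; the interpreter, not the memory, decides the type.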
Limited range operators: The virtual machine executes arithmetic operations in limited ranges, dependent on the particular types. For signed integers, for instance, a limited-range addition ⊕ is defined.
Unification: The operator := used in expressions unifies both sides. If the right side is an expression, then the left side is a variable that receives the value of the right side (assignment). If the left side is a tuple and the right side a variable, then the variable is split into the tuple's components.
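A limited-range addition for a signed type can be sketched in C as shown below. The sketch assumes wrap-around (modular) behaviour for the 16-bit INT type; the text above does not state whether ⊕ wraps or saturates, so this is one possible reading only, and the name add_int is hypothetical.

```c
#include <stdint.h>

/* Limited-range addition for 16-bit signed integers.
   Assumption: results outside [-32768, 32767] wrap around modulo 2^16. */
static int16_t add_int(int16_t a, int16_t b)
{
    /* Add in unsigned arithmetic, where overflow is well defined in C. */
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);
    /* Map the 16-bit pattern back to the signed range. */
    return sum >= 0x8000u ? (int16_t)((int32_t)sum - 0x10000)
                          : (int16_t)sum;
}
```

Performing the sum on unsigned operands avoids the undefined behaviour of signed overflow in C, so the operator's range limit is explicit in the code instead of depending on the compiler.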
Universal semantic function: To jointly express the concept of decoding group and type, followed by execution of a particular instruction, one may define a universal function U covering all instructions. Internally, after decoding ig and it, this function calls a specific function C for the instruction concerned.
Instruction decoding: Instruction decoding can formally be expressed by the denotational semantic equation shown in Listing 5. According to [9] or [13], the λ-expression has the form λs.body, where s denotes the current state and body determines the value returned by the function. The body consists of a sequence of operations, the first of which splits the current state s into a tuple composed of the model components. The other operations decode the values of the identifiers ig and it, update the code register to cr2 and, by means of match ... with statements, call particular C functions. The result provided by C defines the new state s1 returned by the function U.
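The decoding step performed by the universal function can be sketched in C as a fetch of two bytes followed by a dispatch. The sketch is illustrative only: the State layout, the stub handlers c_add and c_sysproc, the trace fields lastIg and lastIt, and the FAULT bit value are assumptions; only the group identifiers 01 (ADD) and 1C (system procedures) and the order of group byte and type byte come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t ADDRESS;   /* assumes the 16-bit address variant */
#define FAULT 0x01u         /* hypothetical flag bit */

typedef struct {
    const uint8_t *cm;      /* code memory */
    size_t         cmSize;  /* size of code memory in bytes */
    ADDRESS        cr;      /* code register */
    uint8_t        flags;   /* status flags */
    uint8_t        lastIg;  /* trace fields, for demonstration only */
    uint8_t        lastIt;
} State;

/* Stub instruction functions; real ones would decode operands and update memory. */
static void c_add(State *s, uint8_t it)     { s->lastIg = 0x01; s->lastIt = it; }
static void c_sysproc(State *s, uint8_t it) { s->lastIg = 0x1C; s->lastIt = it; }

/* U: decode ig and it, advance cr to cr2, then select the instruction function. */
static void U(State *s)
{
    assert(s != NULL && s->cm != NULL);
    if ((size_t)s->cr + 2 > s->cmSize) {   /* no complete vmcode left */
        s->flags |= FAULT;
        return;
    }
    uint8_t ig = s->cm[s->cr];             /* group identifier */
    uint8_t it = s->cm[s->cr + 1];         /* type identifier */
    s->cr = (ADDRESS)(s->cr + 2);          /* cr2: first operand */

    switch (ig) {                          /* plays the role of match ... with */
    case 0x01: c_add(s, it);     break;
    case 0x1C: c_sysproc(s, it); break;
    default:   s->flags |= FAULT; break;   /* unknown group */
    }
}
```

Each call of U consumes one two-byte vmcode and leaves the code register at the first operand, so the selected function can continue decoding from there.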
Assume that, when the universal function U calls a particular function C, the code register cr points to the first operand (actually cr2 in Listing 5). If the operand is a variable or label, then its value, i.e. address, is acquired from code memory cm by
    operand := GetAddress(cr, cm)    (4a)
In case of a global variable or label, operand stands for a direct address in data or code memory. If, however, the operand is a local variable of a subprogram, then operand is an address relative to the current value of the data base register dr, which was set earlier by a subprogram call. Therefore, the address of a local variable is obtained by adding
    operandaddr := dr ⊕ operand    (4b)
The value of a variable in data memory dm, here shown for a Boolean, is read out and interpreted by the composition (5) of a memory function and a value interpreter. If an instruction has another operand, the code register cr is incremented to point to the next memory location. Finally, the new state s1 is defined as a tuple of the updated components.
The equation of JNZ has two operands, the conditional variable cnd in data memory and the code label clbl as before. The address cndaddr is determined according to (4a) and (4b) (with the content dr of the data base register equal to zero in case of a global variable). The code register is incremented to cr1 to obtain the address clbl and then to cr2, pointing to the next instruction. The Boolean value ctl controlling execution is determined as in (5). Depending on ctl, the code register of s1 contains either clbl or cr2.
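The operand handling of JNZ can be traced with a small C sketch. It assumes 16-bit little-endian addresses in code memory and omits bounds checking for brevity; the Machine structure and the names get_address and jnz are hypothetical and only mirror the steps (4a), (4b) and the Boolean read described in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t ADDRESS;   /* assumes the 16-bit address variant */

typedef struct {
    const uint8_t *cm;      /* code memory */
    uint8_t       *dm;      /* data memory */
    ADDRESS        cr;      /* code register */
    ADDRESS        dr;      /* data base register */
} Machine;

/* (4a): read an address operand at cr (little-endian assumed) and advance cr. */
static ADDRESS get_address(Machine *m)
{
    assert(m != NULL);
    ADDRESS a = (ADDRESS)(m->cm[m->cr] | ((ADDRESS)m->cm[m->cr + 1] << 8));
    m->cr = (ADDRESS)(m->cr + 2);
    return a;
}

/* JNZ cnd, clbl: jump to clbl if the Boolean at cnd is non-zero. */
static void jnz(Machine *m)
{
    ADDRESS cndaddr = (ADDRESS)(m->dr + get_address(m)); /* (4b); cr is now cr1 */
    ADDRESS clbl    = get_address(m);                    /* cr is now cr2 */
    bool    ctl     = m->dm[cndaddr] != 0;               /* Boolean read */
    if (ctl)
        m->cr = clbl;        /* taken: continue at the label */
    /* not taken: cr already equals cr2, the next instruction */
}
```

For a global variable dr is zero, so the same addition covers both the global and the local case.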
The first operand of CALB is the label of an instance in data memory for which the subprogram beginning at the label clbl is executed. The instance address iad and the subprogram address clbl are determined as before; cr2 points to the next instruction. Since the contents of cr2 and dr must be remembered for the subprogram return, they are pushed onto the corresponding stacks.
Listing 8 shows the C implementation of JNZ and CALB. All system procedures having the common group identifier 1C are handled by a single general function IG_SYSCPROC_1C, with the type identifier it as its parameter. The switch statement selects a particular procedure. Each of the code segments sets codeReg to a new value depending on the respective meaning. CALB also modifies dataReg.
    ...
        ADDRESS clbl = GetCodeAddress();
        BOOL ctl = BOOLOf(G1BMData(cndaddr));
        if (ctl)
            codeReg = clbl;
    }
    break;
    case 0x16: /* CALB call a function block */
    {
        ADDRESS iad = dataReg + GetCodeAddress();
        ADDRESS clbl = GetCodeAddress();
        push_CodeStack(codeReg);
        push_DataStack(dataReg);
        dataReg = iad;
        codeReg = clbl;
    }
    break;
    ... /* other procedures */
    default: /* unknown code */
        flags |= FAULT;
        break;
    }
Selected functions: The semantics of the function NOT, presented in Listing 9, negates the value stored at op1. The addresses raddr and op1addr are determined as before, followed by the Boolean value bv obtained as in (5). By means of the function U1BM (Sec. IV), the value at raddr in data memory dm is then updated. That value is determined from bv by FromBool and the match ... with construct; um denotes the new state of the data memory. The C implementation of the NOT function presented in Listing 10 corresponds directly to its semantics.
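The steps of the NOT semantics (read bv, negate, convert with FromBool, update one byte of data memory) can be followed in a short C sketch. This is not the authors' Listing 10: the names bool_of, from_bool, u1bm and not_bool are hypothetical stand-ins for the value interpreter, FromBool and U1BM, and TRUE is assumed to be stored as the byte 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t ADDRESS;   /* assumes the 16-bit address variant */

/* Value interpreter: one byte to Boolean (non-zero is TRUE, assumption). */
static bool bool_of(uint8_t b) { return b != 0; }

/* Inverse direction: Boolean to the byte stored in memory (1 or 0, assumption). */
static uint8_t from_bool(bool b) { return b ? 1u : 0u; }

/* One-byte memory update, standing in for U1BM. */
static void u1bm(uint8_t *dm, ADDRESS addr, uint8_t value) { dm[addr] = value; }

/* NOT: store the negation of the Boolean at op1addr into raddr. */
static void not_bool(uint8_t *dm, ADDRESS raddr, ADDRESS op1addr)
{
    assert(dm != NULL);
    bool bv = bool_of(dm[op1addr]);     /* read and interpret the operand */
    u1bm(dm, raddr, from_bool(!bv));    /* write the negated value */
}
```

Only the byte at raddr changes; the operand itself is left untouched, which matches the description of um as the new state of the data memory.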
In the case of the function EQ (Listing 11), two LINT operands op1 and op2 are checked for equality. The Boolean value cmp follows from the comparison (=) of the LINT numbers determined by LIntOf. The updated data memory in s1 is the result of invoking U1BM. The byte stored at raddr is given by FromBool(cmp). The function IG_EQ_12 from Listing 12 implements the comparison EQ for all relevant data types (group) via a parameterized macro definition EQ_TYPE.
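How a parameterized macro can generate one comparison per data type is sketched below. The macro signature shown here is an assumption and is not the EQ_TYPE of Listing 12; the generated names eq_int and eq_lint are hypothetical, and operands are read in host byte order via memcpy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: defines NAME(dm, a1, a2), which compares two values
   of type CTYPE stored in data memory dm at addresses a1 and a2. */
#define EQ_TYPE(NAME, CTYPE)                                          \
    static uint8_t NAME(const uint8_t *dm, uint16_t a1, uint16_t a2)  \
    {                                                                 \
        CTYPE x, y;                                                   \
        memcpy(&x, dm + a1, sizeof x);   /* read first operand */     \
        memcpy(&y, dm + a2, sizeof y);   /* read second operand */    \
        return (uint8_t)(x == y);        /* 1 for TRUE, 0 for FALSE */\
    }

EQ_TYPE(eq_int,  int16_t)   /* INT: 2 bytes */
EQ_TYPE(eq_lint, int64_t)   /* LINT: 8 bytes */
```

The returned byte corresponds to FromBool(cmp) and would be written to raddr by the one-byte memory update. A single macro keeps the per-type variants textually identical, so a fix to the comparison logic applies to all of them at once.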