Two new operations over extended index matrices and their applications in Big Data

Index Matrices (IMs) are extensions of the matrices of algebra. Different operations, relations and operators are defined over IMs. In the present paper, two new operations over IMs are defined and some of their properties are studied. An application of these operations to the description of a Big Data procedure is discussed. Hortonworks Data Platform (HDP) is used to provide capabilities for data warehouse processing in the Big Data environment. Apache Hive is selected for data warehouse construction and querying. First, a data warehouse for product sales is implemented. Tables for employees, customers, products and product sales are created. Thereafter, the new index matrix operations for the difference between two IMs are executed for the first time in the Big Data environment. SQL queries are written to demonstrate the operations. The new index matrix operations are executed using SQL JOIN notation and the logical operator NOT EXISTS.


I. INTRODUCTION
THE CONCEPT of Index Matrix (IM) was introduced in 1984 and, in more detail, in 1987 [2], but the full description of the research on IMs was published in [3] exactly 30 years later.
Different extensions and modifications of the concept of an IM are described in [3]. One of them is the Extended IM (EIM), first introduced in [4]. EIMs include as particular cases the standard IM with real or complex elements, the (0, 1)-IM with elements from the set {0, 1}, the logical IM whose elements are variables, propositions or predicates, and the intuitionistic fuzzy IMs. The elements of an EIM can be arbitrary objects, including whole IMs.
Different relations, operations and operators are defined over IMs and, more generally, over EIMs. Only some of them have analogues in the theory of standard matrices (see, e.g., [6], [7]).
Here, two new operations over EIMs are defined and some of their properties are studied. As will become clear, these new operations can be transferred to each of the particular cases of the EIM.
First, we give the definition of an EIM.
Let $I$ be a fixed set of indices and let $X$ be a fixed set of objects. In particular cases, these objects can be real numbers, only the numbers 0 and 1, logical variables, propositions or predicates, etc.
Let the operations $\circ, * : X \times X \to X$ be fixed and let $(X, \circ, e_{\circ})$ and $(X, *, e_{*})$ be groups with unit elements $e_{\circ}$ and $e_{*}$, respectively. For example, when operation "$\circ$" is "+" or "-", $e_{\circ}$ is 0, while when it is "×" or ":", it is 1. In some cases, it is suitable to denote the unit element by $\bot$ and to take it to be an empty object.
An EIM with index sets $K$ and $L$ ($K, L \subset I^{*}$) and elements from the set $X$ is the object $[K, L, \{a_{k_i,l_j}\}]$ (see [3], [4]).
In [3], for the IMs $A = [K, L, \{a_{k_i,l_j}\}]$ and $B = [P, Q, \{b_{p_r,q_s}\}]$, operations analogous to the usual matrix operations of addition and multiplication are defined, as well as other, specific ones. Here we recall only three of these operations, which are used below. One of them is addition, $A \oplus_{(\circ)} B$. Here and below, "$-$" denotes the set-theoretic difference. Another is projection, $pr_{M,N} A = [M, N, \{b_{k_i,l_j}\}]$, where $M \subseteq K$, $N \subseteq L$, and $b_{k_i,l_j} = a_{k_i,l_j}$ for each $k_i \in M$ and each $l_j \in N$.
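As an informal illustration of the definition above, an EIM can be modelled as two index sets together with a mapping from index pairs to elements. The following Python sketch is an assumption of this illustration and not the authors' implementation: the class name `EIM`, the helper `projection`, the use of `None` for the empty object, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field

BOTTOM = None  # stands in for the empty object; an assumption of this sketch


@dataclass
class EIM:
    """Toy extended index matrix [K, L, {a_{k,l}}]."""
    rows: frozenset                              # index set K
    cols: frozenset                              # index set L
    cells: dict = field(default_factory=dict)    # (k, l) -> element of X

    def get(self, k, l):
        # Cells that were never filled are read as the empty object.
        return self.cells.get((k, l), BOTTOM)


def projection(a, m, n):
    """Return the part of A whose rows lie in M and whose columns lie in N."""
    m, n = frozenset(m), frozenset(n)
    if not (m <= a.rows and n <= a.cols):
        raise ValueError("M and N must be subsets of the index sets of A")
    kept = {(k, l): v for (k, l), v in a.cells.items() if k in m and l in n}
    return EIM(m, n, kept)


# Hypothetical data: two row indices and two column indices.
A = EIM(frozenset({"k1", "k2"}), frozenset({"l1", "l2"}),
        {("k1", "l1"): 5, ("k1", "l2"): 7, ("k2", "l1"): 1, ("k2", "l2"): 3})
P = projection(A, {"k1"}, {"l1", "l2"})
print(sorted(P.cells.items()))
```

Storing the elements in a dictionary keeps the sketch agnostic about what an element is, which matches the remark that the elements of an EIM may be arbitrary objects.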

II. DEFINITIONS OF THE TWO NEW OPERATIONS
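Let two EIMs $A = [K, L, \{a_{k_i,l_j}\}]$ and $B = [P, Q, \{b_{p_r,q_s}\}]$ be given.
The first, simpler, operation is defined by $A \ominus_1 B = [K \,\Delta\, P,\ L \,\Delta\, Q,\ \{c_{t_u,v_w}\}]$, where $c_{t_u,v_w} = a_{k_i,l_j}$ for $t_u = k_i \in K - P$ and $v_w = l_j \in L - Q$, $c_{t_u,v_w} = b_{p_r,q_s}$ for $t_u = p_r \in P - K$ and $v_w = q_s \in Q - L$, and $c_{t_u,v_w} = \bot$ otherwise, and where, here and below, for every two arbitrary sets $X$ and $Y$, $X \,\Delta\, Y = (X - Y) \cup (Y - X)$. The second operation is denoted by $A \ominus_2 B$. The geometrical interpretations of both operations are shown in Figures 1 and 2.
From the definitions and geometrical interpretations of both operations we see immediately that the following assertions are valid.
Theorem 1. For every two EIMs $A$ and $B$, for each operation $\circ$, and for the operation $*$ defined for every two $x, y \in X$ by $x * y = e_{*}$, the following equalities hold:
Proof. From the definitions of the operations, among them $\oplus_{(\circ)}$, we obtain elements $b_{p_r,q_s}$ for $t'_{u'} = e_f = p_r \in P - K$ and $v'_{w'} = g_h$. Obviously, operation $\circ$ cannot be applied to the elements of the two EIMs, because no two elements of the two EIMs have equal indices.
The second equality is proved in a similar way.
Theorem 2. For every two EIMs $A$ and $B$, for each operation $\circ$: $A \ominus_1 B = pr_{K-P,L-Q} A \oplus_{(\circ)} pr_{P-K,Q-L} B$.
The proof is similar to the above one.
Let $\mathcal{M}$ be the set of all EIMs with elements from $X$ (see Footnote 1). Let $I_{\emptyset}$ be the EIM with empty index sets.
Footnote 1. When $X$ is a set (class), in the sense of NBG set theory, of all predicates, then $\mathcal{M}$ will be a set (class).
Theorem 3. $(\mathcal{M}, \ominus_1, I_{\emptyset})$ is a commutative group.
Proof. From the definition of operation $\ominus_1$ it follows directly that for every two EIMs $A, B \in \mathcal{M}$, $A \ominus_1 B \in \mathcal{M}$. Using the well-known equality $(X \,\Delta\, Y) \,\Delta\, Z = X \,\Delta\, (Y \,\Delta\, Z)$, valid for every three sets $X$, $Y$ and $Z$, for every three EIMs $A, B, C \in \mathcal{M}$, where $C$ has elements $c_{d_f,e_g}$, we obtain $(A \ominus_1 B) \ominus_1 C = A \ominus_1 (B \ominus_1 C)$, where each element $y_{\alpha_\beta,\gamma_\delta}$ of the result is some element $a_{k_i,l_j}$, some element $b_{p_r,q_s}$, or some element $c_{d_f,e_g}$.
Now, for the EIM $I_{\emptyset}$ we obtain $A \ominus_1 I_{\emptyset} = A$. Analogously, it is checked that $I_{\emptyset} \ominus_1 A = A$. From the well-known equality $X \,\Delta\, Y = Y \,\Delta\, X$ it follows that $A \ominus_1 B = B \ominus_1 A$, i.e., the operation $\ominus_1$ is commutative. Finally, $A \ominus_1 A = I_{\emptyset}$: because of the lack of indices, the element $x_{u_v,w_t}$ must be $\bot$.
Theorem 4. $(\mathcal{M}, \ominus_2, I_{\emptyset})$ is a commutative monoid.
The proof is similar to that of Theorem 3, without the last part: as above, we can check that $(\mathcal{M}, \ominus_2, I_{\emptyset})$ is a commutative monoid, but it is not a group, because there is no set $Y$ such that $X \cup Y = \emptyset$ for some non-empty set $X$.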
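Theorem 2 expresses the first new operation through projection and addition, which suggests a direct way to compute it: project each matrix onto the indices that the other one lacks, then add the two projections. The Python sketch below follows that recipe under stated assumptions: an EIM is represented as a tuple of two index sets and a dictionary, `None` plays the role of the empty object, and the names `make_eim`, `projection`, `add_disjoint`, `difference1` and `element` are hypothetical. Because the two projections never share an index pair, the element operation is never applied, so the sketch only implements addition for that special case.

```python
BOTTOM = None  # placeholder for the empty object; an assumption of this sketch


def make_eim(rows, cols, cells):
    """Represent an EIM as (row index set, column index set, {(row, col): element})."""
    return (frozenset(rows), frozenset(cols), dict(cells))


def projection(a, m, n):
    """Restrict A to the rows in m and the columns in n (m, n subsets of A's index sets)."""
    _, _, cells = a
    m, n = frozenset(m), frozenset(n)
    return (m, n, {(k, l): v for (k, l), v in cells.items() if k in m and l in n})


def add_disjoint(a, b):
    """Add two EIMs that share no (row, column) pair; unfilled cells stay empty."""
    (ra, ca, xa), (rb, cb, xb) = a, b
    if set(xa) & set(xb):
        raise ValueError("operands must not share an index pair")
    return (ra | rb, ca | cb, {**xa, **xb})


def difference1(a, b):
    """First new operation, computed as in Theorem 2."""
    (k, l, _), (p, q, _) = a, b
    return add_disjoint(projection(a, k - p, l - q), projection(b, p - k, q - l))


def element(a, row, col):
    """Read one cell; cells without a value are reported as the empty object."""
    return a[2].get((row, col), BOTTOM)


# Hypothetical example data.
A = make_eim({"a", "b"}, {"x", "y"},
             {("a", "x"): 1, ("a", "y"): 2, ("b", "x"): 3, ("b", "y"): 4})
B = make_eim({"b", "c"}, {"y", "z"},
             {("b", "y"): 10, ("b", "z"): 20, ("c", "y"): 30, ("c", "z"): 40})
EMPTY = make_eim((), (), {})

D = difference1(A, B)
print(sorted(D[0]), sorted(D[1]), D[2])
```

On this data the result keeps only the cell of A indexed by ("a", "x") and the cell of B indexed by ("c", "z"); the cells ("a", "z") and ("c", "x") exist in the result but are empty. The sketch can also be used to spot-check the identity, commutativity and self-inverse properties used in the proof of Theorem 3.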

III. AN EXAMPLE OF DIFFERENCE OPERATIONS IN THE BIG DATA ENVIRONMENT
Hortonworks Data Platform (HDP) is an open source framework for distributed storage and processing of huge datasets retrieved from different sources. HDP is used to discover insights from structured and unstructured data in the cloud or on-premises. It includes Big Data tools such as Hadoop, YARN, MapReduce, HBase, Hive, Flume, Kafka and Druid [5]. The example of the difference operations is performed in the environment of Apache Hive. Apache Hive is a data warehouse system for reading, writing and managing large amounts of data in distributed storage using SQL. The flexibility of Apache Hive can be extended with user-defined functions (UDFs) [1].
In the current investigation a data warehouse for product sales is implemented. The tables are loaded from previously prepared CSV files. In the environment used here, Apache Hive supports only the set operations UNION ALL and UNION [DISTINCT]; the INTERSECT and EXCEPT (MINUS) operations are not included. In the current investigation, the new index matrix operations that perform the difference operation are applied for the first time in the Big Data environment, by analogy with the SQL JOIN clause and the logical operator NOT EXISTS.
The tables Customers and Employees from the bigdatadb data warehouse are used (Fig. 3). The indexed field Number is not compared; the comparison is performed on the first-name and last-name columns of the two tables, which are of the same data type. The contents of the tables Employees and Customers are shown in Fig. 4 and Fig. 5, respectively. The result presents the names of the customers and the employees. The SQL query in the Apache Hive environment is presented in Fig. 6.
The difference operation is simulated using the NOT EXISTS logical operator in the SELECT statement, with the desired columns of the two tables compared in the sub-query (Fig. 7).
SELECT *
FROM customers
WHERE NOT EXISTS (
    SELECT *
    FROM employees
    WHERE employees.FName = customers.CFname
      AND employees.LName = customers.CLname
);
The result of the query presents the customers who are not included in the list of employees. The columns for first names and last names are compared.
The same query, with the tables swapped between the query and the sub-query, can be executed to obtain the employees who are not included in the table of customers (Fig. 8).
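The two one-sided differences can be tried out on a small scale without a Hive cluster. The Python sketch below is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for Apache Hive, the column names are taken from the query above, and the table contents are invented sample rows rather than the data of the bigdatadb warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (Number INTEGER, FName TEXT, LName TEXT);
CREATE TABLE customers (Number INTEGER, CFname TEXT, CLname TEXT);
-- Invented sample rows; one person appears in both tables.
INSERT INTO employees VALUES (1, 'Ana', 'Petrova'), (2, 'Ivan', 'Dimov');
INSERT INTO customers VALUES (1, 'Ivan', 'Dimov'), (2, 'Maria', 'Koleva');
""")

# Customers that do not appear among the employees.
# Only the name columns are compared; the Number field is ignored.
customers_only = conn.execute("""
    SELECT CFname, CLname FROM customers
    WHERE NOT EXISTS (
        SELECT * FROM employees
        WHERE employees.FName = customers.CFname
          AND employees.LName = customers.CLname)
""").fetchall()

# The same query with the two tables swapped.
employees_only = conn.execute("""
    SELECT FName, LName FROM employees
    WHERE NOT EXISTS (
        SELECT * FROM customers
        WHERE customers.CFname = employees.FName
          AND customers.CLname = employees.LName)
""").fetchall()

print(customers_only)  # [('Maria', 'Koleva')]
print(employees_only)  # [('Ana', 'Petrova')]
conn.close()
```

The row for Ivan Dimov is dropped from both results even though his Number differs between the tables, which mirrors the decision to compare names rather than the indexed field.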
The previous two queries are then combined using the UNION operator (Fig. 9).
The result of the query is presented in Fig. 10. It combines the employees who are not customers and the customers who are not employees.
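Combining the two NOT EXISTS queries with UNION yields the symmetric difference of the two name lists. The Python sketch below illustrates this under the same assumptions as before: `sqlite3` stands in for Apache Hive, the function name `symmetric_difference_by_name` is hypothetical, and the rows are invented sample data. The result is sorted in Python only to make the output order predictable.

```python
import sqlite3


def symmetric_difference_by_name(employees, customers):
    """Return the names that occur in exactly one of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (FName TEXT, LName TEXT)")
    conn.execute("CREATE TABLE customers (CFname TEXT, CLname TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", employees)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    rows = conn.execute("""
        SELECT CFname, CLname FROM customers
        WHERE NOT EXISTS (
            SELECT * FROM employees
            WHERE employees.FName = customers.CFname
              AND employees.LName = customers.CLname)
        UNION
        SELECT FName, LName FROM employees
        WHERE NOT EXISTS (
            SELECT * FROM customers
            WHERE customers.CFname = employees.FName
              AND customers.CLname = employees.LName)
    """).fetchall()
    conn.close()
    return sorted(rows)


# Invented sample rows; one person appears in both tables.
employees = [("Ana", "Petrova"), ("Ivan", "Dimov")]
customers = [("Ivan", "Dimov"), ("Maria", "Koleva")]
result = symmetric_difference_by_name(employees, customers)
print(result)  # [('Ana', 'Petrova'), ('Maria', 'Koleva')]
```

For duplicate-free inputs the result agrees with Python's own set operation `set(employees) ^ set(customers)`, which offers a quick cross-check of the SQL formulation.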

IV. CONCLUSION
Two new operations are introduced in the paper. In the future, other properties of these operations will be studied and further applications will be discussed. Some of them will be related to Big Data and Data Mining, as a continuation of the work discussed in the present paper.