Current Trends in Automated Test Case Generation

Testing is an integral part of software development. At the same time, the manual creation of individual test cases is a lengthy and error-prone process. Hence, intensive research on automated test generation methods has been ongoing for more than twenty years. There are many vastly different approaches that can be considered automated test case generation. However, a common feature is the generation of the data for the test cases. Ultimately, the test data decide the program branching and can be used on any testing level, starting with the unit tests and ending with the tests focused on the behavior of the entire application. The test data are also mostly independent of any specific technology, such as programming language or paradigm. This paper is a survey of the literature of the last two decades that deals with test data generation or with tests based on it. This survey is not a systematic literature review and it does not try to answer specific scientific questions formulated in advance. Its purpose is to map and categorize the existing methods and to summarize their common features. Such a survey can be helpful for any teams developing their methods for test data generation as it can be a starting point for the exploration of related work.


I. INTRODUCTION
TESTING is an essential part of software development. At the same time, the manual creation of individual test cases is a lengthy and error-prone process. In many real-world projects, there is not enough time to ensure sufficient testing of the developed software product, which lowers its quality. The programmers of the test cases can also miss some inputs, which leads to unexpected behavior of the software product. Hence, intensive research on automated test generation methods has been ongoing for more than twenty years, as can be seen, for example, in [1] or [2].
This work was supported by Institutional support for long-term strategic development of research organizations.
In the existing literature, automated test case generation is used or at least proposed on various testing levels. These levels include unit testing focused on the functionality of isolated features of the developed application (usually a method, procedure, or function), but also the regression and integration testing focused on the correct cooperation of the individual parts of the application. Automated test case generation can also be used during the high-level testing of the functionality of the entire application and its adherence to the specified requirements. Automated test case generation is tempting and seems to be promising, as it should reduce the time the programmers spend on manual test case preparation. Nevertheless, there are several limitations.
First of all, it is difficult to automatically verify that the tested application or its part provides correct results. This would require generating the expected outputs for all the generated inputs, which is an inherently difficult task. Nevertheless, this ability is crucial for the use of automated test case generation in real software projects. However, it should be noted that, in many cases, it is possible to detect the incorrect behavior of an application even without the known correct outputs. An obvious example is when the application crashes, but there can also be constraints on the outputs, which can be used for incorrectness checking (e.g., the calculated volume of a cube cannot be negative).
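The idea of detecting incorrect behavior without known correct outputs can be illustrated by a minimal Python sketch. The function cube_volume and the helper find_violations are hypothetical names introduced only for this illustration, and the sketch assumes that a crash or a negative volume is the only symptom being checked.

```python
import random


def cube_volume(edge):
    """Hypothetical function under test."""
    return edge ** 3


def find_violations(func, trials=1000, seed=0):
    """Run func on pseudorandom non-negative inputs and collect the inputs
    that crash it or violate the output constraint (volume >= 0)."""
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        edge = rng.uniform(0.0, 1000.0)
        try:
            volume = func(edge)
        except Exception as exc:
            # A crash signals incorrect behavior without any expected output.
            violations.append((edge, repr(exc)))
            continue
        if volume < 0:
            # The constraint stands in for the unknown exact expected value.
            violations.append((edge, "negative volume"))
    return violations
```

Such a check cannot confirm that the computed volume is correct; it can only reveal the inputs for which the result is certainly wrong.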
Another issue, which is often discussed (e.g., in [3]), is related to the combinatorial explosion. Consider a unit test of a method with several parameters where various combinations of the parameters should be considered. Even when the values of the parameters can be grouped into several discrete classes, the number of all the possible combinations grows very fast with the growing number of parameters and classes. This problem is even more pronounced in higher-level tests, when multiple methods are executed during the testing of one higher-level functionality. Various settings and running environments of the tested application only worsen this problem. Hence, even in tools that are used in real projects, such as EvoSuite or Randoop [4], the number of generated test cases can be very high, which leads to long running times. This partially limits the usability of automated testing. Nevertheless, the problem can be mitigated by employing efficient test case selection in order to generate and run the test cases that provide the highest expected code coverage and/or have the highest expected error detection rate. The increasing power of contemporary computers is also helpful, as running a huge number of tests is more and more feasible.
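The growth described above can be made concrete with a short Python sketch. The parameter names and classes below are hypothetical and serve only as an illustration; the sketch assumes that every combination of classes requires one test case.

```python
from itertools import product


def count_combinations(classes_per_parameter):
    """Number of test cases needed to cover every combination of classes."""
    total = 1
    for class_count in classes_per_parameter:
        total *= class_count
    return total


# Three hypothetical parameters, each split into three discrete classes.
parameter_classes = [
    ["negative", "zero", "positive"],
    ["empty", "short", "long"],
    ["admin", "user", "guest"],
]

# Every tuple is one combination of classes, i.e., one candidate test case.
combinations = list(product(*parameter_classes))
```

Three parameters with three classes each give 27 combinations, whereas count_combinations([5] * 10), i.e., ten parameters with five classes each, already gives 9,765,625.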
The last issue we would like to mention is the validation of the automated test-generating methods themselves. Several different approaches for the evaluation of the automated test-generating methods can be found in the existing literature. From the point of view of the practical usability of the methods in real projects, the two most important questions are how realistic the methods are and how well they perform in finding different types of realistic errors.
There are many different approaches that can be considered automated test case generation. However, a common feature is the generation of the data for the test cases. Ultimately, the test data decide the program branching and can be used on any testing level, from the unit tests to the tests focused on the behavior of the entire application. The test data are also mostly independent of any specific technology, such as programming language or paradigm.
This paper is a survey of the literature of the last two decades that deals with automated test data generation or with tests based on it. This survey is not a systematic literature review and it does not try to answer specific scientific questions formulated in advance. Its purpose is to map and categorize the existing methods and to summarize their common features. Such a survey can be useful for any teams developing their methods for test data generation as it can be a starting point for the exploration of related work.
The remainder of this paper is structured as follows. Related surveys are discussed in Section II. The selection of the papers for this survey is described in Section III. The existing methods for test data generation are discussed in Section IV. Their common features and trends are described in Section V. Threats to validity are described in Section VI. The conclusions and the future work are in Section VII.

II. RELATED WORK
There are multiple studies that survey the existing testing approaches. Our survey is intended to complement them from the automated test data generation point of view.

A. Existing Methods Studies
Ref. [5] summarizes the methods for test generation based on control flow analysis, automatic random data generation, and program execution analysis and/or the methods designed to produce tests that maximize the code coverage. The majority of the methods described in that survey are designed to deal only with simple program constructions and are often based on models of the program instead of real programs. This is quite understandable, since the survey is rather old (from 1999). Nevertheless, methods based on the same principles appear again and again in more modern papers; only the methods, or at least the examples on which they are demonstrated, are usually more complex. For example, a more recent orchestrated survey [6] is focused on adaptive random testing among other methods.
A thorough review in [7] focuses on the papers dealing with search-based test case generation. The review shows a constant increase in the number of testing-related publications between 1995 and 2007. The main focus of the review is the quality of the verification of the test generation methods. It is concluded that there is a lack of a standardized rigorous method to perform, assess, and compare the individual methods. Moreover, in many papers, there is not even enough empirical data to perform any comparison. It is also pointed out that, while many methods can achieve relatively high code coverage, it is not clear whether the tests covering the code are able to find errors in the code. Another survey focused on search-based test case generation can be found in [8].
Search-based testing with an emphasis on mutation-based methods is also the theme of the survey in [9]. The methods described in papers published between 1996 and 2014 are based on genetic algorithms, ant colony optimization, simulated annealing, or hill climbing. The survey also discusses the relations between the methods and their development across multiple papers. There are several conclusions. One is that the above-mentioned meta-heuristics significantly reduce the number of generated test cases without negative effects on the code coverage. Another is that the automated test generation methods are not designed for concurrency problems. The last conclusion is that the comparison of the automated methods is difficult, similarly to [7].
The review in [10] focuses on dynamic symbolic execution. Twelve tools are compared based on various features, such as the number of publications dedicated to each tool, the utilized method for automated test generation, and the environment in which the tool can be used. The ability of the tools to detect errors in the software is not among the investigated features. This feature is investigated in [11], which is focused on the methods utilizing aspect-oriented programming (namely Wrasp, Aspectra, Raspect, and EAT). One of the conclusions is that the structural evolutionary testing (EAT) shows the most promising results but at the cost of greater effort compared to random testing.
The short survey in [12] focuses on papers dealing with test data generation. It discusses various types of data generation from their architecture and usage points of view. The advantages and disadvantages of the methods as well as the best practices are discussed.
Although the majority of the surveys described above are focused on a technology or a set of technologies, there are also surveys focused on a specific type of software. An example is a systematic literature review [13], which deals with automated functional testing of mobile applications. Another example is a study [14], which discusses the application of several different techniques for the verification of flight software at the Jet Propulsion Laboratory.

B. Practical Usability Studies
There are also studies that focus on the usage of the automated testing methods in real projects, such as [15]. This study is not focused on published papers, but rather describes how the testing methods are used in real software projects and how the automated testing methods would improve the situation. The study has a somewhat darker tone than the studies mentioned in Section II.A, as it points out that a lot of additional effort is necessary when a promising method described in a paper is to be used in a real industry project.
The study described in [16] is focused on the comparison of existing tools for automated test generation, such as Randoop, AutoTest, AnalitiX, Jtest, and so on. This study describes what the comparison of different methods for automated test generation should look like, which is precisely the aspect that was mentioned as missing in [7] and [9]. In [16], a complex benchmark consisting of over 30 cases is described, which makes it possible to determine empirically whether the automated test generation methods are able to uncover specified conditions. The results from this benchmark can be used for the comparison of the methods. Although this benchmark is a good basis for the comparison of the automated test generation methods, it still utilizes synthetic cases, not real software [16].
An unorthodox practical study is the Java Unit Testing Tool Contest, which is held annually and whose results are reported at various conferences (e.g., in [17] or [18]). The contest is intended for test generation tools designed for Java. Their ability to find errors in programs is tested using a benchmark consisting of real-life classes taken from various open-source GitHub projects. The contesting tools are evaluated based on the code coverage and mutation score [17], [18].
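The mutation score is commonly understood as the fraction of deliberately modified program variants (mutants) that are detected by the tests. The following Python sketch illustrates this notion under a simplifying assumption: a mutant counts as detected ("killed") when its output differs from the output of the original program for at least one test input. The function and mutant names are hypothetical, and the sketch is not the evaluation procedure of the contest.

```python
def mutation_score(original, mutants, test_inputs):
    """Fraction of mutants distinguished from the original program by at
    least one test input (assumption: the original acts as the oracle)."""
    if not mutants:
        raise ValueError("at least one mutant is required")
    killed = 0
    for mutant in mutants:
        if any(mutant(value) != original(value) for value in test_inputs):
            killed += 1
    return killed / len(mutants)


def absolute(x):
    """Hypothetical program under test."""
    return x if x >= 0 else -x


# Hypothetical mutants: two faulty variants and one equivalent variant.
mutants = [
    lambda x: x,        # negation dropped, revealed only by negative inputs
    lambda x: -x,       # always negated, revealed only by positive inputs
    lambda x: abs(x),   # equivalent mutant, cannot be revealed
]
```

With the test inputs [1, 2], only the second mutant is killed (score 1/3), while the inputs [-1, 1] kill two of the three mutants (score 2/3). The equivalent mutant is never killed, which is one reason why a score of 1 may be unreachable.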

III. SURVEY DESCRIPTION
This paper is an intermediate result of our exploratory work to prepare the basis for a systematic literature review, which is the main aim of our current and future work (see Section VII). Although this intermediate result is only a (non-systematic) survey, the collection of the primary studies was performed in a rigorous manner described in the following subsections, as the collected papers will also form part of the basis for our future systematic literature review.

A. Papers Searching
As the sources of the papers, we used the IEEE Xplore library, which includes full texts of a large number of technology-related papers from both conferences and journals, and the ScienceDirect library, which includes papers from a large number of technology-related journals. Due to the institutional subscription, we have access to the majority of the full texts of the papers contained in both libraries, which is essential for the survey. Both libraries enable basic and advanced searching, but the available filters are quite different. For this reason, we used different settings for each library to obtain manageable numbers of relevant results. We made several attempts with various filters and search strings before we reached the final settings for both libraries.
The final search string for the IEEE Xplore library was "automated test data generating". It was used together with two filters. The year of publication had to be from 2000 to 2022 and the publication topic had to be "Program Testing". Using this setting, 461 results were obtained. The final search string for ScienceDirect was "automated test data generating program testing". It was used with three filters. Similarly to IEEE Xplore, the year of publication had to be from 2000 to 2022. Additionally, the subject area had to be "Computer Science" and the title of the paper had to contain "test data". Using this setting, 58 results were obtained. The searching in both databases was performed in April 2023.

B. Papers Filtering
From the search results, only the papers focused on the issues of automated test data generation for software testing were selected. In the first round, the selection was performed based on the titles. In the second round, the selection was performed based on the abstracts, but only for the papers that passed the first round. After the second round, there were 179 papers left (see Table I). The full texts were downloaded and investigated only for the 179 papers that passed the second selection round. Some of these papers were eliminated from further processing because, despite the promising title and abstract, the theme of the paper was outside the scope of this survey. Of the remaining papers, only 67 were included in the study, because they best represent the current trends in test data generation.
It should be noted that many of the obtained papers were already processed during our preliminary work with different search strings and filter settings in 2022. Hence, only the newest papers and papers not obtained previously due to different search settings of the libraries had to be processed. This enabled us to finish the paper in a relatively short time after the final search was performed.

C. Aims of the Survey
As this survey is intended to serve as a starting point for the exploration of related work for research teams dealing with automated test data generation, the aims of the survey can be summarized as follows:
• To categorize existing automated test data generation methods (see Section IV).
• To summarize and discuss common features of the methods (including their verification, implementation availability, testing level, and target platform) and observable trends (see Section V).

IV. EXISTING METHODS IN LITERATURE
The categorization of the surveyed automated test data generation methods was performed based on the primary technology used for the test data generation. This categorization enables the readers to focus mainly on the papers related to the technology of their interest. It is also consistent with the existing surveys, as they are often focused on a relatively narrow set of technologies (see Section II). The papers of the individual categories are discussed in the following subsections.

A. Pseudorandom Generation-based Methods
The most basic approach to obtaining test data is to generate them using pseudorandom generators. Though the basic method can give relatively good results (e.g., code coverage) for numerical inputs, its usage for more complex (and valid) data, such as specific strings or objects, is difficult. Nevertheless, pseudorandom number generation is often combined with other approaches. In [19], stochastic process models of the objects and their random initiation are used together with random method invocation.
In [20], pseudorandom generation is combined with constraint solving for the generation of test data for relational database schemas. The testing of object-relational mapping (ORM) based on pseudorandom generation and formal models is described in [21]. In [22], data description using XML and regular expressions is used together with pseudorandom generation to generate invalid and atypical testing inputs for robustness testing.

B. Control-Flow-based Methods
The control-flow-based methods create control-flow graphs of the tested program using, for example, static analysis. From these graphs, the tests are generated. A common aim is to achieve a high code coverage, which can be observed, for example, in [23], [24], or [25].
The method described in [24] is rather basic. It generates input data for the tested program in order to ensure the execution of all branches of the program. The number of generated test cases is limited by the elimination of already explored paths in the control-flow diagram. However, the method is limited to numerical inputs only. A similar limitation can also be observed in [23].
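The principle of keeping only the inputs that reach not yet explored paths can be outlined in Python. This is a minimal sketch, not the method of [24]: it assumes pseudorandom integer inputs instead of an analysis of the control-flow graph, and the program classify_triangle with its manually recorded trace is a hypothetical example.

```python
import random


def classify_triangle(a, b, c, trace):
    """Hypothetical program under test; trace records the taken branch."""
    if a + b <= c or a + c <= b or b + c <= a:
        trace.append("invalid")
        return "invalid"
    if a == b == c:
        trace.append("equilateral")
        return "equilateral"
    if a == b or b == c or a == c:
        trace.append("isosceles")
        return "isosceles"
    trace.append("scalene")
    return "scalene"


def generate_path_covering_inputs(trials=5000, seed=1, low=1, high=10):
    """Keep one input per distinct path; inputs that follow an already
    explored path are discarded, which limits the number of test cases."""
    rng = random.Random(seed)
    explored = {}  # path -> first input that reached it
    for _ in range(trials):
        args = tuple(rng.randint(low, high) for _ in range(3))
        trace = []
        classify_triangle(*args, trace)
        path = tuple(trace)
        if path not in explored:
            explored[path] = args
    return explored
```

The result contains at most one test input per path, so the size of the generated test set is bounded by the number of paths rather than by the number of trials.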
The control-flow-based methods are often used for web applications. The method described in [25] is designed for the testing of the frontend of web-based applications. It analyzes the content and structure of the investigated website, creates the possible paths of the user, and generates the input testing data for the web forms in order to ensure path coverage. In [26], a method for the generation of test data for testing REST APIs is described. The connected control flow graphs are traversed in order to find patterns of variable usage to produce usable variable values. Another example of the usage for web applications can be found in [27].
The control-flow-based methods are also quite often combined (among other technologies) with the pseudorandom generation of the input data. In [28], stochastic hill climbing is used for finding the probability distribution. This distribution is then used for the generation of the pseudorandom input testing data. The combination of control-flow diagrams and pseudorandom data generation can also be found in [29].

C. Specification-based Methods
The specification-based methods utilize a form of the specification of the investigated software to generate the test cases. This approach is tempting, as it should compare the actual behavior of the software with the expected behavior given by its specification. The existing methods utilize the UML models (e.g., in [30], [31], [32], or [33]), specification of use cases (e.g., in [31], [34], [35], or [36]), or contracts (e.g., in [37]). Program states description is utilized in [38].
In [30], tests of the entire system are generated from the UML use case and state diagrams. From these diagrams, a usage model is created, which is then used as the basis for the tests. In [32], the activity diagram, the sequence diagram, and the system testing graphs are used to create a combination graph, which is then explored using a modified Depth-First Search (DFS) to generate expected test cases. The contracts in [37] are used similarly to the use case diagrams in [30]. They are transformed into models describing the expected behavior of the investigated program. From this form, the executable test cases are created.
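The exploration of a model graph by a depth-first search can be sketched in Python as follows. The sketch uses a plain DFS that enumerates simple paths, not the modified DFS of [32], and the graph of application states is a hypothetical example; each enumerated path corresponds to one candidate test scenario.

```python
def enumerate_test_paths(graph, start, end):
    """Depth-first enumeration of all simple paths from start to end."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            paths.append(path)
            continue
        for successor in reversed(graph.get(node, [])):
            if successor not in path:  # skip cycles, keep the paths simple
                stack.append((successor, path + [successor]))
    return paths


# Hypothetical model of an application: states and allowed transitions.
shop_model = {
    "login": ["browse", "logout"],
    "browse": ["add_to_cart", "logout"],
    "add_to_cart": ["checkout", "browse"],
    "checkout": ["logout"],
}
```

For this model, enumerate_test_paths(shop_model, "login", "logout") yields three scenarios, from the direct logout to the complete purchase.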
The method described in [35] utilizes textual use case specifications for the generation of acceptance tests. The method is based on natural language processing (NLP) and constraint solving. In [36], the use cases are used to generate a control flow graph and an NLP table, which are, in turn, used for test case generation. The method described in [39] is designed for process-driven applications. The method utilizes analysis of the application and the specification of tests to generate test codes.

D. Program Execution Analysis Methods
The methods based on the program execution analysis utilize the observation of the application behavior in order to generate test cases. There are two main approaches: one based on instrumentation and one based on dynamic symbolic execution (also known as concolic testing).
The first approach is based on the instrumentation of the tested application in order to enable a simple observation of its behavior. Examples include wrappers around tested functions or methods (e.g., in [40]), probes near important points of the program, such as control structures (e.g., in [41]), usage of augmented virtual machines (e.g., LLVM [42]), or usage of runtime instrumentation (e.g., in [43]).
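A wrapper around a tested function can be sketched in Python using a decorator. The names observed, call_log, and safe_divide are hypothetical and serve only as an illustration of the principle; the sketch does not reproduce the instrumentation of any of the cited papers.

```python
import functools


def observed(log):
    """Wrap a function so that every call, its arguments, and its outcome
    are appended to log without changing the behavior of the function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {"function": func.__name__, "args": args, "kwargs": kwargs}
            try:
                record["result"] = func(*args, **kwargs)
                return record["result"]
            except Exception as exc:
                record["exception"] = type(exc).__name__
                raise  # the original exception is still propagated
            finally:
                log.append(record)
        return wrapper
    return decorator


call_log = []


@observed(call_log)
def safe_divide(a, b):
    """Hypothetical function under test."""
    return a / b
```

After several calls, call_log holds the observed inputs together with the results or the raised exceptions, which can then serve as the raw material for generated test cases.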
The second approach is used, for example, in [44], [45], [46], [47], [48], or [49]. A dynamic symbolic execution is used in [45] to observe the behavior of the tested application. This observation is used for checking whether new randomly generated input data lead to better path coverage than already stored paths. In [48], the dynamic symbolic execution works with additional attributes that make it possible to check the efficiency of the paths produced based on the random input data. It is also possible to check whether the expected boundary values described by the contracts are observed. In [50], the dynamic symbolic execution is used for the testing of C++ Qt Framework classes. A source code preprocessing phase is used to find constructors of Qt class parameters. A similar approach is used in [51], but for C++ templates.
In [52], automated guided symbolic execution combined with constraint solving is used to avoid exploring useless paths in the program. The method is used for system vulnerability detection. In [44], preprocessing of enterprise applications to enable the usage of existing symbolic execution tools for their testing is described. In [53], the tested program is transformed into a set of constraints, which are then solved using a symbolic reasoning engine. Hence, the approach resembles the dynamic symbolic execution. The evaluation of the ability of the CREST concolic testing tool to find real-life errors in real embedded applications is described in [54]. In [43], a concolic test generation tool is combined with the automatic generation of test cases from a formal description of the program (e.g., database table definitions, process-flow diagrams, etc.).
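The core loop of dynamic symbolic execution (run the program on concrete inputs, record the decisions on the executed path, negate one of them, and solve for a new input) can be outlined in Python. The sketch is strongly simplified and rests on stated assumptions: the decisions are recorded as Python callables instead of symbolic expressions, and a brute-force search over a small integer domain stands in for a constraint solver. The program and all names are hypothetical.

```python
from itertools import product


def program(x, y, trace):
    """Hypothetical program under test. Every decision is appended to trace
    as (predicate, outcome) so that it can be re-evaluated on other inputs."""
    def branch(predicate):
        outcome = predicate(x, y)
        trace.append((predicate, outcome))
        return outcome

    if branch(lambda x, y: x > 5):
        if branch(lambda x, y: x + y == 12):
            return "target"
        return "x large"
    return "x small"


def solve(constraints, domain=range(-20, 21)):
    """Brute-force stand-in for a constraint solver (assumption)."""
    for x, y in product(domain, repeat=2):
        if all(predicate(x, y) == wanted for predicate, wanted in constraints):
            return (x, y)
    return None


def concolic_explore(start=(0, 0)):
    """Explore paths by negating one recorded decision at a time."""
    worklist = [start]
    seen_paths = {}  # tuple of outcomes -> (input, result)
    while worklist:
        x, y = worklist.pop()
        trace = []
        result = program(x, y, trace)
        path = tuple(outcome for _, outcome in trace)
        if path in seen_paths:
            continue
        seen_paths[path] = ((x, y), result)
        for i, (predicate, outcome) in enumerate(trace):
            # Keep the decisions before position i and flip decision i.
            candidate = solve(trace[:i] + [(predicate, not outcome)])
            if candidate is not None:
                worklist.append(candidate)
    return seen_paths
```

Starting from the input (0, 0), the exploration reaches all three paths of the example, including the branch guarded by x + y == 12, which a purely random generator would hit only rarely.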

E. Data-Description-based Methods
In some papers, the described methods are not focused on a program, but rather on the specification of the input testing data. This approach has become common with the increasing number of web-based applications and the necessity to test their text-based APIs. The frequently used description formats include the Web Services Definition Language (WSDL) used, for example, in [55], [56], and [57], or the JavaScript Object Notation (JSON) used, for example, in [58]. XML Schema Definition (XSD) is used in [59].
An interesting comparison is described in [57], where a realistic WSDL-based data set is compared to a fully random data set. The conclusion is that the utilization of realistic data leads to a higher code coverage. In [53], a method for generating complex interconnected data from a WSDL specification is described. The method makes it possible to generate both valid and invalid input data. In [60], a method for the preparation of the test data for web forms utilizes an ontology and the types of the fields of the web form. In [61], existing data and rules for their conversion were used for testing a data warehouse.
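The generation of valid and invalid data from a data description can be sketched in Python. The dictionary SCHEMA is a hypothetical, strongly simplified stand-in for a real description format such as WSDL, XSD, or JSON, and the field names and rules are assumptions made only for this illustration.

```python
import random
import string

# Hypothetical data description: field name -> type and constraints.
SCHEMA = {
    "name": {"type": "string", "max_length": 8},
    "age": {"type": "integer", "min": 0, "max": 120},
}


def generate_valid(schema, rng):
    """Generate a record in which every field satisfies its constraints."""
    record = {}
    for field, rule in schema.items():
        if rule["type"] == "integer":
            record[field] = rng.randint(rule["min"], rule["max"])
        else:
            length = rng.randint(1, rule["max_length"])
            record[field] = "".join(
                rng.choice(string.ascii_lowercase) for _ in range(length)
            )
    return record


def generate_invalid(schema, rng):
    """Generate a valid record and then break exactly one field."""
    record = generate_valid(schema, rng)
    field = rng.choice(sorted(schema))
    rule = schema[field]
    if rule["type"] == "integer":
        record[field] = rule["max"] + rng.randint(1, 100)  # out of range
    else:
        record[field] = "x" * (rule["max_length"] + 1)  # too long
    return record


def is_valid(schema, record):
    """Check a record against the description."""
    for field, rule in schema.items():
        value = record.get(field)
        if rule["type"] == "integer":
            if not isinstance(value, int) or not rule["min"] <= value <= rule["max"]:
                return False
        elif not isinstance(value, str) or not 1 <= len(value) <= rule["max_length"]:
            return False
    return True
```

Breaking a single field at a time keeps the invalid records close to valid ones, so a rejected record can be attributed to one specific constraint.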
A quite different approach is used in [62]. It uses static analysis of existing tests for mining literals that can be suitable as input values in generated tests in a specific domain. Yet another approach is described in [63]. There, the test cases are generated from a specification of the inputs in natural language. Natural language processing (NLP) and key phrase detection are employed for this purpose.

F. Search-based Methods
A common aim of the search-based methods is to provide high code coverage with a relatively low number of generated test cases. These methods typically do not rely on the knowledge of the program structure, but rather employ various search meta-heuristics to find efficient input test data. Regardless of the utilized meta-heuristic, there must be a way to evaluate the solutions found by the heuristic. Hence, these methods are combined, for example, with models of the tested program behavior, such as the control flow [64] and event flow [65], or with the program instrumentation [66].
The commonly used meta-heuristics include genetic algorithms, which are employed, for example, in [67], [68], [69], [70], [71], or [72], ant colony optimization (e.g., in [73]), or particle swarm optimization (e.g., in [74] or [75]). A genetic algorithm is used for test data generation for unit testing of Java programs in [67]. In [76], a genetic algorithm is combined with grammar-based fuzzing to generate highly structured testing input data. In [77], a genetic algorithm is combined with random search and database instrumentation to generate test data for SQL queries testing. In [78], a genetic algorithm, an evolutionary algorithm, and an alternating variable method combined with an Object Constraint Language (OCL) description of constraints are investigated.
In [73], the ant colony optimization is employed to achieve higher branch coverage with a relatively small set of testing data. The method is based on the simulation of the pheromone path and is reported to provide better branch coverage than a standard genetic algorithm or particle swarm optimization. In [74], the particle swarm optimization is combined with formal specifications (written in SOFL) and mutation testing. Improved particle swarm optimization is also employed together with predicate functions and path similarity calculation in [75] for test case generation. An unspecified meta-heuristic is employed in [79] together with constraint solving of manually added constraints.
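The role of the evaluation function in search-based methods can be illustrated by a minimal Python sketch. For brevity, it uses a deterministic hill climbing with accelerating steps instead of the population-based meta-heuristics discussed above, and it assumes a single integer input. The program under_test and the fitness function branch_distance are hypothetical; the fitness expresses how far an input is from satisfying the condition of the targeted branch.

```python
def under_test(x):
    """Hypothetical program with a branch that random inputs rarely reach."""
    if x == 4242:
        return "rare branch"
    return "common branch"


def branch_distance(x, target=4242):
    """Fitness: 0 when the condition x == target holds, larger otherwise."""
    return abs(x - target)


def hill_climb(fitness, start):
    """Move in the improving direction with doubling steps and restart with
    a unit step whenever a doubled step stops improving the fitness."""
    x = start
    while fitness(x) > 0:
        improved = False
        for direction in (-1, 1):
            step = 1
            while fitness(x + direction * step) < fitness(x):
                x += direction * step
                step *= 2
                improved = True
        if not improved:
            break  # local optimum, no neighbor is better
    return x
```

The call hill_climb(branch_distance, 0) returns 4242, i.e., an input that executes the rarely reached branch. A genetic algorithm or particle swarm optimization would replace hill_climb, while the fitness function could stay the same.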

G. Machine-Learning-based Methods
The methods based on machine learning usually utilize artificial neural networks (ANNs) for the test data generation. In [80], a neural network is used for black-box testing of the graphical user interface (GUI) of Android applications. The input of the neural network is a set of screenshots of the tested application. In [81], generative adversarial networks are employed for automated test data generation. A neural network for test generation, which uses the execution trace of the program as an input, is employed in [82]. In [83], the dataset for the training of neural networks for source code vulnerability detection is prepared using a mutation approach.
In [84], two approaches for test oracle generation are described. One is based on an artificial neural network and the other is based on data mining from decision trees. The advantages and limitations of both approaches are discussed. In [85], no artificial neural network is used. Instead, random forest, which is a generalization of tree-based classification, is employed for predictive mutation testing.

V. COMMON FEATURES OF EXISTING METHODS
Regardless of the technology utilized by the methods described in Section IV, these methods share common features and issues, which are discussed in the following subsections.

A. Methods Verification
The lack of verification possibilities or of standard ways to compare various methods is mentioned in several works (e.g., in [7] or [9]). Based on the investigated papers, it can be concluded that an objective comparison and assessment of the methods cannot be done using the text of the papers only. There is simply not enough information, and the provided examples and technologies are quite often vastly different. Some papers (e.g., [29]) contain only a very general description of the verification or testing of the proposed method. Some papers (e.g., [33]) contain no testing at all and focus solely on the description of the proposed method.
Nevertheless, some papers provide means for assessing the quality of the described methods that are "above average". For example, in [38], [49], [61], or [78], very thorough descriptions of the evaluation process of the proposed methods can be found. It is reported that the evaluation process includes tests performed on realistic programs with actual errors found by the methods. This is in contrast with the majority of the papers, in which the methods are often demonstrated on quite simplified examples (e.g., in [67] or [75]).

B. Implementation Availability
It would be beneficial if the implementations of the methods described in individual papers were available for download and further trials. If this is not possible, a complete data set with data supporting the quality of the described method would also be quite informative. However, the majority of the investigated papers do not make it possible to perform a replication study without a reimplementation of the methods from the description in the paper. Of the 67 primary studies referred to in this survey, there were only 15 studies with direct links to tools implementing the described methods.
Of the available tools, 11 are provided in the form of GitHub repositories (see Table II) and the remaining 4 have dedicated websites. The website of CREST [54] also contains a link to the GitHub repository along with a downloadable version of the tool. The website of CATG [43] contains downloadable .jar files. The method described in [72] is implemented in DCRTT, which appears to be a commercial product, as we were unable to find direct download links on the website. Finally, the website of SDG [79] contains a downloadable .zip file. As of May 21, 2023, all the links are functional. The available tools are summarized in Table II.
Fig. 1. Percentage of individual testing levels in primary studies.

C. Testing Level
As stated in Section I, the automated test case generation methods exist for various testing levels. Of the primary studies referred to in this survey, the vast majority (specifically 39 papers) were focused on unit testing (see Fig. 1), for example [19], [23], [27], [59], or [69]. One of the possible reasons could be that the methods are often demonstrated on quite simple and/or short examples (see Section V.A). Short examples correspond well to unit tests, which usually deal with a relatively short part of the source code with limited functionality.

D. Target Platform
The methods described in primary studies are designed for a specific platform, for example for a specific programming language or a specific domain, such as web applications or databases. The methods can also be sufficiently general to be usable on multiple platforms. Such general methods usually do not use source code for the generation of the tests, but rather other forms of descriptions of the application, such as UML diagrams (e.g., [35] or [36]). There were 7 target platforms that were represented by more than one primary study, including the generally usable methods (see Fig. 3). The generally usable methods also form the largest group with 19 papers (e.g., [23] or [39]). The specific target platform with the largest number of papers was the Java language (18 papers, e.g., [59] or [81]) followed by the C/C++ languages (14 papers, e.g., [28] or [42]). Further groups include the C#/.NET platform (2 papers, [48] and [56]), web applications (6 papers, e.g., [26] or [55]), databases (DB, 2 papers, [61] and [77]), and programmable logic controllers (PLCs, 2 papers, [46] and [47]). There were also 4 methods designed for other target platforms, each represented by a single primary study (4 papers, e.g., [64] or [74]). These papers/methods are grouped as "others" in Figs. 3 and 4.

E. Observable Trends
Since the analyzed primary studies span more than two decades (2000 to 2022), there are a few observable trends. Two technologies that have existed for a relatively long time, but have only recently been put to practical use for test case generation, are natural language processing (NLP, e.g., [35] or [36]) and artificial neural networks (ANNs, e.g., [80] or [81]). Of the primary studies referenced in this survey, the oldest study is from 2021 for NLP and from 2020 for ANNs. This can be attributed to the relatively recent but significant progress in these fields, which has led to the practical usability of both technologies.
Another observable trend is the slight increase in the number of studies with direct links to the tools implementing the proposed methods (see Fig. 5). As can be observed in Fig. 5, the studies for 11 of the 15 available tools were published in 2017 or later. Among the primary studies referenced in this survey, there was no available tool before 2007.

VI. THREATS TO VALIDITY
As pointed out in Section I, this survey is not a systematic literature review and does not attempt to answer specific research questions formulated in advance. It also does not attempt to exhaustively list all papers related to test case or test data generation. Hence, there are papers that would fit the theme of this survey, but that we did not include. There are several possible reasons:
1. The paper was not discovered in the libraries, because it did not pass the utilized filters (see Section III.A).
2. The paper was not present in the two utilized libraries, but may be present in others.
3. The paper was discovered and its full text was read, but because of its similarity to other papers (in the sense of the used techniques and/or their combinations), it was not included in the survey.
For the reasons described above, the reader should keep in mind that this survey is not exhaustive in any sense, but tries to summarize the approaches and technologies currently in use in the field of automated test data generation.

VII. CONCLUSION AND FUTURE WORK
In this paper, the existing literature that deals with test data generation or with tests based on test data generation was summarized. The commonly used approaches were discussed, and their common issues and features were described, including a few observable trends.
The collected primary studies, which this (non-systematic) survey summarizes, will be used as part of the basis for our future systematic literature review, which will cover the theme of this survey, but will add specific research questions and a formalization of the entire review process.
Another branch of our current and future work is the creation of a benchmark for test data generation methods. Such a benchmark would allow us to objectively compare the ability of the methods to find known realistic errors. For this purpose, we are currently developing the Testing Applications Generator (TAG) [86]. This tool is intended to generate applications with selected introduced errors of various types. It enables the introduction of errors at the method level, meaning that each method can have several different implementations with various introduced errors. The resulting generated application is a general Java application with few limitations and with the structure of an entire project (not only source code, but also libraries, additional files, and the folder structure). The common types of errors should also be identified during our future research. The tool will be used to create a set of several applications (with several versions each) with multiple introduced errors. This set will serve as the benchmark for automated test generation methods.
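The idea of a method with several implementations, each carrying a different introduced error, can be illustrated by a small sketch. It is not the design of TAG, which generates Java applications; Python is used here only for brevity. The triangle classification example, the two seeded errors, and all function names are hypothetical and chosen for illustration. A set of test data "detects" an introduced error if at least one input makes the faulty variant return a result different from the reference implementation.

```python
import random
from typing import Callable, Dict, Iterable, List, Tuple

Inputs = Tuple[int, int, int]
Method = Callable[[int, int, int], str]


def triangle_reference(a: int, b: int, c: int) -> str:
    """Reference implementation, assumed to be correct."""
    if a <= 0 or b <= 0 or c <= 0 or a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


def triangle_off_by_one(a: int, b: int, c: int) -> str:
    """Variant with a seeded boundary error: '<' is used instead of '<='."""
    if a <= 0 or b <= 0 or c <= 0 or a + b < c or a + c < b or b + c < a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


def triangle_missing_case(a: int, b: int, c: int) -> str:
    """Variant with a seeded logic error: the 'a == c' comparison is missing."""
    if a <= 0 or b <= 0 or c <= 0 or a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c:
        return "isosceles"
    return "scalene"


# Hypothetical registry of faulty variants of one method.
VARIANTS: Dict[str, Method] = {
    "off_by_one": triangle_off_by_one,
    "missing_case": triangle_missing_case,
}


def random_inputs(n: int, seed: int, low: int = 1, high: int = 10) -> List[Inputs]:
    """A trivial random test data generator, standing in for an evaluated method."""
    rng = random.Random(seed)
    return [
        (rng.randint(low, high), rng.randint(low, high), rng.randint(low, high))
        for _ in range(n)
    ]


def boundary_inputs() -> List[Inputs]:
    """Hand-picked test data, including degenerate and isosceles triangles."""
    return [(1, 1, 2), (2, 3, 2), (3, 3, 3), (3, 4, 5), (0, 1, 1)]


def detected_errors(test_data: Iterable[Inputs],
                    variants: Dict[str, Method]) -> Dict[str, bool]:
    """For each variant, report whether any input exposes its seeded error."""
    data = list(test_data)
    return {
        name: any(variant(*x) != triangle_reference(*x) for x in data)
        for name, variant in variants.items()
    }
```

Calling `detected_errors(random_inputs(20, seed=1), VARIANTS)` and `detected_errors(boundary_inputs(), VARIANTS)` yields, for each set of test data, which introduced errors were found. Counting the detected errors over many methods and variants gives one possible measure for comparing test data generators.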

Fig. 2 Main utilized technologies for individual testing levels

Fig. 3 Percentage of individual target platforms in primary studies

Fig. 4 Main utilized technologies for individual target platforms

Fig. 5 Number of available tools in individual years