Parallelized Population-Based Multi-Heuristic System with Reinforcement Learning for Solving Multi-Skill Resource-Constrained Project Scheduling Problem with Hierarchical Skills

Abstract. In this paper, a Parallelized Population-based Multi-Heuristic System controlled by a Reinforcement Learning based strategy is proposed to solve the Multi-Skill Resource-Constrained Project Scheduling Problem with Hierarchical Skills, denoted as MS-RCPSP. It is an extension of the classical RCPSP in which a given pool of skills has been assigned to the resources. The MS-RCPSP, like the RCPSP, belongs to the class of strongly NP-hard optimization problems. To solve the MS-RCPSP, an approach is proposed that evolves a population of solutions using a set of several heuristic algorithms controlled by the reinforcement learning strategy and executed in parallel. To implement the system and take advantage of the speed-up offered by parallel computations, the Apache Spark platform has been used. The system has been tested experimentally using benchmark problem instances from the iMOPSE dataset with the makespan as the optimization criterion. The proposed approach produces good-quality solutions, often outperforming the existing approaches.


I. INTRODUCTION
RESOURCE MANAGEMENT plays an important role in different domains. It involves planning, scheduling, and allocating various resources such as machines, technology, money, people, or teams to a project. In a majority of organizations, the task of determining schedules occurs regularly, often daily. Deciding on schedules requires the allocation of resources which are usually limited and not freely available. In project management, there are three basic types of constraints imposed on the availability of resources: time, cost, and scope constraints. Therefore, effectively utilizing scarce resources is important for the success of any project. Extensive research in project management has led to the proposal of different models and methods aimed at optimizing resource utilization to achieve project goals. Among several possible project management problem formulations, the best known and most intensively researched is the Resource-Constrained Project Scheduling Problem (RCPSP) and its numerous extensions. In recent years a large number of papers have reported on various methods of solving the RCPSP and its variants. Extensive reviews of this research effort can be found in [5], [6].
Among the possible RCPSP extensions focusing on the use of human resources is the idea of considering problems where completing a project requires various skills on the part of the human resources. The idea of considering multi-skill resource-constrained problems has been motivated by the practical needs of projects where staff with different skills is required and needs to be scheduled and assigned. In the MS-RCPSP, each human resource possesses a particular set of skills, which can be applied to those activities in the project that require such skills. The primary Multi-Skill Resource-Constrained Project Scheduling Problem (MSRCPSP) with skillsets was introduced in [24] and next considered, for example, in [3], [17], [1]. The most recent classification of the MSRCPSP and its extensions can be found in [27]. One such extension is the MSRCPSP with hierarchical skills proposed in [2] and commonly denoted in the literature as MS-RCPSP. It is based on both the classical RCPSP and the Multi-Purpose Machine Model Problem, and aims to find a schedule that optimizes a performance criterion such as the project duration, i.e. the makespan.
Both the MSRCPSP and the MS-RCPSP, as generalizations of the RCPSP, belong to the class of strongly NP-hard optimization problems [4]. Hence, most of the approaches in the literature apply metaheuristic algorithms. Examples of successful approaches include Ant Colony Optimization [20], Greedy Randomized Adaptive Search Procedure [21], [22], Teaching-Learning-Based Optimization [28], Differential Evolution and Greedy Algorithm [22], Genetic Programming [16], and Genetic Programming Hyper-Heuristic [16]. In [19] a bicriteria MS-RCPSP optimization variant was proposed, covering project duration and cost. In [23] a new benchmark dataset was made available for public use. The approaches proposed and made available in [19], [23] involved a Greedy Algorithm that optimizes schedule duration and a Greedy guided search controlled by a Genetic Algorithm for minimizing schedule duration and cost [23]. The Decomposition-Based Multi-Objective Genetic Programming Hyper-Heuristic has been proposed in [29].
Heuristic and meta-heuristic approaches have been an important and rapidly expanding area of research and development for many years. With the emergence of advanced technologies, multi-heuristics and hyper-heuristics are commonly used in various fields, including optimization problems, search algorithms, scheduling, routing, and, more generally, various artificial intelligence applications. They often involve selecting, combining, or switching between different heuristics based on certain conditions, problem characteristics, or performance metrics. They are also used in solving project scheduling problems [15], [18], [25], [29]. The idea behind a multi-heuristic approach is that different heuristics may be more effective or efficient in different parts of a problem's solution space. By employing a combination of heuristics, the approach can exploit their complementary strengths and mitigate their individual weaknesses. This can lead to improved problem-solving performance, such as faster convergence to a solution, better quality solutions, or increased robustness to different problem instances.
Multi-heuristic approaches can be made more efficient by parallel computations, which are commonly used in optimization. Parallel computations provide significant advantages in terms of speed, scalability, solution space exploration, and handling complex problem structures. By leveraging multiple processing units or distributed computing resources, parallel algorithms can efficiently solve optimization problems, leading to improved performance, faster convergence, and better quality solutions. One of the tools for parallel computing is Apache Spark. Apache Spark is an open-source framework for processing big data in parallel across clusters or cloud architectures. Spark's core data structure is the Resilient Distributed Dataset (RDD), which improves the performance of iterative algorithms and data mining tools. The platform automatically handles program distribution and data splitting. Spark's scheduler optimizes operations using data locality and lazy evaluation.
In this paper a Parallelized Population-based Multi-Heuristic system controlled by Reinforcement Learning (PPMHRL) for solving the MS-RCPSP is proposed, implemented and validated. The approach belongs to the class of population-based metaheuristics. It is based on five optimization heuristic algorithms controlled by a strategy based on the Reinforcement Learning technique. The heuristic algorithms include three types of local search algorithms, a path relinking algorithm and an exact-solution-based heuristic. To implement this approach, the Scala language, the Apache Spark framework and RDD collections have been used. The proposed approach has been tested experimentally using benchmark instances from the iMOPSE [30] library. Makespan minimization has been used as the optimization criterion.
The paper is organized as follows. Section II contains the formulation of the MS-RCPSP problem. Section III provides a description of the proposed Multi-Heuristic Population Based Approach with Reinforcement Learning for solving instances of the MS-RCPSP. The section also contains a description of the optimization heuristic algorithms used: local search and path relinking. Section IV describes the computational experiment carried out, including the parameter settings, the experiment plan, the experiment results, and comparisons of the results with some other approaches. Finally, Section V contains conclusions and suggestions for future research.

II. PROBLEM FORMULATION
In the paper, we consider the project management problem where activities to be executed require skills, and the available multi-skilled resources possess these skills.
The considered Multi-Skill Resource-Constrained Project Scheduling Problem with hierarchical skills can be described using the classification scheme proposed in [7] for scheduling problems. An extension of this classification scheme that allows the representation of multi-skilled resource-constrained project scheduling problems and their extensions was recently proposed in [27]. The considered problem class is denoted as $ms, 1, H, TR, Flex \mid cpm, 1 \mid C_{max}, C$.
In the MS-RCPSP problem a set of $n$ activities (tasks) and $m$ renewable resource types are considered. Each activity has to be processed without interruption to complete the project. The duration of activity $a_j$, $j = 1, \ldots, n$, is denoted by $d_j$. The types of resources represent human staff with different skills. Every resource $r_k$, $k = 1, \ldots, m$, possesses a subset of skills $Q_k$ from the skill pool $Q$ defined in a project, and is paid for the performed work at an hourly rate (cost) $c_k$. In a given period of time, only one resource can be assigned to a given activity.
Each activity requires a set of skills to be executed, denoted as $Q_j$, but not every resource can be applied to its realization. Each resource skill is labelled with a familiarity level; that is, the resource $r_k$ is capable of performing the activity $a_j$ only if $r_k$ possesses the skill required by $a_j$ at the same or a higher level.
There are precedence relations of the finish-start type with a zero parameter value (i.e. $FS = 0$) defined between the activities in the project. In other words, activity $a_j$ precedes activity $a_i$ if $a_i$ cannot start until $a_j$ has been completed. $S_j$ ($P_j$) is the set of successors (predecessors) of activity $a_j$, $j = 1, \ldots, n$.
The objective is to find a schedule $S$ of the project activities' finishing times $[f_1, \ldots, f_n]$, where the resource and precedence constraints are satisfied, such that the schedule duration (makespan) $MS(S) = s_n$ is minimized.
Since the MS-RCPSP is a generalization of the RCPSP, it belongs to the class of the strongly NP-hard problems [4], [19].
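The hierarchical skill constraint described above can be illustrated with a small data model. The following is a minimal sketch in Python, written for illustration only and not taken from the Scala implementation discussed in this paper. The class and function names (`Resource`, `Activity`, `can_perform`, `capable_resources`) are hypothetical, and familiarity levels are assumed to be positive integers, so that level 0 stands for a missing skill.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    hourly_rate: float   # c_k, the cost of one hour of work
    skills: dict         # Q_k as a mapping: skill name -> familiarity level


@dataclass
class Activity:
    name: str
    duration: int        # d_j
    required: dict       # Q_j as a mapping: skill name -> minimum level
    predecessors: list = field(default_factory=list)   # P_j


def can_perform(resource, activity):
    """True if the resource has every required skill at the same or a higher level."""
    return all(resource.skills.get(skill, 0) >= level
               for skill, level in activity.required.items())


def capable_resources(resources, activity):
    """Return the resources that are allowed to execute the activity."""
    return [r for r in resources if can_perform(r, activity)]
```

For example, a resource with skill level 3 may execute an activity requiring level 2 of the same skill, whereas a resource with level 1 may not.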

III. MULTI-HEURISTIC POPULATION BASED APPROACH WITH REINFORCEMENT LEARNING

A. Apache Spark based Implementation
To implement the proposed system, the Scala language and the Apache Spark environment have been used. Apache Spark is an open-source framework designed for processing big data in parallel across clusters or cloud architectures. It prioritizes ease of use and leverages data locality to optimize computations while maintaining the required fault tolerance. Apache Spark is currently one of the most popular and fastest distributed computing frameworks, and it stands out as the largest open-source project in data processing.
The architecture of Spark involves a master node and multiple worker nodes. The master node handles task scheduling, resource allocation, and error management, while the worker nodes perform parallel processing of Map and Reduce tasks. The platform automatically handles program distribution and data splitting for the users.
The core data structure in Spark is the Resilient Distributed Dataset (RDD). RDD collections support distributed, parallel computation of iterative algorithms and interactive data mining tools. They make it possible to use parallel data structures and parallel computing.
Spark's scheduler efficiently executes operations specified by RDDs, exploiting data locality to avoid producing unnecessary data copies between nodes. RDDs are so-called lazily evaluated structures, meaning that the operations are performed only when the result is requested. This increases the efficiency of parallel computing. Because evaluation is deferred, Spark can optimize the execution of the transformation graph, for example by pipelining consecutive operations. RDDs also enable efficient fault tolerance by tracking the history of transformations rather than duplicating data between nodes.
The proposed Parallelized Population-based Multi-Heuristic system controlled by the Reinforcement Learning (RL) strategy is denoted as PPMHRL. The PPMHRL uses the parallel computing capabilities of Spark to improve solutions of MS-RCPSP problem instances stored in a population. To use the Spark capabilities efficiently, its built-in parallelization mechanism has been used. To solve the MS-RCPSP, a population of solutions, optimization heuristic algorithms and a control strategy have been proposed and implemented. Individuals from the population of solutions are improved by optimization heuristic algorithms controlled by the RL strategy. The proposed optimization heuristic algorithms are described in the following subsection.

B. Optimization Heuristic Algorithms Solving MS-RCPSP
To solve the MS-RCPSP with makespan minimization, heuristic algorithms coded in the Scala language have been used. The algorithms proposed in [12] have been improved and adjusted to the new system. Five kinds of optimization heuristic algorithms are used:
• LSAm - local search algorithm based on moving activities,
• LSAe - local search algorithm based on exchanging pairs of activities,
• LSAc - local search algorithm based on the one-point crossover operator,
• PRA - path-relinking algorithm,
• EPTA - exact precedence tree algorithm.
The above-mentioned LSA is a simple local search algorithm which finds a local optimum by moving activities (LSAm) or exchanging pairs of activities (LSAe) in the solution schedule. Simultaneously, the necessary change of assigned resources is checked and performed. In one iteration all possible moves or exchanges are checked and the best one is carried out. The best solution found is remembered. The only parameters of these algorithms are:
• $maxIt_{LSAm}$ - the maximum number of iterations without improvement for moving activities,
• $maxIt_{LSAe}$ - the maximum number of iterations without improvement for exchanging activities.
The LSAc is an LSA based on the one-point crossover operator applied to a pair of solutions. The crossover operation can be applied at each crossing point. Hence, for a project with $n$ activities, at most $n - 2$ crossing points can be checked. Because this may be too time-consuming for some projects, the algorithm stops after a fixed number of iterations without improvement. The best solution found is remembered. The only parameter of this algorithm is:
• $maxIt_{LSAc}$ - the maximum number of iterations without improvement.
The PRA is a path-relinking algorithm in which a path is constructed between a pair of solutions from the population. Next, the best of the feasible solutions on the path is selected. To construct the path of solutions, the activities are moved to other possible places in the schedule. All possible moves are checked. Only feasible solutions are accepted. The best solution found is remembered. The algorithm has no parameters.
The EPTA is an exact precedence tree algorithm based on the concept of finding an optimum solution by enumeration for a partition of the schedule consisting of some activities. The implementation proposed for the RCPSP [9] has been adapted for solving the MS-RCPSP by adjusting the constraints to hierarchical skill levels. An exact solution is found for a part of the schedule beginning from the activity at a chosen position. The activity position is chosen randomly without repetition. The best solution found is remembered. The algorithm has two parameters:
• $maxIt_{EPTA}$ - the maximum number of iterations without improvement,
• $nPart_{EPTA}$ - the size of the schedule partition for which the exact solution is found.
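The move-based local search (LSAm) described above can be sketched as follows. This is a simplified illustration in Python and not the authors' Scala code: it works on an activity list only, omits the accompanying change of assigned resources, and takes the objective as a caller-supplied function `evaluate` (for the MS-RCPSP this would be the makespan of the decoded schedule). All function names are hypothetical.

```python
def is_precedence_feasible(order, predecessors):
    """Check that every activity appears after all of its predecessors."""
    position = {a: i for i, a in enumerate(order)}
    return all(position[p] < position[a]
               for a in order for p in predecessors.get(a, ()))


def move_neighbours(order, predecessors):
    """Yield every precedence-feasible list obtained by moving one activity."""
    for i in range(len(order)):
        rest = order[:i] + order[i + 1:]
        for j in range(len(order)):
            if j == i:
                continue   # re-inserting at the same place changes nothing
            candidate = rest[:j] + [order[i]] + rest[j:]
            if is_precedence_feasible(candidate, predecessors):
                yield candidate


def local_search_move(order, predecessors, evaluate, max_it=20):
    """Carry out the best move in each iteration and remember the best solution.

    Stops after max_it consecutive iterations without improvement.
    """
    best, best_value = list(order), evaluate(order)
    current, idle = list(order), 0
    while idle < max_it:
        neighbours = list(move_neighbours(current, predecessors))
        if not neighbours:
            break
        current = min(neighbours, key=evaluate)   # best move of this iteration
        value = evaluate(current)
        if value < best_value:
            best, best_value, idle = current, value, 0
        else:
            idle += 1
    return best, best_value
```

The exchange-based variant (LSAe) would differ only in the neighbourhood generator, which would swap pairs of activities instead of moving single ones.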

C. Architecture of the PPMHRL System
The Parallelized Population-based Multi-Heuristic system controlled by the Reinforcement Learning strategy (PPMHRL) searches for solutions of the MS-RCPSP using a set of improvement heuristic algorithms. The initial population of solutions (individuals) is generated using a random priority rule and the serial forward SGS (Schedule Generation Scheme). An individual is represented by the sequence of activities with resources assigned. To generate a solution from the sequence, the serial forward SGS is used.
EWA RATAJCZAK-ROPEL, PIOTR JĘDRZEJOWICZ: PARALLELIZED POPULATION-BASED MULTI-HEURISTIC SYSTEM WITH REINFORCEMENT LEARNING
Individuals from the population are, at the following computation stages, improved by the optimization heuristic algorithms described in Section III-B. The behaviour of the system is controlled by the control strategy, which defines parameters and methods for the whole system and is based on Reinforcement Learning.
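The decoding of an activity sequence into a schedule by the serial forward SGS can be sketched as below. This is a deliberately simplified Python illustration under stated assumptions, not the procedure implemented in the system: each activity is given to the capable resource that allows the earliest start (in the paper the resources are part of the individual), a resource becomes available again only after its last assigned activity (no insertion into earlier idle gaps), and the names `serial_sgs`, `makespan` and `capable` are hypothetical.

```python
def serial_sgs(order, durations, predecessors, capable):
    """Build a schedule from a precedence-feasible activity list.

    capable[a] lists the resources able to perform activity a.
    Returns the dictionaries (start, finish, assigned).
    """
    free_at = {}                       # resource -> time at which it becomes idle
    start, finish, assigned = {}, {}, {}
    for a in order:
        # An activity may start only after all its predecessors have finished.
        ready = max((finish[p] for p in predecessors.get(a, ())), default=0)
        # Assumption: pick the capable resource that permits the earliest start.
        r = min(capable[a], key=lambda res: max(ready, free_at.get(res, 0)))
        start[a] = max(ready, free_at.get(r, 0))
        finish[a] = start[a] + durations[a]
        free_at[r] = finish[a]
        assigned[a] = r
    return start, finish, assigned


def makespan(finish):
    """Schedule duration: the latest finishing time of any activity."""
    return max(finish.values(), default=0)
```

Because the activity list is processed in order and each activity is placed after its predecessors, the resulting schedule is feasible whenever the list itself respects the precedence relations and every activity has at least one capable resource.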
The set of priority rules used includes ones known for the RCPSP and ones proposed for the MS-RCPSP [13], [14], [15], [12].
To implement the approach in Spark, two main RDD collections are used: one to store the individuals in the population and a second one to store tuples. Each tuple contains one or two solutions and the algorithm that has been assigned to improve them. The control strategy is responsible for selecting solutions and assigning algorithms to them. The system state is stored and used by the control strategy to manage the process of searching for the best solution effectively. The general schema of the proposed approach can be seen in Fig. 1.
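The role of the tuple collection can be illustrated with a small stand-in. The sketch below is written in Python with a thread pool playing the part that a map transformation over an RDD plays in Spark; it is not Spark code and not the authors' implementation, and it shows only the structure of the computation (in CPython, threads do not speed up CPU-bound work). The names `improve` and `improve_in_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def improve(task):
    """Apply the assigned algorithm to the solution stored in the tuple."""
    solution, algorithm = task
    return algorithm(solution)


def improve_in_parallel(tasks, workers=4):
    """Map `improve` over all (solution, algorithm) tuples, keeping their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(improve, tasks))
```

Each tuple is processed independently of the others, which is what makes the improvement stage straightforward to distribute.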
The Reinforcement Learning (RL) based cooperation strategy to control agents was proposed in [10], [11] for the RCPSP and next partly adapted in the approach to solving the MS-RCPSP by an Asynchronous Team (A-Team) of agents in a Multi-Agent System (ATMAS) described in [12]. In the approach proposed in this paper, the concept of the RL based strategy has been used to control the execution of the optimization heuristic algorithms using RDD collections in the Spark environment. To describe the approach in more detail, the following notation is used:
• nSGS - number of SGS procedure calls,
• maxSGS - maximum number of SGS procedure calls,
• avgDiv - average diversity in the population $P$,
• minAvgDiv - minimum average diversity in $P$,
• $nS_{new}$ - number of newly generated solutions,
• pRA - percentage of solutions to be removed from the population.
Additionally, to control the system, two probability measures are used:
• $p_{mg}$ - probability of selecting the method $mg \in Mg$ to generate a new individual,
• $p_{ma}$ - probability of selecting the optimization algorithm $ma \in Ma$ used to improve individuals in $P$.
To generate new individuals, the following four methods have been proposed:
• mgr - randomly,
• mgrp - randomly using a random priority rule,
• mgb - random changes of the best individual in $P$,
• mgw - random changes of the worst individual in $P$.
For each method the weight $w_{mg}$ is calculated, where $mg \in Mg$, $Mg = \{mgr, mgrp, mgb, mgw\}$. The weights $w_{mgr}$ and $w_{mgrp}$ are increased when the population average diversity decreases, and they are decreased in the opposite case. The weights $w_{mgb}$ and $w_{mgw}$ are decreased when the population average diversity increases, and they are increased in the opposite case.
There are five optimization heuristic algorithms described above. For each of them the weight $w_{ma}$ is calculated, where $ma \in Ma$, $Ma = \{maLSAm, maLSAe, maLSAc, maPRA, maEPTA\}$. The weight $w_{ma}$ is increased if the optimization algorithm has returned an improved solution and is decreased if this is not the case. Additionally, the weights for maLSAc and maPRA are increased when the average diversity of the population decreases, and they are decreased in the opposite case. The weights for maLSAm, maLSAe and maEPTA are increased when the average diversity of the population increases, and they are decreased in the opposite case.
The probabilities $p_{mg}$ and $p_{ma}$ of selecting the methods are calculated from the corresponding weights $w_{mg}$ and $w_{ma}$. The parameter settings and the resulting probabilities make it possible to control the system behaviour and to balance the exploration and exploitation processes. The probability $p_{ma}$ is used in selMet for selecting the optimization algorithm for individuals from the population that are subject to an intended improvement. The method is shown as Algorithm 1.
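One common way of turning weights into selection probabilities is to normalize them so that they sum to one and then to draw a method by roulette-wheel selection. The following Python sketch illustrates this under the explicit assumption that such a normalization is used; it is not necessarily the formula applied in the PPMHRL, and the function names are hypothetical.

```python
import random


def selection_probabilities(weights):
    """Assumption: p_x = w_x / sum of all weights (weights must be positive)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def select_method(probabilities, rng=random):
    """Roulette-wheel selection of one method name according to its probability."""
    names = list(probabilities)
    return rng.choices(names, weights=[probabilities[n] for n in names], k=1)[0]
```

With this scheme, increasing the weight of an algorithm that has just delivered an improved solution raises the chance that it is selected again, while every algorithm with a positive weight keeps a non-zero probability of being tried.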

Algorithm 1 selMet(P)
generate RDD collection
arrange the solutions in P in random order
for all solutions in P do
    ...
    end if
    add the tuple (as, ma) to RDD
end for
return RDD

The probability $p_{mg}$ is used in the merMet method to merge the old population with the new one created in the improvement stage by an RDD transformation. The pseudocode of the merMet method is presented as Algorithm 2.
It can be noticed that, as a result, the better solutions received from the optimization algorithms replace the worse ones in the population. Moreover, new solutions are generated in each stage with the calculated probability according to genMet, presented as Algorithm 3.
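The replacement of worse solutions by better ones can be sketched as follows. The Python code below is an illustration under assumptions and not a transcription of Algorithm 2: an improved solution is assumed to replace its parent, or the worse of its two parents, only when it has a smaller objective value. The name `merge` and the representation of improved solutions as pairs (solution, parent indices) are hypothetical.

```python
def merge(population, improved, evaluate):
    """Return a new population in which better offspring replace their parents.

    improved is a list of (new_solution, parent_indices) pairs, where
    parent_indices holds one or two positions in the population.
    """
    merged = list(population)
    for new, parents in improved:
        # With two parents, the candidate for replacement is the worse one.
        worst = max(parents, key=lambda i: evaluate(merged[i]))
        if evaluate(new) < evaluate(merged[worst]):
            merged[worst] = new
    return merged
```

In the usage below, solutions are represented directly by their makespans, so `evaluate` is the identity function.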
All decreasing and increasing operations are performed following the proposed control strategy. The stopping criterion is based on the average diversity in the population, avgDiv(P), and the maximal number of SGS procedure calls.

IV. COMPUTATIONAL EXPERIMENT

A. Problem instances
To evaluate the effectiveness of the proposed approach, a computational experiment has been carried out using the benchmark instances of the MS-RCPSP accessible as a part of the Intelligent Multi Objective Project Scheduling Environment (iMOPSE) [30]. The test set includes 36 instances. The detailed descriptions and benchmark data analyses can be found in [19], [23].

Algorithm 2 merMet(P, P_n, pRA)
for each solution S_n in P_n do
    ...
        add S_n to P
    end if
    ...
    end if
    if S_n is obtained from S_k1 and S_k2 in P then
        ...
        ... S_k2 then
            add S_n to P
        ...

B. Settings
The computational experiment has been carried out using an Intel Core i7 quad-core 2.6 GHz CPU with 16 GB RAM. The PPMHRL is coded in Scala using the Apache Spark environment. In the experiment the following parameter values have been used:
• population P of 30 and 50 solutions,
• 5 optimization heuristic algorithms: LSAm, LSAe, LSAc, PRA, EPTA, using the following parameters:
  - $maxIt_{LSAm} = 20$,
  - $maxIt_{LSAe} = 20$,
  - $maxIt_{LSAc} = 10$,
  - $maxIt_{EPTA} = 10$,
  - $nPart_{EPTA} = 3$,
• maxSGS = 100000 - maximal number of SGS procedure calls,
• minAvgDiv = 0.01 - minimal average diversity in the population,
• pRA = 10% - the given initial value is decreased when avgDiv has increased and increased in the opposite case.
Computations are stopped when the average diversity in the population is less than minAvgDiv or the number of SGS procedure calls is greater than maxSGS.

C. Results
During the experiment the following characteristics of the computational results have been calculated and recorded: best schedule duration (makespan) (Best), average schedule duration (AVG) and standard deviation (STD). Each problem instance has been solved 10 times and the results have been averaged over these solutions.
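The three reported characteristics can be computed from the makespans of the repeated runs as in the Python sketch below. The function name `summarize` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not state whether the population or the sample formula was applied.

```python
from statistics import mean, pstdev


def summarize(makespans):
    """Return Best, AVG and STD for the makespans obtained in repeated runs."""
    return {
        "Best": min(makespans),
        "AVG": mean(makespans),
        "STD": pstdev(makespans),   # assumption: population standard deviation
    }
```

For instance, for four made-up makespans 100, 102, 98 and 100 (illustrative values, not experimental results) the sketch gives Best = 98 and AVG = 100.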
The computational experiment results obtained by the proposed Parallelized Population-based Multi-Heuristic system controlled by the Reinforcement Learning strategy (PPMHRL) for population sizes of 30 and 50 individuals are presented in Table I.
The results obtained by the PPMHRL are good and promising. The results for the population of 50 individuals are better than for the population of 30. On average, the Best result is better by 1.9% and the AVG by 1.7%, and at the same time the STD is slightly lower. The results for both considered population sizes outperform the earlier approach based on the A-Team Multi-Agent Algorithm [12]; however, in the present approach the optimization heuristic algorithms have been modified and one additional optimization algorithm has been used. A comparison with the results known from the literature is presented in Table II. It can be noticed that the results produced by the proposed approach are comparable with the results from several recently published papers. Among several algorithms proposed for solving MS-RCPSP instances, one seems outstanding and outperforms all others, including the proposed one. It is also a population-based algorithm, with the search for the best solution enhanced by a hyper-heuristic proposed in [24]. The best makespan value for the GP-HH algorithm is better on average by 0.7% as compared with our approach. It should be noted that the difference in performance between the proposed approach and the GP-HH one gets smaller, or even disappears, as the number of activities increases.

V. CONCLUSION
The results of the computational experiment show that the proposed Parallelized Population-based Multi-Heuristic System controlled by the Reinforcement Learning strategy (PPMHRL) is an efficient and competitive tool for solving MS-RCPSP instances. The obtained results are comparable with solutions presented in the literature.
We believe that there is still room for further improvement of the proposed approach. Future research will focus on finding more effective methods for tuning the parameters of the optimization algorithms. The use of reinforcement learning techniques could be further refined by finding better rules for controlling the number of iterations, population merging, and generating new individuals. Another performance improvement can be expected from running the solution procedure on a powerful computer cluster that can easily handle a bigger population of individuals and thus profit from the scale and synergy of interactions between optimization agents. It would also be worthwhile to investigate using different types and numbers of optimization algorithms.

Fig. 1. Proposed PPMHRL system architecture schema
All proposed algorithms search for feasible solutions only, and only feasible solutions are stored in the population.
Priority rules used to generate the initial population:
• SPT - Shortest Processing Time first,
• LPT - Longest Processing Time first,
• EST - Earliest Start Time first,
• EFT - Earliest Finish Time first,
• LST - Latest Start Time last,
• LFT - Latest Finish Time last,
• HLSR - Highest Level of Skill Required first - activities are sorted by the level of skill required and SPT, which means that activities with the same level are sorted according to SPT,
• LLSR - Lowest Level of Skill Required last,
• MRS - Most Required Skills first - for each skill in the project the sum of durations of activities which need this skill is calculated; next, the activities are sorted by the duration of the required skill and LPT,
• LRS - Least Required Skills last - for each skill in the project the sum of durations of activities which need this skill is calculated; next, the activities are sorted by the duration of the required skill and LPT.
Algorithm 3 genMet
for each mg in Mg do
    generate pRA · |P| · p_mg solutions using the mg method and add them to RDD
end for
return RDD

The instances represent projects consisting of 78 to 200 activities. The file names of the instances have the form n_m_pr_st.def, where n is the number of activities, m the number of resources, pr the number of precedence relations and st the number of skill types.
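The naming convention of the instance files can be decoded mechanically. The Python sketch below splits a name of the form n_m_pr_st.def into its four components; the function name `parse_instance_name` and the dictionary keys are hypothetical, and the file name used in the example serves only as an illustration of the pattern.

```python
from pathlib import PurePath


def parse_instance_name(file_name):
    """Split an instance file name of the form n_m_pr_st.def into its parts."""
    stem = PurePath(file_name).stem              # drop the ".def" extension
    n, m, pr, st = (int(part) for part in stem.split("_"))
    return {
        "activities": n,
        "resources": m,
        "precedence_relations": pr,
        "skill_types": st,
    }
```

For a name such as 100_5_22_15.def the function reports 100 activities, 5 resources, 22 precedence relations and 15 skill types.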

TABLE I. PERFORMANCE OF THE PROPOSED PPMHRL SYSTEM IN TERMS OF SCHEDULE DURATION (MAKESPAN).

TABLE II. COMPARISON WITH THE RESULTS KNOWN FROM THE LITERATURE IN TERMS OF SCHEDULE DURATION (MAKESPAN).