A new benchmark dataset for Multi-Skill Resource-Constrained Project Scheduling Problem

In this paper, novel project scheduling difficulty estimations are proposed for the Multi-Skill Resource-Constrained Project Scheduling Problem (MS-RCPSP). The main goal of introducing these estimations is to assess project complexity before launching the optimization process. In addition, a dataset instance generator is presented as a tool for creating new instances and extending the research area. The dataset proposed in previous works is extended with new instances, described thoroughly and released as a benchmark dataset. The dataset instances are also scheduled using a simple heuristic and a greedy algorithm in duration- and cost-oriented optimization modes. Finally, a brief summary of the investigated methods and potential further research directions is presented.


I. INTRODUCTION
The Resource-Constrained Project Scheduling Problem (RCPSP) is one of the most important and most investigated [9], [10] types of scheduling problems. This is because of its practical nature and the need to find good ways of solving it, not only for scientific but also for industrial purposes. Its goal is to find the best schedule for a project by assigning scarce resources to defined tasks. The quality of a schedule is mostly defined as its duration, its cost or some combination of those indicators.
As MS-RCPSP is an extension of the classical RCPSP, it is NP-hard [2]. Hence, no method is known that finds the optimal solution in polynomial time. Therefore, one of the main approaches to solving RCPSP and its extensions is to use soft computing methods, especially metaheuristics [19].
To make the problem definition more practical from an industrial point of view, we introduced the skill domain. Each task requires a specified skill and can be performed only by a resource owning that skill; each resource owns some subset of the skills defined in the project. Therefore, not every resource is able to perform every task in the project. This makes the problem more constrained but, on the other hand, more realistic. RCPSP extended by the skill domain is called Multi-Skill RCPSP (MS-RCPSP).
The goal of the paper is to present several indicators of the difficulty of a project to be scheduled. The difficulty can be understood as a measure of how much the solution space is constrained, that is, how hard it is to build a feasible and good enough schedule. The secondary objective is to share the dataset and propose it as a benchmark for other researchers, to build a common platform for evaluating methods that solve MS-RCPSP.
The rest of the paper is organised as follows. Section II presents approaches to solving MS-RCPSP, including the metaheuristics mentioned above. Section III describes the problem statement. Section IV presents the complexity estimations we propose for MS-RCPSP. Section V describes how new instances are generated. Section VI presents the dataset, whose instances are used in the experiments in Section VII. Finally, Section VIII describes approaches we have recently investigated and proposes directions for further research.

II. RELATED WORK
The NP-hard [2], combinatorial nature of MS-RCPSP is one of the reasons for the common use of metaheuristic-based approaches in solving the problem. Nevertheless, some constraint programming methods and simpler heuristics are also used to solve this kind of problem [20].
However, there is still a lack of papers on the multi-objective Multi-Skill extension of RCPSP. Some approaches solve MS-RCPSP in the project duration domain [1], [16] or the project cost domain [12]. On the other hand, there are methods that solve the classical RCPSP extended by the cost domain, but without skill considerations. Such research has been presented in [15], [6], [4], [13] and [24]. Hence, we decided to combine those two elements: multi-objective optimization and the multi-skill domain for the project scheduling problem.
Although the classical RCPSP is deeply investigated and numerous approaches can be easily compared using PSPLIB instances, it is very hard to find multi-objective MS-RCPSP methods working on datasets that could be regarded as a benchmark. Some papers describe artificially generated instances ([5], [16]), while others propose methods of adapting the PSPLIB dataset ([1], [3], [7], [12]). We analysed some published benchmark datasets, but they were usually unsuitable for our approach, as they do not cover the multi-objective nature of the problem even where the multi-skill domain has been developed ([23]). Another unsuitable example is the benchmark for Multi-Mode RCPSP (MM-RCPSP) published in [21]: it does not involve skill constraints and focuses on multi-mode characteristics. Hence, the need to define a new dataset has arisen. Some difficulty estimations for project scheduling problems can be found in [11] or [13]. However, those estimations are based mostly on tasks, precedence relations between them and resource properties. There is a lack of difficulty estimations dedicated to MS-RCPSP that involve the skill domain.
Among the many papers on resource-constrained project scheduling problems and their extensions, we found that our work has much in common with the approach presented in [13]. Despite some similarities, there are crucial elements that make our approach, defined in detail in [14], different. We regard both of them as worth investigating. Tab. I presents the comparison (similarities and differences) in the four main areas we decided to point out.
Based on the information in Tab. I, some common elements between our approach and the one presented in [13] can be found in all of the mentioned areas. Firstly, the investigated problems are similar in that both concern multi-objective optimization in the Multi-Skill Resource-Constrained Project Scheduling Problem (MS-RCPSP). Both problems are additionally defined by resource, precedence and skill constraints. However, those constraints differ in details (e.g. in distinguishing skill types).
There are also some similarities regarding the published datasets. First of all, both datasets are published on the Internet, so anyone has access to the dataset instances and can use them to investigate their own optimization methods for MS-RCPSP. What is more, the number of dataset instances is the same. However, the strategy of building the dataset was different. We used information about the number of different skill types and the number of precedence relations, not only the most common indicators, like the number of tasks or the number of resources, which can be found in [13]. Based on those additional indicators we tried to build a dataset that is as balanced as possible. Therefore, we tried to adjust the number of resources, skills and precedence relations so as to make our complexity indicators similar for the 100-task and 200-task project instances.

III. PROBLEM STATEMENT
The goal of MS-RCPSP is to order a given set of tasks and assign resources to them so as to provide a feasible solution that is as good as possible. The quality of the solution can be measured by its duration, cost or any other measure defined according to business requirements.
In MS-RCPSP a set of tasks ($J$) is given, and every task has to be performed during the project execution. Each task $j$ is described by its start ($S_j$) and finish ($F_j$) dates, its duration ($d_j$) and the skill required to perform it. What is more, tasks can be related to one another by precedence relations. This means that some tasks (successors) cannot start before some others (predecessors) are finished. This makes the solution space for a given instance more constrained, as there are fewer possibilities for placing the tasks in a given period. The predecessors of a given task $j$ are denoted as $P_j$, while the overall number of precedence relations in a project is $p$.
Furthermore, a set of resources ($K$) is also given. Every resource $k$ is described by its salary ($s_k$) and the skills it covers ($Q_k$). Therefore, the subset of tasks that can be performed by resource $k$ can be obtained; it is denoted as $J_k$.
The skill required by a task determines which resources can be assigned to it. Every skill type can appear in a project at various familiarity levels, denoted as an integer value from 0 to 4. A resource that has the skill type required by a given task, but at a lower level than required, cannot be assigned to that task. The number of all skills (including different familiarity levels) is denoted as q, while the number of skill types in the project is denoted as q.
What is more, no resource can be assigned to more than one task in overlapping periods. If such a situation occurs, a conflict is detected and has to be resolved to obtain a valid, feasible solution. This is done by shifting one of the conflicting tasks along the timeline so that it starts just after the other conflicting task is finished. The decision which of the conflicting tasks is shifted is made by checking which of them was added to the project definition earlier, because we do not distinguish various levels of task priority: each task is equally important to be scheduled in the project.
Any project schedule ($PS$) has to be conflict-free and has to satisfy the precedence constraints between tasks. If it satisfies both kinds of constraints, we call it a feasible schedule. Only feasible schedules can be regarded as final solutions. What is more, every infeasible schedule can be made feasible. However, making a schedule feasible can make its duration longer.

A. Calendar restrictions
Because Microsoft Project was used as the basis for our dataset, we had to take into account some calendar restrictions that are strictly related to that software.
First of all, the standard calendar in Microsoft Project is designed to handle real-life projects, where the classical five-day working week is used. It was also a requirement of the Volvo IT enterprise, with which we cooperate in the research field of project scheduling. Weekends are nevertheless taken into account. If a resource is assigned to a task that cannot be finished before the weekend, the task will be finished after the weekend. The calendar duration of the task is longer, but the number of man-hours (or man-days) required for the task does not change. Therefore, various project duration measures can be obtained. One is based on the overall duration of the project between its start and finish dates, including weekends, when tasks are not performed but which influence other tasks' start dates and the project finish. Another approach is to ignore holidays and weekends and assume a seven-day working week. So far we have preferred the first approach, as it is more practical.
Furthermore, localization has to be taken into account when considering calendar restrictions. Depending on the localization settings, the calendar can change to reflect national or cultural holidays. For example, the non-working days in China differ from those in Poland, where different holidays are observed. Combining the above constraints with a potentially dynamic project start date (the date when the first task is assigned to a resource in the timeline), there is a risk that the same project with the same task-to-resource assignments (schedule) finishes on different dates, depending on the start day. Project instances start on various dates, except the ones with the D* suffix, which were prepared strictly for the given enterprise and were all required to start on the same day, set to 12 April 2012.
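The dependence of the finish date on the start day can be illustrated with a short Python sketch. It assumes a plain Monday-to-Friday calendar without public holidays and task lengths expressed in whole working days; the helper `finish_date` is hypothetical and does not reproduce the Microsoft Project calendar engine.

```python
from datetime import date, timedelta


def finish_date(start: date, work_days: int) -> date:
    """Return the calendar day on which a task ends (assumed Mon-Fri week)."""
    if work_days < 1:
        raise ValueError("a task needs at least one working day")
    day = start
    # If the task is started on a weekend, move to the next working day.
    while day.weekday() >= 5:
        day += timedelta(days=1)
    remaining = work_days - 1  # the first working day is already counted
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Saturday (5) and Sunday (6) are skipped
            remaining -= 1
    return day


# The same three-day task, started on two different days of the same week.
monday_start = finish_date(date(2012, 4, 9), 3)
thursday_start = finish_date(date(2012, 4, 12), 3)
```

Under these assumptions, a task needing three working days ends on 11 April 2012 when started on Monday 9 April, but on 16 April when started on Thursday 12 April, although the number of man-days is identical.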
To avoid those calendar restrictions, the .def format has been introduced. It is described in detail in Subsec. VI-A.

B. Evaluation function
The goal of MS-RCPSP is to find the best (as quick and/or as cheap as possible) final project schedule. Hence, we can present it as a bi-objective optimization problem. Because the domains of duration and cost are totally different, we cannot simply aggregate those two objectives. Therefore, a normalization process is performed to bring both values into the range between 0 and 1. This allows us to aggregate the two objectives and combine them into one evaluation function.
We have also preserved the possibility of choosing which objective is more important in a given optimization process. This is done by setting weights for both the duration ($w_\tau$) and the cost aspect. Each weight takes a value from 0 to 1 and the two weights sum to 1. This means that setting the weight of the duration aspect to 1 automatically sets the weight of cost to 0 and vice versa. The weight can be any real value in this range. In particular, both weights can be set to 0.5, in which case both objectives are equally important in the optimization process. We proposed three baseline weight configurations: duration optimization (DO, $w_\tau = 1$), balanced optimization (BO, $w_\tau = 0.5$) and cost optimization (CO, $w_\tau = 0$) [14].
An important remark is that those objectives are opposed to each other. Setting the weights to make the optimization process more cost-oriented can yield a cheaper project schedule, but with the risk that the final schedule is longer. Analogously, a shorter project schedule can be obtained by spending more money on it.
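The evaluation function is formulated as follows:

$$f(PS) = w_\tau f_\tau(PS) + (1 - w_\tau) f_c(PS) \quad (1)$$

where: $w_\tau$ is the weight of the duration component, $f_\tau(PS)$ is the duration evaluation component and $f_c(PS)$ is the cost evaluation component. Both components are non-negative values, while $w_\tau \in [0, 1]$.
The time component $f_\tau(PS)$ is calculated as follows:

$$f_\tau(PS) = \frac{\tau}{\tau_{max}} \quad (2)$$

where: $\tau$ is the duration of the schedule $PS$ and $\tau_{max}$ is the maximal (pessimistic) possible duration of the schedule $PS$, computed as the sum of all tasks' durations [14].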
It occurs when all tasks in the project are performed serially, one by one, no matter how many resources there are and how flexible they are.
The cost component $f_c(PS)$ is defined as follows:

$$f_c(PS) = \frac{c - c_{min}}{c_{max} - c_{min}} \quad (3)$$

where: $c$ is the cost of the schedule $PS$, $c_{min}$ is the minimal schedule cost (the total cost of all tasks assigned to the cheapest resource) and $c_{max}$ is the maximal schedule cost (the total cost of all tasks assigned to the most expensive resource) [14]. Note: $c_{max}$ and $c_{min}$ do not involve skill constraints. This means that the $c_{min}$ value may be reached only by a non-feasible solution; the same holds for $c_{max}$.
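As an illustration, the aggregation of the two normalised objectives can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function name `evaluate`, its parameter names and the exact form of the two normalisations (duration divided by $\tau_{max}$, cost rescaled between $c_{min}$ and $c_{max}$) are assumptions made for this example.

```python
def evaluate(duration: float, cost: float, tau_max: float,
             c_min: float, c_max: float, w_tau: float) -> float:
    """Weighted sum of normalised duration and cost (lower is better)."""
    if not 0.0 <= w_tau <= 1.0:
        raise ValueError("w_tau must lie in [0, 1]")
    # Assumed normalisation of duration by the serial (pessimistic) duration.
    f_tau = duration / tau_max
    # Assumed min-max normalisation of cost.
    f_c = (cost - c_min) / (c_max - c_min)
    # The cost weight is implied by the duration weight: both sum to 1.
    return w_tau * f_tau + (1.0 - w_tau) * f_c


# The three baseline modes for one hypothetical schedule.
do_value = evaluate(50, 300, 100, 200, 600, w_tau=1.0)  # duration optimization
bo_value = evaluate(50, 300, 100, 200, 600, w_tau=0.5)  # balanced optimization
co_value = evaluate(50, 300, 100, 200, 600, w_tau=0.0)  # cost optimization
```

Passing a single weight and deriving the other one keeps the constraint that both weights sum to 1 impossible to violate.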

C. Solution space size
Given the number of tasks and the number of resources, we can estimate the solution space size ($SS$) as:

$$SS(n, m) = n! \cdot m^n \quad (4)$$

where $n$ is the number of tasks and $m$ is the number of resources [14]. However, this estimation also counts non-feasible solutions in which skill constraints are not satisfied.
To give an example, let us assume $n = 10$ and $m = 5$: without any precedence relations we get $SS(10, 5) \approx 3.54 \cdot 10^{13}$ combinations. It is worth mentioning that each task can be placed only once in the schedule, whereas a resource can be assigned many times. An extreme situation occurs if one and the same resource is assigned to perform every task. Such a large solution space makes it impossible to check the combinations one by one. However, the space also includes non-feasible solutions that do not satisfy the defined conditions. Moreover, the above example is a simplification; in real-world problems we meet a higher number of tasks (about $n = 100$) and resources ($m = 20$), which gives $SS(100, 20) \approx 1.18 \cdot 10^{288}$ solutions.
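The order of magnitude of the solution space can be checked with a few lines of Python. The sketch assumes that the estimate counts every ordering of the $n$ tasks combined with every task-to-resource assignment, that is $n! \cdot m^n$; this form is an assumption, chosen because it reproduces the value quoted for $n = 10$ and $m = 5$. The function name `solution_space_size` is hypothetical.

```python
import math


def solution_space_size(n: int, m: int) -> int:
    """Assumed estimate: n! task orderings times m**n resource assignments."""
    return math.factorial(n) * m ** n


small = solution_space_size(10, 5)    # small illustrative project
large = solution_space_size(100, 20)  # size typical for the dataset
small_text = f"{small:.2e}"
large_digits = len(str(large))        # number of decimal digits
```

Python integers have arbitrary precision, so the exact value can be computed even when it has hundreds of digits.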

IV. COMPLEXITY ESTIMATIONS
As a result of cooperation with the Volvo IT Department in Wroclaw [18], [20], [19], we defined the following elements [14]:
• requirements and constraints dedicated to the industry,
• project scheduling difficulty indicators.
The project difficulty indicators have been verified and approved by an experienced project manager in the enterprise.
The main goal of investigating such estimations was to compare how the characteristics of project elements (tasks, resources, precedence relations, skills) influence the optimization process, in terms of the quality of the obtained result (project schedule duration or performance cost) or the optimization processing time.
The proposed difficulty estimations are described below. All estimations are normalised before being used to compute the overall complexity measure.

A. Affiliation (λ)
States how strongly tasks are related to one another. A bigger value means the tasks are more related. The project complexity is then bigger, because scheduling flexibility is restricted (more tasks are related to others, so they cannot be scheduled flexibly). It is computed as follows:

$$\lambda = \frac{p}{n} \quad (5)$$

where $p$ is the number of precedence relations and $n$ is the number of tasks.

B. Load (ν)
Reflects how heavily resources are loaded with tasks. A bigger value means that more tasks are assigned to one resource (the project complexity is bigger, because the solution space is bigger). It is computed as follows:

$$\nu = \frac{n}{m} \quad (6)$$

where $m$ is the number of resources.

C. Time difference (Φ_T)
Describes how much tasks vary in duration. A bigger value means tasks are more varied. That makes scheduling more difficult, because the order of tasks influences the overall duration. It is computed as follows:

$$\Phi_T = \frac{\sigma_d}{d_{max} - d_{min}} \quad (7)$$

where: $\sigma_d$ is the standard deviation of task durations in the schedule, $d_{max}$ is the maximal task duration in the schedule and $d_{min}$ is the minimal task duration in the schedule.

D. Cost difference (Φ_C)
Indicates how much tasks vary in performance cost. The interpretation is similar to that of the time difference (Φ_T). It is computed as follows:

$$\Phi_C = \frac{\sigma_C}{c_{max} - c_{min}} \quad (8)$$

where: $\sigma_C$ is the standard deviation of task costs in the schedule, $c_{max}$ is the maximal task cost in the schedule and $c_{min}$ is the minimal task cost in the schedule.

E. Variety (µ)
Reflects how much resources vary in their skills. A bigger value means the project is more difficult to schedule, because tasks are more dedicated to particular resources (no other resource can be assigned to the specified task). It is computed from $q$, the number of different skills existing in the project. Important: each level of the same skill name is regarded as a separate skill.

F. Universality (β)
States the average number of skills per resource. A bigger value means it is easier to schedule a project, because resources are more universal. It is computed as follows:

$$\beta = \frac{1}{m} \sum_{i=1}^{m} Q_i \quad (10)$$

where $Q_i$ is the number of skills owned by resource $i$.

G. Adjustment (π)
Shows how many of the resources available in the project are capable of performing the tasks that need to be performed, in other words, how many resources can deal with each task. A bigger value means it is more difficult to schedule a project, because resources are strictly adjusted to the tasks by the skills they cover and the skills needed. It is computed as follows:

$$\max(\Delta(q_1), \Delta(q_2), \ldots, \Delta(q_q)) \cdot q \quad (11)$$

where:
RQ(i): number of resources covering skill i (normalised by the number of all resources (m) in the project),
TQ(i): number of tasks that require skill i (normalised by the number of all tasks (n) in the project).

H. Flexibility (θ)
The flexibility θ of the instance $PS$ has been estimated as the sum of the numbers of potential assignments of tasks to each resource, divided by the number of resources ($m$). It can be stated as follows:

$$\theta = \frac{1}{m} \sum_{k=1}^{m} J_k \quad (13)$$

where $J_k$ is the number of tasks that can be performed by resource $k$, while $m$ is the number of resources in the project.
Having discussed the legitimacy of those estimations, each measure has been subjectively weighted, and the weights have been confirmed by an experienced project manager (to determine the priority of each measure in the overall project difficulty measure). With those weights set, the difficulty measure of the project ($PF$), $diff(PF)$, is defined as the weighted sum of the estimations (14). The bigger the value of $diff(PF)$, the more difficult the project is to schedule. The universality measure is taken with a negative sign. This is because the bigger the universality value, the easier the project is to schedule: resources are more skill-flexible and can be assigned to more different tasks, relaxing more skill constraints.
Depending on the project manager's preferences, the weights assigned to the estimations can be changed, which influences the overall $diff(PF)$ measure.
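How skill-based indicators feed the aggregated measure can be sketched in Python. The sketch below is illustrative only: it computes flexibility and universality directly from their verbal definitions, omits the normalisation step, and uses hypothetical weights, since the weights confirmed by the project manager are not reproduced here. All function names and the data layout (a skill map per resource, a required skill type and level per task) are assumptions.

```python
from typing import Dict, List, Tuple

Skill = Tuple[str, int]  # (skill type, required familiarity level)


def can_perform(resource_skills: Dict[str, int], required: Skill) -> bool:
    """A resource qualifies if it owns the skill type at a sufficient level."""
    skill_type, level = required
    return resource_skills.get(skill_type, -1) >= level


def flexibility(resources: List[Dict[str, int]], tasks: List[Skill]) -> float:
    """Average number of tasks a resource is able to perform."""
    counts = [sum(can_perform(r, t) for t in tasks) for r in resources]
    return sum(counts) / len(resources)


def universality(resources: List[Dict[str, int]]) -> float:
    """Average number of skills owned by a resource (not normalised here)."""
    return sum(len(r) for r in resources) / len(resources)


def difficulty(indicators: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of indicators; universality enters with a negative sign."""
    total = 0.0
    for name, value in indicators.items():
        sign = -1.0 if name == "universality" else 1.0
        total += sign * weights[name] * value
    return total


resources = [{"java": 3, "sql": 1}, {"sql": 2}]
tasks = [("java", 2), ("sql", 2), ("sql", 1)]
theta = flexibility(resources, tasks)
beta = universality(resources)
# Hypothetical weights, chosen only for this example.
score = difficulty({"flexibility": theta, "universality": beta},
                   {"flexibility": 0.5, "universality": 0.5})
```

Changing the entries of the weight dictionary corresponds to a project manager re-prioritising the estimations.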

V. INSTANCES GENERATOR
The main goal of implementing the dataset instance generator is to give other researchers the possibility of investigating their methods not only on the proposed dataset instances, but also on others created individually by a given researcher. The generator has been prepared for MS-RCPSP, but it can be easily adjusted to handle classical RCPSP instances like those in PSPLIB [8]. It has been implemented in the Java programming language, using the MPXJ library for processing project files from MS Project. It can create a project definition not only in the .mpp (XML) format, but also in the simpler .def one. A more detailed description of the .def format is available in Subsec. VI-A. Instances have been created based on real-life project instances obtained from an international enterprise (Volvo IT).
The instance generator is one of the resources developed in our Intelligent Multi-Objective Project Scheduling Environment (iMOPSE) platform. Besides the instance generator, the platform contains a solution validator (see Subsec. VI-B), the instances we generated and used to verify our approaches, and the best solutions found for those instances in the three above-mentioned optimization modes: DO, BO and CO. Every solution is saved in a ready-to-use MS Project .xml format, containing all tasks, resources, skills, precedence relations and the obtained schedule.
The general process of generating new instances can be split into the following main steps:
1) Read and validate parameter values provided by the end-user,
2) Define resources,
3) Define skills and assign them to resources,
4) Define tasks and precedence relations,
5) Assign resources if necessary,
6) Save project.
In the further parts of this section, these steps are described in detail. The pseudocode of the generator is presented in Alg. 1.

Algorithm 1 Generator pseudocode
    ...
 7: q ← setNumSkills(min, max)
 8: for j = 0; j < q do
 9:     set_skill_type_from_range(minST, maxST, q_j)
10:     set_skill_level_from_range(minSL, maxSL, q_j)
11:     if skill_not_exists(q_j, pool) then
12:         pool ← pool.add(q_j)
13:         r ← addSkill(q_j)
14: #generate tasks
15: for t ∈ J do
16:     t ← set_duration(minDuration, maxDuration)
    ...

A. Resources
The number of resources to be generated is provided as a parameter of the proposed tool. For every generated resource, its standard rate (salary) is set to a random value between the minimal and maximal values (see Alg. 1, line 5) set by the end-user in the generator configuration.

B. Skills
Analogously to the resource definition, the number of different skill types is set by the end-user during configuration. We declared 4 levels of skill familiarity for a given resource. The end-user is also obliged to define how many skill types can be covered by a given resource. The number of skill types owned by a resource should be no greater than the number of skill types existing in the project. The number of skill types is drawn randomly between the minimal and maximal values (set by the end-user). However, during the recent generation of dataset instances, we decided to keep the number of skill types constant for every resource: the minimal (min) and maximal (max) numbers of skill types were set to the same value (line 7).
During the skill generation process for a given resource, a skill type is selected from the given range of types (line 9), and a skill level is selected from the given range (line 10). We decided on four skill levels, as this covered the requirements presented by the project manager from the enterprise. If the selected skill is not yet available in the skill pool, it is both assigned to the resource and added to the skill pool (line 13). This ensures that the skill required by any task is selected from a pool of skills owned by at least one resource (line 17).

C. Tasks
With resources and the skills they cover defined, tasks can be generated. The number of tasks is set by the end-user. What is more, the user also sets the duration range of the tasks; these are the bounds within which each task duration is randomly set (line 16). For the sake of generating project instances for our research, we assumed that the task duration is a number between 8 and 40 hours. This corresponds to a task duration between 1 and 5 days. The skill required by any task is selected from the pool of skills available in the given project instance (line 17).
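The resource, skill and task steps described so far can be sketched in Python. This is a simplified illustration rather than the Java generator itself: the function `generate_instance`, its parameter names, the dictionary layout and the level range 1 to 4 are assumptions, and each resource receives a fixed number of distinct skill types, as in the recent dataset generation.

```python
import random


def generate_instance(num_resources, num_tasks, num_skill_types,
                      skills_per_resource, min_salary, max_salary,
                      min_level=1, max_level=4,
                      min_duration=8, max_duration=40, seed=None):
    """Generate resources with skills, then tasks requiring pooled skills."""
    rng = random.Random(seed)
    pool = set()  # (skill type, level) pairs owned by at least one resource
    resources = []
    for k in range(num_resources):
        salary = rng.uniform(min_salary, max_salary)
        # Distinct skill types for this resource, each with a random level.
        types = rng.sample(range(num_skill_types), skills_per_resource)
        skills = {t: rng.randint(min_level, max_level) for t in types}
        pool.update(skills.items())
        resources.append({"id": k, "salary": salary, "skills": skills})
    pool_list = sorted(pool)
    tasks = []
    for j in range(num_tasks):
        tasks.append({
            "id": j,
            "duration": rng.randint(min_duration, max_duration),  # hours
            # Drawing from the pool guarantees a capable resource exists.
            "skill": rng.choice(pool_list),
        })
    return resources, tasks


resources, tasks = generate_instance(
    num_resources=5, num_tasks=20, num_skill_types=9,
    skills_per_resource=6, min_salary=10.0, max_salary=50.0, seed=1)
```

Because required skills are drawn only from the pool, every generated task can be performed by at least one resource, which is the property the skill pool is meant to provide.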

D. Precedence relations
One of the last steps of the generation process is to define the precedence relations. The end-user defines the number of those relations and is also responsible for defining the general scope of relations. The bigger the relation scope, the bigger the distance between tasks that is allowed when building the precedence relations diagram (line 21). In other words, with a small relation scope the resulting schedule contains precedence relations only between tasks that were defined one after another or at a small distance (like the first and the third task). However, if the relation scope is set to a bigger value, the final schedule can contain relations between tasks defined at the beginning and at the end of the generation process (e.g. a precedence relation between the first and the last task defined).
The bigger the distance between the source and destination tasks, the more complex the critical path of the project instance. As a consequence, duration-based optimization is potentially more difficult for such a project instance.
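One possible reading of the relation scope can be sketched in Python. The sketch assumes that the scope is the maximal difference between the definition indices of a predecessor and its successor, and that a predecessor is always defined before its successor, which keeps the relation graph acyclic. The function name `generate_precedence` and its parameters are hypothetical.

```python
import random


def generate_precedence(num_tasks, num_relations, scope, seed=None):
    """Draw distinct (predecessor, successor) pairs within the given scope."""
    rng = random.Random(seed)
    # Every admissible pair: predecessor i precedes successor j by at most
    # `scope` positions in the order in which tasks were defined.
    candidates = [(i, j)
                  for j in range(1, num_tasks)
                  for i in range(max(0, j - scope), j)]
    if num_relations > len(candidates):
        raise ValueError("too many relations for this scope")
    return sorted(rng.sample(candidates, num_relations))


relations = generate_precedence(num_tasks=20, num_relations=15, scope=3, seed=2)
```

With `scope=1` only consecutive tasks can be linked, whereas `scope=num_tasks - 1` allows a relation between the first and the last task defined.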

E. Assign resource
Finally, the initial schedule can be built by assigning resources to the given tasks, preserving precedence and skill constraints (lines 26-28). The way resources are assigned to tasks is random. Hence, if more than one resource can be assigned to a given task, the generator may assign this task in different ways in different executions of the generation process. The schedule is generated using the Serial Generation Scheme [9], which ensures that the generated schedule is always feasible.

F. Save project to file
The last step in the process of generating an instance is to save (line 29) the resulting project. If the user sets the output file type to xml (mpp), the generator produces the result in a format that can be easily loaded into the Microsoft Project tool. If the user selects the def output format or does not select any, the result is saved in a more compact format that can be read in any text editor. If the assign resources option has been ticked, tasks can have resources assigned. However, this applies only when generating output in the xml (mpp) format. The name of the produced file is the name given by the end-user in the corresponding text field of the configuration screen.

VI. DATASET SUMMARY
Because we evaluate not only the project schedule duration but also the cost of the schedule, including the skill domain, we cannot use the standard PSPLIB benchmark dataset [8] in our research: it does not contain any information about task performance cost. What is more, PSPLIB dataset instances do not reflect MS-RCPSP. Hence, we prepared a dataset containing 36 project instances, artificially created on the basis of real-world instances obtained from the Volvo IT Department in Wroclaw.
The dataset summary is presented in Table II. There are two groups of project instances: one contains 100 tasks and the other 200 tasks, as is typical for projects performed in the given international enterprise. Within each group, project instances vary in the number of available resources and the complexity of precedence relations. The numbers of resources for instances from both groups were chosen so as to preserve a constant average resource load and average task relations ratio. The skill variety has been set to 9 or 15 different skill types for each project instance, while every resource has exactly six different skill types. Because of the different numbers of resources and relations, the scheduling complexity of each project varies.
This dataset is an extension of the dataset presented in [18], [19], [20], and that is the reason some instances are named with the suffix Dx. This suffix refers to dataset instances that were previously created and presented in those papers. Because of the extension of the dataset, the need for a clearer naming system has arisen. The suffix has been added to mark previously created files while keeping the naming convention applied after the dataset extension.

A. Project definition format (.def)
Because our research approach has become more generic, we decided to focus more on dataset instances stored in the .def format, which is easier for researchers to maintain and use. Hence, we adapted the instances created for MS Project to a more generic form. This led to the removal of summary tasks, which are specific to the MS Project .mpp format. Summary tasks are used to group atomic tasks into more complex ones (e.g. a task called 'development' could be split into atomic tasks: creation of database structures, development of business logic and development of the user interface). However, MS Project allows summary tasks to be used as predecessors of other tasks. Therefore, we multiplied precedence relations by copying them from a predecessor summary task to all the tasks included in that summary task. As a result, new Dx instances have been created. Furthermore, the tasks represented as summary tasks have been removed from the project. To be consistent with previous works, we keep the names of those files the same. Those files are provided with an additional description explaining the difference between the numbers of tasks and precedence relations in the file name and in the file content.
Adjusted dataset instances with the Dx suffix have a smaller number of tasks. The number of precedence relations is significantly bigger in all Dx instances. Roughly speaking, there are more than twice as many precedence relations as in the former instances, while the number of tasks has decreased in all instances by about 20% (about 20 tasks for instances with 100 tasks and 40 for instances with 200 tasks).
We have also presented in Tab. II the values of the proposed complexity estimations. Finally, the overall complexity measure, an aggregation of the complexity estimation components, is presented. Based on the overall complexity value, the most complex projects from the scheduling point of view are highlighted in bold. The overall complexity measure has been computed according to Eq. 14. Changing the weights of the complexity estimation components would affect the final complexity value for each dataset instance. Hence, the complexity of each project can differ depending on the priorities set by the project manager.
In our approach all universality estimation values are equal to 1. This is because we assumed that every resource has the same number of different skills, and this is also the maximal number of skills potentially covered by a resource, which is used in normalization. If the number of skills covered by a resource varied from resource to resource, the universality estimation values would not always be equal to 1.0.

B. iMOPSE Solution Validator
We have also released an additional tool that validates generated solutions, checking whether all constraints defined in MS-RCPSP are preserved. The validator is available on the iMOPSE project website (http://imopse.ii.pwr.edu.pl). The validator checks whether every task has a resource assigned (assignment validation), whether the final schedule is conflict-free (conflict validation), whether every task having predecessors starts after all its predecessors are finished (precedence relation validation) and whether every task is assigned a resource that is capable of performing it (skill validation).
The validator shows not only the validation results but also the quality of the validated solution: its duration measured in hours and its cost measured in currency units. If some validation rules are broken, they are shown to the end-user.
The validator is compatible with the .def project definition format. For further information on how to use the validator, please refer to the documents related to the tool (User's Manual or Case study) available on the iMOPSE platform.
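The four validation rules can be illustrated with a compact Python sketch. It is not the iMOPSE validator and does not read .def files: the function `validate`, the dictionary layout, and the cost model (salary treated as an hourly rate multiplied by task duration) are assumptions made for this example.

```python
def validate(tasks, resources, schedule):
    """Check the four rules; return (violations, duration, cost).

    tasks:     task id -> {"duration": hours, "skill": (type, level),
                           "predecessors": [task ids]}
    resources: resource id -> {"salary": rate, "skills": {type: level}}
    schedule:  task id -> (resource id, start hour)
    """
    violations = []
    # Assignment validation: every task must appear in the schedule.
    for t in tasks:
        if t not in schedule:
            violations.append(("assignment", t))
    busy = {}
    for t, (r, start) in schedule.items():
        end = start + tasks[t]["duration"]
        busy.setdefault(r, []).append((start, end, t))
        # Skill validation: the resource needs the skill at a sufficient level.
        skill_type, level = tasks[t]["skill"]
        if resources[r]["skills"].get(skill_type, -1) < level:
            violations.append(("skill", t))
        # Precedence validation: start only after all predecessors finish.
        for p in tasks[t]["predecessors"]:
            if p not in schedule or start < schedule[p][1] + tasks[p]["duration"]:
                violations.append(("precedence", t))
    # Conflict validation: tasks of one resource must not overlap in time.
    for intervals in busy.values():
        intervals.sort()
        for (_, end1, _), (start2, _, t2) in zip(intervals, intervals[1:]):
            if start2 < end1:
                violations.append(("conflict", t2))
    duration = max((s + tasks[t]["duration"] for t, (_, s) in schedule.items()),
                   default=0)
    # Assumed cost model: hourly salary times task duration.
    cost = sum(resources[r]["salary"] * tasks[t]["duration"]
               for t, (r, _) in schedule.items())
    return violations, duration, cost


tasks = {0: {"duration": 8, "skill": ("a", 1), "predecessors": []},
         1: {"duration": 16, "skill": ("b", 2), "predecessors": [0]}}
resources = {0: {"salary": 10.0, "skills": {"a": 2, "b": 2}},
             1: {"salary": 20.0, "skills": {"b": 1}}}
ok_result = validate(tasks, resources, {0: (0, 0), 1: (0, 8)})
bad_result = validate(tasks, resources, {0: (0, 0), 1: (1, 4)})
```

In the second schedule, task 1 is given to a resource whose skill level is too low and starts before its predecessor ends, so two rules are reported as broken.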

VII. EXPERIMENTS AND RESULTS
The main goal of the conducted experiments was to link and compare both approaches (.mpp-based [14] and .def-based), considering the impact of calendar restrictions.
We decided to use simple duration- and cost-oriented heuristics [20] and a greedy algorithm, and to compare them with the ACO and HAntCO approaches described in [14]. Furthermore, the greedy algorithm and the simple heuristics have been used to schedule the .def dataset instances.
However, the proposed heuristic and greedy approaches for cost optimization turned out to be the same method. Therefore, the presented results are divided into two main parts according to the optimization mode: duration optimization (ω_τ = 1) and cost optimization (ω_τ = 0). The cost optimization part is further divided into three parts (heuristic, ACO, HAntCO) and the duration optimization part into four parts (heuristic, greedy, ACO, HAntCO).
Table III presents the obtained results for both optimization modes using both proposed methods (simple heuristic and greedy algorithm).It also contains results obtained by ACO and HAntCO approaches described in detail in [14].This table presents optimization results for dataset instances with calendar restrictions (.mpp).
The greedy algorithm works iteratively. In every step of greedy scheduling, one task is added to the schedule. The decision which task should be assigned to which resource in a given step is based on the current partial schedule. In other words, in a given step the currently best task-to-resource assignment is chosen, and the next step is performed, until all tasks are scheduled. The classical greedy algorithm assumes the possibility of analysing not only the current state of the partial schedule but also several further steps. In that approach, combinations of several tasks are analysed, and the best one, containing a given number of task-to-resource assignments, is selected and added to the partial schedule. In our approach, however, we consider only the current schedule state and omit the analysis of several assignments ahead. Therefore, the number of steps of the proposed greedy algorithm is always equal to the number of tasks in the project.
In the duration-oriented optimization mode, the greedy algorithm analyses which task should be added to make the partial schedule the shortest. In the cost-oriented optimization mode, the selection criterion is the cost of assigning a given task to a given resource. For every task, the resources that could be assigned are analysed, and the cheapest one is chosen.
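The one-task-per-step scheme in both modes can be sketched as below. This is an illustrative reading of the description, not the implementation used in the experiments: the data layout, the tie-breaking by finish time, and the placement of each task at the earliest hour when its predecessors are finished and its resource is free are assumptions.

```python
def greedy_schedule(tasks, resources, mode="duration"):
    """One-step-lookahead greedy scheduling (illustrative sketch).

    tasks:     {task id: (duration, required skill, [predecessor ids])}
    resources: {resource id: (salary per hour, {skills})}
    Returns    {task id: (resource id, start, finish)}.
    """
    schedule = {}
    free_at = {r: 0 for r in resources}  # hour at which each resource is free
    while len(schedule) < len(tasks):    # exactly one task is added per step
        best = None
        for t, (duration, skill, preds) in tasks.items():
            if t in schedule or any(p not in schedule for p in preds):
                continue  # already scheduled, or predecessors still missing
            ready = max((schedule[p][2] for p in preds), default=0)
            for r, (salary, skills) in resources.items():
                if skill not in skills:
                    continue  # resource cannot perform this task
                start = max(ready, free_at[r])
                finish = start + duration
                if mode == "duration":
                    # Length of the partial schedule after this assignment.
                    length = max([finish] + [f for _, _, f in schedule.values()])
                    key = (length, finish)
                else:
                    # Cost of performing this task with this resource.
                    key = (duration * salary, finish)
                if best is None or key < best[0]:
                    best = (key, t, r, start, finish)
        if best is None:
            raise ValueError("no feasible task-to-resource assignment left")
        _, t, r, start, finish = best
        schedule[t] = (r, start, finish)
        free_at[r] = finish
    return schedule


def makespan(schedule):
    return max(finish for _, _, finish in schedule.values())


def total_cost(schedule, tasks, resources):
    return sum(tasks[t][0] * resources[r][0] for t, (r, _, _) in schedule.items())
```

Because the cost of a task depends only on its duration and on the salary of the assigned resource, the cost mode always ends up giving each task its cheapest capable resource, regardless of the order in which tasks are picked.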
In the cost-oriented optimization mode, both the greedy algorithm and the simple heuristic (Resource Salary based [20]) work according to the same scheme, described above. For duration-oriented optimization, however, the heuristic and the greedy algorithm differ in details. In the greedy algorithm, a task is assigned to a given resource and added to the partial schedule, then conflicts are fixed, and finally the project duration is computed. In the simple heuristic (Successors List Size based [20]), the resource that becomes free (not assigned to any task) the earliest is selected first. Then the task is assigned to this resource, and its start time is set just after the end of the last task previously assigned to that resource. This makes it possible to build a feasible schedule without the need to fix conflicts, as the method is free of resource conflicts by construction.
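The earliest-free-resource rule of the duration-oriented heuristic can be sketched as follows. Several details are assumptions, since they are given in [20] rather than here: tasks are prioritized by the number of their direct successors, only resources owning the required skill are considered, and the start time is additionally delayed until all predecessors have finished.

```python
def sls_schedule(tasks, resources):
    """Successor-count-based heuristic with earliest-free resource selection.

    tasks:     {task id: (duration, required skill, [predecessor ids])}
    resources: {resource id: (salary per hour, {skills})}
    Returns    {task id: (resource id, start, finish)}.
    """
    # ASSUMPTION: priority of a task = number of its direct successors.
    successors = {t: 0 for t in tasks}
    for _, _, preds in tasks.values():
        for p in preds:
            successors[p] += 1

    schedule = {}
    free_at = {r: 0 for r in resources}
    while len(schedule) < len(tasks):
        ready = [t for t, (_, _, preds) in tasks.items()
                 if t not in schedule and all(p in schedule for p in preds)]
        # Most successors first; ties are broken by the smaller task id.
        task = max(ready, key=lambda t: (successors[t], -t))
        duration, skill, preds = tasks[task]
        capable = [r for r, (_, skills) in resources.items() if skill in skills]
        # The capable resource that becomes free the earliest is selected.
        resource = min(capable, key=lambda r: (free_at[r], r))
        # Start right after the last task on that resource, but not before
        # the predecessors have finished (ASSUMPTION for feasibility).
        ready_time = max((schedule[p][2] for p in preds), default=0)
        start = max(free_at[resource], ready_time)
        schedule[task] = (resource, start, start + duration)
        free_at[resource] = start + duration
    return schedule
```

Since a task is always appended after the last task of the chosen resource, two tasks never overlap on one resource, so no conflict-fixing step is needed.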
Taking into account the results gathered in Tab. III, we can conclude that in the duration optimization mode HAntCO outclassed the other methods, providing the best results in 28 of 36 cases (78%). ACO turned out to be the best method in 6 cases (17%), while the greedy algorithm gave the best results in 3 of 36 cases (8%) and the heuristic was the most suitable in 2 of 36 cases (6%).
In the cost optimization mode, the simple heuristic gives the best result for almost all dataset instances (34/36, 94%). For the remaining two instances the heuristic also provided a solution with the smallest cost, but the schedule duration was longer than for the other method (HAntCO). For most of the instances (32/36, 89%) HAntCO provided the same best result as the heuristic. The ACO approach provided the same best results in 18/36 (50%) cases. The most interesting fact for cost optimization is that ACO provided the best results mostly for dataset instances containing 100 tasks (17/18 cases, 94%) and only once for dataset instances containing 200 tasks (6%).
In Tab. IV we compiled a summary of the best results obtained by the classical optimization methods (heuristics and the greedy algorithm) for instances without calendar restrictions (.def). It can also be found on the iMOPSE website. As we intend to use the .def format in further research, the obtained project schedules are measured in hours rather than in days, as was the case so far in the .mpp format. The obtained results serve as a benchmark for further research using the .def format. On the other hand, Tab. III is still regarded as a benchmark for methods working on the .mpp format. The results in Tab. IV show that the SLS [20] heuristic provides better results in DO mode for all 36 dataset instances. This shows that, for this problem, the SLS heuristic is a better duration optimization method than the greedy algorithm.
Although the project definitions in the .def and .mpp formats are compatible with each other, there are some small differences in the cost results in CO mode when the same method is used. They stem from the adjustment made when converting .mpp to the .def format. For the sake of simplicity, task durations in .def have been rounded up to integer values. This leads to differences because the cost of performing a project is the sum of the performance costs of its tasks, and the performance cost of a task is computed as the product of the task duration and the salary of the assigned resource. As a result of rounding up, the cost of each task has increased slightly, even though the same resource is assigned to it. Therefore, the overall cost is slightly higher for solutions obtained for .def files.
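The effect of rounding can be reproduced with a small worked example. The durations and salaries below are made up for illustration and do not come from the dataset; only the cost formula (sum of duration times salary) and the rounding up of durations follow the text.

```python
import math


def project_cost(assignments):
    """Sum of task costs; each assignment is (duration in hours, salary per hour)."""
    return sum(duration * salary for duration, salary in assignments)


def round_up_durations(assignments):
    """Round every task duration up to an integer, keeping the same resource."""
    return [(math.ceil(duration), salary) for duration, salary in assignments]


# Hypothetical assignments with fractional durations, and their rounded version.
fractional = [(7.5, 20.0), (3.25, 50.0), (4.0, 30.0)]
rounded = round_up_durations(fractional)

print(project_cost(fractional), project_cost(rounded))
```

The task whose duration is already an integer keeps its cost, while the other two become more expensive, so the total cost after rounding can never be lower than before.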

VIII. CONCLUSIONS AND FURTHER WORK
In this paper, novel difficulty indicators for instances of the Multi-Skill Resource-Constrained Project Scheduling Problem have been presented. Furthermore, the extended dataset has been presented and suggested as a benchmark for this problem, as no other benchmark dataset that satisfies the proposed constraints can be found. The instances have also been scheduled using a simple heuristic and a greedy algorithm, to provide an initial platform for comparing results obtained by various researchers.
The proposed complexity estimations are a first step in project scheduling data analysis. Estimating the project complexity could be helpful in tuning the parameters of various optimization methods. The more complex (difficult to schedule) a project is, the longer the optimization process would potentially last for the same parameter configuration. Hence, the decision maker could decide to change the parameters, e.g. by decreasing the number of method iterations. We confirmed these observations in our EA-based approach, where building a schedule for a project with the suffix D2 generally takes longer than for a project with the suffix D1.
The goal of presenting the dataset instance generator is to allow and encourage other researchers to focus on the problem and on the possible solutions and methods we propose. We believe there is still a lot to investigate. What is more, the dataset instance format we propose is very common in many industries, as MS Project is a common standard.
We are also about to investigate approaches that focus on different methods of handling multiple objectives. Most of the methods we analyse are based on the Pareto front (like NSGA-II [22], [17] or other methods). One of the goals is to find a way to provide a set of non-dominated results to the project manager, and thus to delegate to them the decision which of the proposed solutions is the best, according to the specificity of the company concerned. For example, in some industries the aim is to finish the project as soon as possible, while in others the most important thing is to perform it in the cheapest way. In either case, we would like to offer a choice from a pool of solutions.
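The notion of a set of non-dominated results can be illustrated with a minimal filter over (duration, cost) pairs, both of which are minimized. This is a generic sketch of Pareto dominance, not a part of NSGA-II or of any method evaluated in this paper, and the sample values are hypothetical.

```python
def dominates(a, b):
    """True if solution a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical schedules described as (duration in hours, cost).
candidates = [(7, 130), (9, 90), (9, 130), (8, 100)]
print(non_dominated(candidates))
```

Here the schedule (9, 130) is discarded because (7, 130) is shorter at the same cost, while the remaining three represent different trade-offs between duration and cost from which a project manager could choose.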

TABLE I
SIMILARITIES AND DIFFERENCES BETWEEN IMOPSE [14] APPROACH AND THE APPROACH PRESENTED IN [13]

TABLE II
COMPLEXITY INDICATORS AND DIFFICULTY MEASURE FOR IMOPSE DATASET INSTANCES. PROJECT INSTANCES REGARDED AS THE MOST DIFFICULT TO SCHEDULE ARE WRITTEN IN BOLD, WHILE THOSE INDICATED AS THE EASIEST TO SCHEDULE ARE WRITTEN IN ITALIC.

TABLE IV
SUMMARY OF BEST OBTAINED RESULTS FOR DATASET INSTANCES NOT REGARDING CALENDAR CONSTRAINTS (.def).