Reliability Modeling of OSS Systems based on Innovation-Diffusion Theory and Imperfect Debugging

Open Source Software (OSS) has gained widespread popularity in the last few decades owing to the exceptional contribution of well-established projects such as Apache, Android, MySQL, LibreOffice, and Linux, not only in the field of information technology but also in other sectors such as research, business, and education. These systems are characterized by a major shift in the development pattern they adopt in comparison to proprietary software. Reliability modeling for such systems is therefore a growing area of research nowadays. The number of users adopting such systems and working on their refinement post-release plays an indispensable role in their reliability growth. In this paper, we propose a software reliability growth model (SRGM) based on a Non-homogeneous Poisson process (NHPP) driven by the number of users, under the phenomenon of imperfect debugging. The renowned Bass Model from marketing, based on the theory of Diffusion of Innovation, is used to depict the user growth phenomenon. Various fault content functions are considered in the proposed models to represent imperfect debugging conditions, and their performance is evaluated on the fault dataset of GNOME 2.0. Four goodness-of-fit criteria, namely Coefficient of Determination, Mean Square Error, Predictive Ratio Risk, and Predictive Power, are used to assess the estimation accuracy of all the proposed models, and it is observed that the prediction capabilities of the models based on imperfect debugging are better than those of the model assuming perfect debugging.


I. INTRODUCTION
With the advancements in the field of technology, a visible expansion can be seen in the software industry. Open source software (OSS) has revolutionized the development trend of software in the past few decades. The ubiquitous acceptance of OSS has led to a strong inclination of developers towards these software systems. OSS can be defined as software for which the source code is shared so that others can learn from, modify, and extend the software under some licensing guidelines [1]. Some of the characteristics of OSS due to which traditional SDLC models cannot be applied to it are described as follows:

A. Characteristics of OSS
• Unclear Requirements: OSS development does not have a dedicated requirement elicitation phase in which requirements are documented, as is the case for closed source software. Here, development simply starts with a single developer or a small group with a random idea. Requirements are not properly framed and frozen before development.
• Unstable Team Size: In contrast to closed source software, where a dedicated team with a fixed number of people works on a project, OSS does not have a fixed number of people working on it. The team size varies with time and with the attractiveness of the software.
• Minimal Testing Effort: In OSS development, the main focus is on implementing the idea, as opposed to closed source software, where sincere effort and significant time are spent on the testing phase after development completes.
• No Deadlines: Since development depends entirely on the voluntary participation of developers across the world, no strict deadlines on deliverables can be imposed.
As observed from the above characteristics, OSS does not undergo an exhaustive testing phase during its software development life cycle (SDLC), as opposed to proprietary software, where extensive testing effort is spent before release. Because OSS cannot be tested rigorously for its functionality in such a limited duration, it becomes important to study its reliability growth during its user phase. Software reliability is the probability that software performs its operations without failure for a specified time interval under specified conditions [1]. Reliability growth of software can be modeled by a software reliability growth model (SRGM). An SRGM is a mathematical representation that depicts the software fault detection process as a function of different factors and parameters, e.g. CPU time, testing effort expenditure, number of test cases, and code coverage. In simulating the software fault process with an SRGM, it is often assumed that each time a failure is encountered, the responsible fault is removed with certainty. However, this assumption is quite impractical, as correcting a fault involves modifications to the original source code which may lead to the addition of new faults. In this paper, we propose an SRGM in which the phenomenon of software fault detection is modeled with respect to a user growth function, and we evaluate it on a real-life failure dataset (GNOME 2.0) under imperfect debugging conditions.

II. LITERATURE REVIEW
Numerous studies have been carried out in the past to model the reliability growth phenomenon for OSS, and various SRGMs have been proposed. The Goel-Okumoto model [3], the Yamada S-shaped model [4], and the Musa-Okumoto model [5] are some of the traditional NHPP based software reliability growth models for closed source software systems. Paulson et al. [6] performed an empirical study on open and closed source software to quantitatively investigate and validate the perceptions about these software systems. Rossi et al. [7] discussed the pattern of occurrence of faults in various OSS and used it to make reliability predictions for the future. Yamada and Yamaguchi [8] discussed a statistical process control method for OSS to determine its stability and thereby estimate the development time needed to attain a desired level of reliability. Li et al. [9] proposed a reliability analysis model and used it to predict optimal version-updating for OSS. Tamura and Yamada [10] combined software growth modeling with neural network approaches to estimate the reliability of OSS. Yang et al. [11] proposed an SRGM for reliability estimation of multi-release OSS. Several other studies have also analyzed the reliability of open source software [12,13,15].

Various studies have addressed the reliability analysis of software under imperfect debugging scenarios. Kapur et al. [16] proposed frameworks to derive SRGMs in the presence of realistic processes such as imperfect debugging and error generation. Pham [17] developed a cost model with imperfect debugging conditions, considering the penalty cost due to delays in software delivery and the random length of software development. Chu-Ti Lin [18] investigated the effects of imperfect debugging on reliability modeling via a simulation based model. In our paper, we model the software fault process for OSS in its functional phase by relating the usage factor to reliability modeling, and we compare its performance under the different fault content functions used for the imperfect debugging element of the model.

III. MODEL FOUNDATIONS
The SDLC followed by OSS is unlike that of normal commercial software. For OSS, the testing effort spent is almost negligible in comparison to commercial software, and therefore voluntary participation from developers across the world is crucial in refining the quality of OSS. Past studies on reliability assessment of OSS assumed that the fault detection rate for OSS in its operational phase follows a hump-shaped curve. Initially, it grows with the growth in volunteer participation and reaches its highest point. This is due to the enthusiasm among developers for a new product in the market. It then starts decreasing as the number of volunteers declines with time. This may be due to their fall in interest in the same product over time or due to the introduction of some new product in the market. A similar trend is observed in the user growth pattern of an innovation. According to Rogers (1962), an innovation is characterized by a set of key features. The important features of OSS, such as cost effectiveness, easy access, and sharing of code, are easily observable by the people using it, such as educational institutes, employees in the corporate sector, and freelancers. Due to the presence of these characteristics, OSS can be considered an innovation. The usage growth process for an innovation can be explained well with the renowned Innovation Diffusion Model of marketing (Bass, 1969). Since OSS is an innovation, it is justified to apply this model to describe user growth for an OSS.

B. Diffusion of Innovation
According to Rogers [19], the process by which an innovation, i.e. a new idea or new product, propagates among the members of a communication system over a time frame is known as diffusion. As OSS is an innovation, this theory applies to it. Among the potential volunteers of OSS, it is initially adopted by those who are opinion leaders, i.e. people who adopt it based on their own interest in the software. They are also known as Innovators. Another group of people, who later start using OSS based on word-of-mouth from innovators, are known as Imitators. To represent the usage growth factor in our proposed SRGM we have used the Bass Model [20].

The proposed SRGM rests on the following assumptions:
• The number of failures during the operational phase depends upon the number of faults remaining in the software.
• As soon as any deviation from expected behaviour is encountered, it is considered a failure, and the fault that caused that failure is located. There is a possibility that a few additional faults are introduced while a fault is being fixed.
• The number of faults removed is a function of the number of users working on the software.
• The number of users is assumed to be a function of time and is represented by the diffusion model given by Bass [20].

Based on the above assumptions, the failure phenomenon can be illustrated as follows:
$$\frac{dm(t)}{dt} = \frac{dm(N)}{dN} \cdot \frac{dN(t)}{dt} \tag{3}$$
The components on the right side of the equation are discussed as follows:

Component-1
This component describes the rate of detection of faults with respect to users. The rate at which failures occur depends upon the number of faults remaining in the software. As per this assumption, the differential equation for fault removal can be written as
$$\frac{dm(N)}{dN} = b\,[a(N) - m(N)] \tag{4}$$
where a(N) denotes the fault content function with respect to the number of users.

Component-2
The concept of user growth in the adoption of OSS is the focus of this paper. The Innovation-Diffusion Model proposed by Bass [20] is used to describe the user growth phenomenon in this study. According to Bass, the process of adoption of an innovation in the potential population comprises innovators and imitators and is defined by equation (1), which is used to represent this component in equation (3). The solution of this equation for the initial condition N(0)=0 is given by equation (2).
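The user growth component can be explored numerically. The sketch below assumes the standard form of the Bass equation, dN/dt = (p + q N/N_bar)(N_bar - N), together with its well-known closed-form solution for N(0) = 0. The function names, the symbol `n_bar` for the total number of potential users, and the parameter values in the usage note are illustrative assumptions, not estimates from the GNOME 2.0 dataset.

```python
import math


def bass_cumulative_users(t, p, q, n_bar):
    """Closed-form Bass curve N(t) for the initial condition N(0) = 0."""
    e = math.exp(-(p + q) * t)
    return n_bar * (1.0 - e) / (1.0 + (q / p) * e)


def bass_adoption_rate(n, p, q, n_bar):
    """Right-hand side of the (assumed) Bass equation dN/dt."""
    return (p + q * n / n_bar) * (n_bar - n)


def bass_peak_time(p, q):
    """Time at which dN/dt peaks; positive only when q > p."""
    return math.log(q / p) / (p + q)


def integrate_bass_euler(t_end, p, q, n_bar, steps=20_000):
    """Forward-Euler integration of dN/dt from N(0) = 0, used as a sanity check."""
    dt = t_end / steps
    n = 0.0
    for _ in range(steps):
        n += dt * bass_adoption_rate(n, p, q, n_bar)
    return n
```

For example, with the hypothetical values p = 0.03, q = 0.38 and n_bar = 1000, `bass_peak_time(0.03, 0.38)` gives the time at which the adoption rate is highest, which is the hump in the hump-shaped curve, and `integrate_bass_euler` should agree closely with `bass_cumulative_users` at the same time point.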

D. Fault Content functions
In this paper, we assume the fault content to be a function of the number of users working on the OSS in its operational phase and denote it by a(N). The reason is that as the number of users working on an OSS increases, more changes and modifications are made to the original code, and hence the fault content also increases. Different fault content functions corresponding to the cases of perfect and imperfect debugging are discussed as follows.

Perfect Debugging: The process of detecting faults and rectifying them is termed debugging. A debugging process that does not introduce additional faults into the software is referred to as perfect debugging.
Case 1: In this case the fault content function a(N) is assumed to be a constant, i.e. no new faults are introduced during the debugging process. Therefore here,
$$a(N) = a \tag{5}$$

Imperfect Debugging: Imperfect debugging is the phenomenon where new faults are introduced while correcting the previous ones. Various fault content functions pertaining to this phenomenon are discussed below.
Case 2: In this case, we consider the number of faults to be a linear function of the number of users.
Case 3: In this case, we consider an exponential fault content function following Yamada et al. [21], which means that faults are introduced exponentially with respect to the number of users.
$$a(N) = a\,e^{\alpha N} \tag{7}$$
Case 4: Here, we take the rate of introduction of new faults to be a function of the number of faults already removed from the software.
Case 5: Under this case of imperfect debugging, we assume that new faults can be introduced exponentially per detected fault (Pham & Zhang [22]). Here c is assumed to be a constant.

Here, N = N(t) represents the number of users of the OSS up to time t.
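To see how the choice of fault content function changes the number of faults removed, the following minimal sketch integrates a fault removal equation numerically. It assumes the equation has the usual NHPP form dm/dN = b (a(N) - m) with m(0) = 0. The constant and exponential functions follow cases 1 and 3 as stated; the linear form a (1 + alpha N) is only an assumed parameterization of case 2. All function names and numbers are hypothetical.

```python
import math


def a_constant(n, a):
    """Case 1 (perfect debugging): fault content stays at a."""
    return a


def a_linear(n, a, alpha):
    """Case 2, ASSUMED parameterization: a * (1 + alpha * n)."""
    return a * (1.0 + alpha * n)


def a_exponential(n, a, alpha):
    """Case 3: fault content grows as a * exp(alpha * n)."""
    return a * math.exp(alpha * n)


def faults_removed(n_users, b, fault_content, steps=10_000):
    """Integrate the ASSUMED equation dm/dN = b * (a(N) - m), m(0) = 0, with classical RK4.

    fault_content is a callable taking the number of users n and returning a(n).
    """
    h = n_users / steps

    def rhs(n, m):
        return b * (fault_content(n) - m)

    n, m = 0.0, 0.0
    for _ in range(steps):
        k1 = rhs(n, m)
        k2 = rhs(n + h / 2.0, m + h * k1 / 2.0)
        k3 = rhs(n + h / 2.0, m + h * k2 / 2.0)
        k4 = rhs(n + h, m + h * k3)
        m += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        n += h
    return m
```

A call such as `faults_removed(500, 0.004, lambda n: a_linear(n, 100.0, 0.0005))` returns a larger value than the same call with `a_constant`, reflecting the extra faults introduced during debugging. A numerical integrator is used here so that any fault content function of N can be plugged in without deriving a closed form first.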
E. Proposed Models
The proposed models obtained by combining both components of Equation (3) are explained as follows:
SRGM 1: We obtained the following model under case 1 stated above.
SRGM 2: This model is obtained under case 2, which represents imperfect debugging.
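As a worked sketch of how a mean value function follows from a fault content function, assume that the fault removal equation takes the form dm/dN = b[a(N) - m(N)] with m(0) = 0 (the usual NHPP form, treated here as an assumption). For the constant fault content of case 1, and for a linear fault content written as a(1 + αN) (an assumed parameterization of case 2), integration gives:

```latex
\begin{align*}
\text{Case 1: } a(N) = a
  &\;\Rightarrow\; m(N) = a\left(1 - e^{-bN}\right), \\
\text{Case 2 (assumed form): } a(N) = a(1 + \alpha N)
  &\;\Rightarrow\; m(N) = a\left(1 - \frac{\alpha}{b}\right)\left(1 - e^{-bN}\right) + a\alpha N .
\end{align*}
```

Under these assumptions, substituting N = N(t) from the Bass curve expresses either function in terms of time, and setting α = 0 in the second line recovers the first, so the perfect debugging model is a special case of the imperfect one.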

IV. NUMERICAL STUDY
For the purpose of parameter estimation, we have used the real-life failure dataset of a renowned OSS, GNOME 2.0, provided by Li et al. (2011), as given in Table 1. The parameters of the proposed models are estimated using the least squares principle. The estimation results for the models proposed in this study, as described in Equations (10)-(14), on the GNOME 2.0 dataset are presented in Table 2. We have also compared the proposed SRGMs using four goodness-of-fit criteria, namely Coefficient of Determination (R^2), Mean Square Error (MSE), Predictive Ratio Risk (PRR), and Predictive Power (PP). The expressions and interpretation of these criteria may be found in Ref. [23].
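The least squares step can be illustrated on a deliberately simplified problem. The sketch below fits only the two parameters a and b of a case 1 style curve m = a (1 - exp(-b N)), and it assumes the user counts N are already known, whereas the study estimates all model parameters together. The curve form, the function names, and the search bounds are assumptions made for illustration.

```python
import math


def case1_mean_value(n, a, b):
    """ASSUMED case 1 mean value function m(N) = a * (1 - exp(-b * N))."""
    return a * (1.0 - math.exp(-b * n))


def fit_case1_least_squares(users, faults, b_lo=1e-5, b_hi=1.0, grid=200, iters=80):
    """Minimize the sum of squared errors over (a, b).

    For a fixed b the best a has a closed form, so only b is searched:
    first on a log-spaced grid, then by golden-section refinement.
    """
    def profile(b):
        g = [1.0 - math.exp(-b * n) for n in users]
        a = sum(y * gi for y, gi in zip(faults, g)) / sum(gi * gi for gi in g)
        sse = sum((y - a * gi) ** 2 for y, gi in zip(faults, g))
        return sse, a

    ratio = (b_hi / b_lo) ** (1.0 / (grid - 1))
    candidates = [b_lo * ratio ** i for i in range(grid)]
    best = min(range(grid), key=lambda i: profile(candidates[i])[0])
    lo = candidates[max(best - 1, 0)]
    hi = candidates[min(best + 1, grid - 1)]

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if profile(c)[0] < profile(d)[0]:
            hi = d
        else:
            lo = c
    b = 0.5 * (lo + hi)
    sse, a = profile(b)
    return a, b, sse
```

Eliminating a analytically keeps the search one-dimensional, which is why no external optimizer is needed here. A full fit of the proposed models would need a general nonlinear least squares routine over all parameters.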
The goodness-of-fit values obtained under the above criteria for the five proposed SRGMs are summarised in Table 3. The goodness-of-fit curves corresponding to SRGM 1, SRGM 2, SRGM 3, SRGM 4, and SRGM 5 are shown in Figures 1-5.
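The four criteria can be computed directly from observed and fitted cumulative fault counts. The sketch below uses definitions that are common in the SRGM literature; the exact expressions used in this study are those of Ref. [23], so the formulas here should be read as assumptions. The function names are hypothetical.

```python
def mean_square_error(observed, predicted):
    """MSE with divisor k (some authors divide by k minus the number of parameters)."""
    k = len(observed)
    return sum((m - y) ** 2 for y, m in zip(observed, predicted)) / k


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - m) ** 2 for y, m in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def predictive_ratio_risk(observed, predicted):
    """PRR: squared deviation relative to the model estimate, summed over points."""
    return sum(((m - y) / m) ** 2 for y, m in zip(observed, predicted))


def predictive_power(observed, predicted):
    """PP: squared deviation relative to the observed value, summed over points."""
    return sum(((m - y) / y) ** 2 for y, m in zip(observed, predicted))
```

Lower MSE, PRR, and PP and a higher R^2 indicate a better fit. PRR and PP divide by the estimate and the observation respectively, so both require non-zero values at every point.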

V. DISCUSSION ON RESULTS
From Table 2, it can be observed that the value of q, which represents the coefficient of imitation, is always greater than p, which is the coefficient of innovation. This observed relation between p and q is consistent with the Bass Diffusion Model and illustrates the importance of word of mouth in the decision making of potential users. From Table 3, it can also be concluded that all the SRGMs based on the Bass growth function fit the data well. Moreover, from the values of the goodness-of-fit criteria obtained, it can be inferred that the models built on the imperfect debugging assumption (SRGM 2-SRGM 5) outperform the model depicting the perfect debugging situation (SRGM 1), which supports the view that imperfect debugging models are closer to the realistic situation.

VI. CONCLUSION AND FUTURE SCOPE
In this study, we have related user growth to the reliability growth phenomenon for OSS. The Bass Innovation Diffusion model has been used to represent the user growth function. Five SRGMs have been proposed using various fault content functions to model the perfect and imperfect debugging phenomena. The mean value functions of the proposed SRGMs are used, and the parameter estimates are calculated using the least squares estimation technique. The performance of all the proposed models is compared using four goodness-of-fit criteria, R^2, MSE, PRR, and PP, and it is found that the imperfect debugging models gave better estimation results than the model with the assumption of perfect debugging. In future studies, the functions used to model imperfect debugging can be extended to include the concept of a change point or randomness in user growth. For user growth we have used the Bass Innovation-Imitation model. In future, we may extend our work to multi-release OSS.
The Bass Model [20] is a mathematical representation of the process of diffusion. It depicts the adoption of a product (OSS) among the potential users, i.e. Innovators and Imitators, through a mathematical expression. According to Bass, the diffusion process is defined by the following equation,
$$\frac{dN(t)}{dt} = p\,[\bar{N} - N(t)] + q\,\frac{N(t)}{\bar{N}}\,[\bar{N} - N(t)] \tag{1}$$
where $N(t)$ is the cumulative number of adopters at time t. The term $p\,[\bar{N} - N(t)]$ represents those adopters (Innovators) who are not influenced by the number of users who have already adopted; therefore p denotes the coefficient of innovation. The term $q\,\frac{N(t)}{\bar{N}}\,[\bar{N} - N(t)]$ represents those users (Imitators) who are influenced by the number of previous adopters, and thus q depicts the coefficient of imitation. Solution of equation (1) under the initial condition N(0)=0 results in equation (2), which is given as follows:
$$N(t) = \bar{N}\,\frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p}\,e^{-(p+q)t}} \tag{2}$$

Notation
$m(t)$: Expected number of faults removed in time interval (0, t]
$N$, $N(t)$: Cumulative number of users in time interval (0, t]
$a$: Constant, the number of initial faults in the software
$b$: Constant, fault removal rate
$p$: Constant, coefficient of innovation
$q$: Constant, coefficient of imitation
$\bar{N}$: Constant, total number of potential users of the software
$\alpha$: Proportion of error generation

Assumptions
• The software fault process is an NHPP phenomenon.

Figure 4: Goodness-of-fit curve for SRGM 4
Figure 5: Goodness-of-fit curve for SRGM 5

An innovation can be characterized by the following key features.
3. Complexity: It refers to the level of difficulty an adopter of an innovation will face in learning or using the innovation. Various communities exist on the internet for discussion of problems and challenges related to OSS. They are well suited to carry any discussion on OSS and to pass on reviews of the product to other potential users.
4. Trialability: It refers to the degree to which the innovation can be explored and tested by potential users. OSS is released in beta versions for trial by users. Moreover, after release the source code is also made available so that it can be extended and customized as per needs.
5. Observability: It is the extent to which the results of an innovation are accessible to the group of potential adopters.

Table 1: Detected faults in GNOME 2.0 release

Table 3: Goodness-of-fit comparison