Regression Approach to Parameter Estimation of an Exponential Software Reliability Model
Albert Orwa Akuno*, Timothy Mutunga Ndonye, Janiffer Mwende Nthiwa, Luke Akong’o Orawo
Department of Mathematics, Egerton University, Egerton, Kenya
To cite this article:
Albert Orwa Akuno, Timothy Mutunga Ndonye, Janiffer Mwende Nthiwa, Luke Akong’o Orawo. Regression Approach to Parameter Estimation of an Exponential Software Reliability Model. American Journal of Theoretical and Applied Statistics. Vol. 5, No. 3, 2016, pp. 80-86. doi: 10.11648/j.ajtas.20160503.11
Received: March 10, 2016; Accepted: April 5, 2016; Published: April 21, 2016
Abstract: Mathematical studies about the likelihood of failures of software systems have been advanced by various researchers. These studies have modeled the behavior of software systems by using failure times and time between failures in the past. The Goel-Okumoto software reliability model is amongst the many software reliability models proposed to model the failure behavior of software systems. To be able to use the model in software reliability assessment, it is important to estimate its parameters α and β and the intensity function λ(t). In this paper, classical parametric regression methods have been utilized in the estimation of the parameters α and β, the intensity function and the mean time between failures of the Goel-Okumoto software reliability model. The parameters α and β and the mean time between failures (MTBF) of the Goel-Okumoto software model have been estimated using the maximum likelihood estimation (MLE) method, regression approach applied to the model and simple linear regression model without assuming the Goel-Okumoto model. When these three estimation methods were validated using root mean squared error (RMSE) and mean absolute value difference (MAVD), which are the common error measurement criteria, regression approach applied to the Goel-Okumoto model outperformed MLE and simple linear regression estimation methods.
Keywords: Goel-Okumoto model, Regression Approach, Maximum Likelihood Estimation
1. Introduction
Various software reliability growth models have been proposed in the last three decades. The models enable software vendors to predict the behavior of software systems before a decision is made to release or ship the software to users. Amongst the many software reliability growth models is the Goel-Okumoto software reliability model, a Non-Homogeneous Poisson process (NHPP) with intensity function
$\lambda(t) = \alpha\beta e^{-\beta t}, \quad \alpha > 0,\ \beta > 0,\ t > 0$ (1)
where $\alpha$ and $\beta$ are parameters and $t$ is the failure time. The software reliability model with intensity function given in Equation (1) was proposed by [1] in 1979, hence the name Goel-Okumoto (1979) software reliability model. The model is also called an exponential software reliability model. The reliability and behavior of software systems are studied by estimating the parameters of software reliability growth models. Various parameter estimation methods have been advanced by different researchers in the past. These include, but are not limited to, the maximum likelihood estimation (MLE) method, the least squares method, interval estimation and the particle swarm optimization method. Most researchers, for instance [2], [3] and [4], have estimated the parameters of the Goel-Okumoto (1979) software reliability model, whose intensity function is given in Equation (1), using the MLE method. Several studies, for instance [5], [6] and [7], have indicated that the Goel-Okumoto software reliability model is a good model for the time between failures (TBF) of software systems.
In this work, based on the Goel-Okumoto software reliability growth model, predictive properties of mean time to failure (MTTF), and thus the estimators of the parameters, are computed using three methods: the MLE method, a regression approach using the logarithm of the software failure data under the Goel-Okumoto software reliability model assumption, and simple linear regression applied directly to the software failure data. The performance of the three methods of estimation is evaluated using RMSE and MAVD, which are commonly used error measurement criteria in predictive analyses.
Reference [8] considered the point estimation of the power law process using regression approaches and the results were comparable to the traditional methods of estimation.
1.1. Methodology
This section sets out the methodology on which the paper is based. We define mean time to failure (MTTF) and mean time between failures (MTBF) as they are commonly used in reliability studies. We also provide the software reliability data that will be used to illustrate the methods and procedures derived in section 2.
1.1.1. Mean Time to Failure
Mean time to failure (MTTF) is the expected length of time until the next failure. In other words, given the reliability function $R(t)$, MTTF is a measure of the average time to failure for a system with life distribution $F(t)$.
1.1.2. Mean Time Between Failures
The Mean Time Between Failures (MTBF) is the expected interval length from the current failure time, say $T_n = t_n$, to the next failure time $T_{n+1}$. Let $F(t \mid t_n)$ denote the conditional distribution of the failure time $T_{n+1}$ given $T_n = t_n$; then the MTBF is defined by
$\mathrm{MTBF} = E[T_{n+1} - t_n \mid T_n = t_n] = \int_{t_n}^{\infty} (t - t_n)\, dF(t \mid t_n)$
The reciprocal of the intensity function is used to represent the expected time to the next failure, given that the $n$th failure occurred at time $t_n$; that is, $1/\lambda(t_n)$ is considered as the MTBF. Under special conditions, MTBF can be approximated by $1/\lambda(t)$. That is,
$\mathrm{MTBF} \approx \dfrac{1}{\lambda(t)}$ (2)
1.1.3. Mean Residual Time
Let $T$ be a continuous random variable denoting failure time and $t$ a point in the interval $(0, \infty)$. The mean residual time (MRT) is the average time to the next failure given that no failure occurs up to time $t$ and is defined by
$m(t) = E[T - t \mid T > t]$
The theorem in section 2.2.2 shows the relationship between MRT and reliability.
1.1.4. Software Failure Data
The following software failure data, obtained from [4], are used for estimation and analysis in this study. The data are given in the form of TBF, failure times (cumulative time between failures) and the failure number.
Table 1. Time between failures data.
Failure No. | Time between failures | Cumulative time between failures | Failure No. | Time between failures | Cumulative time between failures |
1 | 30.02 | 30.02 | 16 | 15.53 | 151.78 |
2 | 1.44 | 31.46 | 17 | 25.72 | 177.50 |
3 | 22.47 | 53.93 | 18 | 2.79 | 180.29 |
4 | 1.36 | 55.29 | 19 | 1.92 | 182.21 |
5 | 3.43 | 58.72 | 20 | 4.13 | 186.34 |
6 | 13.20 | 71.92 | 21 | 70.47 | 256.81 |
7 | 5.15 | 77.07 | 22 | 17.07 | 273.88 |
8 | 3.83 | 80.90 | 23 | 3.99 | 277.87 |
9 | 21.00 | 101.90 | 24 | 176.06 | 453.93 |
10 | 12.97 | 114.87 | 25 | 81.07 | 535.00 |
11 | 0.47 | 115.34 | 26 | 2.27 | 537.27 |
12 | 6.23 | 121.57 | 27 | 15.63 | 552.90 |
13 | 3.39 | 124.96 | 28 | 120.78 | 673.68 |
14 | 9.11 | 134.07 | 29 | 30.81 | 704.49 |
15 | 2.18 | 136.25 | 30 | 34.19 | 738.68 |
Reference [7] argued that the software failure data given in Table 1 follow the Goel-Okumoto (1979) software reliability model.
1.2. Performance Error Measurement
In this section, we establish the metrics that will be used to evaluate the performance of the estimation models. There are various performance error measurement tools including but not limited to root mean squared error (RMSE) and mean absolute value difference (MAVD). Since we will use RMSE and MAVD in evaluating the performance of the three estimation models, it suffices to define them. These performance error measurement criteria are defined and explained in sections 1.2.1 and 1.2.2 respectively.
1.2.1. Root Mean Squared Error
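Root mean squared error (RMSE) is the criterion most commonly used in error measurement, especially in prediction. The mean squared error (MSE) of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined by
$\mathrm{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]$
Let $\mathrm{TBF}_i$ be the $i$th actual time between failures and $\widehat{\mathrm{MTBF}}_i$ be the corresponding predicted mean time between failures, $i = 1, \dots, n$. The RMSE used in this paper is defined as
$\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(\mathrm{TBF}_i - \widehat{\mathrm{MTBF}}_i\right)^2}$ (3)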
1.2.2. Mean Absolute Value Difference
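Mean absolute value difference (MAVD) is the average of the absolute differences between the predicted mean time between failures and the actual time between failures values. With $\mathrm{TBF}_i$ the $i$th actual time between failures and $\widehat{\mathrm{MTBF}}_i$ the corresponding prediction, the MAVD is defined as
$\mathrm{MAVD} = \dfrac{1}{n}\sum_{i=1}^{n}\left|\mathrm{TBF}_i - \widehat{\mathrm{MTBF}}_i\right|$ (4)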
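The two criteria can be sketched in a few lines of Python. This is a minimal illustration, assuming both criteria average over the $n$ observed failures; the function names `rmse` and `mavd` and the toy input values are hypothetical and are not taken from Table 1.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual TBF and predicted MTBF values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mavd(actual, predicted):
    """Mean absolute value difference between actual TBF and predicted MTBF values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy values for illustration only (hypothetical, not the Table 1 data).
actual_tbf = [1.0, 2.0, 3.0]
predicted_mtbf = [1.0, 2.0, 5.0]
print("RMSE:", rmse(actual_tbf, predicted_mtbf))
print("MAVD:", mavd(actual_tbf, predicted_mtbf))
```

Because RMSE squares each difference before averaging, a single large prediction error raises it more than it raises MAVD, which is why the two criteria are reported side by side.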
2. Derivation of the Methods
In this section, we derive the three methods of estimating the parameters of the Goel-Okumoto software reliability model and its MTBF. In section 2.1, we consider the MLE method, while the regression model and the resulting intensity function are derived in section 2.2. Finally, the simple linear regression model and the resulting intensity function are considered in section 2.3.
2.1. Maximum Likelihood Estimation
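The joint probability density function of the failure times $t_1 < t_2 < \dots < t_n$ from a Non-Homogeneous Poisson process with intensity function $\lambda(t)$ is given as [9]
$f(t_1, \dots, t_n) = \left[\prod_{i=1}^{n} \lambda(t_i)\right] \exp\left(-\int_0^{t_n} \lambda(x)\,dx\right)$ (5)
Under the assumption that the failure times follow the Goel-Okumoto software reliability model with intensity function as in Equation (1), the joint probability density function of the failure times is given as
$f(t_1, \dots, t_n) = \alpha^n \beta^n \exp\left(-\beta \sum_{i=1}^{n} t_i\right) \exp\left(-\alpha\left(1 - e^{-\beta t_n}\right)\right)$ (6)
Taking the natural logarithm of Equation (6) gives the log-likelihood function
$\ell(\alpha, \beta) = n\ln\alpha + n\ln\beta - \beta\sum_{i=1}^{n} t_i - \alpha\left(1 - e^{-\beta t_n}\right)$ (7)
Differentiating Equation (7) partially with respect to $\alpha$ and $\beta$ and equating to zero gives
$\dfrac{\partial \ell}{\partial \alpha} = \dfrac{n}{\alpha} - \left(1 - e^{-\beta t_n}\right) = 0$ (8)
$\dfrac{\partial \ell}{\partial \beta} = \dfrac{n}{\beta} - \sum_{i=1}^{n} t_i - \alpha t_n e^{-\beta t_n} = 0$ (9)
Solving Equations (8) and (9) for $\alpha$ and $\beta$, we obtain the ML estimators, denoted by $\hat{\alpha}$ and $\hat{\beta}$, as
$\hat{\alpha} = \dfrac{n}{1 - e^{-\hat{\beta} t_n}}$ (10)
$\dfrac{n}{\hat{\beta}} = \sum_{i=1}^{n} t_i + \dfrac{n t_n e^{-\hat{\beta} t_n}}{1 - e^{-\hat{\beta} t_n}}$ (11)
It has been shown [10] that Equations (10) and (11) have a unique positive solution if and only if $t_n > \dfrac{2}{n}\sum_{i=1}^{n} t_i$.
Equation (11) has no closed-form solution for $\hat{\beta}$. A numerical procedure such as the Newton-Raphson method can be used to solve Equations (10) and (11) iteratively, and the MLEs thus obtained give an estimator of the MTBF as
$\widehat{\mathrm{MTBF}} = \dfrac{1}{\hat{\lambda}(t)} = \dfrac{e^{\hat{\beta} t}}{\hat{\alpha}\hat{\beta}}$ (12)
We refer to the model in Equation (12) as the MLE model.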
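The numerical solution step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the intensity $\lambda(t) = \alpha\beta e^{-\beta t}$ with observation ending at the last failure time $t_n$, and it substitutes bisection for Newton-Raphson because bisection needs no derivative and cannot diverge once the root is bracketed. The function names and the bracket `[1e-8, 1.0]` are hypothetical choices.

```python
import math
from itertools import accumulate

# Time between failures from Table 1; cumulative sums give the failure times.
TBF = [30.02, 1.44, 22.47, 1.36, 3.43, 13.20, 5.15, 3.83, 21.00, 12.97,
       0.47, 6.23, 3.39, 9.11, 2.18, 15.53, 25.72, 2.79, 1.92, 4.13,
       70.47, 17.07, 3.99, 176.06, 81.07, 2.27, 15.63, 120.78, 30.81, 34.19]
times = list(accumulate(TBF))


def beta_score(beta, times):
    """Score in beta after substituting alpha = n / (1 - exp(-beta * t_n))."""
    n, t_n = len(times), times[-1]
    x = beta * t_n
    # exp(-x) / (1 - exp(-x)), written with expm1 to stay accurate for small x.
    ratio = math.exp(-x) / (-math.expm1(-x))
    return n / beta - sum(times) - n * t_n * ratio


def fit_goel_okumoto_mle(times, lo=1e-8, hi=1.0, iterations=200):
    """Return (alpha, beta) solving the likelihood equations by bisection."""
    n, t_n = len(times), times[-1]
    # Existence condition for a positive solution: t_n > (2/n) * sum(t_i).
    if t_n <= 2.0 * sum(times) / n:
        raise ValueError("no positive MLE: need t_n > (2/n) * sum(t_i)")
    if beta_score(lo, times) <= 0 or beta_score(hi, times) >= 0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if beta_score(mid, times) > 0:
            lo = mid  # score still positive, root lies to the right
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = n / (-math.expm1(-beta * t_n))
    return alpha, beta


def mtbf(t, alpha, beta):
    """Estimated MTBF at time t, the reciprocal of the fitted intensity."""
    return math.exp(beta * t) / (alpha * beta)


alpha_hat, beta_hat = fit_goel_okumoto_mle(times)
print("alpha:", alpha_hat, "beta:", beta_hat)
print("MTBF at last failure time:", mtbf(times[-1], alpha_hat, beta_hat))
```

Since the estimator of $\alpha$ is an explicit function of $\hat{\beta}$, only a one-dimensional root search is needed, which keeps the procedure simple.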
2.2. Regression Model and the Resulting Intensity Function
Subsection 2.2.1 outlines the derivation of the regression model and the resulting intensity function is derived in subsection 2.2.2.
2.2.1. Regression Approach for the Goel-Okumoto Software Reliability Model
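This study stems from the fact that the logarithm of the intensity function of the Goel-Okumoto software reliability model is a linear function of the software failure times. It is thus proposed that the model can be treated as a simple linear regression, with its parameters estimated using classical regression methods. References [11], [12], [13] and [14] used the inverse of the intensity function of the power law process, which is an NHPP, to approximate MTBF. Since the Goel-Okumoto software reliability model is also an NHPP, its MTBF can be approximated by the inverse of its intensity function as
$\mathrm{MTBF} = \dfrac{1}{\lambda(t)} = \dfrac{1}{\alpha\beta}\, e^{\beta t}$ (13)
where $t$ is the failure time.
Taking the natural logarithm of both sides of Equation (13) we get
$\ln(\mathrm{MTBF}) = -\ln(\alpha\beta) + \beta t$ (14)
Let
$y = \ln(\mathrm{MTBF})$ (15)
$a = -\ln(\alpha\beta)$ (16)
$b = \beta$ (17)
Then Equation (14) becomes
$y = a + bt$ (18)
Using the method of least squares for the linear regression model, with $y_i$ the logarithm of the $i$th observed time between failures and $t_i$ the $i$th failure time, the least squares estimators of the parameters in Equation (18) are obtained as
$\hat{b} = \dfrac{\sum_{i=1}^{n}(t_i - \bar{t})(y_i - \bar{y})}{\sum_{i=1}^{n}(t_i - \bar{t})^2}$ (19)
and
$\hat{a} = \bar{y} - \hat{b}\,\bar{t}$ (20)
After obtaining the estimators of $a$ and $b$ as in Equations (19) and (20), we get the estimators of the Goel-Okumoto software model parameters $\alpha$ and $\beta$, denoted by $\tilde{\alpha}$ and $\tilde{\beta}$, from Equations (16) and (17) as
$\tilde{\alpha} = \dfrac{e^{-\hat{a}}}{\hat{b}}$ (21)
and
$\tilde{\beta} = \hat{b}$ (22)
The estimator of MTBF can be obtained from Equation (2) and the regression estimators in Equations (21) and (22) as
$\widehat{\mathrm{MTBF}} = \dfrac{e^{\tilde{\beta} t}}{\tilde{\alpha}\tilde{\beta}} = e^{\hat{a} + \hat{b} t}$ (23)
We refer to this as the regression model.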
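The regression approach can be illustrated with a short sketch. It assumes the intensity $\lambda(t) = \alpha\beta e^{-\beta t}$, so that $\ln(\mathrm{MTBF}) = -\ln(\alpha\beta) + \beta t$, and it assumes that the logarithm of each observed TBF in Table 1 is regressed on the corresponding cumulative failure time. The helper `least_squares` and the variable names are hypothetical.

```python
import math
from itertools import accumulate

# Time between failures from Table 1; cumulative sums give the failure times.
TBF = [30.02, 1.44, 22.47, 1.36, 3.43, 13.20, 5.15, 3.83, 21.00, 12.97,
       0.47, 6.23, 3.39, 9.11, 2.18, 15.53, 25.72, 2.79, 1.92, 4.13,
       70.47, 17.07, 3.99, 176.06, 81.07, 2.27, 15.63, 120.78, 30.81, 34.19]
times = list(accumulate(TBF))


def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Response: log of each observed TBF (assumed stand-in for ln(MTBF) at t_i).
log_tbf = [math.log(v) for v in TBF]
a_hat, b_hat = least_squares(times, log_tbf)

# Back-transform: a = -ln(alpha * beta) and b = beta.
beta_reg = b_hat
alpha_reg = math.exp(-a_hat) / b_hat


def mtbf_regression(t):
    """Predicted MTBF at time t from the fitted log-linear model."""
    return math.exp(a_hat + b_hat * t)


print("a:", a_hat, "b:", b_hat)
print("alpha:", alpha_reg, "beta:", beta_reg)
print("MTBF at last failure time:", mtbf_regression(times[-1]))
```

The back-transformation is only meaningful when the fitted slope is positive, since $\beta > 0$ is required by the model; a non-positive slope would indicate that the data show no reliability growth.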
2.2.2. Intensity Function for the Regression Model
In order to derive the resulting intensity function from the assumed linear relationship in Equation (18), we state the following theorem without proof.
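Theorem
Let $T$ be a random variable of continuous type with density function $f(t)$ and cumulative distribution function $F(t)$, and let $R(t) = 1 - F(t)$ denote the reliability function. If it is assumed that $F(0) = 0$, then
$E(T) = \int_0^{\infty} R(t)\,dt$ (24)
and the MRT is given as
$m(t) = \dfrac{1}{R(t)}\int_t^{\infty} R(x)\,dx$ (25)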
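From the assumed linear relationship $\ln(\mathrm{MTBF}) = a + bt$ in Equation (18), we get
$\mathrm{MTTF} = e^{a + bt}$ (26)
Equating the MRT in the above theorem and the MTTF, i.e. equating Equations (25) and (26), in order to obtain the failure intensity function $\lambda(t)$, we have
$\dfrac{1}{R(t)}\int_t^{\infty} R(x)\,dx = e^{a + bt}$
where $R(t) = 1 - F(t)$ is the reliability function, from which we obtain
$\int_t^{\infty} R(x)\,dx = R(t)\, e^{a + bt}$ (27)
By differentiating Equation (27) with respect to $t$ and using the result
$\dfrac{d}{dt}\int_t^{\infty} R(x)\,dx = -R(t)$
we obtain
$-R(t) = R'(t)\, e^{a + bt} + b\, R(t)\, e^{a + bt}$ (28)
If we let $u = R(t)$, then Equation (28) becomes
$-u = u'\, e^{a + bt} + b\, u\, e^{a + bt}$ (29)
Re-arranging Equation (29), we obtain
$\dfrac{u'}{u} = -\left(b + e^{-(a + bt)}\right)$ (30)
It is known that
$\lambda(t) = \dfrac{f(t)}{R(t)}$ (31)
But
$f(t) = -R'(t)$ (32)
From Equation (32), Equation (31) becomes
$\lambda(t) = -\dfrac{R'(t)}{R(t)}$ (33)
Now, from Equation (30), we have $\dfrac{R'(t)}{R(t)} = -\left(b + e^{-(a + bt)}\right)$. Thus Equation (33) becomes
$\lambda(t) = b + e^{-(a + bt)}$ (34)
Equation (34) is the intensity function obtained when we assume a linear regression equation from the Goel-Okumoto software reliability model.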
2.3. Simple Linear Regression Model and the Resulting Intensity Function
Subsection 2.3.1 outlines the derivation of the simple linear regression model and the derivation of resulting intensity function from the simple linear regression model is outlined in subsection 2.3.2.
2.3.1. Simple Linear Regression Model
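In this section, we directly take a simple linear regression model instead of assuming the Goel-Okumoto (1979) reliability model. That is, we assume that the failure times and TBF are linearly related as
$\mathrm{TBF}_i = \beta_0 + \beta_1 t_i + \varepsilon_i$ (35)
where TBF is the dependent variable, the failure time $t_i$ is the independent variable, and $\beta_0$ and $\beta_1$ are constants that need to be estimated. $\varepsilon_i$ represents the error term.
Using the least squares method, the estimators of the parameters in Equation (35) are obtained as
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n}(t_i - \bar{t})(\mathrm{TBF}_i - \overline{\mathrm{TBF}})}{\sum_{i=1}^{n}(t_i - \bar{t})^2}$ (36)
and
$\hat{\beta}_0 = \overline{\mathrm{TBF}} - \hat{\beta}_1 \bar{t}$ (37)
where $\overline{\mathrm{TBF}}$ denotes the average time between software failures and $\bar{t}$ the average failure time. The prediction equation (38) thus gives the estimated mean time between software failures.
$\widehat{\mathrm{MTBF}} = \hat{\beta}_0 + \hat{\beta}_1 t$ (38)
We refer to the estimator of MTBF from the simple linear regression model as the simple linear regression model.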
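A minimal sketch of the simple linear regression fit on the Table 1 data is given below. It assumes that the observed TBF is regressed directly on the cumulative failure time and that each prediction is evaluated at the observed failure time $t_i$; the in-sample RMSE and MAVD are then computed as averages over the $n$ failures. The names `fit_line`, `intercept_hat` and `slope_hat` are hypothetical.

```python
import math
from itertools import accumulate

# Time between failures from Table 1; cumulative sums give the failure times.
TBF = [30.02, 1.44, 22.47, 1.36, 3.43, 13.20, 5.15, 3.83, 21.00, 12.97,
       0.47, 6.23, 3.39, 9.11, 2.18, 15.53, 25.72, 2.79, 1.92, 4.13,
       70.47, 17.07, 3.99, 176.06, 81.07, 2.27, 15.63, 120.78, 30.81, 34.19]
times = list(accumulate(TBF))


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line y = c0 + c1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept_hat, slope_hat = fit_line(times, TBF)

# Predicted MTBF at each observed failure time (assumed evaluation points).
predicted = [intercept_hat + slope_hat * t for t in times]

n = len(TBF)
rmse_slr = math.sqrt(sum((o - p) ** 2 for o, p in zip(TBF, predicted)) / n)
mavd_slr = sum(abs(o - p) for o, p in zip(TBF, predicted)) / n

print("intercept:", intercept_hat, "slope:", slope_hat)
print("RMSE:", rmse_slr, "MAVD:", mavd_slr)
```

Unlike the log-linear fit, this model does not constrain the predicted MTBF to be positive, so a negative intercept can produce negative predictions at small $t$.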
2.3.2. Intensity Function for the Simple Linear Regression Model
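Here, we derive the intensity function resulting from the simple linear regression model in Equation (35) using the MRT. For the simple linear regression Equation (35) with intercept $\beta_0$ and slope $\beta_1$,
$\mathrm{MTTF} = \beta_0 + \beta_1 t$ (39)
Equating the MRT in Equation (25) and the MTTF in Equation (39) we have $\dfrac{1}{R(t)}\int_t^{\infty} R(x)\,dx = \beta_0 + \beta_1 t$, where $R(t)$ is the reliability function, from which we obtain
$\int_t^{\infty} R(x)\,dx = R(t)\left(\beta_0 + \beta_1 t\right)$ (40)
Differentiating Equation (40) gives $-R(t) = R'(t)(\beta_0 + \beta_1 t) + \beta_1 R(t)$. Following the steps in Section 2.2.2 with $\lambda(t) = -R'(t)/R(t)$, the intensity function resulting from the assumption of the simple linear regression model in Equation (35) is
$\lambda(t) = \dfrac{1 + \beta_1}{\beta_0 + \beta_1 t}$ (41)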
3. Results and Comparison of the Performance of the Three Estimation Methods
This section has two parts. The results obtained from the three methods of estimation are discussed in section 3.1, and the performances of the three methods are then compared using RMSE and MAVD in section 3.2.
3.1. Results from the Three Methods of Estimation
The results obtained from the MLE method, regression method and simple linear regression method are respectively given in subsections 3.1.1, 3.1.2 and 3.1.3.
3.1.1. Using Maximum Likelihood Estimation Method
From Equations (10) and (11) and the data in Table 1, the MLE of the parameters of the Goel-Okumoto software reliability model with intensity function given in Equation (1) are
. Using these estimates and Equation (12), we find MSE and MAVD of the failure data in Table 1 as is in Table 2.
Table 2. MSE and MAVD of the model.
3.1.2. Using Regression Model
Using Equation (21) and Equation (22), we find the estimates of the parameters of the Goel-Okumoto software reliability as
Using these estimates, we find MSE and MAVD of the failure data in Table 1 as is in Table 3.
Table 3. MSE and MAVD of the model.
3.1.3. Using Simple Linear Regression
Using Equations (36) and (37), we obtain the simple linear regression estimates of the parameters as
. The following is the MSE and MAVD for the data in Table 1 obtained using simple linear regression approach based on Equation (38).
Table 4. MSE and MAVD for model.
3.2. Comparison of the Three Methods of Estimation
Comparison of the performance of the three methods of estimation of the parameters and MTBF of the Goel-Okumoto software reliability model based on RMSE and MAVD is given in Table 5.
Table 5. RMSE and MAVD for,
and
models.
Based on the results from Table 5, the regression approach applied to the Goel-Okumoto model is the best model for estimating the parameters $\alpha$, $\beta$ and MTBF for the Goel-Okumoto (1979) software reliability model, because it has the least RMSE and MAVD. This method of estimation performs better than both the pure linear regression model and the MLE method and should thus be preferred. Based on this model and the data in Table 1, the preferred estimates of $\alpha$, $\beta$
and MTBF are thus obtained as
respectively.
4. Conclusions
Estimation of the parameters of software reliability models using traditional techniques such as the maximum likelihood method and the least squares method poses some difficulties, since the models are generally non-linear [15]. The derivation and calculation of the MLEs usually require specialized software and more powerful computers for solving the non-linear equations. Some researchers, for instance [16], argue that the difficulty of computing MLEs is becoming less of a problem as more statistical packages are developed to solve the complex maximum likelihood (ML) equations. However, these statistical packages rely on more complex algorithms and programming languages. MLEs are also heavily biased when only a small number of failure times is available [17]. In this paper, we have presented a simpler and more efficient parameter estimation method for the Goel-Okumoto software reliability model. It stems from the fact that the logarithm of the intensity function of the model is a linear function of the software failure times, so the parameters can be estimated using the traditional least squares regression method. For the failure data considered, the estimates thus obtained have lower RMSE and MAVD than those from MLE, which is the most widely used method for estimating the parameters of the model. It is also worth noting that when the parameters of the model are estimated using the simple linear regression method, the results obtained are still better than those from the MLE method.
References