A Novel Approach to Finding Sampling Distributions for Truncated Laws Via Unbiasedness Equivalence Principle
Nicholas A. Nechval^{1, *}, Sergey Prisyazhnyuk^{2}, Vladimir F. Strelchonok^{1}
^{1}Department of Mathematics, Baltic International Academy, Riga, Latvia
^{2}Department of Geoinformation Systems, National Research University of Information Technologies, Mechanics and Optics, St-Petersburg, Russia
To cite this article:
Nicholas A. Nechval, Sergey Prisyazhnyuk, Vladimir F. Strelchonok. A Novel Approach to Finding Sampling Distributions for Truncated Laws Via Unbiasedness Equivalence Principle. American Journal of Theoretical and Applied Statistics. Special Issue: Novel Ideas for Efficient Optimization of Statistical Decisions and Predictive Inferences under Parametric Uncertainty of Underlying Models with Applications. Vol. 5, No. 2-1, 2016, pp. 40-48. doi: 10.11648/j.ajtas.s.2016050201.16
Abstract: Truncated distributions arise naturally in many practical situations. This paper considers the problem of finding sampling distributions for truncated laws, a problem of considerable importance for information processing in industrial engineering. It remains one of the more difficult problems of mathematical statistics, and its investigation usually requires considerable effort and skill. In a given problem, most practitioners would prefer to find the sampling distribution for a truncated law by the simplest method available. For many situations encountered in textbooks and in the literature, the approach discussed here is simple and straightforward. It is based on the unbiasedness equivalence principle (UEP), a new idea that often provides a neat method for finding sampling distributions for truncated laws. It avoids explicit integration over the sample space and the attendant Jacobian, at the expense of verifying completeness of the recognized family of densities. Fortunately, general results on completeness obviate the need for this verification in many problems involving exponential families. The proposed approach obtains results for truncated laws from the results already available for non-truncated laws, and it is much simpler than the known approaches. In many situations it allows one to find the results for truncated laws with known truncation points and to estimate system reliability in a simple way. The approach can also be used to find the sampling distribution for a truncated law when some or all of its truncation parameters are left unspecified. Illustrative examples are given.
Keywords: Truncated Law, Unbiasedness Equivalence Principle, Sampling Distribution, Reliability Estimation
1. Introduction
A probability distribution for a random variable X is said to be truncated when some set of values in the range of X is excluded. Truncated distributions (left truncated, right truncated or doubly truncated) have found many applications, particularly in industrial settings [1-8]. Final products are often subject to screening inspection before being sent to the customer. The usual practice is that if a product’s performance falls within certain tolerance limits, it is judged conforming and sent to the customer. If it fails, the product is rejected and thus scrapped or reworked. In this case, the distribution of the product actually delivered to the customer is truncated. Another example can be found in a multistage production process, in which inspection is performed at each production stage. If only conforming items are passed on to the next stage, the actual distribution is a truncated distribution. Accelerated life testing with censored samples is another good example. In fact, the concept of a truncated distribution plays a significant role in analyzing a variety of production processes, process optimization and quality improvement. Truncated distributions can also be used to model intensity statistics in the study of atomic heterogeneity [9]. The justification is twofold: 1) atomic heterogeneity leads to the intensity statistics being modified from Gaussian to near-Gaussian forms [10,11]; and 2) in reality, the structure factors or normalized structure factors do not range from -∞ to ∞ but over a finite range.
Several examples have been given employing the truncated distributions in fitting rainfall data and animal population studies where observations usually begin after migration has commenced or concluded before it has stopped [12,13]. Other examples arise in life testing and reliability problems, where if failure is caused by a wear-out mechanism or is a consequence of accumulated wear, then the length-of-life of a system can be expected to be of finite dimension.
In many areas of science, in particular communication networks, economics, hydrology, materials science and physics, long-tailed distributions arise. For example, many traffic measurement studies in modern communication networks such as the Internet have found long-tailed distributions. This means that the behavior of these data departs significantly from traditional telephone traffic and its related Markov models with short-range dependence. In particular, the common Poisson arrival process and the corresponding analysis based on the Erlang formula are no longer valid.
The main weakness of long-tailed distributions is that they do not have finite moments of all orders, which has restricted their use. To overcome this weakness, Nadarajah [14] introduced truncated versions of five of the most commonly known long-tailed distributions, which possess finite moments of all orders and could therefore be better models.
The object of the present paper is to obtain a sampling distribution for a truncated law with a known (or unknown) truncation point (in general, a vector) and a minimum variance unbiased estimator of the reliability function for this model, using the results obtained for the non-truncated law. A sampling distribution for a truncated law may be derived by the method based on characteristic functions [15], the method based on generating functions [16], or the combinatorial method [17]. In this paper, a much simpler technique is proposed, which yields the results for truncated laws more easily.
2. Unbiasedness Equivalence Principle
Suppose an experiment yields a data sample X^{n} = (X_{1}, … , X_{n}) relevant to the value of a parameter θ (in general, a vector). Let L_{X}(x^{n}|θ) denote the probability or probability density of X^{n} when the parameter assumes the value θ. Considered as a function of θ for given X^{n}=x^{n}, L_{X}(x^{n}|θ) is the likelihood function. If the data sample X^{n} can be summarized by a sufficient statistic S (in general, a vector), one can write L_{S}(s|θ) ∝ L_{X}(x^{n}|θ). Further, for any non-negative function w(s), w(s)L_{S}(s|θ) is also a likelihood function equivalent to L_{X}(x^{n}|θ). Suppose we recognize a function w(s) such that w(s)L_{S}(s|θ), regarded as a function of s for a given θ, is a density function. It can be shown that this is the sampling density of S if the family of recognized densities is complete.
The unbiasedness equivalence principle [18] consists in the following. If
(1)
represents the likelihood function for the truncated law, where w(θ,ϑ) is some function of a parameter (θ,ϑ) associated with truncation, ϑ is a known truncation point (in general, vector), then a sampling density for the truncated law is determined by
(2)
where
(3)
g(s|θ) is a sampling density of a sufficient statistic s(X^{n}) (for a family of densities {f(x|θ)}) determined on the basis of L_{X}(X^{n}|θ), is an unbiased estimator of 1/[w(θ,ϑ)]^{n} with respect to g(s|θ), s∈S (a sample space of a non-truncated sufficient statistic S), φ(S) is a function of S for a given θ, which is equivalent to an unbiased estimator of 1/[w(θ,ϑ)]^{n}, i.e.,
(4)
or
(5)
g_{ϑ }(s|θ) is the sampling density of a sufficient statistic S (for a family of densities {f_{ϑ }(x|θ)}) when the truncation parameter ϑ is known, S_{ϑ} is a sample space of a truncated sufficient statistic S.
3. Finding Sampling Distributions for Truncated Laws with Known Truncation Points
3.1. Example 3.1
Sampling distribution for the left-truncated Poisson law. Let the Poisson probability function be denoted by
(6)
The probability function of the restricted random variable, which is truncated away from some ϑ ≥ 0, is then
(7)
where
(8)
Consider a sample of n independent observations X_{1}, X_{2}, …, X_{n}, each with probability density function f_{ϑ }(x|θ), where the likelihood function is defined as
(9)
and let
(10)
It is well known that
(11)
is a complete sufficient statistic for the family {f(x|θ)}. A result of [19] states that sufficiency is preserved under truncation away from any Borel set in the range of X. Hence, in the case at hand S is sufficient for {f_{ϑ }(x|θ)}. It can be verified that S is also complete.
For the sake of simplicity but without loss of generality, consider the case ϑ=0. This is at the same time the most important case for applications and the easiest with which to deal. It follows from (2) that
(12)
where
(13)
(14)
(15)
denotes the Stirling number of the second kind [20] defined by
(16)
(17)
This is the same result as that of Tate and Goen [21]. Their proof was based on characteristic functions.
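The ϑ=0 case can be checked numerically. The sketch below assumes the standard closed form for the sum of n zero-truncated Poisson variables, P(S=s) = n! S(s,n) θ^{s} / [s! (e^{θ} - 1)^{n}] for s ≥ n, where S(s,n) is the Stirling number of the second kind, and compares it with a brute-force n-fold convolution of the zero-truncated Poisson probability function. The function names are illustrative and are not taken from the paper.

```python
from math import exp, factorial


def stirling2(s, n):
    """Stirling number of the second kind, S(s,n) = n*S(s-1,n) + S(s-1,n-1)."""
    table = [[0] * (n + 1) for _ in range(s + 1)]
    table[0][0] = 1
    for i in range(1, s + 1):
        for k in range(1, min(i, n) + 1):
            table[i][k] = k * table[i - 1][k] + table[i - 1][k - 1]
    return table[s][n]


def sum_pmf_closed_form(s, n, theta):
    """Assumed closed form for P(S = s), S the sum of n zero-truncated Poissons."""
    if s < n:
        return 0.0
    weight = factorial(n) * stirling2(s, n)
    return weight * theta ** s / (factorial(s) * (exp(theta) - 1.0) ** n)


def sum_pmf_by_convolution(n, theta, s_max):
    """Brute-force pmf of S on 0..s_max by convolving the single-variable pmf."""
    scale = exp(-theta) / (1.0 - exp(-theta))
    # single[x] = P(X = x) for the Poisson law truncated away from zero
    single = [0.0] + [scale * theta ** x / factorial(x) for x in range(1, s_max + 1)]
    dist = [1.0] + [0.0] * s_max  # distribution of the empty sum
    for _ in range(n):
        new = [0.0] * (s_max + 1)
        for a, pa in enumerate(dist):
            if pa == 0.0:
                continue
            for x in range(1, s_max - a + 1):
                new[a + x] += pa * single[x]
        dist = new
    return dist
```

Because every summand is at least 1, the convolution is exact for all s up to `s_max`, so the two functions should agree to rounding error on that range.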
3.2. Example 3.2
Sampling distribution for the right-truncated exponential law. Let the probability density function of the right-truncated exponential distribution be denoted by
(18)
where
(19)
(20)
Consider a sample of n independent observations X_{1}, X_{2}, …, X_{n}, each with density f_{ϑ} (x|θ), where the likelihood function is determined as
(21)
It is well known that
(22)
is a complete sufficient statistic for the family {f(x|θ)}. It follows from (2) that
n ≥ 1, (23)
where a_{+}= max(0, a),
(24)
(25)
(26)
(27)
This is the same result as that of Bain and Weeks [15]. Their proof was based on characteristic functions.
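A numerical check of this kind of result can be sketched as follows. The code assumes that θ is the scale (mean) parameter of the untruncated exponential law, so that f_{ϑ}(x|θ) = e^{-x/θ} / [θ(1 - e^{-ϑ/θ})] on (0, ϑ), and that the density of S = X_{1} + … + X_{n} has the alternating-sum form e^{-s/θ} Σ_{j}(-1)^{j} C(n,j) (s - jϑ)_{+}^{n-1} / [(n-1)! θ^{n} (1 - e^{-ϑ/θ})^{n}]. Both the parameterization and the function names are assumptions made for illustration.

```python
from math import comb, exp, factorial


def truncated_exp_pdf(x, theta, cut):
    """Right-truncated exponential density on (0, cut]; theta is the assumed scale."""
    if not 0.0 < x <= cut:
        return 0.0
    return exp(-x / theta) / (theta * (1.0 - exp(-cut / theta)))


def sum_density(s, n, theta, cut):
    """Assumed density of the sum of n right-truncated exponential variables."""
    if not 0.0 < s < n * cut:
        return 0.0
    alternating = 0.0
    for j in range(n + 1):
        gap = s - j * cut
        if gap > 0.0:  # positive part (s - j*cut)_+ handled explicitly
            alternating += (-1) ** j * comb(n, j) * gap ** (n - 1)
    norm = (theta * (1.0 - exp(-cut / theta))) ** n * factorial(n - 1)
    return exp(-s / theta) * alternating / norm


def trapezoid(func, lo, hi, steps):
    """Composite trapezoidal rule on [lo, hi] with equal steps."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h
```

Integrating `sum_density` over (0, nϑ) should give 1, and its mean should equal n times the mean of a single truncated observation, θ - ϑe^{-ϑ/θ}/(1 - e^{-ϑ/θ}).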
3.3. Example 3.3
Sampling distribution for the doubly truncated exponential law. Consider an exponential distribution (20) that is doubly truncated at a lower truncation point (ϑ_{1}) and an upper truncation point (ϑ_{2}). The probability density function of the doubly truncated exponential distribution is defined as
(28)
where ϑ = (ϑ_{1},ϑ_{2}),
(29)
Consider a sample of n independent observations X_{1}, X_{2}, …, X_{n}, each with density f_{ϑ} (x|θ), where the likelihood function is determined as
(30)
It is well known that
(31)
is a complete sufficient statistic for the family {f(x|θ)}. It follows from (2) that
n ≥ 1, (32)
where a_{+ }= max(0, a), g(s|θ) is given by (24),
(33)
(34)
(35)
4. Validity of the Unbiasedness Equivalence Principle
The theoretical results of this investigation into the validity of the proposed unbiasedness equivalence principle (UEP) for finding sampling distributions for truncated laws are largely contained in the theorem given below. We introduce the following notation and assumptions. Let X^{n} be a random variable taking on values x^{n} in a space X_{ϑ}, let A be a σ-field of subsets of X_{ϑ}, and let (θ, ϑ) be a parameter associated with truncation, where ϑ is a known truncation point. For all values of the parameter θ in some parameter space Θ, let P_{ϑ} be a probability measure on A; i.e., for any set A in A, P_{ϑ}(A|θ) is the probability that X^{n} will belong to A when the parameter has the value θ. Let S = s(X^{n}) be a statistic on the measurable space (X_{ϑ}, A) taking on values in a measurable space (S_{ϑ}, B). For each θ∈Θ, let G_{ϑ} be the probability distribution of S when X^{n} has the distribution P_{ϑ}, i.e., for any B∈B, G_{ϑ}(B|θ) = P_{ϑ}(s^{-1}(B)|θ), where s^{-1}(B) is the set of points x^{n} in X_{ϑ} for which s(x^{n})∈B.
(i). Assume the family P={P_{ϑ}: θ∈Θ} of probability distributions of X^{n} is dominated by a totally σ-finite measure m over (X_{ϑ}, A), i.e., there exists, for all θ∈Θ, a non-negative A-measurable function p_{ϑ}(x^{n}|θ) such that
(36)
for all A∈A. (The integrand p_{ϑ}(x^{n}|θ) is called the density of P_{ϑ} w.r.t. (with respect to) m).
(ii). Assume that s(X^{n}) is sufficient for P. From the Halmos-Savage factorization theorem [22], s(X^{n}) is sufficient if and only if for each θ∈Θ there exists a non-negative B- measurable function L_{S}(s(x^{n})|θ,ϑ) on S _{ϑ} and a non-negative A - measurable function v on X_{ϑ} such that
(37)
(The symbol (m) following a statement means that the statement holds except on a set of m - measure zero). In (37), we will assume that L_{S} and v are finite (m).
(iii). Assume we recognize some likelihood function L_{S}(s|θ,ϑ) equivalent to the likelihood function L_{X}(x^{n}|θ,ϑ). Define a σ-finite measure r over (X_{ϑ}, A) by
(38)
Then, from (36), (37), and (38),
(39)
(iv). Assume we recognize a totally σ-finite measure h over (S_{ϑ}, B) such that the measure rs^{-1} over (S_{ϑ}, B) is absolutely continuous w.r.t. h; i.e., h(B)=0 implies that rs^{-1}(B) = 0, where rs^{-1}(B) denotes the r-measure of the inverse image of B.
(v). Assume we recognize a positive B-measurable function φ on S_{ϑ} such that
(40)
for all θ∈Θ. Assume further that for any measurable set B of positive h - measure, there exists a θ∈Θ and a measurable subset B_{1} of B of positive h - measure over which L_{S}(s|θ,ϑ)φ(s) is positive.
From (40), {L_{S}(s|θ,ϑ)φ(s):θ∈Θ} is a family of densities w.r.t. h. For B∈B, let
(41)
Thus, (v) provides us with a family of densities, but at this stage we do not know if this recognized family is the family of sampling densities of S.
(vi). Assume we recognize that the family {L_{S}(s|θ,ϑ)φ(s):θ∈Θ} is complete, i.e.,
(42)
implies
(43)
except on a set D with G_{ϑ}(D|θ)=0 for all θ∈Θ.
Theorem 1 (Sampling distribution for truncated law). Under assumptions (i) through (vi), G_{ϑ} has a density with respect to h and L_{S}(s|θ,ϑ)φ(s) is a version of it, i.e.,
(44)
is the sampling density, g_{ϑ}(s|θ), of the sufficient statistic s(X^{n}).
Proof. We show first that (43) and the second part of (v) imply that f(s)≡0 (h). For suppose there exists a measurable B with h(B)>0 and f(s)≠0 over B. Then B⊂D, so G_{ϑ}(B|θ)=0 for all θ∈Θ. But, from (v), there exists a B_{1}⊂B for which G_{ϑ}(B_{1}|θ)>0 for some θ, contradicting G_{ϑ}(B|θ)=0 for all θ∈Θ. Now, by a theorem in [22], there exists a non-negative measurable function y on S_{ϑ} such that
(45)
for every measurable function Θ_{ϑ}, in the sense that if either integral exists, then so does the other and the two are equal.
In (45), let Θ_{ϑ}(s,θ)=c_{B}L_{S}(s|θ,ϑ), where c_{B} is the characteristic function of B (B∈B). Then there exists a y(s) such that
(46)
for all B∈B. Note that the left side of (46) is G_{ϑ}(B|θ).
In (42), let f (s) = 1-[y(s)/φ (s)]. From (40) and (46),
(47)
for all θ∈Θ. Thus, from (43), y(s)=φ(s) almost everywhere (h), and, from (47),
(48)
is a version of the density of G_{ϑ} with respect to h.
5. Finding Reliability Estimators for Truncated Laws
Consider a system that is required to operate for a given ‘mission time’, t. The reliability of this system for the right-truncated distribution of time-to-failure with the probability density function f_{ϑ}(x|θ) may be defined as
(49)
By the Rao-Blackwell and Lehmann-Scheffé theorems [23], a minimum variance unbiased (MVU) estimator for R may be obtained as
(50)
where X may be any one of the observations (X_{1}, …, X_{n}) from f_{ϑ }(x|θ), S is a complete sufficient statistic for {f_{ϑ }(x|θ)}, and f_{ϑ}(x|s) is the conditional distribution of X given S=s; f_{ϑ }(x|s) is obtained as
(51)
where
(52)
is the joint probability density of X and S, is an unbiased estimator of
(53)
with respect to g(s|θ).
It should be noted that (50) can be obtained by a different method as
(54)
where is an unbiased estimator of
(55)
with respect to g(s|θ).
5.1. Example 5.1
MVU estimator of reliability for the right truncated exponential distribution. Let X^{n}=(X_{1}, …, X_{n}) be a random sample of size n from a population with density (18). Then it follows from (50) (or (54)) that the MVU estimator of R(t) is obtained as
(56)
As a particular case, if ϑ → ∞, that is, the variable X is assumed unrestricted, the corresponding MVU estimator of reliability reduces to
(57)
For instance, suppose that the following failure times, in hours, are available from a given system: 4.2, 9.8, 16, 20 and that the truncation point ϑ=25 hours and the mission time t=5 hours. Clearly s=50 hours. Substituting these values in (56), the estimate of reliability is obtained as Had we assumed, however, that the observations are coming from the complete population, the estimate of reliability would have been, from (57),
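The arithmetic of this example can be reproduced with a short sketch. It assumes that the MVU estimator is P(X_{1} > t | S = s) computed from the alternating-sum form of the sampling density of S for the right-truncated exponential law, which gives a ratio of two finite sums in which θ cancels, and that the unrestricted estimator is the classical (1 - t/s)_{+}^{n-1}. These closed forms and the function names are assumptions for illustration, and the code requires n ≥ 2.

```python
from math import comb


def positive_part_power(a, k):
    """(a)_+ ** k, with the positive part handled explicitly."""
    return a ** k if a > 0.0 else 0.0


def mvu_reliability_truncated(t, s, n, cut):
    """Assumed MVU estimate of R(t) for a right-truncated exponential sample.

    t: mission time, s: sum of the n observations, cut: truncation point.
    """
    numerator = sum(
        (-1) ** j * comb(n - 1, j)
        * (positive_part_power(s - t - j * cut, n - 1)
           - positive_part_power(s - (j + 1) * cut, n - 1))
        for j in range(n)
    )
    denominator = sum(
        (-1) ** j * comb(n, j) * positive_part_power(s - j * cut, n - 1)
        for j in range(n + 1)
    )
    return numerator / denominator


def mvu_reliability_untruncated(t, s, n):
    """Classical MVU estimate of R(t) for an unrestricted exponential sample."""
    return positive_part_power(1.0 - t / s, n - 1)


failure_times = [4.2, 9.8, 16.0, 20.0]
s_total = sum(failure_times)  # 50 hours
r_truncated = mvu_reliability_truncated(5.0, s_total, len(failure_times), 25.0)
r_unrestricted = mvu_reliability_untruncated(5.0, s_total, len(failure_times))
```

Under these assumptions the truncated estimate evaluates to 51500/62500 = 0.824, against 0.9^{3} = 0.729 for the unrestricted case. Setting t = 0 returns 1, as a reliability estimate should.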
5.2. Example 5.2
MVU estimator of reliability for the right-truncated gamma distribution. Let X^{n}=(X_{1}, …, X_{n}) be a random sample of size n from a population with density
0 < x ≤ ϑ, σ > 0, δ > 0, (58)
where ϑ is the point of truncation, θ=(σ,δ), and w(θ,ϑ) is such that
(59)
This distribution has found applications in a number of diverse fields, for instance, in fitting length-of-life data under fatigue. Note that for δ=1, the right-truncated gamma distribution reduces to the right-truncated exponential distribution with parameter σ. Although the exponential distribution is a special case of the gamma distribution and gives a good fit to length-of-life data in many situations, it is not always suitable, since its use implies that at any time the future life-length is independent of past history.
To find the MVU estimator of R(t) we apply the above technique. If the shape parameter δ in (58) is assumed to be known, then it is well known that
(60)
is a complete sufficient statistic for σ. The probability density function of the sampling distribution of S is given by
s∈(0, nϑ), (61)
where
(62)
(63)
The joint distribution of X and S is given by
(64)
Thus the conditional distribution of X given S is
(65)
Hence the MVU estimator of R(t) at time t is given by
(66)
It may be remarked that the result (66), although it appears quite unwieldy at first sight, is not so in practical applications, particularly when the sample size is small.
As a particular case, if ϑ → ∞, that is, the random variable X is assumed unrestricted, the distribution of the sufficient statistic from equation (61) reduces to
s∈(0,∞) (67)
and the corresponding MVU estimator of reliability at time t is given by
(68)
which corresponds to Basu’s [24] equation (9).
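For the unrestricted case, the estimator can be evaluated numerically. The sketch below assumes the standard fact that, for a gamma sample with known shape δ and S = X_{1} + … + X_{n}, the ratio X_{1}/S follows a Beta(δ, (n-1)δ) law independently of S, so that the MVU estimate of R(t) is the upper tail of that beta law at t/s. It is not claimed to reproduce the exact form of (68). The function names are illustrative, and the code requires (n-1)δ ≥ 1 so that the beta density is finite at 1.

```python
from math import gamma


def beta_pdf(u, a, b):
    """Density of the Beta(a, b) law at u in (0, 1]."""
    const = gamma(a + b) / (gamma(a) * gamma(b))
    return const * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)


def mvu_reliability_gamma(t, s, n, delta, steps=2000):
    """Assumed MVU estimate of R(t) for an unrestricted gamma sample.

    Integrates the Beta(delta, (n-1)*delta) density over (t/s, 1) with
    Simpson's rule; steps must be even.
    """
    if t <= 0.0:
        return 1.0
    if t >= s:
        return 0.0
    a, b = delta, (n - 1) * delta
    lo, hi = t / s, 1.0
    h = (hi - lo) / steps
    total = beta_pdf(lo, a, b) + beta_pdf(hi, a, b)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * beta_pdf(lo + i * h, a, b)
    return total * h / 3.0
```

With δ = 1 the function reduces to the exponential-case value (1 - t/s)^{n-1}, which provides a simple consistency check against Example 5.1.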
6. Finding Sampling Distributions for Truncated Laws with Unknown Truncation Points
The proposed approach can also be used to find the sampling distribution for a truncated law when some or all of its truncation parameters are left unspecified.
6.1. Example 3.3 (Continued)
For instance, consider the situation of Example 3.3 where it is assumed that the truncation parameter ϑ=(ϑ_{1},ϑ_{2}) is unknown. It is known that the statistic (X_{(1)}, X_{(n)}, S), where
(69)
(70)
and
(71)
is a complete sufficient statistic for a set of parameters (ϑ_{1},ϑ_{2},θ). In this case, the likelihood function of a sample is determined as
(72)
where ϑ = (ϑ_{1},ϑ_{2}),
(73)
is the joint probability density function of the order statistics and, F_{ϑ}(·) is the probability distribution function. It is well known that
(74)
is a complete sufficient statistic for the family {f(x|θ)}. It follows from (2) and (72) that
s∈[(n-2), (n-2)], n ≥ 3,(75)
where
(76)
(77)
(78)
(79)
Thus, the sampling distribution of the sufficient statistic (X_{(1)}, X_{(n)}, S) for (ϑ_{1},ϑ_{2},θ) is given by
(80)
In other words, we have the following results.
6.2. Truncation Cases
In the case of one-sided truncation, when a truncation point on the left, ϑ_{1}, is unknown, a sampling distribution of the sufficient statistic (X_{(1)}, S) for (ϑ_{1},θ) is given by
(81)
where
x_{i} ≥ ϑ_{1}, i = 1, …, n, (82)
(83)
is the probability density function of the order statistic X_{(1)},
(84)
s≡s(X_{2}, …, X_{n}).
In the case of one-sided truncation, when a truncation point on the right, ϑ_{2}, is unknown, a sampling distribution of the sufficient statistic (X_{(n)}, S) for (ϑ_{2},θ) is given by
(85)
where
x_{i} ≤ ϑ_{2}, i = 1, …, n, (86)
(87)
is the probability density function of the order statistic X_{(n)},
(88)
s≡s(X_{1}, …, X_{n-1}).
In the case of two-sided truncation, when a lower truncation point, ϑ_{1}, and an upper truncation point, ϑ_{2}, are unknown, a sampling distribution of the sufficient statistic (X_{(1)}, X_{(n)}, S) for (ϑ_{1},ϑ_{2},θ ) is given by
(89)
where
ϑ_{1} ≤ x_{i} ≤ ϑ_{2}, i = 1, …, n, (90)
(91)
is the joint probability density function of the order statistics X_{(1)} and X_{(n)},
(92)
s≡s(X_{2}, …, X_{n-1}).
6.3. Example 6.3
If, say, we deal with a left-truncated exponential distribution,
(93)
where
(94)
and a truncation point on the left, ϑ_{1}, is unknown, then it follows immediately from (81) that the sampling distribution of the sufficient statistic (X_{(1)}, S=X_{2} + … +X_{n}) for (ϑ_{1},θ) is given by
(95)
which corresponds to the well-known result [23].
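The well-known result can be illustrated by simulation. The sketch below assumes the usual form of that result for the exponential law with scale θ and unknown left truncation point ϑ_{1}: X_{(1)} - ϑ_{1} is exponential with mean θ/n, the excess sum Σ(X_{i} - X_{(1)}) is gamma with shape n-1 and scale θ, and the two are independent. Writing the second component as the excess over X_{(1)} is an assumption about how S is to be read here, and all names are illustrative.

```python
import random


def simulate_min_and_excess(n, theta, cut, reps, seed=0):
    """Draw reps samples of size n and return (minima, excess sums).

    Each observation is cut + Exp(scale=theta), i.e. an exponential law
    left-truncated at cut (by the memoryless property).
    """
    rng = random.Random(seed)
    minima, excesses = [], []
    for _ in range(reps):
        sample = [cut + rng.expovariate(1.0 / theta) for _ in range(n)]
        x_min = min(sample)
        minima.append(x_min)
        excesses.append(sum(sample) - n * x_min)  # sum of (X_i - X_(1))
    return minima, excesses


def mean(values):
    return sum(values) / len(values)


def correlation(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


minima, excesses = simulate_min_and_excess(n=5, theta=2.0, cut=3.0, reps=20000)
```

With n = 5, θ = 2 and ϑ_{1} = 3, the sample mean of the minima should be close to 3 + 2/5, the mean of the excess sums close to 4 × 2, and their sample correlation close to zero.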
7. Conclusion
The authors hope that this work will stimulate further investigations that apply the proposed approach to specific problems and assess whether the results it yields are practical for realistic applications.
References