An Almost Unbiased Estimator in Group Testing with Errors in Inspection
Langat Erick Kipyegon, Tonui Benard Cheruiyot^{*}, Langat Reuben Cheruiyot
Department of Mathematics & Computer Science, University of Kabianga, Kericho, Kenya
To cite this article:
Langat Erick Kipyegon, Tonui Benard Cheruiyot, Langat Reuben Cheruiyot. An Almost Unbiased Estimator in Group Testing with Errors in Inspection. American Journal of Theoretical and Applied Statistics. Vol. 5, No. 3, 2016, pp. 138-145. doi: 10.11648/j.ajtas.20160503.19
Received: April 26, 2016; Accepted: May 9, 2016; Published: May 25, 2016
Abstract: Pooling samples is discussed as a cost-effective method of screening individuals for the presence of a disease in a large population. Group testing was designed to reduce diagnostic cost. Testing a population in pools also lowers misclassification errors in low-prevalence populations. In this study we relax the assumptions of homogeneity and perfect tests by investigating the estimation problem in the presence of test errors. This is accomplished through Maximum Likelihood Estimation (MLE). The purpose of this study is to determine an analytical procedure for bias reduction in estimating population prevalence using a group testing procedure in the presence of test errors. Specifically, we construct an almost unbiased estimator in a pool-testing strategy in the presence of test errors and compute the modified MLE of the prevalence of the population. For single-stage procedures with equal group sizes, we also propose a numerical method for bias correction which produces an almost unbiased estimator with errors. The existence of bias is shown with the help of a Taylor series expansion for group sizes greater than one. The indicator function with errors is used in the development of the model. A modified formula for bias correction is shown analytically to reduce the bias of a group testing model. The Fisher information and the asymptotic variance are also shown to exist. We use MATLAB for simulation and verification of the model. Tables are then drawn to illustrate how the modified bias formula behaves for different values of sensitivity and specificity.
Keywords: Group Testing, Maximum Likelihood Estimator, Almost Unbiased Estimator, Bias Adjuster Formula, Bias-Corrected Estimates
1. Introduction
Group testing, also known as pooled sampling, occurs when units from a population are pooled and tested as a group for the presence of a particular attribute, such as a disease or a defect. The problem of group testing is concerned with classifying each unit in a population into one of two disjoint categories: defective and non-defective. The characteristic feature is that any number of units (in a group) can be tested simultaneously, but the information obtained from a single test on the units, assuming no chance of error, is either negative or positive. A negative test implies that all k units in the group are non-defective, while a positive test implies that at least one of the k units is defective, but it is not known which ones or how many. The problem is to devise a sequential sampling scheme which minimizes the expected number of tests required to classify all the units as defective or non-defective, and to estimate p, the proportion of defective units in the population. The idea is to construct groups of size k of, say, biological samples (e.g. blood) from a population. The population may consist of a number of individuals pooled into n groups. Each group is tested by a single test. If the reading is negative, the group is dropped from further investigation; otherwise, sequential testing is performed on the group. The sequential testing procedure provided will enable us to construct an almost unbiased estimator or to propose analytical procedures that reduce bias.
Group testing, where subjects are tested in pools rather than individually, has a long history of successful application in the screening of infectious diseases. Whether the aim is to diagnose individuals (classification) or to estimate disease prevalence, it is cost effective since the test is done on a group and not on individuals. Group testing first appeared in the statistical literature in the context of blood testing [1] but has since been applied in many fields, including transmission of viruses by insect vectors [2], genetics [3], plant disease assessment [4] and quality control [5]. Pool testing serves two purposes. The first is the cost-effective identification of positive individuals in a population (see [1]). This involves testing batches of items; for batches that test positive, the constituent members are then tested to identify the positive ones. There is abundant literature on this classification problem. For instance, [6] and [7] proposed hierarchical or multistage models based on Dorfman's idea, which involve subdividing positive pools into smaller pools with the purpose of reducing cost. They showed that some savings can be achieved via multistage models. The second purpose is the estimation of the prevalence rate, as championed by [8]. There is also abundant literature on this problem, as established by [9] and [10]. Still on the estimation problem, [11] used Maximum Likelihood Estimation (MLE) to estimate elements of drugs in a composition of elements. For the multistage problem with the purpose of estimation, [12] proposed a multistage estimation model. [10] proposed confidence intervals for the prevalence rate when pool testing procedures are applied. Bayesian inference on population prevalence has also been studied (see for instance [9]). Some procedures for bias reduction in group testing models without errors have been proposed [13].
In the group testing literature, with the objective of estimating the prevalence of an attribute of interest, the MLE is the dominant procedure. If the group size is k = 1, the MLE has been shown to be an unbiased estimator ([12] and [14]). When the group size is k > 1, however, the MLE has been shown to be biased, and this is a drawback in statistical inference in pool testing procedures, as observed by [8]. A more general bias adjustment, which was not specifically derived for group testing, was described by [15]. The purpose of this study is to determine an analytical procedure for bias reduction in estimating population prevalence using a group testing procedure in the presence of test errors. Specifically, we construct an almost unbiased estimator in a pool-testing strategy in the presence of test errors and compute the modified MLE of the prevalence of the population. For single-stage procedures with equal group sizes, we also propose a numerical method for bias correction which produces an almost unbiased estimator with errors.
The rest of the paper is organized as follows. In Section 2, we give the analytic construction of the MLE, where we discuss the MLE in group testing with errors, an almost unbiased estimator, the bias adjuster formula and bias-corrected estimates of the prevalence of the population. In Section 3 we give results and discussion, and in Section 4 we conclude.
2. Analytic Construction of MLE
In this section, it is shown that the MLE is unbiased for group size k = 1 but biased for k > 1. Ways of improving the MLE are then proposed using the Bias Adjuster formula.
2.1. MLE in Group Testing with Errors
Suppose we have a large population from which n groups are constructed. The population may, for example, be blood samples from a number of individuals, pooled into n groups. The probability of classifying a group as positive in the absence of errors is
(1)
where p is the probability of an individual being classified as positive and k is the size of the pool. When the error element is introduced in (1) we obtain
(2)
where the two additional parameters denote the sensitivity and specificity of the test kits. By sensitivity we mean the probability of classifying a positive group as positive, while specificity is the probability of classifying a non-positive group as non-positive. For the derivation of (2), see [12] and [14]. Upon using (2), the MLE of p can be obtained as
(3)
For k = 1, and upon using (3), the MLE of p is unbiased, that is, its expectation equals p. For k > 1, however, it has been shown to be biased, that is, its expectation differs from p, and this is a drawback to group testing inference. In the subsequent sections we therefore construct an analytical procedure that helps reduce the bias.
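The unbiasedness claim for k = 1 and the bias for k > 1 can be checked numerically. The sketch below is illustrative only: it assumes the standard misclassification model, in which a pool tests positive with probability sens*(1 - (1-p)^k) + (1 - spec)*(1-p)^k, and takes the MLE to be the inverse of that expression at the observed proportion of positive pools. The names `sens`, `spec`, `pool_positive_prob`, `mle_prevalence` and `expected_mle` are hypothetical and are not taken from the paper; the paper's equations (1) to (3) may be written differently.

```python
from math import comb

def pool_positive_prob(p, k, sens=1.0, spec=1.0):
    """Probability that a pool of size k tests positive (assumed model)."""
    q = (1.0 - p) ** k                      # P(no positive unit in the pool)
    return sens * (1.0 - q) + (1.0 - spec) * q

def mle_prevalence(x, n, k, sens=1.0, spec=1.0):
    """Invert the pool-positive probability at the observed proportion x/n.

    The estimate of (1-p)^k is floored at 0 so that the k-th root is real;
    it is not capped at 1, so negative estimates remain possible.
    """
    r = sens + spec - 1.0
    u = max((sens - x / n) / r, 0.0)        # estimate of (1-p)^k
    return 1.0 - u ** (1.0 / k)

def expected_mle(p, n, k, sens=1.0, spec=1.0):
    """Exact expectation of the MLE, summing over X ~ Binomial(n, pi)."""
    pi = pool_positive_prob(p, k, sens, spec)
    return sum(
        comb(n, x) * pi**x * (1.0 - pi) ** (n - x)
        * mle_prevalence(x, n, k, sens, spec)
        for x in range(n + 1)
    )

# Compare the expectation with the true prevalence for k = 1 and k = 25.
for k in (1, 25):
    print(k, expected_mle(0.05, n=8, k=k) - 0.05)
```

Under these assumptions the difference printed for k = 1 is zero up to rounding, while for k = 25 it is positive, which is the bias that the correction in the following sections targets.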
2.2. Improved MLE to Almost Unbiased Estimator
In this section we construct an MLE in pool testing with errors in inspection such that, when k > 1, the proposed estimator is almost unbiased. To achieve this, we require Gart’s formula
(4)
where I is the Fisher information, l is the log-likelihood and O denotes the order of the error term; the formula gives Gart’s bias, see [15]. The Fisher information, I, has been computed in the pool testing literature (see, for instance, [12] and [14]) and is given by
(5)
From (5), we have
and
(6)
Detailed derivation of (6) is provided in Appendix A.
Also
(7)
Technical derivation of (7) is provided in Appendix B.
Equations (6) and (7) are vital in the next sections.
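As a rough numerical companion to the Fisher information used above, the sketch below evaluates the standard binomial result I(p) = n (dπ/dp)^2 / (π(1 - π)), where π is the probability that a pool tests positive. This is an assumption about the form of equation (5), not a transcription of it, and the names `sens`, `spec`, `fisher_information` and `asymptotic_variance` are hypothetical.

```python
def fisher_information(p, n, k, sens=1.0, spec=1.0):
    """Fisher information about p from n pools of size k (assumed model)."""
    r = sens + spec - 1.0
    q = (1.0 - p) ** k                          # P(pool has no positive unit)
    pi = sens * (1.0 - q) + (1.0 - spec) * q    # P(pool tests positive)
    dpi_dp = r * k * (1.0 - p) ** (k - 1)       # derivative of pi w.r.t. p
    return n * dpi_dp**2 / (pi * (1.0 - pi))

def asymptotic_variance(p, n, k, sens=1.0, spec=1.0):
    """Large-sample variance of the MLE: the inverse Fisher information."""
    return 1.0 / fisher_information(p, n, k, sens, spec)

# Test errors reduce the information, so the variance grows.
print(asymptotic_variance(0.05, n=8, k=25))
print(asymptotic_variance(0.05, n=8, k=25, sens=0.95, spec=0.90))
```

For k = 1 and a perfect test this reduces to the familiar n / (p(1 - p)), which gives a quick sanity check on the formula.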
2.3. Bias Adjuster Formula
With equations (6) and (7) at hand and Formula (4), upon substitution we have
(8)
On simplifying equation (8) above we obtain;
(9)
where O denotes the order of the error.
The Gart’s Bias-Corrected estimates are given by;
(10)
and
(11)
as suggested by [13].
We distinguish these two approaches by describing them as ‘Vertical’ or ‘Horizontal’ correction, or more briefly as ‘Gart-V’ and ‘Gart-H’. The Gart-V correction has the disadvantage of not being able to handle an estimate equal to 1 (all groups positive), owing to a zero denominator in the bias formula. The Gart-H correction, in contrast, does not require the estimate to be substituted into the bias formula, and so a corrected estimate can still be found. Gart’s method with Vertical correction is highly effective in reducing the bias for small p. With Horizontal correction, Gart’s method is moderately effective (see [13]). In the next section, our main focus is on the Vertical correction since it is highly effective.
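The difference between the two corrections can be sketched in code. The bias function below is a first-order delta-method approximation derived for this illustration under the standard misclassification model; it is not claimed to be identical to equation (9). The Vertical correction is taken to be the estimate minus the bias evaluated at the estimate, and the Horizontal correction is taken to be the value p solving p + bias(p) = estimate, which is one reading of the description in [13]. All function and parameter names are hypothetical.

```python
import math

def gart_bias(p, n, k, sens=1.0, spec=1.0):
    """Approximate first-order bias of the pooled MLE (assumed form)."""
    if k == 1:
        return 0.0                              # the estimator is linear in X
    r = sens + spec - 1.0
    q = (1.0 - p) ** k
    pi = sens * (1.0 - q) + (1.0 - spec) * q
    denom = 2.0 * k**2 * n * r**2 * (1.0 - p) ** (2 * k - 1)
    if denom == 0.0:
        return math.inf                         # zero denominator at p = 1
    return (k - 1) * pi * (1.0 - pi) / denom

def gart_vertical(p_hat, n, k, sens=1.0, spec=1.0):
    """Vertical correction: subtract the bias evaluated at the estimate."""
    b = gart_bias(p_hat, n, k, sens, spec)
    if math.isinf(b):
        raise ValueError("vertical correction is undefined at p_hat = 1")
    return p_hat - b

def gart_horizontal(p_hat, n, k, sens=1.0, spec=1.0, tol=1e-12):
    """Horizontal correction: solve p + bias(p) = p_hat by bisection."""
    def f(p):
        return p + gart_bias(p, n, k, sens, spec) - p_hat
    lo, hi = 0.0, 1.0
    if f(lo) >= 0.0:
        return 0.0                              # no root above zero
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: 3 of 8 pools of size 25 positive, perfect test assumed.
p_hat = 1.0 - (1.0 - 3 / 8) ** (1.0 / 25)
print(p_hat, gart_vertical(p_hat, 8, 25), gart_horizontal(p_hat, 8, 25))
```

In this sketch `gart_horizontal(1.0, 8, 25)` still returns a finite value below 1, whereas `gart_vertical(1.0, 8, 25)` raises an error, mirroring the zero-denominator problem described above.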
3. Results and Discussion
In this section, a sample from a population is taken, split into groups and tested for some attribute of interest. The estimates of p, the proportion of defective units in the population, under the MLE, the Bias Adjuster Formula and Gart’s Vertical Correction are obtained. These estimates are then presented in tables.
We considered bias as the main issue in group testing problems. We investigated the MLE for the single-stage procedure. The estimates in the case of the ‘all positives’ outcome are shown to have a large effect on bias calculations.
We base our discussion on Monte Carlo simulation of the bias and the MLE for various group sizes and given sensitivity and specificity (see Tables 1, 2, 3 and 4).
In the tables that follow, we have results for simulated MLE for various group sizes with sensitivity and specificity of and and prevalence rate of
The simplest possible group testing procedure is a single stage with equal group sizes. We take a population of 200 samples, split into 8 groups each of size 25 samples, and test for the prevalence of some attribute of interest. Hence k = 25 and n = 8. From equation (3), the MLE of p for different values of sensitivity and specificity yields the results tabulated in Tables 1, 2, 3 and 4.
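A minimal Monte Carlo sketch of this design (8 pools of 25 samples) is given below in Python rather than MATLAB. It assumes the standard misclassification model for the pool-positive probability and the corresponding inverse as the MLE; the prevalence of 0.05, the sensitivity and specificity pairs, the replication count and the seed are illustrative assumptions, not values taken from the paper's tables. The function name `simulate_mle` is hypothetical.

```python
import random

def simulate_mle(p, n, k, sens=1.0, spec=1.0, reps=20000, seed=0):
    """Return (mean of simulated MLEs, estimated bias) under the assumed model."""
    rng = random.Random(seed)                   # fixed seed for repeatability
    r = sens + spec - 1.0
    q = (1.0 - p) ** k
    pi = sens * (1.0 - q) + (1.0 - spec) * q    # P(pool tests positive)
    total = 0.0
    for _ in range(reps):
        # Number of positive pools out of n independent pool tests.
        x = sum(rng.random() < pi for _ in range(n))
        # Estimate of (1-p)^k, floored at 0 so the k-th root is real;
        # values above 1 are kept, which allows negative estimates.
        u = max((sens - x / n) / r, 0.0)
        total += 1.0 - u ** (1.0 / k)
    mean = total / reps
    return mean, mean - p

# Illustrative settings only: 8 pools of size 25, assumed prevalence 0.05.
for sens, spec in [(1.0, 1.0), (0.99, 0.99), (0.95, 0.90)]:
    mean, bias = simulate_mle(0.05, n=8, k=25, sens=sens, spec=spec)
    print(sens, spec, round(mean, 4), round(bias, 4))
```

With only 8 pools, the ‘all positives’ outcome has non-negligible probability and maps to an estimate of 1, so it dominates the simulated bias; this is consistent with the remark above about the large effect of that outcome.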
Also from equation (9) when , it reduces to;
(3)
Simulating the MLE in MATLAB for different values of sensitivity and specificity gives the results provided in the following tables:
From the values in the tables, the MLE of p and the Bias Adjuster of p increase with increasing sensitivity and specificity of the test kits. When the sensitivity and specificity are 95% and 90%, the MLE of p and the correction are negative. This occurs when there is no positive outcome in the group test. It is also seen that the Vertical correction decreases as the sensitivity and specificity of the test kits decrease. On the other hand, when all the groups test positive, the MLE is almost certainly an overestimate of p, as it is most unusual for every unit in a population to be positive. When all the individuals in the group are positive, the estimate exceeds 1, and this outcome is shown to have a large effect on bias calculations. The main reason for this rare occurrence is the presence of test errors, that is, imperfect sensitivity and specificity, rather than human error during the experiment, since test errors are assumed to have the greater effect.
4. Conclusion
From the tables in Section 3, it is observed that the bias has been considerably reduced by the correction, which conforms to the convention of describing a bias of less than about 10% as acceptable. It is shown that the Vertical correction is most effective in reducing the bias. However, the correction is undefined when the estimate equals 1, owing to a zero in the denominator of the modified Bias Adjuster. Thus, when all groups test positive, the pool testing procedure is not applicable. For instance, [13] simply stated that ‘if all groups turn out to be positive, no sensible estimate of the infection rate can be obtained from the data’. The derivative of the log-likelihood function of the distribution has been shown to yield the Fisher information (I) and the asymptotic variance. The modified Bias Adjuster Formula for bias correction has been shown to reduce the bias. This is evident in Tables 1, 2, 3 and 4 for different values of specificity and sensitivity.
In this study we have considered bias reduction by constructing an almost unbiased estimator in a simple group testing model. However, more complex group testing models exist in the literature. For further research, it would be interesting to study the bias properties of such models and to suggest modifications to them.
Appendixes
Appendix A
In this appendix, we provide a detailed derivation of equation (6). First, we know that
On simplifying the above, we get
The first derivative can be obtained as
(A1)
Substituting for
In the above, we get
(A2)
On simplifying, yields
(A3)
Factoring out and , we get
(A4)
Which simplifies to
(A5)
But we know that
Hence;
(A6)
On simplifying, we get;
(A7)
Therefore;
(A8)
Appendix B
A detailed derivation of equation (7) is given in this appendix. Upon taking the natural log of the likelihood function, we have
On finding the derivative with respect to p, we have
The second derivative becomes;
(B1)
Upon simplification, we have:
(B2)
The third derivative is given by;
(B3)
To obtain equation (7), we take expectation on both sides of equation to obtain;
(B4)
(B5)
Upon simplification, we get;
(B6)
where
(B7)
(B8)
(B9)
(B10)
(B11)
(B12)
(B13)
and
(B14)
(B15)
(B16)
(B17)
(B18)
(B19)
(B20)
(B21)
Substituting (B7) through (B21) into (B6), we get
(B22)
which reduces to;
(B23)
Hence on substituting and in, we get;
Hence;
Further;
(B24)
Equation simplifies to;
(B25)
References