Kernel-Type Estimators of Divergence Measures and Its Strong Uniform Consistency
Hamza Dhaker^{1, *}, Papa Ngom^{1}, El Hadji Deme^{2}, Pierre Mendy^{3}
^{1}Departement de Mathématiques et Informatique, Faculté des Sciences et Technique, Université Cheikh Anta Diop, Dakar, Sénégal
^{2}Sciences Appliquées et Technologie, Unité de Formation et de Recherche, Université Gaston Berger, Saint-Louis, Sénégal
^{3}Département de Techniques Quantitatives, Faculté des Sciences Economiques et de Gestion, Université Cheikh Anta Diop, Dakar, Sénégal
To cite this article:
Hamza Dhaker, Papa Ngom, El Hadji Deme, Pierre Mendy. Kernel-Type Estimators of Divergence Measures and Its Strong Uniform Consistency. American Journal of Theoretical and Applied Statistics. Vol. 5, No. 1, 2016, pp. 13-22. doi: 10.11648/j.ajtas.20160501.13
Abstract: Nonparametric density estimation, based on kernel-type estimators, is a very popular method in statistical research, especially when we want to model the probabilistic or stochastic structure of a data set. In this paper, we investigate asymptotic confidence bands for the distribution with kernel estimators for some types of divergence measures (the Rényi-α and Tsallis-α divergences). Our aim is to use a method based on empirical process techniques in order to derive some asymptotic results. Under different assumptions, we establish a variety of fundamental theoretical properties, such as the uniform-in-bandwidth strong consistency of the divergence estimators. We further apply the previous results to simulated examples, including kernel-type estimators for the Hellinger, Bhattacharyya and Kullback-Leibler divergences, to illustrate this approach, and we show that the method performs competitively.
Keywords: Divergence Measures, Kernel Estimation, Strong Uniform Consistency
1. Introduction
In this paper, we focus on the similarity between two distributions. Given a sample from one distribution, a fundamental and classical question to ask is: how similar is its density to another, known density? First, one must specify what it means for two distributions to be close; many different measures quantifying the degree of similarity between distributions have been studied in the past. They are frequently called distance measures, although some of them are not strictly metrics. Divergence measures play an important role in statistical theory, especially in the large theories of estimation and testing. They have been applied to different areas, such as medical image registration [21], classification and retrieval. There are several important problems in machine learning and statistics that require the estimation of the distance or divergence between distributions. Divergence between distributions also proves to be useful in neuroscience; for example, [14] employs divergence to quantify the difference between neural response patterns.
Many papers have since appeared in the literature where divergence or entropy type measures of information are used in testing statistical hypotheses. For more examples and other possible applications of divergence measures, see the extended technical report [23, 24]. Given the key role of divergence measures in these various applications, it is necessary to estimate these divergences accurately.
Recently, Ngom et al. [16] introduced the Divergence Indicator method by proposing a test, based on a divergence measure, for choosing between a random walk and an AR(1) model.
The class of divergence measures is large; it includes the Rényi [25, 26], Tsallis [30], Kullback-Leibler (KL), Hellinger, Bhattacharyya and Euclidean divergences, among others. These divergence measures can be related to the Csiszár divergence [3]. The Kullback-Leibler, Hellinger and Bhattacharyya divergences are special cases of the Rényi and Tsallis divergences, the Kullback-Leibler divergence being the most popular of these measures. The estimation of divergences and its applications have been the subject of many studies using different approaches. For example, Pardo [20] presented methods and applications in the case of discrete distributions. Exploring a nonparametric method for estimating divergences in the continuous case, Poczos and Schneider [23] proposed a nearest-neighbor estimator and proved the weak consistency of the estimator for the Rényi and Tsallis divergences.
Finding nonparametric estimators of divergence measures remains an open issue. Krishnamurthy and Kandasamy [15] used an initial plug-in estimator, corrected by estimates of the higher-order terms in the von Mises expansion of the divergence functional. In their framework, they proposed three estimators, for the Rényi, Tsallis and Euclidean divergences between two continuous distributions, and established the rates of convergence of these estimators. The main purpose of this paper is to analyze estimators of divergence measures between two continuous distributions. Our approach is similar to that of Krishnamurthy and Kandasamy [15] and is based on a plug-in estimation scheme: first, we apply a consistent density estimator for the underlying densities, and then we plug it into the desired formulas. Unlike their framework, however, we study the strong consistency of the estimators for a general class of divergence measures. We emphasize that plug-in estimation techniques are heavily used by [2, 9] in the case of entropy. Bouzebda [2] proposed a method to establish consistency for kernel-type estimators of the differential entropy. We generalize this method to a large class of divergence measures in order to establish the consistency of kernel-type estimators of divergence measures when the bandwidth is allowed to range in a small interval which may decrease in length with the sample size. Our results are immediately applicable to proving strong consistency for kernel-type estimation of this class of divergence measures.
The rest of this paper is organized as follows: in Section 2, we introduce the divergence measures and construct their kernel-type estimators. In Section 3, we study the uniform strong consistency of the proposed estimators. Section 4 is devoted to the proofs. In Section 5, numerical examples are proposed in order to illustrate the performance of our method. Finally, in Section 6, we present our conclusion.
2. Kernel-Type Estimators of Divergence Measures
In this section, we give the notation and present some basic definitions. We are interested in two densities $f, g: \mathbb{R}^d \to \mathbb{R}_+$, where $d$ denotes the dimension. The divergence measures of interest, the Rényi and Tsallis divergences, are defined, respectively, as follows:
$D_\alpha(f, g) = \dfrac{1}{\alpha - 1} \log \int_{\mathbb{R}^d} f^\alpha(x)\, g^{1-\alpha}(x)\, dx$   (1)
$T_\alpha(f, g) = \dfrac{1}{\alpha - 1} \left( \int_{\mathbb{R}^d} f^\alpha(x)\, g^{1-\alpha}(x)\, dx - 1 \right)$   (2)
with $\alpha \in (0, 1) \cup (1, \infty)$.
These quantities are nonnegative, and equal zero iff $f = g$ almost surely (a.s.). Remark that in the special cases where $\alpha = 1/2$ or $\alpha \to 1$, we obtain from (1) and (2) the well-known Hellinger, Kullback-Leibler and Bhattacharyya divergences.
In particular, as $\alpha \to 1$, both (1) and (2) reduce to the Kullback-Leibler divergence $KL(f, g) = \int_{\mathbb{R}^d} f(x) \log \frac{f(x)}{g(x)}\, dx$, which is related to the Shannon entropy via $KL(f, g) = -H(f) - \int_{\mathbb{R}^d} f(x) \log g(x)\, dx$, where $H(f) = -\int_{\mathbb{R}^d} f(x) \log f(x)\, dx$ denotes the Shannon entropy of $f$. For some statistical properties of the Shannon entropy, one can refer to [2].
In the following, we focus only on the estimation of $D_\alpha$ and $T_\alpha$. The Kullback-Leibler, Hellinger and Bhattacharyya divergences can be deduced immediately.
We will next provide a consistent estimator for the following quantity
$J(f, g) := \int_{\mathbb{R}^d} f^\alpha(x)\, g^{1-\alpha}(x)\, dx$   (3)
whenever this integral is meaningful. Plugging its estimates into the appropriate formula immediately leads to a consistent estimator for the divergence measures $D_\alpha$ and $T_\alpha$.
Now, assume for the rest of the document that the density $f$ is unknown, while the density $g$ is known and such that the integral in (3) is finite. Next, consider a sequence $X_1, X_2, \ldots$ of independent and identically distributed $\mathbb{R}^d$-valued random vectors, with cumulative distribution function $F$ and density function $f$ with respect to the Lebesgue measure on $\mathbb{R}^d$. To construct our divergence estimators, we start with a kernel density estimator for $f$, and then substitute $f$ by its estimator in the divergence-like functional $J(f, g)$. To this end, we introduce a measurable kernel function $K$ satisfying the following conditions.
(K.1) $K$ is of bounded variation on $\mathbb{R}^d$;
(K.2) $K$ is right continuous on $\mathbb{R}^d$;
(K.3) $\|K\|_\infty := \sup_{x \in \mathbb{R}^d} |K(x)| < \infty$;
(K.4) $\int_{\mathbb{R}^d} K(x)\, dx = 1$.
Rosenblatt [27] first proposed an estimator for $f$, and Parzen [19] generalized it, eventually leading to the Parzen-Rosenblatt estimator, defined in the following way for any $x \in \mathbb{R}^d$:
$f_{n,h}(x) = \dfrac{1}{n h^d} \sum_{i=1}^{n} K\left( \dfrac{x - X_i}{h} \right)$   (4)
where $h =: h_n$ is the bandwidth sequence. Assuming that the density $f$ is continuous, one obtains a strongly consistent estimator of $f$; that is, with probability 1, $f_{n,h}(x) \to f(x)$ for all $x$. There are also results concerning uniform convergence and convergence rates. To prove such results, one usually writes the difference $f_{n,h}(x) - f(x)$ as the sum of a probabilistic term $f_{n,h}(x) - \mathbb{E} f_{n,h}(x)$ and a deterministic term $\mathbb{E} f_{n,h}(x) - f(x)$, also called the bias. For further explanation, one can refer to [10, 12, 13], among other authors. After having estimated $f$, we estimate $J(f, g)$ by setting
$\widehat{J}_n(h) := \dfrac{1}{n} \sum_{i=1}^{n} \left( \dfrac{f_{n,h}(X_i)}{g(X_i)} \right)^{\alpha - 1} \mathbb{1}_{\{ f_{n,h}(X_i) \ge t_n \}}$   (5)
where $h =: h_n$ is the bandwidth and $(t_n)_{n \ge 1}$ is a sequence of positive constants. Thus, using (5), the associated divergences $D_\alpha$ and $T_\alpha$ can be estimated by $\widehat{D}_{\alpha,n}(h) = \frac{1}{\alpha-1} \log \widehat{J}_n(h)$ and $\widehat{T}_{\alpha,n}(h) = \frac{1}{\alpha-1} \big( \widehat{J}_n(h) - 1 \big)$.
The approach used to define the plug-in estimators is also developed in [2] in order to introduce a kernel-type estimator of Shannon's entropy. Some statistical properties of these divergence estimators are related to those of the kernel estimator of the continuous density $f$. The limiting behavior of $f_{n,h}$, for appropriate choices of the bandwidth $h$, has been widely studied in the literature; examples include the work of Devroye [6, 7], Bosq [1] and Prakasa Rao [22]. In particular, under our assumptions, the condition that $h_n \to 0$ together with $n h_n^d \to \infty$ is necessary and sufficient for the convergence in probability of $f_{n,h}(x)$ towards the limit $f(x)$, independently of $x$ and the density $f$. One can find other results on the uniform consistency of the estimator in [4, 10, 5] and the references therein. In the next section, we will use the methods developed in the previous references to establish convergence results for the estimate $\widehat{J}_n(h)$ and deduce the convergence results of the estimators of $D_\alpha$ and $T_\alpha$.
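To make the plug-in construction concrete, the following sketch implements the Parzen-Rosenblatt estimator (4) with a Gaussian kernel (which satisfies (K.1)-(K.4)) and a minimal plug-in version of the divergence estimators, using the identity $J(f, g) = \mathbb{E}_f[(f(X)/g(X))^{\alpha-1}]$ and omitting the truncation in (5); the densities, sample size and bandwidth below are illustrative assumptions:

```python
import numpy as np

def parzen_rosenblatt(x_eval, sample, h):
    """f_{n,h}(x) = (n h)^{-1} sum_i K((x - X_i)/h), Gaussian kernel, d = 1."""
    u = (np.atleast_1d(np.asarray(x_eval, dtype=float))[:, None] - sample[None, :]) / h
    return (np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)).mean(axis=1) / h

def J_hat(sample, g_pdf, alpha, h):
    # J(f,g) = E_f[(f(X)/g(X))^(alpha-1)], with the unknown f replaced by f_{n,h}.
    fn = parzen_rosenblatt(sample, sample, h)
    return np.mean((fn / g_pdf(sample)) ** (alpha - 1.0))

def renyi_hat(sample, g_pdf, alpha, h):    # estimates D_alpha(f, g)
    return np.log(J_hat(sample, g_pdf, alpha, h)) / (alpha - 1.0)

def tsallis_hat(sample, g_pdf, alpha, h):  # estimates T_alpha(f, g)
    return (J_hat(sample, g_pdf, alpha, h) - 1.0) / (alpha - 1.0)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=2000)
g = lambda t: np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)  # g equals the true f here

est_at_0 = parzen_rosenblatt(0.0, x, h=0.25)[0]   # true density at 0 is ~0.3989
d_hat = renyi_hat(x, g, 0.5, h=0.3)               # true D_alpha(f, g) is 0
```

Since the sample is drawn from the same density as $g$, both divergence estimates should be close to zero, up to smoothing bias and sampling noise.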
3. Statistical Properties of the Estimators
We first study the strong consistency of the estimator $\widehat{J}_n(h)$ defined in (5). Throughout the remainder of this paper, we will use the notation $\mathbb{E}\,\widehat{J}_n(h)$ for the expectation of $\widehat{J}_n(h)$, which is delicate to handle.
Lemma 1 Let $K$ satisfy (K.1)-(K.4) and let $f$ be a continuous bounded density. Then, for each pair of sequences $0 < a_n < b_n \le 1$ such that $b_n \to 0$ together with $n a_n^d / \log n \to \infty$ as $n \to \infty$, and for any $\alpha \ne 1$, one has, with probability 1,
$\sup_{a_n \le h \le b_n} \big| \widehat{J}_n(h) - \mathbb{E}\,\widehat{J}_n(h) \big| \longrightarrow 0.$
The proof of Lemma 1 is postponed until Section 4.
Lemma 2 Let $K$ satisfy (K.3)-(K.4) and let $f$ be a uniformly Lipschitz and continuous density. Then, for each pair of sequences $0 < a_n < b_n \le 1$ such that $b_n \to 0$ together with $n a_n^d / \log n \to \infty$ as $n \to \infty$, and for any $\alpha \ne 1$, we have
$\sup_{a_n \le h \le b_n} \big| \mathbb{E}\,\widehat{J}_n(h) - J(f, g) \big| \longrightarrow 0.$
The proof of Lemma 2 is postponed until Section 4.
Theorem 1 Let $K$ satisfy (K.1)-(K.4) and let $f$ be a uniformly Lipschitz, bounded and continuous density. Then, for each pair of sequences $0 < a_n < b_n \le 1$ such that $b_n \to 0$ together with $n a_n^d / \log n \to \infty$ as $n \to \infty$, and for any $\alpha \ne 1$, one has, with probability 1,
$\sup_{a_n \le h \le b_n} \big| \widehat{J}_n(h) - J(f, g) \big| \longrightarrow 0.$
This, in turn, implies that, for any $h_n \in [a_n, b_n]$,
$\widehat{J}_n(h_n) \longrightarrow J(f, g)$ almost surely.   (6)
The proof of Theorem 1 is postponed until Section 4.
The following corollaries handle, respectively, the uniform deviations of the estimates of the Tsallis and Rényi divergences with respect to $T_\alpha(f, g)$ and $D_\alpha(f, g)$.
Corollary 1 Assume that the assumptions of Theorem 1 hold. Then, we have, with probability 1,
$\sup_{a_n \le h \le b_n} \big| \widehat{T}_{\alpha,n}(h) - T_\alpha(f, g) \big| \longrightarrow 0.$
This, in turn, implies that, for any $h_n \in [a_n, b_n]$,
$\widehat{T}_{\alpha,n}(h_n) \longrightarrow T_\alpha(f, g)$ almost surely.   (7)
The proof of Corollary 1 is postponed until Section 4.
Corollary 2 Assume that the assumptions of Theorem 1 hold. Then, we have, with probability 1,
$\sup_{a_n \le h \le b_n} \big| \widehat{D}_{\alpha,n}(h) - D_\alpha(f, g) \big| \longrightarrow 0.$
This, in turn, implies that, for any $h_n \in [a_n, b_n]$,
$\widehat{D}_{\alpha,n}(h_n) \longrightarrow D_\alpha(f, g)$ almost surely.   (8)
The proof of Corollary 2 is postponed until Section 4.
Note that a divergence estimator such as (5) also requires an appropriate choice of the smoothing parameter $h$. The results given in (6), (7) and (8) show that any choice of $h$ between $a_n$ and $b_n$ ensures the strong consistency of the underlying divergence estimators. In other words, fluctuations of the bandwidth in a small interval do not affect the consistency of the nonparametric estimators of these divergences. The work of Bouzebda and Elhattab [2] is very important for establishing our results; these authors considered a class of compactly supported densities. They used the following additional conditions.
(F.1) $f$ has a compact support, say $I$, and is $s$-times continuously differentiable, and there exists a constant $0 < M < \infty$ such that the partial derivatives of order $s$ of $f$ are bounded by $M$ on $I$.
(K.5) $K$ is of order $s$, i.e., for some constant $S \ne 0$,
$\int u^k K(u)\, du = 0$ for $1 \le |k| \le s - 1$, and $\int u^s K(u)\, du = S \ne 0$.
Under (F.1), the expression $J(f, g)$ may be written as follows
(9)
Theorem 2 Assume conditions (K.1)-(K.5) hold and let $f$ fulfill (F.1). Then for each pair of sequences $0 < a_n < b_n$ with $b_n \to 0$ as $n \to \infty$, and for any $\alpha \ne 1$, we have
where
The proof of Theorem 2 is postponed until Section 4.
Corollary 3 Assume that the assumptions of Theorem 2 hold. Then,
Corollary 4 Assume that the assumptions of Theorem 2 hold. Then, for any we have
The proofs of Corollaries 3 and 4 are given in Section 4.
Now, assume that there exists a sequence $(I_m)_{m \ge 1}$ of strictly nondecreasing compact subsets of $I$ such that $\bigcup_{m \ge 1} I_m = I$. For the estimation of the support, we may refer to [8] and the references therein. Throughout, we let $h_n \in [a_n, b_n]$, where $(a_n)$ and $(b_n)$ are as in Corollaries 3 and 4, and we choose the estimator in Corollaries 3 and 4 to be of the form
Using the techniques developed in [5] and Corollaries 3 and 4, one can construct asymptotic certainty intervals for the true divergences $D_\alpha$ and $T_\alpha$.
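The bandwidth insensitivity expressed by (6), (7) and (8) can be illustrated numerically: evaluating a plug-in estimate over a grid of bandwidths in a hypothetical interval $[a_n, b_n]$, the estimates should barely move. The densities, sample size and bandwidth range in this sketch are illustrative choices:

```python
import numpy as np

def kde(x_eval, sample, h):
    # Parzen-Rosenblatt estimator with a Gaussian kernel (d = 1).
    u = (np.atleast_1d(np.asarray(x_eval, dtype=float))[:, None] - sample[None, :]) / h
    return (np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)).mean(axis=1) / h

def tsallis_hat(sample, g_pdf, alpha, h):
    # Plug-in estimate of T_alpha via J = E_f[(f/g)^(alpha-1)].
    fn = kde(sample, sample, h)
    J = np.mean((fn / g_pdf(sample)) ** (alpha - 1.0))
    return (J - 1.0) / (alpha - 1.0)

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=2000)
g = lambda t: np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)  # g equals the true f

# Hypothetical bandwidth interval [a_n, b_n]; estimates should vary little over it.
bandwidths = np.linspace(0.15, 0.5, 8)
estimates = np.array([tsallis_hat(x, g, 0.5, h) for h in bandwidths])
spread = estimates.max() - estimates.min()
```

The small spread across bandwidths mirrors the uniform-in-bandwidth statements: within a reasonable interval, the particular choice of $h$ hardly changes the estimate.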
4. Proofs of Our Results
Proof of Lemma 1. To show the strong consistency of $\widehat{J}_n(h)$, we use the following expression
where $(t_n)_{n \ge 1}$ is a sequence of positive constants. Define
We have
Since the underlying function is 1-Lipschitz, we then have
.
Therefore for , we have
where $\|\cdot\|_\infty$ denotes, as usual, the supremum norm, i.e., $\|\psi\|_\infty = \sup_{x \in \mathbb{R}^d} |\psi(x)|$. Hence,
(10)
Finally,
(11)
Using the conditions on the kernel $K$ imposed by Einmahl and Mason [11], consider the class of functions
$\mathcal{K} := \left\{ K\left( \frac{x - \cdot}{h} \right) : x \in \mathbb{R}^d,\ h > 0 \right\}.$
For $\varepsilon > 0$, set $N(\varepsilon, \mathcal{K}) := \sup_Q N(\varepsilon, \mathcal{K}, d_Q)$, where the supremum is taken over all probability measures $Q$ on $(\mathbb{R}^d, \mathcal{B})$, where $\mathcal{B}$ represents the $\sigma$-field of Borel sets of $\mathbb{R}^d$, i.e., the smallest $\sigma$-field containing all the open (and/or closed) balls of $\mathbb{R}^d$. Here, $d_Q$ denotes the $L_2(Q)$ metric and $N(\varepsilon, \mathcal{K}, d_Q)$ is the minimal number of balls of radius $\varepsilon$ needed to cover $\mathcal{K}$.
We assume that satisfies the following uniform entropy condition.
(K.6) for some $C > 0$ and $\nu > 0$: $N(\varepsilon, \mathcal{K}) \le C \varepsilon^{-\nu}$, for all $0 < \varepsilon < 1$;
(K.7) $\mathcal{K}$ is a pointwise measurable class; that is, there exists a countable subclass $\mathcal{K}_0$ of $\mathcal{K}$ such that, for any function $k \in \mathcal{K}$, we can find a sequence of functions $(k_m)_{m \ge 1}$ in $\mathcal{K}_0$ for which $k_m(x) \to k(x)$ for all $x \in \mathbb{R}^d$.
Remark that condition (K.6) is satisfied whenever (K.1) holds, i.e., $K$ is of bounded variation on $\mathbb{R}^d$; we refer the reader to van der Vaart and Wellner [28] for details on entropy conditions (see also Pakes and Pollard [18], and Nolan and Pollard [17]). Condition (K.7) is satisfied whenever (K.2) holds, i.e., $K$ is right continuous; this condition is discussed in [28] (see also [5] and [11]).
From Theorem 1 in [11], whenever $K$ is measurable and satisfies (K.3), (K.4), (K.6) and (K.7), and when $f$ is bounded, we have, for each pair of sequences $0 < a_n < b_n \le 1$ such that $b_n \to 0$ together with $n a_n^d / \log n \to \infty$ as $n \to \infty$, with probability 1,
$\sup_{a_n \le h \le b_n} \| f_{n,h} - \mathbb{E} f_{n,h} \|_\infty \longrightarrow 0.$   (12)
Hence, in view of (11) and (12), we obtain, with probability 1,
(13)
This concludes the proof of the lemma.
Proof of Lemma 2.
Let be the complement of in (i.e, ). We have
with
and
For the first term, we repeat the arguments above with the appropriate formal changes and show that, for any ,
(14)
which implies
(15)
On the other hand, we know (see, e.g., [11]) that, since the density $f$ is uniformly Lipschitz and continuous, we have, for each pair of sequences $(a_n)$ and $(b_n)$ with $b_n \to 0$ as $n \to \infty$,
(16)
Thus,
(17)
For the second term, it is obvious that
Thus,
(18)
Hence,
(19)
Thus, in view of (16), we get
(20)
Finally, in view of (17) and (20), we get
(21)
This completes the proof of the lemma.
Proof of Theorem 1. We have
Combining Lemmas 1 and 2, we obtain
This concludes the proof of the theorem.
Proof of Corollary 1. Remark that
Using Theorem 1, we have
and the Corollary 1 holds.
Proof of Corollary 2. A first-order Taylor expansion of the function $x \mapsto \log x$ around $J(f, g)$ gives
Remark that from Theorem 1,
which in turn, implies that
Thus, for all
Consequently
and the Corollary 2 holds.
Proof of Theorem 2. Under conditions (F.1) and (K.5), and using a Taylor expansion of order $s$, we get, for $x \in I$,
Thus, a straightforward application of the Lebesgue dominated convergence theorem gives, for $n$ large enough,
Let $I_0$ be a nonempty compact subset of the interior of $I$. First, note that we have, from Corollary 3.1.2, p. 62, of Viallon [29] (see also [2], statement (4.16)),
(22)
Set, for all ,
(23)
(24)
By combining (22) and (24), we get
(25)
Let $(I_m)_{m \ge 1}$ be a sequence of nondecreasing nonempty compact subsets of the interior of $I$ such that
Now, from (25), it is straightforward to observe that
The proof of Theorem 2 is completed.
Proof of Corollary 3. A direct application of Theorem 2 leads to Corollary 3.
Proof of Corollary 4. Here again, set, for all ,
A first-order Taylor expansion of the logarithm leads to
Using condition (F.1) ($f$ is compactly supported), $f$ is bounded away from zero on its support; thus, for $n$ large enough, there exists $\theta > 0$ such that $f_{n,h}(x) \ge \theta$ for all $x$ in the support of $f$. From (23), we have
Hence,
By combining the last equation with (22), we obtain
The proof of Corollary 4 is completed.
5. Simulation Study
Summarizing the ideas and the results given in the previous sections, we propose to study the performance of the kernel estimators for the Hellinger (DH), Bhattacharyya (DB) and Kullback-Leibler (DK) measures and their uniform-in-bandwidth consistency.
The Hellinger, Bhattacharyya and Kullback-Leibler divergences are defined, respectively, as follows:
$D_H(f, g) = 2\left(1 - \int \sqrt{f(x) g(x)}\, dx\right), \quad D_B(f, g) = -\log \int \sqrt{f(x) g(x)}\, dx, \quad D_{KL}(f, g) = \int f(x) \log \frac{f(x)}{g(x)}\, dx.$
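These special cases, and the limiting relation to the Kullback-Leibler divergence as $\alpha \to 1$, can be verified numerically. The sketch below uses illustrative normal densities (not those of the simulation study); note that conventions for the Hellinger divergence differ by constant factors, and here it is identified with the Tsallis divergence at $\alpha = 1/2$:

```python
import numpy as np

def normal_pdf(mu, s):
    return lambda x: np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

grid = np.linspace(-12.0, 12.0, 40001)
dx = grid[1] - grid[0]
f, g = normal_pdf(0.0, 1.0), normal_pdf(1.0, 1.0)
fx, gx = f(grid), g(grid)

def J(alpha):                      # \int f^alpha g^(1-alpha) dx (Riemann sum)
    return np.sum(fx ** alpha * gx ** (1.0 - alpha)) * dx

bc = J(0.5)                        # Bhattacharyya coefficient \int sqrt(f g) dx
d_bhatt = -np.log(bc)              # Bhattacharyya divergence
d_hell = 2.0 * (1.0 - bc)          # Tsallis divergence at alpha = 1/2 (Hellinger, up to convention)
d_kl = np.sum(fx * np.log(fx / gx)) * dx          # Kullback-Leibler divergence
d_renyi_half = np.log(J(0.5)) / (0.5 - 1.0)       # equals 2 * d_bhatt
d_renyi_near1 = np.log(J(0.999)) / (0.999 - 1.0)  # approaches d_kl as alpha -> 1
```

For these two normals, closed forms are available ($bc = e^{-1/8}$ and $D_{KL} = 1/2$), which makes the relations between the criteria easy to check.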
The asymptotic behavior, for each bandwidth, is assessed using the kernel-type estimators of the divergence criteria in Corollaries 3 and 4, respectively.
We compute, for each chosen value of α, the expressions
where the corresponding bounds are defined by
We consider an experiment in which the DGP (Data Generating Process) for the true distribution $f$ is a mixture of two normal distributions,
and the reference density $g$ is taken to be normal with mean 1 and variance 2.
The sample size varies from 10 to 1000 and, for each size, the statistics DH, DB and DK are evaluated.
In order to plot these statistics against the sample size, we need to perform three sets of experiments.
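One such experiment can be sketched as follows. Since the text does not list the mixture parameters, the component weights and means below are hypothetical, while $g$ is normal with mean 1 and variance 2 as stated; the plug-in Hellinger estimate is compared with its numerically integrated target:

```python
import numpy as np

rng = np.random.default_rng(42)

def normal_pdf(mu, s):
    return lambda x: np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Hypothetical mixture weights and means for f (not taken from the text):
w, mu1, mu2 = 0.5, 0.0, 2.0
f = lambda x: w * normal_pdf(mu1, 1.0)(x) + (1.0 - w) * normal_pdf(mu2, 1.0)(x)
g = normal_pdf(1.0, np.sqrt(2.0))          # mean 1, variance 2, as stated

def sample_f(n):
    pick = rng.random(n) < w
    return np.where(pick, rng.normal(mu1, 1.0, n), rng.normal(mu2, 1.0, n))

def kde(x_eval, sample, h):
    u = (np.atleast_1d(np.asarray(x_eval, dtype=float))[:, None] - sample[None, :]) / h
    return (np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)).mean(axis=1) / h

def hellinger_hat(n, h):
    # Plug-in estimate of 2*(1 - \int sqrt(f g)) = 2*(1 - E_f[sqrt(g(X)/f(X))]).
    x = sample_f(n)
    return 2.0 * (1.0 - np.mean(np.sqrt(g(x) / kde(x, x, h))))

grid = np.linspace(-8.0, 10.0, 20001)
true_h = 2.0 * (1.0 - np.sum(np.sqrt(f(grid) * g(grid))) * (grid[1] - grid[0]))
est_h = hellinger_hat(1000, h=0.4)
```

Repeating this for a grid of sample sizes and for the three criteria yields discrepancy-versus-$n$ tables and plots of the kind reported below.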
The results are presented in Tables 1-3 and Figures 1-3.
Table 1. Hellinger divergence: discrepancy DH and corresponding bound against sample size n.
n       DH      bound
10      0.14    0.0627
20      0.25    0.0627
50      0.07    0.0627
100     0.02    0.0627
300     0.01    0.0628
500     0.006   0.0629
1000    0.003   0.0629
Table 2. Bhattacharyya divergence: discrepancy DB and corresponding bound against sample size n.
n       DB      bound
10      0.048   0.040
20      0.037   0.030
50      0.007   0.027
100     0.007   0.025
300     0.005   0.024
500     0.003   0.024
1000    0.002   0.023
Table 3. Kullback-Leibler divergence: discrepancy DK and corresponding bound against sample size n.
n       DK      bound
10      0.15    0.19
20      0.128   0.167
50      0.213   0.11
100     0.053   0.095
300     0.034   0.093
500     0.025   0.842
1000    0.007   0.825
Tables 1-3 show that the kernel-type estimators of the divergence measures converge rapidly to their pseudo-true values, and confirm our asymptotic results. They all show that the discrepancy between the estimated and the true divergence criterion converges rapidly to zero. Similarly, in Tables 2 and 3, DB and DK converge, as expected, to zero, which is the mean of the asymptotic distribution when the estimated distribution is close to f.
Figures 1-3 show value plots for the Hellinger, Bhattacharyya and Kullback-Leibler divergences, respectively. The preceding comments on Tables 1-3 also apply to Figures 1-3. For dealing with divergence error, it is much more revealing to graph DH, DB and DK against sample size. They also confirm our asymptotic results. We note that, as the sample size increases, the discrepancy plots of the divergence error converge, as they should, to zero. These plots provide a great deal of information about how the sample size affects the performance of these informational criteria.
6. Concluding Remarks and Future Works
In this paper, we are concerned with the problem of nonparametric estimation of a class of divergence measures. To this end, many estimators are available; the most recent ones are the estimates developed by Bouzebda [2]. We introduce an estimator that can be seen as a generalization of those previously suggested, in the sense that Bouzebda was only interested in the case of entropy, while we focus on the Rényi and Tsallis divergence measures. From our study, one can easily deduce Kullback-Leibler, Hellinger and Bhattacharyya nonparametric estimators. The results presented in this work are general, since the required conditions are fulfilled by a large class of densities. We mention that the estimator in (5) can be computed by using a Monte-Carlo method under a given distribution $g$, together with a practical choice of the bandwidth $h_n$.
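Such a Monte-Carlo computation rests on the identity $J(f, g) = \mathbb{E}_g[(f(Y)/g(Y))^{\alpha}]$ for $Y \sim g$, so the integral in (3) can be approximated by averaging over draws from the known density $g$, with the unknown $f$ replaced by its kernel estimate. A sketch under illustrative choices of densities and sample sizes:

```python
import numpy as np

rng = np.random.default_rng(7)

def kde(x_eval, sample, h):
    # Parzen-Rosenblatt estimator with a Gaussian kernel (d = 1).
    u = (np.atleast_1d(np.asarray(x_eval, dtype=float))[:, None] - sample[None, :]) / h
    return (np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)).mean(axis=1) / h

def mc_J(sample_f, g_pdf, g_draws, alpha, h):
    # J(f,g) = E_g[(f(Y)/g(Y))^alpha] with Y ~ g; average over Monte-Carlo draws.
    fn = kde(g_draws, sample_f, h)
    return np.mean((fn / g_pdf(g_draws)) ** alpha)

x = rng.normal(0.0, 1.0, size=2000)          # sample from the "unknown" f
g = lambda t: np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)
y = rng.normal(0.0, 1.0, size=3000)          # Monte-Carlo draws from g (here g = f)
J_mc = mc_J(x, g, y, 0.5, h=0.3)             # true J is 1 when f = g
```

Since $\alpha > 0$, draws from $g$ falling where the kernel estimate is near zero contribute small terms rather than causing instability, which makes this importance-sampling form convenient in practice.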
It would be interesting to enrich the results presented here by an additional uniformity in the suprema appearing in all our theorems; this requires nontrivial mathematics and would go well beyond the scope of the present paper. Another direction of research is to obtain results in the case where the continuous distributions $f$ and $g$ are both unknown. The problems and methods described here are all inherently univariate; a natural and useful multivariate extension appears in the use of copula functions.
References