Intrinsically Ties Adjusted Partial Tau (C-Tap) Correlation Coefficient
Oyeka Ikewelugo Cyprian Anaene, Osuji George Amaeze, Obiora-Ilouno Happiness Onyebuchi^{*}
Department of Statistics, Physical Science Faculty, Nnamdi Azikiwe University, Awka, Nigeria
To cite this article:
Oyeka Ikewelugo Cyprian Anaene, Osuji George Amaeze, Obiora-Ilouno Happiness Onyebuchi. Intrinsically Ties Adjusted Partial Tau (C-Tap) Correlation Coefficient. American Journal of Theoretical and Applied Statistics. Vol. 5, No. 5, 2016, pp. 270-279. doi: 10.11648/j.ajtas.20160505.14
Received: December 21, 2015; Accepted: May 24, 2016; Published: August 10, 2016
Abstract: This paper presents a non-parametric statistical method for estimating a partial correlation coefficient that is intrinsically adjusted for tied observations in the data. The method, based on a modification of the method of estimating the tau correlation coefficient, may be used when the populations of interest are measurements on as low as the ordinal scale that are not necessarily continuous or even numeric. The estimated partial correlation coefficient is a weighted average of the estimates obtained when the ranks assigned to the observations from each population are in turn arranged in their natural order and tagged along, with the weights being functions of the number of tied observations in each population. It is shown that failure to adjust for ties tends to lead to an underestimation of the true partial correlation coefficient, an effect that increases with the number of ties in the data. The proposed method is illustrated with some data and shown to compare favorably with the Kendall approach.
Keywords: Intrinsically, Ties, Adjusted Partial Tau (C-Tap), Correlation, Coefficient, Estimation
1. Introduction
[4] proposed a non-parametric method for estimating the simple correlation coefficient, as well as any desired partial correlation coefficient, between samples drawn from two populations while holding constant the values of observations drawn from a third population.
In Kendall’s approach, the populations of interest may be measurements on as low as the ordinal scale and need not be continuous or normally distributed.
Building on [4], [1] proposed a more fully formulated non-parametric statistical method for estimating a partial correlation coefficient between two variables, say X and Y, when a third variable, say Z, is held constant. As in Kendall’s approach, these variables need not be continuous or normally distributed.
In this paper, we propose another non-parametric method for estimating the partial correlation coefficient, one that is intrinsically adjusted for tied observations in the data.
Besides not needing to be continuous or normally distributed, the data need not even be numeric. The proposed method is more general: it covers cases in which there are tied observations in any of the three populations, with equal or unequal tied observations in the sampled populations, and it adjusts intrinsically for those ties, which [4] and [1] did not take into consideration.
Now, for samples of size $n$,
$$\frac{n(n-1)}{2} \tag{1}$$
is the maximum possible total number of agreements between the ranks assigned to observations from populations X and Y, attained when the rankings of the observations from Y and X are both in their natural order [4]. Then
$$\tau = \frac{S}{n(n-1)/2} \tag{2}$$
This is the rationale of Kendall’s tau correlation coefficient and the basis of its estimation, where $S$ is the sum of the scores 1 (+) and −1 (−) obtained by comparing each pair of ranks assigned to Y, say, when the observations are arranged in the natural order of the ranks of the observations from X.
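The scoring rule behind Equation 2 can be sketched in a few lines of Python. This is a minimal illustration that assumes no tied ranks in either sample; the function names `kendall_s` and `kendall_tau` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def kendall_s(x_ranks, y_ranks):
    """Sum of +1/-1 scores for the Y ranks once pairs are ordered by X.

    Assumes there are no tied ranks in either sample.
    """
    # Arrange the observations in the natural order of the X ranks,
    # tagging the Y ranks along.
    tagged_y = [y for _, y in sorted(zip(x_ranks, y_ranks))]
    score = 0
    for earlier, later in combinations(tagged_y, 2):
        score += 1 if later > earlier else -1
    return score


def kendall_tau(x_ranks, y_ranks):
    """Score sum divided by the maximum possible score n(n-1)/2."""
    n = len(x_ranks)
    return kendall_s(x_ranks, y_ranks) / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

With four observations there are six pairs; one pair of Y ranks is out of order in the call above, so the score sum is 5 − 1 = 4.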
Using Equation 2 for samples of equal size, we can estimate a partial correlation coefficient between two populations, say X and Y, when the values of a third population, say Z, are held constant.
The partial correlation coefficient is expressed in terms of the simple tau correlation coefficients as [4], [5], [7]
$$\tau_{xy.z} = \frac{\tau_{xy} - \tau_{xz}\tau_{yz}}{\sqrt{(1-\tau_{xz}^{2})(1-\tau_{yz}^{2})}} \tag{3}$$
where $\tau_{xy}$, $\tau_{xz}$ and $\tau_{yz}$ are respectively the tau correlation coefficients between observations from populations X and Y, X and Z, and Y and Z. The methods of [4], [5] for obtaining $\tau_{xy.z}$ are very tedious in practice. Moreover, Kendall’s basic formulae make no provision for the presence of ties in the data.
Here, we propose to develop, using Equations 2 and 3, a more fully formulated non-parametric statistical method for estimating the partial correlation coefficient between two variables, say X and Y, when a third variable, say Z, is held constant. These variables are not necessarily continuous or normally distributed, but are measured on at least the ordinal scale.
The proposed method is more general and covers cases in which there are no tied observations in any of the three populations, as well as cases in which there are equal and unequal tied observations in the sampled populations.
The method is referred to as C-TAP, for ties adjusted partial tau correlation coefficient, which differentiates it from the usual Kendall partial tau correlation coefficient.
2. The Proposed Method
Consider the variables $(x_i, y_i, z_i)$ for the $i$th observation in a random sample of size $n$ drawn from populations X, Y and Z respectively. In this method, populations X, Y and Z may be measurements on as low as the ordinal scale that are not necessarily continuous or even numeric. We define $r_{ix}$ as the rank assigned to $x_i$, $r_{iy}$ the rank assigned to $y_i$ and $r_{iz}$ the rank assigned to $z_i$, for $i = 1, 2, 3, \ldots, n$. As usual, tied observations in each sample are assigned their mean ranks. Then, with the ranks of Z arranged in their natural order and the corresponding ranks of X tagged along, let
$$u_{jk;xz} = \begin{cases} 1, & \text{if } r_{jx} < r_{kx} \\ 0, & \text{if } r_{jx} = r_{kx} \\ -1, & \text{if } r_{jx} > r_{kx} \end{cases} \tag{4}$$
for $j < k$; see [1].
That is, provided that the rank assigned to the $k$th observation from population X comes after, that is, succeeds the rank assigned to the $j$th observation from the same population when these observations are arranged in accordance with the natural ordering or ranking of the corresponding sister observations from population Z.
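Let
$$\pi_{xz}^{+} = P(u_{jk;xz} = 1); \quad \pi_{xz}^{0} = P(u_{jk;xz} = 0); \quad \pi_{xz}^{-} = P(u_{jk;xz} = -1) \tag{5}$$
where
$$\pi_{xz}^{+} + \pi_{xz}^{0} + \pi_{xz}^{-} = 1 \tag{6}$$
and $\pi_{xz}^{0} = 0$ if there are no ties in X.
Now let
$$W_{xz} = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} u_{jk;xz} \tag{7}$$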
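Now from Equations (4) and (5), we have that
$$E(u_{jk;xz}) = \pi_{xz}^{+} - \pi_{xz}^{-} \tag{8}$$
Similarly, from Equations (7) and (8), $E(W_{xz})$ is the sum of the $n(n-1)/2$ expected values $E(u_{jk;xz})$.
That is
$$E(W_{xz}) = \frac{n(n-1)}{2}\left(\pi_{xz}^{+} - \pi_{xz}^{-}\right) \tag{9a}$$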
And
(9b)
Note that $\pi_{xz}^{+}$, $\pi_{xz}^{0}$ and $\pi_{xz}^{-}$ are respectively the probabilities that the rank assigned to the $j$th observation from population X is less than, equal to or greater than the rank assigned to the $k$th observation from the same population, where the $k$th observation succeeds the $j$th when these observations are arranged in accordance with the natural ordering of the ranks assigned to the observations from population Z. These probabilities are estimated as
$$\hat{\pi}_{xz}^{+} = \frac{f_{xz}^{+}}{n(n-1)/2}; \quad \hat{\pi}_{xz}^{0} = \frac{f_{xz}^{0}}{n(n-1)/2}; \quad \hat{\pi}_{xz}^{-} = \frac{f_{xz}^{-}}{n(n-1)/2} \tag{10}$$
where $f_{xz}^{+}$, $f_{xz}^{0}$ and $f_{xz}^{-}$ are respectively the number of 1s, 0s and −1s in the frequency distribution of the $n(n-1)/2$ values of $u_{jk;xz}$.
Hence the sample estimate of the total number of times the rankings of observations from population X are in their natural order and consistent with the natural ordering of the ranks of observations from population Z, less the number of times they are out of order, is obtained from Equations (9a) and (10) as
$$W_{xz} = \frac{n(n-1)}{2}\left(\hat{\pi}_{xz}^{+} - \hat{\pi}_{xz}^{-}\right) = f_{xz}^{+} - f_{xz}^{-} \tag{11}$$
As noted above, if these rankings are in their natural order then the maximum possible total number of agreements or scores is $n(n-1)/2$ (see Equation 1). Hence the Kendall tau correlation coefficient between observations from population X and observations from population Z, uncorrected for ties in X, may be estimated using Equations (11) and (1) in Equation (2) as
$$\hat{\tau}_{xz} = \frac{W_{xz}}{n(n-1)/2}$$
or
$$\hat{\tau}_{xz} = \hat{\pi}_{xz}^{+} - \hat{\pi}_{xz}^{-} \tag{12}$$
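Note that if there are tied observations in X, then the estimated correlation coefficient of Equation 12 would not be an unbiased estimate of the true tau correlation coefficient between X and Z. This is because even though the numerator $W_{xz}$ has by specification been adjusted for possible ties in X, its denominator has not been so adjusted. Therefore the denominator $n(n-1)/2$ needs to be adjusted. To do this, we subtract $t_x$, the number of tied observations in X (the number of 0s among the $u_{jk;xz}$), from $n(n-1)/2$ to obtain
$$\frac{n(n-1)}{2} - t_x \tag{13}$$
Hence an estimate of the tau correlation coefficient between X and Z adjusted for ties in X is
$$\hat{\tau}_{xz;c} = \frac{W_{xz}}{n(n-1)/2 - t_x}$$
or
$$\hat{\tau}_{xz;c} = \frac{n(n-1)\,\hat{\tau}_{xz}}{n(n-1) - 2t_x} \tag{14}$$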
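The unadjusted and adjusted tau correlation coefficients between observations from populations Y and Z are similarly estimated. Thus, having arranged the ranks assigned to observations from Z in their natural order and tagged along the ranks assigned to the corresponding observations from Y, we let
$$u_{jk;yz} = \begin{cases} 1, & \text{if } r_{jy} < r_{ky} \\ 0, & \text{if } r_{jy} = r_{ky} \\ -1, & \text{if } r_{jy} > r_{ky} \end{cases} \tag{15}$$
for $j < k$; that is, provided that the rank assigned to the $k$th observation from population Y comes after, that is, succeeds the rank assigned to the $j$th observation from the same population when these observations are arranged in accordance with the natural ordering or ranking of the corresponding sister observations from population Z.
Let
$$\pi_{yz}^{+} = P(u_{jk;yz} = 1); \quad \pi_{yz}^{0} = P(u_{jk;yz} = 0); \quad \pi_{yz}^{-} = P(u_{jk;yz} = -1) \tag{16}$$
where
$$\pi_{yz}^{+} + \pi_{yz}^{0} + \pi_{yz}^{-} = 1 \tag{17}$$
and $\pi_{yz}^{0} = 0$ if there are no ties in Y.
Also let
$$W_{yz} = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} u_{jk;yz} \tag{18}$$
Now from Equations 15 and 16, we have that
$$E(u_{jk;yz}) = \pi_{yz}^{+} - \pi_{yz}^{-} \tag{19}$$
Similarly, from Equations 18 and 19,
$$E(W_{yz}) = \frac{n(n-1)}{2}\left(\pi_{yz}^{+} - \pi_{yz}^{-}\right) \tag{20}$$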
And
(21)
Note that $\pi_{yz}^{+}$, $\pi_{yz}^{0}$ and $\pi_{yz}^{-}$ are respectively the probabilities that the rank assigned to the $j$th observation from population Y is less than, equal to or greater than the rank assigned to the $k$th observation from the same population, where the $k$th observation succeeds the $j$th when the ranks assigned to these observations are arranged in accordance with the natural ordering of the ranks assigned to the observations from population Z. These probabilities are estimated as
$$\hat{\pi}_{yz}^{+} = \frac{f_{yz}^{+}}{n(n-1)/2}; \quad \hat{\pi}_{yz}^{0} = \frac{f_{yz}^{0}}{n(n-1)/2}; \quad \hat{\pi}_{yz}^{-} = \frac{f_{yz}^{-}}{n(n-1)/2} \tag{22}$$
where $f_{yz}^{+}$, $f_{yz}^{0}$ and $f_{yz}^{-}$ are respectively the number of 1s, 0s and −1s in the frequency distribution of the values of $u_{jk;yz}$.
Hence the sample estimate of the total number of times the rankings of observations from population Y are in their natural order and consistent with the natural ordering of the ranks of observations from population Z, less the number of times they are out of order, is obtained from Equations (20) and (22) as
$$W_{yz} = \frac{n(n-1)}{2}\left(\hat{\pi}_{yz}^{+} - \hat{\pi}_{yz}^{-}\right) = f_{yz}^{+} - f_{yz}^{-} \tag{23}$$
Hence, as before, the tau correlation coefficient between Y and Z unadjusted for ties in Y is estimated as
$$\hat{\tau}_{yz} = \frac{W_{yz}}{n(n-1)/2}$$
or
$$\hat{\tau}_{yz} = \hat{\pi}_{yz}^{+} - \hat{\pi}_{yz}^{-} \tag{24}$$
The corresponding estimate for variance is from Eqn (21)
(25)
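Now the tau correlation coefficient between Y and Z adjusted for ties in Y is estimated as
$$\hat{\tau}_{yz;c} = \frac{W_{yz}}{n(n-1)/2 - t_y}$$
or
$$\hat{\tau}_{yz;c} = \frac{n(n-1)\,\hat{\tau}_{yz}}{n(n-1) - 2t_y} \tag{26}$$
where $t_y$ is the number of tied observations in Y,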
whose estimated variance using Eqn (25) is
(27)
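To estimate the tau correlation coefficient between observations from population X and observations from population Y, we let
$$u_{jk;xy} = \begin{cases} 1, & \text{if } r_{jy} < r_{ky} \\ 0, & \text{if } r_{jy} = r_{ky} \\ -1, & \text{if } r_{jy} > r_{ky} \end{cases} \tag{28}$$
for $j < k$. That is, provided that the rank assigned to the $k$th observation in the sample drawn from population Y comes after, that is, succeeds the rank assigned to the $j$th observation in the sample drawn from population Y when the ranks assigned to the observations have been arranged according to the natural ordering of the ranks assigned to their sister observations in the sample drawn from population X. Also let
$$\pi_{xy}^{+} = P(u_{jk;xy} = 1); \quad \pi_{xy}^{0} = P(u_{jk;xy} = 0); \quad \pi_{xy}^{-} = P(u_{jk;xy} = -1) \tag{29}$$
where
$$\pi_{xy}^{+} + \pi_{xy}^{0} + \pi_{xy}^{-} = 1 \tag{30}$$
and $\pi_{xy}^{0} = 0$ if there are no ties between X and Y.
Define
$$W_{xy} = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} u_{jk;xy} \tag{31}$$
From Equations (28) and (29), we have that
$$E(u_{jk;xy}) = \pi_{xy}^{+} - \pi_{xy}^{-} \tag{32}$$
Similarly, from Equations (31) and (32), we have
$$E(W_{xy}) = \frac{n(n-1)}{2}\left(\pi_{xy}^{+} - \pi_{xy}^{-}\right) \tag{33}$$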
and
(34)
Note that $\pi_{xy}^{+}$, $\pi_{xy}^{0}$ and $\pi_{xy}^{-}$ are respectively the probabilities that the rank assigned to the $j$th observation from population Y is less than, equal to or greater than the rank assigned to the $k$th observation from population Y, when the observations have been arranged so that the ranks assigned to them correspond with the natural order of the ranks of their sister observations from population X, with the $k$th observation succeeding the $j$th observation. They are estimated as
$$\hat{\pi}_{xy}^{+} = \frac{f_{xy}^{+}}{n(n-1)/2}; \quad \hat{\pi}_{xy}^{0} = \frac{f_{xy}^{0}}{n(n-1)/2}; \quad \hat{\pi}_{xy}^{-} = \frac{f_{xy}^{-}}{n(n-1)/2} \tag{35}$$
where $f_{xy}^{+}$, $f_{xy}^{0}$ and $f_{xy}^{-}$ are respectively the number of 1s, 0s and −1s in the frequency distribution of the values of $u_{jk;xy}$.
Hence the sample estimate of the total number of times the rankings of observations from populations X and Y are in their natural order and consistent with the natural ordering of their sister observations, less the number of times they are out of order, is estimated from Equations (33) and (35) as
$$W_{xy} = \frac{n(n-1)}{2}\left(\hat{\pi}_{xy}^{+} - \hat{\pi}_{xy}^{-}\right) = f_{xy}^{+} - f_{xy}^{-} \tag{36}$$
Following a similar argument as above, the tau correlation coefficient between X and Y unadjusted for ties between these populations is estimated as
$$\hat{\tau}_{xy} = \frac{W_{xy}}{n(n-1)/2}$$
or
$$\hat{\tau}_{xy} = \hat{\pi}_{xy}^{+} - \hat{\pi}_{xy}^{-} \tag{37}$$
whose variance is estimated as
(38)
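The corresponding tau correlation coefficient adjusted for ties between X and Y is estimated as
$$\hat{\tau}_{xy;c} = \frac{W_{xy}}{n(n-1)/2 - t_{xy}}$$
That is
$$\hat{\tau}_{xy;c} = \frac{n(n-1)\,\hat{\tau}_{xy}}{n(n-1) - 2t_{xy}} \tag{39}$$
where $t_{xy}$ is the number of tied observations between X and Y.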
The corresponding variance is estimated from Equation 38 as
(40)
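As noted above, in the presence of ties in the data, the estimated tau correlation coefficient is not independent of which of the two populations being correlated has its assigned ranks arranged in their natural order and which has its assigned ranks tagged along. To adjust for this effect, each of the two sets of ranks must in turn play each of the two roles. Thus, to estimate the correlation coefficient between X and Z when X has its assigned ranks arranged in their natural order and Z has its corresponding assigned ranks tagged along, we let
$$u_{jk;zx} = \begin{cases} 1, & \text{if } r_{jz} < r_{kz} \\ 0, & \text{if } r_{jz} = r_{kz} \\ -1, & \text{if } r_{jz} > r_{kz} \end{cases} \tag{41}$$
for $j < k$. Also let
$$\pi_{zx}^{+} = P(u_{jk;zx} = 1); \quad \pi_{zx}^{0} = P(u_{jk;zx} = 0); \quad \pi_{zx}^{-} = P(u_{jk;zx} = -1) \tag{42}$$
where
$$\pi_{zx}^{+} + \pi_{zx}^{0} + \pi_{zx}^{-} = 1 \tag{43}$$
and $\pi_{zx}^{0} = 0$ if there are no ties in Z.
Also let
$$W_{zx} = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} u_{jk;zx} \tag{44}$$
Now
$$E(u_{jk;zx}) = \pi_{zx}^{+} - \pi_{zx}^{-} \tag{45}$$
And
$$E(W_{zx}) = \frac{n(n-1)}{2}\left(\pi_{zx}^{+} - \pi_{zx}^{-}\right) \tag{46}$$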
(47)
Note that $\pi_{zx}^{+}$, $\pi_{zx}^{0}$ and $\pi_{zx}^{-}$ are respectively the probabilities that the rank assigned to the $j$th observation from population Z is less than, equal to or greater than the rank assigned to the $k$th observation from the same population, where the $k$th observation succeeds the $j$th when the ranks assigned to these observations are arranged in accordance with the natural ordering of the ranks assigned to the corresponding sister observations from population X. They are estimated as
$$\hat{\pi}_{zx}^{+} = \frac{f_{zx}^{+}}{n(n-1)/2}; \quad \hat{\pi}_{zx}^{0} = \frac{f_{zx}^{0}}{n(n-1)/2}; \quad \hat{\pi}_{zx}^{-} = \frac{f_{zx}^{-}}{n(n-1)/2} \tag{48}$$
where $f_{zx}^{+}$, $f_{zx}^{0}$ and $f_{zx}^{-}$ are respectively the number of 1s, 0s and −1s in the frequency distribution of the values of $u_{jk;zx}$.
Hence the sample estimate of the total number of times the rankings of observations from population Z are in their natural order and consistent with the natural ordering of the ranks of observations from population X, less the number of times they are out of order, is obtained from Equations 46 and 48 as
$$W_{zx} = \frac{n(n-1)}{2}\left(\hat{\pi}_{zx}^{+} - \hat{\pi}_{zx}^{-}\right) = f_{zx}^{+} - f_{zx}^{-} \tag{49}$$
Hence, as before, the tau correlation coefficient between X and Z unadjusted for ties in Z is estimated as
$$\hat{\tau}_{zx} = \frac{W_{zx}}{n(n-1)/2}$$
or
$$\hat{\tau}_{zx} = \hat{\pi}_{zx}^{+} - \hat{\pi}_{zx}^{-} \tag{50}$$
with estimated variance
(51)
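Now the tau correlation coefficient between X and Z adjusted for ties in Z is estimated as
$$\hat{\tau}_{zx;c} = \frac{W_{zx}}{n(n-1)/2 - t_z}$$
That is
$$\hat{\tau}_{zx;c} = \frac{n(n-1)\,\hat{\tau}_{zx}}{n(n-1) - 2t_z} \tag{52}$$
where $t_z$ is the number of tied observations in Z,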
whose estimated variance using equation 51 is
(53)
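The tau correlation coefficients between populations Y and Z, unadjusted as well as adjusted for ties in Z, are similarly estimated, with the ranks assigned to observations from Y arranged in their natural order and the ranks assigned to the corresponding observations from Z tagged along. Thus, following the above procedures, we have that
$$\hat{\tau}_{zy} = \frac{W_{zy}}{n(n-1)/2} \quad \text{or} \quad \hat{\tau}_{zy} = \hat{\pi}_{zy}^{+} - \hat{\pi}_{zy}^{-} \tag{54}$$
And
$$\hat{\tau}_{zy;c} = \frac{W_{zy}}{n(n-1)/2 - t_z}$$
That is
$$\hat{\tau}_{zy;c} = \frac{n(n-1)\,\hat{\tau}_{zy}}{n(n-1) - 2t_z} \tag{55}$$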
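The ties adjusted tau correlation coefficient between X and Z is estimated as a weighted average of $\hat{\tau}_{xz;c}$ (ranks of Z in natural order, ranks of X tagged along) and $\hat{\tau}_{zx;c}$ (ranks of X in natural order, ranks of Z tagged along), where the weights are functions of the numbers of tied observations $t_x$ and $t_z$ in the two sampled populations. It is estimated as
$$\bar{\tau}_{xz;c} = \frac{\left(\frac{n(n-1)}{2} - t_x\right)\hat{\tau}_{xz;c} + \left(\frac{n(n-1)}{2} - t_z\right)\hat{\tau}_{zx;c}}{n(n-1) - t_x - t_z}$$
or
$$\bar{\tau}_{xz;c} = \frac{W_{xz} + W_{zx}}{n(n-1) - t_x - t_z} \tag{56}$$
Similarly, the ties adjusted tau correlation coefficient between Y and Z is estimated as
$$\bar{\tau}_{yz;c} = \frac{W_{yz} + W_{zy}}{n(n-1) - t_y - t_z} \tag{57}$$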
Use of Equations 39, 56 and 57 in Equation 3 yields an estimate of the ties adjusted or corrected partial tau correlation coefficient between observations from populations X and Y, holding observations from population Z at a constant level, as
$$\text{C-TAP} = \hat{\tau}_{xy.z;c} = \frac{\hat{\tau}_{xy;c} - \bar{\tau}_{xz;c}\,\bar{\tau}_{yz;c}}{\sqrt{\left(1 - \bar{\tau}_{xz;c}^{2}\right)\left(1 - \bar{\tau}_{yz;c}^{2}\right)}} \tag{58}$$
Note that the unadjusted partial correlation coefficient is equal to the ties adjusted partial correlation coefficient only if there are no ties in the data. Otherwise the unadjusted coefficient tends to underestimate the true partial correlation coefficient, a bias that increases with the number of ties in the data.
3. Illustrative Example
The following are the letter grades earned by 13 candidates under three judges in a job interview. Two of the judges, X and Y, are male while judge Z is female.
Interest here is in estimating the partial correlation coefficient between the assessments of the candidates by the two male judges when the assessment by the female judge is controlled. The letter grades awarded to the candidates by each of the three judges are ranked from the highest, ‘A^{+}’, to the lowest, ‘F’, assigning the rank of 1 to A^{+}, the rank of 2 to A and so on until the rank of 13 is assigned to ‘F’. All tied grades or scores under each judge are assigned their mean rank.
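The mean-rank rule for tied grades can be illustrated with a short Python sketch. The function name `mean_ranks` and the numeric keys in the example are hypothetical; any coding in which a smaller key means a better grade would serve.

```python
def mean_ranks(keys):
    """Assign ranks 1..n by ascending key, giving tied keys their mean rank."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    ranks = [0.0] * len(keys)
    pos = 0
    while pos < len(order):
        # Extend the block [pos, end] over all observations tied with order[pos].
        end = pos
        while end + 1 < len(order) and keys[order[end + 1]] == keys[order[pos]]:
            end += 1
        # Positions pos..end correspond to ranks pos+1..end+1; take their mean.
        mean = (pos + 1 + end + 1) / 2
        for i in order[pos:end + 1]:
            ranks[i] = mean
        pos = end + 1
    return ranks


if __name__ == "__main__":
    # Three observations share the best key, so each gets (1 + 2 + 3) / 3 = 2.
    print(mean_ranks([3, 1, 1, 1, 2]))
```

Three candidates sharing the best grade would each receive rank 2, which is the pattern seen among the Y ranks in the worked example.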
To apply the proposed method, we first arrange the ranks assigned by judge Z in their natural order and then tag along the ranks of the grades assigned to the candidates by each of the other two judges, X and Y. The results are shown in Table 2.
To calculate $\hat{\tau}_{xz;c}$, the ties adjusted correlation coefficient between X and Z when the ranks of Z are arranged in their natural order and the corresponding ranks of X are tagged along, we may first obtain the values of $u_{jk;xz}$ of Equation 4, preferably in tabular form (Table 3).
Table 3. Calculation of Values of $u_{jk;xz}$ (Equation 4).
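From Table 3, we have that $f_{xz}^{+}$ = 41, $f_{xz}^{0}$ = 7 and $f_{xz}^{-}$ = 30. Hence from Equation (10), with $n(n-1)/2 = 13(12)/2 = 78$, we have
$\hat{\pi}_{xz}^{+} = 41/78 = 0.526$; $\hat{\pi}_{xz}^{-} = 30/78 = 0.385$
Hence from Equation (12), we have that the estimated tau correlation coefficient unadjusted for ties in X is $\hat{\tau}_{xz}$ = 0.526 – 0.385 = 0.141.
Therefore from Equation 14, we have that the estimated tau correlation coefficient between X and Z adjusted for ties in X is
$\hat{\tau}_{xz;c} = \frac{41 - 30}{78 - 7} = \frac{11}{71} = 0.155$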
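As a quick check on this arithmetic, the following Python sketch turns the three counts from Table 3 into the estimated proportions and the two tau estimates. The function name `tau_from_counts` is hypothetical, and the sketch assumes, as the worked figures suggest, that the tie adjustment subtracts the number of 0 scores from the number of pairs.

```python
def tau_from_counts(f_plus, f_zero, f_minus):
    """Proportions and tau estimates from the counts of 1s, 0s and -1s."""
    pairs = f_plus + f_zero + f_minus        # n(n-1)/2 pairs in total
    w = f_plus - f_minus                     # net score
    if pairs - f_zero == 0:
        raise ValueError("every pair is tied; tau is undefined")
    return {
        "pi_plus": f_plus / pairs,
        "pi_zero": f_zero / pairs,
        "pi_minus": f_minus / pairs,
        "unadjusted": w / pairs,             # denominator ignores ties
        "adjusted": w / (pairs - f_zero),    # assumed tie-reduced denominator
    }


if __name__ == "__main__":
    result = tau_from_counts(41, 7, 30)      # counts quoted from Table 3
    for name, value in result.items():
        print(name, round(value, 3))
```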
The estimated tau correlation coefficient between Y and Z is obtained from the data of Table 4.
Table 4. Calculation of Values of $u_{jk;yz}$ (Equation 15).
Candidate number | $r_{jy}$ | 5 | 2 | 3 | 13 | 1 | 6 | 8 | 10 | 11 | 12 | 9 | 4 | 7
$r_{ky}$ | | 2 | 2 | 9.5 | 9.5 | 4 | 13 | 8 | 6 | 11.5 | 2 | 6 | 6 | 11.5
5 | 2 | | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1
2 | 2 | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1
3 | 9.5 | | | | 0 | -1 | 1 | -1 | -1 | 1 | -1 | -1 | -1 | 1
13 | 9.5 | | | | | -1 | 1 | -1 | -1 | 1 | -1 | -1 | -1 | 1
1 | 4 | | | | | | 1 | 1 | 1 | 1 | -1 | 1 | 1 | 1
6 | 13 | | | | | | | -1 | -1 | -1 | -1 | -1 | -1 | -1
8 | 8 | | | | | | | | -1 | 1 | -1 | -1 | -1 | 1
10 | 6 | | | | | | | | | 1 | -1 | 0 | 0 | 1
11 | 11.5 | | | | | | | | | | -1 | -1 | -1 | 0
12 | 2 | | | | | | | | | | | 1 | 1 | 1
9 | 6 | | | | | | | | | | | | 0 | 1
4 | 6 | | | | | | | | | | | | | 1
7 | 11.5 | | | | | | | | | | | | |
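From Table 4, we have that
$f_{yz}^{+} = 42$; $f_{yz}^{0} = 8$ and $f_{yz}^{-} = 28$
Hence from Equation 22, we have
$\hat{\pi}_{yz}^{+} = 42/78 = 0.538$ and $\hat{\pi}_{yz}^{-} = 28/78 = 0.359$
Therefore, using Equation 24, we have
$\hat{\tau}_{yz}$ = 0.538 – 0.359 = 0.179
Hence the tau correlation coefficient between Y and Z adjusted for ties in Y is, from Equation 26,
$\hat{\tau}_{yz;c} = \frac{42 - 28}{78 - 8} = \frac{14}{70} = 0.200$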
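The counts behind these figures can be reproduced from the Y ranks listed in Table 4, taken in the order in which they were tagged along the Z ranks. The sketch below is illustrative only; `sign_counts` is a hypothetical helper name, and the score is 1 when the later rank is larger, 0 when the two ranks are equal and −1 otherwise.

```python
from itertools import combinations


def sign_counts(tagged):
    """Count the 1s, 0s and -1s over all pairs (j < k) of tagged-along ranks."""
    plus = zero = minus = 0
    for earlier, later in combinations(tagged, 2):
        if later > earlier:
            plus += 1
        elif later == earlier:
            zero += 1
        else:
            minus += 1
    return plus, zero, minus


# Y ranks of candidates 5, 2, 3, 13, 1, 6, 8, 10, 11, 12, 9, 4, 7 (Table 4).
y_tagged = [2, 2, 9.5, 9.5, 4, 13, 8, 6, 11.5, 2, 6, 6, 11.5]

if __name__ == "__main__":
    plus, zero, minus = sign_counts(y_tagged)
    pairs = plus + zero + minus
    print(plus, zero, minus)                          # counts of 1s, 0s, -1s
    print(round((plus - minus) / pairs, 3))           # unadjusted tau
    print(round((plus - minus) / (pairs - zero), 3))  # ties adjusted tau
```

The eight tied pairs come from the two groups of three equal ranks (2 and 6) and the two groups of two equal ranks (9.5 and 11.5).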
The tau correlation coefficient between X and Y is obtained using the values of $u_{jk;xy}$ of Equation 28 shown in Table 5.
From Table 5, we have that
$f_{xy}^{+} = 46$; $f_{xy}^{0} = 0$; $f_{xy}^{-} = 32$
So that from Equation 35,
$\hat{\pi}_{xy}^{+} = 46/78 = 0.590$; $\hat{\pi}_{xy}^{0} = 0$; $\hat{\pi}_{xy}^{-} = 32/78 = 0.410$
Therefore, from Equation 37, the tau correlation coefficient between X and Y unadjusted for any ties between these two variables is estimated as
$\hat{\tau}_{xy}$ = 0.590 – 0.410 = 0.180, which is here the same as the adjusted correlation coefficient since there are no common tied observations between X and Y.
To estimate the tau correlation coefficient between X and Z when the ranks assigned to observations from X are arranged in their natural order and the ranks assigned to the corresponding observations from Z are tagged along, we use the values of $u_{jk;zx}$ of Equation 41 shown in Table 6.
From Table 6, $f_{zx}^{+} = 37$, $f_{zx}^{0} = 11$, and $f_{zx}^{-} = 30$.
Hence from Equation (48)
$\hat{\pi}_{zx}^{+} = 37/78 = 0.474$ and $\hat{\pi}_{zx}^{-} = 30/78 = 0.385$
The estimated tau correlation coefficient between X and Z unadjusted for ties in Z (Equation 50) is $\hat{\tau}_{zx}$ = 0.474 – 0.385 = 0.089.
The corresponding ties adjusted correlation coefficient (Equation 52) is
$\hat{\tau}_{zx;c} = \frac{37 - 30}{78 - 11} = \frac{7}{67} = 0.104$
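The corresponding tau correlation coefficient between Y and Z when the ranks assigned to observations from Y are arranged in their natural order and the ranks assigned to the observations from Z are tagged along, namely $\hat{\tau}_{zy}$, is estimated using the values of $u_{jk;zy}$, from which we obtain $f_{zy}^{+} = 37$, $f_{zy}^{0} = 11$ and $f_{zy}^{-} = 30$.
Hence $\hat{\pi}_{zy}^{+} = 37/78 = 0.474$ and $\hat{\pi}_{zy}^{-} = 30/78 = 0.385$
So that the tau correlation coefficient between Y and Z unadjusted for ties in Z is estimated as
$\hat{\tau}_{zy}$ = 0.474 – 0.385 = 0.089, while the corresponding ties adjusted correlation coefficient is
$\hat{\tau}_{zy;c} = \frac{37 - 30}{78 - 11} = \frac{7}{67} = 0.104$.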
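From these results, we have that the estimated ties adjusted tau correlation coefficient between X and Z from Equation 56 is
$\bar{\tau}_{xz;c} = \frac{11 + 7}{156 - 7 - 11} = \frac{18}{138} = 0.130$
Similarly, the estimated ties adjusted tau correlation coefficient between Y and Z (Equation 57) is
$\bar{\tau}_{yz;c} = \frac{14 + 7}{156 - 8 - 11} = \frac{21}{137} = 0.153$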
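The two pooled values can be checked with a short Python sketch. It rests on an assumption: the weights of the weighted average are taken to be the tie-reduced pair counts, so that the pooled estimate is the sum of the two net scores divided by the sum of the two adjusted denominators. This reading reproduces the printed 0.130 and 0.153, but the function name `pooled_tau` and the net scores and tie counts passed to it are inferred from the proportions quoted above rather than stated directly.

```python
def pooled_tau(w_a, ties_a, w_b, ties_b, n):
    """Pool two orientation-specific estimates of the same correlation.

    w_a, w_b       net scores (number of 1s minus number of -1s)
    ties_a, ties_b number of 0 scores in each orientation
    Assumes each estimate is weighted by its own tie-reduced pair count.
    """
    pairs = n * (n - 1) // 2
    return (w_a + w_b) / (2 * pairs - ties_a - ties_b)


if __name__ == "__main__":
    # X with Z: net score 11 with 7 ties, and net score 7 with 11 ties.
    print(round(pooled_tau(11, 7, 7, 11, 13), 3))
    # Y with Z: net score 14 with 8 ties, and net score 7 with 11 ties.
    print(round(pooled_tau(14, 8, 7, 11, 13), 3))
```

Weighting by the tie counts themselves would give about 0.124 for X and Z, which does not match the printed value, so that alternative was not used here.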
Using these values in Equation 58, the ties adjusted partial correlation coefficient between X and Y, holding Z constant, is estimated as
C-TAP $= \frac{0.180 - (0.130)(0.153)}{\sqrt{(1 - 0.130^{2})(1 - 0.153^{2})}} = \frac{0.160}{0.980} = 0.163$
Notice that the simple correlation coefficient between X and Y is about 0.180 while the partial correlation coefficient between X and Y is only 0.163, indicating that the assessment by the female judge seems to reduce the strength of the association or agreement between the male judges in the assessment of the candidates. Also, if no adjustments had been made for the presence of tied observations in the data, the estimated partial correlation coefficient would have been
$\hat{\tau}_{xy.z} = \frac{0.180 - (0.141)(0.179)}{\sqrt{(1 - 0.141^{2})(1 - 0.179^{2})}} = \frac{0.155}{0.974} = 0.159$
showing that failure to adjust for the presence of ties in the data tends to lead to a probable underestimation of the true partial correlation coefficient.
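Both partial coefficients follow from Equation 3 applied to the rounded estimates quoted in this section, as the following Python sketch shows. The function name `partial_tau` is hypothetical; the inputs are the printed three-decimal values, so the last digit may be sensitive to rounding.

```python
import math


def partial_tau(tau_xy, tau_xz, tau_yz):
    """Partial tau between X and Y with Z held constant (Equation 3)."""
    numerator = tau_xy - tau_xz * tau_yz
    denominator = math.sqrt((1 - tau_xz ** 2) * (1 - tau_yz ** 2))
    return numerator / denominator


if __name__ == "__main__":
    # Ties adjusted inputs: tau_xy = 0.180, pooled tau_xz = 0.130, tau_yz = 0.153.
    print(round(partial_tau(0.180, 0.130, 0.153), 3))
    # Unadjusted inputs: tau_xy = 0.180, tau_xz = 0.141, tau_yz = 0.179.
    print(round(partial_tau(0.180, 0.141, 0.179), 3))
```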
4. Conclusion
This paper has presented a non-parametric method for estimating the partial correlation coefficient between two variables, holding a third variable constant, when there are tied observations in the data. The proposed method is more general and covers cases in which there are no tied observations in any of the three populations, as well as cases in which there are equal and unequal tied observations in the sampled populations. The proposed method uses a modified approach to the estimation of the tau correlation coefficient and assumes that the populations of interest may be measurements on as low as the ordinal scale. The estimated ties adjusted partial correlation coefficient is a weighted average of the estimates obtained when the ranks assigned to the observations from each of the sampled populations are in turn used as those that are naturally ordered and as those that are tagged along. It is shown that when there are ties in the data, failure to adjust for these ties tends to result in an underestimate of the true partial correlation coefficient. This bias increases with the number of tied observations in the data.
The proposed method is illustrated with some data and is shown to compare favorably with the Kendall approach.
References