American Journal of Theoretical and Applied Statistics
Volume 5, Issue 5, September 2016, Pages: 270-279

Intrinsically Ties Adjusted Partial Tau (C-Tap) Correlation Coefficient

Oyeka Ikewelugo Cyprian Anaene, Osuji George Amaeze, Obiora-Ilouno Happiness Onyebuchi*

Department of Statistics, Physical Science Faculty, Nnamdi Azikiwe University, Awka, Nigeria

Email address:

(Obiora-Ilouno H. O.)

*Corresponding author

To cite this article:

Oyeka Ikewelugo Cyprian Anaene, Osuji George Amaeze, Obiora-Ilouno Happiness Onyebuchi. Intrinsically Ties Adjusted Partial Tau (C-Tap) Correlation Coefficient. American Journal of Theoretical and Applied Statistics. Vol. 5, No. 5, 2016, pp. 270-279. doi: 10.11648/j.ajtas.20160505.14

Received: December 21, 2015; Accepted: May 24, 2016; Published: August 10, 2016


Abstract: This paper presents a non-parametric statistical method for estimating a partial correlation coefficient that is intrinsically adjusted for tied observations in the data. The method, based on a modification of the method of estimating the tau correlation coefficient, may be used when the populations of interest are measurements on as low as the ordinal scale that are not necessarily continuous or even numeric. The estimated partial correlation coefficient is a weighted average of the estimates obtained when each set of observations in turn has its assigned ranks arranged in their natural order and has its assigned ranks tagged along, with the weights being functions of the number of tied observations in each population. It is shown that failure to adjust for ties tends to lead to an underestimation of the true partial correlation coefficient, an effect that increases with the number of ties in the data. The proposed method is illustrated with some data and shown to compare favorably with the Kendall approach.

Keywords: Intrinsically, Ties, Adjusted Partial Tau (C-Tap), Correlation, Coefficient, Estimation


1. Introduction

[4] proposed a non-parametric method for estimating the simple correlation coefficient, as well as any desired partial correlation coefficient, between two samples drawn from two populations while holding constant the values of observations drawn from a third population.

In Kendall’s approach, the populations of interest may be measurements on as low as the ordinal scale and need not be continuous or normally distributed.

Building on [4], [1] proposed a more fully formulated non-parametric statistical method for estimating a partial correlation coefficient between two variables, X and Y say, when a third variable, Z say, is held constant. As in Kendall’s approach, these variables need not be continuous or normally distributed.

In this paper, we propose another non-parametric method for the estimation of a partial correlation coefficient that is intrinsically adjusted for tied observations in the data.

Apart from the fact that the data need not be continuous or normally distributed, they also need not be numeric. The proposed method is more general: it covers cases in which there are tied observations in any of the three populations, for which it intrinsically adjusts, and cases in which the numbers of tied observations in the sampled populations are equal or unequal, which [4] and [1] did not take into consideration.

Now,

$$\binom{n}{2} = \frac{n(n-1)}{2} \tag{1}$$

This is the maximum possible total number of agreements between the ranks assigned to observations from populations X and Y, provided that the rankings of observations from Y and X are in their natural order [4].

$$\tau = \frac{S}{n(n-1)/2} \tag{2}$$

This is the rationale of Kendall’s tau correlation coefficient and the basis of estimation, where $S$ is the sum of the scores 1 (+) and -1 (-) obtained by comparing each pair of ranks assigned to Y, say, when the observations are arranged in the natural order of the ranks of the observations from X.
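As a minimal sketch of Equation 2 for data without ties, the Python function below arranges the Y ranks in the natural order of the X ranks, sums the +1 and -1 scores over all pairs, and divides by n(n - 1)/2. The function name `kendall_tau_untied` and the sample ranks are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations


def kendall_tau_untied(x_ranks, y_ranks):
    """Kendall's tau for rankings without ties (illustrative sketch)."""
    n = len(x_ranks)
    # Arrange the Y ranks in the natural order of the X ranks.
    y_in_x_order = [y for _, y in sorted(zip(x_ranks, y_ranks))]
    # Score +1 for each pair in natural order and -1 for each pair out of order.
    s = sum(1 if a < b else -1 for a, b in combinations(y_in_x_order, 2))
    return s / (n * (n - 1) / 2)


print(kendall_tau_untied([1, 2, 3, 4], [1, 3, 2, 4]))
```

With four untied observations there are six pairs; only the pair with Y ranks (3, 2) is out of order, so the sum of scores is 5 - 1 = 4.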

Using Equation 2 with samples of equal size, we estimate a partial correlation coefficient between two populations, X and Y say, when the values of a third population, Z say, are held constant.

The partial correlation coefficient is expressed in terms of the simple tau correlation coefficients as [4], [5], [7]

$$\tau_{xy.z} = \frac{\tau_{xy} - \tau_{xz}\,\tau_{yz}}{\sqrt{\left(1 - \tau_{xz}^{2}\right)\left(1 - \tau_{yz}^{2}\right)}} \tag{3}$$

where $\tau_{xy}$, $\tau_{xz}$ and $\tau_{yz}$ are respectively the tau correlation coefficients between observations from populations X and Y, X and Z, and Y and Z. The methods of [4], [5] for obtaining $\tau_{xy.z}$ are very tedious in practice. Moreover, Kendall's basic formulae make no provision for the presence of ties in the data.
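Once the three simple tau coefficients are available, the partial coefficient of Equation 3 is a one-line computation. The sketch below is illustrative: the function name `partial_tau` is an assumption, and the inputs are the coefficients reported in the illustrative example of Section 3.

```python
import math


def partial_tau(tau_xy, tau_xz, tau_yz):
    """Partial tau between X and Y holding Z constant (Equation 3)."""
    denom = math.sqrt((1 - tau_xz ** 2) * (1 - tau_yz ** 2))
    return (tau_xy - tau_xz * tau_yz) / denom


# Ties adjusted coefficients reported in Section 3.
print(round(partial_tau(0.180, 0.130, 0.153), 3))  # 0.163
# Coefficients unadjusted for ties reported in Section 3.
print(round(partial_tau(0.180, 0.141, 0.179), 3))  # 0.159
```

The function is undefined when either controlling coefficient equals 1 or -1, since the denominator is then zero.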

Here, we propose to develop, using Equations 2 and 3, a more fully formulated non-parametric statistical method for the estimation of a partial correlation coefficient between two variables, X and Y say, when a third variable, Z say, is held constant. These variables are not necessarily continuous or normally distributed, but are measured on at least the ordinal scale.

The proposed method is more general and covers cases in which there are no tied observations in any of the three populations, as well as cases in which there are equal or unequal numbers of tied observations in the sampled populations.

The method is referred to as C-TAP, for ties adjusted partial tau correlation coefficient, which distinguishes it from the usual Kendall partial tau correlation coefficient.

2. The Proposed Method

Consider the variables $(x_i, y_i, z_i)$ for the $i$th observation in a random sample of size n drawn from populations X, Y and Z respectively. In this method, populations X, Y and Z may be measurements on as low as the ordinal scale that are not necessarily continuous or even numeric. We define $r_{ix}$ as the rank assigned to $x_i$, $r_{iy}$ as the rank assigned to $y_i$ and $r_{iz}$ as the rank assigned to $z_i$, for $i$ = 1, 2, 3, …, n. As usual, tied observations in each sample are assigned their mean ranks. Then

$$u_{jk;x.z} = \begin{cases} 1, & \text{if } r_{jx} < r_{kx} \\ 0, & \text{if } r_{jx} = r_{kx} \\ -1, & \text{if } r_{jx} > r_{kx} \end{cases} \tag{4}$$

for $j < k$, with $j, k = 1, 2, \ldots, n$. See [1]

That is, provided that the rank assigned to the $k$th observation from population X comes after, that is succeeds, the rank assigned to the $j$th observation from the same population when these observations are arranged in accordance with the natural ordering or ranking of the corresponding sister observations from population Z.
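The scoring rule of Equation 4 can be sketched in Python as follows. The helper names `arrange_by` and `pair_scores` are illustrative assumptions. The text does not say how observations that are tied in the controlling variable should be ordered among themselves, so the stable sort used here, which keeps them in their original sequence, is also an assumption.

```python
from itertools import combinations


def arrange_by(control_ranks, ranks):
    """Tag `ranks` along the natural order of `control_ranks` (stable for ties)."""
    order = sorted(range(len(ranks)), key=lambda i: control_ranks[i])
    return [ranks[i] for i in order]


def pair_scores(ordered_ranks):
    """Scores u_jk for all pairs j < k: 1, 0 or -1 as r_j is <, = or > r_k."""
    return [(a < b) - (a > b) for a, b in combinations(ordered_ranks, 2)]


x_in_z_order = arrange_by([3, 1, 2], [10, 20, 30])
print(x_in_z_order, pair_scores(x_in_z_order))
```

A sample of size n yields n(n - 1)/2 scores, one for each pair with j < k.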

Let

(5)

where

(6)

and if there are no ties in .

Now let

(7)

Now from Equations (4) and (5), we have that

(8)

Similarly;

That is

(9a)

And

(9b)

Note that $\pi^{+}_{x.z}$, $\pi^{0}_{x.z}$ and $\pi^{-}_{x.z}$ are respectively the probabilities that the rank assigned to the $j$th observation from population X is less than, equal to or greater than the rank assigned to the $k$th observation from the same population, if the rank assigned to the $k$th observation succeeds the rank assigned to the $j$th observation when these observations are arranged in accordance with the natural ordering of the ranks assigned to the observations from population Z. These probabilities are estimated as

$$\hat{\pi}^{+}_{x.z} = \frac{f^{+}_{x.z}}{n(n-1)/2};\quad \hat{\pi}^{0}_{x.z} = \frac{f^{0}_{x.z}}{n(n-1)/2};\quad \hat{\pi}^{-}_{x.z} = \frac{f^{-}_{x.z}}{n(n-1)/2} \tag{10}$$

where $f^{+}_{x.z}$, $f^{0}_{x.z}$ and $f^{-}_{x.z}$ are respectively the number of 1s, 0s, and -1s in the frequency distribution of the $n(n-1)/2$ values of $u_{jk;x.z}$.

Hence the sample estimate of the total number of times the rankings of observations from population X are in their natural order and consistent with the natural ordering of the ranks of observations from population Z, less the number of times they are out of order, is obtained from Equations (9) and (10) as

$$\hat{W}_{x.z} = \frac{n(n-1)}{2}\left(\hat{\pi}^{+}_{x.z} - \hat{\pi}^{-}_{x.z}\right) = f^{+}_{x.z} - f^{-}_{x.z} \tag{11}$$

As noted above, if these rankings are in their natural order then the maximum possible total number of agreements or scores is $n(n-1)/2$ (see Eqn (1)). Hence the Kendall tau correlation coefficient between observations from population X and observations from population Z, uncorrected for ties in X, may be estimated using Equations (1) and (11) in Equation 2 as

$$\hat{\tau}_{x.z} = \frac{\hat{W}_{x.z}}{n(n-1)/2}$$

or

$$\hat{\tau}_{x.z} = \hat{\pi}^{+}_{x.z} - \hat{\pi}^{-}_{x.z} \tag{12}$$

Note that if there are tied observations in X, then the estimate $\hat{\tau}_{x.z}$ of Eqn 12 would not be an unbiased estimate of the true tau correlation coefficient between X and Z. This is because even though the numerator $\hat{W}_{x.z}$ has by specification been adjusted for possible ties in X, its denominator $n(n-1)/2$ has not been so adjusted. Therefore the denominator needs to be adjusted. To do this, we subtract $f^{0}_{x.z}$, the number of tied pairs of observations in X, from $n(n-1)/2$ to obtain

$$\frac{n(n-1)}{2} - f^{0}_{x.z} \tag{13}$$

Hence an estimate of the tau correlation coefficient between X and Z adjusted for ties in X is

$$\hat{\tau}_{c;x.z} = \frac{\hat{W}_{x.z}}{n(n-1)/2 - f^{0}_{x.z}}$$

or

$$\hat{\tau}_{c;x.z} = \frac{\hat{\pi}^{+}_{x.z} - \hat{\pi}^{-}_{x.z}}{1 - \hat{\pi}^{0}_{x.z}} \tag{14}$$
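The step from the unadjusted to the ties adjusted coefficient can be sketched from the three score counts alone. This is an illustration under a stated assumption: the adjusted denominator is taken to be the number of pairs less the number of zero scores, a reading that reproduces the values 0.179 and 0.200 reported for Y and Z in Section 3. The function name `tau_from_counts` is hypothetical.

```python
def tau_from_counts(f_plus, f_zero, f_minus):
    """Return (unadjusted, ties adjusted) tau from counts of 1s, 0s and -1s."""
    pairs = f_plus + f_zero + f_minus      # equals n(n - 1)/2
    w = f_plus - f_minus                   # agreements less disagreements
    unadjusted = w / pairs
    adjusted = w / (pairs - f_zero)        # tied pairs removed from the denominator
    return unadjusted, adjusted


unadjusted, adjusted = tau_from_counts(42, 8, 28)
print(round(unadjusted, 3), round(adjusted, 3))  # 0.179 0.2
```

When there are no zero scores the two values coincide, and otherwise the adjusted value is larger in magnitude, which is the direction of the bias described in the text.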

The unadjusted and adjusted tau correlation coefficients between observations from populations Y and Z are similarly estimated. Thus, having arranged the ranks assigned to observations from Z in their natural order and tagged along the ranks assigned to the corresponding observations from Y, we let

$$u_{jk;y.z} = \begin{cases} 1, & \text{if } r_{jy} < r_{ky} \\ 0, & \text{if } r_{jy} = r_{ky} \\ -1, & \text{if } r_{jy} > r_{ky} \end{cases} \tag{15}$$

for $j < k$, with $j, k = 1, 2, \ldots, n$; that is, provided that the rank assigned to the $k$th observation from population Y comes after, that is succeeds, the rank assigned to the $j$th observation from the same population when these observations are arranged in accordance with the natural ordering or ranking of the corresponding sister observations from population Z.

Let

(16)

where

(17)

and  if there are no ties in .

Also let

(18)

Now from Eqns 15 and 16, we have that

(19)

Similarly, from Eqns 18 and 19

(20)

And

(21)

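Note that $\pi^{+}_{y.z}$, $\pi^{0}_{y.z}$ and $\pi^{-}_{y.z}$ are respectively the probabilities that the rank assigned to the $j$th observation from population Y is less than, equal to or greater than the rank assigned to the $k$th observation from the same population, if the rank assigned to the $k$th observation succeeds the rank assigned to the $j$th observation when the ranks assigned to these observations are arranged in accordance with the natural ordering of the ranks assigned to the observations from population Z. These probabilities are estimated as

$$\hat{\pi}^{+}_{y.z} = \frac{f^{+}_{y.z}}{n(n-1)/2};\quad \hat{\pi}^{0}_{y.z} = \frac{f^{0}_{y.z}}{n(n-1)/2};\quad \hat{\pi}^{-}_{y.z} = \frac{f^{-}_{y.z}}{n(n-1)/2} \tag{22}$$

where $f^{+}_{y.z}$, $f^{0}_{y.z}$ and $f^{-}_{y.z}$ are respectively the number of 1s, 0s, and -1s in the frequency distribution of the $n(n-1)/2$ values of $u_{jk;y.z}$.

Hence the sample estimate of the total number of times the rankings of observations from population Y are in their natural order and consistent with the natural ordering of the ranks of observations from population Z, less the number of times they are out of order, is obtained from Equations (20) and (22) as

$$\hat{W}_{y.z} = \frac{n(n-1)}{2}\left(\hat{\pi}^{+}_{y.z} - \hat{\pi}^{-}_{y.z}\right) = f^{+}_{y.z} - f^{-}_{y.z} \tag{23}$$

Hence, as before, the tau correlation coefficient between Y and Z unadjusted for ties in Y is estimated as

$$\hat{\tau}_{y.z} = \frac{\hat{W}_{y.z}}{n(n-1)/2}$$

or

$$\hat{\tau}_{y.z} = \hat{\pi}^{+}_{y.z} - \hat{\pi}^{-}_{y.z} \tag{24}$$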

The corresponding estimate for variance is from Eqn (21)

(25)

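Now the tau correlation coefficient between Y and Z adjusted for ties in Y is estimated by removing the number of tied pairs in Y, that is the number of 0s among the values of $u_{jk;y.z}$, written $f^{0}_{y.z}$, from the denominator $n(n-1)/2$:

$$\hat{\tau}_{c;y.z} = \frac{f^{+}_{y.z} - f^{-}_{y.z}}{n(n-1)/2 - f^{0}_{y.z}}$$

or

$$\hat{\tau}_{c;y.z} = \frac{\hat{\pi}^{+}_{y.z} - \hat{\pi}^{-}_{y.z}}{1 - \hat{\pi}^{0}_{y.z}} \tag{26}$$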

whose estimated variance using Eqn (25) is

(27)

To estimate the tau correlation coefficient between observations from population X and observations from population Y, we let

(28)

for  That is provided that the rank assigned to the  observation in the sample drawn from population  comes after, that is, succeeds the rank assigned to the  observation in the sample drawn from population  when the ranks assigned to the observations have been arranged according to the natural ordering of the ranks assigned to their sister observations in the sample drawn from population ,  Also let

(29)

where

(30)

and if there are no ties between X and Y

Define

(31)

from Eqns (28) and (29), we have that

(32)

Similarly from Equations (31) and (32) we have

(33)

and

(34)

Note that  are respectively the probabilities that the rank assigned to the  observation from population  is less than, equal to or greater than the rank assigned to the  observation from population , when the observations have been arranged so that the rank assigned to them correspond with the natural order of the ranks of their sister observations from population  with the rank assigned to the  observation in  succeeding the ranks assigned to the  observations in  and are estimated as

(35)

where  and are respectively the number of 1s, 0s, and -1s in the frequency distribution of these numbers in

Hence the sample estimate of the total number of times the rankings of observations from population  and  are in their natural order and consistent with the natural ordering with their sister observations from population  less the number of times they are out of order is estimated from Equations (33) and (35) as

(36)

Following a similar argument as above the tau correlation coefficient between and unadjusted for ties between these populations is estimated as

 or

(37)

whose variance is estimated as

(38)

The corresponding ties-adjusted tau correlation coefficient for ties between  and  is estimated as

That is

(39)

The corresponding variance is estimated from Equation 38 as

(40)

As noted above, in the presence of ties in the data, the estimated tau correlation coefficient is not independent of which of the two populations being correlated has its assigned ranks arranged in their natural order and which has its assigned ranks tagged along. To adjust for this effect we would need to use each of these two sets of ranks to alternatively play each of the two roles. Thus to estimate  the correlation coefficient between X and Z when X has its assigned ranks arranged in their natural order, and Z has its corresponding assigned ranks tagged along, we let

(41)

Also let

(42)

where

(43)

and ; if there are no ties in

Also let

(44)

Now

(45)

And

(46)

(47)

Note that are respectively the probabilities that the rank assigned to the observation from population  is less than, equal to or greater than the rank assigned to the  observation from the same population if the rank assigned to the observation succeeds the rank assigned to the   observation when the ranks assigned to these observations are arranged in accordance with the natural ordering of the ranks assigned to the corresponding sister observations from population  and are estimated as

(48)

where are respectively the number of 1s, 0s, and -1s in the frequency distribution of the  values of these numbers in

Hence the sample estimate of the total number of times the ranking of observations from population  are in their natural order and consistent with the natural ordering of the ranks of observations from population  less the number of times they are out of order is obtained from equation 46 and 48 as

(49)

Hence as before, the tau correlation coefficient between and  unadjusted for ties in  is estimated as

 or

(50)

with estimated variance

(51)

Now the tau correlation coefficient between  and  adjusted for ties in  is estimated as

That is

(52)

whose estimated variance using equation 51 is

(53)

The tau correlation coefficients between populations and  unadjusted as well as adjusted for ties in  are similarly estimated. Thus following the above procedures we have that

 or (54)

And

That is

(55)

The ties adjusted tau correlation coefficient between and  is estimated as a weighted average of and , where the weights are functions of the number of tied observations in the two sampled populations and is estimated as

 or

(56)

Similarly, the ties adjusted tau correlation coefficient between  and  is estimated as

(57)

Use of Equations 39, 56, and 57 in Eqn 3, yields an estimate of the ties adjusted or corrected partial tau correlation coefficient between observations from populations and  holding at a constant level of observations from population  as

(58)

Note that the unadjusted partial correlation coefficient  is equal to the ties adjusted partial correlation coefficient, only if there are no ties in the data, otherwise  tends to provide an under estimate of the true partial correlation coefficient, a bias that increases with the number of ties in the data.

3. Illustrative Example

The following are the letter grades earned by 13 candidates under three judges in a job interview. Two of the judges, X and Y, are male while judge Z is female.

Table 1. Grades of Candidates under three judges in a job interview.

Interest here is in estimating the partial correlation coefficient between the scores or assessments of the candidates by the two male judges when the female judge is controlled. The letter grades awarded to the candidates by each of the three judges are here ranked from the highest ‘A+’ to the lowest ‘F’, assigning the rank of 1 to A+, the rank of 2 to A and so on until the rank of 13 is assigned to ‘F’. All tied grades or scores under each judge are assigned their mean rank.
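The mean-rank assignment described above can be sketched as follows. The grade scale and the four grades in the usage line are hypothetical placeholders, since the grades of Table 1 are not reproduced here, and the function name `mean_ranks` is an assumption.

```python
def mean_ranks(grades, scale):
    """Rank grades from best to worst, giving tied grades their mean rank.

    `scale` lists the possible grades from highest to lowest.
    """
    level = {grade: i for i, grade in enumerate(scale)}
    order = sorted(range(len(grades)), key=lambda i: level[grades[i]])
    ranks = [0.0] * len(grades)
    pos = 0
    while pos < len(order):
        # Extend `end` over the block of candidates sharing the same grade.
        end = pos
        while end + 1 < len(order) and grades[order[end + 1]] == grades[order[pos]]:
            end += 1
        mean = (pos + 1 + end + 1) / 2     # mean of ranks pos+1, ..., end+1
        for i in order[pos:end + 1]:
            ranks[i] = mean
        pos = end + 1
    return ranks


# Hypothetical grades for four candidates, not the data of Table 1.
print(mean_ranks(["A", "B", "A", "C"], ["A+", "A", "B", "C"]))
```

In this small case the two candidates graded A share ranks 1 and 2 and so each receive the mean rank 1.5.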

To apply the proposed method, we first arrange the ranks assigned by judge Z in their natural order and then tag along the ranks of the grades assigned to the candidates by each of the other two judges, X and Y. The results are shown in Table 2.

Table 2. Natural Order of ranks for Z with corresponding ranks for X and Y.

To calculate the ties adjusted correlation coefficient between X and Z when the ranks of Z are arranged in their natural order and the corresponding ranks of X are tagged along, we first obtain the values of $u_{jk;x.z}$ of Equation 4, preferably in tabular form (Table 3).

Table 3. Calculation of Values of $u_{jk;x.z}$ (Equation 4).

From Table 3, we have that $f^{+}_{x.z}$ = 41, $f^{0}_{x.z}$ = 7 and $f^{-}_{x.z}$ = 30. Hence from Equation (10), with $n(n-1)/2$ = 78, we have

$\hat{\pi}^{+}_{x.z}$ = 41/78 = 0.526; $\hat{\pi}^{0}_{x.z}$ = 7/78 = 0.090; $\hat{\pi}^{-}_{x.z}$ = 30/78 = 0.385

Hence from Equation (12), we have that the estimated tau correlation coefficient unadjusted for ties in X is $\hat{\tau}_{x.z}$ = 0.526 – 0.385 = 0.141.

Therefore, from Equation 14, the estimated tau correlation coefficient between X and Z adjusted for ties in X is 0.141/(1 – 0.090) = 0.155.

The estimated tau correlation coefficient between Y and Z is obtained from the data of table 4

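Table 4. Calculation of Values of $u_{jk;y.z}$ (Equation 15). Rows give candidate j with rank $r_{jy}$; column headings give candidate k with rank $r_{ky}$ in parentheses.

| Candidate j | $r_{jy}$ | 5 (2) | 2 (2) | 3 (9.5) | 13 (9.5) | 1 (4) | 6 (13) | 8 (8) | 10 (6) | 11 (11.5) | 12 (2) | 9 (6) | 4 (6) | 7 (11.5) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 2 | | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 2 | 2 | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 3 | 9.5 | | | | 0 | -1 | 1 | -1 | -1 | 1 | -1 | -1 | -1 | 1 |
| 13 | 9.5 | | | | | -1 | 1 | -1 | -1 | 1 | -1 | -1 | -1 | 1 |
| 1 | 4 | | | | | | 1 | 1 | 1 | 1 | -1 | 1 | 1 | 1 |
| 6 | 13 | | | | | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
| 8 | 8 | | | | | | | | -1 | 1 | -1 | -1 | -1 | 1 |
| 10 | 6 | | | | | | | | | 1 | -1 | 0 | 0 | 1 |
| 11 | 11.5 | | | | | | | | | | -1 | -1 | -1 | 0 |
| 12 | 2 | | | | | | | | | | | 1 | 1 | 1 |
| 9 | 6 | | | | | | | | | | | | 0 | 1 |
| 4 | 6 | | | | | | | | | | | | | 1 |
| 7 | 11.5 | | | | | | | | | | | | | |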
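As a check on the tallies taken from Table 4, the scores can be recounted directly from the ranks of Y listed in the natural order of Z. The sketch below is illustrative; the variable names are assumptions, and the ranks are those in the $r_{ky}$ row of the table.

```python
from collections import Counter
from itertools import combinations

# Ranks of Y in the natural order of Z (candidates 5, 2, 3, 13, 1, 6, 8, 10, 11, 12, 9, 4, 7).
y_ranks_in_z_order = [2, 2, 9.5, 9.5, 4, 13, 8, 6, 11.5, 2, 6, 6, 11.5]

# Score each pair j < k as 1, 0 or -1 according to Equation 15.
counts = Counter((a < b) - (a > b) for a, b in combinations(y_ranks_in_z_order, 2))
f_plus, f_zero, f_minus = counts[1], counts[0], counts[-1]

print(f_plus, f_zero, f_minus)  # 42 8 28
```

These counts sum to 78 = 13(12)/2 and give the proportions 0.538, 0.103 and 0.359.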

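From Table 4, we have that

$f^{+}_{y.z}$ = 42; $f^{0}_{y.z}$ = 8 and $f^{-}_{y.z}$ = 28

Hence from Equation 22, we have

$\hat{\pi}^{+}_{y.z}$ = 42/78 = 0.538; $\hat{\pi}^{0}_{y.z}$ = 8/78 = 0.103 and $\hat{\pi}^{-}_{y.z}$ = 28/78 = 0.359

Therefore, using Equation 24, we have

$\hat{\tau}_{y.z}$ = 0.538 – 0.359 = 0.179

Hence the tau correlation coefficient between Y and Z adjusted for ties in Y is, from Equation 26,

0.179/(1 – 0.103) = 0.179/0.897 = 0.200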

The tau correlation coefficient between X and Y is obtained using the values of $u_{jk;xy.z}$ of Equation 28 shown in Table 5.

Table 5. Calculation of Values of $u_{jk;xy.z}$ of Equation 28.

From Table 5, we have that

$f^{+}_{xy.z}$ = 46; $f^{0}_{xy.z}$ = 0; $f^{-}_{xy.z}$ = 32

So that from Equation 35,

$\hat{\pi}^{+}_{xy.z}$ = 46/78 = 0.590; $\hat{\pi}^{0}_{xy.z}$ = 0; $\hat{\pi}^{-}_{xy.z}$ = 32/78 = 0.410

Therefore, from Equation 37, the tau correlation coefficient between X and Y unadjusted for any ties between these two variables is estimated as

0.590 – 0.410 = 0.180, which is here the same as the adjusted correlation coefficient since there are no common tied observations between X and Y.

To estimate the tau correlation coefficient between X and Z when the ranks assigned to observations from X are arranged in their natural order and the ranks assigned to the corresponding observations from Z are tagged along, we use the values of $u_{jk;z.x}$ of Equation 41 shown in Table 6.

Table 6. Calculating the Values of $u_{jk;z.x}$ of Equation 41.

From Table 6, $f^{+}_{z.x}$ = 37, $f^{0}_{z.x}$ = 11 and $f^{-}_{z.x}$ = 30

Hence from Equation (48)

$\hat{\pi}^{+}_{z.x}$ = 37/78 = 0.474; $\hat{\pi}^{0}_{z.x}$ = 11/78 = 0.141 and $\hat{\pi}^{-}_{z.x}$ = 30/78 = 0.385

The estimated tau correlation coefficient between X and Z unadjusted for ties in Z (Eqn 50) is 0.474 – 0.385 = 0.089.

The corresponding ties adjusted correlation coefficient (Equation 52) is

0.089/(1 – 0.141) = 0.089/0.859 = 0.104

The corresponding tau correlation coefficient between  and  when the ranks assigned to observations from Y are arranged in their natural order and the ranks assigned to the observations from  are tagged along namely,  is estimated using the values of  from which we obtain and   and

Hence  

So that the tau correlation coefficient between  and  unadjusted for ties in  is estimated as

, while the corresponding ties adjusted correlation coefficient is

= = = 0.104.

From these results, we have that the estimated ties adjusted tau correlation coefficient between X and Z from equation 56 is

=  =  = 0.130

Similarly, the estimated ties adjusted tau correlation coefficient between Y and Z (Eqn 57)

=  =  = 0.153

Using these values in Equation 58, the ties adjusted partial correlation coefficient between X and Y, holding Z constant, is estimated as

C-TAP = $\dfrac{0.180 - (0.130)(0.153)}{\sqrt{(1 - 0.130^{2})(1 - 0.153^{2})}}$ = 0.160/0.980 = 0.163

Notice that the simple correlation coefficient between X and Y is about 0.180 while the partial correlation coefficient between X and Y is only 0.163, indicating that the assessment by the female judge seems to reduce the strength of the association or agreement between the male judges in their assessment of the candidates. Also, if no adjustment had been made for the presence of tied observations in the data, the estimated partial correlation coefficient would have been

$\dfrac{0.180 - (0.141)(0.179)}{\sqrt{(1 - 0.141^{2})(1 - 0.179^{2})}}$ = 0.155/0.974 = 0.159

This shows that failure to adjust for the presence of ties in the data tends to lead to an underestimation of the true partial correlation coefficient.

4. Conclusion

This paper has presented a non-parametric method for estimating the partial correlation coefficient between two variables, holding a third variable constant, when there are tied observations in the data. The proposed method is more general and covers cases in which there are no tied observations in any of the three populations, as well as cases in which there are equal or unequal numbers of tied observations in the sampled populations. The proposed method uses a modified approach to the estimation of the tau correlation coefficient and assumes that the populations of interest may be measurements on as low as the ordinal scale. The estimated ties adjusted partial correlation coefficient is a weighted average of the estimates obtained when the ranks assigned to the observations from each of the sampled populations are alternately used as those that are naturally ordered and as those that are tagged along. It is shown that when there are ties in the data, failure to adjust for these ties tends to result in an underestimate of the true partial correlation coefficient. This bias increases with the number of tied observations in the data.

The proposed method is illustrated with some data and is shown to compare favorably with the Kendall approach.


References

  1. Ebuh, G. U. and Oyeka, I. C. A. (2012) "A Nonparametric Method for Estimating Partial Correlation Coefficient". J Biom Biostat 3: 156. doi:10.4172/2155-6180.1000156.
  2. Fraser, D. A. S. (1957) "Nonparametric Methods in Statistics". John Wiley & Sons, Inc., New York.
  3. Gibbons, J. D. (1973) "Nonparametric Statistical Inference". McGraw-Hill Book Company, New York.
  4. Hollander, M. and Wolfe, D. A. (1999) "Nonparametric Statistical Methods" (2nd edition). Wiley-Interscience, New York.
  5. Kendall, M. G. (1948) "Rank Correlation Methods". Hafner Publishing Company, Inc., New York.
  6. Noether, G. E. (1967) "Elements of Nonparametric Statistics". John Wiley & Sons, Inc., New York.
  7. Siegel, S. (1956) "Nonparametric Statistics for the Behavioral Sciences". McGraw-Hill Series in Psychology, New York.
  8. Oyeka, C. A., Osuji, G. A. and Nwankwo, C. C. (2013) "Intrinsically Ties Adjusted Tau (C-Tat) Correlation Coefficient". American Journal of Theoretical and Applied Statistics, Vol. 2, pp. 273–281.
  9. Oyeka, I. C. A., Ebuh, G. U., Nwosu, C. R., Utazi, E. C., Ikpegbu, P. A., Obiora-Ilouno, H. and Nwankwo, C. C. (2009) "A Method of Analyzing Paired Data Intrinsically Adjusted for Ties". Global Journal of Mathematics and Statistics, India, Volume 1, Number 1, pp. 1–6.
  10. Oyeka, C. A. (1996) "An Introduction to Applied Statistical Method". Nobern avocation Publication Company, Enugu, Nigeria.
