Multinomial Logit Modeling of Factors Associated With Multiple Sexual Partners from the Kenya Aids Indicator Survey 2007
Beryl Ang’iro1, Samuel Mwalili1, Josphat Kinyanjui2
1Jomo Kenyatta University of Agriculture and Technology, School of Mathematical Sciences, Nairobi, Kenya
2Karatina University, Department of Mathematics, Statistics & Actuarial sciences, Karatina, Kenya
Beryl Ang’iro, Samuel Mwalili, Josphat Kinyanjui. Multinomial Logit Modeling of Factors Associated With Multiple Sexual Partners from the Kenya Aids Indicator Survey 2007. American Journal of Theoretical and Applied Statistics. Vol. 4, No. 3, 2015, pp. 170-177. doi: 10.11648/j.ajtas.20150403.23
Abstract: The number of lifetime sexual partners has an important effect on an individual's Human Immunodeficiency Virus (HIV) status; hence modeling multiple sexual partnerships is an essential component of any analysis of HIV outcomes. Multiple sexual partnerships are associated with greater risk of HIV, sexually transmitted infections (STIs) and intimate partner violence. This research project presents a general approach for logit modeling of clustered (correlated) ordinal and nominal responses using polytomous data from the Kenya AIDS Indicator Survey 2007 (NASCOP 2010). We review multinomial logit models as generalized linear models. The model is applied to HIV prevalence data among men and women in Kenya, derived from the Kenya AIDS Indicator Survey 2007 (KAIS). We generalize logistic regression to handle multinomial response variables, with separate models for nominal and ordinal cases. When modeling a nominal response variable we are interested in whether certain predictors have an effect on the category probabilities. The baseline category logit model, which is used for nominal responses, models the odds of being in each category relative to a designated baseline category (here the last category). A maximum likelihood estimation (MLE) approach is used for the baseline category logit model. To model an ordinal response variable one models the cumulative response probabilities or cumulative odds. The cumulative logit model is used when the response of an individual unit is restricted to one of a finite number of ordinal values. This study shows the practicality of the multinomial logit model in analyzing epidemiological data. Other studies have found education to be associated with multiple sexual partners. In this study, we observed that having multiple sexual partners is not related to education.
Other covariates such as gender, place of residence, sexual activity in the past 12 months and marital status were found to be associated with multiple sexual partners. Individuals who were sexually active in the past 12 months were found to be about ten times more likely to have multiple sexual partners than those who were not. After controlling for all other factors, the odds ratio for males relative to females more than doubled to 7.6, meaning that males have almost 8 times the odds of having multiple sexual partners compared to females. Partner testing or couples testing is a main strategy of national testing initiatives in Kenya. Respondents are encouraged to learn their test results with their partner.
Keywords: Multinomial Logistic Regression, Baseline-Category Logit, Cumulative-Category Logit, Akaike Information Criterion (AIC), Deviance Information Criterion (DIC)
1.1. Background of the Study
Globally, an estimated 35.3 million people were living with HIV in 2012, with 2.2 million new infections (UNAIDS, 2013). In Kenya, HIV prevalence is estimated to be approximately 5.6% among adults aged 15-64 years (NASCOP, 2012).
A number of studies have shown that having many sexual partners and having casual sexual partners increase the risk of becoming infected with HIV (Kruzich LA, Marquis GS, Wilson CM (2004); Shelton JD, Halperin DT, Nantulya V, Potts M, Gayle HD (2004); Stoneburner RL (2004)). Other studies have shown that having one regular partner can reduce the risk of HIV infection (Mishra et al. (2007)). It has been argued that having concurrent sexual partners in a dense sexual network increases the risk of HIV infection by allowing the virus to spread rapidly to others (Halperin and. (2004); Helleringer and Kohler, 2006; Morris and Kretzschmar, 1997).
Studies in other African countries have yielded contradictory results concerning the relationship between education, occupation, and migration and multiple sexual partners among men and women. Some studies have found that higher levels of education or being in school are negatively associated with multiple sexual partners among both men and women.
We consider the multinomial logit model, which is often used for nominal polytomous data. This study discusses a general approach for logit modeling of clustered multinomial (ordinal or nominal) responses. Its main purpose is to survey existing logit models for multi-category responses.
We extend models and estimation methods; for instance, an appropriate link function for nominal responses is the baseline-category logit. The proportional odds model described by McCullagh (1989) is a common choice for the analysis of ordinal data. It characterizes an ordinal response with R categories in terms of R-1 cumulative category comparisons, specifically R-1 cumulative logits (i.e. log odds) of the ordinal response.
1.2. Literature Review
Multinomial logit models have been used by many researchers in studying HIV outcome. Manning et al. (2003) conducted a study to model the effects of sexual partners’ characteristics on contraceptive use at first intercourse using the multinomial logistic regression.
Ngesa et al. (2014) developed a Bayesian semi-parametric regression model for HIV prevalence data. In that study it was observed that circumcision reduces the risk of HIV infection by up to 4.5 times.
Blank (2011) conducted a study to evaluate the association between education, occupation, and migration and multiple sexual partnerships among men and women tested for HIV in Lüderitz, Namibia. In this study it was discovered that underlying sociological factors support the hypothesis that education, occupation, and migration are associated with multiple sexual partnerships in Lüderitz, Namibia.
Delavande and Kohler (2009) find a higher incidence of condom use and reduced number of sexual partners for positive testers in Malawi who learned their HIV infection status than for those who did not.
Mohammed (2013) studied statistical methods for analyzing complex survey data, with an application to HIV/AIDS in Ethiopia. In his study, three statistical approaches were used to analyse the complex survey data. The first approach was a survey logistic regression used to model the binary outcome (HIV serostatus) on a set of explanatory variables (HIV risk factors). The difference between survey logistic regression and ordinary logistic regression is that the survey logistic regression approach takes the study design into account during analysis. The second approach was a multilevel logistic regression model that assumed that the data structure in the population was hierarchical, and that individuals within households were selected from clusters that were randomly selected from a national sampling frame. The study considered the results of both frequentist and Bayesian multilevel models. The third approach was a Small Area Estimation approach in which model parameters were estimated under the Integrated Nested Laplace Approximation (INLA) paradigm. The study identified the key factors associated with HIV risk in Ethiopia. A survey logistic regression model was fitted to the male and female data sets. The age variable was found to be significant in terms of HIV prevalence. Women aged 20 to 49 years were significantly more affected than younger women aged 15 to 19 years.
2.1. Model Specification
Let $Y_i$ denote the response variable, that is, the number of sexual partners. The response $Y_i$ for the i-th individual has R categories, that is, $Y_i = 1, 2, \ldots, R$. The probabilities associated with the response categories 1, 2, ..., R are respectively $\pi_{i1}, \pi_{i2}, \ldots, \pi_{iR}$ for the i-th individual.
Define $\pi_{ir} = \Pr(Y_i = r)$, the probability that the outcome of the i-th individual falls in the r-th category. To model the probabilities $\pi_{ir}$ (i = 1, ..., n and r = 1, ..., R), we allow them to depend on a vector of covariates $x_i$ associated with the i-th individual.
2.2. Baseline Category Logit Model
The baseline logit model is appropriate for nominal responses. The simplest approach to multinomial data is to nominate one of the response categories to act as a baseline or reference. The odds of being in the r-th category with reference to an arbitrary category R are $\pi_{ir}/\pi_{iR}$, and we express the log-odds as a linear function of the predictors.
The r-th category logit for the i-th individual is given by
$$\eta_{ir} = \log\frac{\pi_{ir}}{\pi_{iR}} = \alpha_r + x_{ir}^{\top}\beta_r, \qquad r = 1, \ldots, R-1.$$
In case $x_{ir} = x_i$, that is, the covariate vector is not allowed to depend on the r-th category, then
$$\log\frac{\pi_{ir}}{\pi_{iR}} = \alpha_r + x_i^{\top}\beta_r,$$
where $\alpha_r$ is a constant and $\beta_r$ is a vector of regression coefficients. We pick the last category as the baseline and calculate the odds that a member of group i falls into category r.
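The step from baseline-category logits back to category probabilities can be illustrated with a minimal Python sketch. It assumes the log-odds of category r against the baseline R are written as an intercept plus a linear combination of the covariates, with the baseline's linear predictor fixed at zero. The function name and all numeric values are hypothetical and are not estimates from KAIS.

```python
import math


def baseline_category_probs(alphas, betas, x):
    """Return [pi_1, ..., pi_R] from R-1 intercepts and coefficient vectors.

    The last category R is the baseline, so its linear predictor is 0.
    """
    # Linear predictor eta_r = alpha_r + x' beta_r for r = 1, ..., R-1
    etas = [a + sum(b_k * x_k for b_k, x_k in zip(b, x))
            for a, b in zip(alphas, betas)]
    etas.append(0.0)  # baseline category R
    # Subtract the maximum before exponentiating for numerical stability
    m = max(etas)
    expd = [math.exp(e - m) for e in etas]
    total = sum(expd)
    return [e / total for e in expd]


# Hypothetical example: R = 3 categories, 2 covariates
alphas = [0.5, -0.2]
betas = [[1.0, -0.5], [0.3, 0.8]]
x = [1.0, 0.0]
probs = baseline_category_probs(alphas, betas, x)
print([round(p, 4) for p in probs])
```

The probabilities sum to one, and the log of the ratio of any non-baseline probability to the baseline probability recovers the corresponding linear predictor.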
2.3. Cumulative Category Logit Model
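The proportional-odds cumulative logit model is possibly the most popular model for ordinal data. This model uses cumulative probabilities up to a threshold, thereby making the whole range of ordinal categories binary at that threshold. As in the previous section, $\pi_{i1}, \pi_{i2}, \ldots, \pi_{iR}$ are the probabilities associated with the response categories for the i-th individual. The probabilities of the response $Y_i = 1, 2, \ldots, R$ are expressed through the cumulative probability of a response less than or equal to r. The cumulative probabilities are given as
$$\gamma_{ir} = \Pr(Y_i \le r) = \pi_{i1} + \pi_{i2} + \cdots + \pi_{ir}, \qquad r = 1, \ldots, R.$$
The odds of the r-th cumulative probability are given by $\gamma_{ir}/(1-\gamma_{ir})$.
The cumulative logit is the log of the cumulative odds and is defined as
$$\operatorname{logit}(\gamma_{ir}) = \log\frac{\gamma_{ir}}{1-\gamma_{ir}}, \qquad r = 1, \ldots, R-1.$$
The cumulative logistic regression is obtained by allowing the cumulative logits to depend on covariates and is given by
$$\operatorname{logit}(\gamma_{ir}) = \tau_r + x_i^{\top}\beta_r.$$
Here, $\tau_r$ are the category-specific cut-offs (intercepts) satisfying $\tau_1 \le \tau_2 \le \cdots \le \tau_{R-1}$.
If $\beta_r = \beta$ is a fixed effect for all logits, then we obtain the proportional odds regression model given by
$$\operatorname{logit}(\gamma_{ir}) = \tau_r + x_i^{\top}\beta.$$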
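As a rough illustration of the proportional odds model, the following Python sketch turns cut-offs and a common coefficient vector into category probabilities. It assumes the parameterization in which each cumulative logit equals a cut-off plus the linear predictor; some software uses a minus sign instead, which flips the interpretation of the coefficients. The function names and numeric values are hypothetical.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_probs(cutoffs, beta, x):
    """gamma_r = Pr(Y <= r) for r = 1, ..., R-1, assuming
    logit(gamma_r) = cutoff_r + x' beta (plus-sign convention)."""
    xb = sum(b * v for b, v in zip(beta, x))
    return [logistic(t + xb) for t in cutoffs]


def proportional_odds_probs(cutoffs, beta, x):
    """Category probabilities pi_1, ..., pi_R as differences of
    consecutive cumulative probabilities (gamma_R = 1)."""
    cum = cumulative_probs(cutoffs, beta, x) + [1.0]
    return [cum[0]] + [cum[r] - cum[r - 1] for r in range(1, len(cum))]


# Hypothetical example: R = 4 categories, one binary covariate
cutoffs = [-1.0, 0.5, 2.0]   # must be non-decreasing
beta = [0.7]
p0 = proportional_odds_probs(cutoffs, beta, [0.0])
p1 = proportional_odds_probs(cutoffs, beta, [1.0])
print([round(p, 4) for p in p0])
print([round(p, 4) for p in p1])
```

Under this model the cumulative odds ratio between two covariate values is the same at every cut-off, which is the sense in which the odds are proportional.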
2.4. Model Diagnostic
The models are compared using the Akaike Information Criterion (AIC), which balances the goodness of fit against the complexity of the model. The preferred model is the one with the minimum AIC value. It is given by $\mathrm{AIC} = 2k - 2\ln(L)$, where L is the maximized value of the likelihood, k is the number of free parameters in the model, and 2k is a penalty that is an increasing function of the number of estimated parameters in the model.
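A small Python sketch of an AIC comparison follows, using the standard definition of twice the number of parameters minus twice the maximized log-likelihood. The log-likelihood values and parameter counts are hypothetical and do not come from the models fitted in this study.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


# Hypothetical candidate models: (maximized log-likelihood, free parameters)
candidates = {
    "model_a": (-1200.0, 5),
    "model_b": (-1195.0, 9),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "preferred:", best)
```

Here the richer model gains 5 units of log-likelihood for 4 extra parameters, so its AIC is lower by 2 and it would be preferred.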
In the case of the Bayesian approach, the Deviance Information Criterion (DIC), which is useful for Bayesian model selection, is used. It is valid when the posterior distribution is approximately multivariate normal.
Define the deviance as $D(\theta) = -2\log p(y \mid \theta) + C$, where $\theta$ are the unknown parameters of the model and $p(y \mid \theta)$ is the likelihood function. C is a constant that cancels out in all calculations that compare different models and therefore does not need to be known.
The expectation $\bar{D} = E_{\theta}[D(\theta)]$ is a measure of how well the model fits the data. The best fitting model is the one with the smallest DIC. The DIC, as suggested by Spiegelhalter et al. (2002, p. 587), is given by $\mathrm{DIC} = \bar{D} + p_D$, in which $\bar{D}$ is the posterior mean of the deviance, which measures the goodness of fit, and $p_D$ gives the effective number of parameters in the model, which penalizes model complexity. In this criterion, low values of $\bar{D}$ indicate a better fit while small values of $p_D$ indicate a parsimonious model. $p_D = \bar{D} - D(\bar{\theta})$, where $\bar{\theta}$ is the expectation of $\theta$.
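To make the DIC calculation concrete, here is a minimal Python sketch on a toy problem rather than on the models of this study. It assumes a Normal likelihood with unknown mean and unit variance, a flat prior (so the posterior of the mean is Normal around the sample mean), simulated posterior draws, and the constant C set to zero. The data values and function names are hypothetical.

```python
import math
import random


def deviance(mu, y):
    """D(mu) = -2 log p(y | mu) for a Normal(mu, 1) likelihood, with C = 0."""
    return sum(math.log(2 * math.pi) + (v - mu) ** 2 for v in y)


def dic(draws, y):
    """Return (DIC, D_bar, p_D) from posterior draws of mu."""
    d_bar = sum(deviance(m, y) for m in draws) / len(draws)  # posterior mean deviance
    theta_bar = sum(draws) / len(draws)                      # posterior mean of mu
    p_d = d_bar - deviance(theta_bar, y)                     # effective no. of parameters
    return d_bar + p_d, d_bar, p_d


# Hypothetical data and exact posterior draws under a flat prior
y = [0.3, -0.1, 0.8, 1.2, 0.5]
y_bar = sum(y) / len(y)
rng = random.Random(0)
draws = [rng.gauss(y_bar, 1 / math.sqrt(len(y))) for _ in range(5000)]

dic_value, d_bar, p_d = dic(draws, y)
print(round(dic_value, 3), round(d_bar, 3), round(p_d, 3))
```

Because this toy model has a single free parameter, the effective number of parameters comes out close to one, which is a useful sanity check on the formula.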
This section presents results for the logistic regression of the number of sexual partners on various regressors. A cumulative logistic regression was fitted to model the number of sexual partners.
The following covariates were considered: Age, Residence, Region, Level of Education, Marital Status, Ever tested and Condom use.
3.2. Demographic Profile
A total of 7701 males and 10239 females responded. A larger proportion of both males (77.6%) and females (76.1%) were from rural areas. The age distribution is similar across males and females. The largest age groups among survey participants were 15-24 (34%) and 30-39 (22.3%). The distribution of education status was similar, with the largest group (35%) having finished secondary education and above. There was a statistically significant difference between males and females in marital status. A larger proportion of males had never married or cohabited (37%) compared to females (23.1%). More females (6.8%) than males (4.2%) were divorced or separated. Among the widowed, females were six times as numerous as males. Furthermore, there was also a significant difference among those currently married or cohabiting (males 57%; females 63%). Overall, about two thirds of the respondents were married or cohabiting.
As can be seen from Figure 2, the largest share of participants came from Eastern province (17%) and the smallest from North Eastern province (5%).
3.3. Bivariate Analysis
This section presents results from the bivariate logistic regression analysis. In this model each covariate is fitted independently, that is, one at a time. The results of the bivariate logistic regression are presented in Table 1.
Table 1. Bivariate logistic regression results.
|Covariate||P value||Odds Ratio||95% CI for OR|
|No primary||0.000||0.625||(0.576, 0.679)|
|Incomplete primary||0.001||0.889||(0.829, 0.953)|
|Complete primary||0.000||1.436||(1.336, 1.544)|
|Secondary + (ref)||-||1.000||-|
|Never married/cohab||0.000||0.157||(0.147, 0.169)|
|Currently married/cohab (ref)||-||1.000||-|
|Used condom||0.000||1.552||(1.401, 1.712)|
|Did not use condom (ref)||-||1.000||-|
|Ever tested HIV|
|Sexually active in past 12 months|
|Sexually active past 12 months||0.000||10.493||(9.743, 11.299)|
|Not sexually active past 12 months (ref)||-||1.000||-|
The odds of having multiple sexual partners were about three times as high for males as for females (OR: 3.214, 95% CI: 3.0302 to 3.409). Multiple sexual partners is related to education, though not monotonically: compared to those with secondary education and above, people with no primary, incomplete primary and complete primary education have 0.625, 0.889 and 1.436 times the odds of having multiple sexual partners, respectively.
Place of residence (urban/rural) was also found to be associated with multiple sexual partners. The odds of having multiple sexual partners are about 1.21 times as high for urban dwellers as for rural dwellers. Individuals who were sexually active in the past 12 months were found to have 10.493 times the odds of having multiple sexual partners compared to those who were not (OR: 10.493, 95% CI: 9.743 to 11.299).
People in the lowest, second, middle and fourth wealth index groups have 0.728, 0.865, 0.874 and 1.001 times the odds of having multiple sexual partners compared to those in the highest wealth index group, respectively. Those who use condoms were found to have 1.55 times the odds of having multiple sexual partners compared to those who do not use condoms (OR: 1.55, 95% CI: 1.408 to 1.712).
The young age group of 15-24 is 0.156 times as likely to have multiple sexual partners as those aged 60-64. Individuals who have ever tested for HIV are about 1.399 times as likely to have multiple sexual partners as those who have never tested.
Individuals who tested positive for HIV are 2.396 times as likely to have multiple sexual partners as those who tested negative.
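The reported odds ratios and confidence intervals can be checked for internal consistency with a short Python sketch. It assumes the intervals are Wald intervals computed on the log-odds scale with a critical value of 1.96, which the source does not state; under that assumption the odds ratio is the geometric mean of the interval limits, and the standard error of the log odds ratio can be recovered from the interval width. The function names are hypothetical; the numbers are those reported above for gender and for sexual activity in the past 12 months.

```python
import math


def or_from_ci(lower, upper):
    """Point estimate implied by a Wald CI on the log scale (geometric mean)."""
    return math.sqrt(lower * upper)


def log_or_se_from_ci(lower, upper, z=1.96):
    """Standard error of the log odds ratio implied by a Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# (reported OR, lower limit, upper limit) as given in the text
reported = {
    "male vs female": (3.214, 3.0302, 3.409),
    "sexually active past 12 months": (10.493, 9.743, 11.299),
}
implied = {}
for name, (odds_ratio, lo, hi) in reported.items():
    implied[name] = (or_from_ci(lo, hi), log_or_se_from_ci(lo, hi))
    print(f"{name}: reported OR {odds_ratio}, implied OR {implied[name][0]:.3f}, "
          f"implied SE(log OR) {implied[name][1]:.4f}")
```

For both rows the implied point estimate agrees with the reported odds ratio to within rounding, which is what one would expect if the intervals were built this way.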
3.4. Cumulative Results
After adjusting for all factors, the odds ratio for males relative to females more than doubled, from 3.2 to 7.6, meaning that males have almost 8 times the odds of having multiple sexual partners compared to females.
Place of residence (urban/rural) was found to be significantly associated with multiple sexual partners after controlling for other factors.
The odds ratio for urban dwellers relative to rural dwellers remained at about 1.2.
Multiple sexual partners was found to be negatively related to ever having tested for HIV after adjusting for all other factors. The odds ratio for individuals who have ever tested, relative to those who have never tested, fell from 1.399 to 1.02.
The association with HIV results also remained the same after adjusting and was still highly significant, as indicated by the odds ratio (OR: 2.4, 95% CI: 2.079 to 2.791). The odds ratio for being sexually active in the past 12 months was reduced after controlling for other factors but remained highly significant.
3.5. Model Selection
The following sets of models were investigated in order to understand the effect of the observed covariates on the distribution of multiple sexual partners based on KAIS data. To arrive at the best model, we first included age and sex, since they are very important demographic factors, as seen in the previous section.
3.6. Obtaining a Reduced Model Using Stepwise Ordered Regression
A stepwise regression selection was performed, and two variables with p-values greater than 0.2 were removed from the full model.
|Covariate||P value||Odds Ratio||95% CI for OR|
|No primary||0.000||0.422||(0.36, 0.49)|
|Incomplete primary||0.000||1.246||(1.12, 1.38)|
|Complete primary||0.000||1.251||(1.13, 1.38)|
|Never married/cohab||0.120||0.896||(0.78, 1.029)|
|Used condom||0.104||1.113||(0.98, 1.27)|
|Did not use condom||-||1.000||-|
|Ever tested HIV|
|Sexually active in past 12 months|
|Sexually active past 12 months||0.036||3.161||(1.079, 9.26)|
|Not sexually active past 12 months||-||1.000||-|
Table 4. Parameter estimates with significance levels.
|Parameter||Estimate||Std. Err.||z||P>z||95% conf. interval|
|Age Category||0.173||0.016||11.02||0||[0.142, 0.204]|
|Wealth Index||0.058||0.015||3.92||0||[0.029, 0.088]|
|HIV Status||1.053||0.074||14.25||0||[0.907, 1.197]|
|Ever Tested||-0.087||0.041||-2.13||0.033||[-0.167, -0.007]|
|Cut-off 1||-10.15||0.901||-||-||[-11.9, -8.38]|
|Cut-off 2||-2.346||0.56||-||-||[-3.44, -1.25]|
|Cut-off 3||-0.212||0.559||-||-||[-1.3, 0.88]|
Table 4 presents the parameter estimates for model () with significance levels. Condom use and place of residence were removed from the full model.
Comparing the goodness of fit and complexity of the models, model (), with the lowest AIC and DIC values, is the preferred model. The following were found to be potential risk factors for multiple sexual partners: Age, HIV results, Ever tested for HIV, Wealth index, Gender, Place of residence.
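The estimates in Table 4 are on the log-odds scale, so exponentiating them gives cumulative odds ratios. The Python sketch below does this for the four covariate rows, using the estimates and interval limits copied from Table 4. Whether an odds ratio above one means more or fewer partners depends on the sign convention of the fitting software, which the source does not state, so the direction of the interpretation is left open here. The dictionary and function names are hypothetical.

```python
import math

# (estimate, lower limit, upper limit) on the log-odds scale, from Table 4
table4 = {
    "Age Category": (0.173, 0.142, 0.204),
    "Wealth Index": (0.058, 0.029, 0.088),
    "HIV Status": (1.053, 0.907, 1.197),
    "Ever Tested": (-0.087, -0.167, -0.007),
}


def to_odds_ratio(estimate, lower, upper):
    """Exponentiate a log-odds estimate and its interval limits."""
    return math.exp(estimate), math.exp(lower), math.exp(upper)


odds_ratios = {name: to_odds_ratio(*vals) for name, vals in table4.items()}
for name, (odds_ratio, lo, hi) in odds_ratios.items():
    print(f"{name}: OR = {odds_ratio:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

An interval that excludes one on the odds-ratio scale corresponds to an interval that excludes zero on the log-odds scale, which holds for all four rows.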
4. Discussion, Conclusion and Recommendation
Multinomial logistic regression has discriminative performance and can predict whether an individual has multiple sexual partners. The following were found to be potential risk factors for multiple sexual partners: HIV results, Ever tested for HIV, Wealth index, Gender, Place of residence. In this study we found that level of education was not associated with multiple sexual partners. This finding is supported by previous studies and therefore adds to the large body of research. The government’s introduction of free primary education and subsidized secondary education is hoped to increase the number of young people attaining higher levels of education. We recommend the use of other methods to model multiple sexual partners and the inclusion of other variables such as circumcision. We also recommend applying multinomial logit modeling of multiple sexual partners to KAIS 2012 in order to compare the results of the two data sets. In this study it was found that wealth index is positively associated with multiple sexual partners, and we therefore recommend that self therapy, group therapy, counseling and HIV/AIDS awareness be offered to people who fall in this group. Partner testing or couples testing is a main strategy of national testing initiatives in Kenya. Respondents are encouraged to learn their test results with their partner. Participants with negative test results are advised to seek further testing if they engaged in risky behaviour after sample collection.
The findings of this study agree to a large extent with others in the literature and could be used in the design of policy and public health interventions to address trends in the HIV epidemic and its relationship with multiple sexual partners. With respect to this finding, there is hope for a decline in HIV prevalence and for fewer sexual partners as a result of more campaign programmes.