An Alternative Method of Estimation of SUR Model
Shohel Rana*, Mohammad Mastak Al Amin
Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh
To cite this article:
Shohel Rana, Mohammad Mastak Al Amin. An Alternative Method of Estimation of SUR Model. American Journal of Theoretical and Applied Statistics. Vol. 4, No. 3, 2015, pp. 150-155. doi: 10.11648/j.ajtas.20150403.20
Abstract: This paper proposes a transformed method for estimating the SUR model that provides unbiased estimates for two- and three-equation systems, under both high and low collinearity, for both small and large datasets. The generalized least squares (GLS) methods for estimating the seemingly unrelated regression (SUR) model proposed by Zellner (1962) and by Srivastava and Giles (1987) yielded higher MSE. The ridge estimator proposed by Alkhamisi and Shukur (2008) yielded a smaller MSE than the others, but it was not unbiased under severe multicollinearity. This study shows that the proposed method typically provides an unbiased estimator with lower MSE and TMSE than the traditional methods.
Keywords: SUR Model, GLS, MSE, TMSE
1. Introduction
A set of equations may be related not because they interact, but because their error terms are correlated. A seemingly unrelated regression (SUR) system comprises several individual relationships that are linked by the fact that their disturbances are correlated. There are two main motivations for using SUR. The first is to gain efficiency in estimation by combining information from different equations. The second is to impose and/or test restrictions that involve parameters in different equations.

The usual requirement for estimating a SUR model may be paraphrased as follows: the sample size must be greater than the number of explanatory variables in each equation and at least as large as the number of equations in the system. This statement is flawed in two respects. First, some estimators require a more stringent sample size than this statement implies. Second, different estimators may have different sample size requirements.

The variance components model results in a particular type of correlation among the residuals. The residuals for each cross-section unit are correlated over time, but the residuals for different cross-section units are uncorrelated. This type of correlation arises if each cross-section unit has a specific time-invariant variable omitted from the equation. In the seemingly unrelated regression model introduced by Zellner (1962), by contrast, the residuals are uncorrelated over time but correlated across cross-section units.
Mathematically, this can be written as

$$\operatorname{Cov}(e_{it}, e_{js}) = \begin{cases} \sigma_{ij}, & t = s \\ 0, & t \neq s \end{cases}$$
This type of correlation arises if there are omitted variables that are common to all equations. Both models can, in principle, be extended to include other types of correlation. In both models it is also possible to test for equality of the slope coefficients before any pooling is done.

To estimate the seemingly unrelated regression model, we first estimate each equation separately by ordinary least squares (OLS) and obtain the estimated residuals $\hat{e}_{it}$. From these residuals we compute the estimated covariances

$$\hat{\sigma}_{ij} = \frac{1}{T - k} \sum_{t=1}^{T} \hat{e}_{it}\,\hat{e}_{jt},$$

where $k$ is the number of regression parameters estimated. Having estimated $\sigma_{ij}$, we re-estimate all $N$ cross-sectional equations jointly by generalized least squares.
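The two-step procedure just described can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code (the analysis in this paper was done in R).

- The helper names `block_diag`, `residual_covariance` and `two_step_sur` are hypothetical.
- Every equation is assumed to have the same number $k$ of regressors, so that a single divisor $T - k$ applies.
- The data are simulated placeholders rather than the paper's data sets.

```python
import numpy as np


def block_diag(blocks):
    """Place the per-equation design matrices X_1, ..., X_N on the diagonal of one matrix."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out


def residual_covariance(resid, k):
    """sigma_hat[i, j] = sum_t e_it * e_jt / (T - k) for a T x N matrix of OLS residuals."""
    T = resid.shape[0]
    return resid.T @ resid / (T - k)


def two_step_sur(ys, Xs):
    """Equation-by-equation OLS followed by joint GLS on the stacked system.

    Assumes every equation has the same number k of regressors, so that the
    single divisor T - k used in residual_covariance is well defined.
    """
    T, k = Xs[0].shape
    # Step 1: OLS on each equation separately; keep the residuals as columns.
    resid = np.column_stack(
        [y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for y, X in zip(ys, Xs)]
    )
    sigma_hat = residual_covariance(resid, k)
    # Step 2: GLS on Y = X beta + e with weight matrix Sigma_hat^{-1} kron I_T.
    X, Y = block_diag(Xs), np.concatenate(ys)
    W = np.kron(np.linalg.inv(sigma_hat), np.eye(T))
    beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
    return beta_hat, sigma_hat


# Hypothetical two-equation example with correlated errors (not the paper's data).
rng = np.random.default_rng(0)
T = 200
Xs = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(2)]
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=T)
ys = [Xs[0] @ np.array([1.0, 2.0]) + E[:, 0], Xs[1] @ np.array([-1.0, 0.5]) + E[:, 1]]
beta_hat, sigma_hat = two_step_sur(ys, Xs)
print(beta_hat)
```

Forming the full $NT \times NT$ Kronecker weight matrix keeps the sketch close to the formulas. For large $T$ one would exploit the Kronecker structure instead of materializing it.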
A number of methods are available for estimating SUR-type models. These include the ordinary least squares (OLS) method, the generalized least squares (GLS) method proposed by Zellner (1962), the GLS method proposed by Srivastava and Giles (1987), the SUR ridge regression method proposed by M. A. Alkhamisi and G. Shukur (2008), and the optimality of least squares in the SUR model studied by Dwivedi T. D. and Srivastava V. K. (1978). Existing methods have some limitations. They give a large MSE when the data set exhibits high multicollinearity, they are not reasonable for a large number of cross-section units, and they may be affected by common omitted variables. In this study we suggest a new method that estimates the SUR model more efficiently and that may be expected to be superior to the traditional methods.
2. Literature Review
In econometrics, the seemingly unrelated regressions (SUR) or seemingly unrelated regression equations (SURE) model was proposed by Arnold Zellner (1962, 1963; see also Stewart G. W. 1980 and Parks R. W. 1967). It is a generalization of the linear regression model that consists of several regression equations, each having its own dependent variable and potentially different sets of exogenous explanatory variables. Each equation is a valid linear regression on its own and can be estimated separately, which is why the system is called seemingly unrelated. Some authors suggest that the term seemingly related would be more appropriate, since the error terms are assumed to be correlated across the equations.

The model can be estimated equation by equation using standard ordinary least squares (OLS). Such estimates are consistent but generally not as efficient as the SUR method, which amounts to feasible generalized least squares with a specific form of the variance-covariance matrix. In two important cases SUR is in fact equivalent to OLS:

- when the error terms are uncorrelated between the equations (so that the equations are truly unrelated);
- when each equation contains exactly the same set of regressors on the right-hand side.

The SUR model can be viewed either as a simplification of the general linear model in which certain coefficients in the matrix Β are restricted to zero, or as a generalization of the general linear model in which the regressors on the right-hand side may differ across equations.

M. Hubert, T. Verdonck and O. Yorulmaz (preprint) proposed a fast algorithm, FastSUR, together with diagnostics for outlier detection. They showed the algorithm's good performance in a simulation study and illustrated the diagnostics on a real data set from economics. They focused on the General Multivariate Chain Ladder (GMCL) model, which employs SUR to estimate its parameters. O. B. Ebukuyo, A. A. Adepoju and E. I. Olamide (2013) examined the performance of the SUR estimator under varying degrees of AR(1) autocorrelation using the mean square error (MSE). The SUR estimator performed better, with the lowest MSE, for an autocorrelation coefficient of 0.3 than for 0.5 in both regression equations. Z. Zeebari and G. Shukur (2012) examined the application of the least absolute deviations (LAD) method to ridge-type parameter estimation of seemingly unrelated regression equations (SURE) models. M. El-Dereny and N. I. Rashwan (2011) addressed multicollinearity with a ridge regression model, but did not treat the SUR model in the presence of multicollinearity in the data set.
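The two cases in which SUR reduces to OLS can be checked numerically. The sketch below assumes NumPy and a known $\Sigma$. It compares GLS on the stacked system with separate OLS fits; the function names and the simulated data are hypothetical illustrations.

```python
import numpy as np


def stacked_gls(Xs, ys, sigma):
    """GLS on the stacked system Y = X beta + e with known Cov(e) = sigma kron I_T."""
    T = Xs[0].shape[0]
    X = np.zeros((len(Xs) * T, sum(Xi.shape[1] for Xi in Xs)))
    c = 0
    for i, Xi in enumerate(Xs):  # block-diagonal design matrix
        X[i * T:(i + 1) * T, c:c + Xi.shape[1]] = Xi
        c += Xi.shape[1]
    W = np.kron(np.linalg.inv(sigma), np.eye(T))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ np.concatenate(ys))


def ols_by_equation(Xs, ys):
    """Separate OLS fits, stacked into one coefficient vector."""
    return np.concatenate([np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)])


rng = np.random.default_rng(1)
T = 30
ys = [rng.normal(size=T) for _ in range(3)]
sigma = np.array([[1.0, 0.6, 0.3], [0.6, 2.0, 0.5], [0.3, 0.5, 1.5]])

# Case 1: the same regressors in every equation, correlated errors.
X0 = np.column_stack([np.ones(T), rng.normal(size=T)])
same_regressors = np.allclose(stacked_gls([X0] * 3, ys, sigma), ols_by_equation([X0] * 3, ys))

# Case 2: different regressors, but uncorrelated errors (diagonal sigma).
Xs = [np.column_stack([np.ones(T), rng.normal(size=(T, j + 1))]) for j in range(3)]
uncorrelated = np.allclose(stacked_gls(Xs, ys, np.diag(np.diag(sigma))), ols_by_equation(Xs, ys))

# Neither condition holds: GLS and OLS generally give different estimates.
general = np.allclose(stacked_gls(Xs, ys, sigma), ols_by_equation(Xs, ys))
print(same_regressors, uncorrelated)  # True True
```

With identical regressors, $X = I_N \otimes X_0$ and the GLS normal equations factor as $\Sigma^{-1} \otimes X_0'X_0$, so $\Sigma$ cancels. With a diagonal $\Sigma$ the normal equations split into $N$ separate OLS problems.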
3. Methodology
Several forms of the generalized least squares (GLS) method for estimating the SUR model are examined here. We describe the theoretical aspects of the proposed methods, GLS_{1}, GLS_{2} and GLS_{3}, for estimating the SUR model, and we show the unbiasedness and variance properties of each proposed estimator. The GLS_{3} estimator was found to provide smaller variance and MSE than the other proposed estimators. This study shows that, under severe multicollinearity, the proposed method typically provides an unbiased estimator with lower MSE and TMSE than the traditional methods.

The methods are as follows. Assume that there are $N$ response variables, each with $T$ observations, denoted by the vectors $y_1, y_2, \dots, y_N$, with associated explanatory variables $X_1, X_2, \dots, X_N$, respectively. One way of fitting these models is to treat them as unrelated multiple regression models of the form
$$y_i = X_i\beta_i + e_i, \qquad i = 1, 2, \dots, N \tag{3.1}$$
where $\beta_i$ is a vector of unknown regression parameters and $e_i$ is a vector of random errors, each element of which has variance $\sigma_i^2$, for $i = 1, 2, \dots, N$.
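Let

$$X = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_N \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_N \end{pmatrix}, \quad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}.$$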
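By assumption,

$$E(e_i e_j') = \sigma_{ij} I_T, \qquad i, j = 1, 2, \dots, N,$$

where $\sigma_{ij} = \operatorname{Cov}(e_{it}, e_{jt})$. The matrix $E(ee') = \Sigma \otimes I_T$, with $\Sigma = [\sigma_{ij}]$, is the covariance matrix capturing the variances and covariances of the random error terms of (3.1). The SUR form of this model is then

$$Y = X\beta + e \tag{3.2}$$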
Writing the regression models jointly in this SUR form lets the proposed generalized least squares methods use the cross-equation error correlation, and so produce more efficient regression parameter estimates.
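To make the stacked notation concrete, the following NumPy sketch builds the block-diagonal $X$ and the stacked $Y$ of (3.2), and draws errors with covariance $\Sigma \otimes I_T$. It assumes $\Sigma$ is positive definite so that a Cholesky factor exists. The names `stack_sur` and `sur_error_factor` and all numerical values are hypothetical.

```python
import numpy as np


def stack_sur(Xs, ys):
    """Stacked form of (3.2): X = diag(X_1, ..., X_N) and Y = (y_1', ..., y_N')'."""
    X = np.zeros((sum(Xi.shape[0] for Xi in Xs), sum(Xi.shape[1] for Xi in Xs)))
    r = c = 0
    for Xi in Xs:
        X[r:r + Xi.shape[0], c:c + Xi.shape[1]] = Xi
        r, c = r + Xi.shape[0], c + Xi.shape[1]
    return X, np.concatenate(ys)


def sur_error_factor(sigma, T):
    """Return F = L kron I_T, where sigma = L L' is the Cholesky factorization.

    By the mixed-product rule F F' = (L L') kron I_T = sigma kron I_T, so
    e = F z with z ~ N(0, I_NT) has exactly the covariance E(ee') assumed in the text.
    """
    L = np.linalg.cholesky(sigma)
    return np.kron(L, np.eye(T))


rng = np.random.default_rng(2)
T = 4
sigma = np.array([[1.0, 0.8], [0.8, 2.0]])              # hypothetical Sigma (N = 2)
Xs = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(2)]
betas = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]   # hypothetical beta_1, beta_2
F = sur_error_factor(sigma, T)
e = F @ rng.normal(size=2 * T)                          # stacked errors (e_1', e_2')'
ys = [Xs[i] @ betas[i] + e[i * T:(i + 1) * T] for i in range(2)]
X, Y = stack_sur(Xs, ys)
print(X.shape, np.allclose(Y, X @ np.concatenate(betas) + e))  # (8, 4) True
```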
The proposed GLS estimators and some of their properties are as follows.
GLS_{1}: Consider the following transformation:

$$Y^* = (D \otimes I_T)Y, \qquad X^* = (D \otimes I_T)X, \qquad e^* = (D \otimes I_T)e,$$

where $D$ is any orthogonal matrix [Ali, M. I. (1984)] and $\otimes$ denotes the Kronecker product (Anderson T. W. 1984). Using this transformation, the model in (3.2) can be expressed as
$$Y^* = X^*\beta + e^* \tag{3.3}$$

where $Y^*$ and $e^*$ are $NT \times 1$ vectors and $X^*$ is an $NT \times n$ matrix, with

$$E(e^* e^{*\prime}) = \Sigma \otimes I_T, \qquad E(e^*) = 0.$$
The GLS_{1} estimator of $\beta$ in (3.3) is then

$$\hat{\beta}_{GLS_1} = \left(X'(D^2 \otimes I_T)X\right)^{-1} X'(D^2 \otimes I_T)Y, \tag{3.4}$$

where $D$ is an orthogonal matrix.
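Theorem: $\hat{\beta}_{GLS_1}$ is an unbiased estimator of $\beta$:

$$E(\hat{\beta}_{GLS_1}) = \beta.$$

Theorem (Rahman M. 2008):

$$\begin{aligned} V(\hat{\beta}_{GLS_1}) &= E\left[\{\hat{\beta}_{GLS_1} - E(\hat{\beta}_{GLS_1})\}\{\hat{\beta}_{GLS_1} - E(\hat{\beta}_{GLS_1})\}'\right] \\ &= \left(X'(D^2 \otimes I_T)X\right)^{-1} X'(D^2 \otimes I_T)(\Sigma \otimes I_T)(D^2 \otimes I_T)X \left(X'(D^2 \otimes I_T)X\right)^{-1} \end{aligned}$$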
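A minimal NumPy sketch of GLS_{1}, assuming estimator (3.4) as written and using a $2 \times 2$ rotation matrix as the orthogonal $D$ (a hypothetical choice). It checks the algebraic step behind the unbiasedness theorem: the estimator is linear, $BY$ with $BX = I$, so evaluating it at $Y = X\beta$ returns $\beta$.

The covariance function uses the general form $B(\Sigma \otimes I_T)B'$, which coincides with the displayed variance when $D^2$ is symmetric. The function names and data are illustrative only.

```python
import numpy as np


def gls1(X, Y, D, T):
    """Estimator (3.4): (X'(D^2 kron I_T)X)^{-1} X'(D^2 kron I_T)Y."""
    W = np.kron(D @ D, np.eye(T))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)


def gls1_covariance(X, D, sigma, T):
    """Covariance B (sigma kron I_T) B' of the linear estimator B Y, B = (X'WX)^{-1} X'W.

    When D^2 is symmetric this equals the sandwich expression in the theorem above.
    """
    W = np.kron(D @ D, np.eye(T))
    B = np.linalg.solve(X.T @ W @ X, X.T @ W)
    return B @ np.kron(sigma, np.eye(T)) @ B.T


theta = 0.3  # hypothetical choice: D is a 2 x 2 rotation, hence orthogonal
D = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
rng = np.random.default_rng(3)
T = 10
X = np.zeros((2 * T, 4))
X[:T, :2] = rng.normal(size=(T, 2))   # regressors of equation 1
X[T:, 2:] = rng.normal(size=(T, 2))   # regressors of equation 2
beta = np.array([1.0, 2.0, -1.0, 0.5])

# With e = 0 the estimator returns beta exactly (B X = I); by linearity this is
# why E(beta_hat) = B E(Y) = B X beta = beta.
print(np.allclose(gls1(X, X @ beta, D, T), beta))  # True
V = gls1_covariance(X, D, np.array([[1.0, 0.5], [0.5, 1.0]]), T)
```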
GLS_{2}: Consider the following transformation:

$$Y^* = (S^{-1} \otimes I_T)Y, \qquad X^* = XS^{-1}, \qquad e^* = eS^{-1}.$$

Using this transformation, the model in (3.2) can be expressed as
$$Y^* = X^*\beta + e^* \tag{3.5}$$

where $Y^*$ and $e^*$ are $NT \times 1$ vectors and $X^*$ is an $NT \times n$ matrix, with

$$E(e^* e^{*\prime}) = \Sigma \otimes I_T, \qquad E(e^*) = 0.$$
When $\Sigma$ is known, the GLS estimator of $\beta$ in (3.5) is

$$\hat{\beta}_{GLS_2} = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^* = (S^{-1}X'XS^{-1})^{-1}S^{-1}X'(S^{-1} \otimes I_T)Y.$$
When the covariance matrix $\Sigma$ (Alan J. L. 2004) is unknown, a feasible generalized least squares (FGLS) estimator (Johnston J., DiNardo J. 1963, 1972 and 1984) is defined by replacing the unknown $\Sigma$ with a consistent estimate. Writing $\hat{S}$ for the resulting estimate of $S$,

$$\hat{\beta}_{GLS_2} = (\hat{S}^{-1}X'X\hat{S}^{-1})^{-1}\hat{S}^{-1}X'(\hat{S}^{-1} \otimes I_T)Y \tag{3.6}$$
Theorem: $\hat{\beta}_{GLS_2}$ is not an unbiased estimator of $\beta$:

$$E(\hat{\beta}_{GLS_2}) \neq \beta.$$
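Theorem:

$$\begin{aligned} V(\hat{\beta}_{GLS_2}) &= E\left[\{\hat{\beta}_{GLS_2} - E(\hat{\beta}_{GLS_2})\}\{\hat{\beta}_{GLS_2} - E(\hat{\beta}_{GLS_2})\}'\right] \\ &= (\hat{S}^{-1}X'X\hat{S}^{-1})^{-1}\hat{S}^{-1}X'(\hat{S}^{-1} \otimes I_T)X\hat{S}^{-1}(\hat{S}^{-1}X'X\hat{S}^{-1})^{-1} \end{aligned}$$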
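The bias of GLS_{2} can be seen by evaluating (3.6) at $Y = X\beta$. By linearity, this gives $E(\hat{\beta}_{GLS_2})$ when $E(e) = 0$ and $S$ is treated as known. Because $XS^{-1}$ only conforms when $X$ has $N$ columns, this sketch assumes one regressor per equation ($n = N$). The regressors, $\beta$ and $S$ below are hypothetical values chosen for illustration.

```python
import numpy as np


def gls2(X, Y, S, T):
    """Estimator (3.6): (S^{-1}X'XS^{-1})^{-1} S^{-1}X'(S^{-1} kron I_T)Y.

    X S^{-1} only conforms when X has N columns, i.e. one regressor per equation.
    """
    S_inv = np.linalg.inv(S)
    A = S_inv @ X.T @ X @ S_inv
    return np.linalg.solve(A, S_inv @ X.T @ np.kron(S_inv, np.eye(T)) @ Y)


T = 3
X = np.zeros((2 * T, 2))
X[:T, 0] = [1.0, 2.0, 3.0]   # hypothetical regressor of equation 1
X[T:, 1] = [1.0, 0.0, 1.0]   # hypothetical regressor of equation 2
beta = np.array([1.0, 2.0])
S = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical symmetric positive definite S

# E(beta_hat) = M beta with M = (S^{-1}X'XS^{-1})^{-1} S^{-1}X'(S^{-1} kron I_T)X,
# so evaluating the estimator at Y = X beta (e = 0) gives E(beta_hat) directly.
print(gls2(X, X @ beta, S, T))              # approx. [1.673 1.918], not beta
print(gls2(X, X @ beta, 2 * np.eye(2), T))  # equals beta when S is a multiple of I
```

When $S$ is a multiple of the identity, the transformation has no effect and the estimator returns $\beta$. For a general $S$ it does not, which is consistent with the theorem.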
GLS_{3}: Again, consider the following transformation:

$$Y^* = (S^{-1} \otimes I_T)Y, \qquad X^* = (S^{-1} \otimes I_T)X, \qquad e^* = (S^{-1} \otimes I_T)e.$$

Using this transformation, the model in (3.2) can be expressed as
$$Y^* = X^*\beta + e^* \tag{3.7}$$

where $Y^*$ and $e^*$ are $NT \times 1$ vectors and $X^*$ is an $NT \times n$ matrix, with

$$E(e^* e^{*\prime}) = \Sigma \otimes I_T, \qquad E(e^*) = 0.$$
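When $\Sigma$ is known, the GLS_{3} estimator of $\beta$ in (3.7) becomes

$$\begin{aligned} \hat{\beta}_{GLS_3} &= (X^{*\prime}X^*)^{-1}X^{*\prime}Y^* \\ &= \left[\{(S^{-1} \otimes I_T)X\}'\{(S^{-1} \otimes I_T)X\}\right]^{-1}\{(S^{-1} \otimes I_T)X\}'\{(S^{-1} \otimes I_T)Y\} \\ &= \{X'(S^{-2} \otimes I_T)X\}^{-1}X'(S^{-2} \otimes I_T)Y \end{aligned} \tag{3.8}$$

The last line uses $(S^{-1} \otimes I_T)'(S^{-1} \otimes I_T) = S^{-2} \otimes I_T$, which holds for symmetric $S$.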
When the covariance matrix $\Sigma$ is unknown, a feasible generalized least squares (FGLS) estimator is defined by replacing the unknown $\Sigma$ with a consistent estimate. Writing $\hat{S}$ for the resulting estimate of $S$,

$$\hat{\beta}_{FGLS_3} = \{X'(\hat{S}^{-2} \otimes I_T)X\}^{-1}X'(\hat{S}^{-2} \otimes I_T)Y \tag{3.9}$$
Theorem: $\hat{\beta}_{GLS_3}$ is an unbiased estimator of $\beta$:

$$E(\hat{\beta}_{GLS_3}) = \beta.$$
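Theorem:

$$\begin{aligned} V(\hat{\beta}_{GLS_3}) &= E\left[(\hat{\beta}_{GLS_3} - \beta)(\hat{\beta}_{GLS_3} - \beta)'\right] \\ &= \{X'(\hat{S}^{-2} \otimes I_T)X\}^{-1}\{X'(\hat{S}^{-3} \otimes I_T)X\}\{X'(\hat{S}^{-2} \otimes I_T)X\}^{-1} \end{aligned}$$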
The GLS_{3} estimator was found to provide smaller variance and MSE than the other proposed estimators, GLS_{1} and GLS_{2}, so that

$$V(\hat{\beta}_{GLS_3}) < V(\hat{\beta}_{GLS_2}) < V(\hat{\beta}_{GLS_1}).$$
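The following NumPy sketch computes GLS_{3} directly from (3.8) and, for comparison, as OLS on the transformed model (3.7). It assumes $S$ is a known symmetric positive definite $N \times N$ matrix. The two results agree because $(S^{-1} \otimes I_T)'(S^{-1} \otimes I_T) = S^{-2} \otimes I_T$ for symmetric $S$. The function names and data are hypothetical; the feasible version (3.9) would pass an estimate $\hat{S}$ in place of $S$.

```python
import numpy as np


def gls3(X, Y, S, T):
    """Estimator (3.8): {X'(S^-2 kron I_T)X}^{-1} X'(S^-2 kron I_T)Y for symmetric S."""
    S_inv = np.linalg.inv(S)
    W = np.kron(S_inv @ S_inv, np.eye(T))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)


def gls3_via_transform(X, Y, S, T):
    """The same estimator computed as OLS on the transformed model (3.7)."""
    P = np.kron(np.linalg.inv(S), np.eye(T))  # S^{-1} kron I_T
    return np.linalg.lstsq(P @ X, P @ Y, rcond=None)[0]


rng = np.random.default_rng(4)
T = 12
X = np.zeros((2 * T, 4))
X[:T, :2] = np.column_stack([np.ones(T), rng.normal(size=T)])   # equation 1
X[T:, 2:] = np.column_stack([np.ones(T), rng.normal(size=T)])   # equation 2
Y = rng.normal(size=2 * T)
S = np.array([[1.5, 0.4], [0.4, 1.0]])   # hypothetical symmetric positive definite S
beta = np.array([1.0, 2.0, -1.0, 0.5])

print(np.allclose(gls3(X, Y, S, T), gls3_via_transform(X, Y, S, T)))  # True
print(np.allclose(gls3(X, X @ beta, S, T), beta))                     # True
```

The second check is the linearity step behind the unbiasedness theorem: $\{X'WX\}^{-1}X'WX\beta = \beta$.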
The results were verified using real and simulated data. The empirical results are presented in Tables 1 and 2 and are also compared graphically.
Table 1. MSE and TMSE of the estimators for the two-equation SUR model.

| Type of data | Observations | Criterion | OLS | GLS (Zellner, 1962) | GLS (Srivastava and Giles, 1987) | Ridge estimator for SUR model | GLS1 | GLS2 | GLS3 |
|---|---|---|---|---|---|---|---|---|---|
| Real data | T=8 | MSE | 0.00588 | 0.000012 | 0.000012 | 0.000020 | 0.00970 | 0.00000205 | 0.000000073 |
| | | TMSE | 0.13370 | 0.133672 | 0.133672 | 0.133672 | 0.13370 | 0.44457 | 0.22764 |
| | T=16 | MSE | 50.3009 | 0.044025 | 0.044025 | 0.044025 | 82.4437 | 0.00289 | 0.00524 |
| | | TMSE | 0.04873 | 0.048655 | 0.048655 | 0.048654 | 0.04873 | 0.06082 | 0.04907 |
| | T=32 | MSE | 203.058 | 0.092769 | 0.092769 | 0.092770 | 269.283 | 0.01521 | 0.01879 |
| | | TMSE | 0.02042 | 0.020404 | 0.020404 | 0.020404 | 0.02042 | 0.02274 | 0.02042 |
| Simulated data | T=8 | MSE | 0.55654 | 0.710668 | 0.770346 | 1.314708 | 0.75496 | 0.81591 | 0.51844 |
| | | TMSE | 0.00110 | 0.001073 | 0.001069 | 0.001076 | 0.00107 | 0.00041 | 0.00119 |
| | T=16 | MSE | 0.88032 | 0.765417 | 1.045613 | 0.832161 | 0.93289 | 1.11909 | 0.70974 |
| | | TMSE | 0.00052 | 0.000513 | 0.000511 | 0.000509 | 0.00051 | 0.00041 | 0.00053 |
| | T=32 | MSE | 0.95541 | 1.081081 | 0.775164 | 0.896703 | 0.99046 | 0.94065 | 1.03722 |
| | | TMSE | 0.00021 | 0.000212 | 0.000214 | 0.000214 | 0.00022 | 0.00041 | 0.00021 |
Table 2. MSE and TMSE of the estimators for the three-equation SUR model.

| Type of data | Observations | Criterion | OLS | GLS (Zellner, 1962) | GLS (Srivastava and Giles, 1987) | Ridge estimator for SUR model | GLS1 | GLS2 | GLS3 |
|---|---|---|---|---|---|---|---|---|---|
| Real data | T=8 | MSE | 340.019 | 21.70766 | 21.70766 | 21.70767 | 192.335 | 5.18659 | 5.18277 |
| | | TMSE | 0.17382 | 0.121873 | 0.121873 | 0.121873 | 0.17382 | 1.98253 | 0.28204 |
| | T=16 | MSE | 720.778 | 0.938033 | 0.93803 | 0.93803 | 49.4244 | 0.05963 | 0.09554 |
| | | TMSE | 0.06658 | 0.066237 | 0.06624 | 0.066237 | 0.06658 | 0.08839 | 0.06713 |
| Simulated data | T=8 | MSE | 0.81563 | 0.975524 | 1.10023 | 1.64572 | 1.24147 | 1.13131 | 0.41326 |
| | | TMSE | 0.00118 | 0.001315 | 0.001177 | 0.00159 | 0.00173 | 0.00210 | 0.00160 |
| | T=16 | MSE | 1.04617 | 0.897032 | 1.05750 | 0.81005 | 1.20330 | 1.22188 | 0.92706 |
| | | TMSE | 0.00082 | 0.000640 | 0.00051 | 0.000656 | 0.00054 | 0.00068 | 0.00076 |
4. Sources of Data
One data set was collected from a secondary source, issues of the Federal Reserve Bulletin, as reported by G. S. Maddala (1988, pp. 364-365). Another data set was taken from Introduction to Econometrics by D. N. Gujarati (1995, pp. 351-353). Both data sets exhibit severe multicollinearity, which was checked by different methods. The variables considered in this paper are wage income, non-wage income, the price of alternative financing to firms and the industrial production index.

To estimate the two-equation SUR model, we used two independent variables:

- first set: x_{1}, wage income, and x_{2}, non-wage income;
- second set: x_{1}, the price of alternative financing to firms, and x_{2}, the industrial production index, which represents firms’ expectations about future economic activity.

To estimate the three-equation SUR model, we used three independent variables:

- first set: x_{1}, wage income; x_{2}, non-wage income; and x_{3}, farm income;
- second set: x_{1}, the price of alternative financing to firms; x_{2}, the industrial production index, representing firms’ expectations about future economic activity; and x_{3}, the average prime rate charged by banks.

The data were analyzed using R (version 2.9.2).
5. Empirical Analysis
Algorithm for simulating data from the seemingly unrelated regressions (SUR) model:
Step 1: For two equations, we took as starting values the parameters ($\beta_1$, $\beta_2$) obtained from the real data by OLS. Based on these values, we simulated data for T = 8, 16 and 32 observations.
Step 2: For three equations, we took as starting values the parameters ($\beta_1$, $\beta_2$, $\beta_3$) obtained from the real data by OLS. Based on these values, we simulated data for T = 8 and 16 observations.
Step 3: In the same way, we repeated the simulation 1000 times, obtaining 1000 estimates of each parameter, and took the mean of the simulated estimates for each parameter.
Step 4: These estimates were presented in tabular form.
Step 5: The above procedure was carried out for both the two-equation and the three-equation SUR models.
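The simulation steps above can be sketched as a small Monte Carlo loop. This is an illustrative sketch only, built on these assumptions:

- the errors are normally distributed with a fixed covariance $\Sigma$;
- the regressors are held fixed across replications;
- equation-by-equation OLS is used as the plugged-in estimator;
- the parameter values are hypothetical stand-ins for the OLS starting values obtained from the real data;
- the MSE summary (mean squared Euclidean distance from the true parameter vector) is an assumed definition, since the text does not state one.

Any estimator with the hypothetical signature `estimator(Xs, ys)` could be passed in place of `ols_each`.

```python
import numpy as np


def simulate_sur(betas, Xs, sigma, rng):
    """One simulated data set: y_i = X_i beta_i + e_i, Cov(e_it, e_jt) = sigma_ij, independent over t."""
    T = Xs[0].shape[0]
    E = rng.multivariate_normal(np.zeros(len(Xs)), sigma, size=T)  # T x N error matrix
    return [X @ b + E[:, i] for i, (X, b) in enumerate(zip(Xs, betas))]


def ols_each(Xs, ys):
    """Equation-by-equation OLS, stacked into one parameter vector."""
    return np.concatenate([np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)])


def monte_carlo(betas, Xs, sigma, estimator, reps=1000, seed=0):
    """Steps 1-3: simulate `reps` data sets, estimate each one, average the estimates."""
    rng = np.random.default_rng(seed)
    true = np.concatenate(betas)
    draws = np.array([estimator(Xs, simulate_sur(betas, Xs, sigma, rng)) for _ in range(reps)])
    # Assumption: MSE summarised as the mean over replications of the squared
    # Euclidean distance between the estimate and the true parameter vector.
    mse = np.mean(np.sum((draws - true) ** 2, axis=1))
    return draws.mean(axis=0), mse


rng = np.random.default_rng(5)
T = 16
Xs = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(2)]  # held fixed
betas = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]  # hypothetical starting values
sigma = np.array([[1.0, 0.7], [0.7, 1.0]])             # hypothetical error covariance
mean_est, mse = monte_carlo(betas, Xs, sigma, ols_each)
print(mean_est, mse)
```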
6. Statistical Results
Table 1 shows that, when multicollinearity is high, OLS (0.00588) and GLS (Zellner, 1962) (0.000012) give the larger MSEs for the two-equation SUR model. The MSE obtained by the proposed GLS_{3} method (0.000000073) was smaller than that of the other methods for the two-equation SUR model on real data. For the small sample (T=8), however, the TMSE of the GLS_{3} method was slightly larger than those of the other methods on real data. As the sample size increased, the MSEs decreased, while the TMSEs were approximately equal to those of the other methods on real data. As the sample size increased further, the MSE and TMSE of the proposed GLS_{3} method declined relative to the other methods for the two-equation SUR model on simulated data. Hence, Table 1 shows that, by the MSE and TMSE criteria, the GLS_{3} method gives better estimates of the SUR model for both real and simulated data.
Figure 1 showed that the MSE of the proposed GLS_{3} method was smaller than that of the other methods.
Fig. 2 showed that the TMSEs of the different methods of estimating the two-equation SUR model were approximately the same. As the sample size increased, the TMSE of each method rose for T < 30 and fell for T > 30. For extremely large samples, the TMSE declined for every method across the different generated samples.
Table 2 indicated that the MSE and TMSE of the three-equation SUR model were larger than those of the two-equation SUR model, for both real and simulated data.
Figure 3 showed that the MSE of the proposed GLS_{3} method was smaller than that of the other methods.
Figure 4 showed that the TMSEs of the different methods of estimating the three-equation SUR model were approximately the same. As the sample size increased, the TMSE of each method rose for T < 30 and fell for T > 30. For extremely large samples, the TMSE declined strictly for every method across the different generated samples.
7. Results and Discussion
The ordinary least squares (OLS) estimator, the generalized least squares (GLS) estimator of Zellner (1962) and the GLS estimator of Srivastava and Giles (1987) were all found to be unbiased, whereas the SUR ridge estimator of M. A. Alkhamisi and G. Shukur (2008) was not. We computed their variances and found that the SUR ridge estimator of M. A. Alkhamisi and G. Shukur (2008) provided a smaller MSE than the others. We also briefly discussed multicollinearity: its causes, consequences, detection and removal. The MSE and TMSE criteria were used to measure the goodness of the SUR estimators. We described the theoretical concepts of our proposed methods, GLS_{1}, GLS_{2} and GLS_{3}, for estimating two- and three-equation SUR models. The proposed estimators are defined mainly through transformations of the variables and matrices.

The simulation results supported the hypothesis that the main factors affecting the inferential properties of SUR estimators are:

- the number of equations;
- the number of observations per equation;
- the correlation among explanatory variables and among equations.

The fit of the models was verified on real and simulated data, and the goodness of the proposed models was measured in terms of MSE and TMSE.

The results showed that the MSE of the GLS_{3} SUR estimator was consistently lower than that of the other existing estimators. Therefore, the GLS_{3} estimator performs better than the other estimators when the errors are correlated across equations, and it could be considered the best estimator of the SUR model.
8. Conclusion
This study provided an approach to fitting SUR models when certain difficulties arise. Several methods of handling these difficulties were explored. The simplest, estimating the SUR model by conditioning on all observations and iterating until the estimates converge (the GLS_{3} method), was computationally efficient and reasonably accurate.

Finally, under certain conditions we suggest GLS_{3} as a good estimator of the SUR (seemingly unrelated regression) model in the presence of high multicollinearity. We also find that the orthogonal transformation (GLS_{1}) is less efficient for estimating the SUR model. Our study concludes that the proposed estimator GLS_{3} can be used with any type of real data (except time series data) to obtain the best fit of the SUR model under severe multicollinearity.

Hence, the proposed method (GLS_{3}) can offer gains in estimation accuracy over other methods for both small and large samples, in terms of the bias, MSE and TMSE criteria.
Practical applications of the seemingly unrelated regression (SUR) model, in which the proposed method of estimation can be applied to obtain better forecasts through efficient estimation of the parameters, are listed below:
i. The SUR model may be used to predict or forecast total commercial loans from determinants such as the average prime rate charged by banks, the bank rate, total bank deposits, etc.
ii. The SUR model can be applied to environmental settings with missing data and censored values.
iii. The SUR model may be more appropriate for predicting farms’ ability to meet their current and anticipated obligations over the next 12, 9 and 3 months, etc.
iv. The SUR model may be applied to any set of simultaneous regression equations whose error terms are highly correlated.
References