Application of spatial error model using GMM estimation in impact of education on poverty alleviation in Java, Indonesia

Java Island is the center of development in Indonesia, and yet poverty remains its major problem. The pockets of poverty in Java are often located in urban and rural areas and are dominated by a productive-age population with low education. Taking spatial factors into account when determining policy can improve the efficiency of poverty alleviation policy. This paper presents a Spatial Error Model (SEM) approach to determine the impact of education on poverty alleviation in Java. It focuses not only on the specification of empirical models but also on the selection of parameter estimation methods. Most studies use the Maximum Likelihood Estimator (MLE) as the parameter estimation method, but when the disturbances are not normally distributed, MLE is generally biased. The assumption tests on the poverty data of Java showed that the model errors were not normally distributed and that there was spatial autocorrelation in the error terms. In this study we used SEM with Generalized Method of Moments (GMM) estimation to overcome the biases associated with MLE. Our results indicate that GMM is as efficient as MLE in determining the impact of education on poverty alleviation in Java and is robust to non-normality. Education indicators that have a significant impact on poverty alleviation are literacy rate, average length of school year, and percentage of high school and university graduates.


Introduction
Java Island is the center of development in Indonesia, yet poverty remains one of its major problems. Its central role can be seen from its contribution to GDP from 2008 to 2013, which was consistently around 56 percent [1]. Still, in 2013 the majority of people living in poverty were concentrated in Java. As shown in Fig. 1, 54.45 percent of the total poor population in Indonesia lived in Java [2]. So, even though the island makes a significant contribution to the national economy, it still has not escaped the poverty problem, with more than half of Indonesia's poor population living there.
The pockets of poverty in Java are located in urban and rural areas and are dominated by a productive-age population with low education and productivity, which puts this group at a disadvantage in the labor market. The increasing poverty rate in both urban and rural areas is due to the low quality of human resources, of which education is an important indicator [3], [4]. Additionally, poverty is associated with interregional spatial interactions related to population mobility and spatial impoverishment [5]. Crandall and Weber [6] explained that poverty has a spatial interaction: a region with a high poverty rate affects and is affected by the regions around it [2]. Taking this spatial interaction into account, a policy on poverty alleviation can benefit from spatial analysis and be made more efficient by it. Numerous studies have incorporated spatial statistics when examining poverty. However, many of these studies have focused on the specification of spatial statistical models rather than on the selection of parameter estimation methods. Although the choice of parameter estimation method might not make as significant a difference as model specification, the assumption requirements and data conditions call for an appropriate parameter estimation method [7]. To the best of our knowledge, existing spatial analyses do not take these assumption requirements into account explicitly, perhaps inadvertently. Spatial regression parameter estimates [8] can be obtained through several estimation methods, such as Maximum Likelihood Estimation (MLE) and the Generalized Method of Moments (GMM). These two parameter estimation methods are suited to different assumption requirements. We select GMM as the parameter estimation method to overcome the bias associated with MLE.
Fig. 1. Share of Indonesia's poor population by island group: Sumatera 21.68%, Java 54.45%, Bali-Nusa Tenggara 7.00%, Kalimantan 3.43%, Sulawesi 7.49%, Maluku-Papua 5.96%

Theoretical Framework
Poverty is defined as a lack of the means necessary to meet basic food and non-food needs, as measured by expenditure [2]. People who live in poverty are those whose average monthly consumption per capita is below the established poverty line. Poverty can be measured in two dimensions: the monetary dimension, which covers insufficient income or consumption; and the non-monetary dimension, which covers insufficient outcomes with respect to human capital such as education, health, and nutrition [9]. One way to break the vicious cycle of poverty is to improve the education of the population [10]. A low education level among the poor leads to a vicious cycle of poverty in the next generation. People with low education have low productivity, and low productivity leads to low income, resulting in poverty. Poor households find it difficult to finance their children's schooling, which produces a next generation with similarly low education and thus perpetuates the cycle of poverty.
Throughout the world, it has been found that the probability of finding employment rises with higher levels of education, and that earnings are higher for people with higher levels of education [11]. This connection between education and poverty works through three mechanisms. Firstly, more educated people earn more. Secondly, more (and especially better quality) education improves economic growth and thereby economic opportunities and income. Thirdly, education brings wider social benefits, such as economic development, which have a positive ripple effect on poor regions. The theoretical framework compiled by Janjua and Kamal [12] states that education has direct and indirect effects on poverty alleviation. From this theoretical framework, we consider the education and skills of the individual as the direct effects on poverty alleviation. Fig. 2 depicts the impact of education on poverty alleviation.

Materials
The analysis in this study included all regencies/cities in Java Island. This study used 2013 secondary data from the Indonesian Central Bureau of Statistics (BPS) and the Indonesian Ministry of Education and Culture for the island's 118 regencies/cities. The data used include: literacy rate (X1), average length of school year (X2), percentage of high school and university graduates (X3), ratio of junior high school availability (X4), ratio of senior high school availability (X5), and poverty ratio (Y). The poverty ratio is the proportion of the poor population to the total population in a region (regency/city). It indicates the incidence of poverty in a region, but ignores the differences in well-being between different poor households.
The data used in this study were aggregate data for every regency/city. This study was conducted at the spatial (regional) poverty level, not at the individual or household level. The spatial data used in this study were derived from BPS mapping.

Fig. 2. Impact of education on poverty alleviation [12]

Methods
The methods used in this research were Exploratory Spatial Data Analysis (ESDA) and inferential analysis using spatial regression. The details of each analysis are explained below.

Exploratory Spatial Data Analysis
Exploratory Spatial Data Analysis (ESDA) was used to describe spatial patterns of poverty in Java Island. ESDA [13] is a collection of techniques to describe and visualize spatial distributions; identify spatial outliers; discover patterns of spatial association, clusters, or hot spots; and suggest spatial regimes or other forms of spatial heterogeneity. ESDA was applied based on Global Moran's I, the Moran scatterplot, and LISA statistics. To identify spatial patterns, spatial clustering and association patterns, and outlier data, we used the following statistical techniques.

Constructing the Spatial Weights Matrix
The basic form of a spatial weights matrix (denoted W) is a square n × n matrix that defines which areas are neighbors of a given area. Each element $w_{ij}$ is a weight denoting the strength of the connection between areas i and j. The binary contiguity matrix is symmetric; in this study it was subsequently row-standardized.
In this study, we used contiguity-based relations based on modified queen contiguity. Contiguity-based relations are mostly used in the presence of irregular polygons with varying shape and surface, since contiguity ignores distance and focuses instead on the location of an area. This was appropriate for the areas in Java, which are irregular polygons. Queen contiguity defines two areas as neighbors when at least one point on the boundary of one polygon is shared with at least one point of its neighbor (common border or corner). The elements of the queen contiguity weights matrix (denoted $W^{Q}$) are defined as follows:

$$w^{Q}_{ij} = \begin{cases} 1, & \text{if areas } i \text{ and } j \ (i \neq j) \text{ share a common border or corner} \\ 0, & \text{otherwise} \end{cases}$$

In addition, the queen contiguity weights matrix used in this study was manually modified to maintain connectivity between areas. Thus, areas separated by water, such as Bangkalan and Surabaya or Pulau Seribu and Jakarta Utara, can still interact as neighboring areas. Connectivity between areas based on the modified queen contiguity weights matrix is illustrated in Fig. 3. The resulting queen contiguity weights matrix was then normalized (row-standardized), giving the spatial weight (denoted $w_{ij}$) for each neighbor and forming the spatial weights matrix W according to the following equation:

$$W = [w_{ij}], \qquad w_{ij} = \frac{w^{Q}_{ij}}{\sum_{j=1}^{n} w^{Q}_{ij}}$$
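The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study (which relied on R and GeoDaSpace): the function names, the toy polygons, and the manual link are all hypothetical, and the neighbor test assumes that shared borders and corners show up as identical vertex coordinates.

```python
from itertools import combinations


def queen_neighbors(polygons):
    """Return {area: set of neighbors} under queen contiguity.

    polygons maps an area name to its boundary vertices [(x, y), ...].
    Assumption: shared borders and corners appear as identical vertices.
    """
    vertex_sets = {name: set(points) for name, points in polygons.items()}
    neighbors = {name: set() for name in polygons}
    for a, b in combinations(polygons, 2):
        if vertex_sets[a] & vertex_sets[b]:  # at least one common point
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def add_manual_links(neighbors, links):
    """Manually connect areas that are separated by water (symmetric links)."""
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def row_standardize(neighbors):
    """Give each neighbor of area i the weight 1 / (number of neighbors of i)."""
    return {
        area: {nb: 1.0 / len(nbrs) for nb in nbrs}
        for area, nbrs in neighbors.items()
    }


# Hypothetical toy map: A and B share an edge, B and C share only a corner,
# and D is an island that is linked to C by hand.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(2, 1), (3, 1), (3, 2), (2, 2)],
    "D": [(5, 5), (6, 5), (6, 6), (5, 6)],
}
neighbors = add_manual_links(queen_neighbors(polygons), [("C", "D")])
W = row_standardize(neighbors)
print(W["B"])
```

After row standardization every row of W sums to one, so the spatial lag of a variable is simply the average of that variable over an area's neighbors.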

Analyzing Global Indicators of Spatial Autocorrelation by Using Global Moran's I Statistics
Moran's I [14] was used in this study to determine whether the values of neighboring areas were more similar than would be expected under the null hypothesis of no spatial autocorrelation. Mathematically, the Global Moran's I statistic for n observations, with $x_i$ the value at location i and $x_j$ the value at location j, can be formulated as follows:

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad E(I) = -\frac{1}{n-1}$$

Under the randomization assumption (denoted R), the rates are random samples from a population whose distribution is unknown. Assumption R is less restrictive, since the theoretical distribution is often unknown. The value of Global Moran's I generally ranges between -1 and 1. If I > E(I), the spatial pattern is clustered, indicating positive spatial autocorrelation. If I = E(I), the pattern shows no systematic spatial structure (no spatial autocorrelation), and if I < E(I), the pattern is dispersed, indicating negative spatial autocorrelation [15].
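As an illustration of how Global Moran's I and a permutation-based pseudo p-value can be computed, the following sketch uses only the Python standard library. The weights format (one dictionary of neighbor weights per area), the function names, and the six-area toy data are assumptions made for this example, not the study's data or code.

```python
import random


def morans_i(values, weights):
    """Global Moran's I; weights[i] is a dict {j: w_ij} for area i."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(w for row in weights for w in row.values())
    cross = sum(w * z[i] * z[j] for i, row in enumerate(weights) for j, w in row.items())
    return (n / s0) * cross / sum(zi * zi for zi in z)


def pseudo_p_value(values, weights, permutations=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation by random permutation."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


def line_weights(n):
    """Row-standardized weights for n areas arranged along a line (toy layout)."""
    rows = []
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        rows.append({j: 1.0 / len(nbrs) for j in nbrs})
    return rows


# Hypothetical data: values rise steadily along the line, so similar values
# sit next to each other and I should exceed E(I) = -1 / (n - 1).
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights = line_weights(len(values))
i_stat, p_value = pseudo_p_value(values, weights)
expected_i = -1.0 / (len(values) - 1)
print(i_stat, expected_i, p_value)
```

The pseudo p-value is the share of random relabelings whose Moran's I is at least as large as the observed one, which matches the randomization assumption: no distribution is assumed for the values themselves.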

Analyzing Local Indicators of Spatial Autocorrelation by Using LISA Cluster Map
In contrast to the previously described Global Moran's I, which is a global indicator of spatial autocorrelation, LISA indicates local autocorrelation. In this case, LISA identified the relationship between one observation location and the other observation locations. Furthermore, the clustering of areas was grouped into four types of spatial association and visualized through the LISA cluster map [16]. The four possible scenarios were as follows:
a. Hot spots: a high-value location surrounded by high-value neighbors (high-high)
b. Cold spots: a low-value location surrounded by low-value neighbors (low-low)
c. Outliers: a high-value location surrounded by low-value neighbors (high-low)
d. Outliers: a low-value location surrounded by high-value neighbors (low-high)
If the LISA cluster map showed 'not significant' results, it meant that the value of the area was not significantly associated with the values of its neighbors.
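The four association types can be illustrated with a small sketch that computes a local Moran statistic for each area and labels its quadrant. The function name and the toy data are hypothetical, and the significance filtering that a LISA cluster map applies (typically by conditional permutation) is deliberately left out.

```python
def lisa_quadrants(values, weights):
    """Return (local Moran's I, quadrant label) per area.

    weights[i] is a dict {j: w_ij}, assumed row-standardized.
    Significance testing is omitted in this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n  # variance of the deviations
    results = []
    for i, row in enumerate(weights):
        lag = sum(w * z[j] for j, w in row.items())  # average neighbor deviation
        local_i = z[i] * lag / m2
        if z[i] >= 0 and lag >= 0:
            label = "high-high"
        elif z[i] < 0 and lag < 0:
            label = "low-low"
        elif z[i] >= 0:
            label = "high-low"
        else:
            label = "low-high"
        results.append((local_i, label))
    return results


# Hypothetical poverty ratios for five areas along a line; the middle area
# has a low value and is surrounded by high-value neighbors.
values = [9.0, 8.0, 1.0, 8.0, 9.0]
weights = [{1: 1.0}, {0: 0.5, 2: 0.5}, {1: 0.5, 3: 0.5}, {2: 0.5, 4: 0.5}, {3: 1.0}]
quadrants = lisa_quadrants(values, weights)
for area, (local_i, label) in enumerate(quadrants):
    print(area, round(local_i, 3), label)
```

A positive local statistic corresponds to the high-high and low-low cases, and a negative one to the two outlier cases.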

Spatial Regression
Spatial regression is closely related to the autoregressive process, indicated by the dependence relationship among a set of observations or locations. The relationship can also be expressed as the value at one location depending on the values at neighboring locations. Spatial regression was used to analyze the impact of education, with the following predictor variables: literacy rate (X1), average length of school year (X2), percentage of high school and university graduates (X3), ratio of junior high school availability (X4), and ratio of senior high school availability (X5). The response variable was the poverty ratio in Java in 2013, taking spatial factors into account.
There are two common types of spatial regression: the Spatial Lag Model (SLM) and the Spatial Error Model (SEM).

Spatial Lag Model (SLM)
SLM [17] is a model that combines a classic regression model with a spatial lag of the response variable, using cross-sectional data. The SLM is formed when ρ ≠ 0 and λ = 0. This model assumes that the autoregressive process occurs only in the response variable. The spatial lag model that could be formed in this research is as follows:

$$y = \rho W y + X\beta + \varepsilon$$

where ρ is the spatial lag coefficient on the response variable, y is the (n × 1) vector of the response variable, X is the matrix of predictor variables, β is the vector of regression coefficients, and ε is the (n × 1) vector of model errors.

Spatial Error Model (SEM)
SEM [17] is a model in which the model errors are spatially correlated. The SEM is formed when ρ = 0 and λ ≠ 0. This model assumes that the autoregressive process occurs only in the model error. The spatial error model formed in this research was as follows:

$$y = X\beta + u, \qquad u = \lambda W u + \varepsilon$$

where λ is the spatial error coefficient on the error u, and u is the (n × 1) spatial error vector.
To specify the appropriate model, we followed the steps illustrated in Fig. 4. In particular, the focus was on detecting model misspecification due to spatial dependence (in the form of an omitted spatially lagged dependent variable or spatial residual autocorrelation). Four tests were performed to assess the spatial dependence of the model: the simple LM diagnostic for a missing spatially lagged dependent variable (Lagrange Multiplier (lag)), the simple LM diagnostic for error dependence (Lagrange Multiplier (error)), and variants of these that are robust to the presence of the other (Robust LM (lag) and Robust LM (error)). Robust LM (error) tests for error dependence in the possible presence of a missing lagged dependent variable; Robust LM (lag) is the other way round. All modelling was carried out in R using the 'spdep' package and in GeoDaSpace, two software tools for advanced spatial econometrics.
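The way the four LM diagnostics are usually combined into a model choice can be sketched as a small decision function. This follows the commonly used decision rule for LM tests and is only assumed to resemble the flow in Fig. 4; the function name, the significance level, and the example numbers are hypothetical and are not the values reported in this study.

```python
def choose_spatial_model(lm_lag, lm_error, robust_lag, robust_error, alpha=0.05):
    """Pick 'OLS', 'SLM' or 'SEM' from four (statistic, p_value) pairs."""
    lag_significant = lm_lag[1] < alpha
    error_significant = lm_error[1] < alpha
    if not lag_significant and not error_significant:
        return "OLS"  # no evidence of spatial dependence
    if lag_significant and not error_significant:
        return "SLM"
    if error_significant and not lag_significant:
        return "SEM"
    # Both simple tests reject: fall back on the robust versions.
    robust_lag_significant = robust_lag[1] < alpha
    robust_error_significant = robust_error[1] < alpha
    if robust_lag_significant and not robust_error_significant:
        return "SLM"
    if robust_error_significant and not robust_lag_significant:
        return "SEM"
    # Still undecided: prefer the model with the larger robust statistic.
    return "SLM" if robust_lag[0] > robust_error[0] else "SEM"


# Hypothetical diagnostics: both simple tests reject, but only the robust
# error test remains significant, which points to the spatial error model.
choice = choose_spatial_model(
    lm_lag=(8.2, 0.004),
    lm_error=(12.6, 0.0004),
    robust_lag=(0.9, 0.34),
    robust_error=(5.3, 0.021),
)
print(choice)
```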

Maximum Likelihood Estimator (MLE)
The underlying assumption of this estimator is that the model errors are normally distributed, i.e., $\varepsilon \sim N(0, \sigma^2 I)$. Under this assumption, the estimators of SLM and SEM are obtained by maximizing the log-likelihood function of each model.
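For reference, the standard textbook forms of the two log-likelihoods under the normality assumption are sketched below, with y the response vector, X the matrix of predictors, W the spatial weights matrix, ρ the spatial lag coefficient, and λ the spatial error coefficient. The notation is an assumption of this sketch and may differ from the expressions used by the authors; each second line gives the estimator of β for a given value of ρ or λ.

```latex
% Spatial Lag Model (SLM)
\ln L_{\mathrm{SLM}} = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^{2} + \ln\lvert I - \rho W\rvert
  - \frac{1}{2\sigma^{2}}\,(y - \rho W y - X\beta)^{\top}(y - \rho W y - X\beta)
\hat{\beta}(\rho) = (X^{\top}X)^{-1}X^{\top}(I - \rho W)\,y

% Spatial Error Model (SEM)
\ln L_{\mathrm{SEM}} = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^{2} + \ln\lvert I - \lambda W\rvert
  - \frac{1}{2\sigma^{2}}\,(y - X\beta)^{\top}(I - \lambda W)^{\top}(I - \lambda W)(y - X\beta)
\hat{\beta}(\lambda) = \left[X^{\top}(I - \lambda W)^{\top}(I - \lambda W)X\right]^{-1}
  X^{\top}(I - \lambda W)^{\top}(I - \lambda W)\,y
```

Both likelihoods are built on the normal density, which is why the resulting estimators are sensitive to departures from normality in the model errors.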

Generalized Method of Moments (GMM)
The basic principle of GMM is to estimate β so that the sample moment conditions are as close as possible to the population moment conditions, by minimizing an objective function of the sample moment conditions. Here $g(\hat{\beta})$ is the sample moment condition, which is defined using the instrumental variables, and $E(g(\beta)) = 0$ is the population moment condition. Kelejian and Prucha (in Anselin) [8] argue that GMM is as efficient as MLE. In addition, GMM is a parameter estimator that does not require the normal distribution assumption for model errors, as MLE does. This leads to the GMM estimator for SEM. The GMM estimator can be produced in three steps [18]:
a. Build the objective function of the sample moment conditions, that is, a quadratic function of the sample moment conditions based on the specified spatial weights matrix.
b. Obtain a consistent but inefficient estimate of β, denoted $\hat{\beta}_{[1]}$, by minimizing this objective function.
c. Obtain a consistent and efficient estimate of β by minimizing the objective function again with an optimized weighting matrix based on $\hat{\beta}_{[1]}$.
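To make the moment-matching idea concrete, the sketch below estimates the spatial error coefficient λ from a vector of regression residuals using three moment conditions of the Kelejian-Prucha type. It is an illustration under stated assumptions and not necessarily the exact procedure of [18]: the function names are hypothetical, the three squared moments are weighted equally, σ² is concentrated out analytically, and a simple grid search over (-1, 1) replaces a numerical optimizer. No normality assumption enters anywhere.

```python
def spatial_lag(weights, v):
    """Return Wv, with W stored as one {neighbor_index: weight} dict per row."""
    return [sum(w * v[j] for j, w in row.items()) for row in weights]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def moment_objective(lam, u, wu, wwu, t):
    """Sum of squared sample moments at lam, with sigma^2 concentrated out."""
    n = len(u)
    eps = [a - lam * b for a, b in zip(u, wu)]      # eps = u - lam * Wu
    weps = [a - lam * b for a, b in zip(wu, wwu)]   # W eps = Wu - lam * WWu
    m1 = dot(eps, eps) / n     # population value: sigma^2
    m2 = dot(weps, weps) / n   # population value: sigma^2 * tr(W'W) / n
    m3 = dot(weps, eps) / n    # population value: 0
    sigma2 = (m1 + t * m2) / (1.0 + t * t)  # least-squares value given lam
    return (m1 - sigma2) ** 2 + (m2 - t * sigma2) ** 2 + m3 ** 2, sigma2


def estimate_lambda(residuals, weights, steps=199):
    """Grid search for lam in (-1, 1) that minimizes the moment objective."""
    n = len(residuals)
    wu = spatial_lag(weights, residuals)
    wwu = spatial_lag(weights, wu)
    t = sum(w * w for row in weights for w in row.values()) / n  # tr(W'W) / n
    best = None
    for k in range(1, steps + 1):
        lam = -1.0 + 2.0 * k / (steps + 1)
        value, sigma2 = moment_objective(lam, residuals, wu, wwu, t)
        if best is None or value < best[0]:
            best = (value, lam, sigma2)
    return best[1], best[2]
```

In a complete procedure the residuals would come from an initial least-squares fit, and β would then be re-estimated by least squares on the spatially filtered variables y - λWy and X - λWX, using the estimated λ.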

Results
Java had widespread poverty among its regions. Through ESDA, we described the spatial distribution pattern, including patterns of spatial association and the identification of outliers in the poverty ratio. ESDA was applied based on: (1) the Global Moran's I statistic, which describes the effects of global spatial autocorrelation, and (2) the LISA cluster map, which describes the effects of local spatial autocorrelation, both using the spatial weights matrix based on modified queen contiguity. The calculation results in Table 1 showed that there were statistically significant spatial autocorrelation effects in poverty among regions in Java in 2013. This was indicated by a pseudo p-value of less than 5%. The table also indicated that I > E(I), which meant that the spatial pattern of poverty among regions in Java Island was clustered.
To determine which regions contributed significantly to the overall spatial association, we used the LISA cluster map. As shown in Fig. 5, there were three significant spatial distribution patterns of poverty based on the LISA calculation results.
To see the spatial effects on poverty in Java Island, we used a spatial regression model. Before determining the appropriate model, multicollinearity diagnostics using the Variance Inflation Factor (VIF) were conducted to see whether there were correlations among the predictor variables. The presence of multicollinearity among predictor variables inflates the standard errors and thus interferes with the results of the analysis. If the VIF value is less than 10, it can be concluded that there is no multicollinearity. If multicollinearity is found in the model, one solution is to remove one of the variables involved from the model. The goal is to remove information that is already represented by the other predictor variables. The results of the multicollinearity diagnostics are displayed in Table 3. From this table, the appropriate predictor variables for this study were obtained. The average length of school year variable (X2) was removed because it had VIF > 10. It is possible that average length of school year (X2) was correlated with the percentage of high school and university graduates (X3). After all the remaining predictor variables were free from multicollinearity, we built the spatial regression model. Before determining the appropriate model, we carried out model specification tests between SLM and SEM. The results of the model specification are displayed in Table 4.
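The VIF screening described above can be sketched as follows. The sketch relies on the fact that the VIF of predictor j, 1 / (1 - R_j²), equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The function names and the toy columns are hypothetical and unrelated to the study's data; only the threshold of 10 is taken from the text.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj) for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical predictors: the second column is almost a copy of the first,
# while the third is uncorrelated with both.
first = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
second = [x + d for x, d in zip(first, [0.1, -0.1] * 4)]
third = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
vifs = variance_inflation_factors([first, second, third])
flagged = [j for j, v in enumerate(vifs) if v > 10]
print(vifs, flagged)
```

Dropping one variable of a flagged pair and recomputing the VIFs mirrors the removal step described in the text.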
Model specification by the LM and Robust LM diagnostics showed that the Spatial Error Model (SEM) was better suited for this study: the Robust LM (lag) value was smaller than the Robust LM (error) value, and Robust LM (error) was more significant than Robust LM (lag). Meanwhile, based on the normality test results, the model errors in this study were not normally distributed. The result of the normality test using the Jarque-Bera test is displayed in Table 5. The p-value of the Jarque-Bera test was 0.0013, so the null hypothesis that the errors are normally distributed was rejected. The SEM was therefore estimated using the Generalized Method of Moments (GMM) estimator, because the normal distribution assumption for model errors was not met. Table 6 summarizes the model estimates. In terms of the estimated standard errors of the parameters, MLE produced slightly larger standard errors than GMM for the significant parameter estimates and slightly smaller standard errors for the non-significant parameter estimates. MLE and GMM also produced slightly different parameter estimates (i.e., $\hat{\lambda}$, $\hat{\beta}_1$, $\hat{\beta}_3$, $\hat{\beta}_4$, $\hat{\beta}_5$). GMM, which is free of distributional assumptions for the model errors, had a pseudo R-squared comparable to that of MLE. The pseudo R-squared of SEM GMM was 0.6002. The results indicated that GMM was better than MLE in terms of pseudo R-squared. Pseudo R-squared describes how close the data are to the fitted regression line. In this study, these results meant that the SEM GMM with the variables literacy rate (X1), percentage of high school and university graduates (X3), ratio of junior high school availability (X4), and ratio of senior high school availability (X5) could explain 60.02% of the variability of the poverty ratio in Java.
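The Jarque-Bera normality check mentioned above can be sketched with the standard library alone. The statistic combines the sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4); the p-value below uses the asymptotic chi-square distribution with two degrees of freedom, whose upper tail is exp(−JB/2). Whether the study used this asymptotic p-value is an assumption, and the function name and the two toy samples are hypothetical.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # upper tail of chi-square with 2 df
    return jb, p_value


# Hypothetical residuals: a symmetric sample and a sample with one extreme value.
jb_symmetric, p_symmetric = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
jb_skewed, p_skewed = jarque_bera([0.0] * 9 + [10.0])
print(jb_symmetric, p_symmetric)
print(jb_skewed, p_skewed)
```

A small p-value, as for the second sample, leads to rejecting normality, which is the situation in which the text turns from MLE to GMM.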
The variance of each parameter estimator ($\hat{\beta}_1$, $\hat{\beta}_3$, $\hat{\beta}_4$, $\hat{\beta}_5$, and $\hat{\lambda}$) was also computed for SEM MLE and SEM GMM (Table 7). Theoretically, MLE is the most efficient estimator (producing the lowest variance) if the normality assumption is met. In this study, however, MLE produced a much larger variance than GMM for the significant parameters. Under non-normality, GMM was better than MLE in terms of variance. These results indicated that GMM was as efficient as MLE and robust to non-normality.
Based on the results obtained in Table 6, the SEM equation was formed. The significance tests of the model parameter estimates in Table 6 showed that literacy rate (X1), percentage of high school and university graduates (X3), and the spatial error (λ) had a significant effect on the poverty ratio in Java in 2013. The significant coefficient λ indicated that the autoregressive process in the model error significantly influenced the regions' poverty ratios in Java in 2013. If literacy rate (X1), percentage of high school and university graduates (X3), ratio of junior high school availability (X4), and ratio of senior high school availability (X5) were all equal to zero, the poverty ratio in Java was estimated at 54.09%. Holding the other variables constant, an increase in the literacy rate of a region by 1 percentage point reduces the poverty ratio of that region by 0.3853 percentage points. Similarly, if the percentage of high school and university graduates in a region rises by 1 percentage point, the poverty ratio in the region falls by 0.1282 percentage points. The ratio of junior high school availability (X4) and the ratio of senior high school availability (X5) also had a negative relationship with the poverty ratio. That is, the higher the ratio of school availability for both junior and senior high schools, the lower the poverty ratio. However, neither availability ratio had a significant effect on the poverty ratio.
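As a worked illustration of how the two significant coefficients combine, consider a hypothetical region in which the literacy rate rises by 2 percentage points and the share of high school and university graduates rises by 5 percentage points, with everything else held constant. The increments are invented for this example; only the coefficients 0.3853 and 0.1282 come from the text.

```latex
\Delta \hat{Y} = -0.3853\,\Delta X_1 - 0.1282\,\Delta X_3
               = -0.3853 \times 2 - 0.1282 \times 5
               = -0.7706 - 0.6410
               = -1.4116
```

Under these assumptions the model predicts a poverty ratio about 1.41 percentage points lower for that region.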

Discussion
Based on the results in Table 2, regions with high poverty tended to be surrounded by regions with high poverty as well, and regions with low poverty tended to be surrounded by regions with similarly low poverty. This phenomenon has been described by Crandall and Weber [6], who argued that poverty has a spatial interaction. However, we also found outliers. There were three regions that belonged to low-high clusters, that is, low-poverty regions surrounded by high-poverty neighbors. The regions were:
• Tegal Regency (10.75%) … as high-poverty neighbors.
This condition might lead to two possible scenarios: the low-poverty region affects, or is affected by, its high-poverty neighbors. Which of these two scenarios is more likely depends on many factors.
One of the contributing factors is literacy rate. According to this model, an increase in literacy rate could significantly reduce the poverty ratio. As Murray and Shillington [19] describe, a person with low literacy skills tends to be less suitable for a job than those with higher literacy skills. Regions with a higher literacy rate have a population with a higher chance of entering the labor market and earning income, and thus of avoiding poverty. In aggregate, this can reduce the poverty ratio in a region. In addition, an increase in the percentage of high school and university graduates also has a significant impact on poverty alleviation. Silva [20] documents that poverty declines with increasing years of education. One additional year of education increases human capital, and the increased human capital reduces the likelihood of being in poverty.
The ratio of junior high school availability and the ratio of senior high school availability also had a negative relationship with the poverty ratio, but had no significant effect on it. This might be due to the difference in calculation approach between the poverty ratio and the school availability ratios. The poverty ratio calculation used a household approach, while the school availability ratios used an individual approach based on the school-age population (ages 13 to 15 for junior high school and 16 to 18 for senior high school). Increasing the ratio of school availability for junior and senior high schools would only increase the chance of a certain school-age population obtaining a certain level of education. When the number of poor households in a region decreases by one, its poverty ratio also decreases. Meanwhile, when a member of a household gains the opportunity to go to school and receive a certain education due to an increase in the number of schools, the poverty level in the region does not necessarily decrease, even though the school availability ratio changes.
Based on the resulting spatial error equation in Equation 23, the error term of a region rises by 0.5114 times its spatial weight for every one-unit rise in the error of a neighboring region. For example, Sampang Regency had a spatial weight of 0.50 with each of its neighboring areas (provided in Appendix A), and the spatial error equation of Sampang was:

$$u_{\text{Sampang}} = 0.5114 \sum_{j=1,\, i \neq j}^{3} 0.50\, u_j$$

$$u_{\text{Sampang}} = 0.2557\, u_{\text{Bangkalan}} + 0.2557\, u_{\text{Pamekasan}}$$

Sampang Regency had two neighboring regions: Bangkalan Regency and Pamekasan Regency. If the error ($u_j$) of one of these neighboring regions increased by 1 percentage point while the other was unchanged, the error term of Sampang Regency, and hence its poverty rate, would increase by 0.2557 percentage points; if both increased by 1 percentage point, the increase would be 0.5114 percentage points. These results meant that predictor variables other than the ones used in this study influenced poverty through the neighboring areas.
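The arithmetic implied by the Sampang equation can be checked with a short sketch. It evaluates u_i = λ Σ_j w_ij u_j for given changes in the neighbors' errors, treating those changes as given and ignoring any feedback from Sampang back to its neighbors. The function name is hypothetical; the coefficient 0.5114 and the weights of 0.50 are the values quoted in the text.

```python
def neighbor_effect(lam, weights, changes):
    """Change in u_i implied by u_i = lam * sum_j w_ij * u_j.

    weights maps neighbor name -> w_ij; changes maps neighbor name -> change
    in that neighbor's error (missing neighbors are treated as unchanged).
    """
    return lam * sum(w * changes.get(name, 0.0) for name, w in weights.items())


lam = 0.5114
sampang_weights = {"Bangkalan": 0.50, "Pamekasan": 0.50}

# One neighbor's error rises by one unit, the other stays fixed.
one_neighbor = neighbor_effect(lam, sampang_weights, {"Bangkalan": 1.0})
# Both neighbors' errors rise by one unit, so their average rises by one unit.
both_neighbors = neighbor_effect(
    lam, sampang_weights, {"Bangkalan": 1.0, "Pamekasan": 1.0}
)
print(round(one_neighbor, 4), round(both_neighbors, 4))
```

The same function applies to any region once its row of the spatial weights matrix is known.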
These results were consistent with those presented by Henninger and Snel [21], who argued that spatial variations in poverty level are often caused by factors with a spatial dimension in the surrounding areas. In this study, Sampang Regency was the region with the highest poverty rate and the lowest literacy rate and percentage of high school and university graduates. Furthermore, Sampang Regency was also surrounded by high-poverty regions. Meanwhile, Tangerang Selatan City was the region with the lowest poverty rate and a high literacy rate and percentage of high school and university graduates; Yogyakarta City and Cimahi City showed a similar profile. They were also surrounded by low-poverty regions.
Based on these results, the government can improve the efficiency and effectiveness of education and poverty alleviation policies by paying more attention to clusters of regions with high poverty and low education.

Conclusion
We compared the MLE and GMM parameter estimation methods for the spatial error model. Our results indicate that SEM using GMM estimation performs better than SEM using MLE in terms of pseudo R-squared and variance under non-normality. However, in terms of model specification, the results do not differ significantly between GMM and MLE. The parameter estimates and their standard errors differ only slightly. These results indicate that GMM is as efficient as MLE and robust to non-normality. Therefore, the selection of the parameter estimation method may depend on the distribution of the data and variables, as well as on the purpose of the specific research.
Based on these results, education indicators that have a significant impact on poverty alleviation in Java are literacy rate, average length of school year, and percentage of high school and university graduates. ESDA showed positive spatial autocorrelation in poverty among regions in Java in 2013, which formed clusters of poverty regions.
This study showcases GMM as an alternative parameter estimation method for spatial models besides the commonly used MLE, and compares the two methods. Future studies can also test other alternatives such as Quasi Maximum Likelihood.

Fig. 3. Connectivity between areas based on the modified queen contiguity weights matrix

Fig. 5. Poverty cluster in Java Island based on LISA cluster map

Table 1. Global Moran's I statistics calculation

Table 2. Significant spatial distribution patterns of poverty based on LISA cluster map

Table 5. Normality test result

Table 6. Model parameter estimates, estimated standard errors of the parameters, and pseudo R-squared of SEM by MLE and GMM

Table 7. Variance of estimators in SEM by MLE and GMM