Blog Post Four

2021-04-02
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Loading the Data

After our recent blog posts, we have a better understanding of the metrics and relationships between HIPC and OECD member countries. Building on those, we have decided to narrow our focus to HIPC countries and begin building a model to understand the factors that lead to GDP growth.

For this initial model-building exercise we have selected:

  • Adjusted net enrollment rate, primary (% of primary school age children)
  • Access to electricity (% of population)
  • Inflation, consumer prices (annual %)

To predict: GDP per capita (constant 2010 US$)

We believe these indicators, all available in the WDI database, give a reasonable picture of a country's development and should therefore be related to its GDP per capita.

data <- read.csv('~/Documents/courses/ma415/final-project-data/blog-4-data-final.csv')
full_subset <- data %>% select(X2003..YR2003.:X2018..YR2018.)
series_name <- data$Series.Name

We use a subset of the data where all factors are available for every year from 2003 through 2018, which gives 16 annual observations. We continue to clean and prepare the data for a simple multiple linear regression model.

access_elec   <- as.numeric(as.character(full_subset[1,]))

# We drop the following series because it is redundant in our model
#com_edu       <- as.numeric(as.character(full_subset[3,])) 

primary_enrol <- as.numeric(as.character(full_subset[2,])) 
gdp           <- as.numeric(as.character(full_subset[4,])) 
inflation     <- as.numeric(as.character(full_subset[5,])) 

# tibble() replaces the deprecated data_frame(); dplyr re-exports it
regression_data <- tibble(access_elec=access_elec, primary_enrol=primary_enrol, inflation=inflation, gdp=gdp)
head(regression_data)
## # A tibble: 6 x 4
##   access_elec primary_enrol inflation   gdp
##         <dbl>         <dbl>     <dbl> <dbl>
## 1        23.6          63.0      5.30  646.
## 2        24.3          65.6      4.57  662.
## 3        22.3          68.2      8.45  682.
## 4        25.8          70.7      6.78  704.
## 5        34.4          72.8      7.60  728.
## 6        28.0          76.0     11.0   750.
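
The extraction above picks each series by row position, so it depends on the rows of the CSV staying in the same order. As a minimal sketch of a more defensive alternative, the series could be looked up by name instead. The helper `get_series` and the toy data frame `toy` are hypothetical and only for illustration; the two series names come from the list above, the values are copied from the printed rows, and the `".."` missing-value marker is an assumption about how the export encodes gaps.

```r
# Toy stand-in for the export: one row per series, one column per year.
# Values are copied from the head() output above; ".." is an assumed
# missing-value marker, not something observed in the real file.
toy <- data.frame(
  Series.Name    = c("Access to electricity (% of population)",
                     "Inflation, consumer prices (annual %)"),
  X2003..YR2003. = c("23.6", "5.30"),
  X2004..YR2004. = c("24.3", ".."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one series as a numeric vector, ordered by year column.
get_series <- function(df, name) {
  row <- df[df$Series.Name == name, , drop = FALSE]
  stopifnot(nrow(row) == 1)  # fail loudly if the name is missing or duplicated
  year_cols <- setdiff(names(row), "Series.Name")
  # as.character() guards against columns that were read in as factors
  values <- vapply(row[, year_cols, drop = FALSE], as.character, character(1))
  values[values == ".."] <- NA  # treat the assumed marker as missing
  as.numeric(values)
}

access_elec_toy <- get_series(toy, "Access to electricity (% of population)")
inflation_toy   <- get_series(toy, "Inflation, consumer prices (annual %)")
```

With the real file, the same call would take `data` and the exact series name as it appears in `data$Series.Name`.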

Before fitting our model, we plot GDP against each individual predictor and observe generally linear trends:

plot(gdp ~ access_elec + primary_enrol + inflation, regression_data)
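
As an alternative to the base-graphics call above, the same check could be drawn with ggplot2 (loaded at the top of this post) as one panel per predictor. This is only a sketch: `demo` holds the six rows printed by `head(regression_data)`, rounded as shown, and the names `demo`, `long`, and `p` are introduced here for illustration.

```r
library(ggplot2)

# First six rows as printed by head(regression_data) above (rounded values).
demo <- data.frame(
  access_elec   = c(23.6, 24.3, 22.3, 25.8, 34.4, 28.0),
  primary_enrol = c(63.0, 65.6, 68.2, 70.7, 72.8, 76.0),
  inflation     = c(5.30, 4.57, 8.45, 6.78, 7.60, 11.0),
  gdp           = c(646, 662, 682, 704, 728, 750)
)

predictors <- c("access_elec", "primary_enrol", "inflation")

# Stack the predictors into long format: one row per (observation, predictor).
long <- data.frame(
  predictor = rep(predictors, each = nrow(demo)),
  value     = unlist(demo[predictors], use.names = FALSE),
  gdp       = rep(demo$gdp, times = length(predictors))
)

# One scatterplot per predictor, each with a straight-line fit for reference.
p <- ggplot(long, aes(x = value, y = gdp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ predictor, scales = "free_x") +
  labs(x = "Predictor value", y = "GDP per capita (constant 2010 US$)")
print(p)
```

Replacing `demo` with `regression_data` would plot all 16 years instead of the six shown here.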

Fitting the Model

fit <- lm(gdp ~ access_elec + primary_enrol + inflation, data = regression_data)
summary(fit) # show results
## 
## Call:
## lm(formula = gdp ~ access_elec + primary_enrol + inflation, data = regression_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.268  -4.092   1.968   8.525  23.707 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -65.367     77.150  -0.847  0.41342    
## access_elec      4.873      1.230   3.962  0.00189 ** 
## primary_enrol    9.756      1.354   7.204 1.08e-05 ***
## inflation       -5.906      2.566  -2.302  0.04005 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.68 on 12 degrees of freedom
## Multiple R-squared:  0.9697, Adjusted R-squared:  0.9622 
## F-statistic: 128.2 on 3 and 12 DF,  p-value: 2.225e-09
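
As a quick sanity check on the last lines of this output, the adjusted R-squared and the F-statistic can be recomputed from the reported R-squared, the number of observations (12 residual degrees of freedom plus 4 estimated coefficients gives 16), and the number of predictors. The variable names below are ours, and because the inputs are the rounded values printed above, the results should only match the summary to about three significant figures.

```r
r2 <- 0.9697  # Multiple R-squared, as printed
n  <- 16      # observations: 12 residual df + 4 estimated coefficients
p  <- 3       # predictors, excluding the intercept

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained variance per predictor over
# unexplained variance per residual degree of freedom.
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

round(c(adj_r2 = adj_r2, f_stat = f_stat), 4)
```

Both values land close to the reported 0.9622 and 128.2; the small gap comes from using the rounded R-squared.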

After fitting the model, we can see from the very high R-squared value (0.97) that the model fits the data very well. We also observe that each covariate is individually significant at the 5% level (95% confidence). The coefficient estimates also follow intuition: as primary enrollment and electricity access increase, so does economic activity measured as GDP per capita. Inflation is observed to have a negative association with GDP, so the model captures both positive and negative relationships in our data.
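
To make the coefficients concrete, here is a small worked prediction for the first year in the data, using the estimates from the summary and the first row printed by `head(regression_data)`. All numbers are the rounded values shown above, so the result is approximate, and the variable names are ours.

```r
# Coefficients copied from the summary(fit) output above (rounded as printed).
coefs <- c(intercept = -65.367, access_elec = 4.873,
           primary_enrol = 9.756, inflation = -5.906)

# First row of head(regression_data), also rounded as printed.
obs <- c(access_elec = 23.6, primary_enrol = 63.0, inflation = 5.30)
actual_gdp <- 646

# Linear prediction: intercept plus each coefficient times its predictor value.
predicted <- coefs[["intercept"]] + sum(coefs[names(obs)] * obs)
residual  <- actual_gdp - predicted

round(c(predicted = predicted, residual = residual), 1)
```

The prediction comes out at roughly 633 against an observed value of about 646, a residual of around 13 that sits inside the residual range reported in the summary. With the fitted object available, `predict(fit, newdata = regression_data[1, ])` performs the same calculation at full precision.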

We believe this is a good starting point for the modeling we can do with this database, and we will continue to explore more complex and interesting relationships in the data.