Blog Post Four

2021-04-02

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

Loading the Data

After our recent blog posts, we have a better understanding of the metrics and relationships between HIPC and OECD member countries. Building on those, we have decided to narrow our focus to HIPC countries and begin building a model to understand the factors that lead to GDP growth.

For this initial model-building exercise we have selected:

Adjusted net enrollment rate, primary (% of primary school age children)
Access to electricity (% of population)
Inflation, consumer prices (annual %)

To predict: GDP per capita (constant 2010 US$)

We believe these factors, all provided in the WDI database, give a reasonable picture of a country's economic development and should therefore affect its GDP.

data <- read.csv('~/Documents/courses/ma415/final-project-data/blog-4-data-final.csv')

full_subset <- data %>% select(X2003..YR2003.:X2018..YR2018.)
series_name <- data$Series.Name

We use a subset of the data where all factors are available over a period of almost two decades (2003 to 2018). We then continue to clean and prepare the data for a simple multivariable regression model.

access_elec   <- as.numeric(as.character(full_subset[1,]))

# We drop the following series as it is redundant in our model
#com_edu       <- as.numeric(as.character(full_subset[3,])) 

primary_enrol <- as.numeric(as.character(full_subset[2,])) 
gdp           <- as.numeric(as.character(full_subset[4,])) 
inflation     <- as.numeric(as.character(full_subset[5,])) 

# tibble() replaces the deprecated data_frame(); dplyr re-exports it
regression_data <- tibble(access_elec = access_elec, primary_enrol = primary_enrol, inflation = inflation, gdp = gdp)

head(regression_data)

## # A tibble: 6 x 4
##   access_elec primary_enrol inflation   gdp
##         <dbl>         <dbl>     <dbl> <dbl>
## 1        23.6          63.0      5.30  646.
## 2        24.3          65.6      4.57  662.
## 3        22.3          68.2      8.45  682.
## 4        25.8          70.7      6.78  704.
## 5        34.4          72.8      7.60  728.
## 6        28.0          76.0     11.0   750.
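
The row positions used above (1, 2, 4, 5) depend on the order of the series in the CSV file. A minimal sketch of one alternative, looking each series up by its `Series.Name`, is shown below. The toy data frame and the helper `get_series()` are hypothetical stand-ins introduced for illustration; only the series names and the year-column naming pattern come from this post.

```r
# Toy stand-in for the WDI export: one row per series, one column per year.
# Values are rounded from the head() output above; the real file has more
# series and more years.
toy <- data.frame(
  Series.Name = c("Access to electricity (% of population)",
                  "GDP per capita (constant 2010 US$)"),
  X2003..YR2003. = c("23.6", "646"),
  X2004..YR2004. = c("24.3", "662"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one series as a numeric vector, matched by name.
get_series <- function(df, name) {
  row <- df[df$Series.Name == name, , drop = FALSE]
  stopifnot(nrow(row) == 1)  # fail loudly if the name is missing or duplicated
  as.numeric(unlist(row[, names(row) != "Series.Name"]))
}

access_elec_toy <- get_series(toy, "Access to electricity (% of population)")
gdp_toy         <- get_series(toy, "GDP per capita (constant 2010 US$)")
```

Matching on the name means the extraction would still pick the intended series if the rows in the export were reordered.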

Before fitting our model, we plot GDP against each individual predictor and observe generally linear trends:

plot(gdp ~ access_elec + primary_enrol + inflation, regression_data)
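
As a rough numeric companion to the plots, the strength of each linear association can be summarised with a Pearson correlation. The sketch below uses only the six rows shown in the `head()` output above (rounded as printed), so it illustrates the calculation and is not a result for the full 2003 to 2018 data; the names `first_six` and `cor_with_gdp` are introduced here for illustration.

```r
# First six observations, copied (rounded) from the head() output above.
first_six <- data.frame(
  access_elec   = c(23.6, 24.3, 22.3, 25.8, 34.4, 28.0),
  primary_enrol = c(63.0, 65.6, 68.2, 70.7, 72.8, 76.0),
  inflation     = c(5.30, 4.57, 8.45, 6.78, 7.60, 11.0),
  gdp           = c(646, 662, 682, 704, 728, 750)
)

# Pearson correlation of each predictor with gdp; values near +1 or -1
# indicate a strong linear association.
cor_with_gdp <- cor(first_six)[c("access_elec", "primary_enrol", "inflation"), "gdp"]
print(round(cor_with_gdp, 2))
```

Running the same two lines on `regression_data` would give the corresponding figures for all sixteen years.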

Fitting the Model

fit <- lm(gdp ~ access_elec + primary_enrol + inflation, data = regression_data)
summary(fit) # show results

## 
## Call:
## lm(formula = gdp ~ access_elec + primary_enrol + inflation, data = regression_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.268  -4.092   1.968   8.525  23.707 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -65.367     77.150  -0.847  0.41342    
## access_elec      4.873      1.230   3.962  0.00189 ** 
## primary_enrol    9.756      1.354   7.204 1.08e-05 ***
## inflation       -5.906      2.566  -2.302  0.04005 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.68 on 12 degrees of freedom
## Multiple R-squared:  0.9697, Adjusted R-squared:  0.9622 
## F-statistic: 128.2 on 3 and 12 DF,  p-value: 2.225e-09
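
The `t value` and `Pr(>|t|)` columns in the table above follow directly from the estimates and standard errors. As a sketch, they can be reproduced by hand from the printed (rounded) numbers; the variable names below are introduced for illustration, and small differences from the table are expected because of rounding.

```r
# Estimates and standard errors copied from the summary(fit) table above.
estimate  <- c(access_elec = 4.873, primary_enrol = 9.756, inflation = -5.906)
std_error <- c(access_elec = 1.230, primary_enrol = 1.354, inflation = 2.566)
df_resid  <- 12  # residual degrees of freedom reported by summary(fit)

# t statistic: estimate divided by its standard error.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with df_resid degrees of freedom.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(signif(p_value, 3))
```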

After fitting the model, we can see from the very high R-squared value (0.97) that the model fits these 16 annual observations very well. We also observe that all three covariates are individually significant at the 5% level. The coefficient estimates follow intuition: as primary enrollment and electricity access increase, so does economic activity as measured by GDP per capita. Inflation is observed to have a negative association with GDP, so the model captures both positive and negative relationships in our data.
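
To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below plugs the first row of the `head()` output into the rounded coefficients from the summary table; because both sets of numbers are rounded, the result only approximates what `predict(fit)` would return. The names `coefs`, `x`, `predicted_gdp` and `residual` are introduced here for illustration.

```r
# Coefficients copied (rounded) from the summary(fit) table above.
coefs <- c(intercept = -65.367, access_elec = 4.873,
           primary_enrol = 9.756, inflation = -5.906)

# First observation from the head() output above (rounded values).
x <- c(access_elec = 23.6, primary_enrol = 63.0, inflation = 5.30)

# Linear predictor: intercept plus the sum of coefficient * covariate.
predicted_gdp <- coefs[["intercept"]] + sum(coefs[names(x)] * x)
residual      <- 646 - predicted_gdp  # observed gdp minus fitted value

print(round(predicted_gdp, 1))  # about 633
print(round(residual, 1))       # about 13
```

Read this way, each additional percentage point of electricity access is associated with roughly 4.87 more dollars of GDP per capita, holding the other two covariates fixed.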

We believe this is a good starting point for the modeling we can do with this database, and we will continue to explore more complex and interesting relationships in the data.
