PREDICTING WAGES

In this segment, we will work through a real-world example: predicting workers' wages with a linear combination of their characteristics, and assessing the predictive performance of these prediction rules using the mean squared error (MSE) and R-squared, together with their adjusted versions, as well as the out-of-sample MSE and R-squared.
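As a rough illustration of these metrics, the sketch below fits a linear prediction rule by ordinary least squares with NumPy. It computes the in-sample MSE and R-squared, their adjusted versions, and the out-of-sample MSE and R-squared from a random train/test split. It assumes the common degrees-of-freedom adjustment MSE_adj = n / (n - p) * MSE and R2_adj = 1 - MSE_adj / Var(Y), where p counts the regressors including the intercept. The helper names, the 80/20 split, and the synthetic data are illustrative choices, not the CPS sample.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def in_sample_metrics(X, y):
    n, p = X.shape
    resid = y - X @ fit_ols(X, y)
    mse = np.mean(resid ** 2)
    mse_adj = n / (n - p) * mse          # degrees-of-freedom correction
    r2 = 1 - mse / np.var(y)
    r2_adj = 1 - mse_adj / np.var(y)
    return {"mse": mse, "mse_adj": mse_adj, "r2": r2, "r2_adj": r2_adj}

def out_of_sample_metrics(X, y, test_frac=0.2, seed=0):
    # Hypothetical random split: fit on the training part, evaluate on the held-out part
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    beta = fit_ols(X[train], y[train])
    mse_test = np.mean((y[test] - X[test] @ beta) ** 2)
    r2_test = 1 - mse_test / np.var(y[test])
    return {"mse_test": mse_test, "r2_test": r2_test}

# Synthetic data purely for illustration (not the CPS sample)
rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(scale=0.5, size=n)

ins = in_sample_metrics(X, y)
oos = out_of_sample_metrics(X, y)
print(ins)
print(oos)
```

The adjusted MSE is always slightly larger than the raw MSE, so it penalizes adding regressors; the out-of-sample figures measure performance on data the rule never saw during fitting.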

The data comes from the March supplement of the U.S. Current Population Survey (CPS) for 2012. It focuses on single (never-married) workers whose education level is high school graduate, some college, or college graduate. The sample size is approximately 4,000.

The outcome variable Y is the hourly wage, and the X’s are various characteristics of workers, such as gender, experience, education, and geographical (region) indicators.

Data Dictionary

The dataset contains the following variables:

  1. wage : hourly wage
  2. female : female dummy
  3. cg : college graduate dummy
  4. sc : some college dummy
  5. hsg : high school graduate dummy
  6. mw : midwest dummy
  7. so : south dummy
  8. we : west dummy
  9. ne : northeast dummy
  10. exp1 : experience (years)
  11. exp2 : experience squared, divided by 100 (exp1^2 / 100)
  12. exp3 : experience cubed, divided by 1000 (exp1^3 / 1000)
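The scaled polynomial terms are easy to reproduce from exp1, which also gives a quick consistency check on the file. This is a minimal pandas sketch, assuming a DataFrame with an exp1 column; the function name add_experience_polynomials is hypothetical. Dividing by 100 and 1000 keeps the squared and cubed terms on a scale closer to exp1 (for 20 years of experience, exp2 = 4 and exp3 = 8 rather than 400 and 8000), which is presumably why the dataset stores them this way.

```python
import pandas as pd

def add_experience_polynomials(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with rescaled polynomial experience terms.

    Follows the data dictionary: exp2 = exp1**2 / 100, exp3 = exp1**3 / 1000.
    """
    out = df.copy()
    out["exp2"] = out["exp1"] ** 2 / 100
    out["exp3"] = out["exp1"] ** 3 / 1000
    return out

example = add_experience_polynomials(pd.DataFrame({"exp1": [0.0, 10.0, 20.0]}))
print(example)
```

On the real data, comparing these recomputed columns with the stored exp2 and exp3 (for example with np.allclose) should show whether the dictionary's definitions hold.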

Importing libraries
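A minimal set of imports for the analysis, assuming pandas and NumPy (a plotting library such as Matplotlib could be added for charts). The file name of the CPS extract is not given here, so the loader below accepts any path or file-like object that pd.read_csv understands. load_wage_data and EXPECTED_COLUMNS are hypothetical names, and the four-row in-memory CSV is made-up stand-in data, not real CPS records.

```python
import io

import numpy as np
import pandas as pd

# Column names taken from the data dictionary
EXPECTED_COLUMNS = ["wage", "female", "cg", "sc", "hsg", "mw", "so", "we", "ne",
                    "exp1", "exp2", "exp3"]

def load_wage_data(source) -> pd.DataFrame:
    """Read the wage data and fail early if any dictionary column is missing."""
    df = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df

# Tiny in-memory stand-in for the real file (made-up values)
toy_csv = io.StringIO(
    "wage,female,cg,sc,hsg,mw,so,we,ne,exp1,exp2,exp3\n"
    "9.6,1,0,0,1,0,1,0,0,7,0.49,0.343\n"
    "21.5,0,1,0,0,0,0,1,0,10,1.0,1.0\n"
    "14.2,1,0,1,0,1,0,0,0,5,0.25,0.125\n"
    "30.0,0,1,0,0,0,0,0,1,20,4.0,8.0\n"
)
data = load_wage_data(toy_csv)
print(data.head())
```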

Checking the info of the dataset
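In pandas, DataFrame.info() prints each column's dtype and non-null count. The sketch below wraps it together with an explicit missing-value count and a check that the dummy columns contain only 0s and 1s. The helper name check_dataset and the toy DataFrame (with a deliberately bad value in cg) are illustrative assumptions.

```python
import numpy as np
import pandas as pd

DUMMY_COLS = ["female", "cg", "sc", "hsg", "mw", "so", "we", "ne"]

def check_dataset(df: pd.DataFrame):
    df.info()  # prints dtypes, non-null counts and memory usage
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
    })
    # Dummy columns (among those present) holding anything other than 0/1, ignoring NaNs
    non_binary = [
        c for c in DUMMY_COLS
        if c in df.columns and not df[c].dropna().isin([0, 1]).all()
    ]
    return report, non_binary

toy = pd.DataFrame({
    "wage": [9.6, np.nan, 14.2],
    "female": [1, 0, 1],
    "cg": [0, 1, 2],          # 2 is an invalid dummy value, on purpose
    "exp1": [7, 10, 5],
})
report, non_binary = check_dataset(toy)
print(report)
print("non-binary dummies:", non_binary)
```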

Univariate Analysis

Checking the summary statistics of the dataset
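DataFrame.describe() produces the usual summary statistics. Transposing it puts one variable per row, which is easier to read with a dozen columns. For the 0/1 dummies, the mean is simply the share of workers in that group. The toy values below are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "wage": [9.6, 21.5, 14.2, 30.0],
    "female": [1, 0, 1, 0],
    "exp1": [7, 10, 5, 20],
})
# One row per variable: count, mean, std, min, 25%, 50%, 75%, max
summary = toy.describe().T
print(summary.round(2))

# For a dummy, the mean is the sample share with that attribute (here, women)
female_share = toy["female"].mean()
print("share female:", female_share)
```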

Bivariate Analysis

Let's first look at the relationship between experience and wage.
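A scatter plot of wage against exp1 is the usual visual check; a numeric complement is the correlation plus the mean wage within experience bands. Wages commonly rise with experience at a decreasing rate, which is one motivation for the exp2 and exp3 terms. The toy data below is made up and deliberately shaped that way, and the band edges are arbitrary assumptions.

```python
import pandas as pd

toy = pd.DataFrame({
    "exp1": [1, 3, 5, 8, 12, 15, 20, 25],
    "wage": [8.0, 9.5, 11.0, 13.0, 15.5, 16.0, 17.0, 16.5],
})

# Pearson correlation between experience and wage
corr = toy["exp1"].corr(toy["wage"])

# Mean wage per experience band: (0, 5], (5, 10], (10, 20], (20, 40]
bands = pd.cut(toy["exp1"], bins=[0, 5, 10, 20, 40])
mean_by_band = toy.groupby(bands, observed=False)["wage"].mean()

print(f"correlation: {corr:.2f}")
print(mean_by_band)
```

A band profile that climbs and then levels off suggests the relationship is not linear in exp1 alone.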

Now let's make a list of the dummy columns and check their relationship with wage.
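One way to do this is to compare the mean wage when each dummy equals 1 with the mean when it equals 0. The sketch below assumes the column names from the data dictionary; the helper wage_by_dummy and the four-row toy data are hypothetical. It also checks that every worker belongs to exactly one education group and one region.

```python
import pandas as pd

DUMMY_COLS = ["female", "cg", "sc", "hsg", "mw", "so", "we", "ne"]

def wage_by_dummy(df: pd.DataFrame, cols, target="wage") -> pd.DataFrame:
    """Mean of `target` for dummy == 0 and dummy == 1, plus their difference."""
    rows = {}
    for col in cols:
        means = df.groupby(col)[target].mean()
        rows[col] = {
            "mean_if_0": means.get(0, float("nan")),
            "mean_if_1": means.get(1, float("nan")),
        }
    out = pd.DataFrame.from_dict(rows, orient="index")
    out["diff"] = out["mean_if_1"] - out["mean_if_0"]
    return out

# Made-up toy rows, not CPS data
toy = pd.DataFrame({
    "wage":   [9.6, 21.5, 14.2, 30.0],
    "female": [1, 0, 1, 0],
    "cg":     [0, 1, 0, 1],
    "sc":     [0, 0, 1, 0],
    "hsg":    [1, 0, 0, 0],
    "mw":     [0, 0, 1, 0],
    "so":     [1, 0, 0, 0],
    "we":     [0, 1, 0, 0],
    "ne":     [0, 0, 0, 1],
})
gaps = wage_by_dummy(toy, DUMMY_COLS)
print(gaps.round(2))

# Each worker should fall in exactly one education group and one region
edu_ok = (toy[["cg", "sc", "hsg"]].sum(axis=1) == 1).all()
region_ok = (toy[["mw", "so", "we", "ne"]].sum(axis=1) == 1).all()
```

If both checks pass, the education dummies and the region dummies each sum to one, so one dummy from each group (for example hsg and ne) would normally be left out as the baseline when the regression includes an intercept; otherwise they are perfectly collinear with the constant.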