This project was a class assignment completed in R. We examined the relationship between math scores and gender, race, and socioeconomic status (ses). We first describe the relationships among the variables through data visualization, then look for the best linear model to predict math scores from gender, race, and socioeconomic status.
Dataset Visualization
In general, math and science scores have a linear relationship. White students have higher average math and science scores than other students. The higher a student's socioeconomic status, the higher the math score tends to be. Most African American students have low socioeconomic status, while a large share of White students have above-middle socioeconomic status.
We therefore applied linear regression to these features.
Linear Regression
We used math score as the dependent variable and the remaining variables as independent variables to build a linear regression model. Here are the diagnostic plots.
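As a rough illustration of what fitting a regression on a categorical predictor involves, the sketch below dummy-codes one factor against a reference level and solves the least-squares normal equations. The project itself was done in R; this is plain Python with made-up toy data, and the helper names (`dummy_code`, `solve`, `ols`) are hypothetical.

```python
# Illustrative sketch only: toy data, not the project's dataset.

def dummy_code(levels, reference):
    """Build rows of [intercept, one 0/1 column per non-reference level]."""
    others = sorted(set(levels) - {reference})
    rows = [[1.0] + [1.0 if lv == name else 0.0 for name in others]
            for lv in levels]
    return others, rows

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical scores for three ses levels, two students each.
ses = ["low", "low", "middle", "middle", "high", "high"]
scores = [40.0, 44.0, 50.0, 52.0, 57.0, 59.0]

names, X = dummy_code(ses, reference="low")
coefs = ols(X, scores)  # [intercept, coefficient for each name in `names`]
```

With a single factor, the intercept equals the mean of the reference group and each dummy coefficient equals that group's mean minus the reference mean. With several predictors, as in the model here, each coefficient is instead a difference adjusted for the other variables.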
We selected race and ses as predictors using backward elimination. In the diagnostic plots above, the residual plot shows a visible pattern, so the residuals do not fully satisfy the model assumptions. The Q-Q plot looks acceptable apart from a few outliers, and the remaining plots also suggest that several outliers exist. After checking the studentized residuals and Cook's distance, we set 2 and 0.04 as the respective cutoffs for identifying outliers in the data.
From the coefficients, for a given ses, Asian students are predicted to score higher in math than other students, and White students higher than Hispanic students. For a given race, the higher a student's socioeconomic status, the higher the predicted math score. In the raw data the highest math score is 75, and under the fitted model the highest scores are predicted for students who are Asian or White and have high socioeconomic status. On that basis, we treat the selected observations as outliers.
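The outlier screen described above can be sketched as follows. This is a minimal illustration in plain Python (the project itself used R), under several assumptions that do not come from the source: a single numeric predictor, so that leverage has a closed form; internally studentized residuals, since the source does not say which variant was used; and an observation is flagged if either cutoff is exceeded. The function names and toy data are hypothetical; only the cutoffs 2 and 0.04 are from the text.

```python
from math import sqrt

def influence_measures(x, y):
    """Return (studentized residual, Cook's distance) for each observation
    of a simple linear regression y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    out = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx   # leverage of this point
        r = e / sqrt(s2 * (1.0 - h))            # internally studentized residual
        d = r * r * h / (p * (1.0 - h))         # Cook's distance
        out.append((r, d))
    return out

def flag_outliers(x, y, resid_cut=2.0, cook_cut=0.04):
    """Indices where |studentized residual| or Cook's distance exceeds its cutoff.
    Combining the two rules with 'or' is an assumption."""
    return [i for i, (r, d) in enumerate(influence_measures(x, y))
            if abs(r) > resid_cut or d > cook_cut]

# Toy data: an exact line with one observation shifted upward.
x = [float(v) for v in range(1, 21)]
y = [2.0 * v + 1.0 for v in x]
y[10] += 15.0

flagged = flag_outliers(x, y)
```

For a model with several dummy-coded predictors, the leverage comes from the diagonal of the hat matrix rather than the closed form used here, but the two formulas for the studentized residual and Cook's distance keep the same shape.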
Here are the coefficients of all variables:
From the coefficients above, holding the other variables fixed, a male student is predicted to score 0.2919 points higher in math than a female student. Likewise, holding the other variables fixed, a student with high socioeconomic status is predicted to score 5.2239 points higher than a student with low socioeconomic status, and so on.
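The "holding the other variables fixed" reading can be checked with a little arithmetic. The sketch below (plain Python, although the project used R) assumes female and low ses are the reference levels, which is consistent with how the two coefficients are described. The dummy names `gender_male` and `ses_high` are hypothetical labels. The intercept and all other terms are omitted because they cancel when two profiles differ in only one variable.

```python
# The two coefficients quoted in the text; names are hypothetical labels.
COEF = {"gender_male": 0.2919, "ses_high": 5.2239}

def dummy_contribution(profile):
    """Sum of coefficient * dummy value over the two quoted terms."""
    return sum(COEF[name] * profile.get(name, 0) for name in COEF)

# Profiles that differ from the reference (female, low ses) in one variable.
female_low = {"gender_male": 0, "ses_high": 0}
male_low = {"gender_male": 1, "ses_high": 0}
female_high = {"gender_male": 0, "ses_high": 1}

# Difference in predicted math score between each profile and the reference.
gender_gap = dummy_contribution(male_low) - dummy_contribution(female_low)
ses_gap = dummy_contribution(female_high) - dummy_contribution(female_low)
```

Each gap equals the corresponding coefficient, which is why a dummy coefficient can be read directly as the predicted difference from the reference group.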
Math Scores Differ Between Races
We tested whether there is any significant difference in math scores between population groups. We discuss race here.
In the boxplot, the difference between races is obvious. We converted race to a factor with as.factor before applying the Kruskal-Wallis (K-W) test. The resulting p-value is less than 0.05, which indicates that there are significant differences in math scores between the population groups.
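To show what the K-W test computes, here is a minimal sketch of the H statistic in plain Python (the project itself used R). It makes several simplifying assumptions: toy data with three groups and no tied values, no tie correction, and a p-value from the chi-square approximation with 2 degrees of freedom, for which the upper tail is exactly exp(-H/2). The function names and scores are hypothetical.

```python
from math import exp

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

# Hypothetical math scores for three groups (not the project's data).
groups = [
    [41.0, 43.0, 45.0, 47.0],
    [50.0, 52.0, 54.0, 56.0],
    [60.0, 62.0, 64.0, 66.0],
]

h = kruskal_wallis_h(groups)
# Chi-square upper tail with df = 3 groups - 1 = 2.
p_value = exp(-h / 2.0)
```

The test works on ranks rather than raw scores, so it does not rely on the normality assumption that the regression diagnostics called into question. With more than three groups the degrees of freedom change and the simple exp(-H/2) shortcut no longer applies.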
Summary
For the relationship between math scores and race, gender, and ses, the regression coefficients of all variables are shown in part 3.3.B. Overall, given ses and race, male students have higher math scores than female students. Given gender and race, students from a higher socioeconomic status have higher math scores. Given gender and ses, Asian students have higher scores than the other groups, and White students have higher scores than African American students.