---
title: "Problem Set 06"
author: "WRITE YOUR NAME HERE"
date: "2018-03-06"
output:
  html_document:
    highlight: tango
    theme: cosmo
    toc: yes
    toc_depth: 2
    toc_float:
      collapsed: false
    df_print: kable
---
```{r, include=FALSE}
# Do not edit this code block/chunk
knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning = FALSE, fig.width = 16/2, fig.height = 9/2)
```

Load necessary packages:

```{r}
library(ggplot2)
library(dplyr)
library(moderndive)
library(readr)
```

# Collaboration:

Please indicate who you collaborated with on this problem set:

# Question 1: Ch 6 Learning Checks

You'll be completing slightly modified versions of the Learning Checks from
Chapter 6 of ModernDive. Recall that we are using the `evals` dataset with:

* Outcome variable $y$ = teaching score (`score`)
* Explanatory variable $x$ = age (`age`)

```{r}
# Load the evals data and keep only the variables used in this problem set
load(url("http://www.openintro.org/stat/data/evals.RData"))
evals <- evals %>%
  select(score, bty_avg, age) %>%
  as_tibble()
```

## LC 6.1: EDA

Write the code to perform the following exploratory data analysis:

1. Create a visualization that allows you to conduct an "eyeball" test of the
relationship between teaching score and age.
1. Compute a summary statistic of the strength of the linear association between
these two variables.

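
For reference, one standard summary statistic of the strength of a linear association is the correlation coefficient $r$. For $n$ pairs $(x_i, y_i)$ with sample means $\bar{x}$ and $\bar{y}$, it is defined as:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

$r$ always lies between $-1$ and $1$. Values near $\pm 1$ indicate a strong linear association, values near $0$ a weak one.
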
```{r}
```

Based on these two outputs, comment on the relationship between teaching score
and instructor age for these 463 courses at the University of Texas at Austin.
Do this in three sentences or fewer below:


## LC 6.2: Regression table

Do the following in the code block below:

1. Fit the corresponding regression model and save it as `score_model_age`.
1. Output the regression table.

```{r}
```

**Part a)** For both the slope and intercept coefficients, state the numerical
value and its interpretation below. You can "sanity-check" these results by
comparing them with the visualization above.

1. Intercept $b_0$:
1. Slope $b_1$:
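
As a reminder of where these interpretations come from: the fitted regression line is $\widehat{\text{score}} = b_0 + b_1 \cdot \text{age}$. Setting age to $0$ gives $\widehat{\text{score}} = b_0$, and comparing two instructors whose ages differ by one year gives

$$
\left(b_0 + b_1 (\text{age} + 1)\right) - \left(b_0 + b_1 \cdot \text{age}\right) = b_1,
$$

so $b_1$ is the associated change in the fitted teaching score for each one-year increase in age.
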

**Part b)** Say a new instructor of age 45 joins the faculty at UT Austin.
Knowing nothing else about this instructor, what would you predict their
teaching score to be? Show your work:

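
One way to lay out the work is to plug age $= 45$ into the fitted line, using the intercept $b_0$ and slope $b_1$ from your regression table (left symbolic here):

$$
\widehat{\text{score}} = b_0 + b_1 \cdot 45
$$
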

## LC 6.3: Residual analysis

In the code block below, output the two visualizations that allow us to
investigate the presence of a drastic systematic pattern in the residuals.

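
For reference, the residual for the $i$-th observation is the observed outcome minus its fitted value:

$$
\text{residual}_i = \text{score}_i - \widehat{\text{score}}_i = \text{score}_i - \left(b_0 + b_1 \cdot \text{age}_i\right)
$$

A residual is positive when the line underestimates the teaching score and negative when it overestimates it.
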
```{r}
```

Based on these two visualizations, would you say there is a drastic systematic
pattern in the residuals? Some subjectivity in your response is expected.

# Question 2: BONUS QUESTION

Recall the following data/visualization from Question 1 on the [practice
midterm](https://rudeboybert.github.io/STAT135/static/Midterm-I.pdf):

```{r}
# Read the coffee shop data, then plot shops per 1000 people against median
# income with one panel and one fitted regression line per Type
DD_vs_SB <- read_csv("https://rudeboybert.github.io/STAT135/static/PS/DD_vs_SB.csv")
ggplot(DD_vs_SB, aes(x = med_inc, y = shops_per_1000, col = Type)) +
  geom_point() +
  facet_wrap(~Type) +
  geom_smooth(method = "lm", se = FALSE, col = "blue") +
  labs(x = "Median Household Income", y = "# of shops per 1000 people",
       title = "Coffee/Cafe Comparison in Eastern MA") +
  scale_color_manual(values = c("orange", "forestgreen"))
```

Write the code in the code block below that will allow you to answer the
following two questions:

1. For every \$10K increase in median income, what is the associated average
**decrease** in the number of Dunkin Donuts shops per 1000 individuals?
1. For every \$10K increase in median income, what is the associated average
**increase** in the number of Starbucks shops per 1000 individuals?

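
A hint on units, assuming `med_inc` is recorded in dollars (check the data to confirm): a fitted slope $b_1$ is then the associated change in shops per 1000 people for a \$1 increase in median income, so for a \$10K increase the associated change in the fitted value $\hat{y}$ is

$$
\Delta \hat{y} = b_1 \times 10{,}000.
$$

Because `geom_smooth()` fits a separate line within each facet, each `Type` has its own slope $b_1$.
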
```{r}
```

Write your answers here:

1. Dunkin Donuts:
1. Starbucks: