Load Necessary Data and Packages
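
The loading chunk for this page is not shown, so the following is only an assumed minimal setup, inferred from the functions and data used in the solutions below: filter() comes from dplyr, the plotting functions from ggplot2, and the babynames data frame from the babynames package.

```r
# Assumed setup (the original loading chunk is not shown on this page).
library(dplyr)      # filter()
library(ggplot2)    # ggplot(), geom_line(), geom_smooth()
library(babynames)  # babynames data frame (year, sex, name, n, prop)
```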

LC 4.7-4.8

  • Why should line graphs be avoided when there is not a clear ordering of the horizontal axis?
  • Why are line graphs frequently used when time is the explanatory variable?

Solution

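  • Because lines suggest connectedness and correlation between neighboring points, which is misleading when the horizontal axis has no clear ordering. More on this later.
  • Because time has a natural order, so joining consecutive observations shows how the variable changes from one moment to the next.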

LC 4.9

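# Keep only the rows for girls named Audrey
female_audreys <- filter(babynames, name=="Audrey" & sex=="F")
# Plot the yearly proportion as a line and overlay a smoother;
# a smaller span follows the data more closely
ggplot(data=female_audreys, aes(x=year, y=prop)) +
  geom_line() +
  geom_smooth(se=FALSE, span=0.1)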

  • Set span=10 in the last code block above. Does this appear to be a good smoother?

Solution

In my opinion, span=10 is too coarse. It smooths away nearly all of the local ups and downs in the series:

ggplot(data=female_audreys, aes(x=year, y=prop)) + 
  geom_line() +
  geom_smooth(se=FALSE, span=10)

But maybe span=0.1 is TOO refined, i.e. not enough smoothing is happening. What about span=1:

ggplot(data=female_audreys, aes(x=year, y=prop)) + 
  geom_line() +
  geom_smooth(se=FALSE, span=1)
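
To put rough numbers on the three spans, here is a minimal sketch using loess() from base R, which is the method geom_smooth() uses by default for small data sets like this one. The data are synthetic (a made-up wavy series, not the babynames data), and the names toy and summary_by_span are hypothetical. The sketch records the residual sum of squares and the equivalent number of parameters that loess() reports for each span.

```r
# Synthetic stand-in for a yearly series (NOT the babynames data):
# two slow cycles plus random noise.
set.seed(1)
year <- 1880:2009
prop <- sin(2 * pi * (year - 1880) / 65) + rnorm(length(year), sd = 0.2)
toy <- data.frame(year = year, prop = prop)

# Fit a loess smoother at each span tried above.
spans <- c(0.1, 1, 10)
fits <- lapply(spans, function(s) loess(prop ~ year, data = toy, span = s))

# For each fit, record how far the curve is from the data (rss) and
# how flexible the curve is (enp, equivalent number of parameters).
summary_by_span <- data.frame(
  span = spans,
  rss  = sapply(fits, function(f) sum(residuals(f)^2)),
  enp  = sapply(fits, function(f) f$enp)
)
print(summary_by_span)
```

A small span gives a flexible curve that hugs the data (low rss, high enp), while a large span gives a stiff curve that ignores local features. Neither extreme is automatically better; the question is which features count as signal.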

LC 4.10

Do you think the earlier smoother or the regression line is a better way to pick out the “signal” (i.e. the trend) from the “noise” in the previous plot? Using this evidence, what do you think is a condition for a regression line to have “valid” interpretability?

Solution

Hard to say.

  • The regression line suggests that the overall trend is upward, but it doesn’t fit the curve very well.
  • The curved smoother picks out only local (shorter-term) trends, for example the dip in the 1970s.
  • A regression line is only a valid summary when the relationship is roughly linear.
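
The trade-off between the two fits can be sketched with base R on synthetic data. Everything here is an assumption made for illustration: the series is made up (an upward drift with a temporary dip, not the babynames data), the span of 0.3 is an arbitrary choice, and the names toy, line_fit, and smooth_fit are hypothetical.

```r
# Synthetic stand-in (NOT the babynames data): an upward drift with a
# temporary dip in the middle of the series, plus noise.
set.seed(2)
year <- 1880:2009
u <- (year - min(year)) / diff(range(year))   # year rescaled to [0, 1]
prop <- 0.5 * u -                             # steady upward drift
  0.6 * exp(-((u - 0.5) / 0.08)^2) +          # temporary dip mid-series
  rnorm(length(year), sd = 0.05)              # noise
toy <- data.frame(year = year, prop = prop)

line_fit   <- lm(prop ~ year, data = toy)                 # regression line
smooth_fit <- loess(prop ~ year, data = toy, span = 0.3)  # curved smoother

# The slope summarizes the overall direction; the residual sums of
# squares show how much of the series each fit leaves unexplained.
slope      <- unname(coef(line_fit)["year"])
rss_line   <- sum(residuals(line_fit)^2)
rss_smooth <- sum(residuals(smooth_fit)^2)
print(c(slope = slope, rss_line = rss_line, rss_smooth = rss_smooth))
```

In this toy series the line reports a positive slope but misses the dip entirely, while the smoother tracks the dip without giving a single number for the overall trend. That is the sense in which a regression line is easiest to interpret when the underlying relationship is close to linear.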