Linegraphs Learning Checks

Load Necessary Data and Packages
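
The code in this document appears to rely on three packages: ggplot2 for the plots, dplyr for filter(), and babynames for the babynames data frame. The original setup chunk is not shown here, so the following is an assumed minimal setup rather than the author's exact code:

```r
# Assumed setup; the original chunk is not shown in this document.
library(ggplot2)    # ggplot(), geom_line(), geom_smooth()
library(dplyr)      # filter()
library(babynames)  # babynames data frame (year, sex, name, n, prop)
```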

LC 4.7-4.8

Why should line graphs be avoided when there is not a clear ordering of the horizontal axis?
Why are line graphs frequently used when time is the explanatory variable?

Solution

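Because lines suggest connectedness and correlation between neighboring points. That suggestion is misleading when the horizontal axis has no clear ordering, but it suits time, which has a natural order. More on this later.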
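
As a small sketch of the first point, using made-up counts (the fruit names and numbers are hypothetical): when the categories have no natural order, the shape of a connecting line depends entirely on the arbitrary order chosen for the axis. Here the "travel" of the line is measured as the total vertical distance between consecutive points.

```r
# Hypothetical counts for an unordered categorical variable.
counts <- c(apple = 12, banana = 30, cherry = 7, date = 25)

# Total vertical distance a connecting line would cover for a given ordering.
line_travel <- function(y) sum(abs(diff(y)))

# Two equally arbitrary orderings of the same data.
alphabetical <- counts[order(names(counts))]
by_size <- sort(counts)

travel_alphabetical <- line_travel(alphabetical)
travel_by_size <- line_travel(by_size)
print(c(alphabetical = travel_alphabetical, by_size = travel_by_size))
```

The data are identical in both cases, yet the line would look jagged under one ordering and steadily rising under the other, which is why the apparent "trend" carries no meaning.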

LC 4.9

female_audreys <- filter(babynames, name=="Audrey" & sex=="F")
ggplot(data=female_audreys, aes(x=year, y=prop)) + 
  geom_line() +
  geom_smooth(se=FALSE, span=0.1)

Set span=10 in the last code block above. Does this appear to be a good smoother?

Solution

In my opinion, span=10 is a bit too coarse:

ggplot(data=female_audreys, aes(x=year, y=prop)) + 
  geom_line() +
  geom_smooth(se=FALSE, span=10)

But maybe span=0.1 is TOO refined, i.e. not enough smoothing is happening. What about span=1:

ggplot(data=female_audreys, aes(x=year, y=prop)) + 
  geom_line() +
  geom_smooth(se=FALSE, span=1)
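
To see numerically what span does, here is a minimal sketch using base R's loess(), which is the method geom_smooth() uses by default for data sets with fewer than 1,000 observations. The data are simulated (a wavy signal plus noise standing in for a yearly series), so the names toy and resid_sd and all of the values are assumptions for illustration only.

```r
# Hypothetical data: a wavy signal plus noise, standing in for a yearly series.
set.seed(76)
year <- 1880:2014
t <- (year - min(year)) / diff(range(year))
prop <- sin(2 * pi * 3 * t) + rnorm(length(year), sd = 0.3)
toy <- data.frame(year = year, prop = prop)

# Standard deviation of the residuals from a loess fit with the given span.
resid_sd <- function(span) {
  fit <- loess(prop ~ year, data = toy, span = span)
  sd(residuals(fit))
}

# Compare the three spans discussed above.
spans <- c(0.1, 1, 10)
resid_by_span <- sapply(spans, resid_sd)
names(resid_by_span) <- spans
print(round(resid_by_span, 3))
```

A small span lets the curve follow the points closely, so the residuals are small. A large span forces a much smoother curve that leaves more of the variation in the residuals.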

LC 4.10

Do you think the earlier smoother or the regression line is a better way to pick out the “signal” (i.e. the trend) from the “noise” in the previous plot? Using this evidence, what do you think is a condition for a regression line to have “valid” interpretability?

Solution

Hard to say.

The regression line suggests that the overall trend is upward, but it doesn’t fit the curve very well.
The curved smoother only picks out local trends, for example the dip in the 1970s.
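
The regression line referred to in the question is not shown in this document. One plausible way to draw it next to the smoother, assuming the same female_audreys data and using geom_smooth(method = "lm"), is sketched below. The colours and the choice of span = 1 are illustrative, not taken from the original.

```r
library(ggplot2)
library(dplyr)
library(babynames)

# Same subset as in LC 4.9.
female_audreys <- filter(babynames, name == "Audrey" & sex == "F")

p <- ggplot(data = female_audreys, aes(x = year, y = prop)) +
  geom_line() +
  # Straight regression line (assumed to be what the question refers to).
  geom_smooth(method = "lm", se = FALSE, colour = "red") +
  # Curved loess smoother, for comparison.
  geom_smooth(method = "loess", se = FALSE, span = 1, colour = "blue")
print(p)
```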

Albert Y. Kim

Wed Sep 28, 2016
