| Who | Variables | Threshold | Test % Correct |
|---|---|---|---|
| Alden | job2 + height + diet3 + income2 | 0.5 | 84.1 |
| Amanda | income_level + job_new + body_type_buckets | 0.5 | 70.7 |
| Brenda | income + height | 0.5 | 83.2 |
| James | income + job + orientation + body_type | 0.5 | 71.4 |
| Albert | height | 0.5 | 83.0 |
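As a reminder of how a threshold turns fitted probabilities into a "% Correct" figure, here is a minimal sketch. It assumes the Test % Correct column was computed as the share of test cases whose thresholded prediction matches the true label; the probabilities and labels below are made up for illustration and are not anyone's actual results.

```r
# Hypothetical fitted probabilities and true 0/1 labels (made up for illustration)
p_hat <- c(0.91, 0.35, 0.62, 0.08, 0.55, 0.47)
truth <- c(1, 0, 1, 0, 0, 1)

# Predict 1 whenever the fitted probability exceeds the threshold
threshold <- 0.5
predicted <- as.integer(p_hat > threshold)

# Percentage of cases where the prediction matches the truth
pct_correct <- 100 * mean(predicted == truth)
pct_correct
```

Changing `threshold` changes `predicted`, so the percentage correct depends on the threshold as well as on the variables in the model.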
Recall from Lec12.R and quiz
Important Principle of Coding: Don’t Repeat Yourself
# Packages used below
library(Quandl)
library(dplyr)
library(lubridate)
library(ggplot2)

# 1. Get gold & bitcoin data
# 2. Make column structure the same
# 3. Add variable type
gold <- Quandl("BUNDESBANK/BBK01_WT5511") %>%
  select(Date, Value) %>%
  mutate(type = "Gold")
bitcoin <- Quandl("BAVERAGE/USD") %>%
  rename(Value = `24h Average`) %>%
  select(Date, Value) %>%
  mutate(type = "Bitcoin")

# Combine them into single data frame using bind_rows()
combined <- bind_rows(gold, bitcoin) %>%
  # Group by here!
  group_by(type) %>%
  # Then do the following ONLY ONCE:
  filter(year(Date) >= 2011) %>%
  arrange(Date) %>%
  mutate(
    Value_yest = lag(Value),
    rel_diff = 100 * (Value - Value_yest) / Value_yest
  )

# Plot
ggplot(combined, aes(x = Date, y = rel_diff, col = type)) +
  geom_line() +
  labs(y = "% Change")
When parsing the time, what most of you did:
jukebox_hourly <- jukebox %>%
  mutate(
    date_time = parse_date_time(date_time, "%b %d %H%M%S %Y"),
    hour = hour(date_time)
  ) %>%
  group_by(hour) %>%
  summarise(count = n())
What’s wrong with this plot?
ggplot(data = jukebox_hourly, aes(x = hour, y = count)) +
  geom_bar(stat = "identity") +
  xlab("Hour of day") +
  ylab("Number of songs played")
The hours are in UTC rather than local time, because parse_date_time() uses UTC by default. Need to change the time zone using date_time = with_tz(date_time, tz = "America/Los_Angeles"):
jukebox_hourly <- jukebox %>%
  mutate(
    date_time = parse_date_time(date_time, "%b %d %H%M%S %Y"),
    date_time = with_tz(date_time, tz = "America/Los_Angeles"),
    hour = hour(date_time)
  ) %>%
  group_by(hour) %>%
  summarise(count = n())
ggplot(data = jukebox_hourly, aes(x = hour, y = count)) +
  geom_bar(stat = "identity") +
  xlab("Hour of day") +
  ylab("Number of songs played")
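To see what with_tz() does to a single timestamp, here is a small sketch. The timestamp is made up for illustration and is not taken from the jukebox data. The point is that with_tz() keeps the same instant in time and only changes the clock it is displayed on, which is why the extracted hour shifts.

```r
library(lubridate)

# A made-up timestamp, interpreted as UTC
utc_time <- ymd_hms("2004-01-15 08:00:00", tz = "UTC")

# Same instant, shown on a Los Angeles clock (8 hours behind UTC in January)
la_time <- with_tz(utc_time, "America/Los_Angeles")

hour(utc_time)       # 8
hour(la_time)        # 0
utc_time == la_time  # TRUE: the moment itself has not changed
```

A song that appears to be played at 8am in the UTC version was really played at midnight local time, so the whole bar chart shifts once the time zone is corrected.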
Your top 10 graveyard shift artists are not these:
| artist | n |
|---|---|
| OutKast | 2533 |
| Beatles, The | 2219 |
| Led Zeppelin | 1617 |
| Radiohead | 1589 |
| Rolling Stones, The | 1522 |
| Notorious B.I.G. | 1382 |
| Red Hot Chili Peppers, The | 1318 |
| Eminem | 1311 |
| Bob Dylan | 1163 |
| Talking Heads | 1113 |
but are instead the ones below. Not that different, actually!
| artist | n |
|---|---|
| OutKast | 1217 |
| Beatles, The | 765 |
| Notorious B.I.G. | 719 |
| Led Zeppelin | 644 |
| Eminem | 616 |
| 2Pac | 568 |
| Rolling Stones, The | 529 |
| Radiohead | 490 |
| Talking Heads | 448 |
| Tenacious D | 435 |
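One way such a top 10 table could be produced is sketched below. This is not necessarily how the tables above were made: the data frame `jukebox_toy` is a tiny made-up stand-in for the real data, and the definition of "graveyard shift" as midnight up to 8am is an assumption for illustration only.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the jukebox data (made up for illustration)
jukebox_toy <- data.frame(
  artist = c("OutKast", "OutKast", "Eminem", "Radiohead", "OutKast"),
  date_time = ymd_hms(
    c("2004-01-15 01:10:00", "2004-01-15 03:45:00", "2004-01-15 02:30:00",
      "2004-01-15 14:00:00", "2004-01-15 20:15:00"),
    tz = "America/Los_Angeles"
  )
)

# Assumption: "graveyard shift" means midnight up to (not including) 8am
graveyard_top <- jukebox_toy %>%
  filter(hour(date_time) < 8) %>%
  count(artist, sort = TRUE) %>%   # one row per artist, largest n first
  head(10)                         # keep at most the top 10

graveyard_top
```

Because the filter uses hour(date_time), the result depends on the time zone of `date_time`, so the time zone conversion has to happen before this step.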