Let’s look at a random sample of 5 of the movies:
title | budget | rating |
---|---|---|
Swamp Thing | 3000000 | 5.0 |
Incident at Loch Ness | 1400000 | 6.4 |
Princess and the Pirate, The | 2000000 | 6.7 |
Prime Time, The | 150000 | 1.5 |
Monty Python and the Holy Grail | 250000 | 8.4 |
Both variables are numerical. Here are the components of the Grammar of Graphics:
data variable | aes()thetic attribute | geom_etric object |
---|---|---|
budget | x | point |
rating | y | point |
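As a minimal sketch of how this mapping could be written in R, assuming the ggplot2 package is installed: the data frame name `movies_sample` is hypothetical and holds only the five sampled rows shown above, not the full data set.

```r
library(ggplot2)

# Hypothetical data frame built from the five sampled rows above.
movies_sample <- data.frame(
  title = c("Swamp Thing", "Incident at Loch Ness",
            "Princess and the Pirate, The", "Prime Time, The",
            "Monty Python and the Holy Grail"),
  budget = c(3000000, 1400000, 2000000, 150000, 250000),
  rating = c(5.0, 6.4, 6.7, 1.5, 8.4)
)

# budget is mapped to x, rating to y, and each movie is drawn as a point.
p_movies <- ggplot(data = movies_sample, mapping = aes(x = budget, y = rating)) +
  geom_point()
p_movies
```

Five points are far too few to judge any relationship; the same call on the full data set is what the question below refers to.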
Does spending more on a movie yield higher IMDb ratings?
Let’s look at a random sample of 5 of the dates:
date | n |
---|---|
2013-01-13 | 828 |
2013-01-07 | 933 |
2013-01-19 | 674 |
2013-01-25 | 922 |
2013-01-29 | 890 |
Both variables are numerical (dates are technically numerical). Here are the components of the Grammar of Graphics:
data variable | aes()thetic attribute | geom_etric object |
---|---|---|
date | x | line |
n | y | line |
Note: Why did we use line as the geom_etric object? Because lines suggest sequence/relationship, and points don’t.
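A minimal sketch of this line graph in R, assuming ggplot2 is installed. The data frame name `flights_sample` is hypothetical and contains only the five sampled dates above. It also shows why dates count as numerical: R stores a `Date` as a number of days since 1970-01-01.

```r
library(ggplot2)

# Hypothetical data frame built from the five sampled rows above.
flights_sample <- data.frame(
  date = as.Date(c("2013-01-13", "2013-01-07", "2013-01-19",
                   "2013-01-25", "2013-01-29")),
  n = c(828, 933, 674, 922, 890)
)

# A Date is stored as a number of days since 1970-01-01.
as.numeric(flights_sample$date)

# date is mapped to x and n to y; geom_line() connects the
# observations in order of x, so the row order does not matter.
p_flights <- ggplot(data = flights_sample, mapping = aes(x = date, y = n)) +
  geom_line()
p_flights
```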
Why are there drops in the number of flights?
Let’s look at a random sample of 5 of the car year/make/model matchings:
name | trans | hwy |
---|---|---|
1996 Acura NSX | Manual | 22 |
2013 Buick LaCrosse eAssist | Automatic | 36 |
1996 Chevrolet C1500 Pickup 2WD | Manual | 18 |
2002 Volkswagen Jetta Wagon | Manual | 26 |
1984 Chevrolet G10/20 Sport Van 2WD | Automatic | 15 |
The trans (transmission) type is categorical, whereas hwy is numerical. Here are the components of the Grammar of Graphics:
data variable | aes()thetic attribute | geom_etric object |
---|---|---|
trans | x | boxplot |
hwy | y | boxplot |
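A minimal sketch of this boxplot in R, assuming ggplot2 is installed. The data frame name `vehicles_sample` is hypothetical and holds only the five sampled rows above, so the resulting boxes are illustrative rather than meaningful.

```r
library(ggplot2)

# Hypothetical data frame built from the five sampled rows above.
vehicles_sample <- data.frame(
  name = c("1996 Acura NSX", "2013 Buick LaCrosse eAssist",
           "1996 Chevrolet C1500 Pickup 2WD", "2002 Volkswagen Jetta Wagon",
           "1984 Chevrolet G10/20 Sport Van 2WD"),
  trans = c("Manual", "Automatic", "Manual", "Manual", "Automatic"),
  hwy = c(22, 36, 18, 26, 15)
)

# The categorical trans goes on x, the numerical hwy on y;
# geom_boxplot() draws one box per transmission type.
p_vehicles <- ggplot(data = vehicles_sample, mapping = aes(x = trans, y = hwy)) +
  geom_boxplot()
p_vehicles
```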
About what proportion of manual car models sold between 1984 and 2015 got 20 mpg or worse mileage?
Let’s look at all the data:
name | n |
---|---|
Carlos | 155711 |
Ethan | 359506 |
Hayden | 105716 |
The name variable is categorical, and n is a numerical count. Here are the components of the Grammar of Graphics:
data variable | aes()thetic attribute | geom_etric object |
---|---|---|
name | x | bar |
n | y | bar |
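A minimal sketch of this barplot in R, assuming ggplot2 is installed; the data frame name `names_count` is hypothetical. Because the counts in n are already tabulated, the sketch uses `geom_col()`, which draws bars whose heights come directly from the y aesthetic (equivalent to `geom_bar(stat = "identity")`).

```r
library(ggplot2)

# Hypothetical data frame holding the three rows shown above.
names_count <- data.frame(
  name = c("Carlos", "Ethan", "Hayden"),
  n = c(155711, 359506, 105716)
)

# name is mapped to x and the pre-computed count n to y.
p_names <- ggplot(data = names_count, mapping = aes(x = name, y = n)) +
  geom_col()
p_names
```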
About how many babies were named “Hayden” between 1990 and 2014?
Let’s look at a random sample of 5 of the users:
sex | height |
---|---|
f | 65 |
m | 75 |
m | 65 |
f | 64 |
m | 69 |
Height is numerical. Here are the components of the Grammar of Graphics:
data variable | aes()thetic attribute | geom_etric object |
---|---|---|
height | x | histogram |
Note: We’ll see later there is no explicit y aesthetic here, because there is no explicit variable that maps to it, but rather it is computed internally.
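A minimal sketch of this histogram in R, assuming ggplot2 is installed. The data frame name `profiles_sample` is hypothetical and holds only the five sampled users above, and the bin width of 1 is an arbitrary choice for illustration. Only x is mapped; the bar heights are counts that ggplot2 computes per bin, which `layer_data()` lets us inspect.

```r
library(ggplot2)

# Hypothetical data frame built from the five sampled rows above.
profiles_sample <- data.frame(
  sex = c("f", "m", "m", "f", "m"),
  height = c(65, 75, 65, 64, 69)
)

# Only height is mapped (to x); no y aesthetic is supplied.
p_heights <- ggplot(data = profiles_sample, mapping = aes(x = height)) +
  geom_histogram(binwidth = 1)
p_heights

# The y values are the per-bin counts computed internally by the histogram.
layer_data(p_heights)$count
```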
What are the smallest and largest visible heights and what do you think of them? Also, think of one graph improvement to better convey information about SF OkCupid users.