Scatterplot

Solution

Let’s look at a random sample of 5 of the movies:

| title | budget | rating |
|---|---|---|
| Swamp Thing | 3000000 | 5.0 |
| Incident at Loch Ness | 1400000 | 6.4 |
| Princess and the Pirate, The | 2000000 | 6.7 |
| Prime Time, The | 150000 | 1.5 |
| Monty Python and the Holy Grail | 250000 | 8.4 |

Both variables are numerical. Here are the components of the Grammar of Graphics:

| data variable | aes()thetic attribute | geom_etric object |
|---|---|---|
| budget | x | point |
| rating | y | point |
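
As a minimal sketch of how these components could translate into ggplot2 code, the snippet below maps budget to x and rating to y and draws points. It uses only the five sampled rows shown above, typed in by hand, and the object names `movies_sample` and `movies_plot` are hypothetical, so the result illustrates the mapping rather than reproducing the full plot.

```r
library(ggplot2)

# The five sampled rows from the table above (hypothetical object name)
movies_sample <- data.frame(
  title = c("Swamp Thing", "Incident at Loch Ness",
            "Princess and the Pirate, The", "Prime Time, The",
            "Monty Python and the Holy Grail"),
  budget = c(3000000, 1400000, 2000000, 150000, 250000),
  rating = c(5.0, 6.4, 6.7, 1.5, 8.4)
)

# budget -> x, rating -> y, geometric object: point
movies_plot <- ggplot(data = movies_sample,
                      mapping = aes(x = budget, y = rating)) +
  geom_point()

movies_plot
```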

Question

Does spending more on a movie yield higher IMDB ratings?

Linegraph

Solution

Let’s look at a random sample of 5 of the dates:

| date | n |
|---|---|
| 2013-01-13 | 828 |
| 2013-01-07 | 933 |
| 2013-01-19 | 674 |
| 2013-01-25 | 922 |
| 2013-01-29 | 890 |

Both variables are numerical (dates are stored as numbers internally, so they can be treated as numerical). Here are the components of the Grammar of Graphics:

| data variable | aes()thetic attribute | geom_etric object |
|---|---|---|
| date | x | line |
| n | y | line |

Note: Why did we use line as the geom_etric object? Because a line connects consecutive observations and so suggests a sequence over time, whereas isolated points do not.
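
A minimal sketch of the same mapping in ggplot2 follows, assuming the dates are converted with `as.Date()` so that the x-axis is treated as a time scale. Only the five sampled rows shown above are used, and `flights_sample` and `flights_plot` are hypothetical names.

```r
library(ggplot2)

# The five sampled rows from the table above (hypothetical object name)
flights_sample <- data.frame(
  date = as.Date(c("2013-01-13", "2013-01-07", "2013-01-19",
                   "2013-01-25", "2013-01-29")),
  n = c(828, 933, 674, 922, 890)
)

# date -> x, n -> y, geometric object: line
# geom_line() connects observations in order of x, so the sampled rows
# do not need to be sorted by date first
flights_plot <- ggplot(data = flights_sample,
                       mapping = aes(x = date, y = n)) +
  geom_line()

flights_plot
```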

Question

Why are there drops in the number of flights?

Boxplot

Solution

Let’s look at a random sample of 5 of the car year/make/model matchings:

| name | trans | hwy |
|---|---|---|
| 1996 Acura NSX | Manual | 22 |
| 2013 Buick LaCrosse eAssist | Automatic | 36 |
| 1996 Chevrolet C1500 Pickup 2WD | Manual | 18 |
| 2002 Volkswagen Jetta Wagon | Manual | 26 |
| 1984 Chevrolet G10/20 Sport Van 2WD | Automatic | 15 |

trans (transmission type) is categorical, whereas hwy (highway mileage in mpg) is numerical. Here are the components of the Grammar of Graphics:

| data variable | aes()thetic attribute | geom_etric object |
|---|---|---|
| trans | x | boxplot |
| hwy | y | boxplot |
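
The mapping above could be sketched in ggplot2 as follows. The data frame holds only the five sampled rows shown above, so the resulting boxes are far too sparse to answer the question below; `cars_sample` and `cars_plot` are hypothetical names.

```r
library(ggplot2)

# The five sampled rows from the table above (hypothetical object name)
cars_sample <- data.frame(
  name = c("1996 Acura NSX", "2013 Buick LaCrosse eAssist",
           "1996 Chevrolet C1500 Pickup 2WD", "2002 Volkswagen Jetta Wagon",
           "1984 Chevrolet G10/20 Sport Van 2WD"),
  trans = c("Manual", "Automatic", "Manual", "Manual", "Automatic"),
  hwy = c(22, 36, 18, 26, 15)
)

# trans -> x (categorical), hwy -> y (numerical), geometric object: boxplot
# One box is drawn per level of trans
cars_plot <- ggplot(data = cars_sample,
                    mapping = aes(x = trans, y = hwy)) +
  geom_boxplot()

cars_plot
```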

Question

About what proportion of manual car models sold between 1984 and 2015 had a mileage of 20 mpg or worse?

Bar Plot

Solution

Let’s look at all the data:

| name | n |
|---|---|
| Carlos | 155711 |
| Ethan | 359506 |
| Hayden | 105716 |

name is categorical, and n is the count for each name. Here are the components of the Grammar of Graphics:

| data variable | aes()thetic attribute | geom_etric object |
|---|---|---|
| name | x | bar |
| n | y | bar |
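
Because the counts are already tabulated, one way to sketch this mapping in ggplot2 is with `geom_col()`, which draws bars whose heights come directly from the y variable (it is equivalent to `geom_bar(stat = "identity")`). The names `names_counts` and `names_plot` are hypothetical.

```r
library(ggplot2)

# All three rows from the table above (hypothetical object name)
names_counts <- data.frame(
  name = c("Carlos", "Ethan", "Hayden"),
  n = c(155711, 359506, 105716)
)

# name -> x, n -> y, geometric object: bar
# geom_col() uses the pre-computed n as the bar height instead of counting rows
names_plot <- ggplot(data = names_counts,
                     mapping = aes(x = name, y = n)) +
  geom_col()

names_plot
```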

Question

About how many babies were named “Hayden” between 1990 and 2014?

Histogram

Solution

Let’s look at a random sample of 5 of the users:

| sex | height |
|---|---|
| f | 65 |
| m | 75 |
| m | 65 |
| f | 64 |
| m | 69 |

Height is numerical. Here are the components of the Grammar of Graphics:

| data variable | aes()thetic attribute | geom_etric object |
|---|---|---|
| height | x | histogram |

Note: There is no explicit y aesthetic here because no variable in the data maps to it. As we’ll see later, the values on the y-axis (the count in each bin) are computed internally.
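
A minimal ggplot2 sketch of this mapping, using only the five sampled rows shown above, makes the internally computed y visible. The `binwidth = 1` setting is an assumed choice for illustration, and `profiles_sample`, `height_plot`, and `computed` are hypothetical names.

```r
library(ggplot2)

# The five sampled rows from the table above (hypothetical object name)
profiles_sample <- data.frame(
  sex = c("f", "m", "m", "f", "m"),
  height = c(65, 75, 65, 64, 69)
)

# Only x is mapped; geom_histogram() bins height and counts the rows per bin
# binwidth = 1 is an assumption made for this sketch
height_plot <- ggplot(data = profiles_sample,
                      mapping = aes(x = height)) +
  geom_histogram(binwidth = 1)

# layer_data() returns the values ggplot2 computed for the layer,
# including the per-bin count that ends up on the y-axis
computed <- layer_data(height_plot)
sum(computed$count)
```

The counts across all bins should add up to the number of rows in the data, which is one way to check that y here is a tally rather than a variable from the data.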

Question

What are the smallest and largest visible heights and what do you think of them? Also, think of one graph improvement to better convey information about SF OkCupid users.