Load Necessary Data and Packages
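The plots below rely on an alaska_flights data frame that this section never defines. A minimal sketch of the setup, assuming the data come from the flights table in the nycflights13 package and that alaska_flights is simply the subset with carrier code "AS" (both are assumptions based on the variable names used here):

```r
# Packages assumed for the exercises below
library(dplyr)         # filter() and the pipe
library(ggplot2)       # plotting
library(nycflights13)  # flights departing NYC airports in 2013

# Assumption: alaska_flights holds only Alaska Airlines flights (carrier "AS")
alaska_flights <- flights %>%
  filter(carrier == "AS")
```

If your copy of the data was prepared differently, adjust the filter step so that alaska_flights matches it.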
LC 4.1-4.3
- What are some practical reasons why dep_delay and arr_delay have a positive relationship?
- What does (0, 0) correspond to from the point of view of a passenger on an Alaskan flight? Why do you believe there is a cluster of points near (0, 0)?
- Create a similar plot, but one showing the relationship between departure time and departure delay. What hypotheses do you have about the patterns you see?
Solution
- The later a plane departs, typically the later it will arrive.
- The point (0, 0) means no delay in either departure or arrival. From the passenger’s point of view, the flight left and arrived on time. The cluster near (0, 0) suggests that most flights are at least close to being on time.
- We now put dep_time as the x-aesthetic and dep_delay as the y-aesthetic.
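The mapping described in the last bullet can be sketched as follows. This assumes alaska_flights is the carrier "AS" subset of nycflights13::flights; the object name dep_time_plot is only illustrative.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Assumption: alaska_flights is the Alaska Airlines subset of flights
alaska_flights <- flights %>%
  filter(carrier == "AS")

# dep_time is stored as an integer in HHMM form (for example, 1830 is 6:30 pm)
dep_time_plot <- ggplot(alaska_flights, aes(x = dep_time, y = dep_delay)) +
  geom_point()

# Printing the object draws the plot; rows with missing values are dropped
dep_time_plot
```

Because dep_time is read as a plain number, the x-axis is not a true time scale: there are no values between, say, 1860 and 1899.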
Hint: Look at Alaska Airlines’ route map.
LC 4.4-4.5
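# Default scatterplot: overlapping points cannot be told apart
ggplot(alaska_flights, aes(x = dep_delay, y = arr_delay)) +
  geom_point()
# Same scatterplot with transparency: darker areas hold more points
ggplot(alaska_flights, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.2)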
- Why is setting the alpha argument value useful with scatterplots?
- After viewing the plots above, give the range of arrival delays and departure delays that occur most frequently. How has that region changed in the lower plot, which sets alpha = 0.2, compared to the same plot without it?
Solution
- It makes each point partly transparent, so regions where many points overlap appear darker; this addresses over-plotting. More importantly, it hints at the (statistical) density and distribution of the points: where they are concentrated and where they are sparse. We will see more about densities and distributions in Chapter 6 when we switch gears to statistical topics.
- The lower plot suggests that most Alaska flights from NYC:
  - depart between 12 minutes early and on time
  - arrive between 50 minutes early and on time
Question: 50 minutes early? Why so much?
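The ranges read off the plot are visual estimates. One way to check them is to compute the share of flights that fall inside that region. This is a rough sketch, assuming alaska_flights is the carrier "AS" subset of nycflights13::flights; the name region_share is hypothetical.

```r
library(dplyr)
library(nycflights13)

# Assumption: alaska_flights is the Alaska Airlines subset of flights
alaska_flights <- flights %>%
  filter(carrier == "AS")

# Share of flights inside the region estimated from the alpha = 0.2 plot:
# departures 12 minutes early to on time, arrivals 50 minutes early to on time
region_share <- alaska_flights %>%
  filter(!is.na(dep_delay), !is.na(arr_delay)) %>%
  summarize(
    share = mean(between(dep_delay, -12, 0) & between(arr_delay, -50, 0))
  )

region_share
```

A share well above one half would support the reading of the plot; a small share would suggest the dark region is narrower or wider than estimated.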
LC 4.6
- Compare the “transparency” and “jitter” approaches to dealing with over-plotting above. In this case, which do you prefer?
Solution
There is no right answer! It’s your call.
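To compare the two approaches directly, both plots can be built from the same data. The sketch below assumes alaska_flights is the carrier "AS" subset of nycflights13::flights; the jitter amounts (width = 30, height = 30, in minutes) are illustrative values, not ones given in this section.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Assumption: alaska_flights is the Alaska Airlines subset of flights
alaska_flights <- flights %>%
  filter(carrier == "AS")

# Transparency: points stay at their true positions, overlap shows as darkness
transparency_plot <- ggplot(alaska_flights, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.2)

# Jitter: each point is nudged randomly by up to 30 minutes in each direction
# (illustrative amounts), so overlapping points separate
jitter_plot <- ggplot(alaska_flights, aes(x = dep_delay, y = arr_delay)) +
  geom_jitter(width = 30, height = 30)

transparency_plot
jitter_plot
```

Transparency keeps every point at its recorded value, while jitter trades positional accuracy for visibility, so the jittered plot no longer shows exact delays.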