Please Indicate
- Who you collaborated with. If no one, please indicate so:
- Approximately how much time did you spend on this problem set:
- What, if anything, gave you the most trouble:
Instructions:
- As I said in lab on Thursday, unless you are extremely comfortable with the tools of `ggplot2` and `dplyr`, do not attempt to answer the questions below immediately in this R Markdown file. Rather, focus on separating the
  - “What” you are going to do: the blueprint/plan
  - “How” you are going to do it: the `ggplot2` and `dplyr` coding
- Confusing the “what” and the “how” will only make this task more difficult.
- For example, the “what” step could be tackled as follows:
  - Draw on paper, a blackboard, or a whiteboard what your starting data frames look like, what your resulting data frames should look like, and what any visualizations should look like.
  - Write out in pseudocode what your R code will look like. Pseudocode is informal and rough code that doesn’t necessarily need to work, but still illustrates what you want to do. More than anything else, writing pseudocode out forces you to think your plan out loud.
- Only after you have a plan for the “what” should you start the “how”, i.e. coding.
- I HIGHLY encourage you to work in groups on this problem set, on both the “what” and “how” stages.
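
To illustrate what pseudocode for the “what” stage might look like, here is a rough sketch for a different, simpler task than the one below: the mean departure delay per origin airport. It is written as R comments only, it is not meant to run, and the column name `mean_delay` is a hypothetical choice.

```r
# Pseudocode only: a plan, not working code.
#
# start with: flights (one row per flight)
# keep only the columns I need: origin, dep_delay
# group the rows by origin
# for each group: take the mean of dep_delay, ignoring missing values
# end with: one row per origin, columns origin and mean_delay
#
# table: print the resulting data frame
# plot:  bar chart with origin on the x-axis and mean_delay on the y-axis
```

Only once a plan like this reads sensibly from top to bottom is it worth translating each line into `dplyr` and `ggplot2` code.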
Question: `nycflights13` data
You are a junior analyst in the airline industry, specifically working for Virgin America. You are tasked with calculating Virgin America’s available seat miles for all flights leaving each of the three New York Metropolitan area airports separately in 2013, and comparing these figures with those of ExpressJet Airlines Inc, a close rival to your company. Available seat miles are the fundamental units of production for a passenger-carrying airline and are a measure of capacity. You are asked for two deliverables:
- Present these results in
  - A well-formatted and easy-to-read table
  - A cleanly formatted visualization
- Give a two-sentence executive summary on the comparison in available seat miles between Virgin America and ExpressJet Airlines Inc for the three NYC airports.
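
To make the definition of available seat miles concrete: one available seat mile is one seat flown one mile, so a single flight contributes its number of seats times its distance. The sketch below applies that arithmetic to made-up numbers. The data frame `toy_flights`, its carriers, and its values are hypothetical and are not taken from `nycflights13`.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy_flights <- data.frame(
  carrier  = c("AA", "AA", "BB"),
  seats    = c(100, 150, 200),
  distance = c(500, 1000, 250)
)

toy_asm <- toy_flights %>%
  # seat miles offered by each individual flight
  mutate(seat_miles = seats * distance) %>%
  # total per carrier
  group_by(carrier) %>%
  summarise(available_seat_miles = sum(seat_miles))

toy_asm
```

In the real data the number of seats and the flight distance may not sit in the same data frame, so part of your plan is working out how to bring them together.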
Management offers the following helpful tips:
- Keep `View()`ing your data.
- Figure 5.8 in Chapter 5.3 gives a sense of what information you have to work with.
- The help files contain the documentation. For example, `?flights`.
- Say you have your results in a data frame called `results`. Including `kable(results)` in your code will print the table in your HTML page in a nice and clean manner.
- You can `group_by` more than one variable. For example, if you had another data set that you wanted to group by `state` and `city`, you would use `group_by(state, city)`.
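
The last two tips can be sketched together on a small made-up data set. The data frame `toy_sales` and its columns are hypothetical; only `group_by()`, `summarise()`, and `kable()` (from the `knitr` package) are real functions, and this assumes a version of `dplyr` recent enough to support the `.groups` argument.

```r
library(dplyr)
library(knitr)

# Hypothetical toy data, invented for illustration only
toy_sales <- data.frame(
  state = c("NY", "NY", "NY", "MA"),
  city  = c("Albany", "Albany", "Buffalo", "Boston"),
  sales = c(10, 20, 5, 7)
)

results <- toy_sales %>%
  # one group per distinct (state, city) combination
  group_by(state, city) %>%
  summarise(total_sales = sum(sales), .groups = "drop")

# Prints a cleanly formatted table when the R Markdown file is knitted
kable(results)
```

Grouping by two variables gives one summary row per combination that actually occurs in the data, which here means three rows rather than four.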