ggplot2
?join
So far we’ve seen simple linear regression
- Simple means only one predictor/independent variable \(x\)
- Outcome/dependent variable \(y\)
- \(x\) can be either numerical or categorical
In Lec 36 LC we saw the relationship between \(x =\) dep delay & \(y =\) arr delay for Alaska Airlines flights.
- Since we only have Alaska flights, the variable `carrier` doesn’t vary.
- But now let’s also consider Frontier Airlines (`carrier == "F9"`)
So we have:
- \(y =\) arrival delay
- \(x_1 =\) departure delay (numerical variable)
- \(x_2 =\) carrier (categorical variable with \(k=2\) levels. In other words, carrier now varies.)
Is there a difference in delays between Alaska and Frontier?
- Continuing Regression Outputs: Lec36 Learning Check
- Categorical Predictors
What does “best fitting line” mean?
Consider ANY point (in blue).
Now consider this point’s deviation from the regression line.
Do this for another point…
Regression line minimizes the sum of squared arrow lengths.
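A minimal sketch of this idea, using R's built-in `cars` data as a stand-in (an assumption; the lecture's own example uses flight delays). The helper `sse_line()` is a hypothetical name introduced only for illustration.

```r
# Least-squares sketch using R's built-in `cars` data
# (assumption: a stand-in for the lecture's flight delay data)
fit <- lm(dist ~ speed, data = cars)

# Residual = observed y minus fitted y, i.e. the signed "arrow length"
sse_fit <- sum(residuals(fit)^2)

# Hypothetical helper: sum of squared residuals for ANY line
sse_line <- function(intercept, slope) {
  sum((cars$dist - (intercept + slope * cars$speed))^2)
}

# Nudging the fitted intercept or slope increases the sum of squares
b <- coef(fit)
sse_shift_intercept <- sse_line(b[1] + 1, b[2])
sse_shift_slope     <- sse_line(b[1], b[2] + 0.1)

c(fitted = sse_fit,
  shifted_intercept = sse_shift_intercept,
  shifted_slope = sse_shift_slope)
```

Any other line you try should give a sum of squares at least as large as the fitted one.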
- Residuals
- Review of Lec36 Learning Check outputs
- Regression viewed through the lens of sampling
n=100
Here are your 12 resulting \(\widehat{p}\)’s…
Name | p_hat |
---|---|
aghall | 0.360 |
ccrobinson | 0.402 |
chimstead | 0.380 |
cwhitedzuro | 0.440 |
dmortime | 0.430 |
efeldman | 0.370 |
jobrien | 0.400 |
jvolz | 0.420 |
lschroer | 0.402 |
rlightman | 0.400 |
rstoreyfisher | 0.390 |
zmillslagle | 0.402 |
Let me add 8 of my own so we have 20…
Name | p_hat |
---|---|
aghall | 0.360 |
ccrobinson | 0.402 |
chimstead | 0.380 |
cwhitedzuro | 0.440 |
dmortime | 0.430 |
efeldman | 0.370 |
jobrien | 0.400 |
jvolz | 0.420 |
lschroer | 0.402 |
rlightman | 0.400 |
rstoreyfisher | 0.390 |
zmillslagle | 0.402 |
aykim | 0.420 |
aykim | 0.360 |
aykim | 0.300 |
aykim | 0.360 |
aykim | 0.360 |
aykim | 0.400 |
aykim | 0.340 |
aykim | 0.400 |
Let’s compute \(\mbox{SE} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)…
p_hat <- p_hat %>%
mutate(
n = 100,
SE = sqrt(p_hat*(1-p_hat)/n)
)
Name | p_hat | n | SE |
---|---|---|---|
aghall | 0.360 | 100 | 0.048 |
ccrobinson | 0.402 | 100 | 0.049 |
chimstead | 0.380 | 100 | 0.049 |
cwhitedzuro | 0.440 | 100 | 0.050 |
dmortime | 0.430 | 100 | 0.050 |
efeldman | 0.370 | 100 | 0.048 |
jobrien | 0.400 | 100 | 0.049 |
jvolz | 0.420 | 100 | 0.049 |
lschroer | 0.402 | 100 | 0.049 |
rlightman | 0.400 | 100 | 0.049 |
rstoreyfisher | 0.390 | 100 | 0.049 |
zmillslagle | 0.402 | 100 | 0.049 |
aykim | 0.420 | 100 | 0.049 |
aykim | 0.360 | 100 | 0.048 |
aykim | 0.300 | 100 | 0.046 |
aykim | 0.360 | 100 | 0.048 |
aykim | 0.360 | 100 | 0.048 |
aykim | 0.400 | 100 | 0.049 |
aykim | 0.340 | 100 | 0.047 |
aykim | 0.400 | 100 | 0.049 |
Finally the left and right end points of the 95% confidence interval. Whose CI’s captured the true \(p=0.4023\)?
p_hat <- p_hat %>%
mutate(
left = p_hat - 1.96*SE,
right = p_hat + 1.96*SE
)
Name | p_hat | n | SE | left | right |
---|---|---|---|---|---|
aghall | 0.360 | 100 | 0.048 | 0.266 | 0.454 |
ccrobinson | 0.402 | 100 | 0.049 | 0.306 | 0.498 |
chimstead | 0.380 | 100 | 0.049 | 0.285 | 0.475 |
cwhitedzuro | 0.440 | 100 | 0.050 | 0.343 | 0.537 |
dmortime | 0.430 | 100 | 0.050 | 0.333 | 0.527 |
efeldman | 0.370 | 100 | 0.048 | 0.275 | 0.465 |
jobrien | 0.400 | 100 | 0.049 | 0.304 | 0.496 |
jvolz | 0.420 | 100 | 0.049 | 0.323 | 0.517 |
lschroer | 0.402 | 100 | 0.049 | 0.306 | 0.498 |
rlightman | 0.400 | 100 | 0.049 | 0.304 | 0.496 |
rstoreyfisher | 0.390 | 100 | 0.049 | 0.294 | 0.486 |
zmillslagle | 0.402 | 100 | 0.049 | 0.306 | 0.498 |
aykim | 0.420 | 100 | 0.049 | 0.323 | 0.517 |
aykim | 0.360 | 100 | 0.048 | 0.266 | 0.454 |
aykim | 0.300 | 100 | 0.046 | 0.210 | 0.390 |
aykim | 0.360 | 100 | 0.048 | 0.266 | 0.454 |
aykim | 0.360 | 100 | 0.048 | 0.266 | 0.454 |
aykim | 0.400 | 100 | 0.049 | 0.304 | 0.496 |
aykim | 0.340 | 100 | 0.047 | 0.247 | 0.433 |
aykim | 0.400 | 100 | 0.049 | 0.304 | 0.496 |
- Dots are \(\widehat{p}\)
- Dashed line is true \(p=0.4023\)
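To check which intervals captured the true \(p\), here is a self-contained sketch in base R. It retypes the 20 values of \(\widehat{p}\) from the table as a plain vector; the name `captured` is introduced here for illustration.

```r
# The 20 sample proportions from the table, each based on n = 100
p_hat <- c(0.360, 0.402, 0.380, 0.440, 0.430, 0.370, 0.400, 0.420, 0.402, 0.400,
           0.390, 0.402, 0.420, 0.360, 0.300, 0.360, 0.360, 0.400, 0.340, 0.400)
n <- 100
p_true <- 0.4023

# Standard error and end points of each 95% confidence interval
SE    <- sqrt(p_hat * (1 - p_hat) / n)
left  <- p_hat - 1.96 * SE
right <- p_hat + 1.96 * SE

# TRUE when the interval contains the true proportion
captured <- left <= p_true & p_true <= right

sum(captured)      # how many of the 20 intervals captured p
p_hat[!captured]   # the sample proportion(s) whose interval missed
```

Here 19 of the 20 intervals capture \(p=0.4023\); only the sample with \(\widehat{p}=0.300\) misses.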
- Final topic for this course!
- Correlation Coefficient
Recall the `nycflights` data set. For Alaska Air flights, let’s explore the relationship between departure delay (`dep_delay`) and arrival delay (`arr_delay`).
The correlation coefficient is computed as follows:
cor(alaska_flights$dep_delay, alaska_flights$arr_delay)
## [1] 0.8373792
A correlation coefficient of 0.837 indicates a fairly strong positive association!
Chalk talk
For large \(n\), the sampling distributions of these point estimates are bell-shaped, thus a 95% C.I. is \(\mbox{PE} \pm 1.96\times \mbox{SE}\).
Population Parameter | Sample Statistic |
---|---|
Mean \(\mu\) | Sample Mean \(\overline{x}\) |
Proportion \(p\) | Sample Proportion \(\widehat{p}\) |
Diff of Means \(\mu_1 - \mu_2\) | \(\overline{x}_1 - \overline{x}_2\) |
Diff of Proportions \(p_1 - p_2\) | \(\widehat{p}_1 - \widehat{p}_2\) |
NPR report on Obama from 2013. Chalk talk…
We are estimating a population parameter using a point estimate based on a sample. Example: Mean (Chalk Talk)
Imagine the \(\mu\) is a fish:
Point Estimate \(\overline{x}\) | Confidence Interval |
---|---|
- Lec33 Learning Check Discussion
- Chalk Talk.
Age example:
- I picked a random sample of `n=3` students
- I computed sample mean age \(\overline{x}\)
- I did this three times
Note:
- They are not the same because of sampling variability
- What quantifies how much these point estimates vary?
From the OkCupid population:
- Take samples of size `n`
- Compute sample mean height \(\overline{x}\)
- Do this many, many, many times (10000)
- Visualize distribution of these sample means
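The steps above can be sketched in base R. This is only an illustration: the population below is simulated (it is NOT the OkCupid data), and the sample size of 50 is an arbitrary choice.

```r
set.seed(76)

# Hypothetical population of 50,000 heights in inches (simulated)
population <- rnorm(50000, mean = 68, sd = 4)

n <- 50  # sample size; an arbitrary choice for illustration

# Take a sample of size n, compute the sample mean, repeat 10000 times
sample_means <- replicate(10000, mean(sample(population, size = n)))

# Visualize the distribution of these sample means
hist(sample_means, main = "Sampling distribution of the sample mean")

# The standard deviation of the sample means is the standard error
sd(sample_means)
sd(population) / sqrt(n)
```

The last two numbers should be close to each other, which is one way to see how the standard error quantifies sampling variability.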
Taking a sample in order to infer about a population:
Let’s Google “define infer”…
library(lubridate)
library(mosaic)
library(dplyr)
# Randomly sample three people:
students <-
c("Arthur", "Caroline", "Claire", "Clare", "Conor", "Daniel",
"Dylan", "Elana", "Jacob", "Jay", "Joe", "Julian", "Kelsie",
"Lisa", "Maya", "Naing", "Parker", "Rebecca", "Ry", "Theodora",
"Zebediah", "Albert")
resample(students, size=3, replace=FALSE)
# Get average age:
birthdays <- c("1980-11-05", "2000-01-01", "1955-08-05")
ages <- as.numeric(as.Date("2017-04-27") - as.Date(birthdays))/365.25
ages
mean(ages)
- We randomly sample 3 students and get mean age
- We randomly sample 3 students and get mean age
- We randomly sample 3 students and get mean age…
Questions:
- Why is the mean (AKA average) age different each time?
- What numerical summary quantifies how these means vary?
Chalk talk…
- Hypothesis testing in general
- Background statistical theory
- View Lec29 Learning Check
- Chalk talk
If we assume \(H_0\) is true (there is no difference in test scores between evens and odds) then:
`even_vs_odd` is irrelevant
From last lecture: How do we construct the null distribution?
In this case, the null distribution is a barplot:
Analytically | Via Simulation |
---|---|
- Analytically/Mathematically: Necessitates probability background. Covered in MATH 310.
- Simulation: Necessitates random number generator. We take this approach.
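A rough sketch of the simulation approach in base R, assuming made-up test scores and a difference in means as the test statistic (neither is taken from the lecture's data). The helper `diff_means()` is a hypothetical name.

```r
set.seed(29)

# Hypothetical test scores and even/odd labels (made up for illustration)
scores      <- c(72, 85, 90, 66, 78, 88, 95, 70, 81, 77, 69, 92)
even_vs_odd <- rep(c("even", "odd"), times = 6)

# Test statistic: difference in mean score, even minus odd
diff_means <- function(labels) {
  mean(scores[labels == "even"]) - mean(scores[labels == "odd"])
}
observed <- diff_means(even_vs_odd)

# Under H0 the labels are irrelevant, so shuffle them many times
null_dist <- replicate(10000, diff_means(sample(even_vs_odd)))

hist(null_dist, main = "Simulated null distribution")

# Two-sided p-value: share of shuffles at least as extreme as observed
p_value <- mean(abs(null_dist) >= abs(observed))
p_value
```

Shuffling the labels is justified only because, under \(H_0\), the even/odd label carries no information about the score.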
Only chalk talk today, based on Learning Checks for Lec26.
Not very! Only occurs 0.34% of the time
p-value: Chalk Talk
If guessing at random, here are hypothetical outcomes:
She got 8/8 right!
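One way to simulate random guessing, assuming 8 independent guesses that are each right with probability 0.5 (the exact design of the in-class experiment is an assumption here):

```r
set.seed(26)

# Assumption: 8 independent guesses, each with a 50% chance of being right
n_correct <- rbinom(10000, size = 8, prob = 0.5)

# Distribution of hypothetical outcomes when guessing at random
table(n_correct)

# How often does pure guessing get all 8 right?
mean(n_correct == 8)

# Theoretical value under the same assumption
0.5^8
```

Under this assumption the theoretical chance is \(0.5^8 \approx 0.39\%\); a simulated proportion will differ slightly from run to run.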
Critical chalk talk.
Binary situations, like
- True vs False
- Correct vs Incorrect
- Yes vs No
are often coded as `1` vs `0` in many programming languages.
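In R, for instance, `TRUE` and `FALSE` behave like 1 and 0 in arithmetic, which makes counting and computing proportions easy. The vector below is a made-up example.

```r
# Whether each of 5 hypothetical guesses was correct
correct <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# R treats TRUE as 1 and FALSE as 0 in arithmetic
as.numeric(correct)

sum(correct)   # number of correct guesses
mean(correct)  # proportion of correct guesses
```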
- Correlation is not necessarily causation
- Spurious correlations
- Confounding variables
- Two types of studies
- Principles of designing experiments
Ezell’s Fried Chicken is a famous chicken restaurant in Seattle. Oprah Winfrey has it flown in to Chicago.
One day I was raving about Ezell’s Chicken, but my friend accused me of “buying into the hype”.
So what did we do?
Fried Chicken Face Off:
Do people prefer this? | Or this? |
---|---|
How would you design a taste test to ascertain, independent of hype, which fried chicken tastes better?
Use the relevant principles of designing experiments from above.
The `mosaic` package has functions for random simulation.
- `rflip()`: Flip a coin
- `shuffle()`: Shuffle a set of values
- `do()`: Do the same thing many, many, many times
- `resample()`: the swiss army knife for sampling
Run the following in your console:
library(mosaic)
# Define a vector fruit
fruit <- c("apple", "orange", "mango")
# Do this multiple times:
shuffle(fruit)
Two types of sampling:
`resample()` by default samples with replacement. Run this in the console multiple times:
resample(fruit)
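To see the two types of sampling side by side without any extra packages, here is a sketch that swaps in base R's `sample()` for `resample()` (a substitution made for this illustration, not the lecture's own code):

```r
fruit <- c("apple", "orange", "mango")

# With replacement: the same fruit can be drawn more than once
with_repl <- sample(fruit, size = 3, replace = TRUE)
with_repl

# Without replacement: every fruit appears exactly once, in random order
without_repl <- sample(fruit, size = 3, replace = FALSE)
without_repl
```

Run it several times: the first result will sometimes contain repeats, the second never will.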
resample()
Chalk Talk
Chalk Talk 1
- In short: Probability is the study of randomness.
- Its roots lie in one historical constant
- It is the theoretical backbone of statistics.
There are two approaches to studying probability:
Mathematically (MATH 310) | Via Simulations |
---|---|
Doing this repeatedly by hand is tiring:
All hail the `mosaic` package: `library(mosaic)`.
Chalk Talk 2
Best viewed in HTML mode, not slide deck mode:
You should draw out what your end data frame should look like in tidy format:
Why? If you don’t clearly identify this, not only will your work not be focused, but more importantly, how would you know when you’re done?
Before starting any substantive data wrangling using `mutate`, `summarise`, `arrange`, or `_join`, I like to pare down the necessary data sets to the minimum of what I need by
- `filter` only the absolutely necessary rows
- `select` only the absolutely necessary columns
Why? This has several benefits:
- `View()`s of your work as you progress.
`nycflights13` data sets: `flights`, `planes`, `airlines`, `weather`, and `airports`.
Why? If you confuse the what and the how, you’ll only get doubly lost. Separate them out!
Done with “Tidy” and “Transform”, start with “Model”:
Growing up I used to only eat white rice, but now I only eat multigrain rice.
White Rice | Multigrain Rice |
---|---|
What is my spin on multigrain rice made of?
- Brown rice
- Sweet brown rice
- Barley
- Red beans
- Black beans
Chalk Talk
For each of the following 4 scenarios
1: Identify
- The population of interest and if applicable the population parameter
- The sample used and if applicable the statistic
2: Comment on the representativeness/generalizability of the results of the sample to the population.
- The Royal Air Force wants to study how resistant their airplanes are to bullets. They study the bullet holes on all the airplanes on the tarmac after an air battle against the Luftwaffe (German Air Force).
- You want to know the average income of Middlebury graduates in the last 10 years. So you get the records of 10 randomly chosen Midd Kids. They all answer and you take the average.
- Imagine it’s 1993 i.e. almost all households have landlines. You want to know the average number of people in each household in Middlebury. You randomly pick out 500 phone numbers from the phone book and conduct a phone survey.
- You want to know the prevalence of illegal downloading of TV shows among Middlebury students. You get the emails of 100 randomly chosen Midd Kids and ask them “How many times did you download a pirated TV show last week?”
- Not difficult, but it still takes practice.
- You might need to do this for your final projects.
- Excel `.xlsx` files are clunky as they have lots of Microsoft metadata we don’t need. You can use the `readxl` package to load Excel files.
- Comma-separated values `.csv` files are a minimalist spreadsheet format.
A `.csv` file (example) is just data and no fluff:
- Rows are separated by line breaks.
- Values for a given row (i.e. variables) are separated by commas. Each row has equal number of commas.
- The first row is typically a header row with the column/variable names
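As a small illustration of this format, the sketch below writes a tiny `.csv` as text and reads it with base R's `read.csv()`. The names and values are made up.

```r
# A tiny .csv written out as text (hypothetical values for illustration)
csv_text <- "name,age,city
Albert,35,Middlebury
Yolanda,28,Seattle"

# read.csv() treats the first row as the header by default
example <- read.csv(text = csv_text)
example
names(example)
```

Each row has the same number of commas (two), so each row has the same three variables.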
Today you will load the `DD_vs_SB.csv` file that contains the Dunkin Donuts and Starbucks data. Delaney Moran scraped the web for the following data: For each of 1024 census tracts in Eastern Massachusetts:
- In the RStudio File Panel -> Navigate to the file -> Click on it and select -> “Import Dataset…”
- Make sure “Heading” is set to “Yes”. This tells RStudio that the first row contains the variable names.
- Click Import
- The `View()` panel should pop up with the data. Make sure that the variable names are correct.
- Plot this data!
- Start Problem Set 06 in R Markdown format
- Biggest source of confusion: R Markdown has its own environment. Just because something exists in your console doesn’t mean it exists in R Markdown.
- R Markdown Debugging first
We add regression lines…
After loading `DD_vs_SB.csv`:
library(ggplot2)
ggplot(DD_vs_SB, aes(x=median_income, y=shops_per_1000)) +
geom_point(aes(col=Type)) +
facet_wrap(~Type) +
geom_smooth(method="lm", se=FALSE) +
labs(x="Median Household Income", y="# of shops per 1000 people",
title="Coffee/Cafe Comparison in Eastern MA") +
scale_color_manual(values=c("orange", "forestgreen"))
`arrange()` & `_join`
- `filter()` rows/observations matching criteria
- `summarize()` numerical variables
- `group_by()` group rows/observations by a categorical variable
- `mutate()` existing variables to create new ones
- `arrange()` rows
And `_join`!
Really simple. Either
DATASET_NAME %>% arrange(VARIABLE_NAME)
or
DATASET_NAME %>% arrange(desc(VARIABLE_NAME))
library(dplyr)
# Create data frame with two variables
test_data <- tibble(
name=c("Abbi", "Abbi", "Ilana", "Ilana", "Ilana"),
value_1=c(0, 1, 0, 1, 0),
value_2=c(4, 6, 3, 2, 5)
)
# See contents in console
test_data
Run this code. Notice the subtle diff between 2 and 3:
# 1: Arrange in ascending order
test_data %>%
arrange(value_1)
# 2: Arrange in descending order
test_data %>%
arrange(desc(value_1))
# 3: Arrange in descending order of value_1, and then within
# value_1, arrange in ascending order of value_2
test_data %>%
arrange(desc(value_1), value_2)
And now the last component of data wrangling: joining/merging two data sets. Run the following:
x <- tibble(x1=c("A","B","C"), x2=c(1,2,3))
y <- tibble(x1=c("A","B","D"), x3=c(TRUE,FALSE,TRUE))
x
y
We join by the `"x1"` variable. Note how it is in quotation marks.
left_join(x, y, by = "x1")
full_join(x, y, by = "x1")
`join` (right-hand column of back of cheatsheet). To keep things simple, we’ll try to only use:
- `left_join`
- `full_join`
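To make the difference between these two joins concrete, here is a self-contained sketch using the same small `x` and `y` tables, rebuilt with base R's `data.frame()`. The names `left` and `full` are introduced only for this illustration.

```r
library(dplyr)

# Same two small tables as in the example, built with data.frame()
x <- data.frame(x1 = c("A", "B", "C"), x2 = c(1, 2, 3))
y <- data.frame(x1 = c("A", "B", "D"), x3 = c(TRUE, FALSE, TRUE))

# left_join(): keep every row of x; "C" has no match in y, so its x3 is NA
left <- left_join(x, y, by = "x1")
left

# full_join(): keep every row of x and of y; "D" is added with x2 as NA
full <- full_join(x, y, by = "x1")
full
```

The left join returns 3 rows (one per row of `x`), while the full join returns 4 rows (every key that appears in either table).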
`group_by()` & 5MV#4 `mutate()`
- `filter()` rows/observations matching criteria
- `summarize()` numerical variables
- `group_by()` group rows/observations by a categorical variable
- `mutate()` existing variables to create new ones
- `arrange()` rows
Run the following in your console:
library(dplyr)
# Create data frame with two variables
test_data <- tibble(
name=c("Albert", "Albert", "Albert", "Yolanda", "Yolanda"),
value=c(2, 2, 2, 3, 3)
)
# See contents in console
test_data
`group_by(name)` puts grouping meta-data
Run the following. Notice the data itself doesn’t change, but the data about the data does:
test_data
test_data %>%
group_by(name)
Run both these
test_data %>%
summarise(overall_avg = mean(value))
test_data %>%
group_by(name) %>%
summarise(name_avg = mean(value))
What’s the difference?
Chalk talk
Here:
- Grey, blue, green rows are in the same group
- For each group, summarize numerical values i.e. many-to-one
Mutate existing variables to create new ones. Always of the form:
DATASET_NAME %>%
mutate(NEW_VARIABLE_NAME = OLD_VARIABLE_NAMES)
Using the same example as earlier. Run both:
test_data %>%
mutate(double_value = value * 2)
test_data %>%
mutate(double_value = value * 2) %>%
mutate(triple_value = value + double_value)
`%>%`, 5MV#1 `filter`ing, and 5MV#2 `summarize()`
%>%
Piping allows you to
- `filter()` rows/observations matching criteria
- `summarize()` numerical variables
- `group_by()` group rows/observations by a categorical variable
- `mutate()` existing variables to create new ones
- `arrange()` rows
`filter()` rows/observations matching criteria
Take `flights` and then filter for all rows where `year` is equal to 2014.
Note we use `==` and not `=`
library(dplyr)
library(nycflights13)
data(flights)
flights %>%
filter(year == 2014)
`summarize()` numerical variables using a many to one function:
Examples of many to one functions:
- `sum()`: sum of n values
- `mean()`: mean of n values
- `sd()`: standard deviation of n values
What’s going on here?
library(dplyr)
library(nycflights13)
data(weather)
weather %>%
summarize(mean_temp = mean(temp))
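If the summary above comes back as `NA`, one likely reason is missing values in `temp` (an assumption about the data, since the slide does not say). A toy vector shows the behaviour:

```r
# A toy temperature vector with one missing value (hypothetical numbers)
temp <- c(39.0, 41.5, NA, 44.1)

# Any NA in the input makes the mean NA
mean(temp)

# na.rm = TRUE drops missing values before averaging
mean(temp, na.rm = TRUE)
```

The same `na.rm = TRUE` argument can be passed to `mean()` inside `summarize()`.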
With the internet, we are in a new age of data:
- Jenny Bryan at UBC teaches a graduate level class STAT 545 on Data wrangling, exploration, and analysis with R. Note the ordering.
Jenny Bryan said: “Classroom data are like teddy bears and real data are like a grizzly bear with salmon blood dripping out its mouth.”
Traditional Classroom Data | Real Data |
---|---|
Some attributes of real data:
Inconsistent formatting is a real pain:
- Dates: “2016/10/12” vs “2016-10-12” vs “10/12/16” vs “10/12/2016” vs “Oct 12, 2016”
- “DC” vs “D.C.” vs “District of Columbia”
- “Beyonce” vs “Beyoncé”
To take this, we now officially introduce the `dplyr` package: a grammar of data manipulation
`function()` you use.
Say hello to the 5MV: the five main verbs
- `filter()` rows/observations matching criteria
- `summarize()` numerical variables
- `group_by()` group rows/observations by a categorical variable
- `mutate()` existing variables to create new ones
- `arrange()` rows
Also, later `_join()` two separate data frames by corresponding variables
- Scatterplot AKA bivariate plot
- Line-graph
- Histogram
- Boxplot
- Barplot AKA Barchart AKA bargraph
Recall from first Grammar of Graphics lecture, we displayed
Say these piecharts represent polls for a local election with 5 candidates at time points A, B, and C:
Answer the following questions:
`geom_bar()` is the trickiest of the 5NG, so we’ll use it in limited capacity.
Two different ways to have counts show on y-axis:
- Computed internally by `geom_bar()`
- Precomputed manually by yourself in your `data` in a variable `count`, `n`, etc.
Counts are not pre-computed:
Row Number | name |
---|---|
1 | Albert |
2 | Albert |
3 | Albert |
4 | Mo |
5 | Mo |
Counts are pre-computed in variable `n`. So `n` becomes a `y` aesthetic variable!
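The two situations can be sketched as follows, using the Albert/Mo names from the table. The data frame and plot names are hypothetical, and `stat = "identity"` is one way (not necessarily the lecture's) to tell `geom_bar()` to use pre-computed counts.

```r
library(ggplot2)

# Counts NOT pre-computed: one row per observation
names_raw <- data.frame(name = c("Albert", "Albert", "Albert", "Mo", "Mo"))

# geom_bar() counts the rows for each name internally: no y aesthetic
plot_raw <- ggplot(names_raw, aes(x = name)) +
  geom_bar()

# Counts pre-computed in a variable n: one row per name
names_counted <- data.frame(name = c("Albert", "Mo"), n = c(3, 2))

# n becomes the y aesthetic; stat = "identity" uses the values as-is
plot_counted <- ggplot(names_counted, aes(x = name, y = n)) +
  geom_bar(stat = "identity")

plot_raw
plot_counted
```

Both plots should show the same bars: a height of 3 for Albert and 2 for Mo.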
5
- In-class Wed 3/8
- Closed book, no calculators
ggplot2
?
- Scatterplot AKA bivariate plot
- Line-graph
- Histogram
- Boxplot
- Barplot AKA Barchart AKA bargraph
If I know your name, I can guess your age. Looking at the handout answer the following questions:
As of Jan 1st, 2014 in the United States
- What can you say about females named Ella vs Zoe?
- What can you say about males named Aidan vs Oliver?
- What proportion of male Connors are younger than 16?
- What proportion of female Gertrudes are older than 69?
Chalk Talk: Age of 544 Members of 113th United States Congress:
- 439 members of House of Representatives
- 105 Senators
- Scatterplot AKA bivariate plot
- Line-graph
- Histogram
- Boxplot
- Barplot AKA Barchart AKA bargraph
From `okcupiddata` package, the `profiles` data set:
Restricted to heights between 55 (4’7’’) and 80 (6’8’’) inches:
- The y-axis displays notions of relative frequency i.e. which values occur more than others.
- Huge definition: they are a visualization of the statistical distribution of values.
- We have an `x` aesthetic
- Counts on the y-axis are not an explicit variable in the data set, but rather are computed internally, i.e. no `y` aesthetic
- The shape of a histogram is dependent on the structure of the bins on the x-axis.
For values: \(-2.5, -1.5, -0.5, 0.5, 1.5, 2.5\)
Let’s draw histograms using the following binning structures:
- (-3, -2, -1, 0, 1, 2, 3)
- (-4, -2, 0, 2, 4)
- (-4, 4)
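The bin counts for these three binning structures can be checked with base R's `hist()`; the variable names are introduced here for illustration.

```r
x <- c(-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)

# Same six values, three different binning structures
bins_1 <- seq(-3, 3, by = 1)
bins_2 <- seq(-4, 4, by = 2)
bins_3 <- c(-4, 4)

# plot = FALSE returns the bin counts without drawing anything
counts_1 <- hist(x, breaks = bins_1, plot = FALSE)$counts
counts_2 <- hist(x, breaks = bins_2, plot = FALSE)$counts
counts_3 <- hist(x, breaks = bins_3, plot = FALSE)$counts

counts_1  # 1 1 1 1 1 1: flat
counts_2  # 1 2 2 1: peaked in the middle
counts_3  # 6: a single bar
```

The data never change, yet the histogram looks flat, peaked, or like one block depending only on the bins.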
Facets allow you to split ANY plot by a categorical variable, in this case by adding `+ facet_wrap(~sex)` to the `ggplot()` call.
- Scatterplot AKA bivariate plot
- Line-graph
- Histogram
- Boxplot
- Barplot AKA Barchart AKA bargraph
A statistical graphic is a mapping of `data` variables to `aes()`thetic attributes of `geom_`etric objects.
ggplot(data=simple_ex, aes(x=A, y=B, size=C, color=D )) +
geom_line()
- Scatterplot AKA bivariate plot
- Line-graph
- Histogram
- Boxplot
- Barplot AKA Barchart AKA bargraph
What’s not great about this plot, especially near (0, 0)?
This is called overplotting: when points are stacked so densely we can’t see what’s going on!
There are two ways of dealing with this:
A statistical graphic is a mapping of `data` variables to `aes()`thetic attributes of `geom_`etric objects.
The five named graphs we’ll see in this class. Note: I reordered them from last time to be easiest to hardest to work with:
- Scatterplot AKA bivariate plot
- Line-graph
- Histogram
- Boxplot
- Barplot AKA Barchart AKA bargraph
`ggplot2` package
In tidy format:
A | B | C | D |
---|---|---|---|
1 | 1 | 3 | Hot |
2 | 2 | 2 | Hot |
3 | 3 | 1 | Cold |
4 | 4 | 2 | Cold |
In 1812, Napoleon led a French invasion of Russia, marching on Moscow.
It was one of the biggest military disasters ever, in particular b/c of the Russian winter.
Famous graphical illustration of Napoleon’s march to/from Moscow
This was considered a revolution in statistical graphics because between
- the map on top
- the line graph on the bottom
there are 6 dimensions of information (i.e. variables) being displayed on a 2D page.
A statistical graphic is a mapping of `data` variables to `aes()`thetic attributes of `geom_`etric objects.
Where? | `data` | `aes()` | `geom_` |
---|---|---|---|
top map | longitude | x | point |
“ | latitude | y | point |
“ | army size | size | path |
“ | army direction (forward vs retreat) | color | path |
bottom graph | date | x | line & text |
“ | temperature | y | line & text |
2005 - Proposal | 2009 - R Implementation |
---|---|
From `ggplot2movies` package, the `movies` data set:
From `nycflights13` package, the `flights` data set:
From `okcupiddata` package, the `profiles` data set:
From `fueleconomy` package, the `vehicles` data set:
From `babynames` package, the `babynames` data set:
Say hello to the 5NG: the five named graphs
The `nycflights13` package contains “tidy data” on all 336,776 flights that departed from NYC (e.g. EWR, JFK and LGA) in 2013.
To help understand what causes delays, it also includes a number of other useful datasets.
- `weather`: hourly meteorological data for each airport
- `planes`: construction information about each plane
- `airports`: airport names and locations
- `airlines`: translation between two letter carrier codes and names
In small teams, take 3 minutes to write down
Recall the tradeoff:
Less of this… | More of this… |
---|---|
You need to install each package once.
You need to load a package every time you want to use it.
`library(PACKAGENAME)` in the console.
Today’s Learning Check: Install and then load 3 packages:
- `dplyr`: a package for data manipulation
- `ggplot2`: a package for data visualization
- `babynames`: a package of baby name data
`babynames` Package
The `babynames` package contains, for each year from 1880 to 2013, the number of children born of each sex given each name in the United States. Only names with more than 5 occurrences are considered.
Have students engage in the data/science research pipeline in as faithful a manner as possible while maintaining a level suitable for novices.
We will, as best we can, perform all this:
And not just this, as in many previous intro stats courses:
Foster a conceptual understanding of statistical topics and methods using simulation/resampling and real data whenever possible, rather than mathematical formulae.
In this course, computers and not math will be the “engine”. What does this mean?
Blur the traditional lecture/lab dichotomy of introductory statistics courses by incorporating more computational and algorithmic thinking into the syllabus.
`go/rstudio/` (on campus or via VPN)
Develop statistical literacy by, among other ways, tying the curriculum to current events and demonstrating the important role statistics plays in society.
Either
R | RStudio | DataCamp |
---|---|---|
- Log in to `go/rstudio/` with your Midd account
- If you don’t have access, raise your hand. (Username: guest1, password: rstudioguest)
- In RStudio menu bar -> File -> New File -> R Script
- This is where you run/execute commands
- The “>” is the prompt. It means R is ready to receive commands
- If you don’t see a “>” and want to restart, press ESC.
Now we will use R via DataCamp instead of via RStudio, but just for driver’s ed. Two panels exist in both: